Quick Start

Transcribe your first audio file in under 5 minutes.

Step 1: Upload Audio

Drag and drop an audio file onto the dashboard, or click to browse. Shuole supports:

  • Audio: WAV, MP3, FLAC
  • Video: MP4, MOV, AVI (audio is extracted for transcription)

Note: M4A and OGG files may work but are not officially supported due to inconsistent codec handling. For best results, use WAV, MP3, or FLAC. Video formats work for basic transcription; for advanced stages (align, diarize), we recommend converting to audio first.
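If you prepare files in bulk before uploading, the format rules above can be checked in a small script. The following is a minimal sketch, not part of Shuole: the function names are hypothetical, and the conversion step assumes the ffmpeg command-line tool is installed separately.

```python
from pathlib import Path

# Formats this guide lists as officially supported.
AUDIO_EXTS = {".wav", ".mp3", ".flac"}
VIDEO_EXTS = {".mp4", ".mov", ".avi"}
# Formats this guide says may work but are not officially supported.
UNOFFICIAL_EXTS = {".m4a", ".ogg"}


def classify(path: str) -> str:
    """Return 'audio', 'video', 'unofficial', or 'unsupported' from the file extension."""
    ext = Path(path).suffix.lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    if ext in UNOFFICIAL_EXTS:
        return "unofficial"
    return "unsupported"


def ffmpeg_extract_command(src: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream (-vn) and writes a WAV next to src."""
    dst = str(Path(src).with_suffix(".wav"))
    return ["ffmpeg", "-i", src, "-vn", dst]
```

To run the conversion, pass the returned list to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.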

Shuole upload area

Step 2: Configure

For each file, select a language from the dropdown (English or Chinese). The default pipeline includes only the Transcribe stage — perfect for quick results.

Language selection dropdown

Advanced Pipeline (Optional)

Need more than basic transcription? You can configure the pipeline in two ways:

  • Click the small configure icon next to a file to configure that file individually
  • Click Advanced config (all) at the bottom to apply the same settings to all files

In the configure popover, you can add additional stages:

  • Align — Word-level timestamps with precise timing
  • Polish — LLM-powered punctuation and grammar fixes
  • Diarize — Identify different speakers in the audio

Default settings work well for most use cases. See each stage's documentation page for detailed configuration options.
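One way to picture stage selection: Transcribe is always present, and any extra stages run in the order shown in the job details panel's pipeline summary (transcribe → align → polish → diarize). The sketch below is purely illustrative. The function and the lowercase stage identifiers are assumptions, not Shuole's actual configuration format.

```python
# Stage order as displayed in the pipeline summary of the job details panel.
STAGE_ORDER = ["transcribe", "align", "polish", "diarize"]


def build_pipeline(extra_stages=()):
    """Return the selected stages in pipeline order; 'transcribe' is always included."""
    requested = {"transcribe", *extra_stages}
    unknown = requested - set(STAGE_ORDER)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    # Filter the canonical order so the result is ordered regardless of input order.
    return [stage for stage in STAGE_ORDER if stage in requested]
```

For example, `build_pipeline(["diarize", "align"])` yields the stages in pipeline order rather than in the order they were requested.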

Note: When you start a job, Shuole checks whether the required models are installed. If any are missing, a progress modal appears while they download. This is a one-time download per stage.

Configure popover with all stages enabled

Step 3: Transcribe

Click Start all. Your files are added to a queue and processed sequentially — this keeps memory usage low and ensures stable performance.
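Sequential processing means only one file is being worked on at any moment. The sketch below illustrates that idea with a simple first-in, first-out queue. It is not Shuole's implementation: `run_queue` and the `transcribe` callback are hypothetical, and the status names are borrowed from the Jobs page.

```python
from collections import deque


def run_queue(files, transcribe):
    """Process files one at a time, in upload order, and return each file's final status."""
    statuses = {name: "queued" for name in files}
    pending = deque(files)
    while pending:
        name = pending.popleft()  # first in, first out
        statuses[name] = "processing"
        try:
            transcribe(name)
            statuses[name] = "completed"
        except Exception:
            # One failed file does not stop the rest of the queue.
            statuses[name] = "failed"
    return statuses
```

Because only one job holds models and audio in memory at a time, peak memory stays close to that of a single job, which is the trade-off the queue makes against total throughput.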

The Jobs Page

Navigate to the Jobs page from the sidebar to monitor your transcriptions. On desktop, it features a two-column layout:

  • Left column: A table listing all your jobs with status indicators (queued, processing, completed, failed)
  • Right column: A sticky details panel showing the selected job's information

Job Details Panel

Click any job row to view its details. The panel shows:

  • Pipeline summary: Which stages are included (transcribe → align → polish → diarize)
  • Job status: Current stage being processed and real-time progress percentage
  • Full config: The complete job configuration for reference

Jobs page with queue and details panel

Tip: On mobile and tablet, the details panel appears as a slide-up overlay instead of a side column.

Processing time: roughly 1 minute for every 6 minutes of audio on Apple Silicon.
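As a rough planning aid, the processing-time figure above can be turned into a quick estimate. This is a back-of-the-envelope sketch: the function name is made up, and actual times will vary with hardware and with which pipeline stages are enabled.

```python
def estimate_processing_minutes(audio_minutes: float, audio_per_processing_minute: float = 6.0) -> float:
    """Estimate processing time from the ~6 minutes of audio per minute figure quoted above."""
    return audio_minutes / audio_per_processing_minute


# A 60-minute recording works out to about 10 minutes of processing.
print(estimate_processing_minutes(60))  # 10.0
```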

Step 4: Review & Export

Once a job completes, click the View result button in the job details panel to open the transcription result page.

Click to Seek

Click any word in the transcript to jump to that exact moment in the audio. The current word is highlighted as the audio plays.

Note: To enable audio playback, first bind the audio file on the Audios page. Once bound, an audio player appears at the bottom of the page.

Click any word to seek to that position
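Conceptually, click-to-seek and word highlighting are two lookups over the same word-level timestamps: a click maps a word to its start time, and playback maps the current time back to a word. The sketch below shows one way to do this with a binary search. The data layout and function names are assumptions for illustration, not Shuole's internal format.

```python
from bisect import bisect_right

# Hypothetical word-level data: (start time in seconds, word).
words = [(0.0, "Click"), (0.4, "any"), (0.7, "word"), (1.1, "to"), (1.3, "seek")]
starts = [start for start, _ in words]


def seek_time_for_click(word_index: int) -> float:
    """Clicking a word jumps playback to that word's start time."""
    return words[word_index][0]


def current_word_index(playback_seconds: float) -> int:
    """Index of the word to highlight: the last word that started at or before the playback time."""
    i = bisect_right(starts, playback_seconds) - 1
    return max(i, 0)  # before the first word starts, highlight the first word
```

Binary search keeps the lookup fast even for long transcripts, since the highlight has to be recomputed continuously while audio plays.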

View Modes

Switch between different views using the tabs:

  • Wall: All words displayed continuously for quick reading
  • Sentence: Text broken into sentences with timestamps
  • Speaker: Segments grouped by speaker (only available when diarization is included in the pipeline)

Wall, Sentence, and Speaker view modes

Tip: Click on a speaker label to filter and highlight only that speaker's segments. The audio player will automatically seek and play only the selected speaker's parts.

Speaker Mapping

When using diarization, Shuole labels speakers as "Speaker 1", "Speaker 2", etc. You can map these to real names from your speaker database.

Note: Add speakers to the Speaker Database first. You can then select them from the dropdown to replace the generic labels.

Map detected speakers to real names
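Speaker mapping amounts to a lookup from generic labels to names. The sketch below shows the idea on a list of segments. The segment structure, the function name, and the example names are all hypothetical and are not taken from Shuole's data model.

```python
def apply_speaker_names(segments, mapping):
    """Return new segments with generic speaker labels replaced where a mapping exists."""
    return [
        # Labels without a mapping are kept unchanged.
        {**seg, "speaker": mapping.get(seg["speaker"], seg["speaker"])}
        for seg in segments
    ]


segments = [
    {"speaker": "Speaker 1", "text": "Welcome back."},
    {"speaker": "Speaker 2", "text": "Thanks for having me."},
]
named = apply_speaker_names(segments, {"Speaker 1": "Alice"})
```

Here only "Speaker 1" is renamed. "Speaker 2" keeps its generic label because nothing was mapped to it, and the original `segments` list is left untouched.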

Copy & Export

Click the Copy button to copy transcript text to your clipboard. A popover lets you customize the output with two groups of options, Scope and Format:

Scope

Choose what to copy:

  • Current line: Only the line you clicked on
  • Current speaker: All segments from the selected speaker (available when diarization is enabled)
  • Whole file: The entire transcript

Format

Choose what metadata to include:

  • Include timestamps: Add start and end times before each segment
  • Include speaker: Add the speaker label (available when diarization is enabled)

Example outputs:

Text only:

It's just the foreplay to the something bigger back to yeah

With timestamps:

00:05:12 - 00:05:24
It's just the foreplay to the something bigger back to yeah

With timestamps and speaker:

00:05:12 - 00:05:24 Speaker 2
It's just the foreplay to the something bigger back to yeah
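The three example layouts above follow a simple pattern: an optional header line (timestamps, then speaker), followed by the text. A minimal sketch that reproduces that pattern is shown below. The function names and parameters are illustrative assumptions, and the layout for "speaker without timestamps" is a guess, since the examples above do not show it.

```python
def format_timestamp(seconds: int) -> str:
    """Format whole seconds as HH:MM:SS."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segment(text, start, end, speaker=None,
                   include_timestamps=False, include_speaker=False):
    """Build the copied text for one segment, with an optional header line."""
    header = []
    if include_timestamps:
        header.append(f"{format_timestamp(start)} - {format_timestamp(end)}")
    if include_speaker and speaker:
        header.append(speaker)
    lines = [" ".join(header)] if header else []
    lines.append(text)
    return "\n".join(lines)


print(format_segment("Hello there", 312, 324, "Speaker 2",
                     include_timestamps=True, include_speaker=True))
# 00:05:12 - 00:05:24 Speaker 2
# Hello there
```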

Export

Click Export to download the transcript as SRT (subtitles) or JSON (structured data). You can export the whole file or just the current speaker, and optionally include speaker labels.
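For readers unfamiliar with SRT, each subtitle entry is a sequence number, a time range written as `HH:MM:SS,mmm --> HH:MM:SS,mmm`, the text, and a blank line. The sketch below converts a list of segments into that layout. The segment dictionaries and the "Speaker: text" prefix are assumptions for illustration. Shuole's actual JSON structure and SRT labeling may differ.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments, include_speaker=False):
    """Render segments as SRT entries separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        text = seg["text"]
        if include_speaker and seg.get("speaker"):
            text = f'{seg["speaker"]}: {text}'
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    return "\n".join(blocks)
```

Calling `to_srt(segments, include_speaker=True)` on segments that carry a `speaker` key prefixes each subtitle with the label, loosely mirroring the optional speaker labels in the Export dialog.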

What's Next?