Word-Level Alignment
Refine timestamps to word-level accuracy for precise navigation and highlighting.
What It Does
The Align stage takes the transcript from the Transcribe stage and refines the timestamps to word-level precision. This enables:
- Click-to-seek — Click any word to jump to that exact moment in the audio
- Real-time highlighting — Words highlight as the audio plays
- Precise editing — Know exactly when each word starts and ends
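As a rough illustration of how word-level timestamps support click-to-seek and highlighting, the sketch below looks up the word being spoken at a given playback time. The `Word` record and both function names are hypothetical and not part of the application's API; the sketch assumes words are sorted by start time and do not overlap.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    # Hypothetical record for one aligned word; times are in seconds.
    text: str
    start: float
    end: float


def active_word_index(words: List[Word], t: float) -> Optional[int]:
    """Return the index of the word spoken at time t, or None during a gap."""
    starts = [w.start for w in words]
    # Last word whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < words[i].end:
        return i
    return None


def seek_time(words: List[Word], index: int) -> float:
    """Click-to-seek: jump playback to the start of the clicked word."""
    return words[index].start


words = [Word("hello", 0.0, 0.4), Word("world", 0.5, 0.9)]
highlighted = active_word_index(words, 0.6)  # index of "world"
```

A player would call `active_word_index` on each time update to move the highlight, and `seek_time` when a word is clicked.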
When to Use
Enable the Align stage when you need:
- Interactive transcript navigation (click any word to seek)
- Karaoke-style word highlighting during playback
- Accurate word boundaries for subtitle editing
Note: The Align stage uses CTC forced alignment to generate word timestamps on its own, so it does not require the "Word Timestamps" option in the Transcribe stage. Its timestamps are more accurate than the word timestamps produced by whisper.cpp, especially for CJK languages such as Chinese.
Note: The alignment model (~1.2 GB) is downloaded automatically when you start your first alignment job.
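For readers curious about what CTC forced alignment means, here is a minimal textbook-style sketch, not the aligner's actual implementation. It assumes per-frame log-probabilities over a small token vocabulary (index 0 is the CTC blank) and a known token sequence, and uses a Viterbi search to find which frames each token occupies. The function name and toy data are illustrative only.

```python
import math
from typing import List, Optional, Tuple


def ctc_forced_align(log_probs: List[List[float]], tokens: List[int],
                     blank: int = 0) -> List[Optional[Tuple[int, int]]]:
    """Return (first_frame, last_frame) per token on the best CTC path."""
    # CTC state sequence: blank, tok1, blank, tok2, ..., blank.
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    T, S = len(log_probs), len(ext)
    neg = -math.inf
    score = [[neg] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # A path may start in the leading blank or in the first token.
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            # Allowed moves: stay, advance one state, or skip a blank
            # between two different tokens.
            candidates = [(score[t - 1][s], s)]
            if s >= 1:
                candidates.append((score[t - 1][s - 1], s - 1))
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append((score[t - 1][s - 2], s - 2))
            best, prev = max(candidates)
            if best > neg:
                score[t][s] = best + log_probs[t][ext[s]]
                back[t][s] = prev
    # A path must end in the last token or the trailing blank.
    s = S - 1
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        s = S - 2
    if score[T - 1][s] == neg:
        raise ValueError("not enough frames to align all tokens")
    # Backtrack to recover the state visited at every frame.
    path = [s]
    for t in range(T - 1, 0, -1):
        s = back[t][s]
        path.append(s)
    path.reverse()
    spans: List[Optional[Tuple[int, int]]] = [None] * len(tokens)
    for t, state in enumerate(path):
        if state % 2 == 1:  # odd states are tokens, even states are blanks
            i = state // 2
            first = spans[i][0] if spans[i] else t
            spans[i] = (first, t)
    return spans


# Toy input: 5 frames over the vocabulary [blank, "a", "b"].
frames = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.8, 0.1],
          [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
toy_log_probs = [[math.log(p) for p in row] for row in frames]
print(ctc_forced_align(toy_log_probs, [1, 2]))  # [(1, 2), (4, 4)]
```

Multiplying the frame indices by the model's frame duration gives start and end times in seconds; grouping token spans by word then yields word boundaries.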
Configuration Options
Default settings work for most cases. Click the Advanced button to reveal additional options.
Default Options
Language
Default: auto. Language code (ISO 639-3) used for alignment. When set to auto, the stage inherits the language from the Transcribe stage; if no language is available there, it defaults to eng (English).
Common language codes:
- eng — English
- cmn — Chinese (Mandarin)
Note: The Align stage uses ISO 639-3 language codes (e.g., eng, cmn), which differ from the Transcribe stage's ISO 639-1 codes (e.g., en, zh).
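The relationship between the two code systems and the auto setting can be sketched as a small lookup. This is an illustration under assumptions, not the application's code: the function name is hypothetical, the table lists only a few languages, and falling back to eng for an unmapped code is assumed behavior.

```python
from typing import Optional

# Partial ISO 639-1 -> ISO 639-3 table. The en and zh entries follow the
# examples in this page; ja and ko are added for illustration.
ISO_639_1_TO_3 = {
    "en": "eng",
    "zh": "cmn",
    "ja": "jpn",
    "ko": "kor",
}


def resolve_align_language(setting: str,
                           transcribe_language: Optional[str] = None) -> str:
    """Resolve the Align stage language from its setting."""
    if setting != "auto":
        # An explicit ISO 639-3 code is used as given.
        return setting
    if transcribe_language is None:
        return "eng"
    # Assumption: unmapped Transcribe codes fall back to English.
    return ISO_639_1_TO_3.get(transcribe_language, "eng")


print(resolve_align_language("auto", "zh"))  # cmn
```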
Advanced Options
Device
Default: auto. Processing device to use for alignment. Options:
- auto — Automatically selects the best available device
- mps — Apple Silicon GPU (recommended for Mac)
- cuda — NVIDIA GPU (not available on Mac)
- cpu — CPU only (slower)
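One way to picture the auto option is a simple preference order over the devices that are present. The sketch below is hypothetical: the function name and the order cuda, then mps, then cpu are assumptions, since this page does not say how the application decides.

```python
def pick_device(requested: str, cuda_available: bool,
                mps_available: bool) -> str:
    """Choose a processing device from the Device setting."""
    available = {"cuda": cuda_available, "mps": mps_available, "cpu": True}
    if requested == "auto":
        # Assumed preference order: a GPU backend first, CPU as the fallback.
        for name in ("cuda", "mps", "cpu"):
            if available[name]:
                return name
    if requested not in available:
        raise ValueError(f"unknown device: {requested}")
    if not available[requested]:
        raise RuntimeError(f"device not available: {requested}")
    return requested


# On an Apple Silicon Mac there is no CUDA, so auto resolves to mps.
print(pick_device("auto", cuda_available=False, mps_available=True))  # mps
```

If the backend were PyTorch, the two flags could come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.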
Batch Size
Default: 1. Batch size for inference. Increasing this may speed up processing but uses more memory.
Exclude Time
Time ranges to skip during alignment. You can enter ranges manually (e.g., 1:30-2:45) or use the shared exclusion block. See Timeline Exclusions for details.
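To make the range format concrete, the sketch below parses an entry such as 1:30-2:45 into seconds and checks whether a timestamp falls inside an excluded range. The helper names are hypothetical, and accepting an hours field (h:mm:ss) is an assumption rather than documented behavior.

```python
from typing import List, Tuple


def parse_timestamp(text: str) -> float:
    """Convert "ss", "m:ss", or "h:mm:ss" to seconds."""
    seconds = 0.0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_range(text: str) -> Tuple[float, float]:
    """Parse "start-end" into a (start_seconds, end_seconds) pair."""
    start_text, end_text = text.split("-", 1)
    start, end = parse_timestamp(start_text), parse_timestamp(end_text)
    if end <= start:
        raise ValueError(f"range end must be after start: {text!r}")
    return start, end


def is_excluded(t: float, ranges: List[Tuple[float, float]]) -> bool:
    """True if time t (in seconds) lies inside any excluded range."""
    return any(start <= t < end for start, end in ranges)


ranges = [parse_range("1:30-2:45")]
print(ranges)                      # [(90.0, 165.0)]
print(is_excluded(100.0, ranges))  # True
```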
Extra Args
Default: empty. Additional arguments passed directly to ctc_forced_aligner. Intended for advanced users who need to fine-tune alignment behavior that is not exposed in the UI.
Related
- Transcription — The prerequisite stage for alignment
- Click to Seek — How to use word-level navigation
- Timeline Exclusions — Skip sections during alignment