Word-Level Alignment

Refine timestamps to word-level accuracy for precise navigation and highlighting.

What It Does

The Align stage takes the transcript from the Transcribe stage and refines the timestamps to word-level precision. This enables:

  • Click-to-seek — Click any word to jump to that exact moment in the audio
  • Real-time highlighting — Words highlight as the audio plays
  • Precise editing — Know exactly when each word starts and ends
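
As a rough illustration of how word-level timestamps make click-to-seek and playback highlighting possible, the sketch below looks up the word being spoken at a given playback time. The record layout (text, start, end in seconds) and the function names are assumptions made for this example, not the application's actual data format.

```python
from bisect import bisect_right

# Hypothetical word records: (text, start_seconds, end_seconds), sorted by start.
words = [
    ("Hello", 0.00, 0.42),
    ("world", 0.50, 0.91),
    ("again", 1.20, 1.65),
]
starts = [start for _, start, _ in words]


def word_at(time_s):
    """Return the index of the word spoken at time_s, or None during a gap."""
    # Find the last word whose start is <= time_s.
    i = bisect_right(starts, time_s) - 1
    if i >= 0 and time_s < words[i][2]:
        return i
    return None


def seek_time(index):
    """Click-to-seek: the playback position for a clicked word."""
    return words[index][1]
```

A player could call word_at on every playback tick to decide which word to highlight, and seek_time when a word is clicked. The binary search keeps the lookup fast even for long transcripts.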

When to Use

Enable the Align stage when you need:

  • Interactive transcript navigation (click any word to seek)
  • Karaoke-style word highlighting during playback
  • Accurate word boundaries for subtitle editing

Note: The Align stage uses CTC forced alignment to generate word timestamps on its own, so it does not require the "Word Timestamps" option in the Transcribe stage. Its word timestamps are more accurate than those produced by whisper.cpp, especially for CJK languages such as Chinese.

Note: The alignment model (~1.2 GB) is downloaded automatically when you start your first alignment job.

Configuration Options

Default settings work for most cases. Click the Advanced button to reveal additional options.

Default Options

Language

Default: auto. The language code (ISO 639-3) used for alignment. When set to auto, the stage inherits the language from the Transcribe stage; otherwise it falls back to eng (English).

Common language codes:

  • eng — English
  • cmn — Chinese (Mandarin)

Note: The Align stage uses ISO 639-3 language codes (e.g., eng, cmn), which differ from the Transcribe stage's ISO 639-1 codes (e.g., en, zh).
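
The relationship between the two code systems and the auto setting can be sketched as follows. This is only an illustration: the function name, the mapping table (limited to the two pairs mentioned above), and the fallback to eng for unmapped codes are assumptions of this example, not the application's actual implementation.

```python
# Only the two pairs named in this section; extend the table for other languages.
ISO_639_1_TO_3 = {"en": "eng", "zh": "cmn"}


def resolve_align_language(setting="auto", transcribe_language=None):
    """Return the ISO 639-3 code the Align stage would use (illustrative)."""
    if setting != "auto":
        # An explicit ISO 639-3 code is used as given.
        return setting
    if transcribe_language is None:
        # Nothing to inherit from the Transcribe stage: default to English.
        return "eng"
    # Assumption of this sketch: unmapped codes also fall back to "eng".
    return ISO_639_1_TO_3.get(transcribe_language, "eng")
```

For example, resolve_align_language("auto", "zh") yields cmn, while an explicit setting such as cmn is returned unchanged regardless of the transcription language.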

Advanced Options

Device

Default: auto. Processing device to use for alignment. Options:

  • auto — Automatically selects the best available device
  • mps — Apple Silicon GPU (recommended for Mac)
  • cuda — NVIDIA GPU (not available on Mac)
  • cpu — CPU only (slower)
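
One plausible way the auto option could choose a device is sketched below. The preference order (cuda, then mps, then cpu) and the function name are assumptions for illustration; the application's real selection logic may differ. In a PyTorch-based setup, the two availability flags would typically come from torch.cuda.is_available() and torch.backends.mps.is_available().

```python
def pick_device(setting="auto", cuda_available=False, mps_available=False):
    """Resolve the Device option to a concrete device name (illustrative)."""
    if setting != "auto":
        # An explicit choice is respected as given.
        return setting
    # Assumed preference order: a GPU backend if present, otherwise the CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

On an Apple Silicon Mac, where CUDA is unavailable, this sketch would resolve auto to mps.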

Batch Size

Default: 1. Batch size for inference. Increasing this may speed up processing but uses more memory.

Exclude Time

Time ranges to skip during alignment. You can enter ranges manually (e.g., 1:30-2:45) or use the shared exclusion block. See Timeline Exclusions for details.
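
To make the range format concrete, here is a minimal sketch of how an entry such as 1:30-2:45 could be converted to seconds and checked against a timestamp. The function names are hypothetical, and support for plain seconds or h:mm:ss values is an assumption of this example; see Timeline Exclusions for the formats the application actually accepts.

```python
def parse_timestamp(text):
    """Convert 'ss', 'm:ss', or 'h:mm:ss' to seconds."""
    seconds = 0.0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_range(text):
    """Parse 'START-END' (e.g. '1:30-2:45') into (start_s, end_s)."""
    start_text, end_text = text.split("-", 1)
    start, end = parse_timestamp(start_text), parse_timestamp(end_text)
    if end <= start:
        raise ValueError(f"range end must be after start: {text!r}")
    return start, end


def is_excluded(time_s, ranges):
    """True if time_s falls inside any excluded (start_s, end_s) range."""
    return any(start <= time_s < end for start, end in ranges)
```

Here parse_range("1:30-2:45") gives (90.0, 165.0), so audio between 90 and 165 seconds would be skipped.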

Extra Args

Default: empty. Additional arguments passed directly to ctc_forced_aligner. Intended for advanced users who need to adjust alignment behavior that the UI does not expose.

Related