# LLM Polish
Use AI to add punctuation to your transcripts for better readability.
## The Problem
Whisper.cpp sometimes produces transcripts without proper punctuation, especially for conversational audio or when speakers don't pause clearly between sentences. For example:
Raw whisper output (no punctuation):

```
hello how are you today I am fine thanks for asking what about you
```
Without punctuation, the transcript is harder to read and understand.
## The Solution
The Polish stage uses a Large Language Model (LLM) to analyze the text and add punctuation semantically — based on meaning and context, not just pauses.
After LLM polish:

```
Hello, how are you today? I am fine, thanks for asking. What about you?
```
**Current Scope:** The Polish stage currently only adds punctuation. It does not remove filler words (um, uh, like) or restructure sentences.
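Shuole performs this step internally, but the underlying idea is simple to sketch. The following is an illustration only, not Shuole's actual implementation: it assumes the OpenAI Python SDK with an `OPENAI_API_KEY` in the environment, and the prompt wording is hypothetical.

```python
# Illustrative sketch only; Shuole's actual prompt and client code may differ.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def polish(raw_text: str) -> str:
    """Ask the model to add punctuation without changing the words."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # the openai:gpt-4o-mini option from this page
        messages=[
            {
                "role": "system",
                "content": (
                    "Add punctuation and capitalization to the user's text. "
                    "Do not add, remove, or reorder any words."
                ),
            },
            {"role": "user", "content": raw_text},
        ],
    )
    return response.choices[0].message.content

print(polish("hello how are you today I am fine thanks for asking what about you"))
```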
## Configuration Options
Default settings work for most cases. Click the Advanced button to reveal additional options.
### Default Options
#### Model

Default: `google:gemini-2.5-flash-lite`. The LLM model to use, specified in `provider:model` format. Choose between cloud-based and local processing:
**Cloud Models:**

- `google:gemini-2.5-flash-lite` — Fast and cost-effective (default)
- `openai:gpt-4o-mini` — High-quality results
Cloud models require an internet connection and send your transcript to the provider's API.
**Local Model:**

- `llamacpp:qwen3-8b` — Runs entirely on your Mac
The local model (~5.4 GB) is downloaded on first use. Processing is slower than cloud but keeps your data completely private.
**Privacy:** When using the local Qwen3-8B model, your transcript never leaves your Mac. All processing happens on-device.
### Advanced Options
#### Custom Model String

In advanced mode, you can enter a custom model string instead of using the dropdown. This is useful for testing different model versions or configurations. Format: `provider:model`.
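The `provider:model` string splits on the first colon: the part before it selects the backend, and the rest names the model. Here is a small sketch of parsing such a string, with a hypothetical helper name (this is not Shuole's code):

```python
def parse_model_string(spec: str) -> tuple[str, str]:
    """Split a provider:model string into its two parts.

    Hypothetical helper for illustration; Shuole's internal parsing may differ.
    """
    provider, sep, model = spec.partition(":")
    if not (sep and provider and model):
        raise ValueError(f"expected provider:model, got {spec!r}")
    return provider, model

# The model strings used on this page:
assert parse_model_string("google:gemini-2.5-flash-lite") == ("google", "gemini-2.5-flash-lite")
assert parse_model_string("llamacpp:qwen3-8b") == ("llamacpp", "qwen3-8b")
```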
#### Overwrite

Disabled by default. When enabled, the polished result overwrites the original transcription file.
#### Extra Args

Default: `--align-with-word-segments --print-result`. Additional arguments passed to `split_sentence` to control how the text is chunked before LLM processing.
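The exact interface of `split_sentence` is not documented on this page, so treat the following as a hedged sketch: it assumes `split_sentence` is invoked as a command-line subprocess that accepts a transcript path, neither of which this page confirms. Only the two default flags come from the setting above.

```python
# Assumption: split_sentence is a CLI that accepts a transcript path.
# Only the flags themselves come from the Extra Args default on this page.
import shlex
import subprocess

def run_split_sentence(transcript_path: str, extra_args: str) -> str:
    cmd = ["split_sentence", *shlex.split(extra_args), transcript_path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

chunks = run_split_sentence("talk.txt", "--align-with-word-segments --print-result")
```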
## LLM Server Status
When using the local model, Shuole runs a llama.cpp server in the background. You can monitor and control it via the LLM status chip in the app header:
- Green dot — Server is running and ready
- Gray dot — Server is stopped
Click the chip to start or stop the server manually. The server automatically shuts down when you quit Shuole.
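If you want to check the server from outside the app, the llama.cpp server exposes a `/health` endpoint that returns HTTP 200 when it is ready. Here is a minimal probe in Python; the address is an assumption, since this page does not say which port Shuole's server listens on (8080 is llama.cpp's default):

```python
# Minimal health probe for a llama.cpp server.
# The port is an assumption; Shuole may use a different one.
import urllib.error
import urllib.request

def llama_server_ready(base_url: str = "http://127.0.0.1:8080") -> bool:
    """Return True if the server answers GET /health with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print("running" if llama_server_ready() else "stopped")
```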

## Related
- Transcription — The prerequisite stage
- System Requirements — 16 GB RAM recommended for local LLM