Local LLM

Run large language models entirely on your Mac — your data never leaves your device.

What It Does

Shuole includes a built-in llama.cpp server that runs large language models locally. This powers features like LLM Polish without sending your data to external services.
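
Shuole manages this server for you, but for the curious: llama.cpp's bundled server speaks an OpenAI-compatible HTTP API. The sketch below shows what a request to such a server could look like; the port 8080 and the /v1/chat/completions route are llama.cpp defaults and assumptions here, not necessarily what Shuole uses.

```swift
import Foundation

// Minimal sketch: send a chat request to a local llama.cpp server.
// Port 8080 and the /v1/chat/completions route are llama.cpp defaults
// and assumptions here; Shuole may configure the server differently.
let url = URL(string: "http://127.0.0.1:8080/v1/chat/completions")!
var request = URLRequest(url: url)
request.httpMethod = "POST"
request.setValue("application/json", forHTTPHeaderField: "Content-Type")

let body: [String: Any] = [
    "messages": [
        ["role": "user", "content": "Fix the grammar in: 'he dont know'"]
    ],
    "temperature": 0.2
]
request.httpBody = try! JSONSerialization.data(withJSONObject: body)

let semaphore = DispatchSemaphore(value: 0)
URLSession.shared.dataTask(with: request) { data, _, error in
    defer { semaphore.signal() }
    if let data = data, let text = String(data: data, encoding: .utf8) {
        print(text)  // raw JSON response from the local server
    } else if let error = error {
        print("Request failed: \(error)")
    }
}.resume()
semaphore.wait()
```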

Model Download

The local model (Qwen3-8B, ~5.4 GB) is downloaded automatically on first use. A progress modal shows the download status. This is a one-time download.
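
As a rough illustration of how a one-time download with progress reporting can work, here is a URLSession sketch. The model URL and file name below are placeholders for illustration only, not Shuole's actual download source or on-disk location.

```swift
import Foundation

// Sketch of a one-time model download with progress reporting, similar in
// spirit to Shuole's progress modal. The URL and destination below are
// placeholders, not Shuole's actual source or storage path.
let modelURL = URL(string: "https://example.com/models/Qwen3-8B.gguf")!
let destination = FileManager.default
    .urls(for: .applicationSupportDirectory, in: .userDomainMask)[0]
    .appendingPathComponent("Qwen3-8B.gguf")

let semaphore = DispatchSemaphore(value: 0)
let task = URLSession.shared.downloadTask(with: modelURL) { tempURL, _, error in
    defer { semaphore.signal() }
    guard let tempURL = tempURL else {
        print("Download failed: \(error?.localizedDescription ?? "unknown error")")
        return
    }
    // Move the finished download to a stable location; a real app would
    // check for this file on launch so the download only happens once.
    try? FileManager.default.moveItem(at: tempURL, to: destination)
    print("Model saved to \(destination.path)")
}

// Report fractional progress, as a progress modal would.
let observation = task.progress.observe(\.fractionCompleted) { progress, _ in
    print(String(format: "Downloaded %.0f%%", progress.fractionCompleted * 100))
}
task.resume()
semaphore.wait()
observation.invalidate()
```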

Privacy: When using the local model, your transcript never leaves your Mac. All processing happens on-device.

Server Status

The LLM status chip in the app header shows the current server state:

  • Green dot — Server is running and ready
  • Gray dot — Server is stopped

Click the chip to start or stop the server manually. The server automatically shuts down when you quit Shuole.

LLM server status chip
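
The chip's green/gray state reflects whether the local server is answering requests. If you want to check the same thing from a script, llama.cpp's server exposes a /health endpoint that returns 200 when it is ready; the port below is the llama.cpp default and an assumption, since Shuole may bind a different one.

```swift
import Foundation

// Minimal sketch: poll the local server's /health endpoint, mirroring the
// green/gray status chip. Port 8080 is the llama.cpp default and an
// assumption here; Shuole may use a different port.
let healthURL = URL(string: "http://127.0.0.1:8080/health")!
let semaphore = DispatchSemaphore(value: 0)
URLSession.shared.dataTask(with: healthURL) { _, response, _ in
    defer { semaphore.signal() }
    if let http = response as? HTTPURLResponse, http.statusCode == 200 {
        print("Server is running and ready")        // green dot
    } else {
        print("Server is stopped or still loading") // gray dot
    }
}.resume()
semaphore.wait()
```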

System Requirements

For optimal performance with the local LLM, we recommend:

  • 16 GB RAM or more
  • Apple Silicon Mac (M1/M2/M3/M4) for best speed

The model will run on Intel Macs, but processing will be significantly slower.

Related