Streaming offline speech recognition via the Kaldi-based Vosk engine, compiled to WebAssembly.
Truly local. Audio is processed by vosk-browser running entirely in WASM in this tab, and no audio is sent to any server. Each model is downloaded once and then cached in the browser.
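"Download once, then cache" is a cache-first loader: check local storage, fetch only on a miss, and store the result. Below is a minimal TypeScript sketch of that pattern. `ModelStore`, `MemoryStore`, `ModelCache` and the injected `download` function are hypothetical names, not vosk-browser APIs. A real page would back the store with something persistent such as IndexedDB or the Cache Storage API. The in-memory store here only lasts for the current tab session.

```typescript
// Hypothetical storage interface. A persistent backend (IndexedDB, Cache Storage)
// would survive reloads; the in-memory version below is for illustration only.
interface ModelStore {
  get(url: string): Promise<Uint8Array | undefined>;
  put(url: string, bytes: Uint8Array): Promise<void>;
}

class MemoryStore implements ModelStore {
  private readonly entries = new Map<string, Uint8Array>();
  async get(url: string): Promise<Uint8Array | undefined> {
    return this.entries.get(url);
  }
  async put(url: string, bytes: Uint8Array): Promise<void> {
    this.entries.set(url, bytes);
  }
}

// Cache-first loader: returns the stored model archive if present, otherwise
// downloads it once and stores it. Concurrent calls for the same URL share one download.
class ModelCache {
  private readonly inFlight = new Map<string, Promise<Uint8Array>>();
  private readonly store: ModelStore;
  private readonly download: (url: string) => Promise<Uint8Array>;

  constructor(store: ModelStore, download: (url: string) => Promise<Uint8Array>) {
    this.store = store;
    this.download = download;
  }

  load(url: string): Promise<Uint8Array> {
    const pending = this.inFlight.get(url);
    if (pending) return pending;
    const task = (async () => {
      try {
        const cached = await this.store.get(url);
        if (cached) return cached;
        const bytes = await this.download(url);
        await this.store.put(url, bytes);
        return bytes;
      } finally {
        // Runs after the first await, so the entry set below already exists.
        this.inFlight.delete(url);
      }
    })();
    this.inFlight.set(url, task);
    return task;
  }
}
```

Sharing the in-flight promise matters in a UI like this one, where a user may click "load" twice before the first download has finished.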
1. Pick a language and load the model
2. Listen
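While listening, a live input level is a quick check that the microphone is actually delivering audio. A common way to compute one is the root-mean-square (RMS) of each buffer of float samples, converted to decibels relative to full scale (dBFS). This is a generic sketch, not this page's meter code. The -90 dB floor and the 60 dB display range are arbitrary assumptions.

```typescript
// Root-mean-square level of one buffer of mono float samples in [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Converts an RMS value to dBFS (0 dB = full scale), clamped at a floor
// so digital silence maps to a finite number instead of -Infinity.
function toDbfs(rms: number, floorDb = -90): number {
  if (rms <= 0) return floorDb;
  return Math.max(floorDb, 20 * Math.log10(rms));
}

// Maps a dBFS value onto 0..1 for a meter bar: 0 dB fills it,
// anything at or below -rangeDb leaves it empty.
function meterFraction(db: number, rangeDb = 60): number {
  return Math.min(1, Math.max(0, 1 + db / rangeDb));
}
```

A logarithmic (dB) scale is used because a linear RMS bar barely moves for normal speech, which typically sits well below full scale.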
Transcript
The transcript will appear here. Partial results appear in yellow as you speak; final results lock in when you pause.
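The partial/final behaviour of the transcript can be modeled as a small state holder: each partial result replaces the previous provisional text, and a final result appends to the locked-in text and clears the provisional part. The message shapes below mirror Vosk's JSON output (an in-progress hypothesis in `partial`, a finished utterance in `text`). How those messages reach this class (for example via vosk-browser's recognizer events) is left out and should be checked against the vosk-browser documentation. The `Transcript` class and its method names are hypothetical.

```typescript
// Shapes modeled on Vosk's JSON results; treat them as assumptions.
interface PartialResult { partial: string; }
interface FinalResult { text: string; }

class Transcript {
  private readonly finals: string[] = [];
  private partial = "";

  // A partial result is the recognizer's whole current hypothesis for the
  // utterance in progress, so it replaces (not appends to) the previous one.
  onPartial(r: PartialResult): void {
    this.partial = r.partial.trim();
  }

  // A final result locks in the utterance and clears the provisional text.
  onFinal(r: FinalResult): void {
    const text = r.text.trim();
    if (text.length > 0) this.finals.push(text);
    this.partial = "";
  }

  // Locked-in text, rendered normally.
  finalText(): string {
    return this.finals.join(" ");
  }

  // Provisional text, rendered (e.g. in yellow) after the final text.
  partialText(): string {
    return this.partial;
  }

  // Whitespace-separated words across final and provisional text
  // (including the partial is a design choice, not taken from the page).
  wordCount(): number {
    const all = `${this.finalText()} ${this.partial}`.trim();
    return all.length === 0 ? 0 : all.split(/\s+/).length;
  }
}
```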
How this differs from the Whisper version
Streaming. Vosk processes audio as it arrives and emits partial results word-by-word. You see text appear while you're still speaking.
Smaller models. The default English-small model is about 40 MB, compared with 75–500 MB for Whisper models.
Faster on weak hardware. Vosk runs in real time on phones; Whisper-tiny barely manages to.
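"Runs in real time" is usually made precise with the real-time factor (RTF): time spent decoding divided by the duration of the audio decoded. For example, 10 s of audio decoded in 4 s gives RTF = 0.4. A streaming recognizer has to sustain RTF below 1 on the target device or it falls further and further behind the microphone. The sketch below is generic; the 0.8 headroom threshold in `keepsUp` is an arbitrary assumption, not a Vosk figure.

```typescript
// Real-time factor: processing time divided by audio duration.
// RTF < 1 means decoding is faster than the audio arrives.
function realTimeFactor(processingMs: number, audioSamples: number, sampleRate: number): number {
  if (sampleRate <= 0 || audioSamples <= 0) {
    throw new RangeError("sampleRate and audioSamples must be positive");
  }
  const audioMs = (audioSamples / sampleRate) * 1000;
  return processingMs / audioMs;
}

// True if the recognizer keeps up with live audio with some spare capacity.
function keepsUp(rtf: number, headroom = 0.8): boolean {
  return rtf < headroom;
}
```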