Runs OpenAI's Whisper model entirely in your browser. After the model downloads once, audio never leaves the device.
Truly local. Audio is processed on this page using transformers.js running ONNX models in WebAssembly + WebGPU.
No audio is sent to any server. The model itself is downloaded once from Hugging Face's CDN, then cached by your browser.
1. Pick a model and load it
not loaded
2. Listen
idle
mic level
silentqueue: 0
Transcript
transcript will appear here. each chunk arrives 1–4 s after the speaker pauses.
0 words
How this works
Audio is captured at 16 kHz mono (Whisper's native rate).
A simple voice-activity detector chunks audio at natural pauses (700 ms of silence ends a segment).
Each segment is fed to Whisper running locally. Results append to the transcript.
Latency = segment length + Whisper inference time. On a recent phone, expect 2–5× realtime; a 5 s utterance shows up 1–4 s after you stop talking.
The first model load is slow (75–500 MB download). Subsequent loads use the browser cache.