SpeechRecognition API.
On Chrome (desktop and Android), audio is sent to Google's servers for transcription.
On Safari (iOS/macOS), audio goes to Apple's servers. The browser does this; the page does not.
The page itself never uploads audio, but the browser does call out to those servers for the
recognition step. If that matters to you, this isn't the right tool. See the offline alternatives at the bottom.
If you need recognition that never leaves your device, the realistic options are whisper.cpp via WebAssembly (75–500 MB models, 5–20× slower than realtime on phones) or Vosk-Browser (50 MB models, faster but less accurate). Both require a heavier setup than a single HTML file and a long initial download. The browser's built-in API is what almost every "speech to text" web demo actually uses — it's just rare for them to admit it.
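For context, here is a rough sketch of how a page typically reaches the browser's built-in API. It is an illustration, not this page's actual code: the helper names `getRecognitionCtor` and `startDictation`, the `scope` parameter (normally `window`), and the `en-US` language setting are all assumptions made for the example. The `webkitSpeechRecognition` fallback is there because some browsers only expose the prefixed name.

```javascript
// Returns the recognition constructor if the browser exposes one, else null.
// `scope` is normally `window`; it is a parameter (an assumption of this
// sketch) so the lookup can be exercised outside a browser.
function getRecognitionCtor(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// Hypothetical helper: starts one recognition session and hands the first
// transcript to `onText`. Returns false when the API is unavailable.
function startDictation(scope, onText) {
  const Ctor = getRecognitionCtor(scope);
  if (!Ctor) return false;

  const rec = new Ctor();
  rec.lang = "en-US";          // assumed language for the example
  rec.interimResults = false;  // only deliver final results
  rec.onresult = (event) => onText(event.results[0][0].transcript);

  // From this call on, the browser captures the audio and decides where it
  // is transcribed. The page only ever receives text back.
  rec.start();
  return true;
}
```

In a browser this would be called as `startDictation(window, (text) => console.log(text))`. Note that nothing in the sketch touches the audio stream or makes a network request; whatever upload happens is done by the browser behind `rec.start()`.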