Free Audio Transcription

Drop an audio file and Whisper AI transcribes it — entirely in your browser. No sign-up. No upload. Your audio never leaves your device.

🎙️
Click to choose or drag & drop here
MP3 · WAV · M4A · WebM · OGG · FLAC · AAC · OPUS · MP4

Custom glossary: terms you enter are prepended as an initial prompt so Whisper is more likely to spell them correctly.

🔒
Your audio is processed 100% inside your browser using WebAssembly + ONNX Runtime. Nothing is sent to any server.

How it works

This tool runs OpenAI's Whisper speech recognition model entirely in your browser using Transformers.js and ONNX Runtime Web. No audio ever leaves your machine.

Drop your file: Any common audio format is accepted, including MP3, WAV, M4A, WebM, OGG, and FLAC. The Web Audio API decodes it to raw PCM samples.
On-device Whisper: The Whisper model weights (~40–245 MB depending on size) are downloaded once from the Hugging Face CDN and cached in your browser. No API key needed.
Custom glossary: Technical terms, names, or brand words you enter are prepended as Whisper's "initial prompt", biasing the model to spell those words correctly.
Copy or download: Export the plain transcript as .txt, or grab a .srt subtitle file with timestamps. Both are generated entirely in the browser.
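To illustrate the export step, here is a minimal sketch of turning timestamped transcript chunks into .srt text. The `TranscriptChunk` shape and both function names are hypothetical, chosen for illustration; the tool's actual data structures are not documented here.

```typescript
// Hypothetical chunk shape: start/end in seconds plus the recognized text.
interface TranscriptChunk {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Build the SRT body: numbered cues, each followed by a blank line separator.
function chunksToSrt(chunks: TranscriptChunk[]): string {
  return chunks
    .map(
      (c, i) =>
        `${i + 1}\n${toSrtTimestamp(c.start)} --> ${toSrtTimestamp(c.end)}\n${c.text.trim()}\n`
    )
    .join("\n");
}
```

In a browser, the resulting string could be wrapped in a `Blob` and offered as a download through an object URL, which keeps the export local to the page.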

First run: the model weights download from Hugging Face (40–245 MB). After that they are cached, so later transcriptions start without another download. Performance is best in Chrome or Edge, which can use WebGPU acceleration; Firefox also works via the WASM fallback.
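The WebGPU-or-WASM choice can be sketched as a simple feature check. This is an assumption about how such a fallback might be written, not the tool's actual logic; `Backend`, `NavigatorLike`, and `pickBackend` are hypothetical names.

```typescript
type Backend = "webgpu" | "wasm";

// Minimal structural type so the check can run outside a browser.
// In a real page, the argument would be the global navigator object.
interface NavigatorLike {
  gpu?: unknown;
}

// Prefer WebGPU when the browser exposes navigator.gpu; otherwise use WASM.
function pickBackend(nav: NavigatorLike | undefined): Backend {
  const hasGpu = nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  return hasGpu ? "webgpu" : "wasm";
}
```

The presence of `navigator.gpu` does not guarantee a usable device: `navigator.gpu.requestAdapter()` can still resolve to `null`, so a fuller version would await that call and fall back to WASM when no adapter is returned.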

Frequently asked questions

Is this really free with no sign-up?
Yes. The Whisper model runs directly in your browser via WebAssembly and ONNX Runtime — there is no backend, no account system, and no server receiving your audio. The only network activity is the one-time download of model weights from Hugging Face's CDN, which are then cached locally. After the first download you can even use this tool offline.
How accurate is it compared to services like Otter.ai?
Whisper is one of the most capable open-source speech recognition models available. The "Small" model achieves word error rates comparable to many paid cloud services on clear audio. Accuracy drops with heavy accents, low-quality microphones, or overlapping speakers. For best results, use a quiet recording at a reasonable bit rate and provide a glossary of unusual names or jargon. The browser models are slightly less accurate than the full Whisper large-v3 used by cloud APIs, but they cost nothing and keep your audio on your device.
What file formats and lengths are supported?
The tool accepts any format the browser's Web Audio API can decode: MP3, WAV, M4A/AAC, WebM (Opus or Vorbis), OGG, FLAC, and more. File length is limited only by available RAM — a typical 30-minute meeting recording at 128 kbps (about 30 MB) transcribes comfortably. Very long files (2+ hours) may be slow on low-end devices; use the Tiny model and ensure you have at least 2 GB of free RAM.
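The file-size and RAM figures above can be checked with a little arithmetic. The sketch below assumes the decoded audio is held as mono 32-bit float samples at 16 kHz (the sample rate Whisper expects); the tool's real in-memory layout may differ, and both function names are hypothetical.

```typescript
// Size of a constant-bit-rate file in megabytes (1 MB = 1,000,000 bytes).
function compressedSizeMB(durationSec: number, kbps: number): number {
  return (durationSec * kbps * 1000) / 8 / 1e6;
}

// Size of decoded mono PCM stored as Float32 samples, in megabytes.
// Assumption: 16 kHz mono, 4 bytes per sample.
function decodedPcmSizeMB(durationSec: number, sampleRateHz: number = 16000): number {
  const bytesPerSample = 4; // one Float32 per sample
  return (durationSec * sampleRateHz * bytesPerSample) / 1e6;
}

const thirtyMinutes = 30 * 60;
const fileMB = compressedSizeMB(thirtyMinutes, 128); // 28.8
const pcmMB = decodedPcmSizeMB(thirtyMinutes); // 115.2
```

Under these assumptions a 30-minute recording decodes to roughly 115 MB of samples, and a 2-hour file to roughly 461 MB before any model memory is counted, which is why long files call for more free RAM.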
What does the "Custom glossary" field do?
Whisper accepts an optional "initial prompt": a short text it reads before processing your audio to prime its vocabulary expectations. This tool collects your glossary terms, joins them into a prompt such as "Words to use: Kubernetes, CRISPR, Knackpad, Nguyen.", and passes it to the model. This can noticeably reduce misspellings of product names, medical terms, technical jargon, and unusual proper nouns. Enter one term per line, up to around 50 terms.
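The glossary-to-prompt step could look something like the sketch below. The function name, the case-insensitive de-duplication, and the hard cap are assumptions made for illustration; only the "Words to use: ..." format and the one-term-per-line input come from the description above.

```typescript
// Turn a one-term-per-line glossary into a prompt string.
// Hypothetical helper: blank lines are skipped, duplicates (ignoring case)
// are dropped, and at most maxTerms terms are kept.
function buildInitialPrompt(glossaryText: string, maxTerms: number = 50): string {
  const terms: string[] = [];
  const seenLower: string[] = [];
  for (const line of glossaryText.split(/\r?\n/)) {
    const term = line.trim();
    const key = term.toLowerCase();
    if (term === "" || seenLower.indexOf(key) !== -1) continue;
    seenLower.push(key);
    terms.push(term);
    if (terms.length >= maxTerms) break;
  }
  return terms.length === 0 ? "" : `Words to use: ${terms.join(", ")}.`;
}

const prompt = buildInitialPrompt("Kubernetes\nCRISPR\n\n Knackpad \nNguyen\nkubernetes");
// prompt === "Words to use: Kubernetes, CRISPR, Knackpad, Nguyen."
```

How the string is then handed to the model depends on the runtime and is not shown here.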
Which model size should I choose?
Tiny (~40 MB): best for quick notes, single speaker, clear audio — loads fastest. Base (~75 MB): a good all-rounder for most voice recordings. Small (~245 MB): recommended for technical content, non-native speakers, noisy recordings, or multiple speakers — slowest to download but most accurate. Models are cached after the first download, so you only pay the bandwidth cost once.
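The guidance above can be restated as a small lookup. This is only a sketch of the rule of thumb, with hypothetical names (`ModelSize`, `RecordingTraits`, `suggestModel`); the sizes are the approximate download figures quoted in the answer.

```typescript
type ModelSize = "tiny" | "base" | "small";

// Approximate one-time download sizes in MB, as quoted above.
const APPROX_DOWNLOAD_MB: Record<ModelSize, number> = {
  tiny: 40,
  base: 75,
  small: 245,
};

// Hypothetical description of a recording, used only for this illustration.
interface RecordingTraits {
  technicalContent?: boolean;
  nonNativeSpeakers?: boolean;
  noisy?: boolean;
  multipleSpeakers?: boolean;
  quickNotes?: boolean;
}

// Any "hard" trait suggests Small; quick notes suggest Tiny; Base otherwise.
function suggestModel(t: RecordingTraits): ModelSize {
  if (t.technicalContent || t.nonNativeSpeakers || t.noisy || t.multipleSpeakers) {
    return "small";
  }
  if (t.quickNotes) {
    return "tiny";
  }
  return "base";
}
```

Hard traits are checked first so that a quick note recorded in a noisy room still gets the more accurate model.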
Can I transcribe video files?
Yes, if the browser can decode the audio track. MP4, WebM, and MOV files containing AAC or Opus audio will usually work; the tool reads only the audio track and ignores the video. If your file does not load, extract the audio first with a free tool like HandBrake or ffmpeg and save it as MP3 or WAV.