1. Upload audio file


Click or drag an MP3, WAV, or M4A file here

Max ~5 minutes / 50 MB · Transcription runs in your browser via Whisper WASM

Privacy: Your audio never leaves your device. Whisper-tiny runs entirely via WebAssembly in your browser. The first run downloads ~75 MB of model weights from the Hugging Face CDN; they are cached after that.

2. Correction dictionary

Choose a preset or add your own "misrecognized → correct" pairs. All replacements are case-insensitive and applied globally.

Misrecognized (from)	Correct term (to)


3. Transcribe & fix

Select an audio file above to enable this step. The first run downloads the model (~75 MB, one time).

Results

Corrected transcript

Change log

How it works

1. Audio decoded in-browser: Your file is decoded to raw PCM audio by the Web Audio API; no bytes leave your device.

2. Whisper WASM transcription: Transformers.js runs OpenAI Whisper-tiny via WebAssembly. The model (~75 MB) is cached after the first load.

3. Dictionary find-replace: Every "from" phrase is searched in the transcript (case-insensitive, whole-word-aware) and replaced with the "to" term.

4. Highlighted output + change log: Corrections are highlighted in the transcript, and a full log shows each substitution and how many times it was applied.
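
The find-replace and change-log steps above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the tool's actual code: the names `DictionaryEntry`, `ChangeLogItem`, `escapeRegExp`, and `applyDictionary` are hypothetical, "whole-word-aware" is assumed to mean "not adjacent to a letter or digit", and trying longer phrases first is a design assumption. Highlighting is left out.

```typescript
// Hypothetical types and names for illustration; the tool's real code is not shown on this page.
interface DictionaryEntry {
  from: string; // misrecognized phrase
  to: string;   // correct term
}

interface ChangeLogItem {
  from: string;
  to: string;
  count: number; // how many times the substitution was applied
}

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function applyDictionary(
  transcript: string,
  entries: DictionaryEntry[],
): { corrected: string; log: ChangeLogItem[] } {
  // Assumption: try longer phrases first so a long phrase wins over a shorter prefix of it.
  const ordered = entries.slice().sort((a, b) => b.from.length - a.from.length);
  let corrected = transcript;
  const log: ChangeLogItem[] = [];

  for (const { from, to } of ordered) {
    const phrase = from.trim();
    if (phrase === "") continue; // skip empty rows
    // Allow any run of whitespace between the words of a multi-word phrase.
    const body = phrase.split(/\s+/).map(escapeRegExp).join("\\s+");
    // Lookarounds approximate "whole word": no letter or digit on either side.
    // Flags: g = replace every match, i = case-insensitive, u = Unicode property escapes.
    const pattern = new RegExp(`(?<![\\p{L}\\p{N}])${body}(?![\\p{L}\\p{N}])`, "giu");
    let count = 0;
    // A replacer function keeps "$" in the target term from being treated as a pattern.
    corrected = corrected.replace(pattern, () => {
      count += 1;
      return to;
    });
    if (count > 0) log.push({ from: phrase, to, count });
  }
  return { corrected, log };
}
```

With the entries "high per tension" → "hypertension" and "dis knee ah" → "dyspnea", the input "Patient has High per tension and dis knee ah." becomes "Patient has hypertension and dyspnea.", and the log records one substitution for each pair. Entries are applied one after another in this sketch, so a replacement can itself be matched by a later entry; a real implementation may want to guard against that.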

Whisper-tiny handles English well but may struggle with heavy accents or very low-quality audio. A larger model such as whisper-base gives better accuracy and is selected automatically when available.

Frequently asked questions

Is my audio file uploaded anywhere?
No. Decoding, transcription, and jargon correction all run entirely in your browser. The only network request is the one-time download of the Whisper model weights from Hugging Face's CDN, which are then cached in your browser. Your audio file is never sent to any server.

Why does Whisper mis-transcribe medical and legal terms?
Whisper was trained on general internet audio, so it knows common words much better than domain-specific terminology. It often approximates specialized terms phonetically. For example, "hypertension" might come out as "high per tension", "dyspnea" as "dis knee ah", or "amicus curiae" as "a micas curia". The correction dictionary lets you capture these patterns once and fix them every time.

Can I save my custom dictionary for next time?
Not yet. Custom entries stay in the page only for the current session, and a browser refresh clears manual entries. The Download change log button saves a record of your session, so for now keep a copy of your important pairs in a text file. A future update may add localStorage persistence.

What audio formats and lengths are supported?
MP3, WAV, and M4A files up to approximately 50 MB or 5 minutes work best. Longer files can be processed but may take several minutes on mid-range hardware, since Whisper runs on your CPU. If your browser tab becomes unresponsive, try trimming the audio to the relevant portion before transcribing.

How accurate is the transcription?
Whisper-tiny achieves roughly 10–15% word error rate on clear English speech in quiet conditions. Accuracy drops with background noise, heavy accents, or multiple overlapping speakers. The dictionary correction layer helps recover domain terms that the model consistently mis-transcribes: capturing just 10–20 frequent jargon entries often eliminates the majority of domain-specific errors in a given recording.
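
For readers unfamiliar with the word error rate (WER) quoted above, it is the number of word substitutions, deletions, and insertions needed to turn the transcript into a reference text, divided by the number of words in the reference. The sketch below computes it with a standard word-level edit distance. The function names `tokenize` and `wordErrorRate` are hypothetical, and the tokenization (lowercase, split on whitespace, punctuation kept) is a simplifying assumption; published WER figures usually apply more careful text normalization.

```typescript
// Split text into lowercase words. Assumption: whitespace-only tokenization, punctuation kept.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

// WER = (substitutions + deletions + insertions) / number of reference words.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference transcript must contain at least one word");
  }
  // prev[j] holds the edit distance between the reference words processed so far
  // (none at the start) and the first j hypothesis words.
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);

  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i]; // i deletions turn i reference words into nothing
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;      // reference word missing from the hypothesis
      const insertion = curr[j - 1] + 1; // extra word in the hypothesis
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// One jargon error: "hypertension" heard as three words.
const before = wordErrorRate(
  "the patient has hypertension",
  "the patient has high per tension",
); // 0.75 (one substitution plus two insertions over four reference words)
const after = wordErrorRate(
  "the patient has hypertension",
  "the patient has hypertension",
); // 0
```

The example shows why a single misrecognized term can weigh heavily on a short transcript, and why a dictionary entry that fixes it brings the error rate for that sentence back to zero.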

Transcript Jargon Fixer