- Is my audio file uploaded anywhere?
- No. Everything (decoding, transcription, and jargon correction) runs entirely in your browser. The only network request is a one-time download of the Whisper model weights from Hugging Face's CDN, which your browser then caches. Your audio file is never sent to any server.
- Why does Whisper mis-transcribe medical and legal terms?
- Whisper was trained on general internet audio, so it recognizes common words far more reliably than domain-specific terminology. It often produces phonetic approximations of specialized terms; for example, "hypertension" might come out as "high per tension", "dyspnea" as "dis knee ah", and "amicus curiae" as "a micas curia". The correction dictionary lets you record each pattern once and have it corrected automatically in every transcript.
- Can I save my custom dictionary for next time?
- Partially. Use the Download change log button to save a record of your session. Custom entries remain on the page for the current session, but refreshing the browser clears them. A future update may add localStorage persistence; for now, keep a copy of your important pairs in a text file.
- What audio formats and lengths are supported?
- MP3, WAV, and M4A files up to approximately 50 MB or 5 minutes work best. Longer files can be processed but may take several minutes on mid-range hardware since Whisper runs on your CPU. If your browser tab becomes unresponsive, try trimming the audio to the relevant portion before transcribing.
- How accurate is the transcription?
- Whisper-tiny achieves roughly 10–15% word error rate (WER) on clear English speech recorded in quiet conditions. Accuracy drops with background noise, heavy accents, or multiple overlapping speakers. The dictionary correction layer on top helps recover domain terms that the model consistently mis-transcribes; capturing just 10–20 frequent jargon entries often eliminates most of the domain-specific errors in a given recording.
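The correction dictionary described in the answer about mis-transcribed medical and legal terms can be pictured as a phrase-to-term lookup applied after transcription. The following is a minimal TypeScript sketch, not the page's actual implementation. The `CorrectionDictionary` type and the `applyCorrections`, `normalizePhrase`, and `escapeRegExp` functions are hypothetical names. The sketch also assumes that each misheard phrase begins and ends with a letter or digit, so that `\b` word boundaries apply.

```typescript
// Hypothetical shape: misheard phrase -> intended domain term.
type CorrectionDictionary = Record<string, string>;

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Lowercase and collapse runs of whitespace so lookups ignore case and spacing.
function normalizePhrase(s: string): string {
  return s.trim().toLowerCase().replace(/\s+/g, " ");
}

// Replace each whole-phrase, case-insensitive match of a dictionary key in one pass.
function applyCorrections(text: string, dict: CorrectionDictionary): string {
  const lookup = new Map<string, string>();
  for (const [misheard, intended] of Object.entries(dict)) {
    const key = normalizePhrase(misheard);
    if (key.length > 0) lookup.set(key, intended);
  }
  if (lookup.size === 0) return text;

  // Longest keys first: at a given position, JavaScript tries alternatives
  // left to right, so a longer matching phrase wins over a shorter one.
  const keys = Array.from(lookup.keys()).sort((a, b) => b.length - a.length);
  // \s+ between words tolerates irregular spacing in the transcript.
  const alternation = keys
    .map((k) => k.split(" ").map(escapeRegExp).join("\\s+"))
    .join("|");
  const pattern = new RegExp(`\\b(?:${alternation})\\b`, "gi");

  // A replacer function inserts replacements literally (no "$" substitution).
  return text.replace(pattern, (match) => lookup.get(normalizePhrase(match)) ?? match);
}

// Example usage with the misheard patterns from the FAQ.
const dictionary: CorrectionDictionary = {
  "high per tension": "hypertension",
  "dis knee ah": "dyspnea",
  "a micas curia": "amicus curiae",
};
console.log(applyCorrections("Patient reports High per tension and dis knee ah.", dictionary));
// prints "Patient reports hypertension and dyspnea."
```

Combining all entries into one regular expression has two effects. Because the alternatives are sorted longest first, a longer phrase takes precedence over a shorter overlapping entry. Because the text is scanned only once, each span is corrected at most once, so a replacement is never rewritten by a later entry.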
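The word error rate quoted in the accuracy answer is conventionally defined as (S + D + I) / N. Here S, D, and I are the word substitutions, deletions, and insertions needed to turn a transcript into a reference transcript of N words. The minimal TypeScript sketch below computes it as a word-level edit distance. `wordErrorRate` and `tokenize` are illustrative names. The tokenizer only lowercases and splits on whitespace, whereas published WER figures usually apply fuller text normalization (punctuation, numbers) first.

```typescript
// Lowercase and split on whitespace. Punctuation is NOT stripped here,
// so "dyspnea." and "dyspnea" count as different words.
function tokenize(s: string): string[] {
  return s.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
}

// WER = (substitutions + deletions + insertions) / reference word count,
// computed as the Levenshtein distance between the two word sequences.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference transcript must contain at least one word");
  }

  // prev[j] = edit distance between the first (i - 1) reference words
  // and the first j hypothesis words; it starts as the all-insertions row.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i]; // i deletions against an empty hypothesis
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = curr[j - 1] + 1;
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

const reference = "the patient reports dyspnea on exertion";
console.log(wordErrorRate(reference, "the patient reports dis knee ah on exertion")); // 0.5
console.log(wordErrorRate(reference, "the patient reports dyspnea on exertion")); // 0
```

The example shows how costly a single misheard term can be. Splitting "dyspnea" into three wrong words costs one substitution and two insertions, which gives 3 / 6 = 50% WER for a six-word sentence. Because insertions count as errors, WER can also exceed 100%.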