Free Deposition Transcription Software: 100% Private

Load any deposition or legal proceeding audio (MP3, MP4, WAV, M4A) and get a formatted transcript with speaker turns and timestamps, powered by on-device Whisper AI. The file is processed locally in your browser; nothing is ever uploaded to a server.

MP3 · MP4 · WAV · M4A  |  Max ~2 GB  |  Processed locally in your browser

Transcript

Edit speaker names

Speaker labels are approximate. The tool treats natural pauses between Whisper segments as likely speaker changes; Whisper itself does not identify voices. For transcripts intended for legal use, review and correct speaker labels using "Edit speaker names" above before exporting.

How it works

This tool runs entirely inside your browser — no audio ever leaves your device.

  1. Load model: on first use, the browser downloads a Whisper model (40–240 MB) from Hugging Face and caches it locally. Subsequent runs load the model from the cache instead of downloading it again.
  2. Decode audio: the Web Audio API decodes your MP3/WAV/MP4/M4A file into raw PCM at 16 kHz, the format Whisper expects.
  3. Transcribe with timestamps: transformers.js runs Whisper inference in a Web Worker, producing word-level timestamps for every segment.
  4. Detect speaker changes: gaps and pauses between segments are used to group text into speaker turns. Turns are labelled alternately as Examiner / Witness (or Q / A, etc.), which matches the typical deposition question-and-answer pattern.
  5. Export: download as a plain-text transcript or a formatted PDF via pdf-lib, both generated in the browser.
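
Step 4 can be illustrated with a minimal TypeScript sketch. This is not the tool's actual source: the `Segment` and `SpeakerTurn` shapes, the `groupIntoTurns` name, and the 1.0 second pause threshold are all assumptions chosen for illustration. The sketch merges segments separated by a short gap into one turn and switches to the other label whenever the gap reaches the threshold.

```typescript
// Hypothetical shape of one transcribed segment (times in seconds).
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Hypothetical shape of one speaker turn in the output.
interface SpeakerTurn {
  speaker: string;
  start: number;
  end: number;
  text: string;
}

// Group segments into alternating speaker turns.
// A gap of at least `minPauseSec` between segments counts as a speaker change.
function groupIntoTurns(
  segments: Segment[],
  labels: [string, string] = ["Examiner", "Witness"],
  minPauseSec = 1.0 // assumed threshold, not the tool's real value
): SpeakerTurn[] {
  const turns: SpeakerTurn[] = [];
  let speakerIndex = 0;
  for (const seg of segments) {
    const current = turns[turns.length - 1];
    if (current !== undefined && seg.start - current.end < minPauseSec) {
      // Short gap: the same speaker is assumed to continue.
      current.end = seg.end;
      current.text = `${current.text} ${seg.text.trim()}`;
      continue;
    }
    if (current !== undefined) {
      // Long gap: alternate to the other label.
      speakerIndex = 1 - speakerIndex;
    }
    const speaker = speakerIndex === 0 ? labels[0] : labels[1];
    turns.push({ speaker, start: seg.start, end: seg.end, text: seg.text.trim() });
  }
  return turns;
}

// Example input: the second and third segments are only 0.2 s apart.
const sampleSegments: Segment[] = [
  { start: 0.0, end: 2.0, text: "Please state your name." },
  { start: 3.5, end: 5.0, text: "Jane Doe." },
  { start: 5.2, end: 6.0, text: "J-A-N-E." },
  { start: 8.0, end: 10.0, text: "Where do you live?" },
];
const sampleTurns = groupIntoTurns(sampleSegments);
```

With this input, `sampleTurns` holds three turns: the second and third segments merge into a single Witness turn. The heuristic mislabels any exchange where one speaker pauses mid-answer, which is why the labels remain editable.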

Frequently asked questions

Is this deposition transcription software really free and private?
Yes. The tool runs entirely in your browser using WebAssembly and Web Workers. No audio file, no transcript text, and no metadata is ever sent to any server. The Whisper model weights are downloaded once from Hugging Face's CDN and stored in your browser's cache. After that, the tool works completely offline.
How accurate is the Whisper transcription for legal audio?
OpenAI Whisper (Base English model) achieves around 5–10% word error rate on clean, clearly spoken audio. Deposition recordings with one or two speakers in a quiet room typically produce very accurate results. Accuracy drops with strong accents, overlapping speech, legal jargon, or low-quality recordings. The Small model is more accurate but takes longer to download and run. Always review the transcript before use. This tool is not a substitute for a certified court reporter.
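
To make the word error rate figure concrete, here is a minimal TypeScript sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript. The function names and the simple English-only tokenizer are assumptions for illustration and are not part of the tool.

```typescript
// Lowercase the text, drop punctuation, and split into words (English-only assumption).
function tokenizeWords(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// WER = (substitutions + deletions + insertions) / reference word count,
// computed with a word-level Levenshtein distance.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenizeWords(reference);
  const hyp = tokenizeWords(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference transcript must contain at least one word");
  }
  // prev[j] is the distance between the reference words seen so far
  // and the first j hypothesis words.
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) {
    prev.push(j);
  }
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// One substitution ("the" -> "a") and one deletion ("third") over 8 reference words.
const sampleWer = wordErrorRate(
  "Did you sign the contract on March third?",
  "did you sign a contract on march"
);
```

Here `sampleWer` is 0.25, meaning 2 errors over 8 reference words. A 5–10% rate corresponds to roughly one wrong word in every 10 to 20.
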
Can this tool identify which speaker is which (speaker diarization)?
Full neural speaker diarization (identifying "that voice belongs to Person A") requires a separate speaker embedding model that is too large to run efficiently in the browser today. Instead, this tool uses Whisper's natural segment timestamps: a significant pause between segments is treated as a speaker change, and labels alternate between Examiner and Witness (or Q / A). In a standard two-party deposition with a question-and-answer format, this produces correct speaker assignments in the majority of cases. You can edit the labels in the "Edit speaker names" panel before exporting.
What audio formats are supported?
The tool accepts MP3, MP4, M4A, and WAV files. The browser's Web Audio API handles decoding, so any format your browser supports natively will also work (this includes most common audio and video containers). Files are processed in memory; very large files (>1 hour of audio at high bitrate) may require a device with at least 4 GB of RAM free.
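
The memory requirement follows from simple arithmetic, sketched below in TypeScript. The Web Audio API decodes audio into 32-bit float samples, so the decoded size depends on duration, sample rate, and channel count rather than on the compressed file size. The `pcmBytes` helper is hypothetical, and whether the tool ever holds a full-rate stereo buffer before resampling is an assumption.

```typescript
// Bytes needed to hold decoded audio as 32-bit float PCM samples.
function pcmBytes(durationSec: number, sampleRateHz: number, channels: number): number {
  const BYTES_PER_FLOAT32 = 4;
  return durationSec * sampleRateHz * channels * BYTES_PER_FLOAT32;
}

const ONE_HOUR_SEC = 3600;

// One hour of mono audio at the 16 kHz rate Whisper expects:
// 230,400,000 bytes (about 230 MB).
const whisperInputBytes = pcmBytes(ONE_HOUR_SEC, 16000, 1);

// One hour of stereo audio at 44.1 kHz, assuming it is decoded at full rate first:
// 1,270,080,000 bytes (about 1.27 GB).
const fullRateStereoBytes = pcmBytes(ONE_HOUR_SEC, 44100, 2);
```

These figures cover only the raw samples; the model weights and inference buffers need memory on top of that.
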
Can I use this for court hearings, depositions, or arbitration transcripts?
Yes, for uncertified use. The transcript output is formatted in the standard deposition style with numbered speaker turns, timestamps, and labelled speakers (Examiner / Witness or Q / A). The PDF export uses a monospaced, legal-style layout. However, this tool does not produce certified transcripts. Only a licensed court reporter can certify a transcript for official legal proceedings. Use this tool for draft preparation, review, quick reference, or cases where uncertified transcripts are permissible.
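
As an illustration of the plain-text layout described above, the following TypeScript sketch numbers each turn and prefixes it with an HH:MM:SS timestamp and a speaker label. The exact line format, the `TranscriptTurn` shape, and the function names are assumptions; the tool's real export may differ.

```typescript
// Hypothetical shape of one turn to be exported (start time in seconds).
interface TranscriptTurn {
  speaker: string;
  start: number;
  text: string;
}

// Format a time offset in seconds as HH:MM:SS, discarding fractions of a second.
function formatTimestamp(totalSeconds: number): string {
  const whole = Math.max(0, Math.floor(totalSeconds));
  const pad = (n: number): string => (n < 10 ? `0${n}` : String(n));
  const hours = Math.floor(whole / 3600);
  const minutes = Math.floor((whole % 3600) / 60);
  const seconds = whole % 60;
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)}`;
}

// Render turns as numbered lines: "<n>  [HH:MM:SS]  <speaker>: <text>".
function formatTranscript(turns: TranscriptTurn[]): string {
  return turns
    .map((turn, i) => `${i + 1}  [${formatTimestamp(turn.start)}]  ${turn.speaker}: ${turn.text}`)
    .join("\n");
}

const sampleTranscript = formatTranscript([
  { speaker: "Examiner", start: 0, text: "Please state your name." },
  { speaker: "Witness", start: 3.5, text: "Jane Doe." },
]);
```

For the two sample turns this yields one line per turn, starting with `1  [00:00:00]  Examiner:` and `2  [00:00:03]  Witness:`.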