Remove Silence & Filler Words from Video

Open an MP4, MOV, or WebM file: silence is cut automatically and, in supported browsers, AI also detects filler words such as "uh", "um", "you know", and "like". Everything runs locally in your browser, and your file is never uploaded.

How it works

All processing runs inside your browser using WebAssembly and, where supported, an AI speech model; no file ever leaves your device.

Step 1: Extract audio. ffmpeg.wasm extracts the audio track from your video entirely in the browser.
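
As a rough illustration, the extraction can be a single ffmpeg command. The sketch below only builds the argument list. `buildExtractAudioArgs` is a hypothetical helper, and mono 16 kHz PCM WAV is an assumed output format (16 kHz is the sample rate Whisper models expect), not the tool's documented setting.

```typescript
// Hypothetical helper (not the tool's documented command): builds ffmpeg
// arguments that drop the video and write mono, 16 kHz, 16-bit PCM WAV.
function buildExtractAudioArgs(
  inputName: string,
  outputName: string = "audio.wav",
  sampleRate: number = 16000,
): string[] {
  return [
    "-i", inputName,           // input previously written to ffmpeg.wasm's virtual FS
    "-vn",                     // drop all video streams
    "-ac", "1",                // downmix to a single channel
    "-ar", String(sampleRate), // resample; Whisper models expect 16 kHz input
    "-c:a", "pcm_s16le",       // uncompressed 16-bit little-endian PCM
    outputName,
  ];
}
```

With the `@ffmpeg/ffmpeg` 0.12 API, this array would be passed to `ffmpeg.exec()` after the video has been written into the virtual file system with `ffmpeg.writeFile()`, and the WAV read back with `ffmpeg.readFile()`.
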
Step 2: Detect silence. The Web Audio API decodes the audio, and its amplitude is scanned sample by sample for stretches that stay below your chosen dB threshold for at least the minimum duration you set.
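
A minimal sketch of this scan. It assumes the threshold is given in dBFS (so -40 dB corresponds to a linear amplitude of 10^(-40/20) = 0.01) and that `samples` is one channel of decoded audio in the range -1 to 1. `detectSilence` and its parameter names are hypothetical.

```typescript
interface Segment {
  start: number; // seconds
  end: number;   // seconds
}

// Hypothetical helper: returns every run of samples whose absolute value stays
// below the threshold for at least minDurationSec.
function detectSilence(
  samples: Float32Array,
  sampleRate: number,
  thresholdDb: number,    // e.g. -40 (dBFS)
  minDurationSec: number, // e.g. 0.5
): Segment[] {
  const threshold = Math.pow(10, thresholdDb / 20); // dBFS -> linear amplitude
  const minSamples = Math.ceil(minDurationSec * sampleRate);
  const silences: Segment[] = [];
  let runStart = -1; // index where the current quiet run began, -1 if none

  // Loop one step past the end so a run reaching the final sample is closed too.
  for (let i = 0; i <= samples.length; i++) {
    const quiet = i < samples.length && Math.abs(samples[i]) < threshold;
    if (quiet) {
      if (runStart < 0) runStart = i;
    } else if (runStart >= 0) {
      if (i - runStart >= minSamples) {
        silences.push({ start: runStart / sampleRate, end: i / sampleRate });
      }
      runStart = -1;
    }
  }
  return silences;
}
```

In a browser, `samples` could come from `(await audioContext.decodeAudioData(buffer)).getChannelData(0)`. Testing individual samples works here because of the minimum duration: speech passes through zero constantly, but only real pauses stay below the threshold for the whole span. Many tools instead compare a short windowed RMS level against the threshold, which is less sensitive to isolated quiet samples.
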
Step 3: Detect fillers (AI). In Chrome 113+ or Edge 113+ with WebGPU, Whisper Base runs entirely on your GPU to transcribe the speech and pinpoint the timestamps of filler words. Other browsers skip this step and remove silence only.
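
Once Whisper has produced word-level timestamps, finding fillers is a matching problem. The sketch assumes chunks shaped like `{ text, timestamp: [start, end] }`, which is roughly what the transformers.js speech-recognition pipeline returns with `return_timestamps: "word"`. `findFillers` and the filler list are hypothetical.

```typescript
// Assumed shape of one word-level chunk from the ASR pipeline.
interface WordChunk {
  text: string;
  timestamp: [number, number | null];
}

interface FillerHit {
  start: number; // seconds
  end: number;   // seconds
  phrase: string;
}

// Hypothetical filler list; multi-word phrases must appear as consecutive words.
const DEFAULT_FILLERS: string[][] = [["uh"], ["um"], ["you", "know"], ["like"]];

// Lower-case and strip spaces/punctuation, e.g. " Um," -> "um".
function normalizeWord(word: string): string {
  return word.toLowerCase().replace(/[^a-z']/g, "");
}

function findFillers(chunks: WordChunk[], fillers: string[][] = DEFAULT_FILLERS): FillerHit[] {
  const words = chunks.map((c) => normalizeWord(c.text));
  // Ignore empty phrases and try longer phrases first so "you know" matches as one unit.
  const phrases = fillers.filter((p) => p.length > 0).sort((a, b) => b.length - a.length);
  const hits: FillerHit[] = [];
  let i = 0;
  while (i < chunks.length) {
    const match = phrases.find((p) => p.every((w, k) => words[i + k] === w));
    if (match) {
      const first = chunks[i];
      const last = chunks[i + match.length - 1];
      hits.push({
        start: first.timestamp[0],
        end: last.timestamp[1] ?? last.timestamp[0], // fall back if the end time is null
        phrase: match.join(" "),
      });
      i += match.length; // skip past the matched words
    } else {
      i += 1;
    }
  }
  return hits;
}
```

Matching every "like" also flags meaningful uses ("I like this"), which is one reason the cuts are worth reviewing before export.
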
Step 4: Cut & export. ffmpeg.wasm assembles a concat list of the segments you want to keep, joins them into a single video, and hands you the download. Where possible, streams are copied rather than re-encoded, which keeps this step fast and avoids quality loss.
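
One way to implement this step, sketched under assumptions: the detected cuts are inverted into the spans to keep, and each span becomes an entry in an ffconcat script that reads the same input through the concat demuxer's `inpoint`/`outpoint` directives. The helper names are hypothetical, and the tool may build its list differently.

```typescript
interface Segment {
  start: number; // seconds
  end: number;   // seconds
}

// Invert the cut list into the spans to keep, within [0, duration].
// Sorting plus a running cursor also copes with unsorted or overlapping cuts.
function keepSegments(cuts: Segment[], duration: number): Segment[] {
  const sorted = [...cuts].sort((a, b) => a.start - b.start);
  const keep: Segment[] = [];
  let cursor = 0;
  for (const cut of sorted) {
    const cutStart = Math.min(cut.start, duration);
    if (cutStart > cursor) keep.push({ start: cursor, end: cutStart });
    cursor = Math.max(cursor, Math.min(cut.end, duration));
  }
  if (cursor < duration) keep.push({ start: cursor, end: duration });
  return keep;
}

// ffconcat script that reads the same input file once per kept span.
function buildConcatList(fileName: string, keep: Segment[]): string {
  const lines: string[] = ["ffconcat version 1.0"];
  for (const seg of keep) {
    lines.push(`file '${fileName}'`);
    lines.push(`inpoint ${seg.start.toFixed(3)}`);
    lines.push(`outpoint ${seg.end.toFixed(3)}`);
  }
  return lines.join("\n") + "\n";
}

// Arguments that join the spans with stream copy (no re-encoding).
function buildConcatArgs(listName: string, outputName: string): string[] {
  return ["-f", "concat", "-safe", "0", "-i", listName, "-c", "copy", outputName];
}
```

The script text would be written into the virtual file system with `ffmpeg.writeFile()` and then run with `buildConcatArgs("list.txt", "output.mp4")`. With `-c copy`, a span can only begin cleanly on a keyframe; the concat demuxer documentation notes that for non-intra codecs extra packets before the inpoint are usually included, so a kept span may start slightly early.
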

Privacy: your video is never sent to any server. Everything runs client-side, using ffmpeg compiled to WebAssembly (ffmpeg.wasm) and, optionally, the Whisper Base model running on WebGPU through transformers.js. The Whisper model (~145 MB) is downloaded from Hugging Face on first use and then cached in your browser.

Frequently asked questions

Is my video actually private — does anything get uploaded?
Nothing is uploaded. Your video file is read directly through the browser's File API and processed by ffmpeg compiled to WebAssembly (ffmpeg.wasm). Audio waveform data stays in JavaScript memory. The only external network request is the one-time download of the Whisper AI model from the Hugging Face CDN on first use; after that, even the AI step works offline.
What's the difference between silence removal and filler word removal?
Silence removal uses audio amplitude: any span below your chosen dB threshold that lasts at least the minimum duration is flagged for cutting. Filler word removal uses AI speech recognition (Whisper) to find specific spoken words and phrases such as "uh", "um", "you know", and "like". Because fillers are audible speech rather than silence, amplitude detection misses them; the AI step catches these hedges and verbal pauses.
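
Because the two detectors run independently, their results can overlap, for example an "um" that trails into a pause. A minimal sketch of combining both lists before export, assuming each cut is a `{ start, end }` span in seconds tagged with its source; `mergeCuts` and `gapTolerance` are hypothetical names.

```typescript
type CutKind = "silence" | "filler";

interface Cut {
  start: number; // seconds
  end: number;   // seconds
  kind: CutKind;
}

interface MergedCut {
  start: number;
  end: number;
  kinds: CutKind[]; // which detectors contributed to this span
}

// Union of silence and filler cuts: spans that overlap (or are closer than
// gapTolerance seconds) are merged so each stretch of audio is in one cut.
function mergeCuts(cuts: Cut[], gapTolerance: number = 0): MergedCut[] {
  const sorted = [...cuts].sort((a, b) => a.start - b.start);
  const merged: MergedCut[] = [];
  for (const c of sorted) {
    const last = merged.length > 0 ? merged[merged.length - 1] : undefined;
    if (last !== undefined && c.start <= last.end + gapTolerance) {
      last.end = Math.max(last.end, c.end);
      if (!last.kinds.includes(c.kind)) last.kinds.push(c.kind);
    } else {
      merged.push({ start: c.start, end: c.end, kinds: [c.kind] });
    }
  }
  return merged;
}
```
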
Why does the filler word AI only work in Chrome/Edge?
Whisper inference via transformers.js uses the WebGPU API for speed. WebGPU is available in Chrome 113+, Edge 113+, and some other Chromium-based browsers; support in Firefox and Safari is newer and varies by version and platform. On unsupported browsers the tool falls back to silence-removal-only mode, which is still very effective at cutting dead air.
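
A sketch of the fallback decision. Checking for `navigator.gpu` is not enough on its own, since `requestAdapter()` can resolve to `null` when no suitable GPU is available. The minimal structural types stand in for the official WebGPU typings, and `pickMode` is a hypothetical helper, not the tool's code.

```typescript
// Minimal structural types so the sketch compiles without @webgpu/types.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type Mode = "silence+fillers" | "silence-only";

// Cheap synchronous check: is the WebGPU entry point exposed at all?
function hasWebGpuApi(nav: NavigatorLike): boolean {
  return typeof nav.gpu?.requestAdapter === "function";
}

// Full check: the API can exist while no usable adapter is available,
// so ask for one before enabling the Whisper path.
async function pickMode(nav: NavigatorLike): Promise<Mode> {
  const gpu = nav.gpu;
  if (gpu === undefined || !hasWebGpuApi(nav)) return "silence-only";
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? "silence+fillers" : "silence-only";
  } catch {
    return "silence-only"; // treat any WebGPU error as "not supported"
  }
}
```

In a page, this would be called as `pickMode(navigator as unknown as NavigatorLike)`. In transformers.js v3, the GPU path is selected with the `device: "webgpu"` option when creating a pipeline.
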
How long does processing take?
Silence detection over the extracted audio waveform is near-instant (a few seconds for a 30-minute video). The first time, downloading the Whisper AI model takes 1–3 minutes on a typical connection; later runs use the cached copy. Whisper transcription itself runs at roughly 4–10× real-time on WebGPU, so a 5-minute video takes about 30–75 seconds. The final ffmpeg cut-and-export step is also fast because video frames are stream-copied rather than re-encoded.
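
The transcription estimate is a division: at N× real-time, each second of audio takes 1/N seconds to process. A small helper (hypothetical, for illustration) makes the range explicit for the 4–10× figures above.

```typescript
// Wall-clock range for transcribing mediaSeconds of audio at a speed of
// slowest..fastest times real time (N× real time: 1 s of audio takes 1/N s).
function estimateTranscriptionSeconds(
  mediaSeconds: number,
  slowestSpeedup: number,
  fastestSpeedup: number,
): { min: number; max: number } {
  if (slowestSpeedup <= 0 || fastestSpeedup <= 0) {
    throw new RangeError("speed-up factors must be positive");
  }
  return {
    min: mediaSeconds / fastestSpeedup, // best case
    max: mediaSeconds / slowestSpeedup, // worst case
  };
}

// A 5-minute video at 4–10× real time:
const fiveMinutes = estimateTranscriptionSeconds(5 * 60, 4, 10);
console.log(fiveMinutes); // { min: 30, max: 75 }
```
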
What file formats and sizes are supported?
Input: MP4 (H.264/H.265), MOV, and WebM. File size is limited only by your device's RAM and browser limits; in practice, files up to a few gigabytes work on modern computers with 16 GB+ RAM. For very long videos (over 60 minutes), consider splitting them first. Output is always MP4 with AAC audio; where possible the video stream is copied without re-encoding, so it keeps the source codec (H.264 input stays H.264).
Can I undo or review the cuts before exporting?
Yes — after detection you see a full timeline and a scrollable list of every detected cut. Each segment shows its timestamp, duration, and type (silence or filler word). You can uncheck any segment to keep it in the final video. Only checked segments are removed when you click "Export cleaned video".
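
The review list maps naturally onto a small data model. This sketch assumes each row stores its span, its type, and a checkbox state; the names are hypothetical, and it covers only the selection logic, not the timeline UI.

```typescript
type CutKind = "silence" | "filler";

// One row in the review list (hypothetical data model).
interface ReviewCut {
  start: number;    // seconds
  end: number;      // seconds
  kind: CutKind;
  checked: boolean; // true = remove this span on export
}

// Only checked rows are removed; unchecked rows stay in the final video.
function cutsToApply(cuts: ReviewCut[]): ReviewCut[] {
  return cuts.filter((c) => c.checked);
}

// Total seconds removed by the current selection (assumes rows don't overlap).
function secondsRemoved(cuts: ReviewCut[]): number {
  return cutsToApply(cuts).reduce((sum, c) => sum + (c.end - c.start), 0);
}

// Row label "mm:ss.cc  duration  kind", e.g. "01:12.50  0.50 s  filler".
function formatRow(c: ReviewCut): string {
  const centis = Math.round(c.start * 100); // integer centiseconds avoid "60.00" seconds
  const minutes = Math.floor(centis / 6000);
  const seconds = (centis % 6000) / 100;
  const mm = String(minutes).padStart(2, "0");
  const ss = seconds.toFixed(2).padStart(5, "0");
  return `${mm}:${ss}  ${(c.end - c.start).toFixed(2)} s  ${c.kind}`;
}
```

Unchecking a row removes it from `cutsToApply`, so its span simply ends up inside one of the kept segments when the video is exported.
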