- Is my video actually private — does anything get uploaded?
- Nothing is uploaded. Your video file is read directly by the browser's File API and processed by ffmpeg compiled to WebAssembly (ffmpeg.wasm). Audio waveform data stays in JavaScript memory. The only external network request is the one-time download of the Whisper AI model from the Hugging Face CDN on first use. After that, even the AI works offline.
- What's the difference between silence removal and filler word removal?
- Silence removal uses audio amplitude: any span of audio that stays below your chosen dB threshold for at least the minimum duration gets flagged and cut. Filler word removal uses AI speech recognition (Whisper) to find specific spoken words such as "uh", "um", "you know", and "like". These fillers are audible, so it catches hedging and verbal pauses that amplitude detection would miss.
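
The amplitude rule described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: it assumes mono PCM samples in the range [-1, 1], measures level as RMS over a 10 ms window, and expresses it in dB relative to full scale. The names `detectSilence` and `SilentSpan` and the window length are assumptions made for the example.

```typescript
// Hypothetical shape for a detected span, in seconds.
interface SilentSpan {
  start: number;
  end: number;
}

/**
 * Flags spans whose level stays below thresholdDb for at least minDurationSec.
 * samples: mono PCM in [-1, 1]. windowSec: RMS window length (assumed 10 ms).
 */
function detectSilence(
  samples: Float32Array,
  sampleRate: number,
  thresholdDb: number,
  minDurationSec: number,
  windowSec = 0.01,
): SilentSpan[] {
  const windowSize = Math.max(1, Math.round(sampleRate * windowSec));
  const spans: SilentSpan[] = [];
  let silentFrom: number | null = null; // sample index where the current quiet run began

  // Close the current quiet run at sample index `end`; keep it only if long enough.
  const closeRun = (end: number): void => {
    if (silentFrom === null) return;
    const start = silentFrom / sampleRate;
    const stop = end / sampleRate;
    if (stop - start >= minDurationSec) spans.push({ start, end: stop });
    silentFrom = null;
  };

  for (let offset = 0; offset < samples.length; offset += windowSize) {
    const end = Math.min(offset + windowSize, samples.length);
    let sumSquares = 0;
    for (let i = offset; i < end; i++) sumSquares += samples[i] * samples[i];
    const rms = Math.sqrt(sumSquares / (end - offset));
    // Convert to dB relative to full scale; digital silence maps to -Infinity.
    const levelDb = rms > 0 ? 20 * Math.log10(rms) : -Infinity;

    if (levelDb < thresholdDb) {
      if (silentFrom === null) silentFrom = offset;
    } else {
      closeRun(offset);
    }
  }
  closeRun(samples.length); // a quiet run may extend to the end of the audio
  return spans;
}
```

With a threshold of -40 dB and a minimum duration of 0.5 s, a one-second gap of digital silence between two louder passages would be returned as a single span, while a 0.2 s pause would be ignored.
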
- Why does the filler word AI only work in Chrome/Edge?
- Whisper inference via transformers.js uses the WebGPU API for speed. WebGPU is currently available in Chrome 113+, Edge 113+, and some other Chromium-based browsers. Firefox has WebGPU behind a flag, and Safari's support is limited. On unsupported browsers the tool falls back to silence-removal-only mode, which is still very effective for cutting dead air, so you can get good results without the AI step.
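
The fallback described above can be illustrated with a small capability check. This is a sketch under stated assumptions, not the tool's actual code: `navigator.gpu.requestAdapter()` is the standard WebGPU entry point, while `chooseMode`, the `Mode` names, and the `NavigatorLike` and `GpuLike` types are hypothetical and exist only so the example compiles without WebGPU type definitions.

```typescript
// Minimal structural types so the sketch compiles without WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Hypothetical mode names for this example.
type Mode = "silence-and-fillers" | "silence-only";

/**
 * Picks the processing mode: the AI filler-word step is enabled only when
 * the browser exposes WebGPU and actually returns a usable adapter.
 */
async function chooseMode(nav: NavigatorLike): Promise<Mode> {
  if (!nav.gpu) return "silence-only"; // WebGPU API not exposed at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    // requestAdapter resolves to null when no suitable GPU is available.
    return adapter ? "silence-and-fillers" : "silence-only";
  } catch {
    return "silence-only"; // treat any failure as "unsupported"
  }
}
```

In a browser, the argument would be the global `navigator` object. Checking for a non-null adapter, rather than only for the presence of `navigator.gpu`, covers browsers that expose the API but cannot provide a GPU.
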
- How long does processing take?
- Silence detection over the extracted audio waveform is fast (a few seconds for a 30-minute video). The Whisper AI model download takes 1–3 minutes on a typical connection the first time; after that the model is cached, so later runs skip the download. Whisper transcription itself runs at roughly 4–10× real-time on WebGPU (a 5-minute video takes about 30–75 seconds). The final ffmpeg cut-and-export step is also fast because video frames are stream-copied rather than re-encoded.
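
To apply the 4–10× real-time figure to your own video, divide its length by each end of the speed range. The helper below is a hypothetical illustration of that arithmetic only; actual speed depends on your GPU and the model in use.

```typescript
// Hypothetical result shape: best and worst case, in seconds.
interface TimeRange {
  minSec: number;
  maxSec: number;
}

/**
 * Estimates transcription time from video length and a real-time speed range.
 * The default 4x and 10x factors are taken from the answer above.
 */
function estimateTranscriptionSeconds(
  videoSec: number,
  slowestSpeedup = 4,
  fastestSpeedup = 10,
): TimeRange {
  return {
    minSec: videoSec / fastestSpeedup, // best case: fastest speed
    maxSec: videoSec / slowestSpeedup, // worst case: slowest speed
  };
}
```

For a 20-minute video (1200 seconds), this gives 120 to 300 seconds, or roughly 2 to 5 minutes, not counting the one-time model download.
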
- What file formats and sizes are supported?
- Input: MP4 (H.264/H.265), MOV, and WebM. File size is limited only by your device's RAM and browser limits; in practice, files up to a few gigabytes work on modern computers with 16 GB+ RAM. For very long videos (over 60 minutes), consider splitting them first. Output is always MP4 (H.264 video stream-copied, AAC audio).
- Can I undo or review the cuts before exporting?
- Yes — after detection you see a full timeline and a scrollable list of every detected cut. Each segment shows its timestamp, duration, and type (silence or filler word). You can uncheck any segment to keep it in the final video. Only checked segments are removed when you click "Export cleaned video".
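
The review step above amounts to turning the checked cuts into the list of ranges that remain in the exported video. The sketch below shows one way to do that; it is an assumption about the logic, not the tool's actual code, and the `Cut`, `KeepRange`, and `keepRanges` names are hypothetical.

```typescript
// Hypothetical model of one row in the cut list.
interface Cut {
  start: number; // seconds
  end: number; // seconds
  type: "silence" | "filler";
  checked: boolean; // true = remove on export, false = keep in the video
}

interface KeepRange {
  start: number;
  end: number;
}

/**
 * Returns the time ranges that remain after removing every checked cut.
 * Unchecked cuts are ignored; overlapping checked cuts are merged.
 */
function keepRanges(cuts: Cut[], videoDuration: number): KeepRange[] {
  const removed = cuts
    .filter((c) => c.checked)
    // Clamp each cut to the bounds of the video.
    .map((c) => ({ start: Math.max(0, c.start), end: Math.min(videoDuration, c.end) }))
    .filter((r) => r.end > r.start)
    .sort((a, b) => a.start - b.start);

  const kept: KeepRange[] = [];
  let cursor = 0; // end of the last removed region seen so far
  for (const r of removed) {
    if (r.start > cursor) kept.push({ start: cursor, end: r.start });
    cursor = Math.max(cursor, r.end);
  }
  if (cursor < videoDuration) kept.push({ start: cursor, end: videoDuration });
  return kept;
}
```

For a 10-second video with checked cuts at 2–4 s and 3–5 s and an unchecked cut at 6–7 s, the kept ranges are 0–2 s and 5–10 s: the two overlapping cuts merge, and the unchecked segment stays in the output.
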