Strip hiss, hum, fan noise & HVAC from any recording — then A/B preview original vs. cleaned before you download. 100% in-browser, nothing ever uploaded.
Attenuates hiss, hum, fan & HVAC noise while preserving speech clarity.
Strength: lower for music, higher for voice.
Scales the track so the loudest peak hits −1 dBFS, giving consistent volume across files.
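As a rough illustration of the peak normalization described above, the sketch below scales a mono buffer so its largest absolute sample lands at −1 dBFS. The function name `normalizePeak` is hypothetical and not taken from the tool's source. It assumes floating-point samples in the range [−1, 1], which is how the Web Audio API exposes decoded audio.

```typescript
// Hypothetical helper: peak-normalize a mono signal to a target level in dBFS.
// Assumes float samples in [-1, 1], where 0 dBFS corresponds to |sample| = 1.
function normalizePeak(samples: Float32Array, targetDbfs: number = -1): Float32Array {
  // Find the largest absolute sample value in the buffer.
  const peak = samples.reduce((max, s) => Math.max(max, Math.abs(s)), 0);

  // Pure digital silence has no peak to scale, so return an unchanged copy.
  if (peak === 0) return samples.slice();

  // Convert the dBFS target to a linear amplitude (-1 dBFS is about 0.891).
  const targetLinear = Math.pow(10, targetDbfs / 20);
  const gain = targetLinear / peak;

  // Apply the same gain to every sample so the waveform shape is preserved.
  return samples.map((s) => s * gain);
}
```

Because one gain is applied to the whole track, quiet and loud passages keep their relative levels. Only the overall volume changes.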
Removes gaps longer than the min gap — tightens speech, removes dead air between takes.
Threshold: frames below this RMS level are treated as silent.
Min gap: only silences longer than this are cut.
16 kHz is RNNoise's native rate, ideal for podcasts, voice memos, calls.
OfflineAudioContext downmixes stereo to mono and resamples to the 16 kHz rate RNNoise requires. Stereo width is preserved in the 44.1 kHz output mode via a separate resampled copy.

| Preset | Strength | Silence threshold | Min gap |
|---|---|---|---|
| 🎙 Podcast | 80% | −45 dB | 500 ms |
| 📱 Voice Memo | 90% | −40 dB | 400 ms |
| 🖥 Screencast | 70% | −50 dB | 800 ms |
| 🎤 Interview | 75% | −48 dB | 600 ms |
| 🎸 Music (light) | 35% | −55 dB | 1000 ms |
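
For anyone scripting a similar pipeline, the presets in the table could be held in a small typed lookup. This is only a sketch: the `Preset` interface, the key names, and the choice to store strength as a 0 to 1 fraction are illustrative assumptions. Only the numbers come from the table.

```typescript
// Hypothetical shape for one preset row; field names are illustrative.
interface Preset {
  strength: number;           // noise-removal strength as a fraction (0.8 = 80%)
  silenceThresholdDb: number; // frames with RMS below this level count as silent
  minGapMs: number;           // only silences longer than this are cut
}

type PresetName = "podcast" | "voiceMemo" | "screencast" | "interview" | "musicLight";

// Values copied from the preset table above.
const PRESETS: Record<PresetName, Preset> = {
  podcast:    { strength: 0.80, silenceThresholdDb: -45, minGapMs: 500 },
  voiceMemo:  { strength: 0.90, silenceThresholdDb: -40, minGapMs: 400 },
  screencast: { strength: 0.70, silenceThresholdDb: -50, minGapMs: 800 },
  interview:  { strength: 0.75, silenceThresholdDb: -48, minGapMs: 600 },
  musicLight: { strength: 0.35, silenceThresholdDb: -55, minGapMs: 1000 },
};
```
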
No. Every step — decoding, noise removal, normalization, silence cutting, WAV encoding — runs entirely inside your browser tab using WebAssembly and the Web Audio API. Your audio files never leave your device. This makes the tool safe for confidential recordings, interviews, medical dictation, and legal proceedings where uploading to a cloud service would be a compliance risk.
RNNoise is an open-source recurrent neural network noise suppressor originally developed by Jean-Marc Valin at Mozilla/Xiph (the Opus codec team). It was trained on thousands of hours of clean speech mixed with real-world noise. For each 30 ms audio frame it predicts per-frequency-band spectral gains that suppress noise while preserving speech formants and consonants. Unlike classic spectral subtraction or Wiener filters — which treat each frame independently — the RNN captures temporal context across frames, making it much more effective on irregular noise (fan hum, HVAC, crowd murmur). The WASM build (@jitsi/rnnoise-wasm) is the same engine Jitsi Meet uses in production video conferencing.
RNNoise was trained to distinguish speech from noise, not music from noise, so background music under a voice will be partially attenuated. For mixed content, use the Music (light) preset, which sets strength to 35%: a gentle setting that removes hiss without damaging the musical signal. If you only want normalization or silence cutting, uncheck "Noise removal" and leave the other options on; they are fully independent.
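The text does not say how the Strength value is applied internally. One common way to get a "gentle touch" is a dry/wet blend between the original and the fully denoised signal, sketched below. The function `applyStrength` and the linear blend are assumptions made for illustration, not a description of this tool's code.

```typescript
// Hypothetical helper: linear dry/wet blend between original and denoised audio.
// strength = 0 returns the original signal, strength = 1 the fully denoised one.
function applyStrength(dry: Float32Array, wet: Float32Array, strength: number): Float32Array {
  if (dry.length !== wet.length) {
    throw new Error("dry and wet buffers must have the same length");
  }
  // Clamp the slider value to the valid range [0, 1].
  const s = Math.min(1, Math.max(0, strength));
  // Move each sample a fraction s of the way from the original to the denoised value.
  return dry.map((d, i) => d + s * (wet[i] - d));
}
```

Under this assumption, a strength of 0.35 removes about a third of the difference between the two signals, which leaves most of the musical content untouched.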
After processing, each file gets an inline player with "Before" and "After" tabs. Toggle between the original decoded audio and the cleaned output at any point during playback; the switch happens at the same playhead position, so you hear the difference instantly. If you dislike the result, adjust the sliders and re-process. The original is always kept in memory, so you can compare as many times as you like without loading the file again.
16 kHz is RNNoise's native processing sample rate — the denoised signal is already at this rate and is encoded directly, producing a smaller file that is ideal for speech (podcasts, voice memos, video call recordings). Human speech contains almost no useful content above 8 kHz. 44.1 kHz output resamples the denoised 16 kHz mono signal back up to CD quality — useful when you want to re-import the cleaned stem into a DAW that expects 44.1 kHz. For music production, use the Music (light) preset with 44.1 kHz output.
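To put a number on "smaller file", the sketch below estimates WAV size for the two output rates. It assumes uncompressed 16-bit PCM with the canonical 44-byte header; the source does not state the bit depth, so treat the figures as an illustration. The function name `wavSizeBytes` is hypothetical.

```typescript
// Hypothetical helper: estimate the size of an uncompressed PCM WAV file.
// Assumes the canonical 44-byte RIFF/WAVE header and no extra chunks.
function wavSizeBytes(
  seconds: number,
  sampleRate: number,
  channels: number = 1,
  bitsPerSample: number = 16,
): number {
  const headerBytes = 44;
  const frames = Math.round(seconds * sampleRate); // one frame = one sample per channel
  return headerBytes + frames * channels * (bitsPerSample / 8);
}

// One minute of mono audio at each output rate.
const oneMinuteAt16k = wavSizeBytes(60, 16000);   // 1,920,044 bytes
const oneMinuteAt44k = wavSizeBytes(60, 44100);   // 5,292,044 bytes
```

Under these assumptions the 44.1 kHz file is about 2.76 times larger than the 16 kHz one for the same duration.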
The default threshold is −45 dB RMS per 30 ms frame. Lower values (e.g. −55 dB) cut only very deep silences; higher values (e.g. −30 dB) aggressively remove quiet breaths and soft syllables. The "min gap" controls how long a quiet stretch must last before it is removed — 500 ms is a natural inter-sentence pause for speech. Set 1000 ms or more to remove only long dead air, or 200 ms to tighten a lively conversation. The Music (light) preset uses −55 dB / 1000 ms to avoid accidentally cutting musical rests.
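The threshold and min-gap logic above can be sketched as an RMS gate over fixed-length frames. This is a minimal illustration under stated assumptions, not the tool's actual code: `cutSilence` is a hypothetical name, the input is mono, and a qualifying silent run is removed in full, with no padding left around speech.

```typescript
// Hypothetical helper: drop runs of quiet frames that last longer than minGapMs.
function cutSilence(
  samples: Float32Array,
  sampleRate: number,
  thresholdDb: number = -45,
  minGapMs: number = 500,
  frameMs: number = 30,
): Float32Array {
  const frameLen = Math.max(1, Math.round((sampleRate * frameMs) / 1000));
  const frameCount = Math.ceil(samples.length / frameLen);

  // Classify each frame as silent when its RMS level is below the threshold.
  const silent: boolean[] = [];
  for (let f = 0; f < frameCount; f++) {
    const frame = samples.subarray(f * frameLen, (f + 1) * frameLen);
    const meanSquare = frame.reduce((acc, x) => acc + x * x, 0) / frame.length;
    // 10*log10(mean square) equals 20*log10(RMS); the epsilon avoids log10(0).
    const rmsDb = 10 * Math.log10(meanSquare + 1e-12);
    silent.push(rmsDb < thresholdDb);
  }

  // Keep every frame by default, then drop silent runs longer than the min gap.
  const keep: boolean[] = silent.map(() => true);
  let runStart = 0;
  for (let f = 0; f <= frameCount; f++) {
    if (f < frameCount && silent[f]) continue; // still inside a silent run
    // Frames [runStart, f) form a silent run that ends here (possibly empty).
    if ((f - runStart) * frameMs > minGapMs) {
      for (let k = runStart; k < f; k++) keep[k] = false;
    }
    runStart = f + 1;
  }

  // Concatenate the kept frames into the output buffer.
  const chunks: Float32Array[] = [];
  let total = 0;
  keep.forEach((isKept, f) => {
    if (!isKept) return;
    const chunk = samples.subarray(f * frameLen, (f + 1) * frameLen);
    chunks.push(chunk);
    total += chunk.length;
  });
  const out = new Float32Array(total);
  let offset = 0;
  chunks.forEach((chunk) => {
    out.set(chunk, offset);
    offset += chunk.length;
  });
  return out;
}
```

With the defaults, a 600 ms pause is removed while a 300 ms pause is left alone. A production version would likely keep a short margin on each side of a cut so that word onsets are not clipped.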