Audio Noise Remover

Strip hiss, hum, fan noise & HVAC from any recording — then A/B preview original vs. cleaned before you download. 100% in-browser, nothing ever uploaded.

Click or drag & drop audio files
MP3 · WAV · M4A · OGG · FLAC · WebM · up to 200 MB each · batch OK
Processing options

Noise removal (RNNoise neural net)

Attenuates hiss, hum, fan & HVAC noise while preserving speech clarity.

Strength — lower for music, higher for voice
80%
Loudness normalization (target −14 LUFS)

Scales the track so the loudest peak hits −1 dBFS — consistent volume across files.
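The peak scaling described above can be sketched as follows. This is a minimal illustration, not the tool's confirmed code: the function name `normalizePeak` is hypothetical, and the −14 LUFS target in the option title would additionally need a loudness measurement (for example per ITU-R BS.1770), which is not shown here.

```typescript
// Sketch: scale a mono signal so its loudest sample sits at a target peak level.
// `normalizePeak` is a hypothetical helper name; the default of -1 dBFS is taken
// from the option description.
function normalizePeak(samples: Float32Array, targetDbfs: number = -1): Float32Array {
  // Find the largest absolute sample value.
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  // A fully silent track has nothing to scale; return an unchanged copy.
  if (peak === 0) return samples.slice();

  // Convert the dBFS target to a linear amplitude, then derive the gain.
  const targetLinear = Math.pow(10, targetDbfs / 20);
  const gain = targetLinear / peak;
  return samples.map((x) => x * gain);
}
```

Because every sample is multiplied by the same gain, the relative dynamics of the recording are unchanged; only the overall level moves.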

Cut silence

Removes gaps longer than the min gap — tightens speech, removes dead air between takes.

Threshold — frames below this RMS are silent
−45 dB
Min gap — only cut silences longer than this
500 ms
Output format

16 kHz is RNNoise's native rate — ideal for podcasts, voice memos, calls.


How it works

Preset guide

| Preset | Strength | Silence threshold | Min gap |
|---|---|---|---|
| 🎙 Podcast | 80% | −45 dB | 500 ms |
| 📱 Voice Memo | 90% | −40 dB | 400 ms |
| 🖥 Screencast | 70% | −50 dB | 800 ms |
| 🎤 Interview | 75% | −48 dB | 600 ms |
| 🎸 Music (light) | 35% | −55 dB | 1000 ms |
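For anyone scripting a similar pipeline, the presets above could be captured as a small typed lookup. The type and field names here are illustrative assumptions; only the numbers come from the table.

```typescript
// Sketch: the preset table as a typed lookup.
// `PresetName`, `Preset` and the field names are hypothetical.
type PresetName = "podcast" | "voiceMemo" | "screencast" | "interview" | "musicLight";

interface Preset {
  strength: number;           // noise-removal strength, 0 to 1
  silenceThresholdDb: number; // frames below this RMS level count as silent
  minGapMs: number;           // only silences longer than this are cut
}

const PRESETS: Record<PresetName, Preset> = {
  podcast:    { strength: 0.80, silenceThresholdDb: -45, minGapMs: 500 },
  voiceMemo:  { strength: 0.90, silenceThresholdDb: -40, minGapMs: 400 },
  screencast: { strength: 0.70, silenceThresholdDb: -50, minGapMs: 800 },
  interview:  { strength: 0.75, silenceThresholdDb: -48, minGapMs: 600 },
  musicLight: { strength: 0.35, silenceThresholdDb: -55, minGapMs: 1000 },
};
```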

Frequently asked questions

Is my audio uploaded to a server?

No. Every step — decoding, noise removal, normalization, silence cutting, WAV encoding — runs entirely inside your browser tab using WebAssembly and the Web Audio API. Your audio files never leave your device. This makes the tool safe for confidential recordings, interviews, medical dictation, and legal proceedings where uploading to a cloud service would be a compliance risk.
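As an illustration of why no server is needed, the final WAV-encoding step can be done with nothing more than an `ArrayBuffer`. The sketch below assumes mono, 16-bit PCM output; the bit depth and the name `encodeWav` are assumptions, not details confirmed by the tool.

```typescript
// Sketch: encode mono float samples in [-1, 1] as a 16-bit PCM WAV file in memory.
// `encodeWav` is a hypothetical name; 16-bit depth is an assumption.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + sample data
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);                // file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // format 1 = PCM
  view.setUint16(22, 1, true);                           // channel count (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample and convert it to a signed 16-bit integer.
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, Math.round(clamped * 32767), true);
  }
  return buffer;
}
```

In a browser, the returned buffer could be wrapped in `new Blob([buffer], { type: "audio/wav" })` and offered as a download link, so the file is created and saved entirely on the local machine.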

What is RNNoise and why is it better than spectral subtraction?

RNNoise is an open-source recurrent neural network noise suppressor originally developed by Jean-Marc Valin at Mozilla/Xiph (the Opus codec team). It was trained on thousands of hours of clean speech mixed with real-world noise. For each 30 ms audio frame it predicts per-frequency-band spectral gains that suppress noise while preserving speech formants and consonants. Classic spectral subtraction and Wiener filters depend on a hand-tuned estimate of the noise spectrum; the RNN instead learns temporal context across frames, which makes it more effective both on steady noise (fan hum, HVAC) and on noise that changes over time (crowd murmur). The WASM build (@jitsi/rnnoise-wasm) is the same engine Jitsi Meet uses in production video conferencing.
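The frame-by-frame processing described above can be sketched like this. The `denoiseFrame` callback is a hypothetical stand-in for the actual RNNoise call (the real @jitsi/rnnoise-wasm API is not reproduced here), and the 480-sample frame size simply follows from 30 ms at 16 kHz.

```typescript
// Sketch: run a frame-based denoiser over a whole recording.
// `denoiseFrame` is a hypothetical callback standing in for the RNNoise call;
// it receives one frame and returns a cleaned frame of the same length.
const FRAME_SIZE = 480; // 30 ms at 16 kHz (0.030 * 16000)

function processInFrames(
  input: Float32Array,
  denoiseFrame: (frame: Float32Array) => Float32Array,
  frameSize: number = FRAME_SIZE,
): Float32Array {
  const output = new Float32Array(input.length);
  const frame = new Float32Array(frameSize);
  for (let start = 0; start < input.length; start += frameSize) {
    const end = Math.min(start + frameSize, input.length);
    frame.fill(0);                         // zero-pad the final partial frame
    frame.set(input.subarray(start, end)); // copy this frame's samples
    const cleaned = denoiseFrame(frame);
    // Copy back only as many samples as the input actually had in this frame.
    output.set(cleaned.subarray(0, end - start), start);
  }
  return output;
}
```

A recurrent model keeps internal state between calls, so frames would need to be passed in order, as this loop does.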

Will noise removal affect music or intentional background audio?

RNNoise was trained to distinguish speech from noise, not music from noise, so background music under a voice will be partially attenuated. For mixed content, use the Music (light) preset, which sets strength to 35%: a gentle setting that reduces hiss while limiting damage to the musical signal. If you only want normalization or silence cutting, uncheck "Noise removal" and leave the other options on. The three options are fully independent.
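One plausible way a strength setting such as 35% could work is a linear blend of the original and the fully denoised signal. This is an assumption for illustration only; how the tool actually applies the slider is not documented here, and `applyStrength` is a hypothetical name.

```typescript
// Sketch: blend original and denoised audio by a strength in [0, 1].
// A linear dry/wet mix is an ASSUMPTION about how the strength slider behaves.
function applyStrength(
  original: Float32Array,
  denoised: Float32Array,
  strength: number,
): Float32Array {
  const s = Math.max(0, Math.min(1, strength));          // clamp to [0, 1]
  const n = Math.min(original.length, denoised.length);  // guard against length mismatch
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    // strength 0 keeps the original; strength 1 keeps only the denoised signal
    out[i] = (1 - s) * original[i] + s * denoised[i];
  }
  return out;
}
```

Under this model, 35% leaves most of the original signal in place, which is why a low setting is gentler on music.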

What does the A/B preview do?

After processing, each file gets an inline player with "Before" and "After" tabs. Toggle between the original decoded audio and the cleaned output at any point during playback and you hear the difference instantly, at the same playhead position. If you dislike the result, adjust the sliders and re-process. The original is always kept in memory, so you can compare as many times as you like without selecting the file again.

What is the difference between 16 kHz and 44.1 kHz output?

16 kHz is RNNoise's native processing sample rate. The denoised signal is already at this rate and is encoded directly, producing a smaller file that is ideal for speech (podcasts, voice memos, video call recordings). Human speech contains almost no useful content above 8 kHz, which is the highest frequency a 16 kHz file can represent. 44.1 kHz output resamples the denoised 16 kHz mono signal back up to the CD sample rate. Upsampling does not restore any content above 8 kHz, but it is useful when you want to re-import the cleaned stem into a DAW that expects 44.1 kHz. For music production, use the Music (light) preset with 44.1 kHz output.
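To make the upsampling step concrete, here is a minimal linear-interpolation resampler. It is a simplified sketch, not the tool's confirmed method: `resampleLinear` is a hypothetical name, and a real implementation might instead rely on the Web Audio API's `OfflineAudioContext`, which resamples with better filtering.

```typescript
// Sketch: resample a mono signal by linear interpolation between neighbours.
// `resampleLinear` is a hypothetical helper; linear interpolation is a simple
// stand-in and is not claimed to be what the tool uses.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (input.length === 0) return new Float32Array(0);
  const outLength = Math.round((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  const last = input.length - 1;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;                       // fractional position in the input
    const i0 = Math.min(Math.floor(pos), last); // sample at or before pos
    const i1 = Math.min(i0 + 1, last);          // next sample, clamped at the end
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}
```

One second of 16 kHz audio (16,000 samples) becomes 44,100 samples, but the new samples are interpolated from the old ones, so no additional high-frequency detail appears.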

How do I tune the silence-cut threshold?

The default threshold is −45 dB RMS per 30 ms frame. Lower values (e.g. −55 dB) cut only very deep silences; higher values (e.g. −30 dB) aggressively remove quiet breaths and soft syllables. The "min gap" controls how long a quiet stretch must last before it is removed — 500 ms is a natural inter-sentence pause for speech. Set 1000 ms or more to remove only long dead air, or 200 ms to tighten a lively conversation. The Music (light) preset uses −55 dB / 1000 ms to avoid accidentally cutting musical rests.
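The interaction between the threshold and the min gap can be sketched as below. The defaults mirror the values quoted above; the function name is hypothetical, and removing the entire gap (rather than leaving a short pause behind) is an assumption about the tool's behavior.

```typescript
// Sketch: drop runs of quiet frames that last longer than a minimum gap.
// `cutSilence` is a hypothetical name. Removing the WHOLE gap is an assumption;
// a real tool might keep a short pause at each cut.
function cutSilence(
  samples: Float32Array,
  sampleRate: number,
  thresholdDb: number = -45,
  minGapMs: number = 500,
  frameMs: number = 30,
): Float32Array {
  const frameSize = Math.round((sampleRate * frameMs) / 1000);
  const threshold = Math.pow(10, thresholdDb / 20); // dB to linear amplitude
  const frameCount = Math.ceil(samples.length / frameSize);

  // Step 1: mark each frame as silent if its RMS falls below the threshold.
  const silent: boolean[] = [];
  for (let f = 0; f < frameCount; f++) {
    const start = f * frameSize;
    const end = Math.min(start + frameSize, samples.length);
    let sumSquares = 0;
    for (let i = start; i < end; i++) sumSquares += samples[i] * samples[i];
    silent.push(Math.sqrt(sumSquares / (end - start)) < threshold);
  }

  // Step 2: discard only silent runs whose duration exceeds the min gap.
  const keep = new Array<boolean>(frameCount).fill(true);
  let runStart = 0;
  for (let f = 0; f <= frameCount; f++) {
    if (f < frameCount && silent[f]) continue; // still inside a silent run
    if ((f - runStart) * frameMs > minGapMs) {
      for (let k = runStart; k < f; k++) keep[k] = false;
    }
    runStart = f + 1;
  }

  // Step 3: concatenate the frames that survived.
  let keptLength = 0;
  for (let f = 0; f < frameCount; f++) {
    if (keep[f]) keptLength += Math.min(frameSize, samples.length - f * frameSize);
  }
  const out = new Float32Array(keptLength);
  let offset = 0;
  for (let f = 0; f < frameCount; f++) {
    if (!keep[f]) continue;
    const start = f * frameSize;
    const end = Math.min(start + frameSize, samples.length);
    out.set(samples.subarray(start, end), offset);
    offset += end - start;
  }
  return out;
}
```

With the defaults, a 600 ms pause between two sentences is removed while a 300 ms pause is left alone. Raising the threshold toward −30 dB makes more frames count as silent, which is why soft syllables start to disappear.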