
Click or drag & drop audio files

MP3 · WAV · M4A · OGG · FLAC · WebM · up to 200 MB each · batch OK

Processing options

Noise removal (RNNoise neural net)

Attenuates hiss, hum, fan & HVAC noise while preserving speech clarity.

Strength — lower for music, higher for voice

80%

Peak normalization (−1 dBFS)

Scales the whole track by a single gain so its loudest sample peaks at −1 dBFS, giving every file the same peak level.

Cut silence

Removes quiet stretches longer than the min gap, tightening speech and cutting dead air between takes.

Threshold — frames below this RMS are silent

−45 dB

Min gap — only cut silences longer than this

500 ms

Output format

16 kHz is the rate this tool runs the denoiser at, and is ideal for podcasts, voice memos and calls.


How it works

Decode — Your browser decodes MP3, WAV, M4A, OGG, FLAC or WebM to raw PCM using the Web Audio API. No file ever touches a server.
Resample to 16 kHz mono — An OfflineAudioContext downmixes stereo to mono and resamples to 16 kHz, the rate this tool runs RNNoise at (the reference RNNoise implementation is designed for 48 kHz input). Stereo width is preserved in the 44.1 kHz output mode via a separate resampled copy.
Denoise (RNNoise) — A recurrent neural network (originally by Jean-Marc Valin at Mozilla/Xiph, the same engine Jitsi Meet uses) processes the audio in 480-sample frames, which last 30 ms at 16 kHz. For each frame it predicts per-band spectral gains that suppress noise while preserving speech formants. The strength slider blends the denoised signal with the original (100% = fully denoised, 10% = barely touched).
Normalize — Peak amplitude is measured across the entire track, and a single linear gain is applied so the loudest sample reaches −1 dBFS. This is peak normalization, not loudness normalization: the resulting integrated loudness depends on how dynamic the recording is, so it is not guaranteed to land near the roughly −14 LUFS that Spotify normalizes playback to, or the −16 LUFS that Apple Podcasts recommends.
Cut silence — The RMS level of each 30 ms frame is measured in dB. Runs of consecutive frames below the threshold that last longer than the min gap are removed; shorter dips (breaths, consonant stops) are kept to preserve natural rhythm.
A/B preview & export — Both the original decoded audio and the processed audio are held in memory so you can toggle between them in the browser player. Output is encoded as 16-bit WAV (16 kHz or 44.1 kHz). Multiple files are zipped for one-click batch download.
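The downmix half of the Resample step can be sketched in TypeScript. This is a minimal sketch, not the tool's code: an OfflineAudioContext with a single output channel performs this downmix (and the resampling) by itself, and for stereo the Web Audio "speakers" rule is simply 0.5 × (L + R). The function name `downmixToMono` is hypothetical, and resampling is deliberately left to the browser.

```typescript
// Hypothetical helper: equal-weight average of all input channels.
// For stereo this matches the Web Audio "speakers" downmix, 0.5 * (L + R).
// Web Audio uses different weights for 5.1 layouts; those are not modelled here.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const out = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      out[i] += channel[i] / channels.length;
    }
  }
  return out;
}
```

In the browser the inputs would come from `AudioBuffer.getChannelData(c)` for each channel `c`.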
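The frame loop and strength blend from the Denoise step can be sketched as follows. It assumes a hypothetical `FrameDenoiser` callback that wraps RNNoise and turns one 480-sample frame into one denoised frame; the actual @jitsi/rnnoise-wasm binding is not shown, and no particular API of it is assumed. Frames are fed strictly in order because RNNoise keeps recurrent state from one frame to the next.

```typescript
// Hypothetical per-frame denoiser: receives exactly FRAME_SIZE samples and
// returns a denoised frame of the same length. A real build would wrap the
// RNNoise WASM module here; that binding is NOT part of this sketch.
type FrameDenoiser = (frame: Float32Array) => Float32Array;

const FRAME_SIZE = 480; // samples per RNNoise frame (30 ms at 16 kHz)

// Denoise `input` frame by frame, then blend:
//   out = strength * denoised + (1 - strength) * original
function denoiseWithStrength(
  input: Float32Array,
  denoise: FrameDenoiser,
  strength: number,
): Float32Array {
  const s = Math.min(1, Math.max(0, strength)); // clamp to [0, 1]
  const out = new Float32Array(input.length);
  const frame = new Float32Array(FRAME_SIZE);

  for (let start = 0; start < input.length; start += FRAME_SIZE) {
    const end = Math.min(start + FRAME_SIZE, input.length);
    frame.fill(0); // zero-pad the final, possibly short, frame
    frame.set(input.subarray(start, end));
    const wet = denoise(frame);
    for (let i = start; i < end; i++) {
      const dry = input[i];
      out[i] = s * wet[i - start] + (1 - s) * dry;
    }
  }
  return out;
}
```

Zero-padding the last frame and trimming the output keeps the processed track exactly as long as the input.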
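A minimal sketch of the Normalize step as described above: find the sample peak, then apply one linear gain so that peak lands at −1 dBFS. The function names are hypothetical. The sketch measures sample peak rather than inter-sample true peak, and it does not compute LUFS.

```typescript
// Convert a level in dBFS to a linear amplitude (0 dBFS = 1.0).
function dbToLinear(db: number): number {
  return Math.pow(10, db / 20);
}

// Peak normalization: one gain for the whole track so that the largest
// absolute sample equals targetDbfs. Returns a new array.
function peakNormalize(samples: Float32Array, targetDbfs = -1): Float32Array {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  if (peak === 0) return samples.slice(); // all-silent input: nothing to scale
  const gain = dbToLinear(targetDbfs) / peak;
  return samples.map((v) => v * gain);
}
```

With the default target the gain is 10^(−1/20) / peak ≈ 0.891 / peak, so quiet and loud recordings end up with the same peak but not necessarily the same perceived loudness.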
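The Cut silence step can be sketched like this, assuming non-overlapping 30 ms frames and silent runs measured in whole frames. The function name and parameter names are illustrative, not taken from the tool. At 16 kHz a 30 ms frame is 480 samples, the same length as an RNNoise frame.

```typescript
// Remove runs of "silent" frames. A frame is silent when its RMS level is
// below thresholdDb; only runs lasting longer than minGapMs are removed.
function cutSilence(
  samples: Float32Array,
  sampleRate: number,
  thresholdDb = -45,
  minGapMs = 500,
  frameMs = 30,
): Float32Array {
  const frameLen = Math.max(1, Math.round((sampleRate * frameMs) / 1000));
  const numFrames = Math.ceil(samples.length / frameLen);
  // rms < 10^(dB/20) is equivalent to 20*log10(rms) < dB, without log(0) issues.
  const thresholdLinear = Math.pow(10, thresholdDb / 20);

  // 1. Classify every frame as silent or not by its RMS level.
  const silent: boolean[] = [];
  for (let f = 0; f < numFrames; f++) {
    const start = f * frameLen;
    const end = Math.min(start + frameLen, samples.length);
    let sumSq = 0;
    for (let i = start; i < end; i++) sumSq += samples[i] * samples[i];
    silent.push(Math.sqrt(sumSq / (end - start)) < thresholdLinear);
  }

  // 2. Walk runs of equal classification; drop long silent runs only.
  const ranges: Array<[number, number]> = [];
  let total = 0;
  let f = 0;
  while (f < numFrames) {
    let runEnd = f;
    while (runEnd < numFrames && silent[runEnd] === silent[f]) runEnd++;
    const start = f * frameLen;
    const end = Math.min(runEnd * frameLen, samples.length);
    const runMs = ((end - start) / sampleRate) * 1000;
    if (!(silent[f] && runMs > minGapMs)) {
      ranges.push([start, end]);
      total += end - start;
    }
    f = runEnd;
  }

  // 3. Concatenate the kept ranges.
  const out = new Float32Array(total);
  let offset = 0;
  for (const [start, end] of ranges) {
    out.set(samples.subarray(start, end), offset);
    offset += end - start;
  }
  return out;
}
```

Classifying whole frames first and then scanning runs keeps the min-gap rule separate from the threshold rule, which is why short dips survive even when they fall below the threshold.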
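The 16-bit WAV output from the export step uses the canonical 44-byte RIFF/WAVE header. Below is a minimal mono encoder sketch under that assumption. The name `encodeWav16` is hypothetical, stereo output is not covered, and the zip packaging for batches is not shown.

```typescript
// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file:
// a 44-byte RIFF/WAVE header followed by little-endian sample data.
function encodeWav16(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const numChannels = 1;
  const bytesPerSample = 2;
  const dataSize = samples.length * numChannels * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp to [-1, 1] and scale to signed 16-bit integers.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a browser the result can be wrapped as `new Blob([encodeWav16(samples, 16000)], { type: "audio/wav" })` for the player or a download link.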

Preset guide

Preset	Strength	Silence threshold	Min gap
🎙 Podcast	80%	−45 dB	500 ms
📱 Voice Memo	90%	−40 dB	400 ms
🖥 Screencast	70%	−50 dB	800 ms
🎤 Interview	75%	−48 dB	600 ms
🎸 Music (light)	35%	−55 dB	1000 ms
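The preset table can also be expressed as plain data, for example to drive the sliders. The key names and the `Preset` shape are hypothetical. The values are copied from the table, with strength written as a 0 to 1 blend factor to match the Denoise step description.

```typescript
// Hypothetical preset shape; values mirror the preset guide table.
interface Preset {
  strength: number; // denoise blend factor, 0..1 (shown as % in the UI)
  thresholdDb: number; // silence threshold, dB RMS per frame
  minGapMs: number; // shortest silent run that gets cut
}

const PRESETS: Record<string, Preset> = {
  podcast: { strength: 0.8, thresholdDb: -45, minGapMs: 500 },
  voiceMemo: { strength: 0.9, thresholdDb: -40, minGapMs: 400 },
  screencast: { strength: 0.7, thresholdDb: -50, minGapMs: 800 },
  interview: { strength: 0.75, thresholdDb: -48, minGapMs: 600 },
  musicLight: { strength: 0.35, thresholdDb: -55, minGapMs: 1000 },
};
```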

Frequently asked questions

Is my audio uploaded to a server?

No. Every step — decoding, noise removal, normalization, silence cutting, WAV encoding — runs entirely inside your browser tab using WebAssembly and the Web Audio API. Your audio files never leave your device. This makes the tool safe for confidential recordings, interviews, medical dictation, and legal proceedings where uploading to a cloud service would be a compliance risk.

What is RNNoise and why is it better than spectral subtraction?

RNNoise is an open-source recurrent neural network noise suppressor originally developed by Jean-Marc Valin at Mozilla/Xiph (the Opus codec team). It was trained on clean speech mixed with real-world noise. For each 480-sample frame (10 ms at the 48 kHz rate it was designed for) it predicts per-frequency-band gains that suppress noise while preserving speech formants and consonants. Classic spectral subtraction and Wiener filters depend on a noise estimate that is assumed to change slowly, so they cope with steady noise but struggle when the noise varies. The RNN carries context from frame to frame and learns the spectral patterns of speech, which makes it much more effective on irregular noise such as crowd murmur, while still handling steady fan hum and HVAC noise. The WASM build (@jitsi/rnnoise-wasm) is the same engine Jitsi Meet uses in production video conferencing.

Will noise removal affect music or intentional background audio?

RNNoise was trained to distinguish speech from noise, not music from noise, so background music under a voice will be partially attenuated. For mixed content, use the Music (light) preset, which lowers strength to 35% to reduce hiss while limiting damage to the music. If you only want normalization or silence cutting, uncheck "Noise removal" and leave the other options on; they are fully independent.

What does the A/B preview do?

After processing, each file gets an inline player with "Before" and "After" tabs. You can switch between the original decoded audio and the cleaned output at any point during playback and hear the difference immediately, from the same playhead position. If silence cutting is on, the processed track is shorter, so after the first cut the same timestamp no longer points to the same moment in the recording. If you dislike the result, adjust the sliders and re-process. The original stays in memory, so you can compare as many times as you like without re-uploading.

What is the difference between 16 kHz and 44.1 kHz output?

16 kHz is the sample rate at which this tool runs RNNoise, so the denoised signal is already at this rate and is encoded directly. The result is a smaller file suited to speech (podcasts, voice memos, video call recordings): a 16 kHz file can represent frequencies up to 8 kHz, which covers most of what makes speech intelligible. 44.1 kHz output resamples the denoised 16 kHz mono signal up to the CD rate, which is useful when you want to re-import the cleaned stem into a DAW that expects 44.1 kHz. Upsampling does not restore content above 8 kHz that was discarded at 16 kHz. For music, use the Music (light) preset with 44.1 kHz output, keeping in mind that the denoised signal is still limited to 8 kHz.
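The trade-off follows from two pieces of arithmetic: a sample rate f_s can represent frequencies only up to f_s/2 (the Nyquist frequency), and uncompressed 16-bit PCM costs f_s × channels × 2 bytes per second. For mono output, ignoring the 44-byte WAV header:

```latex
\begin{aligned}
f_{\mathrm{Nyquist}} &= \frac{f_s}{2} = \frac{16\,000\ \mathrm{Hz}}{2} = 8\ \mathrm{kHz} \\
\text{bytes per second} &= f_s \times \text{channels} \times 2 \\
16\ \mathrm{kHz\ mono} &:\; 16\,000 \times 1 \times 2 = 32\,000\ \mathrm{B/s} \approx 1.92\ \mathrm{MB/min} \\
44.1\ \mathrm{kHz\ mono} &:\; 44\,100 \times 1 \times 2 = 88\,200\ \mathrm{B/s} \approx 5.29\ \mathrm{MB/min}
\end{aligned}
```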

How do I tune the silence-cut threshold?

The default threshold is −45 dB RMS per 30 ms frame. Lower values (e.g. −55 dB) cut only very deep silences; higher values (e.g. −30 dB) aggressively remove quiet breaths and soft syllables. The "min gap" controls how long a quiet stretch must last before it is removed — 500 ms is a natural inter-sentence pause for speech. Set 1000 ms or more to remove only long dead air, or 200 ms to tighten a lively conversation. The Music (light) preset uses −55 dB / 1000 ms to avoid accidentally cutting musical rests.
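As a worked example, a dB threshold converts to a linear RMS amplitude (relative to a full scale of 1.0) as shown below. The last line assumes silent runs are measured in whole 30 ms frames and must be strictly longer than the min gap; that is an assumption about the implementation, not something stated above.

```latex
\begin{aligned}
a &= 10^{L_{\mathrm{dB}}/20} \\
-30\ \mathrm{dB} &\;\rightarrow\; 10^{-1.5} \approx 0.0316 \\
-45\ \mathrm{dB} &\;\rightarrow\; 10^{-2.25} \approx 0.00562 \\
-55\ \mathrm{dB} &\;\rightarrow\; 10^{-2.75} \approx 0.00178 \\
\frac{500\ \mathrm{ms}}{30\ \mathrm{ms}} &\approx 16.7 \;\Rightarrow\; \text{runs of at least 17 frames (510 ms) are cut}
\end{aligned}
```

So at the default −45 dB, a frame counts as silent when its RMS is below about 0.56% of full scale.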

Audio Noise Remover
