Voice Similarity Checker

Compare two audio files to see how similar they sound. Uses MFCC spectral analysis and cosine similarity — everything runs in your browser. No audio is ever uploaded.

Reported results: overall similarity, MFCC cosine similarity, spectral centroid difference, duration of A and B, zero-crossing rate difference, and a waveform for each file.

How it works

The checker analyses the spectral envelope of each audio file — the distribution of energy across frequencies over time. This captures the characteristic timbre of a voice far more reliably than raw waveform comparison.

1. Decode: Web Audio API decodes the file to raw PCM samples at 22 050 Hz.
2. Frame: Audio is split into 25 ms overlapping frames with a Hamming window.
3. FFT: Fast Fourier Transform converts each frame to the frequency domain.
4. Mel bank: 26 triangular filters spaced on the Mel scale mimic human hearing.
5. DCT → MFCCs: Discrete Cosine Transform gives 13 compact cepstral coefficients per frame.
6. Cosine score: Mean MFCC vectors are compared with cosine similarity, scaled 0–100 %.
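
As a rough illustration of steps 2 and 6, the sketch below frames a signal with a Hamming window and scores two mean feature vectors with cosine similarity. It is not the checker's actual code: the 10 ms hop size, the clamping of negative cosine values to 0, and all function names are assumptions made for the example.

```javascript
// Hamming window of length n: w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1)).
function hammingWindow(n) {
  const w = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    w[i] = 0.54 - 0.46 * Math.cos((2 * Math.PI * i) / (n - 1));
  }
  return w;
}

// Split samples into overlapping, windowed frames.
// frameMs = 25 comes from the text; hopMs = 10 is an assumed hop size.
function frameSignal(samples, sampleRate = 22050, frameMs = 25, hopMs = 10) {
  const frameLen = Math.round((sampleRate * frameMs) / 1000);
  const hopLen = Math.round((sampleRate * hopMs) / 1000);
  const win = hammingWindow(frameLen);
  const frames = [];
  for (let start = 0; start + frameLen <= samples.length; start += hopLen) {
    const frame = new Float32Array(frameLen);
    for (let i = 0; i < frameLen; i++) frame[i] = samples[start + i] * win[i];
    frames.push(frame);
  }
  return frames;
}

// Average a list of equal-length feature vectors (e.g. 13 MFCCs per frame).
function meanVector(vectors) {
  const mean = new Float64Array(vectors[0].length);
  for (const v of vectors) {
    for (let i = 0; i < mean.length; i++) mean[i] += v[i] / vectors.length;
  }
  return mean;
}

// Cosine similarity mapped to 0-100 %.
// Assumption: negative cosine values are clamped to 0 before scaling.
function cosineScore(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors
  const cos = dot / Math.sqrt(normA * normB);
  return Math.max(0, Math.min(1, cos)) * 100;
}
```

Because cosine similarity ignores vector length, two mean MFCC vectors that differ only by a constant scale factor would still score 100 % under this sketch.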

Additional signals — spectral centroid and zero-crossing rate — add robustness. The final similarity score is a weighted blend of all three metrics. Processing is 100 % local: your audio never leaves the browser tab.
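
The sketch below shows one way the two extra signals and the blend could be computed. The page does not publish its weights or how each difference is turned into a score, so the `WEIGHTS` values, the `closeness` helper, and every function name here are hypothetical.

```javascript
// Fraction of adjacent sample pairs whose sign differs.
function zeroCrossingRate(samples) {
  let crossings = 0;
  for (let i = 1; i < samples.length; i++) {
    if ((samples[i - 1] >= 0) !== (samples[i] >= 0)) crossings++;
  }
  return samples.length > 1 ? crossings / (samples.length - 1) : 0;
}

// Magnitude-weighted mean frequency of one spectrum, in Hz.
// magnitudes[k] is assumed to hold bin k of an FFT of size fftSize.
function spectralCentroid(magnitudes, sampleRate, fftSize) {
  let weighted = 0, total = 0;
  for (let k = 0; k < magnitudes.length; k++) {
    weighted += ((k * sampleRate) / fftSize) * magnitudes[k];
    total += magnitudes[k];
  }
  return total > 0 ? weighted / total : 0;
}

// Map the difference between two non-negative values to a 0..1 closeness.
// Hypothetical normalisation: 1 when equal, 0 when one value is zero.
function closeness(a, b) {
  const max = Math.max(a, b);
  return max > 0 ? 1 - Math.abs(a - b) / max : 1;
}

// Hypothetical weights; the page does not say how the metrics are blended.
const WEIGHTS = { mfcc: 0.7, centroid: 0.2, zcr: 0.1 };

// Weighted blend of the three metrics, each expressed on a 0-100 scale.
function blendedScore(mfccScore, centroidA, centroidB, zcrA, zcrB) {
  const centroidScore = closeness(centroidA, centroidB) * 100;
  const zcrScore = closeness(zcrA, zcrB) * 100;
  return (
    WEIGHTS.mfcc * mfccScore +
    WEIGHTS.centroid * centroidScore +
    WEIGHTS.zcr * zcrScore
  );
}
```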

Frequently asked questions

What does "voice similarity" actually measure?
It measures how closely the spectral envelope of two recordings matches. The spectral envelope captures the resonant shape of a voice — its timbre, formants, and characteristic frequency balance — which varies between speakers even when saying the same words. A score above ~75 % suggests the recordings share very similar vocal characteristics; below ~50 % suggests clearly different voices or recording conditions. The tool is not a certified speaker-verification system and should not be used for identity or legal purposes.
What are MFCCs and why are they used for voice comparison?
MFCC stands for Mel-Frequency Cepstral Coefficients. They compress the spectral shape of a sound into ~13 numbers per time frame by (1) passing each frame's spectrum through a bank of 26 triangular filters spaced on the Mel scale (a roughly logarithmic frequency scale that approximates human pitch perception), (2) taking the log of each filter's output, and (3) applying a Discrete Cosine Transform to decorrelate the filter outputs. MFCCs have been a standard feature in automatic speech and speaker recognition for decades because they are compact and correlate well with perceived timbre. They are still affected by microphone and channel differences, which is why recordings made in similar conditions compare best.
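
To make the three stages concrete, here is a minimal sketch of the Mel mapping and the log-plus-DCT step. It assumes the common 2595 · log10(1 + f/700) Mel formula and an unnormalized DCT-II. The page does not say which variants it uses, and the function names are made up for this example.

```javascript
// Hz <-> Mel using the common 2595 * log10(1 + f / 700) formula (assumed variant).
function hzToMel(hz) {
  return 2595 * Math.log10(1 + hz / 700);
}
function melToHz(mel) {
  return 700 * (Math.pow(10, mel / 2595) - 1);
}

// Centre frequencies (Hz) of numFilters triangular filters, evenly spaced in Mel.
function melCenters(numFilters, minHz, maxHz) {
  const lo = hzToMel(minHz);
  const hi = hzToMel(maxHz);
  const centers = [];
  for (let m = 1; m <= numFilters; m++) {
    centers.push(melToHz(lo + (m * (hi - lo)) / (numFilters + 1)));
  }
  return centers;
}

// Filter-bank energies -> log -> first numCoeffs DCT-II coefficients (unnormalized).
function mfccFromEnergies(energies, numCoeffs = 13) {
  const n = energies.length;
  const logE = Array.from(energies, (e) => Math.log(e + 1e-10)); // floor avoids log(0)
  const coeffs = new Array(numCoeffs).fill(0);
  for (let k = 0; k < numCoeffs; k++) {
    for (let j = 0; j < n; j++) {
      coeffs[k] += logE[j] * Math.cos((Math.PI * k * (j + 0.5)) / n);
    }
  }
  return coeffs;
}
```

With 26 filters between 0 Hz and the Nyquist frequency of 11 025 Hz, `melCenters(26, 0, 11025)` packs the centres closely at low frequencies and spreads them out at high frequencies, which is the property the Mel scale is chosen for.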
Why might two recordings of the same person score lower than expected?
Several factors reduce the score even for a single speaker: background noise (adds energy across all frequencies), different microphones (different frequency responses), room reverb, shouting vs. whispering, or significant pitch changes (e.g., singing vs. speaking). For best results use clean recordings made in similar conditions. The tool scores spectral similarity, not just "same person or not".
What audio formats and lengths are supported?
Any format your browser can decode — WAV, MP3, OGG Vorbis, FLAC, AAC / M4A, WebM Opus, and more. The full audio is processed regardless of length; very long files (over ~10 minutes) may take a few seconds on slower devices. Mono and stereo are both supported (stereo is averaged to mono before analysis).
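
The mono downmix mentioned above can be sketched as a plain per-sample average. This is an illustrative helper (the name `downmixToMono` is not from the tool) that takes one `Float32Array` per channel.

```javascript
// Average any number of equal-length channels into a single mono channel.
function downmixToMono(channels) {
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}
```

In a browser, the channel arrays would typically come from a decoded `AudioBuffer` by calling `getChannelData(c)` for each `c` below `numberOfChannels`.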
Is my audio uploaded anywhere?
No. All processing happens entirely inside your browser using the Web Audio API and JavaScript. The files never leave your device. No data is sent to any server, and nothing is stored after you close the page.
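
For readers curious what local decoding looks like, the sketch below reads a user-selected file into memory and decodes it with the Web Audio API, with no network request involved. It is an assumption about how such a tool could be written, not the page's actual code. `decodeLocally` is a hypothetical name, and the function only works in a browser.

```javascript
const TARGET_RATE = 22050; // sample rate stated in "How it works"

// Decode a File or Blob chosen by the user, entirely in the browser.
async function decodeLocally(file) {
  if (typeof OfflineAudioContext === "undefined") {
    throw new Error("Web Audio API is not available in this environment");
  }
  // Read the bytes from the local file; nothing is sent over the network.
  const bytes = await file.arrayBuffer();
  // decodeAudioData resamples the audio to the context's sample rate.
  const ctx = new OfflineAudioContext(1, 1, TARGET_RATE);
  return ctx.decodeAudioData(bytes);
}
```

The returned `AudioBuffer` lives only in page memory, so it is released once the tab is closed or the reference is dropped.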