AI Voice Clone Signal Analyzer

Drop in an audio file to check it for signs of AI voice cloning. Analysis runs entirely in your browser, and no audio is ever uploaded. The tool uses spectral feature analysis (MFCC, RMS, spectral flatness) to compute an AI-risk score.

How it works

The analyzer decodes your audio file in the browser using the Web Audio API, then uses Meyda.js to extract five spectral features (centroid and rolloff are described together below) from overlapping frames of 2048 samples, advancing 1024 samples (a 50% hop) per frame:

RMS Energy variance: Human speech has large energy swings (consonants vs. vowels, pauses). AI voice clones tend to have suspiciously stable RMS, so a low coefficient of variation (standard deviation divided by mean) across frames is treated as a sign of synthesis.

Spectral Flatness: Measures how noise-like vs. tonal a signal is (0 = pure tone, 1 = white noise). TTS systems often produce unnaturally high or unusually consistent flatness values across frames.

MFCC distribution: Mel-frequency cepstral coefficients capture the timbral "fingerprint" of a voice. AI voices tend to show lower inter-frame variance in MFCCs; the voice stays unnaturally consistent.

Spectral Centroid & Rolloff variance: Human speech shifts its brightness (centroid) and high-frequency content (rolloff) dramatically. AI-generated voices tend to show a flatter, more repetitive spectral shape over time.
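The framing and two of the simpler features above can be sketched in plain JavaScript. This is a minimal illustration, not the tool's actual code: the function names are hypothetical, the input is assumed to be a mono `Float32Array` of decoded samples, and `spectralFlatness` assumes a magnitude spectrum has already been computed for the frame (in the tool itself, Meyda.js is described as doing the extraction).

```javascript
// Hypothetical helpers for illustration; names are not taken from the tool.

// Split a mono Float32Array into overlapping frames.
// Defaults follow the text: 2048-sample frames with a 50% hop (1024 samples).
function frameSignal(samples, frameSize = 2048, hop = frameSize / 2) {
  const frames = [];
  for (let start = 0; start + frameSize <= samples.length; start += hop) {
    // subarray returns a view on the same buffer, so no samples are copied
    frames.push(samples.subarray(start, start + frameSize));
  }
  return frames;
}

// Root-mean-square energy of one frame.
function rms(frame) {
  let sumSquares = 0;
  for (const x of frame) sumSquares += x * x;
  return Math.sqrt(sumSquares / frame.length);
}

// Coefficient of variation: standard deviation divided by mean.
// Returns 0 when the mean is 0 to avoid dividing by zero.
function coefficientOfVariation(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  if (mean === 0) return 0;
  const variance =
    values.reduce((a, v) => a + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance) / mean;
}

// Spectral flatness: geometric mean over arithmetic mean of a
// magnitude spectrum. Near 0 for a pure tone, near 1 for white noise.
function spectralFlatness(magnitudes) {
  const eps = 1e-12; // keeps Math.log away from log(0)
  let logSum = 0;
  let sum = 0;
  for (const m of magnitudes) {
    logSum += Math.log(m + eps);
    sum += m;
  }
  const arithmeticMean = sum / magnitudes.length;
  if (arithmeticMean === 0) return 0;
  return Math.exp(logSum / magnitudes.length) / arithmeticMean;
}
```

With these defaults, a clip of N samples (N at least 2048) yields floor((N - 2048) / 1024) + 1 frames. The RMS stability signal would then be something like `coefficientOfVariation(frameSignal(samples).map(rms))`, with lower values indicating steadier energy.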

Each signal is scored on its deviation from typical human speech statistics. Scores are combined into a 0–100 AI-risk index. Thresholds: 0–35 = LIKELY HUMAN, 36–60 = UNCERTAIN, 61–100 = LIKELY AI-GENERATED.
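As a rough sketch of this last step: the text gives the label bands but not the rule for combining the per-signal scores, so the weighted average in `combineScores` below is purely an assumption, as are both function names. Only the thresholds in `riskLabel` come from the description above.

```javascript
// ASSUMPTION: per-signal scores are each on a 0-100 scale and are combined
// by a weighted average. The actual combination rule is not documented.
function combineScores(signalScores, weights) {
  const w = weights || signalScores.map(() => 1); // equal weights by default
  let weightedSum = 0;
  let weightTotal = 0;
  for (let i = 0; i < signalScores.length; i++) {
    weightedSum += signalScores[i] * w[i];
    weightTotal += w[i];
  }
  return weightTotal === 0 ? 0 : weightedSum / weightTotal;
}

// Map a 0-100 AI-risk index to the labels given in the text.
// The score is rounded first because the bands are stated as integers.
function riskLabel(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError("score must be between 0 and 100");
  }
  const s = Math.round(score);
  if (s <= 35) return "LIKELY HUMAN";
  if (s <= 60) return "UNCERTAIN";
  return "LIKELY AI-GENERATED";
}
```

For example, `riskLabel(combineScores([20, 40]))` averages the two scores to 30 and returns "LIKELY HUMAN".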

Frequently asked questions

Is my audio file uploaded to any server?
No. The entire analysis runs inside your browser using the Web Audio API and Meyda.js, so your audio file never leaves your device. This is by design: deepfake detection often involves sensitive recordings (voicemails, phone calls, interviews) that should not be shared with third-party servers.

How accurate is this detector?
This tool uses classical digital signal processing heuristics, not a trained neural network. It can identify common statistical signatures of text-to-speech (TTS) and voice conversion (VC) systems, but it is not definitive. Heavily compressed audio (for example, MP3 at 64 kbps), phone codec artifacts (G.711, AMR), and reverberant recordings all affect the score. A high score suggests AI-like statistical properties; it does not prove a clip is fake. Treat the result as one data point, not a verdict. For forensic or legal purposes, consult a specialist.

What audio formats and lengths work best?
MP3, WAV, and M4A files are all supported. For best results, use an uncompressed or lightly compressed clip of at least 5 seconds, ideally 10–30 seconds. Very short clips give fewer frames to analyze, which reduces confidence. Files containing only music or background noise (no speech) will produce unreliable scores, since the heuristics are calibrated for human speech patterns.

What kinds of AI voice clones does this detect?
The spectral heuristics are most sensitive to neural TTS systems (e.g., WaveNet-style vocoders or end-to-end models such as VITS) and real-time voice conversion tools (e.g., RVC, SVC). They are less effective against high-quality diffusion-based models trained on large multi-speaker datasets, and may miss clones that have been post-processed with added noise or room simulation to mimic natural recording conditions.

Why does MFCC variance matter for deepfake detection?
MFCCs (Mel-frequency cepstral coefficients) are a compact representation of the spectral shape of a sound, computed on the mel scale, which approximates how human hearing resolves frequency. When you speak naturally, your voice changes constantly: pitch, breathiness, speed, and resonance all fluctuate from frame to frame, producing high MFCC variance. Neural vocoders generate audio from a compact latent space; unless deliberately perturbed, they tend to produce a voice that is "too smooth", with lower inter-frame MFCC variance than is typical of human speech. This is one of the more stable signals across different TTS architectures.
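One way to put a number on inter-frame MFCC variance is sketched below. It assumes the MFCCs have already been extracted into one array of coefficients per frame (for instance 13 values per frame, which is an assumption rather than something stated here). The function name and the choice to average the variance across coefficients are illustrative, not the tool's documented method.

```javascript
// Hypothetical helper: mean per-coefficient variance across frames.
// mfccFrames is an array of frames, each an array of MFCC values
// of equal length. Lower results mean a more uniform timbre over time.
function interFrameMfccVariance(mfccFrames) {
  if (mfccFrames.length === 0) return 0;
  const numCoeffs = mfccFrames[0].length;
  if (numCoeffs === 0) return 0;

  let totalVariance = 0;
  for (let c = 0; c < numCoeffs; c++) {
    // Mean of coefficient c over all frames
    let mean = 0;
    for (const frame of mfccFrames) mean += frame[c];
    mean /= mfccFrames.length;

    // Population variance of coefficient c over all frames
    let variance = 0;
    for (const frame of mfccFrames) variance += (frame[c] - mean) ** 2;
    totalVariance += variance / mfccFrames.length;
  }
  return totalVariance / numCoeffs;
}
```

A sequence of identical frames gives 0, and the value grows as the coefficients move more from frame to frame. Any threshold separating "too smooth" from natural speech would have to be calibrated on real recordings; none is given in the text.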
