Record or Upload Audio


Or upload an audio file

Drop an audio file here or click to browse
WAV, MP3, OGG, M4A, FLAC — mono or stereo, any sample rate
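Uploaded files can arrive at any sample rate with one or two channels. A network like Basic Pitch, however, works on a single mono channel at one fixed rate: 22,050 Hz in Spotify's published configuration. The page does not say how it converts its input, so the following is a minimal sketch under that assumption. It averages the decoded channels (for example, the arrays returned by AudioBuffer.getChannelData) and resamples them with plain linear interpolation. The names downmixToMono, resampleLinear, prepareForModel, and TARGET_RATE are hypothetical.

```typescript
// Assumption: the model expects mono audio at 22,050 Hz (Basic Pitch's
// published sample rate). Adjust TARGET_RATE if your model differs.
const TARGET_RATE = 22050;

/** Average all channels into one mono signal (channels must share a length). */
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const mono = new Float32Array(channels[0].length);
  for (const ch of channels) {
    for (let i = 0; i < mono.length; i++) mono[i] += ch[i] / channels.length;
  }
  return mono;
}

/**
 * Naive linear-interpolation resampler. It applies no anti-aliasing filter,
 * so downsampling can fold energy above the new Nyquist frequency back down.
 */
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input.slice();
  const out = new Float32Array(Math.round((input.length * toRate) / fromRate));
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < out.length; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}

/** Hypothetical helper: decoded channels at any rate -> mono at TARGET_RATE. */
function prepareForModel(channels: Float32Array[], sampleRate: number): Float32Array {
  return resampleLinear(downmixToMono(channels), sampleRate, TARGET_RATE);
}
```

Linear interpolation is the simplest possible resampler. In a browser, you get better quality by rendering the buffer through an OfflineAudioContext created at 22,050 Hz and letting the browser resample it.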

Detection settings

Onset sensitivity

Frame threshold

Min note length


Transcribing…

The neural network runs entirely in your browser. The first run downloads the model (~10 MB), which your browser then caches for later visits.

Sheet Music

How it works

1. Record or upload a melody: hum, sing, whistle, or play a single-line instrument.
2. On-device AI (Spotify's Basic Pitch neural network, running via TensorFlow.js) detects pitch frames and note onsets. No audio ever leaves your device.
3. The detected notes are assembled into a melody, displayed as sheet music (via ABC notation rendered with abcjs), and packaged as MIDI, MusicXML, and ABC files.
4. Download your files to import into Sibelius, MuseScore, GarageBand, Logic Pro, Finale, or any DAW.
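Step 3 converts the detected notes into ABC notation before abcjs renders them. The following is a hedged sketch of that conversion, not the page's actual code. It assumes:

- a monophonic melody
- a fixed tempo of 120 BPM
- an eighth-note grid (L:1/8)
- the key of C, with sharps-only spelling

In ABC, an accidental carries over to later notes of the same letter until the next bar line. This sketch writes no interior bar lines, so it adds an explicit natural (=) whenever that letter was sharpened earlier. The Note shape and the names abcPitch and notesToAbc are hypothetical.

```typescript
interface Note {
  midi: number;        // MIDI pitch number (60 = middle C)
  startSec: number;    // onset time in seconds
  durationSec: number; // length in seconds
}

// Pitch classes spelled with sharps only (assumed key of C).
const LETTERS = ["C", "C", "D", "D", "E", "F", "F", "G", "G", "A", "A", "B"];
const IS_SHARP = [false, true, false, true, false, false, true, false, true, false, true, false];

/** Letter plus ABC octave marks: 60 -> "C", 72 -> "c", 84 -> "c'", 48 -> "C,". */
function abcPitch(midi: number): string {
  const letter = LETTERS[midi % 12];
  const octave = Math.floor(midi / 12) - 1; // MIDI 60 -> octave 4
  if (octave >= 5) return letter.toLowerCase() + "'".repeat(octave - 5);
  return letter + ",".repeat(4 - octave);
}

/** Length multiplier of the default unit L:1/8; a single unit needs no suffix. */
function lengthSuffix(units: number): string {
  return units === 1 ? "" : String(units);
}

/** Hypothetical converter: notes -> ABC tune quantized to eighth notes. */
function notesToAbc(notes: Note[], bpm = 120): string {
  const eighthSec = 60 / bpm / 2; // duration of one L:1/8 unit in seconds
  const sharpened = new Set<string>(); // letters that carried a sharp earlier
  const tokens: string[] = [];
  let cursorSec = 0; // end time of the previous note
  for (const n of [...notes].sort((a, b) => a.startSec - b.startSec)) {
    const gap = Math.round((n.startSec - cursorSec) / eighthSec);
    if (gap >= 1) tokens.push("z" + lengthSuffix(gap)); // rest between notes
    const letter = LETTERS[n.midi % 12];
    let accidental = "";
    if (IS_SHARP[n.midi % 12]) {
      accidental = "^";
      sharpened.add(letter);
    } else if (sharpened.has(letter)) {
      accidental = "="; // cancel a sharp that would otherwise carry over
    }
    const units = Math.max(1, Math.round(n.durationSec / eighthSec));
    tokens.push(accidental + abcPitch(n.midi) + lengthSuffix(units));
    cursorSec = n.startSec + n.durationSec;
  }
  const header = ["X:1", "T:Transcription", "M:4/4", "L:1/8", `Q:1/4=${bpm}`, "K:C"];
  return header.join("\n") + "\n" + tokens.join(" ") + " |]";
}
```

abcjs can render the returned string with ABCJS.renderAbc(elementId, abcString). Lengths of 5 or 7 units are not single note values. A fuller converter would insert bar lines and split such notes into tied notes.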

The pitch-detection model is a compact convolutional neural network trained by Spotify Research on a large dataset of musical audio. For each 11.6 ms frame, it estimates the probability that each of 88 MIDI pitches (the piano range, A0 to C8) is present. Because every pitch gets its own probability, the model can transcribe polyphonic audio (several simultaneous notes). Hum-to-sheet results are still best for single-note melodies.
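The 11.6 ms figure matches a hop of 256 samples at 22,050 Hz: 256 / 22,050 ≈ 0.0116 s, or about 86 frames per second. The 88 pitch bins start at A0 (MIDI 21).

The following sketch assumes those constants. It shows one simple way the Detection settings could turn frame and onset probabilities into notes:

1. A note starts at a local peak in onset probability that clears the onset threshold.
2. It lasts while the frame probability stays above the frame threshold.
3. It is dropped if it is shorter than the minimum note length.

Basic Pitch's own post-processing is more involved. This simplified decoder, the mapping from the UI controls to these numbers, and all names here are assumptions.

```typescript
// Assumed frame hop: 256 samples at 22,050 Hz, about 11.6 ms per frame.
const FRAME_SEC = 256 / 22050;
const LOWEST_MIDI = 21; // pitch bin 0 = A0, so 88 bins span MIDI 21..108

interface Note { midi: number; startSec: number; durationSec: number; }

interface DetectionSettings {
  onsetThreshold: number; // 0..1, minimum onset probability that starts a note
  frameThreshold: number; // 0..1, minimum frame probability that sustains it
  minNoteMs: number;      // notes shorter than this are discarded
}

/**
 * Simplified onset-then-sustain decoder. frames[t][p] and onsets[t][p] are
 * probabilities for time frame t and pitch bin p.
 */
function decodeNotes(frames: number[][], onsets: number[][], s: DetectionSettings): Note[] {
  const minFrames = Math.max(1, Math.round(s.minNoteMs / 1000 / FRAME_SEC));
  const numFrames = frames.length;
  const numPitches = numFrames > 0 ? frames[0].length : 0;
  const notes: Note[] = [];
  for (let p = 0; p < numPitches; p++) {
    let t = 0;
    while (t < numFrames) {
      const o = onsets[t][p];
      // An onset must clear the threshold and be a local peak in time.
      const isPeak =
        o >= s.onsetThreshold &&
        (t === 0 || o >= onsets[t - 1][p]) &&
        (t === numFrames - 1 || o >= onsets[t + 1][p]);
      if (!isPeak) { t++; continue; }
      // Sustain the note while the frame probability stays above threshold.
      let end = t + 1;
      while (end < numFrames && frames[end][p] >= s.frameThreshold) end++;
      if (end - t >= minFrames) {
        notes.push({ midi: LOWEST_MIDI + p, startSec: t * FRAME_SEC, durationSec: (end - t) * FRAME_SEC });
      }
      t = end; // resume scanning after this note
    }
  }
  return notes.sort((a, b) => a.startSec - b.startSec || a.midi - b.midi);
}
```

Under this mapping, a higher Onset sensitivity setting would correspond to a lower onsetThreshold. A Min note length of 50 ms rounds to 4 frames.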

Frequently asked questions

What kind of audio gives the best results?
Clear single-note melodies work best: humming, whistling, singing a melody on "la" or "na", or playing a flute, violin, or other monophonic instrument. Chords and complex harmonics make transcription harder. For polyphonic recordings, try the High onset sensitivity setting and expect some notes to merge or split. Background noise and reverb reduce accuracy significantly, so record in a quiet room and keep the microphone close.

Does my audio get sent to a server?
No. The entire transcription pipeline runs inside your browser using WebAssembly and TensorFlow.js. The only network request is a one-time download of the ~10 MB model weights (served from a CDN, cached in your browser after the first use). Your microphone audio or uploaded file never leaves your device.

What do the MIDI, MusicXML, and ABC downloads give me?
The MIDI file (.mid) can be imported into any DAW or notation software, such as GarageBand, Logic Pro, Ableton Live, FL Studio, Cubase, or MuseScore. It carries the exact pitch and timing of each detected note. The MusicXML file (.xml) is the standard interchange format for notation software. Sibelius, Finale, Dorico, MuseScore, and Flat.io all import it, giving you an editable score with proper staves, clefs, and note values. The ABC file (.abc) is a lightweight text format supported by MuseScore, abcjs, and various online tools.

Why does the first run take longer?
On your first visit, the neural network model (~10 MB of weights) is downloaded from a CDN and stored in your browser's cache. Later uses skip the download, so the model is ready almost immediately. Transcription itself still takes up to roughly one second of processing per second of audio on a modern device.

What is the maximum recording length?
There is no hard limit, but transcription time scales linearly with audio length. A 30-second recording typically processes in 10–30 seconds, depending on your device. Very long recordings (several minutes) may cause the browser tab to use significant memory during inference. For best results, keep recordings under 60 seconds and trim silence from the start and end.
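The MIDI download described in this FAQ needs very little code to produce. The following sketch writes a format-0 Standard MIDI File with this layout:

- an MThd header chunk
- one MTrk chunk holding a tempo meta event
- note-on/note-off pairs, with delta times encoded as variable-length quantities
- an end-of-track event

The fixed tempo, 480 ticks per quarter note, fixed velocity of 80, and the names vlq and notesToMidi are assumptions for illustration, not the page's actual settings.

```typescript
interface Note { midi: number; startSec: number; durationSec: number; }

/** Encode n as a MIDI variable-length quantity (7 bits per byte, high bit = "more"). */
function vlq(n: number): number[] {
  const bytes = [n & 0x7f];
  n >>>= 7;
  while (n > 0) {
    bytes.unshift((n & 0x7f) | 0x80);
    n >>>= 7;
  }
  return bytes;
}

const u32 = (n: number): number[] => [(n >>> 24) & 0xff, (n >>> 16) & 0xff, (n >>> 8) & 0xff, n & 0xff];
const u16 = (n: number): number[] => [(n >>> 8) & 0xff, n & 0xff];

/** Hypothetical writer: one-track (format 0) Standard MIDI File at a fixed tempo. */
function notesToMidi(notes: Note[], bpm = 120, ppq = 480, velocity = 80): Uint8Array {
  const ticksPerSec = (ppq * bpm) / 60;
  const events: { tick: number; bytes: number[] }[] = [];
  for (const n of notes) {
    const on = Math.round(n.startSec * ticksPerSec);
    const off = Math.max(on + 1, Math.round((n.startSec + n.durationSec) * ticksPerSec));
    events.push({ tick: on, bytes: [0x90, n.midi, velocity] }); // note-on, channel 1
    events.push({ tick: off, bytes: [0x80, n.midi, 0] });       // note-off, channel 1
  }
  // Sort by time; at equal ticks, note-offs (0x80) come before note-ons (0x90).
  events.sort((a, b) => a.tick - b.tick || a.bytes[0] - b.bytes[0]);
  const usPerQuarter = Math.round(60000000 / bpm);
  const track: number[] = [
    0x00, 0xff, 0x51, 0x03, // delta 0, set-tempo meta event (3 data bytes)
    (usPerQuarter >> 16) & 0xff, (usPerQuarter >> 8) & 0xff, usPerQuarter & 0xff,
  ];
  let prev = 0;
  for (const e of events) {
    track.push(...vlq(e.tick - prev), ...e.bytes); // delta time, then the event
    prev = e.tick;
  }
  track.push(0x00, 0xff, 0x2f, 0x00); // end-of-track meta event
  const header = [0x4d, 0x54, 0x68, 0x64, ...u32(6), ...u16(0), ...u16(1), ...u16(ppq)]; // "MThd"
  const chunk = [0x4d, 0x54, 0x72, 0x6b, ...u32(track.length), ...track];                // "MTrk"
  return new Uint8Array([...header, ...chunk]);
}
```

In a browser, you can wrap the returned bytes in a Blob of type audio/midi to offer them as a download.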
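Trimming silence from the start and end, as recommended above, can also be automated. The following is one assumed approach, not necessarily what the page does:

- measure the RMS level of the signal in short windows
- keep the span from the first to the last window whose level clears a threshold

The -40 dBFS threshold, the 20 ms window, and the name trimSilence are illustrative assumptions.

```typescript
/**
 * Trim leading and trailing silence from a mono signal. A window counts as
 * silent when its RMS level is below thresholdDb (dBFS, where 0 dB = full scale).
 * Trimming happens at window granularity.
 */
function trimSilence(
  samples: Float32Array,
  sampleRate: number,
  thresholdDb = -40,
  windowMs = 20,
): Float32Array {
  const win = Math.max(1, Math.round((sampleRate * windowMs) / 1000)); // samples per window
  const threshold = Math.pow(10, thresholdDb / 20); // dBFS -> linear amplitude
  const isLoud = (start: number): boolean => {
    const end = Math.min(start + win, samples.length);
    let sumSq = 0;
    for (let i = start; i < end; i++) sumSq += samples[i] * samples[i];
    return Math.sqrt(sumSq / (end - start)) >= threshold;
  };
  let first = -1;
  let last = -1;
  for (let start = 0; start < samples.length; start += win) {
    if (isLoud(start)) {
      if (first < 0) first = start;
      last = Math.min(start + win, samples.length);
    }
  }
  if (first < 0) return new Float32Array(0); // nothing above the threshold
  return samples.slice(first, last);
}
```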

Hum to Sheet Music
