ALPHA prototype · waveform decomposition · 11 recordings analysed

Sonic
Decomposition.

An AI-assisted toolkit for pulling a recording apart using song-file data, web-based context, and a user's ear. A series of modules progressively extracts named components of the waveform (spectra, partials, transients, formants, residual) so that a song or composition can be inspected, reasoned about, or resynthesised on its own terms.

What's new is the workflow, not the algorithms. The signal-processing steps are 1960s–2000s standards: short-time Fourier transform, McAulay-Quatieri sinusoidal partial tracking, linear-predictive coding for formants, harmonic-percussive source separation, chroma plus a self-similarity matrix for structure, and Krumhansl-Schmuckler key-finding. The contribution is how they're assembled: two engines that don't see each other's evidence; a recurring conference, Φ, that reconciles them and folds each result forward as a prior; a deliberate refusal to name anything until enough independent fixes line up; and a diagnostic residual (D1) that audits every commitment by trying to rebuild the recording from it.
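To make the named standards concrete, here is a minimal sketch of three of them chained together: a Hann-windowed short-time Fourier transform, a chroma fold into twelve pitch classes, and Krumhansl-Schmuckler key-finding with the Krumhansl-Kessler probe-tone profiles. It assumes a mono NumPy signal; the function names and defaults (frame size, hop, the 55 to 4000 Hz band) are illustrative choices, not the project's Binary Engine code.

```python
import numpy as np

# Krumhansl-Kessler probe-tone profiles, indexed from the tonic.
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def stft_power(x, n_fft=4096, hop=1024):
    """Power spectrogram from a Hann-windowed STFT, shape (n_fft // 2 + 1, n_frames)."""
    if len(x) < n_fft:
        raise ValueError("signal shorter than one analysis frame")
    window = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    frames = np.stack([x[s : s + n_fft] * window for s in starts], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0)) ** 2


def chroma(x, sr, n_fft=4096, hop=1024, fmin=55.0, fmax=4000.0):
    """Fold STFT power into 12 pitch classes (C = 0), summed over the whole signal."""
    power = stft_power(x, n_fft, hop).sum(axis=1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    keep = (freqs >= fmin) & (freqs <= fmax)
    # Nearest MIDI note for each bin; MIDI 60 is C4, so midi % 12 gives the pitch class.
    midi = np.rint(69 + 12 * np.log2(freqs[keep] / 440.0)).astype(int)
    out = np.zeros(12)
    np.add.at(out, midi % 12, power[keep])
    return out


def key_from_chroma(c):
    """Krumhansl-Schmuckler: the rotated profile with the highest Pearson r wins."""
    best = (-2.0, "", "")
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # np.roll moves the tonic weight (index 0) to index `tonic`.
            r = np.corrcoef(c, np.roll(profile, tonic))[0, 1]
            if r > best[0]:
                best = (r, NAMES[tonic], mode)
    return best[1], best[2], best[0]
```

Summing chroma over the whole recording yields one global key; a structure pass would compute chroma per frame and look for key changes, which this sketch does not attempt.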

Waveform decomposition · Sinusoidal partial tracking · Component-level resynthesis · Diagnostic residual · Two-engine architecture

01 Architecture

The Flow.

The run has a shape: a boot, one fixed orientation pass, a flat transcription stage, and a conference the whole thing spins around. At Boot 0, both engines and the Φ conference layer load and stay resident; this is the foundation every later step re-enters. Orientation then runs both engines once, shallowly, to set the first priors. After that, the middle is a flat, user-led stage: tones, hits, voice, audit and structure run in whatever order the work demands (percussion tends to go early and voice late, a preference rather than a rule), and each is the same two engines re-entered more deeply, not a new machine. Every round ends in Φ, the conference: binary, web and user evidence is reconciled, a name is committed, and the resolution folds forward as a prior for the next round.

Figure · The Flow (Simple / Detailed views)
0 · Boot 0 · resident foundation: loads the .mp3 / .wav plus artist & title (25 files); both engines and the Φ conference layer are held resident
1 · Orientation · the one fixed pass: Binary (spectrogram & roster) ‖ Web · User (initial context); both engines run once, shallow → first Φ
2 · Transcription stage · any order · user-led
2·1 Tone · Binary (trajectory mode) ‖ Web · User (instrumentation)
2·2 Hits · Binary (percussion & onset) ‖ Web · User (kit · per-hit check)
2·3 Voice · Binary (vocal & formants) ‖ Web · User (singer context)
2·4 Audit · Binary (D1: resynth & residual) ‖ Web · User (residual review)
2·5 Structure · Binary (structure & harmony) ‖ Web · User (theory · user override)
Φ · the pinwheel: confer · commit · fold forward, recurring after every module

Boot 0 loads first and stays resident; Orientation is the one fixed pass; Tone, Hits, Voice, Audit and Structure are a flat stage run in any order; every round ends in the Φ conference, which folds its findings forward as priors.
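The run's shape can be sketched as a loop, assuming each engine is a callable that takes a module name plus the current priors and returns claims. `Session`, `run_module` and the toy engines are hypothetical names for illustration only; what the sketch tries to show is the shape: engines built once and held resident (Boot 0), one fixed orientation pass, a user-chosen order for the flat stage, and a Φ step whose commits fold forward as priors.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Claims = Dict[str, str]
Engine = Callable[[str, Claims], Claims]                 # (module, priors) -> claims
Conference = Callable[[Claims, Claims, Claims], Claims]  # (binary, web, priors) -> commits

STAGE = ("tone", "hits", "voice", "audit", "structure")  # flat stage, any order


@dataclass
class Session:
    # Boot 0: engines and the conference are constructed once and stay resident.
    binary: Engine
    web: Engine
    conference: Conference
    priors: Claims = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def run_module(self, name: str) -> None:
        # Each engine gets a copy of the priors, never the other's fresh evidence.
        b = self.binary(name, dict(self.priors))
        w = self.web(name, dict(self.priors))
        # Φ: reconcile, commit, and fold the commits forward as priors.
        self.priors.update(self.conference(b, w, dict(self.priors)))
        self.log.append(name)

    def run(self, order: List[str]) -> Claims:
        self.run_module("orientation")  # the one fixed pass
        for name in order:
            if name not in STAGE:
                raise ValueError(f"unknown module: {name!r}")
            self.run_module(name)
        return self.priors


# Toy engines: only claims that both sides make survive this (hypothetical) conference.
def binary(module, priors):
    return {"drums": "present"} if module == "hits" else {}


def web(module, priors):
    return {"drums": "present", "lead_vocal": "female"} if module in ("hits", "voice") else {}


def agree_only(b, w, priors):
    return {k: v for k, v in b.items() if w.get(k) == v}


session = Session(binary, web, agree_only)
session.run(["hits", "voice"])  # -> {'drums': 'present'}
```

The conference here is deliberately naive; the point is that the stage order is just a list the user supplies, while Boot 0 and orientation are fixed.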

02 Modules

What each module does.

These are the modules of the pipeline for pulling a raw sound file apart for musical analysis. A series of paired modules extracts components of the waveform (spectra, partials, transients, formants, residual) and interprets them alongside web- and user-derived context, so that each musical element can be isolated and later inspected, resynthesised, and reasoned about on its own terms.
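As one example of a binary-side extraction, formants can be estimated with linear-predictive coding: fit an all-pole filter to a windowed frame with the Levinson-Durbin recursion, then read resonance frequencies from the pole angles. This is a minimal sketch assuming a single mono NumPy frame; the order rule of thumb (2 + sample rate in kHz), the 400 Hz bandwidth cut-off and the omission of pre-emphasis are common simplifications, not the project's Voice module settings.

```python
import numpy as np


def lpc(frame, order):
    """Prediction-error filter a[0..order] with a[0] = 1, by the autocorrelation
    method (Levinson-Durbin recursion)."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this order.
        k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def formants(frame, sr, order=None, max_bw=400.0):
    """Formant frequencies (Hz) from the angles of the LPC poles."""
    if order is None:
        order = 2 + int(sr) // 1000  # rule of thumb: 2 + sample rate in kHz
    a = lpc(frame * np.hamming(len(frame)), order)
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]  # keep one pole of each conjugate pair
    freqs = np.angle(poles) * sr / (2 * np.pi)
    # A pole of radius r has a 3 dB bandwidth of about -(sr / pi) * ln(r).
    bws = -(sr / np.pi) * np.log(np.abs(poles))
    return sorted(float(f) for f, b in zip(freqs, bws) if f > 90.0 and b < max_bw)
```

On sung material the choice of frame matters as much as the maths: high fundamentals space the harmonics widely, and LPC poles can then lock onto harmonics rather than vocal-tract resonances.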

Each module is a pair running in tandem: a binary module (the A side) reads the audio file, while a web/user module (the B side) gathers cultural context and any user input. Drawing on both, a moment of conference (the chat) reconciles what holds true across all domains (drums are present in the mix, the lead vocalist is a woman, and so on) before the next module begins.

The engine holds attributes provisionally: recognised as repeated and regular, but not yet given a specific name. Each conference is where the user decides whether to commit a name, based on the combined evidence from the binary side (what's measurable in the audio), the web side (what's credited, claimed, or written about), and the user side (what the human somatically or experientially confirms or corrects). The conference recurs at the end of every module pass, bridging what has been gleaned and what has been discussed, and each round of conference and reconciliation hands its resolutions forward as priors for the next round. The effort is collaborative: the more a user brings to the conference, the stronger the final analytical output.
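One way to encode "hold provisionally, commit only when enough independent fixes line up" is sketched below. It assumes each source (binary, web, user) contributes at most one proposed name per attribute, that the user directs the commit, and that a commit only lands when at least two distinct sources back the chosen name. `Attribute`, `add_fix` and `commit` are hypothetical names, and the two-source threshold is an assumption rather than a documented rule of the project.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

SOURCES = ("binary", "web", "user")


@dataclass
class Attribute:
    """A provisional attribute: recognised as recurring, not yet named."""
    key: str
    fixes: Dict[str, str] = field(default_factory=dict)  # source -> proposed name
    committed: Optional[str] = None

    def add_fix(self, source: str, name: str) -> None:
        if source not in SOURCES:
            raise ValueError(f"unknown source: {source!r}")
        self.fixes[source] = name  # at most one fix per independent source

    def disagreement(self) -> Dict[str, str]:
        """Every claim, surfaced to the user whenever the sources do not agree."""
        return dict(self.fixes) if len(set(self.fixes.values())) > 1 else {}

    def commit(self, user_choice: str, min_sources: int = 2) -> Optional[str]:
        """The user directs the commit; it only lands with enough independent backing."""
        self.add_fix("user", user_choice)
        backers = {s for s, name in self.fixes.items() if name == user_choice}
        if len(backers) >= min_sources:
            self.committed = user_choice
        return self.committed  # None means the attribute stays provisional


vocal = Attribute("lead_vocalist")
vocal.add_fix("binary", "female")
vocal.add_fix("web", "male")
vocal.disagreement()    # -> {'binary': 'female', 'web': 'male'}
vocal.commit("female")  # -> 'female' (backed by binary + user)
```

A user choice that no engine supports leaves the attribute provisional rather than forcing a name onto it.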


05 Status

Where the prototype stands.

Components · named, addressable parts the binary engine can extract and resynthesise (e.g. snare body, vocal F2, kick sub)

Sonic fingerprints · recurring signal shapes catalogued across analysed recordings (gated reverb, glue-comp pumping, etc.)

Genres mapped · genres with at least metadata, credits, and reception-language entries in the web-engine library

Genre baselines · genres with a full set of expected production traits the conference can run markedness checks against (LLM-summarised, not corpus-measured; see Step 4 in Web Engine)
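A markedness check against a genre baseline can be sketched as a range comparison: each measured production trait is compared with the baseline's expected range and flagged when it falls outside. The trait names and ranges below are invented for illustration (the project's baselines are LLM-summarised, and their format is not described here); traits the baseline doesn't cover are reported rather than guessed.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # (low, high) expected for the genre


def markedness(measured: Dict[str, float], baseline: Dict[str, Range]) -> Dict[str, str]:
    """Report traits that fall outside the genre baseline; unmarked traits are omitted."""
    report: Dict[str, str] = {}
    for trait, value in measured.items():
        if trait not in baseline:
            report[trait] = "unbaselined"  # no expectation to be marked against
            continue
        low, high = baseline[trait]
        if value < low:
            report[trait] = "marked: below baseline"
        elif value > high:
            report[trait] = "marked: above baseline"
    return report


# Illustrative numbers only, not a real genre baseline.
baseline = {"tempo_bpm": (110.0, 130.0), "reverb_decay_s": (0.5, 1.5)}
measured = {"tempo_bpm": 96.0, "reverb_decay_s": 1.2, "sidechain_depth_db": 6.0}
markedness(measured, baseline)
# -> {'tempo_bpm': 'marked: below baseline', 'sidechain_depth_db': 'unbaselined'}
```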

The framework is in alpha. Eleven recordings have been analysed end-to-end. The Binary Engine modules — spectral roster, trajectory mode, percussion, vocal LPC, D1 resynthesis, structure — ship as text-spec plus runnable Python. The Web Engine ships the same way. The Conference’s three filters are implemented at every Φ conference.

Conversational ground truth ranks above engine values. The engine is allowed — encouraged — to surface a coherent hypothesis even when it can't yet prove the claim from its own parameters. The residual decides which hypotheses to keep.

Honest failure mode: the confirmation engine.
The rule above — felt response outranks engine values — is also the rule that lets this pipeline become a machine for agreeing with the user's ear. If the user is confident, the engine has a structural incentive to recover that confidence in the residual. We name this here rather than letting a reader catch it: the conference is bidirectional in principle, but a careless run will drift toward whatever the user heard first.

Three things push back against the drift in practice. (i) D1 resynthesis: every commitment has to survive a rebuild of the recording from the named components. If a singer is committed as male and D1 returns a clearly female resynth, the commit is forced back open even when the user is confident. (One such hold-out is recorded in the project's prediction-accuracy log against the Coltrane / Hartman session, where the engine refused a vibrato attribution the user had assumed.) (ii) The web/binary independence: when the two sides disagree before the conference, the user is shown both claims and the disagreement itself before being asked which to commit. (iii) The residual is a one-way audit — the user can't talk it down; either the rebuild accounts for the recording or it doesn't.

None of these eliminate the failure mode. They bound it.
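The D1 audit can be sketched as "rebuild, subtract, measure": resynthesise the recording from the committed components, subtract the rebuild from the original, and express the leftover energy as a residual-to-signal ratio in dB. This minimal sketch assumes components are plain stationary sinusoidal partials (frequency, amplitude, phase) and uses an arbitrary −20 dB pass threshold; both are assumptions for illustration, not D1's actual component model or cut-off.

```python
import math
from typing import List, Sequence, Tuple

Partial = Tuple[float, float, float]  # (frequency in Hz, amplitude, phase in radians)


def resynthesise(partials: Sequence[Partial], sr: int, n: int) -> List[float]:
    """Rebuild n samples as a sum of stationary sinusoids."""
    return [
        sum(a * math.sin(2 * math.pi * f * t / sr + ph) for f, a, ph in partials)
        for t in range(n)
    ]


def residual_db(x: Sequence[float], rebuilt: Sequence[float]) -> float:
    """Residual-to-signal energy ratio in dB; -inf means a perfect rebuild."""
    num = sum((a - b) ** 2 for a, b in zip(x, rebuilt))
    den = sum(a * a for a in x)
    if den == 0:
        raise ValueError("silent input: nothing to audit")
    return -math.inf if num == 0 else 10 * math.log10(num / den)


def audit(x: Sequence[float], partials: Sequence[Partial], sr: int,
          threshold_db: float = -20.0) -> Tuple[float, bool]:
    """Return (residual in dB, passed). Only the audio and the components enter here."""
    r = residual_db(x, resynthesise(partials, sr, len(x)))
    return r, r <= threshold_db


sr = 8000
recording = resynthesise([(440.0, 1.0, 0.0), (880.0, 0.5, 0.0)], sr, 800)
audit(recording, [(440.0, 1.0, 0.0), (880.0, 0.5, 0.0)], sr)  # all partials committed: passes
audit(recording, [(440.0, 1.0, 0.0)], sr)  # 880 Hz partial missing: about -7 dB, fails
```

Because the ratio is computed from the audio and the committed components alone, nothing said in the conference can move it, which is the one-way property that point (iii) relies on.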

Sonic
Decomposition.

The Flow.

What each module does.

Orientation

Spectrogram & spectral roster

Initial context

Reconcile: roster vs. genre baseline

Tones

Trajectory mode — sinusoidal partial tracking

Instrumentation-granular context

Commit instrument names

Hits

Percussion & onset extraction

Kit-granular context

Commit the percussion timeline

Voice

Vocal isolation & LPC formant extraction

Singer-granular context

Commit the vocal extraction

Audit

D1 — diagnostic resynthesis & residual

Residual review

Commit the residual diagnosis

Structure

Structure, harmony, chroma

Music-theory-granular context

Commit the structural read

Working figures — leftovers.

Two recordings, compared by component.

Where the prototype stands.

Working notes, not a product page.
