TrackAudioFeatures
The result of analysing a track's audio. Populated in two
asynchronous phases: classic acoustic analysis (librosa) and ML
enrichment (Essentia + TensorFlow). This page describes
every field with its origin, unit, range and usage.
apps/api/src/Features/Tracks/Models/TrackAudioFeatures.cs
1 Overview
A 1:1 relationship with Track:
TrackAudioFeatures.TrackId is unique and has
OnDelete = Cascade. A record can be partially populated:
once Phase 1 finishes every acoustic column is filled, and once
Phase 2 finishes every enrichment column is filled.
| Category | Count | Source |
|---|---|---|
| Acoustic (librosa) | 13 fields | apps/audio-analyzer/analyze.py · Python · librosa, pyloudnorm, Chromaprint |
| ML enrichment (Essentia + TF) | 14 direct fields + 2 heuristics | apps/audio-enricher/enrich.py · Python · Essentia + discogs-effnet |
| pgvector embedding | 1 column (1280-dim) | SQL column EmbeddingDiscogs vector(1280) |
| Lifecycle | 4 fields | Worker (retry state) |
Tag legend
- phase 1 = computed by audio-analyzer.
- phase 2 = computed by audio-enricher.
- heuristic = derived by a formula over other fields.
- jsonb = stored as JSON.
- pgvector = vector column.
- pending integration = column exists but is not populated yet.
2 Phase 1 · Acoustic analysis
Runs inside ProcessTrackConsumer immediately after HLS
encoding: the consumer invokes the Python sidecar
audio-analyzer/analyze.py as a subprocess. Target sample
rate: 22050 Hz. At most 15 minutes of audio are processed;
longer files are truncated.
Bpm
phase 1
- Type
- int?
- Unit
- BPM (integer)
- Range
- ~40 – 240
- Algorithm
- librosa beat_track (DP)
Tempo in beats per minute, rounded to the nearest integer. The
detector is a dynamic programming beat tracker computed
over the onset-strength envelope.
Where it comes from
librosa.beat.beat_track(onset_envelope=...).
Estimates in the mid range (90–180 BPM) tend to be reliable;
genres with a half-time feel are prone to octave errors, where the
tempo is doubled or halved (e.g. a 75 BPM track reported as 150).
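For the intuition behind the beat tracker, here is a minimal pure-Python sketch of the dynamic-programming formulation (after Ellis, on which librosa's beat_track is based). It is not librosa's code: the name dp_beat_track is hypothetical, tightness plays the same role as librosa's tightness parameter, and the sketch assumes the beat period in frames is already known, whereas librosa also estimates the tempo and trims weak beats at the edges.

```python
import math


def dp_beat_track(onset_env, period, tightness=100.0):
    """Pick beat frames from an onset-strength envelope by dynamic programming.

    onset_env -- onset strength per frame
    period    -- expected beat spacing in frames (from a global tempo estimate)
    tightness -- how strongly gaps that deviate from `period` are penalised
    """
    if period < 2:
        raise ValueError("period must be at least 2 frames")
    n = len(onset_env)
    score = list(onset_env)  # best cumulative score of a beat sequence ending at t
    backlink = [-1] * n      # previous beat in that best sequence
    for t in range(n):
        # Candidate previous beats lie between two periods and half a period back.
        lo = max(0, t - 2 * period)
        hi = t - max(1, period // 2)
        best, best_tau = None, -1
        for tau in range(lo, hi + 1):
            # Log-Gaussian penalty: zero when the gap equals the period exactly.
            penalty = -tightness * math.log((t - tau) / period) ** 2
            candidate = score[tau] + penalty
            if best is None or candidate > best:
                best, best_tau = candidate, tau
        if best is not None:
            score[t] = onset_env[t] + best
            backlink[t] = best_tau
    # Start from the best-scoring frame in the final period and walk backwards.
    start = max(range(max(0, n - period), n), key=lambda i: score[i])
    beats = []
    while start != -1:
        beats.append(start)
        start = backlink[start]
    return beats[::-1]
```

An impulse every 10 frames with period=10 yields beats at frames 0, 10, ..., 90. A median beat gap of g frames converts to tempo as 60 · sr / (hop_length · g) BPM.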
How it is used
- Filter in
/tracks/{id}/similar (bpmMin, bpmMax).
- Shown in
ManageAlbumPageClient (Studio) in the analysis section.
- Tempo heuristic (slow / mid / fast) in the
TrackInsightsModal.
How it could be used
- Building BPM-compatible DJ playlists (beatmatching).
- Workout / running playlists by cadence band.
- "Similar tracks within ±5 BPM" recommendations.
- Type
- float?
- Range
- [0, 1]
- Computation
- 1 − CV(intervals)
Confidence of the BPM estimate, derived from the coefficient of
variation (standard deviation ÷ mean) of the intervals between
detected beats: perfectly regular beats give 1. Below ~0.5 the
estimate is considered unreliable and the UI shows a warning.
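The 1 − CV(intervals) rule can be written as below. This is a sketch: clamping at 0 and returning 0.0 when there are fewer than two intervals are assumptions, not documented behaviour of analyze.py, and bpm_confidence is a hypothetical name.

```python
import statistics


def bpm_confidence(beat_times):
    """1 - coefficient of variation of inter-beat intervals, clamped to [0, 1].

    beat_times -- beat positions in seconds, in ascending order.
    """
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    if len(intervals) < 2:
        return 0.0  # too few beats to judge regularity (assumption)
    mean = statistics.fmean(intervals)
    if mean <= 0:
        return 0.0
    cv = statistics.pstdev(intervals) / mean
    return max(0.0, min(1.0, 1.0 - cv))
```

Beats at exactly 0.5 s spacing score 1.0; intervals of 0.3, 0.7, 0.2 and 0.8 s score about 0.49, just under the ~0.5 warning threshold.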
How it is used
- The UI shows BPM with a confidence %.
- Recommendation pipelines may ignore low-confidence BPM.
Key
phase 1
- Type
- string?
- Values
- C, C#, D, D#, E, F, F#, G, G#, A, A#, B
- Algorithm
- Krumhansl-Schmuckler
Detected tonic (tonal centre). A chromagram is computed with
librosa.feature.chroma_cqt(), and its mean is correlated with the
Krumhansl-Kessler profiles rotated to each of the 12 pitch classes.
How it is used
- Shown in the track detail UI.
- Together with Mode, displays the tonality ("A minor").
How it could be used
- Harmonic mixing (Camelot / Open Key wheel) — find compatible tracks for transitions.
- Suggestions for covers/remixes in the same key.
Mode
phase 1
- Type
- MusicalMode?
- Values
- "major" · "minor"
- Storage
- lowercase string
Tonality mode (major or minor): whichever of the major and minor
Krumhansl-Kessler profiles correlates more strongly with the mean
chromagram.
- Type
- float?
- Range
- [0, 1]
- Computation
- top-1 vs top-2 margin
Margin between the correlation of the best-matching key profile and
that of the runner-up. Atonal or very percussive tracks tend to have
low confidence.
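The key, mode and key-confidence fields above can be sketched together: correlate a 12-bin mean chroma vector (C first) with the 24 rotated Krumhansl-Kessler profiles and keep the best match. The profile values are the published Krumhansl-Kessler ratings; estimate_key is a hypothetical name, and expressing confidence as the raw correlation margin clamped to [0, 1] is an assumption, since this page does not say how the margin is normalised.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl-Kessler key profiles, indexed from the tonic upwards.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def _pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def estimate_key(mean_chroma):
    """Return (tonic, mode, confidence) for a 12-bin mean chroma vector."""
    scores = []
    for tonic in range(12):
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            # Rotate the profile so that its first entry lines up with `tonic`.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            scores.append((_pearson(mean_chroma, rotated), PITCH_CLASSES[tonic], mode))
    scores.sort(reverse=True)
    (best, tonic, mode), (runner_up, _, _) = scores[0], scores[1]
    # Margin between best and second-best correlation, clamped to [0, 1].
    confidence = max(0.0, min(1.0, best - runner_up))
    return tonic, mode, confidence
```

Feeding the major profile itself returns ("C", "major", ...); rotating the minor profile to start on A returns ("A", "minor", ...).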
LoudnessLufs
phase 1
- Type
- float?
- Unit
- LUFS (dB)
- Range
- typically −30 to −5
- Standard
- ITU-R BS.1770-4
Integrated loudness in LUFS (Loudness Units relative to Full
Scale), computed by pyloudnorm according to ITU-R BS.1770-4.
Typical normalisation targets:
- Spotify / YouTube / Tidal: −14 LUFS
- Apple Music: −16 LUFS
- Broadcast (EBU R128): −23 LUFS
How it is used
- LoudnessMeter in the TrackInsightsModal shows LUFS vs the target.
- Classified as "high / ok / low" via
loudnessInfo() (insights.ts).
- Shown as text in the Studio analysis section.
How it could be used
- Automatic loudness normalisation in the player (applied gain).
- Warning artists that their master is "below the streaming target".
- Type
- float?
- Unit
- dBFS
- Computation
- 20 · log10(max |y|)
Sample peak in dB Full Scale, with samples normalised to [−1, 1].
A value at 0 dBFS suggests the master is hitting full scale and may
be clipping; values above 0 dBFS are impossible with integer samples
but can occur with floating-point audio.
TruePeakDb
phase 1
- Type
- float?
- Unit
- dBTP
- Computation
- 4× oversampling + maximum
True peak, estimated by 4× polyphase oversampling. This reveals
inter-sample peaks: peaks that appear between samples once the
signal is reconstructed (D/A conversion, lossy decoding) and can
clip even when every sample is below 0 dBFS.
Rule of thumb
Keep TruePeakDb ≤ −1 dBTP so that the lossy transcoding applied by
streaming platforms does not push peaks into clipping.
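The sample peak and true peak fields above can be contrasted with the sketch below. ITU-R BS.1770-4 specifies its own interpolation filter for true peak; this sketch substitutes a Hann-windowed sinc evaluated at the four fractional phases 0, ¼, ½ and ¾ (one polyphase branch each) and treats samples outside the signal as zero, so values may differ slightly from a compliant meter. The function names are hypothetical.

```python
import math


def _interp_kernel(t, half_width):
    """Hann-windowed sinc: the interpolation filter evaluated at offset t."""
    if abs(t) >= half_width:
        return 0.0
    sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
    window = 0.5 * (1.0 + math.cos(math.pi * t / half_width))
    return sinc * window


def sample_peak_db(samples):
    """Sample peak: 20 * log10(max |y|) for samples normalised to [-1, 1]."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def true_peak_db(samples, oversample=4, half_width=16):
    """True peak sketch: interpolate `oversample` phases per sample, take the max."""
    n = len(samples)
    peak = 0.0
    for i in range(n):
        for phase in range(oversample):
            t = i + phase / oversample  # position between samples i and i + 1
            lo = max(0, math.ceil(t - half_width))
            hi = min(n - 1, math.floor(t + half_width))
            # Samples outside the signal count as zero, so they are not summed.
            value = sum(samples[m] * _interp_kernel(t - m, half_width)
                        for m in range(lo, hi + 1))
            peak = max(peak, abs(value))
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")
```

For a sine at a quarter of the sample rate whose samples all land at ±0.707 of its amplitude, the sample peak reads about −3 dBFS while the true peak is close to 0 dBTP: exactly the inter-sample overshoot the field is meant to catch.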
WaveformPeaksJson
phase 1
jsonb
- Type
- string? (jsonb)
- Format
- array of 800 floats
- Range
- [0, 1] normalised
Amplitude envelope reduced for visualisation: the audio is split
into 800 windows and the RMS magnitude of each window is stored,
normalised to 0–1. The front-end (Waveform) downsamples it to the
canvas size via downsamplePeaks().
Where it comes from
When the audiowaveform (BBC) binary is available it
is used for fast extraction; otherwise numpy
computes windowed RMS.
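The windowed-RMS fallback amounts to the following, shown with the standard library instead of numpy. Normalising by the loudest window and the name waveform_peaks_json are assumptions.

```python
import json
import math


def waveform_peaks_json(samples, num_windows=800):
    """Windowed RMS envelope, normalised to [0, 1] and serialised as JSON."""
    n = len(samples)
    peaks = []
    for w in range(num_windows):
        # Integer boundaries spread the samples as evenly as possible.
        chunk = samples[w * n // num_windows:(w + 1) * n // num_windows]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk)) if chunk else 0.0
        peaks.append(rms)
    loudest = max(peaks, default=0.0)
    if loudest > 0:
        peaks = [p / loudest for p in peaks]
    return json.dumps(peaks)
```

A signal at half amplitude for its first half and full amplitude for its second half yields 400 values of 0.5 followed by 400 values of 1.0.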
How it is used
- Rendering the waveform in the player (Studio / Web).
- Visual scrubbing during playback.
Fingerprint
phase 1
- Type
- string?
- Algorithm
- Chromaprint (fpcalc)
Chromaprint acoustic fingerprint — a compact string representing
the spectral content of the recording. Robust against
re-encoding and small pitch/tempo variations.
Where it comes from
The external fpcalc binary (from the AcoustID
project), invoked by the Python sidecar.
How it is / could be used
- Duplicate detection — compare fingerprints with tolerance to identify uploads of the same audio under different names.
- AcoustID lookup — feed the
AcoustIdMatch column (pending).
- Detecting re-uploads after a takedown.
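Comparing fingerprints "with tolerance" usually happens on raw fingerprints rather than the stored string, which is Chromaprint's compressed, base64-encoded form; fpcalc -raw prints the raw 32-bit frames. The sketch below computes the lowest bit error rate over small alignment offsets. The function name, the offset window and the half-overlap rule are illustrative assumptions, and any "same audio" cut-off would need tuning on real uploads.

```python
def fingerprint_bit_error_rate(fp_a, fp_b, max_offset=20):
    """Lowest fraction of differing bits between two raw Chromaprint fingerprints.

    fp_a, fp_b -- lists of 32-bit integers, one per fingerprint frame.
    fp_b is slid by up to `max_offset` frames in either direction.
    """
    min_overlap = max(1, min(len(fp_a), len(fp_b)) // 2)
    best = 1.0
    for offset in range(-max_offset, max_offset + 1):
        pairs = [(a, fp_b[i + offset]) for i, a in enumerate(fp_a)
                 if 0 <= i + offset < len(fp_b)]
        if len(pairs) < min_overlap:
            continue  # too little overlap to be meaningful
        # XOR exposes differing bits; mask to 32 bits so signed values also work.
        differing = sum(bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in pairs)
        best = min(best, differing / (32 * len(pairs)))
    return best
```

Identical fingerprints score 0.0; unrelated audio tends towards 0.5, because roughly half the bits differ by chance.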
AcoustIdMatch
phase 1
pending integration
- Type
- string? (GUID)
- Future source
- AcoustID Web Service
The best-matching AcoustID recording GUID for the
Fingerprint.
Currently always null — the
integration with the public AcoustID API is reserved but not
active yet.
When to enable
- Cross-link with MusicBrainz (canonical metadata, ISRC, releases).
- Detecting tracks that already exist on other platforms.
SpeechMusicProfile
phase 1
jsonb
- Type
- string? (jsonb)
- Shape
- { speech, music, noise }
- Range
- each value in [0, 1]
Estimated distribution between speech, music and noise across
the file. A heuristic combining RMS, spectral flatness
and zero-crossing rate.
How it is / could be used
- Detecting podcasts/spoken-word incorrectly published as music.
- Filtering spoken previews/intros.
- Smart skipping ("skip talking parts").
- Type
- DateTime? (UTC)
Timestamp marking the end of acoustic analysis. Used to
invalidate data after algorithm changes (re-analyse when stale).
3 Phase 2 · ML enrichment
Runs in EnrichTrackAudioConsumer (prefetch 1; up to 2 retries,
5 minutes apart). Sample rates: 44100 Hz for the classic
Essentia algorithms, 16000 Hz for the TensorFlow
models (discogs-effnet, an EfficientNet-B0 backbone over
log-mel spectrogram patches of 128 frames × 96 mel bands).
Danceability
phase 2
ML model
- Type
- float?
- Range
- [0, 1]
- Model
- danceability-discogs-effnet
Probability that the track is "danceable" (class 0 of the binary
classifier). Combines pulse, rhythmic regularity and spectral
stability.
How it is used
- An axis of the
AudioDnaRadar (TrackInsightsModal).
- Filters and ranking in dance/club playlists.
Energy
phase 2
- Type
- float?
- Range
- [0, 1]
- Mapping
- −60 dBFS → 0; 0 dBFS → 1 (log)
Perceptual energy derived from the RMS level via a logarithmic
mapping: the level in dBFS is scaled so that −60 dBFS gives 0 and
0 dBFS gives 1. Not an ML model; it is a direct computation on
the waveform.
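A sketch of the mapping, assuming "(log)" means the RMS amplitude is converted to dBFS and then scaled linearly in decibels, with clamping outside the −60 to 0 dBFS range (both assumptions; energy_from_rms is a hypothetical name):

```python
import math


def energy_from_rms(rms):
    """Map an RMS amplitude (full scale = 1.0) to Energy in [0, 1].

    -60 dBFS maps to 0 and 0 dBFS maps to 1, linearly in decibels.
    """
    if rms <= 0:
        return 0.0
    level_db = 20.0 * math.log10(rms)
    return max(0.0, min(1.0, (level_db + 60.0) / 60.0))
```

A track whose RMS sits at −30 dBFS therefore lands at 0.5.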
How it is / could be used
- An axis of the AudioDnaRadar.
- energyMin filter in /tracks/{id}/similar.
- Curating "high-energy" vs "chill" playlists.
- Ordering tracks in a set/album along an emotional arc.
DynamicComplexity
phase 2
- Type
- float?
- Range
- [0, 1] (normalised)
- Source
- Essentia DynamicComplexity ÷ 9
How much the loudness varies over time. High
values indicate dynamic material (classical, jazz);
low values indicate compressed/loud material
(commercial pop, EDM).
How it could be used
- Flagging over-compressed masters.
- Contributing to the track's "DNA" radar.
- Recommending to listeners who prefer dynamic material.
VoiceProbability
phase 2
ML model
- Type
- float?
- Range
- [0, 1]
- Model
- voice_instrumental-discogs-effnet
Probability of the "voice" class (class 1), i.e. how likely the
track is to contain vocals. It is independent of the editorial
IsInstrumental flag, so it can be used to validate the
artist's declaration.
How it is / could be used
- An axis of the DNA radar.
- "Karaoke / Instrumental" filters.
- Cross-checking
Track.IsInstrumental to surface upload inconsistencies.
MoodHappy
phase 2
ML model
- Type
- float?
- Range
- [0, 1]
- Model
- mood_happy-discogs-effnet
Probability that the track conveys a "happy" mood (class 0 of
the binary classifier). A "happy vs not happy" classifier trained
on MTG-Jamendo labels.
How it is used
- Feeds the Valence heuristic.
- Directly in the UI for "Feel Good" playlists.
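Danceability, VoiceProbability and MoodHappy each read one column of a two-class softmax. The sketch assumes the model emits one softmax row per analysed patch and that enrich.py averages those rows over the track; the POSITIVE_CLASS table lists only the class indices stated on this page, and track_probability is a hypothetical name.

```python
# Index of the positive class in each binary model's softmax output, as stated
# on this page. Other mood models are omitted because their class order is not
# documented here.
POSITIVE_CLASS = {
    "Danceability": 0,      # danceability-discogs-effnet
    "VoiceProbability": 1,  # voice_instrumental-discogs-effnet
    "MoodHappy": 0,         # mood_happy-discogs-effnet
}


def track_probability(field, patch_predictions):
    """Average per-patch softmax rows and return the positive-class probability."""
    if not patch_predictions:
        raise ValueError("no predictions to aggregate")
    column = POSITIVE_CLASS[field]
    return sum(row[column] for row in patch_predictions) / len(patch_predictions)
```

The same two-column output gives 0.7 for Danceability and 0.3 for VoiceProbability when the rows are [0.8, 0.2] and [0.6, 0.4], which is why the class index matters.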
- Type
- float?
- Range
- [0, 1]
- Model
- mood_sad-discogs-effnet
Probability of a "sad" mood. A component of the Valence calculation.
MoodAggressive
phase 2
ML model
- Type
- float?
- Range
- [0, 1]
- Model
- mood_aggressive-discogs-effnet
Probability of an "aggressive" mood (typical of metal, hard rock,
heavy rap). A component of Arousal.
MoodRelaxed
phase 2
ML model
- Type
- float?
- Range
- [0, 1]
- Model
- mood_relaxed-discogs-effnet
Probability of a "relaxed" mood. A component of Arousal (inverted).
MoodParty
phase 2
ML model
- Type
- float?
- Range
- [0, 1]
- Model
- mood_party-discogs-effnet
Probability of a "party" feel (uplift, energy, danceable BPM).
A component of Arousal.
MoodThemeTagsJson
phase 2
jsonb
multi-label
- Type
- string? (jsonb)
- Labels
- 56 MTG-Jamendo tags
- Model
- mtg_jamendo_moodtheme-discogs-effnet (Sigmoid)
Multi-label probabilities for 56 mood/theme tags from the
MTG-Jamendo dataset. Each tag has an independent score in [0,1].
Example: { "energetic": 0.82, "epic": 0.61, "dark": 0.04,
... }.
Available tags
action · adventure · advertising · background ·
ballad · calm · children · christmas · commercial · cool ·
corporate · dark · deep · documentary · drama · dramatic ·
dream · emotional · energetic · epic · fast · film · fun ·
funny · game · groovy · happy · heavy · holiday · hopeful ·
inspiring · love · meditative · melancholic · melodic ·
motivational · movie · nature · party · positive · powerful ·
relaxing · retro · romantic · sad · sexy · slow · soft ·
soundscape · space · sport · summer · trailer · travel ·
upbeat · uplifting
How it could be used
- Sync licensing: matching tracks to briefs ("epic / dramatic / trailer").
- Auto-tagging editorial moods.
- Contextual recommendation: "workout music" →
energetic + upbeat + powerful.
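Because each tag is an independent sigmoid score, the scores do not sum to 1, so querying them means thresholding or averaging rather than taking an argmax. A sketch of both, where the 0.5 threshold, the equal weighting and the function names are assumptions:

```python
import json


def top_tags(mood_theme_tags_json, threshold=0.5, limit=5):
    """Tags whose independent sigmoid score clears `threshold`, best first."""
    scores = json.loads(mood_theme_tags_json) if mood_theme_tags_json else {}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [tag for tag, score in ranked if score >= threshold][:limit]


def context_score(mood_theme_tags_json, tags):
    """Average score of the requested tags; missing tags count as 0."""
    scores = json.loads(mood_theme_tags_json) if mood_theme_tags_json else {}
    return sum(scores.get(tag, 0.0) for tag in tags) / len(tags)


# The "workout music" context from the list above.
WORKOUT = ("energetic", "upbeat", "powerful")
```

With the example JSON from this page, top_tags returns ["energetic", "epic"], and the workout score is 0.82 / 3 because only one of the three tags is present.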
EnergyBandsJson
phase 2
jsonb
- Type
- string? (jsonb)
- Bands
- 8 ERB bands → sub · bass · mid · high
- Algorithm
- Essentia ERBBands + EqualLoudness
Mean spectral energy as 8 ERB-band values computed at 44100 Hz.
For display these are grouped into four bands:
- sub: ~20–80 Hz (kick, sub-bass)
- bass: ~80–250 Hz (low end, electric bass)
- mid: ~250–4000 Hz (vocals, instruments)
- high: ~4–20 kHz (cymbals, air, brightness)
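The grouping into four display bands could look like this, assuming each ERB band is assigned to the group containing its centre frequency and that energies within a group are averaged (this page specifies neither; group_energy_bands is a hypothetical name):

```python
# Display groups from this page, as (name, low Hz, high Hz).
GROUPS = [("sub", 20, 80), ("bass", 80, 250), ("mid", 250, 4000), ("high", 4000, 20000)]


def group_energy_bands(bands):
    """Average ERB-band energies into sub/bass/mid/high.

    bands -- list of (centre_hz, energy) pairs, one per ERB band.
    A group with no band inside its range reports 0.0.
    """
    grouped = {}
    for name, low, high in GROUPS:
        values = [energy for centre, energy in bands if low <= centre < high]
        grouped[name] = sum(values) / len(values) if values else 0.0
    return grouped
```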
How it is used
- EnergyBands chart in the TrackInsightsModal.
- Visualising the master's "sonic profile".
How it could be used
- Detecting masters lacking bass or with excessive brightness.
- Recommending by the "colour" of the mix (warm vs bright).
4 Derived heuristics
These are not trained models; they are deterministic formulas
computed by the enricher from other mood scores. Dedicated columns
exist for fast access and indexing.
Arousal
phase 2
heuristic
- Type
- float?
- Range
- [0, 1]
- Formula
- (Aggressive + Party + (1−Relaxed)) ÷ 3
The arousal axis of the valence/arousal plane (Russell's
circumplex model). How intense and active the track is vs calm
and contemplative.
Not a regressor
It is an equal-weight average of binary classifier outputs, not a
model trained to regress continuous arousal. Good enough for
visualisation, but use with caution in recommendation
pipelines that depend on precise calibration.
How it is used
- The Y axis of the
MoodMap (Studio).
- An axis of the
AudioDnaRadar.
Valence
phase 2
heuristic
- Type
- float?
- Range
- [0, 1]
- Formula
- (Happy + (1−Sad)) ÷ 2
The valence axis: how emotionally positive or negative the track
sounds. Values near 1 = happy/uplifting; near 0 = sad/melancholic.
How it is used
- The X axis of the
MoodMap.
- valenceMin / valenceMax filters in /tracks/{id}/similar.
- An axis of the DNA radar.
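Both heuristics are plain arithmetic over the mood probabilities, so they stay in [0, 1] whenever their inputs do. A direct transcription of the two formulas (function names are illustrative):

```python
def arousal(aggressive, party, relaxed):
    """(Aggressive + Party + (1 - Relaxed)) / 3, each input in [0, 1]."""
    return (aggressive + party + (1.0 - relaxed)) / 3.0


def valence(happy, sad):
    """(Happy + (1 - Sad)) / 2, each input in [0, 1]."""
    return (happy + (1.0 - sad)) / 2.0
```

A calm, cheerful track (aggressive 0.2, party 0.5, relaxed 0.8, happy 0.7, sad 0.1) lands at arousal 0.3 and valence 0.8 on the MoodMap.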
5 pgvector embedding
EmbeddingDiscogs
phase 2
pgvector
raw column
- SQL type
- vector(1280)
- Index
- HNSW · vector_cosine_ops
- Origin
- discogs-effnet (mean pooling)
A 1280-dimension embedding extracted from the penultimate layer
of the discogs-effnet-bs64-1.pb model, with mean
pooling over time. Captures musical semantics (style,
instrumentation, sonic texture).
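Mean pooling and the similarity used by GetSimilar can be mirrored in a few lines, for example to sanity-check query results offline. Tiny vectors stand in for the real 1280-dimension ones; the function names are hypothetical, and this sketch returns 0.0 when either vector is all zeros.

```python
import math


def mean_pool(frames):
    """Average per-patch embeddings over time into one track-level vector."""
    dims = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dims)]


def cosine_similarity(a, b):
    """1 - cosine distance, the quantity `1 - (a <=> b)` computes in pgvector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Cosine similarity ignores vector length, so two tracks whose pooled embeddings point the same way score 1.0 even if one has larger activations overall.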
Not in the C# model
The column exists in PostgreSQL via migration
(20260513230003_AddAudioEnrichmentSchema) with an
HNSW index, but is not mapped by EF Core in
TrackAudioFeatures.cs. It is accessed via raw SQL
in the GetSimilarEndpoint.
How it is used
```sql
-- /tracks/{id}/similar: approximate cosine search (HNSW index)
-- '[…]' stands for the query track's embedding.
-- <=> is pgvector's cosine distance, so 1 - distance = cosine similarity.
SELECT t."Id",
       1 - (taf."EmbeddingDiscogs" <=> '[…]'::vector(1280)) AS similarity
FROM tracks t
JOIN track_audio_features taf ON taf."TrackId" = t."Id"
WHERE taf."EmbeddingDiscogs" IS NOT NULL
ORDER BY taf."EmbeddingDiscogs" <=> '[…]'::vector(1280)
LIMIT 50;
```
How it could be used
- "More like this" anywhere in the app.
- Personal radio blending embeddings of liked tracks.
- Catalog clustering for discovering emerging genres.
- Outlier detection within an album.
6 Enrichment lifecycle
EnrichmentStatus
enum string
- Type
- EnrichmentStatus
- Values
- Pending · Completed · Failed
- Storage
- UPPERCASE string
Drives the retries of EnrichTrackAudioConsumer. Set to
Pending when Phase 1 finishes.
- Type
- DateTime? (UTC)
Timestamp of the last successful enrichment.
- Type
- int
- Default
- 0
Attempt counter, incremented by the worker on each run. Once it
reaches the configured maximum (3 attempts, i.e. the first run plus
the 2 retries), the worker records EnrichmentStatus = Failed and
releases the track to Ready anyway (fail-open).
- Type
- string? (truncated to 1024)
Last error message. Useful for diagnostics in Studio (admin UI)
and for alerting/observability.
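The retry rules above can be modelled as a small state transition. This is a sketch, not the worker's code: apart from EnrichmentStatus and its values, the attribute names, the record_attempt function and the convention of returning True when the track may move to Ready are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

MAX_ATTEMPTS = 3         # configured maximum quoted on this page
MAX_ERROR_LENGTH = 1024  # error messages are truncated to this length


@dataclass
class EnrichmentState:
    status: str = "PENDING"                 # EnrichmentStatus, stored UPPERCASE
    attempts: int = 0                       # attempt counter (hypothetical name)
    last_error: Optional[str] = None        # last error message (hypothetical name)
    enriched_at: Optional[datetime] = None  # last successful enrichment (UTC)


def record_attempt(state, error=None):
    """Apply one worker run; return True once the track may move to Ready."""
    state.attempts += 1
    if error is None:
        state.status = "COMPLETED"
        state.enriched_at = datetime.now(timezone.utc)
        return True
    state.last_error = error[:MAX_ERROR_LENGTH]
    if state.attempts >= MAX_ATTEMPTS:
        state.status = "FAILED"  # fail-open: stop retrying but release the track
        return True
    return False                 # stay PENDING until the next retry
```

Two failures leave the record PENDING; the third flips it to FAILED while still releasing the track, so a broken enricher never blocks publication.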
7 Used today
API · GetSimilar
GET /tracks/{id}/similar
Uses: EmbeddingDiscogs (cosine, HNSW),
Bpm, Energy, Valence.
Studio · TrackInsightsModal
Waveform, AudioDnaRadar (6 axes), LoudnessMeter, MoodMap
(Valence × Arousal), EnergyBands, ProcessingStepper.
Studio · ManageAlbumPageClient
Read-only analysis section: BPM, Key+Mode, LUFS, True Peak,
Speech/Music Profile.
Mood pipeline
Uses Valence, Arousal and MoodThemeTagsJson to
suggest editorial Moods (auto-tagging) in Studio.
8 Possible uses (not implemented)
Discovery
Personalised radio based on the centroid of the embeddings of the
user's liked tracks, diversified by sampling within a cone (an
angular neighbourhood) around that centroid in embedding space.
DJ / mixing
Harmonic mixing (key + mode) + beatmatching (BPM ±5%) for
automatic transitions in "Continuous Mix" playlists.
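A sketch of the compatibility check behind such transitions: map Key + Mode to a Camelot code, accept the same code, a ±1 neighbour with the same letter, or the relative major/minor, and require BPM within ±5%. The arithmetic shortcut for the Camelot number, the dictionary keys and measuring the 5% against the first track are assumptions.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def camelot(key, mode):
    """Camelot code, e.g. ("C", "major") -> "8B", ("A", "minor") -> "8A"."""
    pc = PITCH_CLASSES.index(key)
    # Each step round the wheel is a fifth (7 semitones).
    if mode == "major":
        number, letter = (7 * pc + 8) % 12, "B"
    else:
        number, letter = (7 * pc + 5) % 12, "A"
    return f"{number or 12}{letter}"


def harmonically_compatible(a, b):
    """Same code, a +/-1 neighbour with the same letter, or relative major/minor."""
    na, la = int(a[:-1]), a[-1]
    nb, lb = int(b[:-1]), b[-1]
    if la == lb:
        return (na - nb) % 12 in (0, 1, 11)
    return na == nb


def can_mix(track_a, track_b, bpm_tolerance=0.05):
    """Key-compatible and within +/-5% BPM of track_a."""
    if abs(track_a["bpm"] - track_b["bpm"]) > bpm_tolerance * track_a["bpm"]:
        return False
    return harmonically_compatible(camelot(track_a["key"], track_a["mode"]),
                                   camelot(track_b["key"], track_b["mode"]))
```

C major at 120 BPM mixes into G major at 124 BPM (8B to 9B, within 6 BPM) but not into G major at 130 BPM.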
Master QA
Warn artists at upload: "Your master is at −18 LUFS — 4 dB below
the streaming target". Combines LoudnessLufs and TruePeakDb.
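Such a check needs only the two stored columns. The −14 LUFS target and the −1 dBTP ceiling come from this page; the 1 dB tolerance, the message wording and the function name are illustrative.

```python
STREAMING_TARGET_LUFS = -14.0  # streaming target quoted on this page
TRUE_PEAK_CEILING_DBTP = -1.0  # rule of thumb quoted on this page


def master_warnings(loudness_lufs, true_peak_db, tolerance_db=1.0):
    """Upload-time warnings built from LoudnessLufs and TruePeakDb."""
    warnings = []
    gap = STREAMING_TARGET_LUFS - loudness_lufs
    if gap > tolerance_db:
        warnings.append(f"Your master is at {loudness_lufs:.0f} LUFS, "
                        f"{gap:.0f} dB below the streaming target.")
    if true_peak_db > TRUE_PEAK_CEILING_DBTP:
        warnings.append(f"True peak is {true_peak_db:.1f} dBTP; keep it at or below "
                        f"{TRUE_PEAK_CEILING_DBTP:.0f} dBTP.")
    return warnings
```

A master at −18 LUFS peaking at −0.2 dBTP gets both warnings; one at −14.5 LUFS peaking at −1.5 dBTP gets none.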
Sync licensing
Search by brief: "I need something epic/cinematic in D minor,
≤120 BPM, high arousal" → a mixed query
(MoodThemeTagsJson + Key + BPM + Arousal).
Duplicate detection
Hashing the Fingerprint + embedding distance to
detect re-uploads, close covers and uncredited samples.
Anti-fraud
Combine VoiceProbability,
SpeechMusicProfile and
DynamicComplexity to detect suspicious uploads
(disguised podcasts, white noise, short loops).