TrackAudioFeatures

The result of analysing a track's audio. Populated in two asynchronous phases: classic acoustic analysis (librosa) and ML enrichment (Essentia + TensorFlow). This page describes every field with its origin, unit, range and usage.

apps/api/src/Features/Tracks/Models/TrackAudioFeatures.cs

1 Overview

A 1:1 relationship with Track: TrackAudioFeatures.TrackId is unique and has OnDelete = Cascade. The record can exist partially populated: at the end of Phase 1 every acoustic column has been written (except AcoustIdMatch, which is pending integration and stays null); at the end of Phase 2 every enrichment column has been written.

Category | Count | Source
Acoustic (librosa) | 13 fields | apps/audio-analyzer/analyze.py · Python · librosa, pyloudnorm, Chromaprint
ML enrichment (Essentia + TF) | 14 direct fields + 2 heuristics | apps/audio-enricher/enrich.py · Python · Essentia + discogs-effnet
pgvector embedding | 1 column (1280-dim) | SQL column EmbeddingDiscogs vector(1280)
Lifecycle | 4 fields | Worker (retry state)
Tag legend

phase 1 = computed by audio-analyzer. phase 2 = computed by audio-enricher. derived = heuristic formula over other fields. jsonb = stored as JSON. pgvector = vector column. pending integration = column exists but is not populated yet.

2 Phase 1 · Acoustic analysis

Runs inside ProcessTrackConsumer, immediately after HLS encoding, by invoking the Python sidecar apps/audio-analyzer/analyze.py as a subprocess. Target sample rate: 22050 Hz. Files longer than 15 minutes are truncated to that duration before analysis.

Bpm
phase 1 nullable
Type
int?
Unit
BPM (integer)
Range
~40 – 240
Algorithm
librosa beat_track (DP)

Tempo in beats per minute, rounded to the nearest integer. The detector is a dynamic programming beat tracker computed over the onset-strength envelope.

Where it comes from

librosa.beat.beat_track(onset_envelope=...). Estimates in the mid range (90–180 BPM) tend to be solid; half-time genres can double/halve (e.g. a 75 BPM track reported as 150).

How it is used
  • Filter in /tracks/{id}/similar (bpmMin, bpmMax).
  • Shown in ManageAlbumPageClient (Studio) in the analysis section.
  • Tempo heuristic (slow / mid / fast) in the TrackInsightsModal.
How it could be used
  • Building BPM-compatible DJ playlists (beatmatching).
  • Workout / running playlists by cadence band.
  • "Similar tracks within ±5 BPM" recommendations.
BpmConfidence
phase 1
Type
float?
Range
[0, 1]
Computation
1 − CV(intervals)

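Confidence of the BPM estimate, computed as 1 minus the coefficient of variation (standard deviation ÷ mean) of the intervals between detected beats. Perfectly regular beats give 1; erratic beats push the value towards 0. Below ~0.5 the estimate is considered unreliable and the UI shows a warning.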

How it is used
  • The UI shows BPM with a confidence %.
  • Recommendation pipelines may ignore low-confidence BPM.
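The 1 − CV(intervals) computation for BpmConfidence can be illustrated with a minimal sketch. The function name, the clamping to [0, 1] and the handling of very short beat lists are assumptions made for illustration; the analyzer's actual code is not shown on this page.

```python
from statistics import mean, pstdev
from typing import Optional


def bpm_confidence(beat_times: list) -> Optional[float]:
    """Return 1 - CV of the inter-beat intervals, clamped to [0, 1].

    beat_times: beat positions in seconds, in ascending order.
    Returns None when there are too few beats to measure any spread
    (an assumption; the real sidecar may behave differently).
    """
    if len(beat_times) < 3:
        return None
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    avg = mean(intervals)
    if avg <= 0:
        return None
    cv = pstdev(intervals) / avg  # coefficient of variation
    return max(0.0, min(1.0, 1.0 - cv))


steady = [i * 0.5 for i in range(9)]         # regular grid at 120 BPM
shaky = [0.0, 0.3, 1.0, 1.2, 2.1, 2.3, 3.4]  # erratic beat positions
```

With these inputs the steady grid scores 1.0 and the erratic one falls below the ~0.5 warning threshold.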
MusicalKey
phase 1
Type
string?
Values
C, C#, D, D#, E, F, F#, G, G#, A, A#, B
Algorithm
Krumhansl-Schmuckler

Detected tonic, expressed as a pitch class. A chromagram is computed with librosa.feature.chroma_cqt() and its mean is correlated with the Krumhansl-Kessler profiles rotated to each of the 12 pitch classes.

How it is used
  • Shown in the track detail UI.
  • Together with Mode, displays the tonality ("A minor").
How it could be used
  • Harmonic mixing (Camelot / Open Key wheel) — find compatible tracks for transitions.
  • Suggestions for covers/remixes in the same key.
Mode
phase 1 enum string
Type
MusicalMode?
Values
"major" · "minor"
Storage
lowercase string

Tonality mode (major or minor). Detected as the higher correlation between the mean chromagram and the major/minor Krumhansl-Kessler profiles.

KeyConfidence
phase 1
Type
float?
Range
[0, 1]
Computation
top-1 vs top-2 margin

Distance between the best tonal profile and the second best. Atonal or very percussive tracks tend to have low confidence.
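MusicalKey, Mode and KeyConfidence all come out of the same profile-matching step. The sketch below shows that step on a 12-value mean chroma vector, using the published Krumhansl-Kessler profile values. It is an illustration under assumptions: the function names are hypothetical, and the raw correlation margin is returned as is, since the way the analyzer scales the margin into [0, 1] is not stated here.

```python
from statistics import mean

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Krumhansl-Kessler probe-tone profiles, tonic at index 0.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def detect_key(chroma):
    """chroma: list of 12 mean chroma energies, index 0 = C.

    Returns (key, mode, margin), where margin is the best correlation
    minus the second best across all 24 candidates.
    """
    scores = []
    for tonic in range(12):
        # Rotate so the candidate tonic sits at index 0, like the profiles.
        rotated = chroma[tonic:] + chroma[:tonic]
        scores.append((pearson(rotated, MAJOR), NOTES[tonic], "major"))
        scores.append((pearson(rotated, MINOR), NOTES[tonic], "minor"))
    scores.sort(reverse=True)
    (best, key, mode), (second, _, _) = scores[0], scores[1]
    return key, mode, best - second


# A chroma vector shaped exactly like the minor profile with tonic A (index 9).
a_minor_chroma = [MINOR[(i - 9) % 12] for i in range(12)]
```

Calling detect_key(a_minor_chroma) yields key "A" and mode "minor" with a positive margin; a flat chroma vector yields a margin of 0, which matches the low confidence described for atonal or percussive material.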

LoudnessLufs
phase 1
Type
float?
Unit
LUFS (dB)
Range
typically −30 to −5
Standard
ITU-R BS.1770-4

Integrated loudness in LUFS (Loudness Units relative to Full Scale). Computed by pyloudnorm following the international broadcast standard. Typical targets:

  • Spotify / YouTube / Tidal: −14 LUFS
  • Apple Music: −16 LUFS
  • Broadcast (EBU R128): −23 LUFS
How it is used
  • LoudnessMeter in the TrackInsightsModal — shows LUFS vs target.
  • Classified as "high / ok / low" via loudnessInfo() (insights.ts).
  • Shown as text in the Studio analysis section.
How it could be used
  • Automatic loudness normalisation in the player (applied gain).
  • Warning artists that their master is "below the streaming target".
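The "automatic loudness normalisation" idea reduces to simple dB arithmetic. The sketch below is not an existing player feature: the function names and the −1 dBTP ceiling default are assumptions, the latter borrowed from the true-peak rule of thumb on this page.

```python
def normalisation_gain_db(loudness_lufs, true_peak_db,
                          target_lufs=-14.0, ceiling_dbtp=-1.0):
    """Gain in dB that moves a track towards target_lufs without
    pushing its true peak above ceiling_dbtp."""
    wanted = target_lufs - loudness_lufs    # positive means "turn up"
    headroom = ceiling_dbtp - true_peak_db  # largest boost before the ceiling
    return min(wanted, headroom)


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)
```

A master at −18 LUFS with a true peak of −6 dBTP gets the full +4 dB. The same loudness with a true peak of −2 dBTP is limited to +1 dB. A loud master at −9 LUFS is attenuated by 5 dB.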
PeakDb
phase 1
Type
float?
Unit
dBFS
Computation
20 · log10(max |y|)

Sample peak in dB Full Scale. Indicates whether the master is digitally clipping (a value > 0 dBFS is impossible with integer samples; it can occur in float).
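A minimal sketch of the 20 · log10(max |y|) computation on float samples where full scale is 1.0. Returning None for pure silence is an assumption (the logarithm is undefined at zero); the analyzer may handle that case differently.

```python
import math


def peak_db(samples):
    """Sample peak in dBFS for float samples where full scale is 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return None  # silence: log10(0) is undefined
    return 20 * math.log10(peak)
```

A peak of exactly 1.0 maps to 0 dBFS, a peak of 0.5 to about −6.02 dBFS, and float samples above 1.0 produce a positive value.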

TruePeakDb
phase 1
Type
float?
Unit
dBTP
Computation
4× oversampling + maximum

True peak, reconstructed via polyphase oversampling (4×). Lets you detect inter-sample peaks — peaks that appear between samples after D/A reconstruction and can cause clipping in lossy codecs.

Rule of thumb

Keep TruePeakDb ≤ −1 dBTP to avoid distortion when playing across different platforms.

WaveformPeaksJson
phase 1 jsonb
Type
string? (jsonb)
Format
array of 800 floats
Range
[0, 1] normalised

Amplitude envelope reduced for visualisation: 800 values, each the RMS magnitude of one window, normalised to 0–1. The front-end (Waveform) downsamples to the canvas size via downsamplePeaks().

Where it comes from

When the audiowaveform (BBC) binary is available it is used for fast extraction; otherwise numpy computes windowed RMS.

How it is used
  • Rendering the waveform in the player (Studio / Web).
  • Visual scrubbing during playback.
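The fallback path (windowed RMS when audiowaveform is unavailable) might look roughly like the sketch below. It uses plain Python lists instead of numpy so that it is self-contained. The function name, the equal-width windowing and the behaviour for inputs shorter than 800 samples are assumptions.

```python
import json
import math


def waveform_peaks(samples, bins=800):
    """Reduce samples to at most `bins` RMS values normalised to [0, 1]."""
    n = len(samples)
    if n == 0:
        return []
    bins = min(bins, n)  # very short input: one value per sample
    rms = []
    for i in range(bins):
        # Integer arithmetic gives contiguous, non-empty windows.
        window = samples[i * n // bins:(i + 1) * n // bins]
        rms.append(math.sqrt(sum(s * s for s in window) / len(window)))
    top = max(rms)
    return [r / top if top > 0 else 0.0 for r in rms]


# A synthetic signal that fades in, serialised the way a jsonb column expects.
signal = [math.sin(i / 50) * (i / 8000) for i in range(8000)]
payload = json.dumps(waveform_peaks(signal))
```

The loudest window always maps to exactly 1.0, so quiet masters still fill the waveform display; the absolute level is carried by LoudnessLufs and PeakDb instead.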
Fingerprint
phase 1
Type
string?
Algorithm
Chromaprint (fpcalc)

Chromaprint acoustic fingerprint — a compact string representing the spectral content of the recording. Robust against re-encoding and small pitch/tempo variations.

Where it comes from

The external fpcalc binary (from the AcoustID project), invoked by the Python sidecar.

How it is / could be used
  • Duplicate detection — compare fingerprints with tolerance to identify uploads of the same audio under different names.
  • AcoustID lookup — feed the AcoustIdMatch column (pending).
  • Detecting re-uploads after a takedown.
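Comparing fingerprints "with tolerance" is commonly done by counting differing bits. The sketch below assumes the raw Chromaprint form (a list of 32-bit integers, as printed by fpcalc -raw) rather than the compact string stored in this column. The 0.15 threshold is a hypothetical value, and no time-offset alignment is attempted.

```python
def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits over the overlapping prefix of two raw
    Chromaprint fingerprints (lists of 32-bit integers)."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 1.0  # nothing to compare: treat as completely different
    # Mask to 32 bits so signed and unsigned representations agree.
    diff = sum(bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in zip(fp_a, fp_b))
    return diff / (32 * n)


def looks_like_duplicate(fp_a, fp_b, max_ber=0.15):
    """True when the bit error rate is at or below the (hypothetical) threshold."""
    return bit_error_rate(fp_a, fp_b) <= max_ber
```

A production matcher would also slide one fingerprint against the other to tolerate leading silence or trimmed intros; this sketch only compares position by position.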
AcoustIdMatch
phase 1 pending integration
Type
string? (GUID)
Future source
AcoustID Web Service

Best AcoustID recording GUID found from the Fingerprint. Currently always null — the integration with the public AcoustID API is reserved but not active yet.

When to enable
  • Cross-link with MusicBrainz (canonical metadata, ISRC, releases).
  • Detecting tracks that already exist on other platforms.
SpeechMusicProfile
phase 1 jsonb
Type
string? (jsonb)
Shape
{ speech, music, noise }
Range
each value in [0, 1]

Estimated distribution between speech, music and noise across the file. A heuristic combining RMS, spectral flatness and zero-crossing rate.

How it is / could be used
  • Detecting podcasts/spoken-word incorrectly published as music.
  • Filtering spoken previews/intros.
  • Smart skipping ("skip talking parts").
AnalyzedAtUtc
phase 1
Type
DateTime? (UTC)

Timestamp marking the end of acoustic analysis. Used to invalidate data after algorithm changes (re-analyse when stale).

3 Phase 2 · ML enrichment

Runs in EnrichTrackAudioConsumer (prefetch 1, retry 2× every 5 min). Sample rates: 44100 Hz for the classic Essentia algorithms, 16000 Hz for the TensorFlow models (discogs-effnet, an EfficientNet-B0 backbone over 96-band log-mel spectrograms in 128-frame patches).

Danceability
phase 2 ML model
Type
float?
Range
[0, 1]
Model
danceability-discogs-effnet

Probability that the track is "danceable" (class 0 of the binary classifier). Combines pulse, rhythmic regularity and spectral stability.

How it is used
  • An axis of the AudioDnaRadar (TrackInsightsModal).
  • Filters and ranking in dance/club playlists.
Energy
phase 2 derived
Type
float?
Range
[0, 1]
Mapping
−60 dBFS → 0; 0 dBFS → 1 (log)

Perceptual energy derived from RMS via a logarithmic mapping. Not an ML model — it is a direct computation on the waveform.

How it is / could be used
  • An axis of the AudioDnaRadar.
  • energyMin filters in /tracks/{id}/similar.
  • Curating "high-energy" vs "chill" playlists.
  • Ordering tracks in a set/album along an emotional arc.
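The Energy mapping (−60 dBFS → 0; 0 dBFS → 1) can be sketched as below. Only the two endpoints are given above; interpolating linearly in dB between them and clamping outside the range are assumptions, as is the function name.

```python
import math


def energy_from_rms(rms, floor_db=-60.0):
    """Map a linear RMS level (full scale = 1.0) to [0, 1].

    floor_db maps to 0 and 0 dBFS maps to 1, linearly in dB.
    """
    if rms <= 0:
        return 0.0
    level_db = 20 * math.log10(rms)
    return max(0.0, min(1.0, (level_db - floor_db) / -floor_db))
```

Under these assumptions an RMS of 1.0 gives 1.0, an RMS at −30 dBFS gives about 0.5, and anything at or below −60 dBFS gives 0.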
DynamicComplexity
phase 2
Type
float?
Range
[0, 1] (normalised)
Source
Essentia DynamicComplexity ÷ 9

How much the loudness varies over time. High values indicate dynamic material (classical, jazz); low values indicate compressed/loud material (commercial pop, EDM).

How it could be used
  • Flagging over-compressed masters.
  • Contributing to the track's "DNA" radar.
  • Recommending to listeners who prefer dynamic material.
VoiceProbability
phase 2 ML model
Type
float?
Range
[0, 1]
Model
voice_instrumental-discogs-effnet

Probability of the "voice" class (class 1) — how much vocal content is in the track. Independent of the editorial IsInstrumental flag: it can be used to validate the artist's declaration.

How it is / could be used
  • An axis of the DNA radar.
  • "Karaoke / Instrumental" filters.
  • Cross-checking Track.IsInstrumental to surface upload inconsistencies.
MoodHappy
phase 2 ML model
Type
float?
Range
[0, 1]
Model
mood_happy-discogs-effnet

Probability that the track conveys a "happy" mood (class 0 of the binary classifier). A "happy vs not happy" classifier trained on MTG-Jamendo labels.

How it is used
  • Feeds the Valence heuristic.
  • Directly in the UI for "Feel Good" playlists.
MoodSad
phase 2 ML model
Type
float?
Range
[0, 1]
Model
mood_sad-discogs-effnet

Probability of a "sad" mood. A component of the Valence calculation.

MoodAggressive
phase 2 ML model
Type
float?
Range
[0, 1]
Model
mood_aggressive-discogs-effnet

Probability of an "aggressive" mood (typical of metal, hard rock, heavy rap). A component of Arousal.

MoodRelaxed
phase 2 ML model
Type
float?
Range
[0, 1]
Model
mood_relaxed-discogs-effnet

Probability of a "relaxed" mood. A component of Arousal (inverted).

MoodParty
phase 2 ML model
Type
float?
Range
[0, 1]
Model
mood_party-discogs-effnet

Probability of a "party feel" — uplift, energy, danceable BPM.

MoodThemeTagsJson
phase 2 jsonb multi-label
Type
string? (jsonb)
Labels
56 MTG-Jamendo tags
Model
mtg_jamendo_moodtheme-discogs-effnet (Sigmoid)

Multi-label probabilities for 56 mood/theme tags from the MTG-Jamendo dataset. Each tag has an independent score in [0,1]. Example: { "energetic": 0.82, "epic": 0.61, "dark": 0.04, ... }.

Available tags

action · adventure · advertising · background · ballad · calm · children · christmas · commercial · cool · corporate · dark · deep · documentary · drama · dramatic · dream · emotional · energetic · epic · fast · film · fun · funny · game · groovy · happy · heavy · holiday · hopeful · inspiring · love · meditative · melancholic · melodic · motivational · movie · nature · party · positive · powerful · relaxing · retro · romantic · sad · sexy · slow · soft · soundscape · space · sport · summer · trailer · travel · upbeat · uplifting

How it could be used
  • Sync licensing: matching tracks to briefs ("epic / dramatic / trailer").
  • Auto-tagging editorial moods.
  • Contextual recommendation: "workout music" → energetic + upbeat + powerful.
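Because each tag carries an independent score, consumers of MoodThemeTagsJson usually threshold and rank it. The sketch below shows one way to do so; the helper names, the 0.5 threshold and the limit of 5 are hypothetical choices, not values taken from the pipeline.

```python
import json


def top_tags(mood_theme_json, threshold=0.5, limit=5):
    """Return the highest-scoring tags at or above threshold, best first."""
    scores = json.loads(mood_theme_json)
    picked = [(tag, p) for tag, p in scores.items() if p >= threshold]
    picked.sort(key=lambda item: item[1], reverse=True)
    return [tag for tag, _ in picked[:limit]]


def matches_brief(mood_theme_json, wanted, threshold=0.5):
    """True when every tag in the brief scores at or above threshold."""
    scores = json.loads(mood_theme_json)
    return all(scores.get(tag, 0.0) >= threshold for tag in wanted)


example = '{"energetic": 0.82, "epic": 0.61, "dark": 0.04}'
```

For the example payload, top_tags returns ["energetic", "epic"], and a brief asking for "epic" and "dark" does not match because "dark" scores only 0.04.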
EnergyBandsJson
phase 2 jsonb
Type
string? (jsonb)
Bands
8 ERB bands → sub · bass · mid · high
Algorithm
Essentia ERBBands + EqualLoudness

Mean spectral energy as 8 ERB-band values computed at 44100 Hz. For display these are grouped into four bands:

  • sub: ~20–80 Hz (kick, sub-bass)
  • bass: ~80–250 Hz (low end, electric bass)
  • mid: ~250–4000 Hz (vocals, instruments)
  • high: ~4–20 kHz (cymbals, air, brightness)
How it is used
  • EnergyBands chart in the TrackInsightsModal.
  • Visualising the master's "sonic profile".
How it could be used
  • Detecting masters lacking bass or with excessive brightness.
  • Recommending by the "colour" of the mix (warm vs bright).

4 Derived heuristics

These are not trained models; they are deterministic formulas computed by the enricher from other mood scores. Dedicated columns exist for fast access and indexing.

Arousal
phase 2 heuristic
Type
float?
Range
[0, 1]
Formula
(Aggressive + Party + (1−Relaxed)) ÷ 3

The arousal axis of the valence/arousal plane (Russell's circumplex model). How intense and active the track is vs calm and contemplative.

Not a regressor

It is an equal-weight average of binary classifier outputs, not a model trained to regress continuous arousal. Good enough for visualisation, but use with caution in recommendation pipelines that depend on precise calibration.

How it is used
  • The Y axis of the MoodMap (Studio).
  • An axis of the AudioDnaRadar.
Valence
phase 2 heuristic
Type
float?
Range
[0, 1]
Formula
(Happy + (1−Sad)) ÷ 2

The valence axis — emotional positive vs negative. Values near 1 = happy/uplifting; near 0 = sad/melancholic.

How it is used
  • The X axis of the MoodMap.
  • valenceMin / valenceMax filters in /tracks/{id}/similar.
  • An axis of the DNA radar.
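The two formulas in this section translate directly into code. The function names are illustrative; the arithmetic follows the Formula rows for Arousal and Valence.

```python
def arousal(aggressive, party, relaxed):
    """(Aggressive + Party + (1 - Relaxed)) / 3, all inputs in [0, 1]."""
    return (aggressive + party + (1.0 - relaxed)) / 3.0


def valence(happy, sad):
    """(Happy + (1 - Sad)) / 2, both inputs in [0, 1]."""
    return (happy + (1.0 - sad)) / 2.0
```

Since every input lies in [0, 1], both results do too. A track with all mood scores at 0.5 lands at the centre of the MoodMap (0.5, 0.5), which is a reminder that mid-range values can mean "uncertain classifiers" as much as "neutral mood".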

5 pgvector embedding

EmbeddingDiscogs
phase 2 pgvector raw column
SQL type
vector(1280)
Index
HNSW · vector_cosine_ops
Origin
discogs-effnet (mean pooling)

A 1280-dimension embedding extracted from the penultimate layer of the discogs-effnet-bs64-1.pb model, with mean pooling over time. Captures musical semantics (style, instrumentation, sonic texture).

Not in the C# model

The column exists in PostgreSQL via migration (20260513230003_AddAudioEnrichmentSchema) with an HNSW index, but is not mapped by EF Core in TrackAudioFeatures.cs. It is accessed via raw SQL in the GetSimilarEndpoint.

How it is used
-- /tracks/{id}/similar — approximate cosine search
SELECT t."Id",
       1 - (taf."EmbeddingDiscogs" <=> '[…]'::vector(1280)) AS similarity
FROM tracks t
JOIN track_audio_features taf ON taf."TrackId" = t."Id"
WHERE taf."EmbeddingDiscogs" IS NOT NULL
ORDER BY taf."EmbeddingDiscogs" <=> '[…]'::vector(1280)
LIMIT 50;
How it could be used
  • "More like this" anywhere in the app.
  • Personal radio blending embeddings of liked tracks.
  • Catalog clustering for discovering emerging genres.
  • Outlier detection within an album.
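In pgvector, <=> is the cosine distance operator, so the query's 1 − distance is the cosine similarity. The sketch below reproduces that measure in plain Python and adds the centroid step that the "personal radio" idea would need. The helper names are hypothetical, and tiny vectors stand in for the 1280-dimension embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity, i.e. 1 minus pgvector's <=> cosine distance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Element-wise mean of equal-length vectors, e.g. a user's liked tracks."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def to_pgvector_literal(vector):
    """Format a vector in pgvector's text form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"
```

The literal produced by to_pgvector_literal is the kind of value the elided '[…]' placeholder in the query stands for; in real code it should be passed as a bound parameter rather than concatenated into the SQL string.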

6 Enrichment lifecycle

EnrichmentStatus
enum string
Type
EnrichmentStatus
Values
Pending · Completed · Failed
Storage
UPPERCASE string

Drives the retries of EnrichTrackAudioConsumer. Initial value Pending at the end of Phase 1.

EnrichedAtUtc
Type
DateTime? (UTC)

Timestamp of the last successful enrichment.

EnrichmentAttempts
Type
int
Default
0

Attempt counter. The worker increments it on each run. After the configured maximum (3), it records EnrichmentStatus = Failed and releases the track to Ready (fail-open).

EnrichmentLastError
Type
string? (truncated to 1024)

Last error message. Useful for diagnostics in Studio (admin UI) and for alerting/observability.
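The four lifecycle fields together form a small state machine. The sketch below models it in Python for illustration only: the real worker is the C# EnrichTrackAudioConsumer, and the class, function and return-value names here are hypothetical. The maximum of 3 attempts, the 1024-character truncation and the uppercase status strings follow the field descriptions in this section.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ATTEMPTS = 3
MAX_ERROR_LENGTH = 1024


@dataclass
class EnrichmentState:
    status: str = "PENDING"  # stored as an uppercase string
    attempts: int = 0
    last_error: Optional[str] = None


def record_attempt(state, error=None):
    """Update the state after one enrichment run.

    Returns "done", "retry" or "release" so the caller knows what to do next.
    """
    state.attempts += 1
    if error is None:
        state.status = "COMPLETED"
        return "done"
    state.last_error = error[:MAX_ERROR_LENGTH]
    if state.attempts >= MAX_ATTEMPTS:
        state.status = "FAILED"
        return "release"  # fail-open: the track still moves to Ready
    return "retry"
```

Three consecutive failures walk the state through retry, retry, release and leave EnrichmentStatus at FAILED with the truncated message of the last error.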

7 Used today

API · GetSimilar

GET /tracks/{id}/similar
Uses: EmbeddingDiscogs (cosine, HNSW), Bpm, Energy, Valence.

Studio · TrackInsightsModal

Waveform, AudioDnaRadar (6 axes), LoudnessMeter, MoodMap (Valence × Arousal), EnergyBands, ProcessingStepper.

Studio · ManageAlbumPageClient

Read-only analysis section: BPM, Key+Mode, LUFS, True Peak, Speech/Music Profile.

Mood pipeline

Uses Valence, Arousal and MoodThemeTagsJson to suggest editorial Moods (auto-tagging) in Studio.

8 Possible uses (not implemented)

Discovery

Personalised radio based on the centroid of embeddings of the user's liked tracks. Diversification by cone in embedding space.

DJ / mixing

Harmonic mixing (key + mode) + beatmatching (BPM ±5%) for automatic transitions in "Continuous Mix" playlists.
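As a sketch of how such a transition check could work, the code below maps MusicalKey + Mode to a Camelot wheel position and combines it with a ±5% tempo test. None of this is implemented in the product. The function names are hypothetical, and the compatibility rule used (same position, adjacent number with the same letter, or same number with the other letter) is the common DJ convention rather than anything defined in this document.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def camelot(key, mode):
    """Return the Camelot position as (number, letter); e.g. C major is (8, "B")."""
    pitch_class = NOTES.index(key)
    if mode == "minor":
        pitch_class = (pitch_class + 3) % 12  # switch to the relative major
    # Each step around the wheel is a perfect fifth (7 semitones); C major sits at 8.
    number = (7 * pitch_class + 7) % 12 + 1
    return number, "A" if mode == "minor" else "B"


def harmonically_compatible(a, b):
    """Same position, a neighbour on the same ring, or the relative major/minor."""
    (num_a, letter_a), (num_b, letter_b) = a, b
    if letter_a == letter_b:
        return (num_a - num_b) % 12 in (0, 1, 11)
    return num_a == num_b


def tempo_compatible(bpm_a, bpm_b, tolerance=0.05):
    """True when bpm_b is within ±tolerance (as a fraction) of bpm_a."""
    return abs(bpm_a - bpm_b) <= tolerance * bpm_a
```

For example, C major (8B) mixes with G major (9B) and with A minor (8A) but not with F# major (2B), and 125 BPM is within 5% of 120 BPM while 128 BPM is not.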

Master QA

Warn artists at upload: "Your master is at −18 LUFS — 4 dB below the streaming target". Combines LoudnessLufs and TruePeakDb.

Sync licensing

Search by brief: "I need something epic/cinematic in D minor, ≤120 BPM, high arousal" → a mixed query (MoodThemeTagsJson + Key + BPM + Arousal).

Duplicate detection

Hashing the Fingerprint + embedding distance to detect re-uploads, close covers and uncredited samples.

Anti-fraud

Combine VoiceProbability, SpeechMusicProfile and DynamicComplexity to detect suspicious uploads (disguised podcasts, white noise, short loops).