
TrackAudioFeatures

The result of analysing a track's audio. Populated in two asynchronous phases: classic acoustic analysis (librosa) and ML enrichment (Essentia + TensorFlow). This page describes every field with its origin, unit, range and usage.

apps/api/src/Features/Tracks/Models/TrackAudioFeatures.cs

1 Overview

A 1:1 relationship with Track: TrackAudioFeatures.TrackId is unique and has OnDelete = Cascade. A record can be partially populated: once Phase 1 finishes every acoustic column is present, and once Phase 2 finishes every enrichment column is present.

Category	Count	Source
Acoustic (librosa)	13 fields	`apps/audio-analyzer/analyze.py` · Python · librosa, pyloudnorm, Chromaprint
ML enrichment (Essentia + TF)	14 direct fields + 2 heuristics	`apps/audio-enricher/enrich.py` · Python · Essentia + discogs-effnet
pgvector embedding	1 column (1280-dim)	SQL column `EmbeddingDiscogs vector(1280)`
Lifecycle	4 fields	Worker (retry state)

Tag legend

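phase 1 = computed by audio-analyzer. phase 2 = computed by audio-enricher. ML model = output of a pretrained classifier. derived / heuristic = deterministic formula (over the waveform or over other fields), not a trained model. jsonb = stored as JSON. multi-label = independent score per label. enum string = C# enum persisted as a string. pgvector = vector column. raw column = exists in the database but is not mapped by EF Core. pending integration = column exists but is not populated yet.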

2 Phase 1 · Acoustic analysis

Runs inside ProcessTrackConsumer immediately after HLS encoding: the consumer invokes the Python sidecar audio-analyzer/analyze.py as a subprocess. Target sample rate: 22050 Hz. Audio longer than 15 minutes is truncated, so only the first 15 minutes are analysed.

Bpm

phase 1 nullable

Type: int?
Unit: BPM (integer)
Range: ~40 – 240
Algorithm: librosa beat_track (DP)

Tempo in beats per minute, rounded to the nearest integer. The detector is a dynamic programming beat tracker computed over the onset-strength envelope.

Where it comes from

librosa.beat.beat_track(onset_envelope=...). Estimates in the mid range (90–180 BPM) tend to be reliable; in half-time genres the tracker can report double or half the perceived tempo (e.g. a 75 BPM track reported as 150).

How it is used

Filter in /tracks/{id}/similar (bpmMin, bpmMax).
Shown in ManageAlbumPageClient (Studio) in the analysis section.
Tempo heuristic (slow / mid / fast) in the TrackInsightsModal.

How it could be used

Building BPM-compatible DJ playlists (beatmatching).
Workout / running playlists by cadence band.
"Similar tracks within ±5 BPM" recommendations.
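Because octave errors are common (a 75 BPM track reported as 150), a "within ±5 BPM" match is more forgiving if it also compares against half and double the candidate tempo. A minimal sketch, assuming an absolute BPM tolerance; `bpm_compatible` and its default tolerance are hypothetical and not part of the API.

```python
from typing import Optional


def bpm_compatible(a: Optional[int], b: Optional[int], tolerance_bpm: float = 5.0) -> bool:
    """True if two tempos are within tolerance, allowing for half/double-time errors.

    Hypothetical helper; the 5 BPM default is illustrative only.
    """
    if a is None or b is None:
        return False  # Bpm is nullable: no estimate means no match
    # Compare a against b, 2*b and b/2 so that 75 vs 150 BPM still matches.
    candidates = (b, b * 2, b / 2)
    return any(abs(a - c) <= tolerance_bpm for c in candidates)


print(bpm_compatible(128, 126))  # True: within 5 BPM
print(bpm_compatible(75, 150))   # True: half-time / double-time
print(bpm_compatible(90, 120))   # False
```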

BpmConfidence

phase 1

Type: float?
Range: [0, 1]
Computation: 1 − CV(intervals)

Confidence of the BPM estimate, computed as 1 minus the coefficient of variation (standard deviation ÷ mean) of the intervals between detected beats. Below ~0.5 the estimate is considered unreliable and the UI shows a warning.

How it is used

The UI shows BPM with a confidence %.
Recommendation pipelines may ignore low-confidence BPM.
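The "1 − CV(intervals)" computation can be sketched as follows, assuming beat times in seconds (for example librosa's beat frames converted with `librosa.frames_to_time`). The use of the population standard deviation and the clamping to [0, 1] are assumptions; the analyzer may differ in these details.

```python
import statistics
from typing import Optional, Sequence


def bpm_confidence(beat_times: Sequence[float]) -> Optional[float]:
    """1 - coefficient of variation of the inter-beat intervals, clamped to [0, 1]."""
    if len(beat_times) < 3:
        return None  # fewer than two intervals: spread is undefined
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    mean = statistics.fmean(intervals)
    if mean <= 0:
        return None
    cv = statistics.pstdev(intervals) / mean  # coefficient of variation
    return max(0.0, min(1.0, 1.0 - cv))


print(bpm_confidence([0.0, 0.5, 1.0, 1.5, 2.0]))  # 1.0: perfectly regular beats
print(round(bpm_confidence([0.0, 0.5, 1.5]), 3))   # 0.667: intervals 0.5 s and 1.0 s
```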

MusicalKey

phase 1

Type: string?
Values: C, C#, D, D#, E, F, F#, G, G#, A, A#, B
Algorithm: Krumhansl-Schmuckler

Detected tonic (the root note of the key). The mean chromagram from librosa.feature.chroma_cqt() is correlated with the Krumhansl-Kessler profiles rotated to each of the 12 pitch classes.

How it is used

Shown in the track detail UI.
Together with Mode, displays the tonality ("A minor").

How it could be used

Harmonic mixing (Camelot / Open Key wheel) — find compatible tracks for transitions.
Suggestions for covers/remixes in the same key.
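For harmonic mixing, MusicalKey and Mode can be mapped onto the Camelot wheel, where compatible keys share a number (relative major/minor) or sit one step apart with the same letter. A minimal sketch; `camelot_code` and `camelot_compatible` are hypothetical helpers that assume the sharp-only key names and lowercase modes documented on this page.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def camelot_code(key: str, mode: str) -> str:
    """Map MusicalKey + Mode to a Camelot code, e.g. ("C", "major") -> "8B"."""
    pc = PITCH_CLASSES.index(key)
    if mode == "minor":
        pc = (pc + 3) % 12  # a minor key shares its number with its relative major
        letter = "A"
    else:
        letter = "B"
    # Each step up a perfect fifth (+7 semitones) advances the wheel by one; C major is 8.
    number = (7 + 7 * pc) % 12 + 1
    return f"{number}{letter}"


def camelot_compatible(a: str, b: str) -> bool:
    """Same number (incl. relative major/minor), or one step around the wheel with the same letter."""
    na, la = int(a[:-1]), a[-1]
    nb, lb = int(b[:-1]), b[-1]
    if na == nb:
        return True
    return la == lb and (na - nb) % 12 in (1, 11)


print(camelot_code("A", "minor"), camelot_code("E", "minor"))  # 8A 9A
print(camelot_compatible("12B", "1B"))  # True: neighbours across the wrap-around
```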

Mode

phase 1 enum string

Type: MusicalMode?
Values: "major" · "minor"
Storage: lowercase string

Tonality mode (major or minor): whichever of the major and minor Krumhansl-Kessler profiles correlates more strongly with the mean chromagram.

KeyConfidence

phase 1

Type: float?
Range: [0, 1]
Computation: top-1 vs top-2 margin

Margin between the correlation of the best-matching key profile and that of the second best. Atonal or heavily percussive tracks tend to have low confidence.
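The Krumhansl-Schmuckler procedure behind MusicalKey, Mode and KeyConfidence can be sketched in pure Python. It assumes a 12-bin mean chroma vector starting at C (for example the row means of `librosa.feature.chroma_cqt()` output) and uses the standard Krumhansl-Kessler profile values; clamping the top-1 vs top-2 correlation margin to [0, 1] is an assumption about how the analyzer normalises KeyConfidence.

```python
import math
from typing import List, Sequence, Tuple

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl-Kessler probe-tone profiles, index 0 = tonic.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    den = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den else 0.0


def estimate_key(chroma_mean: Sequence[float]) -> Tuple[str, str, float]:
    """Return (MusicalKey, Mode, KeyConfidence) for a 12-bin mean chroma vector starting at C."""
    scores: List[Tuple[float, str, str]] = []
    for tonic in range(12):
        for mode, profile in (("major", KK_MAJOR), ("minor", KK_MINOR)):
            # Rotate the profile so that its tonic lines up with pitch class `tonic`.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            scores.append((_pearson(chroma_mean, rotated), PITCH_CLASSES[tonic], mode))
    scores.sort(reverse=True)
    best_r, key, mode = scores[0]
    second_r = scores[1][0]
    confidence = max(0.0, min(1.0, best_r - second_r))  # top-1 vs top-2 margin
    return key, mode, confidence


g_major = [KK_MAJOR[(pc - 7) % 12] for pc in range(12)]
print(estimate_key(g_major)[:2])  # ('G', 'major')
```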

LoudnessLufs

phase 1

Type: float?
Unit: LUFS (dB)
Range: typically −30 to −5
Standard: ITU-R BS.1770-4

Integrated loudness in LUFS (Loudness Units relative to Full Scale), computed by pyloudnorm according to ITU-R BS.1770-4. Typical normalisation targets:

Spotify / YouTube: −14 LUFS
Apple Music (Sound Check): −16 LUFS
Tidal: −14 LUFS
Broadcast (EBU R128): −23 LUFS

How it is used

LoudnessMeter in the TrackInsightsModal — shows LUFS vs target.
Classified as "high / ok / low" via loudnessInfo() (insights.ts).
Shown as text in the Studio analysis section.

How it could be used

Automatic loudness normalisation in the player (applied gain).
Warning artists that their master is "below the streaming target".
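Player-side normalisation boils down to a gain computation: move the track to the target loudness, but never boost it so far that its true peak exceeds a ceiling. A sketch assuming the −14 LUFS target and the −1 dBTP ceiling mentioned on this page; the function names are hypothetical.

```python
from typing import Optional


def normalisation_gain_db(
    loudness_lufs: float,
    true_peak_db: Optional[float],
    target_lufs: float = -14.0,
    peak_ceiling_db: float = -1.0,
) -> float:
    """Gain in dB that brings a track toward target_lufs without pushing its true peak past the ceiling."""
    gain = target_lufs - loudness_lufs
    if true_peak_db is not None and gain > 0:
        # Only boosts can create new overs: cap them at the remaining true-peak headroom.
        gain = max(0.0, min(gain, peak_ceiling_db - true_peak_db))
    return gain


def db_to_linear(gain_db: float) -> float:
    """Convert a dB gain into the linear factor a player would multiply samples by."""
    return 10 ** (gain_db / 20)


print(normalisation_gain_db(-18.0, -6.0))  # 4.0: full boost fits under -1 dBTP
print(normalisation_gain_db(-18.0, -3.0))  # 2.0: boost limited by true-peak headroom
print(normalisation_gain_db(-10.0, -0.5))  # -4.0: loud master is turned down
```

Capping boosts rather than applying a limiter keeps the sketch simple; a quiet master with hot peaks then simply plays below target.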

PeakDb

phase 1

Type: float?
Unit: dBFS
Computation: 20 · log10(max |y|)

Sample peak in dB relative to full scale. A value at 0 dBFS suggests the master is digitally clipping. Values above 0 dBFS cannot occur with integer PCM samples but can occur in floating-point audio.

TruePeakDb

phase 1

Type: float?
Unit: dBTP
Computation: 4× oversampling + maximum

True peak, reconstructed via polyphase oversampling (4×). Lets you detect inter-sample peaks — peaks that appear between samples after D/A reconstruction and can cause clipping in lossy codecs.

Rule of thumb

Keep TruePeakDb ≤ −1 dBTP to avoid distortion when playing across different platforms.
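The difference between PeakDb and TruePeakDb can be illustrated with a sketch. It evaluates a Hann-windowed sinc interpolator at 3 intermediate positions between samples (4× oversampling), which computes the same kind of values a polyphase filter does, only more slowly. It is not the BS.1770-4 reference filter, so figures will differ slightly from the analyzer's.

```python
import math
from typing import Sequence


def to_db(amplitude: float) -> float:
    return 20.0 * math.log10(amplitude) if amplitude > 0 else float("-inf")


def sample_peak_db(y: Sequence[float]) -> float:
    """PeakDb: 20 * log10(max |y|) over the raw samples."""
    return to_db(max(abs(s) for s in y))


def true_peak_db(y: Sequence[float], oversample: int = 4, half_width: int = 16) -> float:
    """Approximate true peak by interpolating between samples with a windowed sinc."""
    n = len(y)
    peak = max(abs(s) for s in y)  # the original samples are candidates too
    for i in range(n):
        for k in range(1, oversample):  # positions i + k/oversample
            t = i + k / oversample
            value = 0.0
            for m in range(max(0, i - half_width + 1), min(n, i + half_width + 1)):
                d = t - m  # never zero because t lies between integers
                window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))  # Hann
                value += y[m] * math.sin(math.pi * d) / (math.pi * d) * window
            peak = max(peak, abs(value))
    return to_db(peak)


# A sine at fs/4 with a 45-degree phase: every sample sits at +/-0.707,
# but the reconstructed waveform reaches 1.0 between samples.
signal = [math.sin(2 * math.pi * n / 4 + math.pi / 4) for n in range(256)]
print(round(sample_peak_db(signal), 2))  # -3.01
print(round(true_peak_db(signal), 2))    # close to 0 dBTP
```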

WaveformPeaksJson

phase 1 jsonb

Type: string? (jsonb)
Format: array of 800 floats
Range: [0, 1] normalised

Amplitude envelope reduced for visualisation: 800 RMS values, one per window, normalised to 0–1. The front-end Waveform component downsamples them to the canvas size via downsamplePeaks().

Where it comes from

When the audiowaveform (BBC) binary is available it is used for fast extraction; otherwise numpy computes windowed RMS.
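The windowed-RMS fallback can be sketched in pure Python. It assumes the file is split into 800 contiguous, roughly equal windows and normalised by the loudest window; the real numpy path may window differently. `waveform_peaks` is a hypothetical name, and the result would be serialised with `json.dumps` before being stored as jsonb.

```python
import math
from typing import List, Sequence


def waveform_peaks(samples: Sequence[float], n_peaks: int = 800) -> List[float]:
    """Split the signal into n_peaks windows, take each window's RMS, normalise to [0, 1]."""
    n = len(samples)
    if n == 0:
        return []
    rms_values = []
    for i in range(n_peaks):
        start = i * n // n_peaks
        end = max(start + 1, (i + 1) * n // n_peaks)  # at least one sample per window
        window = samples[start:end]
        rms_values.append(math.sqrt(sum(s * s for s in window) / len(window)))
    loudest = max(rms_values)
    return [v / loudest for v in rms_values] if loudest > 0 else [0.0] * n_peaks


# Demo: a tone whose amplitude ramps up from silence.
demo = [math.sin(i * 0.1) * i / 16000 for i in range(16000)]
peaks = waveform_peaks(demo)
print(len(peaks), max(peaks))  # 800 1.0
```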

How it is used

Rendering the waveform in the player (Studio / Web).
Visual scrubbing during playback.

Fingerprint

phase 1

Type: string?
Algorithm: Chromaprint (fpcalc)

Chromaprint acoustic fingerprint — a compact string representing the spectral content of the recording. Robust against re-encoding and small pitch/tempo variations.

Where it comes from

The external fpcalc binary (from the AcoustID project), invoked by the Python sidecar.

How it is / could be used

Duplicate detection — compare fingerprints with tolerance to identify uploads of the same audio under different names.
AcoustID lookup — feed the AcoustIdMatch column (pending).
Detecting re-uploads after a takedown.
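Comparing fingerprints "with tolerance" can be sketched as a bitwise comparison of raw Chromaprint fingerprints (the lists of 32-bit integers fpcalc prints with its `-raw` option), searching a small range of time offsets. This assumes raw fingerprints; the stored Fingerprint column holds the compressed string, which would have to be decoded first. The offset range, minimum overlap and any match threshold are hypothetical.

```python
import random  # only used for the demo data below
from typing import Sequence


def fingerprint_similarity(
    a: Sequence[int], b: Sequence[int], max_offset: int = 20, min_overlap: int = 10
) -> float:
    """Best fraction of matching bits (0..1) between two raw fingerprints over small offsets."""
    best = 0.0
    for offset in range(-max_offset, max_offset + 1):
        pairs = [(a[i], b[i + offset]) for i in range(len(a)) if 0 <= i + offset < len(b)]
        if len(pairs) < min_overlap:
            continue  # too little overlap to be meaningful
        # Mask to 32 bits in case the raw values are printed as signed integers.
        differing = sum(bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in pairs)
        best = max(best, 1.0 - differing / (32 * len(pairs)))
    return best


rng = random.Random(42)
original = [rng.getrandbits(32) for _ in range(120)]
reupload = original[5:]  # same audio, starting slightly later
unrelated = [rng.getrandbits(32) for _ in range(120)]
print(fingerprint_similarity(original, reupload))         # 1.0
print(fingerprint_similarity(original, unrelated) < 0.7)  # True: unrelated audio stays near 0.5
```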

AcoustIdMatch

phase 1 pending integration

Type: string? (GUID)
Future source: AcoustID Web Service

Best-matching AcoustID recording GUID for the Fingerprint. Currently always null: the column is reserved for an integration with the public AcoustID API that is not active yet.

When to enable

Cross-link with MusicBrainz (canonical metadata, ISRC, releases).
Detecting tracks that already exist on other platforms.
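If the integration is enabled, the lookup could look roughly like the sketch below. It targets the AcoustID v2 lookup endpoint with the `client`, `meta`, `duration` and `fingerprint` parameters and reads the `status` / `results[].id` / `results[].score` fields of the JSON response; verify these against the current AcoustID documentation. The client key, the `min_score` threshold and both function names are placeholders; no request is sent here.

```python
import json
from typing import Optional
from urllib.parse import urlencode

ACOUSTID_LOOKUP = "https://api.acoustid.org/v2/lookup"


def build_lookup_url(client_key: str, fingerprint: str, duration_s: int) -> str:
    """Build a lookup URL that also asks for linked MusicBrainz recordings."""
    query = urlencode({
        "client": client_key,    # application API key (placeholder)
        "meta": "recordings",
        "duration": duration_s,  # whole seconds, as reported by fpcalc
        "fingerprint": fingerprint,
    })
    return f"{ACOUSTID_LOOKUP}?{query}"


def best_match(response_json: str, min_score: float = 0.8) -> Optional[str]:
    """Id of the highest-scoring AcoustID result, or None if nothing is confident enough."""
    data = json.loads(response_json)
    if data.get("status") != "ok":
        return None
    results = sorted(data.get("results", []), key=lambda r: r.get("score", 0.0), reverse=True)
    if not results or results[0].get("score", 0.0) < min_score:
        return None
    return results[0]["id"]
```

The linked MusicBrainz recording ids (for the cross-link above) would come from the `recordings` list of the chosen result.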

SpeechMusicProfile

phase 1 jsonb

Type: string? (jsonb)
Shape: { speech, music, noise }
Range: each value in [0, 1]

Estimated share of speech, music and noise across the file, produced by a heuristic combining RMS, spectral flatness and zero-crossing rate.

How it is / could be used

Detecting podcasts/spoken-word incorrectly published as music.
Filtering spoken previews/intros.
Smart skipping ("skip talking parts").
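Flagging spoken-word uploads from the stored JSON takes only a few lines. A sketch with a hypothetical threshold; it does not assume the three values sum to 1, only that speech clearly dominates music.

```python
import json


def looks_like_spoken_word(profile_json: str, speech_threshold: float = 0.6) -> bool:
    """Flag a track whose SpeechMusicProfile is dominated by speech rather than music."""
    profile = json.loads(profile_json)
    speech = profile.get("speech", 0.0)
    music = profile.get("music", 0.0)
    return speech >= speech_threshold and speech > music


print(looks_like_spoken_word('{"speech": 0.8, "music": 0.15, "noise": 0.05}'))  # True
```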

AnalyzedAtUtc

phase 1

Type: DateTime? (UTC)

Timestamp marking the end of acoustic analysis. Used to invalidate data after algorithm changes (re-analyse when stale).

3 Phase 2 · ML enrichment

Runs in EnrichTrackAudioConsumer (prefetch 1; up to 2 retries, 5 minutes apart). Sample rates: 44100 Hz for the classic Essentia algorithms and 16000 Hz for the TensorFlow models (discogs-effnet, an EfficientNet-B0 backbone fed with mel-spectrogram patches of 96 bands × 128 frames).

Danceability

phase 2 ML model

Type: float?
Range: [0, 1]
Model: danceability-discogs-effnet

Probability that the track is "danceable" (class 0 of the binary classifier). Combines pulse, rhythmic regularity and spectral stability.

How it is used

An axis of the AudioDnaRadar (TrackInsightsModal).
Filters and ranking in dance/club playlists.

Energy

phase 2 derived

Type: float?
Range: [0, 1]
Mapping: −60 dBFS → 0; 0 dBFS → 1 (log)

Perceptual energy derived from the RMS level: RMS is converted to dBFS and mapped linearly from −60 dBFS (0) to 0 dBFS (1), clamped to [0, 1]. It is not an ML model but a direct computation on the waveform.

How it is / could be used

An axis of the AudioDnaRadar.
energyMin filters in /tracks/{id}/similar.
Curating "high-energy" vs "chill" playlists.
Ordering tracks in a set/album along an emotional arc.
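The Energy mapping as a sketch, assuming a single linear RMS value for the analysed audio; the straight-line mapping in dB is inferred from the −60 dBFS → 0 and 0 dBFS → 1 endpoints above. `energy_from_rms` is a hypothetical name.

```python
import math


def energy_from_rms(rms: float) -> float:
    """Map linear RMS amplitude to [0, 1]: -60 dBFS -> 0, 0 dBFS -> 1, linear in dB, clamped."""
    if rms <= 0:
        return 0.0
    rms_db = 20.0 * math.log10(rms)
    return min(1.0, max(0.0, (rms_db + 60.0) / 60.0))


print(energy_from_rms(1.0))            # 1.0
print(round(energy_from_rms(0.1), 3))  # 0.667 (-20 dBFS)
print(energy_from_rms(1e-4))           # 0.0 (-80 dBFS, clamped)
```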

DynamicComplexity

phase 2

Type: float?
Range: [0, 1] (normalised)
Source: Essentia DynamicComplexity ÷ 9

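How much the loudness varies over time. Essentia's DynamicComplexity reports the average absolute deviation of the loudness from its global level, in dB; dividing by 9 normalises it to the [0, 1] range. High values indicate dynamic material (classical, jazz); low values indicate heavily compressed, loud material (commercial pop, EDM).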

How it could be used

Flagging over-compressed masters.
Contributing to the track's "DNA" radar.
Recommending to listeners who prefer dynamic material.

VoiceProbability

phase 2 ML model

Type: float?
Range: [0, 1]
Model: voice_instrumental-discogs-effnet

Probability of the "voice" class (class 1), i.e. how likely the track is to contain vocals. It is independent of the editorial IsInstrumental flag, so it can be used to validate the artist's declaration.

How it is / could be used

An axis of the DNA radar.
"Karaoke / Instrumental" filters.
Cross-checking Track.IsInstrumental to surface upload inconsistencies.
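A cross-check only needs to surface clear disagreements, so it helps to leave a neutral band between two thresholds. A sketch with hypothetical thresholds and a hypothetical function name:

```python
from typing import Optional


def instrumental_flag_mismatch(
    is_instrumental: bool,
    voice_probability: Optional[float],
    high: float = 0.8,
    low: float = 0.2,
) -> Optional[str]:
    """Return a short reason if IsInstrumental clearly disagrees with VoiceProbability."""
    if voice_probability is None:
        return None  # not enriched yet
    if is_instrumental and voice_probability >= high:
        return "marked instrumental but vocals are likely"
    if not is_instrumental and voice_probability <= low:
        return "marked non-instrumental but vocals are unlikely"
    return None  # consistent, or too uncertain to flag


print(instrumental_flag_mismatch(True, 0.95))  # marked instrumental but vocals are likely
```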

MoodHappy

phase 2 ML model

Type: float?
Range: [0, 1]
Model: mood_happy-discogs-effnet

Probability that the track conveys a "happy" mood (class 0 of the binary classifier). A "happy vs not happy" classifier trained on MTG-Jamendo labels.

How it is used

Feeds the Valence heuristic.
Directly in the UI for "Feel Good" playlists.

MoodSad

phase 2 ML model

Type: float?
Range: [0, 1]
Model: mood_sad-discogs-effnet

Probability of a "sad" mood. A component of the Valence calculation.

MoodAggressive

phase 2 ML model

Type: float?
Range: [0, 1]
Model: mood_aggressive-discogs-effnet

Probability of an "aggressive" mood (typical of metal, hard rock, heavy rap). A component of Arousal.

MoodRelaxed

phase 2 ML model

Type: float?
Range: [0, 1]
Model: mood_relaxed-discogs-effnet

Probability of a "relaxed" mood. A component of Arousal, where it enters inverted (1 − MoodRelaxed).

MoodParty

phase 2 ML model

Type: float?
Range: [0, 1]
Model: mood_party-discogs-effnet

Probability of a "party" feel: uplift, energy, a danceable tempo. A component of Arousal.

MoodThemeTagsJson

phase 2 jsonb multi-label

Type: string? (jsonb)
Labels: 56 MTG-Jamendo tags
Model: mtg_jamendo_moodtheme-discogs-effnet (Sigmoid)

Multi-label probabilities for 56 mood/theme tags from the MTG-Jamendo dataset. Each tag has an independent score in [0,1]. Example: { "energetic": 0.82, "epic": 0.61, "dark": 0.04, ... }.

Available tags

action · adventure · advertising · background · ballad · calm · children · christmas · commercial · cool · corporate · dark · deep · documentary · drama · dramatic · dream · emotional · energetic · epic · fast · film · fun · funny · game · groovy · happy · heavy · holiday · hopeful · inspiring · love · meditative · melancholic · melodic · motivational · movie · nature · party · positive · powerful · relaxing · retro · romantic · sad · sexy · slow · soft · soundscape · space · sport · summer · trailer · travel · upbeat · uplifting

How it could be used

Sync licensing: matching tracks to briefs ("epic / dramatic / trailer").
Auto-tagging editorial moods.
Contextual recommendation: "workout music" → energetic + upbeat + powerful.
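Matching a brief against the tag JSON can start from something as simple as the mean probability of the requested tags. Because each sigmoid score is independent, that mean is a ranking heuristic, not the probability that all tags apply. A sketch; function names are hypothetical and the example scores are illustrative (the first three come from the example above).

```python
import json
from typing import List, Sequence, Tuple


def top_tags(tags_json: str, k: int = 3) -> List[Tuple[str, float]]:
    """The k highest-scoring tags from MoodThemeTagsJson."""
    tags = json.loads(tags_json)
    return sorted(tags.items(), key=lambda kv: kv[1], reverse=True)[:k]


def brief_score(tags_json: str, brief: Sequence[str]) -> float:
    """Mean probability of the brief's tags; tags missing from the JSON count as 0."""
    tags = json.loads(tags_json)
    return sum(tags.get(t, 0.0) for t in brief) / len(brief)


example = '{"energetic": 0.82, "epic": 0.61, "dark": 0.04, "upbeat": 0.7, "powerful": 0.55}'
print(top_tags(example))  # [('energetic', 0.82), ('upbeat', 0.7), ('epic', 0.61)]
print(round(brief_score(example, ["energetic", "upbeat", "powerful"]), 3))  # 0.69
```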

EnergyBandsJson

phase 2 jsonb

Type: string? (jsonb)
Bands: 8 ERB bands → sub · bass · mid · high
Algorithm: Essentia ERBBands + EqualLoudness

Mean spectral energy as 8 ERB-band values computed at 44100 Hz. For display these are grouped into four bands:

sub: ~20–80 Hz (kick, sub-bass)
bass: ~80–250 Hz (low end, electric bass)
mid: ~250–4000 Hz (vocals, instruments)
high: ~4–20 kHz (cymbals, air, brightness)

How it is used

EnergyBands chart in the TrackInsightsModal.
Visualising the master's "sonic profile".

How it could be used

Detecting masters lacking bass or with excessive brightness.
Recommending by the "colour" of the mix (warm vs bright).

4 Derived heuristics

These are not trained models; they are deterministic formulas computed by the enricher from other mood scores. Dedicated columns exist for fast access and indexing.

Arousal

phase 2 heuristic

Type: float?
Range: [0, 1]
Formula: (Aggressive + Party + (1−Relaxed)) ÷ 3

The arousal axis of the valence/arousal plane (Russell's circumplex model). How intense and active the track is vs calm and contemplative.

Not a regressor

It is an unweighted average of binary-classifier probabilities, not a model trained to regress continuous arousal. Good enough for visualisation, but use with caution in recommendation pipelines that depend on precise calibration.

How it is used

The Y axis of the MoodMap (Studio).
An axis of the AudioDnaRadar.

Valence

phase 2 heuristic

Type: float?
Range: [0, 1]
Formula: (Happy + (1−Sad)) ÷ 2

The valence axis: how emotionally positive or negative the track sounds. Values near 1 = happy/uplifting; near 0 = sad/melancholic.

How it is used

The X axis of the MoodMap.
valenceMin / valenceMax filters in /tracks/{id}/similar.
An axis of the DNA radar.
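Both heuristics are direct transcriptions of the formulas above. The quadrant labels, split at 0.5 on each axis and named after Russell's circumplex, are a hypothetical addition for illustration and do not come from the MoodMap implementation.

```python
def arousal(aggressive: float, party: float, relaxed: float) -> float:
    """(MoodAggressive + MoodParty + (1 - MoodRelaxed)) / 3."""
    return (aggressive + party + (1.0 - relaxed)) / 3.0


def valence(happy: float, sad: float) -> float:
    """(MoodHappy + (1 - MoodSad)) / 2."""
    return (happy + (1.0 - sad)) / 2.0


def mood_quadrant(v: float, a: float) -> str:
    """Hypothetical MoodMap quadrant label, splitting both axes at 0.5."""
    if a >= 0.5:
        return "excited" if v >= 0.5 else "tense"
    return "calm" if v >= 0.5 else "sad"


v, a = valence(0.9, 0.1), arousal(0.2, 0.8, 0.3)
print(mood_quadrant(v, a))  # excited
```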

5 pgvector embedding

EmbeddingDiscogs

phase 2 pgvector raw column

SQL type: vector(1280)
Index: HNSW · vector_cosine_ops
Origin: discogs-effnet (mean pooling)

A 1280-dimension embedding extracted from the penultimate layer of the discogs-effnet-bs64-1.pb model, with mean pooling over time. Captures musical semantics (style, instrumentation, sonic texture).

Not in the C# model

The column exists in PostgreSQL via migration (20260513230003_AddAudioEnrichmentSchema) with an HNSW index, but is not mapped by EF Core in TrackAudioFeatures.cs. It is accessed via raw SQL in the GetSimilarEndpoint.

How it is used

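```sql
-- /tracks/{id}/similar: approximate cosine search
-- <=> is pgvector's cosine-distance operator, so 1 - distance = cosine similarity.
SELECT t."Id",
       1 - (taf."EmbeddingDiscogs" <=> '[…]'::vector(1280)) AS similarity
FROM tracks t
JOIN track_audio_features taf ON taf."TrackId" = t."Id"
WHERE taf."EmbeddingDiscogs" IS NOT NULL
-- Ordering by the raw distance with a LIMIT lets the planner use the HNSW index.
ORDER BY taf."EmbeddingDiscogs" <=> '[…]'::vector(1280)
LIMIT 50;
```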

How it could be used

"More like this" anywhere in the app.
Personal radio blending embeddings of liked tracks.
Catalog clustering for discovering emerging genres.
Outlier detection within an album.
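A personal radio seed can be built as the normalised mean of a user's liked-track embeddings, then ranked by cosine similarity. The sketch below uses 3-dimensional toy vectors instead of 1280-dimensional ones and ranks in Python for clarity; in practice the centroid would more likely be passed as the query vector to the `<=>` query so the HNSW index does the search. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _normalise(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def taste_centroid(liked: Sequence[Vector]) -> List[float]:
    """Mean of the L2-normalised embeddings of liked tracks, normalised again."""
    unit = [_normalise(v) for v in liked]
    mean = [sum(col) / len(unit) for col in zip(*unit)]
    return _normalise(mean)


def cosine_similarity(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(_normalise(a), _normalise(b)))


def rank_by_centroid(centroid: Vector, catalog: Dict[str, Vector], limit: int = 50) -> List[Tuple[str, float]]:
    scored = [(track_id, cosine_similarity(centroid, emb)) for track_id, emb in catalog.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:limit]


liked = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
catalog = {"near": [0.9, 0.1, 0.0], "far": [0.0, 0.0, 1.0]}
print(rank_by_centroid(taste_centroid(liked), catalog)[0][0])  # near
```

Normalising each embedding before averaging keeps tracks with larger raw vector norms from dominating the centroid.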

6 Enrichment lifecycle

EnrichmentStatus

enum string

Type: EnrichmentStatus
Values: Pending · Completed · Failed
Storage: UPPERCASE string

Drives the retries of EnrichTrackAudioConsumer. Set to Pending when Phase 1 finishes.

EnrichedAtUtc

Type: DateTime? (UTC)

Timestamp of the last successful enrichment.

EnrichmentAttempts

Type: int
Default: 0

Attempt counter, incremented by the worker on every run. Once the configured maximum (3) is reached, the worker sets EnrichmentStatus = Failed and still releases the track to Ready (fail-open), so a failed enrichment never blocks the track.

EnrichmentLastError

Type: string? (truncated to 1024)

Last error message. Useful for diagnostics in Studio (admin UI) and for alerting/observability.
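The lifecycle rules on this page (attempt counter, maximum of 3, error truncated to 1024 characters, fail-open) can be modelled as a small state machine. This is a Python sketch of logic that actually lives in the C# worker; the class and function names are hypothetical, and whether a success clears EnrichmentLastError is not specified, so the sketch leaves it untouched.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

MAX_ATTEMPTS = 3         # configured maximum stated above
MAX_ERROR_LENGTH = 1024  # EnrichmentLastError truncation


class EnrichmentStatus(Enum):
    PENDING = "PENDING"  # stored as an UPPERCASE string
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass
class EnrichmentState:
    status: EnrichmentStatus = EnrichmentStatus.PENDING
    attempts: int = 0
    enriched_at_utc: Optional[datetime] = None
    last_error: Optional[str] = None


def record_success(state: EnrichmentState) -> None:
    state.attempts += 1
    state.status = EnrichmentStatus.COMPLETED
    state.enriched_at_utc = datetime.now(timezone.utc)


def record_failure(state: EnrichmentState, error: str) -> bool:
    """Count a failed run; returns True when retrying should stop (track released fail-open)."""
    state.attempts += 1
    state.last_error = error[:MAX_ERROR_LENGTH]
    if state.attempts >= MAX_ATTEMPTS:
        state.status = EnrichmentStatus.FAILED
        return True  # give up; the track still moves to Ready
    return False  # stays PENDING; the consumer retries later
```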

7 Used today

API · GetSimilar

GET /tracks/{id}/similar
Uses: EmbeddingDiscogs (cosine, HNSW), Bpm, Energy, Valence.

Studio · TrackInsightsModal

Waveform, AudioDnaRadar (6 axes), LoudnessMeter, MoodMap (Valence × Arousal), EnergyBands, ProcessingStepper.

Studio · ManageAlbumPageClient

Read-only analysis section: BPM, Key+Mode, LUFS, True Peak, Speech/Music Profile.

Mood pipeline

Uses Valence, Arousal and MoodThemeTagsJson to suggest editorial Moods (auto-tagging) in Studio.

8 Possible uses (not implemented)

Discovery

Personalised radio based on the centroid of the embeddings of the user's liked tracks, diversified by sampling within a cone (a bounded angular distance) around that centroid rather than taking only the nearest neighbours.

DJ / mixing

Harmonic mixing (key + mode) + beatmatching (BPM ±5%) for automatic transitions in "Continuous Mix" playlists.

Master QA

Warn artists at upload: "Your master is at −18 LUFS — 4 dB below the streaming target". Combines LoudnessLufs and TruePeakDb.

Sync licensing

Search by brief: "I need something epic/cinematic in D minor, ≤120 BPM, high arousal" → a mixed query (MoodThemeTagsJson + Key + BPM + Arousal).

Duplicate detection

Hashing the Fingerprint + embedding distance to detect re-uploads, close covers and uncredited samples.

Anti-fraud

Combine VoiceProbability, SpeechMusicProfile and DynamicComplexity to detect suspicious uploads (disguised podcasts, white noise, short loops).