Sound production — authoring contract¶
Companion to sound.md's editorial standard (docs/authoring/standards/sound.md).
This file defines the machine-read shape consumed by the naluma-audio pipeline.
Canon: docs/superpowers/specs/2026-06-14-app-sound-production-design.md.
Folder layout¶
sounds/<album-slug>/
album.md # album: title, description (≤120), display_order, image_alt, library_category
album.<locale>.md # album title + description per locale (text-only)
research.md # sourcing + IP/license dossier
hero.webp # square 1024×1024 album cover (via the illustrate skill)
<track-slug>/
sound.md # track contract (below)
sound.<locale>.md # track title only
audio/sounds/<album-slug>/<track-slug>/ # gitignored render output (master.m4a + manifest.json)
sound.md frontmatter contract¶
---
type: sound # publish validate.py discriminator (required)
slug: light-rain # globally unique; maps to content.sounds.slug
album: rain # album slug (FK resolved at publish)
title: Light rain
tier: free # free | premium
unlocks_in_week: 1
default_mixing_point: 0.5 # 0..1, TRT default
source:
kind: synthesized # synthesized | licensed
# --- synthesized only ---
colour: pink # base + variants; see the noise vocabulary below
duration_s: 600 # generated length in seconds (pre-loop)
seed: 0 # deterministic synthesis seed
# --- licensed only ---
file: source.wav # path RELATIVE to this sound.md (gitignored local source)
url: https://freesound.org/… # provenance
trim_start_ms: 0 # optional, default 0
trim_end_ms: null # optional; null = to end
license: CC0 # required for licensed; "synthesized" for synthesized
channels: 2 # 1 | 2 (noise colours may be 1; ambient 2)
loop:
enabled: true # equal-power seam crossfade
crossfade_ms: 2000
min_duration_s: 600 # floor: tile the seamless loop up to >= this (default 600 = 10 min)
spatial:
render: none # none | baked
# render: baked → ALSO emits a headphone-only binaural master (master.binaural.m4a)
# preset: enveloping # enveloping | wide-diffuse | front-stage (omit → config default)
# scene: [{channel: 0, azimuth: 110, elevation: 0}, ...] # advanced; overrides preset
cleanup: # optional; OMIT → no processing (byte-identical render). LICENSED sources only.
denoise: medium # off | light | medium | strong (afftdn noise-reduction)
highpass_hz: 25 # int Hz in [10,200], or omit → none (DC + sub-sonic rumble cut)
declick: false # bool → adeclick (isolated clicks/pops)
image_alt: Soft rain on a window
---
# Description
One sentence on what the sound is; one optional sentence on what it is for. (≤120 chars.)
Rules:
- type: is required and is the publish discriminator — type: sound on a sound.md,
type: sound_album on an album.md (validate.py routes on it). album.md frontmatter carries
type: sound_album, slug, title, display_order, image_alt, library_category; the
≤120-char one-sentence album description is the markdown body (mirrors how audioGuided puts its
description in the body). Keep the album description free of colons (a bare-string body with a
colon is parsed as a mapping, not text).
- source.kind: synthesized requires colour + duration_s; license: synthesized.
colour ∈ the noise vocabulary: base colours white | pink | brown | blue | violet
plus spectral variants deep-brown | soft-brown | soft-pink | warm-white | bright-white
(β tilts) and green (mid-band band-pass, not a tilt). The renderer maps each to its
spectrum in audio_pipeline/naluma_audio/noise.py (NOISE_BETA + the green branch);
the sound_lint guard's NOISE_COLOURS set mirrors it.
- source.kind: licensed requires file (local, gitignored) + url + a real license.
- spatial.render: baked (implemented) — emits a SECOND, headphone-only binaural master
(master.binaural.m4a) alongside the flat master.m4a; the flat master stays canonical (speakers +
universal fallback). The bake convolves the looped master with a pinned HRTF (MIT KEMAR SOFA via
sofar + scipy.signal.fftconvolve), seam-preserved (tile ×N → convolve → central slice),
deterministic, peak-limited to −1 dBFS before loudnorm. Choose a preset (enveloping |
wide-diffuse | front-stage; omit → the config default_preset) or give an explicit scene:
[{channel, azimuth, elevation}] (overrides preset). Baked requires channels: 2 and every scene
channel < channels. is_spatial stays false — the binaural file is an alternate render, NOT
the deferred dynamic-spatial path (-stereo / spatial_stems). HRTF + presets are pinned in
audio_pipeline/naluma_audio/config/spatial.toml + data/hrtf/; manifest records the render
provenance (dataset, preset/scene) in a spatial block. Canon:
docs/superpowers/specs/2026-06-18-baked-binaural-spatial-design.md.
- Categories are NOT a track field (see docs/authoring/standards/sound.md); library_category
is an editorial tag on album.md only.
- loop.crossfade_ms must be shorter than half the source duration — the render folds the
tail over the head, so 2 × crossfade must be less than the source length or the render fails.
- Every master targets 10–20 min. loop.min_duration_s (default 600) tiles a short seamless
loop up to the floor — this is perceived length only (the underlying repeat period stays the
source length; tiling is click-free because the loop body already wraps). Cap over-long sources at
≤ 1200 s (20 min) via source.trim_end_ms. duration_ms reflects the tiled master.
- cleanup: is opt-in and licensed-only. Absent → render is byte-identical to today. A cleanup
block on a synthesized source is rejected (denoising a generated colour would attack its spectrum).
Cleanup runs after trim, before the loop, so the loop/loudnorm/binaural all inherit clean audio.
Catalog source of truth (sounds-backlog.csv retired, #131)¶
Per-item album.md/sound.md are the catalog source of truth ("files are truth", #126).
The old sounds/sounds-backlog.csv roadmap was retired (renamed
sounds-backlog.superseded.csv; see sounds/SUPERSEDED.md) — not deleted. The coach-schedule
validator (tools/schedule_checks/validate.py) derives valid sound slugs directly from the
per-item files (_sound_universe: album dir names + track dir names under sounds/), so no
separate index file is needed. (An earlier plan floated a generated sounds/index.csv; it was
never produced and is not required.)