Sound production — authoring contract

Companion to sound.md's editorial standard (docs/authoring/standards/sound.md). This file defines the machine-read shape consumed by the naluma-audio pipeline. Canon: docs/superpowers/specs/2026-06-14-app-sound-production-design.md.

Folder layout

sounds/<album-slug>/
  album.md                 # album: title, description (≤120), display_order, image_alt, library_category
  album.<locale>.md        # album title + description per locale (text-only)
  research.md              # sourcing + IP/license dossier
  hero.webp                # square 1024×1024 album cover (via the illustrate skill)
  <track-slug>/
    sound.md               # track contract (below)
    sound.<locale>.md      # track title only
audio/sounds/<album-slug>/<track-slug>/   # gitignored render output (master.m4a + manifest.json)

sound.md frontmatter contract

---
type: sound                    # publish validate.py discriminator (required)
slug: light-rain               # globally unique; maps to content.sounds.slug
album: rain                    # album slug (FK resolved at publish)
title: Light rain
tier: free                     # free | premium
unlocks_in_week: 1
default_mixing_point: 0.5      # 0..1, TRT default
source:
  kind: synthesized            # synthesized | licensed
  # --- synthesized only ---
  colour: pink                 # base + variants; see the noise vocabulary below
  duration_s: 600              # generated length in seconds (pre-loop)
  seed: 0                      # deterministic synthesis seed
  # --- licensed only ---
  file: source.wav             # path RELATIVE to this sound.md (gitignored local source)
  url: https://freesound.org/… # provenance
  trim_start_ms: 0             # optional, default 0
  trim_end_ms: null            # optional; null = to end
license: CC0                   # required for licensed; "synthesized" for synthesized
channels: 2                    # 1 | 2 (noise colours may be 1; ambient 2)
loop:
  enabled: true                # equal-power seam crossfade
  crossfade_ms: 2000
  min_duration_s: 600          # floor: tile the seamless loop up to >= this (default 600 = 10 min)
spatial:
  render: none                 # none | baked
  # render: baked → ALSO emits a headphone-only binaural master (master.binaural.m4a)
  # preset: enveloping         # enveloping | wide-diffuse | front-stage (omit → config default)
  # scene: [{channel: 0, azimuth: 110, elevation: 0}, ...]  # advanced; overrides preset
cleanup:                 # optional; OMIT → no processing (byte-identical render). LICENSED sources only.
  denoise: medium        # off | light | medium | strong  (afftdn noise-reduction)
  highpass_hz: 25        # int Hz in [10,200], or omit → none (DC + sub-sonic rumble cut)
  declick: false         # bool → adeclick (isolated clicks/pops)
image_alt: Soft rain on a window
---
# Description
One sentence on what the sound is; one optional sentence on what it is for. (≤120 chars.)
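
The constraints noted in the frontmatter comments above, together with the rules that follow, can be sketched as a small checker. This is an illustration only, not the pipeline's validate.py: the function name check_sound, the message strings, and the choice to take an already-parsed dict (rather than reading YAML) are all assumptions made for the example.

```python
from typing import Any

# Allowed values, copied from the frontmatter comments in this contract.
TIERS = {"free", "premium"}
SPATIAL_RENDERS = {"none", "baked"}
DENOISE_LEVELS = {"off", "light", "medium", "strong"}
MAX_DESCRIPTION_CHARS = 120


def check_sound(front: dict[str, Any], description: str = "") -> list[str]:
    """Hypothetical checker: return a list of problems (empty means OK)."""
    problems: list[str] = []

    if front.get("type") != "sound":
        problems.append("type: must be 'sound'")
    if front.get("tier") not in TIERS:
        problems.append("tier: must be free or premium")

    mix = front.get("default_mixing_point")
    if mix is not None and not (isinstance(mix, (int, float)) and 0 <= mix <= 1):
        problems.append("default_mixing_point: must be in 0..1")

    source = front.get("source") or {}
    kind = source.get("kind")
    licence = front.get("license")
    if kind == "synthesized":
        missing = [k for k in ("colour", "duration_s") if k not in source]
        if missing:
            problems.append(f"source: synthesized requires {', '.join(missing)}")
        if licence != "synthesized":
            problems.append("license: must be 'synthesized' for a synthesized source")
        if "cleanup" in front:
            problems.append("cleanup: licensed sources only")
    elif kind == "licensed":
        missing = [k for k in ("file", "url") if k not in source]
        if missing:
            problems.append(f"source: licensed requires {', '.join(missing)}")
        if not licence or licence == "synthesized":
            problems.append("license: licensed source needs a real license")
    else:
        problems.append("source.kind: must be synthesized or licensed")

    channels = front.get("channels")
    if channels not in (1, 2):
        problems.append("channels: must be 1 or 2")

    render = (front.get("spatial") or {}).get("render")
    if render is not None and render not in SPATIAL_RENDERS:
        problems.append("spatial.render: must be none or baked")
    if render == "baked" and channels != 2:
        problems.append("spatial: baked requires channels: 2")

    cleanup = front.get("cleanup") or {}
    denoise = cleanup.get("denoise")
    if denoise is not None and denoise not in DENOISE_LEVELS:
        problems.append("cleanup.denoise: must be off, light, medium or strong")
    highpass = cleanup.get("highpass_hz")
    if highpass is not None and not (isinstance(highpass, int) and 10 <= highpass <= 200):
        problems.append("cleanup.highpass_hz: must be an int in [10, 200]")

    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append("description: must be at most 120 chars")
    return problems
```

Returning a list rather than raising on the first failure lets an author see every problem in one pass; whether the real validator behaves that way is not stated here.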

Rules:

- type: is required and is the publish discriminator: type: sound on a sound.md, type: sound_album on an album.md (validate.py routes on it). album.md frontmatter carries type: sound_album, slug, title, display_order, image_alt, library_category; the ≤120-char one-sentence album description is the markdown body (mirrors how audioGuided puts its description in the body). Keep the album description free of colons (a bare-string body with a colon is parsed as a mapping, not text).
- source.kind: synthesized requires colour + duration_s; license: synthesized. colour ∈ the noise vocabulary: base colours white | pink | brown | blue | violet plus spectral variants deep-brown | soft-brown | soft-pink | warm-white | bright-white (β tilts) and green (mid-band band-pass, not a tilt). The renderer maps each to its spectrum in audio_pipeline/naluma_audio/noise.py (NOISE_BETA + the green branch); the sound_lint guard's NOISE_COLOURS set mirrors it.
- source.kind: licensed requires file (local, gitignored) + url + a real license.
- spatial.render: baked (implemented) emits a SECOND, headphone-only binaural master (master.binaural.m4a) alongside the flat master.m4a; the flat master stays canonical (speakers + universal fallback). The bake convolves the looped master with a pinned HRTF (MIT KEMAR SOFA via sofar + scipy.signal.fftconvolve), seam-preserved (tile ×N → convolve → central slice), deterministic, peak-limited to −1 dBFS before loudnorm. Choose a preset (enveloping | wide-diffuse | front-stage; omit → the config default_preset) or give an explicit scene: [{channel, azimuth, elevation}] (overrides preset). Baked requires channels: 2 and every scene channel < channels. is_spatial stays false: the binaural file is an alternate render, NOT the deferred dynamic-spatial path (-stereo / spatial_stems). HRTF + presets are pinned in audio_pipeline/naluma_audio/config/spatial.toml + data/hrtf/; the manifest records the render provenance (dataset, preset/scene) in a spatial block. Canon: docs/superpowers/specs/2026-06-18-baked-binaural-spatial-design.md.
- Categories are NOT a track field (see docs/authoring/standards/sound.md); library_category is an editorial tag on album.md only.
- loop.crossfade_ms must be shorter than half the source duration: the render folds the tail over the head, so 2 × crossfade must be less than the source length or the render fails.
- Every master targets 10–20 min. loop.min_duration_s (default 600) tiles a short seamless loop up to the floor. This is perceived length only (the underlying repeat period stays the source length; tiling is click-free because the loop body already wraps). Cap over-long sources at ≤ 1200 s (20 min) via source.trim_end_ms. duration_ms reflects the tiled master.
- cleanup: is opt-in and licensed-only. Absent → render is byte-identical to today. A cleanup block on a synthesized source is rejected (denoising a generated colour would attack its spectrum). Cleanup runs after trim, before the loop, so the loop/loudnorm/binaural all inherit clean audio.
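
The loop arithmetic in the rules above can be worked through in a few lines. This is a sketch under stated assumptions, not the renderer's code: the function names are hypothetical, the cos/sin pair is one common equal-power curve (the contract only says "equal-power"), and rounding up to whole repeats is inferred from "tile the seamless loop up to >= this".

```python
import math

MAX_SOURCE_MS = 1_200_000  # 20 min cap on over-long sources, from the rules above


def check_crossfade(source_ms: int, crossfade_ms: int) -> None:
    """Raise if the seam cannot fit: 2 x crossfade must be less than the source."""
    if 2 * crossfade_ms >= source_ms:
        raise ValueError(
            f"crossfade {crossfade_ms} ms is too long for a {source_ms} ms source"
        )


def tile_count(loop_ms: int, min_duration_s: int = 600) -> int:
    """Smallest whole number of repeats whose total length reaches the floor."""
    if loop_ms <= 0:
        raise ValueError("loop length must be positive")
    floor_ms = min_duration_s * 1000
    return max(1, -(-floor_ms // loop_ms))  # integer ceiling division


def capped_trim_end_ms(source_ms: int) -> int:
    """Hypothetical helper: a source.trim_end_ms that keeps the source <= 20 min."""
    return min(source_ms, MAX_SOURCE_MS)


def equal_power_gains(steps: int) -> list[tuple[float, float]]:
    """(fade_out, fade_in) gain pairs; the squares sum to 1 at every step."""
    if steps < 2:
        raise ValueError("need at least two steps")
    gains = []
    for i in range(steps):
        angle = (i / (steps - 1)) * math.pi / 2
        gains.append((math.cos(angle), math.sin(angle)))
    return gains


# A 90 s seamless loop needs 7 repeats (630 s) to reach the default 600 s floor.
repeats = tile_count(90_000)
tiled_ms = repeats * 90_000
```

Because the squared gains always sum to 1, the summed power stays constant across the seam for uncorrelated material, which is why an equal-power curve is usually preferred over a linear one for noise and ambience.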

Catalog source of truth (sounds-backlog.csv retired, #131)

Per-item album.md/sound.md are the catalog source of truth ("files are truth", #126). The old sounds/sounds-backlog.csv roadmap was retired (renamed sounds-backlog.superseded.csv; see sounds/SUPERSEDED.md) — not deleted. The coach-schedule validator (tools/schedule_checks/validate.py) derives valid sound slugs directly from the per-item files (_sound_universe: album dir names + track dir names under sounds/), so no separate index file is needed. (An earlier plan floated a generated sounds/index.csv; it was never produced and is not required.)
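
Deriving the slug set from the per-item files might look like the sketch below. It is not the body of _sound_universe in tools/schedule_checks/validate.py, which is not shown here: the function name sound_universe is hypothetical, and treating a directory as an album only when it holds album.md (and as a track only when it holds sound.md) is an assumption made for the example.

```python
from pathlib import Path


def sound_universe(sounds_root: Path) -> set[str]:
    """Collect album dir names plus track dir names under sounds/."""
    slugs: set[str] = set()
    for album_dir in sounds_root.iterdir():
        # Assumption: only directories carrying an album.md count as albums.
        if not album_dir.is_dir() or not (album_dir / "album.md").is_file():
            continue
        slugs.add(album_dir.name)
        for track_dir in album_dir.iterdir():
            # Assumption: only directories carrying a sound.md count as tracks.
            if track_dir.is_dir() and (track_dir / "sound.md").is_file():
                slugs.add(track_dir.name)
    return slugs
```

Walking the tree on every validation run keeps the files as the single source of truth, at the cost of a directory scan, and avoids a generated index that could drift out of date.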