Craft Guide -- writing engaging coach conversations

This is the craft companion to the coach-conversation editorial standard. The standard is the terse compliance checklist (what must be true to pass review). This guide is the how: how to structure a conversation, embed an exercise so it feels earned, and write the turn-by-turn copy that makes a scripted chat worth opening. When the two disagree, the standard wins -- it is the gate.

Read first: voice.md (the five principles + contextual tone shifts), ai-patterns-en.md / ai-patterns-de.md (the mechanical ban lists), and the composition rules in naluma-app/openspec/specs/coach/spec.md (sequencing, slot logic, branch-and-merge). This guide does not restate those -- it sits on top of them.

0. What Naluma is -- and what it is not

Read this before anything else, because most published "conversational coaching" research assumes a capability Naluma does not have.

A Naluma coach conversation is a pre-authored, branching script, rendered in a chat-like interface:

The coach speaks in received_bubble messages.
The user responds only by tapping a reply_chip_row -- a small set of author-written options. There is no free-text input.
Branches are selected by tag_match on the chip the user tapped, then merge back.
session_card / sound_card blocks deep-link to sessions and sounds.

Naluma is not an AI chatbot. There is no natural-language understanding, no generation, and no model reading what the user types (the user types nothing). The coach cannot "understand" a feeling it was not pre-authored to anticipate; it can only route between branches the author wrote.
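To make the routing model concrete, here is a minimal sketch. Only the block-type names and the `_default` branch come from this guide and the coach spec; the dictionary layout, the `tag` field on chips, the `route` helper and the placeholder copy are assumptions for illustration, not the real schema.

```python
from typing import Optional

# Hypothetical in-memory shape for one check-in; the real schema lives in the coach spec.
CHECK_IN = {
    "type": "reply_chip_row",
    "chips": [
        {"label": "Loud. Hard to settle.", "tag": "loud"},
        {"label": "There, but I'm okay.", "tag": "okay"},
    ],
}

# One pre-authored branch per tag, plus a _default fallback (placeholder copy).
TAG_MATCH = {
    "type": "tag_match",
    "branches": {
        "loud": [{"type": "received_bubble", "text": "A loud night."}],
        "_default": [{"type": "received_bubble", "text": "Okay. We'll keep tonight short."}],
    },
}


def route(tag_match: dict, tapped_tag: Optional[str]) -> list:
    """Return the authored branch for the tapped chip, else the _default branch.

    Nothing is understood or generated here: routing is a dictionary lookup.
    """
    branches = tag_match["branches"]
    return branches.get(tapped_tag, branches["_default"])
```

A tag with no authored branch falls through to `_default`, which is the whole point of the section above: the coach can only say what an author already wrote.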

This matters because the strongest published evidence on engaging coaching chat -- Woebot, Wysa, Tess -- is largely about free-text agents. Several of their best-documented strengths are structurally unavailable to us, and it would be dishonest to copy them as "best practices." The table below is the honest translation this whole guide is built on:

| What the research says works | Why it works there | Naluma's reality | What we do instead |
| --- | --- | --- | --- |
| Free-text conversation builds the therapeutic bond; static, menu-only flows get criticised (high confidence -- Wysa, Beatty 2022) | The agent reads and reflects back what the user actually wrote | We have no free text. We are exactly the "static menu flow" users criticised. | This is our central design risk. We compensate with richly branched, specific chip options (so the user feels seen by recognising their state in a chip), reflective bubbles after a branch, and personalisation to the known programme week. We do not pretend the chip is a conversation. |
| Mood-contingent replies -- the agent tailors its wording live to the user's typed mood (Woebot, Fitzpatrick 2017) | Real-time generation against free input | We can only branch on pre-set chips. | Author a genuine branch for every realistic state, especially the hard ones. The "tailoring" is done in advance, by us, not live by a model. |
| "It remembered me" -- agent follows up on prior activities, names earlier goals (high confidence -- Fitzpatrick 2017) | Persistent per-user state + generation | Partial. We know `program_week`, slot, and prior `tag_match` selections within reach of the spec. | Use the state we do have. Reference the programme week and what the user has already encountered. Do not fake memory we cannot support. |
| Reflective listening: 1 open question → 2+ reflections (high confidence -- SAMHSA TIP 35) | The counsellor reflects the client's free-text answer | We cannot reflect a free-text answer; there isn't one. | Write the reflection as a `received_bubble` keyed to the chip the user tapped. The chip is the "answer"; the next bubble reflects it. This is achievable and we should do it deliberately. |

The rule that follows: where a best practice depends on reading or generating free text, translate it into pre-authored branching, or drop it. Everywhere this guide makes a recommendation, it is a recommendation for a scripted system. Where the research is about something we can't do, the guide says so.

1. The conversational arc -- structuring each conversation

The most robustly evidenced structure for a single coaching conversation is Motivational Interviewing's four-process arc (high confidence — SAMHSA TIP 35, NBK571068). MI is framed as recursive, not a fixed sequence -- a conversation can loop back. But as a default skeleton for a Naluma slot it maps cleanly:

| MI process | What it does | Naluma blocks | One slot's worth |
| --- | --- | --- | --- |
| Engaging | Build rapport; make it safe to be here | opening `received_bubble`(s) + a check-in `reply_chip_row` | "Open + check in" |
| Focusing | Agree what this conversation is about today | `tag_match` branch on the check-in; a bubble that names the focus | "Name today's thing" |
| Evoking | Draw out the user's own reason it matters (see §2) | bubble(s) that connect the user's state to why a technique helps | "Earn the exercise" |
| Planning | Co-create the next step | `session_card` / `sound_card` + a closing bubble | "Offer + close" |

The turn-by-turn skeleton

A typical morning/night slot, in order:

Open (Engaging). One or two short bubbles. Present, not cheerful. No "Good morning!" opener (banned). For a returning user, anchor in the programme: "Night three of the breathing week." For a morning slot, the open also carries the arc bridge: a programme-framed look-back ("This week we've been on breathing") + a one-line today-focus ("today, putting it to work at night"). Frame against the programme, never the user's action — "Yesterday you practised X" asserts a completion the app can't confirm (see §0, "It remembered me", and §3 don't #8).
Check in (Engaging). A reply_chip_row that captures the user's state. This is the only "input" we get, so the chips must cover the real range, including the hard end (standard, element 2). This is our equivalent of Woebot's open-with-a-mood-check pattern (high confidence — Fitzpatrick 2017) — except our "mood capture" is the chip set, so it has to be authored well.
Reflect + focus (Focusing). The tag_match branch opens with a reflective bubble that mirrors the chip back before doing anything else: "A rough night. That tracks -- the evenings are when it's loudest for a lot of people." This is the scripted form of reflective listening (§0, last row).
Evoke (Evoking). One or two bubbles that connect this state to why the coming technique helps -- in the user's terms, not as a sales pitch (§2).
Offer + plan (Planning). The session_card, then a short closing bubble that sets the next step or simply lands the moment.

Pacing and bubble chunking

One idea per bubble. Long unbroken paragraphs are a chat-UX failure (standard: 1-4 sentences per bubble). Break a longer thought into sequential bubbles.
Front-load the human bit. The validating/reflective bubble comes before the informational one. Acknowledge, then explain.
Keep the whole thing short. 300-600 words of bubble text per conversation (standard); welcome conversations up to ~800. The user is scrolling a chat, not reading a document.
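The numeric limits above are easy to check mechanically. The sketch below assumes a conversation is available as a plain list of bubble strings and uses a naive sentence splitter; the function names are hypothetical and this is not the behaviour of conversation_lint.py.

```python
import re

MAX_SENTENCES_PER_BUBBLE = 4   # standard: 1-4 sentences per bubble
WORD_RANGE = (300, 600)        # standard: bubble text per conversation
WELCOME_MAX_WORDS = 800        # welcome conversations: up to ~800


def count_sentences(text: str) -> int:
    """Naive count: split on whitespace that follows '.', '!' or '?'."""
    parts = [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]
    return len(parts)


def pacing_problems(bubbles: list[str], welcome: bool = False) -> list[str]:
    """Return human-readable pacing problems; an empty list means none found."""
    problems = []
    for i, text in enumerate(bubbles):
        n = count_sentences(text)
        if not 1 <= n <= MAX_SENTENCES_PER_BUBBLE:
            problems.append(f"bubble {i}: {n} sentences")
    total = sum(len(b.split()) for b in bubbles)
    upper = WELCOME_MAX_WORDS if welcome else WORD_RANGE[1]
    if not WORD_RANGE[0] <= total <= upper:
        problems.append(f"conversation: {total} words")
    return problems
```

A check like this only catches length. Whether each bubble carries one idea, and whether the human bit comes first, still needs a reader.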

Closings

Close on a takeaway or a landing, not a fresh question (a question the user can't answer in free text just dangles). Two good shapes:

Commitment-light: "That's tonight's. Same time tomorrow."
Acknowledgement: "You showed up on a hard night. That counts." (earned, understated -- never "You've got this!", which is banned).

For a night slot, add a one-line outlook to tomorrow before the landing — the cliffhanger that earns the next open ("Tomorrow we start the body scan. Rest first."). One short bubble, programme-framed.

The forward outlook is a NIGHT-only device. Mornings carry a today-focus (arc bridge), but afternoon/teach slots must NOT trail the next slot or the night ("tonight we set up for sleep…") — they close on their own takeaway. One forward-look per day, owned by the night; a second one (from the afternoon) over-signposts and dilutes the night's hook.

Ending a conversation (terminators) -- you do NOT need a `complete` chip

A conversation (and every tag_match/profile_match branch, incl. _default) must end on one of five authored shapes — the contract is in naluma-app/openspec/specs/coach/spec.md ("Authored conversation terminator contract"), enforced by conversation_lint.py:

a reply_chip_row with a complete chip (the only chip-row that ends a path);
a teaser_card (e.g. the baseline check-in card);
a session_card;
a sound_card (optional offer — a night wind-down's closing album);
a final received_bubble (a closing message ends the conversation).

A closing message or a card terminates — do not add a complete chip just to "finish." Reach for the complete chip only when the content wants an explicit tap ("Start it now"/"Not tonight"). Most mornings end on their closing line; most teach/night slots end on the card or the wind-down. Invalid endings: a bare tag_match/profile_match (must rejoin), a day_divider, or a chip row whose chips are only noop/navigate:*/check-in:open.
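As a reading aid, the terminator contract can be sketched as a check on the last block of one already-flattened path. The block dictionaries and the `action` field on chips are assumptions for illustration; the authoritative rules are the coach spec and conversation_lint.py, which this sketch does not reproduce.

```python
# Block types that end a path on their own (cards and a closing message).
TERMINAL_BLOCK_TYPES = {"teaser_card", "session_card", "sound_card", "received_bubble"}


def ends_validly(path: list[dict]) -> bool:
    """True if the last block of a path is one of the five authored endings.

    `path` is one linear route through the conversation; walking every
    tag_match/profile_match branch to build such paths is left out here.
    """
    if not path:
        return False
    last = path[-1]
    if last["type"] in TERMINAL_BLOCK_TYPES:
        return True
    if last["type"] == "reply_chip_row":
        # Only a chip row that carries a complete chip ends a path.
        return any(chip.get("action") == "complete" for chip in last.get("chips", []))
    # A bare tag_match/profile_match (must rejoin), a day_divider, anything else.
    return False
```

Under these assumptions, a chip row made only of `noop`, `navigate:*` or `check-in:open` chips is rejected because none of its chips is a complete chip.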

Morning slots: deliver value, vary the mode

A morning that is only a "how was your night?" check-in, with a reflective reply and no consequence, goes hollow within days and gets skipped. The morning's job is to hand the user one small, real, useful thing — and to vary what that is so it never feels like a fixed ritual. Three rotating modes:

| Mode | What it gives the user | When |
| --- | --- | --- |
| Intention / lens (default) | A concrete thing to notice or try today, tied to the day's theme ("today, catch one moment the sound slipped into the background -- you don't have to make it happen"). Value = a frame for the day. Ends on the message. | Most mornings |
| Micro-practice | A short optional tool to start grounded -- a one-minute breath, or a sound to put on. Value = an actual tool. Ends on the card. | Some mornings (esp. Relaxation phase) |
| Check-in with consequence | A state check-in that leads somewhere -- the branch offers a relevant tool or shifts the focus. Never reflection-only; never every day. | Occasional (a week/phase start, a sleep-themed stretch) |

The "useful thing" should track the phase: Foundation → an intention/reframe; Relaxation → a micro-practice; Cognitive → a noticing experiment; Acceptance → a presence/values prompt; Consolidation → a reflection/plan. That gives built-in variety and phase-fit. The morning still opens with the arc bridge (programme look-back + today-focus), but the check-in is no longer the default morning shape — value is. A morning that takes one tap and gives nothing back is the thing to avoid.
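The phase mapping above can be kept as a small lookup when planning a week of mornings. The phase names and values come from this section; the mode label `"check-in"` and the helper name are hypothetical.

```python
# Phase -> the kind of "useful thing" a morning should hand over (from this guide).
PHASE_TO_USEFUL_THING = {
    "Foundation": "intention/reframe",
    "Relaxation": "micro-practice",
    "Cognitive": "noticing experiment",
    "Acceptance": "presence/values prompt",
    "Consolidation": "reflection/plan",
}


def checks_in_every_morning(modes: list[str]) -> bool:
    """True if every planned morning in the list is a check-in (the shape to avoid)."""
    return bool(modes) and all(mode == "check-in" for mode in modes)
```

This only flags the extreme case. How often a check-in is "occasional" enough remains an editorial call.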

2. Embedding exercises smoothly

The single most transferable principle from the evidence -- and one that does apply to a scripted system -- is MI's "righting reflex" warning (high confidence — SAMHSA TIP 35): do not jump straight to the fix. A session card dropped in cold, before any context, is the most common way a conversation skips the coaching (standard, element 4).

The order that works:

Name the difficulty first (Focusing). The user just told you their state via a chip. Reflect it.
Give the rationale (Evoking). Say why this helps, concretely and mechanistically -- not "this will help you relax" but "this targets the nervous system's threat response, not the sound itself." Specific before generic (voice.md principle 2).
Then offer the card (Planning). By now the user has a reason to want it.

Invitation, not command (and the honest limit)

Self-determination and autonomy-supportive framing suggest offering rather than ordering: "If you want, here's a three-minute one" beats "Do this breathing exercise." Caveat worth stating plainly: the evidence that a bare word-swap ("should" → "could") changes outcomes is weak/unverified (PMC6393822 found no significant effect from word-level swaps alone). Autonomy support has to be substantive -- a real rationale plus a genuine sense of choice -- not a cosmetic verb change. In a chip system, genuine choice can be built in: offer a reply_chip_row like "Start it now" / "Maybe later" so declining is a first-class option, not a dead end.

Read or listen -- show both cards, let the user pick

Most psychoeducation exists in two formats: an insight (read, ~1–2 min) and its -audio audioGuided twin (narrated, ~2–4 min) — 66 of 68 insights have a twin. Where both exist, offer BOTH cards back-to-back so the user picks format on the spot. No interim "Read it / Listen" chip — the two cards are the choice. Frame it once, lightly, as one thing two ways ("anything I point you to, you can read or listen to — whichever suits"), so it never reads as "do both." A session_card for the -audio twin uses session_id: audioGuided, session_slug: <slug>-audio.

Only where the twin exists. A few topics are insight-only (e.g. understanding-tinnitus-types) and the relaxation/breathing/meditation sessions are audio-only — show the one format there.
A read+listen pair counts as ONE content offer against the card budget. Keep a conversation to ~3 offers; in a dense multi-branch conversation, doubling every offer makes a wall of cards — pair the hero offer and keep secondary reference offers single-format (read) so the card count stays digestible.
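A minimal sketch of the pairing and budget rules follows. The `session_id: audioGuided` and `<slug>-audio` fields are as stated above; the read card's `session_id` value (`"insight"`), the slug `what-is-habituation`, and both helper names are assumptions for illustration.

```python
MAX_OFFERS = 3  # the guide's soft budget: ~3 content offers per conversation


def audio_twin(slug: str) -> dict:
    """session_card for the narrated twin, using the fields this guide names."""
    return {"type": "session_card", "session_id": "audioGuided", "session_slug": f"{slug}-audio"}


def count_offers(cards: list[dict]) -> int:
    """Count content offers; a card directly followed by its -audio twin is ONE offer."""
    offers = 0
    i = 0
    while i < len(cards):
        offers += 1
        slug = cards[i]["session_slug"]
        nxt = cards[i + 1] if i + 1 < len(cards) else None
        is_twin = (
            nxt is not None
            and nxt.get("session_id") == "audioGuided"
            and nxt["session_slug"] == f"{slug}-audio"
        )
        i += 2 if is_twin else 1  # skip the twin: the pair is a single offer
    return offers
```

For example, a paired hero offer followed by one read-only reference card counts as two offers, leaving room for one more.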

The post-exercise debrief -- what we can and can't do

A live coach debriefs after an exercise ("how was that?"). We cannot -- the session player hands control back to the app, and we have no free text to process a real answer. Be honest about this gap. What we can do:

Author a short return bubble the conversation can show after a session, if the flow supports re-entry (check the coach spec for what's actually wired).
Use a chip-based mini check ("That helped" / "Not tonight") only if a real tag_match branch exists for each option. Never ask a debrief question with no authored branch behind it -- a question the system can't answer is worse than no question.
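The "no question without a branch" rule can be checked before review. This sketch assumes each chip carries a `tag` and each `tag_match` block holds a `branches` mapping; both are hypothetical shapes, and the chip tags and branch copy are placeholders.

```python
def unanswered_chips(chip_row: dict, tag_match: dict) -> list[str]:
    """Labels of chips whose tag has no explicitly authored branch.

    _default is deliberately not accepted as an answer here: for a debrief
    question the guide asks for a real branch behind each option.
    """
    branches = tag_match["branches"]
    return [
        chip["label"]
        for chip in chip_row["chips"]
        if chip["tag"] == "_default" or chip["tag"] not in branches
    ]


debrief = {
    "type": "reply_chip_row",
    "chips": [
        {"label": "That helped", "tag": "helped"},
        {"label": "Not tonight", "tag": "not_tonight"},
    ],
}
only_one_branch = {
    "type": "tag_match",
    "branches": {"helped": [{"type": "received_bubble", "text": "Good. That's tonight's."}]},
}
```

Here `unanswered_chips(debrief, only_one_branch)` reports "Not tonight", which is the signal to write that branch or drop the question.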

3. Dos and don'ts

These are microcopy-level rules. Confidence varies; it's marked. The "don'ts" are the documented failure modes that drive disengagement.

Do

Validate before you redirect. Acknowledge the state before introducing a technique. In a distress branch this is non-negotiable (standard, element 3). (Credibly sourced, unverified this pass — arXiv 2602.22775, JMIR e69709.)
Validate and engage -- don't let validation be the whole turn. A documented failure of chatbots is over-validating while under-asking (therapists drew out elaboration ~2× as often; chatbots made ~2× the suggestions) (JMIR e69709, unverified this pass). Our version: after the reflective bubble, move -- to a rationale, a card, a focus -- don't stack three "that sounds hard" bubbles.
Be specific to the programme stage. A Week-3 conversation speaks from Week-3 knowledge. Generic, context-free copy reads as canned and is a named top failure mode (JMIR e69709, arXiv 2504.18932, unverified this pass).
Write the hard branch as carefully as the easy one. The user having a bad night is the user who most needs the conversation to land.
Keep reading level low. Grade 8 for bubbles, Grade 6 for chips (standard).

Don't

The five documented relational failure modes (arXiv 2602.22775, unverified this pass -- but they line up exactly with voice.md and the existing standard, so treat as working rules):

1. Validation failure -- dismissing or skipping past the user's stated state.
2. Minimisation -- "it's not that bad," "at least…". Reducing the significance of distress.
3. Toxic positivity -- forced optimism against the user's actual state. For Naluma this is the single most important failure to avoid: voice.md principle 4 and the banned "You've got this!" / "wellness journey" patterns are this exact failure mode.
4. Distress mishandling -- jumping to a fix (righting reflex), or under-reacting to a high-distress chip.
5. Autonomy violation -- commanding rather than offering; no "later"/"no" path.

Plus three Naluma-specific ones:

6. Asking what we can't hear. "How are you feeling?" as free-form, or any question with no tag_match branch behind it. The chip row is the question.
7. Repetition. Perceived repetition and "it doesn't understand me" are the dominant disengagement drivers for chat coaches (Frontiers 2022, unverified this pass). With a finite script and repeat cadences, repetition is our highest structural risk -- vary openers across slots, and never reuse the same reflective bubble verbatim across conversations the same user will see in one week.
8. Asserting unearned memory. A look-back that says the user did something ("yesterday you practised…") -- the app has no completion signal, so for a user who skipped yesterday this is the exact "it doesn't understand me" failure. Reference the programme or what a prior conversation offered, never the user's action.
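The repetition rule lends itself to a simple scan across one week of conversations. The input shape (conversation id mapped to its bubble texts) and the function name are assumptions. The sketch checks every bubble, not only reflective ones, and ignores case and spacing, so it is slightly stricter than "verbatim".

```python
from collections import defaultdict


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial edits still count as repeats."""
    return " ".join(text.lower().split())


def repeated_bubbles(week: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each bubble that appears in more than one conversation to those conversation ids."""
    seen = defaultdict(list)
    for conv_id, bubbles in week.items():
        # Normalise first, then de-duplicate, so one conversation is counted once per bubble.
        for key in {normalise(b) for b in bubbles}:
            seen[key].append(conv_id)
    return {text: sorted(ids) for text, ids in seen.items() if len(ids) > 1}
```

Deliberate repeats, such as a shared programme anchor, would show up too. Treat the output as a list to review, not a list of errors.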

4. Rules for engaging conversations (and the dropout problem)

Digital tinnitus programmes lose a lot of users -- the standard cites 21-51% dropout (vault note #599), and the coach conversation is the primary thing standing between a user and that exit. What the evidence says makes a conversational coach worth opening, filtered through what a scripted system can deliver:

Proactive, regular initiation + follow-up creates accountability. Being "remembered" was a top-valued feature (high confidence — Fitzpatrick 2017). We deliver the initiation via push notification + the daily slot. We deliver follow-up by referencing the programme arc ("night three of the breathing week"), which we know. We cannot follow up on free-text goals the user set -- so don't write copy implying we remember things we don't store.
Personalisation to what we actually know. Slot, programme week, phase, and prior chip selections (within spec) are real personalisation levers. Use them. They are our honest substitute for the free-text personalisation Woebot/Wysa get.
Predictability is comfort; sameness is death. A reliable daily shape (open → check in → focus → offer → close) is reassuring. The same words every day are the repetition failure (§3, don't #7). Keep the skeleton constant, vary the flesh.
A transparent, consistent coach persona. See §5.
Earned lightness, not cheer. Progress acknowledgement works when rare and understated (voice.md principle 4). Cheerfulness on a hard day is the toxic-positivity failure.

Writing the push hook

The push is the initiation — it must be worth tapping AND honestly preview THIS conversation. Six rules (grounded in voice.md):

Specific before generic (voice.md's strongest asymmetry). The test: could this push belong to any other day? If yes, rewrite. ✅ "A two-minute grounding before the day starts" ❌ "Your morning check-in is ready."
Honest preview, no clickbait / curiosity-gap. Promise only what the conversation delivers; never imply a cure; never imply we remember a free-text goal we don't store (§4 / Fitzpatrick 2017).
Initiation, not summary. Frame the value of the next two minutes — an invitation to open, not a recap.
Inherit the slot + week register. Morning present-not-cheerful; night quiet; spike branch warm-authoritative.
User-state value, not app demand. ❌ "Time for your daily session" ✅ "A short grounding to steady the morning."
No toxic positivity / wellness clichés; personalize only to slot / programme week / phase.

Mechanical floor (≤140, one sentence, no em dash, no banned/generic phrase) is enforced by conversation_lint.py; fidelity + engagement are scored by review --conversation.
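For authors who want to pre-check a push before running the linter, the mechanical floor can be approximated as below. This is a sketch, not conversation_lint.py: it reads "≤140" as characters (an assumption), uses a naive one-sentence test, and carries only a small sample of banned and generic phrases taken from this guide. The full ban lists live in ai-patterns-en.md / ai-patterns-de.md.

```python
import re

MAX_LEN = 140          # assumption: the "<=140" floor is counted in characters
EM_DASH = "\u2014"     # the em dash character, written as an escape
SAMPLE_BANNED = (      # sample only; phrases quoted in this guide
    "good morning",
    "you've got this",
    "wellness journey",
    "check-in is ready",
    "time for your daily session",
)


def push_problems(push: str) -> list[str]:
    """Return mechanical problems with a push hook; an empty list means none found."""
    problems = []
    if len(push) > MAX_LEN:
        problems.append("too long")
    # One sentence: no sentence-ending mark followed by more text.
    body = push.rstrip().rstrip(".!?")
    if re.search(r"[.!?]\s", body):
        problems.append("more than one sentence")
    if EM_DASH in push:
        problems.append("em dash")
    lowered = push.lower()
    problems.extend(f"banned phrase: {p}" for p in SAMPLE_BANNED if p in lowered)
    return problems
```

Passing this check says nothing about specificity or honesty of preview. Those are the six rules above and are scored in review.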

Honest note on the bond evidence

Woebot (3.84/5) and Wysa (3.98/5) reached therapeutic-bond scores comparable to human CBT, within ~5 days (high confidence — JMIR e27868, Frontiers 2022). Both lines of evidence attribute the bond substantially to free-text capacity. We should be clear-eyed: a scripted, chip-only coach is less likely to form that depth of bond, and the studies don't tell us how much a no-free-text coach can achieve. Our realistic target is trust and reliability (a coach that shows up, gets the hard nights right, and never patronises), not the parasocial "fun little dude" bond Woebot users reported. Treat bond as a hypothesis to validate (§7), not a promise.

5. Persona -- applying the evidence to our coach

The best-documented persona finding is genuinely useful to us (high confidence -- Darcy 2021; Prochaska et al., PMC8074987):

Be transparently non-human and non-therapist. Woebot deliberately presented as "an archetypal robot," explicitly "not a human or a therapist but a guided self-help coach," to avoid the uncanny valley -- and users still bonded with it. We should not write the coach pretending to be a person. (This also keeps us honest: it isn't one.)
Pair any limitation with warmth. Woebot referenced its own limits alongside empathic statements. For us: where the coach can't do something, say so kindly, and signpost (a real clinician, the SOS path) rather than overreaching.
Singular and consistent. First-person "I" (the coach), second-person "you," no institutional "we" (standard + voice.md). Every bubble should sound like the same person across the whole 12 weeks.

What we don't copy: Woebot's heavy emoji/gif reinforcement and jokey register sit against the Naluma voice ("firm warmth, like a good therapist -- not a meditation app"). Lightness is earned and rare here, not a constant texture.

6. What competitors teach

Confidence labels matter here. The Woebot/Wysa rows are vote-verified, peer-reviewed. The tinnitus-app rows are documented in a named source but were not independently verified in our research pass (rate-limiting cut the verification short) -- treat as competitive intel to sanity-check, not as established fact.

| App | Type | Pattern worth noting | Steal / Avoid / Can't | Confidence |
| --- | --- | --- | --- | --- |
| Woebot | Free-text CBT chatbot | Daily check-in opens with context + mood; mood-tailored short content; transparent-robot persona; proactive prompting + "it remembered me" | Steal: check-in-first structure, transparent persona, proactive follow-up. Can't: mood-contingent free-text replies. Avoid: gif/joke-heavy register. | High |
| Wysa | Free-text CBT/DBT chatbot ("penguin") + SOS flow | Free-text bond comparable to human CBT; dedicated crisis/SOS path | Steal: a clear distress/SOS path. Can't: the free-text bond mechanism itself. | High (bond); SOS flow unverified |
| Oto | Tinnitus CBT app | Reportedly ~52 sessions on a habituation track (~27-day suggested pace, 1-3/day); blends education + CBT/ACT + relaxation + sound therapy; explore screen by goal (Calm/Focus/Learn/Sleep) | Steal: goal-categorised library; education-before-technique sequencing. (Closest real analogue to Naluma.) | Documented, unverified |
| Beltone Tinnitus Calmer | Sound-therapy app (not a coach chat) | Sound + relaxation on a habituation rationale; usage visualised as "bubbles" sized by frequency | Steal: making the user's own usage visible. Note: not a conversational model. | Documented, unverified |
| Tinnibot | Tinnitus iCBT chatbot | ~10 min/day over 8 weeks; CBT + mindfulness + soundscapes; reported TFI improvement 42%→64% at 16-wk follow-up | Steal: short bounded daily dose. Verify before citing the efficacy numbers anywhere. | Documented, unverified |

Context note: Woebot Health wound down its consumer app in 2025 (reported: founder cited AI moving faster than regulators) (secondary sources — STAT, MobiHealthNews; unverified this pass). The design patterns above are 2017-2021 references; they're craft lessons, not descriptions of a currently-shipping product.

Cross-cutting lesson: the apps with the strongest bond evidence got it from free text we don't have. Our edge isn't conversational depth -- it's specificity to the tinnitus experience and to the programme stage, which a generic mental-health bot can't match. Lean into that.

7. Validation hooks -- how we'll know this works

This guide is grounded in evidence about free-text coaches and adapted, by judgement, to a scripted one. That adaptation is a set of hypotheses, not proven facts. Once conversations ship, validate against PostHog (the analytics taxonomy is owned by naluma-analytics):

Conversation completion rate by slot and programme week -- where do users drop mid-conversation?
Chip-branch distribution -- are the "hard" branches actually being selected? (If no one ever taps the distress chip, either the cohort is unusually well or the chip is mislabelled.)
Session-card tap-through from within conversations vs. cold from the library -- does embedding-after-motivation (§2) actually lift starts?
Day-2 / Day-7 return against conversation exposure -- the retention signal the whole programme rides on.
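As a sketch of the first metric, completion rate by slot and programme week can be computed from exported event rows. The event names `conversation_started` / `conversation_completed` and the row shape are hypothetical; the real taxonomy is owned by naluma-analytics, and no PostHog API is called here.

```python
from collections import Counter


def completion_rate(events: list[dict]) -> dict[tuple[str, int], float]:
    """Completed / started conversations, keyed by (slot, program_week)."""
    started, completed = Counter(), Counter()
    for event in events:
        key = (event["slot"], event["program_week"])
        if event["event"] == "conversation_started":
            started[key] += 1
        elif event["event"] == "conversation_completed":
            completed[key] += 1
    # Only keys with at least one start appear, so the division is always defined.
    return {key: completed[key] / n for key, n in started.items()}
```

A low rate for one slot and week points at where to read the script again. It does not say why users left.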

Until we have that data, label confident-sounding claims in review as what they are.

8. Worked example -- Week 3, night slot, breathing branch

Context: Week 3 (Relaxation & Basics phase -- breathing/PMR). Night slot. The user has already met the breathing technique earlier in the week, so this conversation reinforces it rather than introducing it. The check-in branches three ways; the annotated branch below is the hard one.

[received_bubble] Night three of the breathing week.

[received_bubble] How's the ringing right now?

Open (Engaging): two short bubbles, present not cheerful, anchored in the programme stage we actually know. No "Good evening!". The second bubble is the check-in prompt -- and the reply_chip_row below is the only "answer" we'll get, so it carries the load.

[reply_chip_row] · "Loud. Hard to settle." · "There, but I'm okay." · "Quieter than last night."

The chips cover the real range, hard end included (standard, element 2). Grade-6 language, 3-8 words each, none implying which is "correct."

Branch — tag_match on "Loud. Hard to settle.":

[received_bubble] A loud night. The evenings are when it's hardest for a lot of people -- you're not doing anything wrong.

Reflect + focus (Focusing): mirrors the chip back before doing anything (standard, element 3). Validates without minimising. Note the honest reframe, not "don't worry."

[received_bubble] When it's loud like this, the body tightens up to brace against it. That bracing is its own kind of noise.

[received_bubble] The breathing one works on that -- the bracing, the threat response -- not on the sound itself. Three minutes.

Evoke (Evoking): connects this state to why the technique helps, mechanistically and specifically (voice.md principle 2). It does not oversell -- it's explicit that it doesn't touch the sound. This is the righting-reflex discipline: rationale before the offer.

[session_card: breathing — "Settling the bracing" (3 min)]

[reply_chip_row] · "Start it now" · "Not tonight"

Offer + plan (Planning): card comes only after the motivation. The chip row builds in a genuine "no" path -- autonomy as structure, not just wording (§2). Both chips need a real authored branch.

[received_bubble — tag_match "Not tonight"] Fair. It'll be here tomorrow. Resting counts too.

Close: honest, no guilt-trip, no "you've got this." Lands the moment. "Resting counts too" is earned, understated acknowledgement (voice.md principle 4) -- not cheer.

Why the whole thing works: it opens in the programme we know, captures state with a chip set that includes the hard option, reflects before redirecting, earns the session with a specific mechanism, offers rather than commands, and closes without patronising. Every move is one a scripted system can actually make -- no faked memory, no free-text reflection, no question we can't hear the answer to.

Failure version of the same beat (for contrast):

❌ [received_bubble] Good evening! 🌙 Ready to continue your wellness journey?
❌ [session_card]
❌ [received_bubble] You've got this! Find your calm and let the stress melt away.

Fails on: banned opener ("Good evening!"), toxic positivity ("you've got this", "wellness journey", "melt away"), card before any motivation (skips the coaching), no check-in so no idea what state the user is in, emoji register that's off-voice.

Sources

Confidence reflects our research pass (two deep-research runs, 2026-06-11); several verifications were cut short by API rate-limiting and are marked "unverified this pass" -- meaning not yet vote-confirmed, not disproven. Re-verify before treating any unverified claim as settled, and cite Miller & Rollnick (2012, MI 3rd ed.) directly alongside the TIP if MI claims are ever externalised.

High confidence (vote-verified, peer-reviewed):
- SAMHSA TIP 35, Enhancing Motivation for Change -- MI four-process arc, righting reflex, reflection-to-question ratio. https://www.ncbi.nlm.nih.gov/books/NBK571068/
- Fitzpatrick, Darcy & Vierhile 2017, JMIR Mental Health (Woebot RCT) -- check-in structure, mood capture, proactive prompting, accountability. https://pmc.ncbi.nlm.nih.gov/articles/PMC5478797/
- Prochaska et al. (W-SUDs), PMC8074987 -- daily text-conversation unit; non-therapist coach persona. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8074987/
- Darcy et al. 2021, JMIR Formative Research (N=36,070) -- Woebot bond 3.84/5; transparent-robot persona. https://formative.jmir.org/2021/5/e27868
- Beatty et al. 2022, Frontiers in Digital Health (Wysa, N=1,205) -- bond 3.98/5; free-text capacity as bond driver. https://www.frontiersin.org/journals/digital-health/articles/10.3389/fdgth.2022.847991/full

Credibly sourced, unverified this pass (dos/don'ts & failure modes):
- arXiv 2602.22775 -- five relational failure modes (validation failure, minimisation, toxic positivity, distress mishandling, autonomy violation).
- JMIR e69709 (2025) -- chatbots over-validate / under-inquire; generic advice as a top failure. https://mental.jmir.org/2025/1/e69709
- arXiv 2504.18932 -- generic one-size-fits-all advice as a failure mode; communicate limitations.
- PMC6393822 -- autonomy-supportive word-swaps alone showed no significant effect (must be substantive). https://pmc.ncbi.nlm.nih.gov/articles/PMC6393822/

Documented, unverified this pass (competitor / tinnitus apps):
- Oto: PMC12575424; https://soundrelief.com/blogs/oto-the-app-thats-turning-heads-in-tinnitus-care
- Beltone Tinnitus Calmer reference guide (beltone.com).
- Tinnibot: Frontiers in Audiology & Otology 2023, fauot.2023.1302215.
- Woebot consumer wind-down 2025: STAT, MobiHealthNews (secondary).

Internal evidence base (always reconcile against these -- they are our ground truth, the above is supporting craft): coach-conversation.md, voice.md, docs/protocols/ (ACT/CBT/TRT), and the session-content evidence base.