naluma-app-editor-copilot: E2E Test Report
Date: 2026-06-09 · Plugin: naluma-app-editor-copilot@naluma-ai ·
Method: one real content item produced end-to-end through every shipped skill,
each handoff checked against its authoritative oracle (Directus schema, DB CHECK
constraints, editorial standard, ai-patterns).
Subject: insight session "The Habituation Model" (PSY-03) — chosen for real
source material, the longest skill chain, and an illustrate dependency.
Batch label: copilot-test-2026-06-09.
Outcome in one line
The text production chain works: research → write-session → review → humanize produce schema-valid, well-grounded, on-voice content. The publish/media path is architecturally broken: images and audio must go direct to R2, not via Directus. The offline gates also under-validate: editorial placement fields aren't checked against the DB constraints, which produces a false green. No dev rows were written; the run stopped at the publish gate by choice.
Per-step results
| # | Skill | Result | Notes |
|---|---|---|---|
| 1 | naluma-context | ✅ PASS | paths resolve; --verify green |
| 2 | research | ✅ PASS | dossier grounded in real PMIDs (Umashankar 2025, Gold 2021, Henry 2023, Thompson 2017); no fabrication; flagged the PMC8632517 gap |
| 3 | write-session | ✅ PASS (content) | content payload validates vs insight.schema.json; 4 cards in-band; correct flags. But the editorial placement fields are wrong (see I/J) |
| 4 | illustrate | ✅ PASS | enforced category gate; 2 distinct scenes; in-palette square WebP; provenance written |
| 5 | review | ✅ PASS | advisory, read-only, dossier-traced; factuality 88 / style 95; sidecar written |
| 6 | humanize | ✅ PASS | idempotent (0 banned patterns; prose byte-identical) |
| 7 | pipeline | ⚠️ PARTIAL | routing correct; --item path bug (G) crashes documented usage |
| 8 | publish | ⛔ BLOCKED | offline content-validation passed, but parent-row vocab unchecked (K), program_week/pain_points invalid as drafted (I/J), and media→R2 transport is wrong (M) |
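The --item bug in step 7 (finding G) comes down to where a relative item path gets anchored. The following is a minimal sketch of the intended resolution, not the plugin's actual code: the helper name resolve_item is hypothetical, and it assumes pipeline.py receives --root and --item as plain strings.

```python
from pathlib import Path


def resolve_item(root: str, item: str) -> Path:
    """Resolve --item against --root unless it is already absolute.

    Hypothetical helper: the name and signature are illustrative and are
    not taken from pipeline.py.
    """
    root_path = Path(root).expanduser().resolve()
    item_path = Path(item).expanduser()
    # An absolute --item is used as given; a relative one is anchored at
    # --root rather than at the current working directory.
    if not item_path.is_absolute():
        item_path = root_path / item_path
    return item_path
```

With this shape, a relative item such as sessions/insight/the-habituation-model.md resolves under the root, and the absolute-path workaround keeps working unchanged.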
Findings
| ID | Finding | Class | Repo | Sev | Issue |
|---|---|---|---|---|---|
| A | research SKILL.md says naluma-context "not yet shipped" (it shipped, PR #116) | stale-instruction | marketplace | Low | folded into cleanups |
| B+C | session-insight.md mandates grounding in PMC8632517/evidence-base; research educational mode forbids non-article sources → conflicting canon | cross-repo inconsistency | app-content ↔ marketplace | Med | filed |
| D | illustrate correctly refused on unconfirmed category | n/a (positive) | n/a | n/a | n/a |
| E | OPENAI_API_KEY has no documented local provisioning (Fly-only); blocks illustrate locally | env/doc gap | naluma-root | Med | filed |
| F | Flat assets/illustrations/ + split research/+sessions/ scatter one item across 3 trees → colocate per-item folders | IA improvement | app-content (+marketplace) | Med | filed |
| G | pipeline.py --item not resolved relative to --root → documented usage crashes (absolute path works) | bug | marketplace | Low-Med | filed |
| H | publish accepts status: humanized, bypassing the final editor sign-off the lifecycle mandates | consistency | marketplace | Low | folded into cleanups |
| I | write-session emits program_week: null; schema is NOT NULL DEFAULT 1 (1 = always available) → should default to 1 | schema-drift | marketplace | Med | filed (w/ J) |
| J | write-session/dossier emit pain_points+segment_affinity as free-text, not the DB controlled vocab → CHECK rejects the upsert | bug/schema-drift | marketplace | Med-High | filed (w/ I) |
| K | publish validate.py doesn't validate parent-row editorial fields vs DB CHECK constraints → false green, fails server-side | gate-gap | marketplace | Med-High | filed |
| L | publish curl uses $DIRECTUS_URL/$DIRECTUS_TOKEN; actual env names are NALUMA_DIRECTUS_MCP_URL/_TOKEN | doc bug | marketplace | Low | folded into cleanups |
| M | publish media transport wrong: image and audio must be S3-PUT direct to the R2 bucket + a content.assets/audio_assets row (r2_key), not Directus /files. Reference: naluma-app/tools/content/seed-dummy | architecture | marketplace (+docs) | High | filed |
| N | Directus MCP role can't read schema (directus_collections FORBIDDEN) → publish must rely on committed snapshot.yaml/JSON schemas, not MCP introspection | infra/doc | marketplace/directus | Low | folded into cleanups |
| n/a | App-side: relax content.sessions.pain_points so unmapped content can be unset | content-model | naluma-app | Med | #624 (filed) |
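Findings I, J and K share one root cause: nothing offline checks the parent-row placement fields against what the DB enforces. The following is a rough sketch of such a gate under stated assumptions. The function name check_parent_row is hypothetical, the vocabulary sets are placeholders (the real values live in the DB CHECK constraints and are not reproduced here), and pain_points and segment_affinity are assumed to be lists of strings.

```python
from typing import Any

# Placeholder vocabularies: the real controlled values live in the DB CHECK
# constraints and are NOT reproduced here.
PAIN_POINTS_VOCAB = frozenset({"example_pain_point_a", "example_pain_point_b"})
SEGMENT_AFFINITY_VOCAB = frozenset({"example_segment_a", "example_segment_b"})


def check_parent_row(row: dict[str, Any]) -> list[str]:
    """Return human-readable violations for the editorial placement fields."""
    errors: list[str] = []

    # Finding I: program_week is NOT NULL DEFAULT 1, so null must not be sent.
    week = row.get("program_week")
    if not isinstance(week, int) or isinstance(week, bool):
        errors.append(f"program_week must be an integer, got {week!r}")

    # Finding J: every value must come from the controlled vocabulary.
    for field, vocab in (
        ("pain_points", PAIN_POINTS_VOCAB),
        ("segment_affinity", SEGMENT_AFFINITY_VOCAB),
    ):
        values = row.get(field) or []
        if isinstance(values, str):
            # A free-text string is treated as a single (invalid) value.
            values = [values]
        unknown = [v for v in values if v not in vocab]
        if unknown:
            errors.append(f"{field}: values outside controlled vocab: {unknown}")
    return errors
```

Run before the upsert, a check of this kind would have turned the false green in step 8 into a local failure listing the offending fields.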
What we did NOT do
- No dev write: stopped at the publish gate; no content.sessions rows, no R2 objects, nothing to clean up in admin.
- No coach conversation run (the second, gotcha-heavy case): deferred; the session loop came first.
- No real R2 upload: that's the publish-skill redesign (Issue M), not something to improvise on the paid bucket.
Artifacts produced (working files, uncommitted)
- research/insight/the-habituation-model.md: the dossier
- sessions/insight/the-habituation-model.md: a valid, publishable insight draft (status: humanized, scores 88/95)
- sessions/insight/the-habituation-model.review.md: review sidecar (gitignored)
- assets/illustrations/sessions/the-habituation-model.webp: hero image (gitignored)
- ~/.naluma-env/openai.env: local OpenAI key (created during the run; provisions illustrate)
Recommendation
The text chain is production-ready. Prioritise the publish redesign (M, High):
mirror seed-dummy's direct-R2 + content.assets pattern for image and audio. Next,
close the validation gap (K) and fix write-session placement (I/J); together these
currently let publish pass a row that the DB rejects. The colocation refactor (F)
is a good quality-of-life follow-up once publish is sound.
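As a rough illustration of the direct-R2 pattern recommended for M, the sketch below PUTs one file to an S3-compatible bucket and returns the row to record. It is not seed-dummy's code: the function name upload_media and its parameters are hypothetical, the client is passed in rather than constructed, and r2_key is the only row column the findings name, so no other columns are shown.

```python
from pathlib import Path
from typing import Any


def upload_media(s3_client: Any, bucket: str, local_path: str, r2_key: str,
                 content_type: str) -> dict[str, str]:
    """PUT one media file straight to the bucket and return the row to record.

    s3_client is any S3-compatible client exposing put_object, for example a
    boto3 S3 client created with endpoint_url pointing at the R2 S3 endpoint.
    Hypothetical helper: names and signature are illustrative only.
    """
    body = Path(local_path).read_bytes()
    s3_client.put_object(Bucket=bucket, Key=r2_key, Body=body,
                         ContentType=content_type)
    # The caller would persist this as a content.assets or audio_assets row;
    # how that row is written is out of scope for this sketch.
    return {"r2_key": r2_key}
```

Passing the client in keeps the transport testable with a stub, which matters here because the run deliberately avoided writes to the paid bucket.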
Decisions (2026-06-09 discussion) → issues adjusted
- Q1 (grounding). Insights (educational mode only) use article-as-ground-truth: ground in the finished, quality_gates_passed article, inherit its citations verbatim, no PMID re-verification; review factuality becomes article-fidelity. Technique mode unchanged. → naluma-app-content#19 (standard rewrite) + naluma-ai-marketplace#128 (research+review contracts).
- Q2 (card guidance). Adopt 3–8 cards (was 3–5); harden in insight.schema.json (minItems 3 / maxItems 8) + review.py card-count check; validate the 40–80 word band on iPhone SE. → naluma-app-content#21.
- Q3 (read-more link). Add a conditional "Read the full article" link as a learn-more/footer affordance (in-app browser), not a terminal card; needs a source-article field on the content model + app UI + copilot carry-forward + analytics event. → naluma-app#625.
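The Q2 card-count check could look roughly like the sketch below. It is illustrative only and not the contents of review.py: the function name check_cards is hypothetical, cards are assumed to arrive as plain-text strings, the 40–80 word band is assumed to apply per card, and words are counted by whitespace splitting.

```python
MIN_CARDS, MAX_CARDS = 3, 8    # proposed band from Q2 (was 3 to 5)
MIN_WORDS, MAX_WORDS = 40, 80  # word band from Q2, assumed to be per card


def check_cards(cards: list[str]) -> list[str]:
    """Return advisory findings for card count and per-card word count."""
    findings: list[str] = []
    if not MIN_CARDS <= len(cards) <= MAX_CARDS:
        findings.append(
            f"card count {len(cards)} outside {MIN_CARDS}-{MAX_CARDS}"
        )
    for index, text in enumerate(cards, start=1):
        words = len(text.split())
        if not MIN_WORDS <= words <= MAX_WORDS:
            findings.append(
                f"card {index}: {words} words outside {MIN_WORDS}-{MAX_WORDS}"
            )
    return findings
```

Returning findings instead of raising keeps the check advisory, in line with how the review step is described in the per-step results.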