naluma-app-editor-copilot: E2E Test Report

Date: 2026-06-09 · Plugin: naluma-app-editor-copilot@naluma-ai · Method: one real content item produced end-to-end through every shipped skill, with each handoff checked against its authoritative oracle (Directus schema, DB CHECK constraints, editorial standard, ai-patterns). Subject: insight session "The Habituation Model" (PSY-03), chosen because it has real source material, exercises the longest skill chain, and depends on illustrate. Batch label: copilot-test-2026-06-09.

Outcome in one line

The text production chain works well: research → write-session → review → humanize produces schema-valid, well-grounded, on-voice content. Two parts are not sound:

- The publish/media path is architecturally broken. Images and audio must go directly to R2, not via Directus.
- The offline gates under-validate. Editorial placement fields are not checked against the DB constraints, which produces a false green.

No dev rows were written; we stopped at the publish gate by choice.

Per-step results

| # | Skill | Result | Notes |
|---|---|---|---|
| 1 | naluma-context | ✅ PASS | paths resolve; `--verify` green |
| 2 | research | ✅ PASS | dossier grounded in real PMIDs (Umashankar 2025, Gold 2021, Henry 2023, Thompson 2017); no fabrication; flagged the PMC8632517 gap |
| 3 | write-session | ✅ PASS (content) | `content` payload validates against `insight.schema.json`; 4 cards in-band; correct flags. Editorial placement fields are wrong, however (see I/J) |
| 4 | illustrate | ✅ PASS | enforced the category gate; 2 distinct scenes; in-palette square WebP; provenance written |
| 5 | review | ✅ PASS | advisory, read-only, dossier-traced; factuality 88 / style 95; sidecar written |
| 6 | humanize | ✅ PASS | idempotent (0 banned patterns; prose byte-identical) |
| 7 | pipeline | ⚠️ PARTIAL | routing correct; the `--item` path bug (G) crashes the documented usage |
| 8 | publish | ⛔ BLOCKED | offline content validation passed, but parent-row vocab is unchecked (K), `program_week`/`pain_points` are invalid as drafted (I/J), and the media→R2 transport is wrong (M) |

Findings

| ID | Finding | Class | Repo | Sev | Issue |
|---|---|---|---|---|---|
| A | `research` SKILL.md says naluma-context is "not yet shipped" (it shipped in PR #116) | stale-instruction | marketplace | Low | folded into cleanups |
| B+C | `session-insight.md` mandates grounding in PMC8632517/evidence-base, but `research` educational mode forbids non-article sources → conflicting canon | cross-repo inconsistency | app-content ↔ marketplace | Med | filed |
| D | `illustrate` correctly refused to run while `category` was unconfirmed | positive (no action) | n/a | n/a | n/a |
| E | `OPENAI_API_KEY` has no documented local provisioning (Fly-only), which blocks `illustrate` locally | env/doc gap | naluma-root | Med | filed |
| F | Flat `assets/illustrations/` plus separate `research/` and `sessions/` trees scatter one item across 3 trees → colocate into per-item folders | IA improvement | app-content (+marketplace) | Med | filed |
| G | `pipeline.py --item` is not resolved relative to `--root`, so the documented usage crashes (an absolute path works) | bug | marketplace | Low-Med | filed |
| H | `publish` accepts `status: humanized`, bypassing the `final` editor sign-off the lifecycle mandates | consistency | marketplace | Low | folded into cleanups |
| I | `write-session` emits `program_week: null`, but the schema declares it `NOT NULL DEFAULT 1` (1 = always available) → should default to 1 | schema-drift | marketplace | Med | filed (with J) |
| J | `write-session`/dossier emit `pain_points` and `segment_affinity` as free text rather than the DB controlled vocab → the CHECK constraint rejects the upsert | bug/schema-drift | marketplace | Med-High | filed (with I) |
| K | `publish` `validate.py` doesn't validate parent-row editorial fields against the DB CHECK constraints → false green offline, failure server-side | gate-gap | marketplace | Med-High | filed |
| L | `publish` curl uses `$DIRECTUS_URL`/`$DIRECTUS_TOKEN`; the actual env names are `NALUMA_DIRECTUS_MCP_URL/_TOKEN` | doc bug | marketplace | Low | folded into cleanups |
| M | `publish` media transport is wrong: image and audio must be uploaded by S3 PUT directly to the R2 bucket and recorded in a `content.assets`/`audio_assets` row (`r2_key`), not sent to Directus `/files`. Reference: `naluma-app/tools/content/seed-dummy` | architecture | marketplace (+docs) | High | filed |
| N | The Directus MCP role can't read the schema (`directus_collections` FORBIDDEN) → `publish` must rely on the committed `snapshot.yaml`/JSON schemas, not MCP introspection | infra/doc | marketplace/directus | Low | folded into cleanups |
| n/a | App-side: relax `content.sessions.pain_points` so unmapped content can be unset | content-model | naluma-app | Med | #624 (filed) |
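
Finding G can be illustrated with a minimal sketch. It assumes `pipeline.py` parses its flags with `argparse` and that a relative `--item` should always be read under `--root`. The `--root` and `--item` flag names come from the finding. `resolve_item`, `parse_args` and the current-directory default for `--root` are hypothetical, not the script's actual code.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def resolve_item(root: Path | str, item: str) -> Path:
    """Join a relative --item onto --root; leave absolute paths untouched."""
    item_path = Path(item).expanduser()
    if not item_path.is_absolute():
        # Finding G: a relative --item must be anchored at --root.
        item_path = Path(root).expanduser() / item_path
    # resolve() normalises the path without requiring it to exist yet.
    return item_path.resolve()


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of pipeline.py path handling")
    parser.add_argument("--root", type=Path, default=Path.cwd(),
                        help="content repo root (hypothetical default: the CWD)")
    parser.add_argument("--item", required=True,
                        help="item path, absolute or relative to --root")
    args = parser.parse_args(argv)
    # Resolve once, at parse time, so every later step sees the same path.
    args.item = resolve_item(args.root, args.item)
    return args
```

Resolving at parse time means the documented relative usage and the absolute-path workaround end up identical before any skill is routed.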

What we did NOT do

- No dev write: we stopped at the publish gate. There are no `content.sessions` rows, no R2 objects, and nothing to clean up in admin.
- No coach conversation run (the second, gotcha-heavy case): deferred so that the session loop could come first.
- No real R2 upload: that belongs to the publish-skill redesign (M) and is not something to improvise on the paid bucket.

Artifacts produced (working files, uncommitted)

- `research/insight/the-habituation-model.md`: the dossier
- `sessions/insight/the-habituation-model.md`: a schema-valid insight draft (`status: humanized`, scores 88/95); its placement fields still need the I/J fixes
- `sessions/insight/the-habituation-model.review.md`: review sidecar (gitignored)
- `assets/illustrations/sessions/the-habituation-model.webp`: hero image (gitignored)
- `~/.naluma-env/openai.env`: local OpenAI key (created during the run; provisions `illustrate`)

Recommendation

The text chain is production-ready. Prioritise the publish redesign (M, High), mirroring seed-dummy's direct-R2 + `content.assets` pattern for image and audio. Then close the validation gap (K) and fix write-session placement (I/J). Together, K and I/J are why publish currently produces a row the DB rejects. The colocation refactor (F) is a good quality-of-life follow-up once publish is sound.
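
Closing K could mean a pre-flight check in `publish` that mirrors the parent-row constraints before any upsert. The sketch below also enforces the `final` sign-off from H. It makes these assumptions:

- The row is a plain dict.
- `pain_points` and `segment_affinity` are lists of vocabulary codes.
- The caller passes in the draft's lifecycle status.

`normalize_row`, `validate_parent_row` and both vocabulary sets are hypothetical placeholders. The real allowed values are defined by the DB CHECK constraints, captured in the committed `snapshot.yaml` (see N).

```python
from __future__ import annotations

# HYPOTHETICAL placeholder vocabularies. The real allowed values come from the
# content.sessions CHECK constraints (e.g. via the committed snapshot.yaml)
# and should be loaded from there, not hard-coded.
PAIN_POINTS_VOCAB = frozenset({"pain_point_a", "pain_point_b"})
SEGMENT_AFFINITY_VOCAB = frozenset({"segment_a", "segment_b"})


def normalize_row(row: dict) -> dict:
    """Finding I: program_week is NOT NULL DEFAULT 1, so map null to 1."""
    fixed = dict(row)
    if fixed.get("program_week") is None:
        fixed["program_week"] = 1
    return fixed


def validate_parent_row(row: dict, status: str) -> list[str]:
    """Return every reason the upsert should be refused; empty means go."""
    errors: list[str] = []
    # Finding H: only an editor-signed-off 'final' item may be published.
    if status != "final":
        errors.append(f"status is {status!r}; publish requires 'final'")
    # Finding I: the column is NOT NULL, so None must never reach the DB.
    if not isinstance(row.get("program_week"), int):
        errors.append("program_week must be an integer (column is NOT NULL)")
    # Findings J/K: controlled-vocab fields may contain only known codes.
    for name, vocab in (("pain_points", PAIN_POINTS_VOCAB),
                        ("segment_affinity", SEGMENT_AFFINITY_VOCAB)):
        # An empty or missing value passes here; the real rule may differ (#624).
        values = row.get(name) or []
        if isinstance(values, str):
            errors.append(f"{name} is free text; expected a list of vocab codes")
            continue
        unknown = sorted(set(values) - vocab)
        if unknown:
            errors.append(f"{name} has values outside the vocab: {unknown}")
    return errors
```

Running `normalize_row` before `validate_parent_row` repairs I at the source. Any errors that remain are the ones the gate should fail on offline, instead of surfacing server-side.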

Decisions (2026-06-09 discussion) → issues adjusted

Q1: grounding. Insights (educational mode only) treat the article as ground truth. They ground in the finished article that has passed its quality gates (`quality_gates_passed`), inherit its citations verbatim, and skip PMID re-verification. Review's factuality check becomes an article-fidelity check. Technique mode is unchanged. → naluma-app-content#19 (standard rewrite) + naluma-ai-marketplace#128 (research + review contracts).

Q2: card guidance. Adopt 3–8 cards (was 3–5). Enforce the count in `insight.schema.json` (`minItems: 3`, `maxItems: 8`) and with a card-count check in `review.py`. Validate the 40–80 word band on iPhone SE. → naluma-app-content#21.

Q3: read-more link. Add a conditional "Read the full article" link as a learn-more/footer affordance that opens in the in-app browser, not as a terminal card. This needs:

- a source-article field on the content model
- app UI
- copilot carry-forward
- an analytics event

→ naluma-app#625.
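
The Q2 card rule could be mirrored in `review.py` along the lines of the sketch below. It makes these assumptions:

- Cards arrive as a list of plain-text strings.
- The 40–80 word band applies to each card.
- A word is a whitespace-separated token.

`check_cards` and `count_words` are hypothetical names. The schema's `minItems`/`maxItems` stay the source of truth for the count. The word count is only a proxy and does not replace the on-device iPhone SE check.

```python
from __future__ import annotations

MIN_CARDS, MAX_CARDS = 3, 8    # Q2: minItems 3 / maxItems 8
MIN_WORDS, MAX_WORDS = 40, 80  # Q2: word band, assumed here to apply per card


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def check_cards(cards: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the cards pass."""
    problems: list[str] = []
    if not MIN_CARDS <= len(cards) <= MAX_CARDS:
        problems.append(f"{len(cards)} cards; expected {MIN_CARDS}-{MAX_CARDS}")
    for number, card in enumerate(cards, start=1):
        words = count_words(card)
        if not MIN_WORDS <= words <= MAX_WORDS:
            problems.append(f"card {number}: {words} words; expected {MIN_WORDS}-{MAX_WORDS}")
    return problems
```

Reporting the card number and word count gives the editor something actionable when a card falls outside the band. That fits review's advisory, read-only role.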