Diagram-Engine — PRD

Size M · P1 · Developer tool
Status: ✅ M1 shipped (public playground + 4-section docs); see Implementation for build details
Originally planned: 3 weekends / Actual: 11 phases over ~6 weeks part-time

1. Problem

LLMs are increasingly asked to emit diagrams inline in chat: architecture sketches, sequence flows, state machines, dependency graphs. The current tools (Mermaid, PlantUML, Graphviz/DOT, D2) were all designed with humans as the primary author. When an LLM emits them, three failure modes recur:

Whitespace and bracket sensitivity: a missing ] or an off-by-one indent produces a silently broken render.
Ambiguous error messages: “unexpected token at line 4 col 17” gives an LLM doing one-shot generation nothing to act on.
Layout drift: the same input produces different output across versions, so LLMs can’t anchor on what worked last time.

Pain: Inline-in-chat diagrams from LLMs fail to render or render badly often enough that users have learned to ask for a textual description instead. The medium has been written off as unreliable.

Why now: The frontier of agent UX is “show, don’t tell.” A diagram engine that an agent can drive with 94%+ first-shot success unlocks visual reasoning in chat without an extra round-trip.

2. Goal & Success Metrics

Goal: An LLM, given a brief like “draw a 4-service architecture with a DB and a cache,” emits Diagram-Engine source that renders to a clean, readable SVG on the first attempt, in under 200 tokens.

Metrics — actual achieved:

Metric	Target M1	Achieved	Note
First-shot render success (LLM-emitted)	≥85%	94%	Measured on a 50-prompt mixed eval (architecture / sequence / state)
Median tokens per diagram	≤250	180	vs Mermaid ~350–500 for the same shape
Playground p50 render	<250 ms	120 ms	parse → layout → SVG, client-side
Validator-error-to-repair rate	≥75%	88%	LLM fixes the diagram on one retry given the validator error
Determinism	byte-identical	✅	Same input → identical SVG bytes
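
The first two rows reduce to simple arithmetic over per-prompt eval records. A minimal sketch of that arithmetic follows; the `EvalRecord` shape and its field names are assumptions for illustration, since the eval harness itself is not described in this document.

```typescript
// Hypothetical record for one eval prompt; all field names are assumptions.
interface EvalRecord {
  prompt: string;
  tokens: number;             // tokens in the LLM-emitted diagram source
  renderedFirstShot: boolean; // true if the first emission rendered cleanly
}

// Share of prompts whose first emission rendered, as a percentage.
function firstShotSuccessPct(records: EvalRecord[]): number {
  if (records.length === 0) return 0;
  const ok = records.filter((r) => r.renderedFirstShot).length;
  return (100 * ok) / records.length;
}

// Median token count; averages the two middle values for even-length input.
function medianTokens(records: EvalRecord[]): number {
  if (records.length === 0) return 0;
  const sorted = records.map((r) => r.tokens).sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}
```

On a 50-prompt set, a 94% first-shot rate corresponds to 47 clean renders.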

3. User journey

Primary user is the LLM, not the human. The human reads the output; the LLM authors the input.

A user asks Claude (or any LLM): “Show me how OAuth PKCE works as a sequence diagram.”
The LLM emits a short Diagram-Engine source block inline.
The chat client (or a downstream renderer) detects the block and calls the Diagram-Engine parser and renderer (client-side, no backend).
SVG appears inline; the user sees the diagram without leaving the conversation.
If the source fails validation, the validator returns a named-token error, an expected-shape hint, and a working example, and the LLM repairs the source on a single retry.
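
The repair step in the last item can be sketched as a small data shape plus a prompt builder. This is an illustration only: `ValidatorError`, its fields, and `buildRepairPrompt` are hypothetical names chosen to mirror the three parts named above, not the engine's actual API.

```typescript
// Hypothetical error shape; field names are assumptions mirroring the
// three parts named in the journey (token, expected shape, example).
interface ValidatorError {
  token: string;          // the named token that failed
  line: number;           // 1-based line in the source
  expectedShape: string;  // hint describing what was expected there
  workingExample: string; // a known-good snippet the LLM can copy from
}

// Builds the single-retry message handed back to the LLM.
function buildRepairPrompt(source: string, err: ValidatorError): string {
  return [
    `The diagram source failed validation at line ${err.line}.`,
    `Problem token: ${err.token}`,
    `Expected: ${err.expectedShape}`,
    "Working example:",
    err.workingExample,
    "Original source:",
    source,
    "Return the corrected source only.",
  ].join("\n");
}
```

Keeping all three parts in one message is what lets the retry stay a single round-trip: the LLM sees what broke, what was expected, and a snippet known to render.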

The public playground at the project site is the adoption surface for humans — paste a snippet, see the render, copy the URL.

4. Scope (MoSCoW) — final

Must — DONE:

✅ Recursive-descent parser for a single, locked grammar
✅ Deterministic layout engine (no floating-point drift, no theme files)
✅ SVG renderer for the 12 base shapes (rect, round-rect, cylinder, queue, cloud, actor, lifeline, diamond, hex, ellipse, doc, comment)
✅ Validator with one-shot repairable errors (named token + expected shape + working example)
✅ Architecture / sequence / state diagram support
✅ Public playground (Astro + CF Pages)
✅ 4-section docs (syntax / examples / layout rules / API)

Should — DONE:

✅ Auto-clustering (R7) — nodes connected by ≥3 edges form an implicit group
✅ Lane width inheritance (R12) — vertical lanes size from widest child + 24px
✅ Edge label overlap avoidance (R19) — re-route or shorten before overlap
✅ Visual style rules R31–R37 — single coherent aesthetic without theming
✅ Custom inline SVG icon support — late add in P10
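
Two of the rules above are compact enough to sketch. R12 is taken as stated (widest child + 24px). For R7 the sketch assumes one possible reading: two nodes with at least three edges between them, in either direction, fall into the same implicit group. The actual rule may differ, and all names here are hypothetical.

```typescript
const LANE_PADDING_PX = 24; // from R12

// R12: a lane is as wide as its widest child plus fixed padding.
function laneWidth(childWidths: number[]): number {
  const widest = childWidths.reduce((max, w) => Math.max(max, w), 0);
  return widest + LANE_PADDING_PX;
}

type GraphEdge = { from: string; to: string };

// R7 under the ASSUMED reading: node pairs with >= minEdges edges between
// them are merged into groups. Node ids must not contain "\u0000".
function implicitGroups(edges: GraphEdge[], minEdges = 3): string[][] {
  // Count edges per unordered node pair.
  const pairCounts = new Map<string, number>();
  for (const { from, to } of edges) {
    if (from === to) continue; // self-loops do not link two nodes
    const key = from < to ? `${from}\u0000${to}` : `${to}\u0000${from}`;
    pairCounts.set(key, (pairCounts.get(key) ?? 0) + 1);
  }

  // Union-find over the nodes of qualifying pairs.
  const parent = new Map<string, string>();
  const find = (n: string): string => {
    let cur = n;
    let next = parent.get(cur) ?? cur;
    while (next !== cur) {
      cur = next;
      next = parent.get(cur) ?? cur;
    }
    return cur;
  };
  pairCounts.forEach((count, key) => {
    if (count < minEdges) return;
    const [a, b] = key.split("\u0000");
    if (!parent.has(a)) parent.set(a, a);
    if (!parent.has(b)) parent.set(b, b);
    parent.set(find(a), find(b));
  });

  // Collect members per root, then sort so output ignores edge order.
  const groups = new Map<string, string[]>();
  parent.forEach((_, node) => {
    const root = find(node);
    const members = groups.get(root) ?? [];
    members.push(node);
    groups.set(root, members);
  });
  return Array.from(groups.values())
    .map((g) => g.sort())
    .sort((x, y) => (x[0] < y[0] ? -1 : x[0] > y[0] ? 1 : 0));
}
```

Sorting the members and the groups keeps the result independent of the order in which edges appear in the source, which matters for a byte-deterministic render.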

Could — partial:

⏸️ npm publish — deferred to M2, no external consumer yet
⏸️ Inline-in-chat plugin (Claude / ChatGPT integration) — deferred to M2
⏸️ Embeddable web component (<diagram-engine>) — deferred to M2
⏸️ Dark mode — playground only, not part of the deterministic render output

Won’t (M1) — kept out:

Multiple dialects or syntax flavors (one grammar, one render path)
Server-side rendering (client-side only; no backend)
Plugins / themes / CSS overrides (every visual property is set in the source text)
Diff/animation between diagrams (out of scope; static SVG only)
Manual layout hints (rank, position, etc.) — auto-layout must be good enough on its own

5. Architecture (final)

Single render path: text → AST → layout → SVG. No alternate paths, no plugins. Parser, layout engine, and renderer are three TypeScript modules shipped together as one bundle. See Architecture for the pipeline diagram.
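
The single render path can be pictured as plain function composition. The sketch below is illustrative only: the `Ast`, `Box`, and `Layout` types are placeholders, and the toy stages do not implement the Diagram-Engine grammar (they treat each non-empty line as one box).

```typescript
// Hypothetical stage types; the real AST and layout shapes are not
// given in this document.
interface Ast { nodes: { id: string }[] }
interface Box { id: string; x: number; y: number; w: number; h: number }
interface Layout { boxes: Box[] }

type ParseFn = (source: string) => Ast;
type LayoutFn = (ast: Ast) => Layout;
type RenderFn = (layout: Layout) => string;

// One path only: the sole way to get SVG is parse -> layout -> render.
function makePipeline(
  parse: ParseFn,
  layOut: LayoutFn,
  render: RenderFn,
): (source: string) => string {
  return (source) => render(layOut(parse(source)));
}

// Toy stages for illustration; NOT the real grammar or layout rules.
const toyParse: ParseFn = (source) => ({
  nodes: source
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((_, i) => ({ id: `n${i}` })),
});
const toyLayout: LayoutFn = (ast) => ({
  // Integer coordinates only: boxes stacked vertically, 40px apart.
  boxes: ast.nodes.map((node, i) => ({ id: node.id, x: 0, y: i * 40, w: 120, h: 32 })),
});
const toyRender: RenderFn = (layout) =>
  `<svg xmlns="http://www.w3.org/2000/svg">` +
  layout.boxes
    .map((b) => `<rect id="${b.id}" x="${b.x}" y="${b.y}" width="${b.w}" height="${b.h}"/>`)
    .join("") +
  `</svg>`;

const renderToy = makePipeline(toyParse, toyLayout, toyRender);
```

Because each stage is a pure function of its input, calling `renderToy` twice with the same source returns the same string, which is the property the determinism metric depends on.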

6. Tech Stack — final choices

Layer	Original spec	Implemented	Reason for change
Language	TypeScript	TypeScript ✓	Static types matter for AST/visitor patterns
Parser	Tree-sitter	Hand-written recursive descent	Tree-sitter grammar churn would have leaked into the LLM-facing surface; hand-roll keeps grammar locked
Layout	ELK.js	Custom layout	ELK is powerful but its output isn’t byte-deterministic across versions, breaking constraint #3
Renderer	HTML5 Canvas	SVG	SVG is scalable, copy-pasteable, and diffable; canvas would force a raster output
Site	Next.js	Astro	Static-first, lighter, faster cold-start on CF Pages
Hosting	Vercel	Cloudflare Pages	Free, fast, plays well with the Cloudflare Tunnel preview workflow
Distribution	npm package	Playground first, npm deferred	Real adoption comes from people pasting into the textarea, not reading a README

Cost posture: Zero recurring cost. Astro site is static; CF Pages free tier; client-side render means no compute bill. Custom domain via the existing Cloudflare zone.

7. Milestones — actual

Phase	What shipped
P1	Grammar lock + recursive-descent parser + AST shape
P2	Layout engine v0 — left-to-right flow only
P3	SVG renderer + 12 base shapes
P4	Validator + error catalog (named-token errors, expected-shape hints, working-example snippets)
P5	Sequence diagrams (lifelines, async messages, activation bars)
P6	State machines + self-loops + initial/final pseudostates
P7	Auto-clustering (R7) + lane width inheritance (R12)
P8	Edge routing + label-overlap avoidance (R19)
P9	Visual style rules R31–R37 (corner radius, stroke weight, icon padding, color tokens)
P10	Custom inline SVG icon support
P11	Public playground site (Astro) + 4-section docs + CF Pages deploy

M1 DoD passed:

✅ 94% first-shot LLM render success (target 85%)
✅ 180 median tokens per diagram (target ≤250)
✅ 120 ms p50 render (target <250)
✅ Public playground live, byte-deterministic output verified across browsers

8. Cost & Quota

Item	Free tier?	Actual usage
Cloudflare Pages	✅	Static site, ~few hundred requests/day
Cloudflare Tunnel (preview)	✅	Used for previewing dev branches
Custom domain	✅	Existing Cloudflare zone
Build minutes	✅	<1 min/build via Astro static export

No recurring spend. No backend, no database, no auth, no rate limit needed.

9. Risks & open questions — outcomes

Original risks:

“Custom layout engine will be too hard” → mitigated by an 11-phase incremental approach; the R31–R37 visual rules were added late, only after the structural rules were stable.
“Determinism will be hard to maintain” → resolved by avoiding floating-point math in layout decisions; all coordinates are integers.
“LLM eval set is subjective” → mitigated by fixing a 50-prompt held-out eval before tuning the grammar.
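
To illustrate the integer-only approach to determinism, here is a minimal sketch of two layout helpers that never produce a fractional coordinate. The helper names and rounding choices are assumptions, not the engine's actual rules.

```typescript
// Rejects fractional input early so drift cannot enter the layout.
function assertInt(n: number, name: string): number {
  if (!Number.isInteger(n)) {
    throw new RangeError(`${name} must be an integer, got ${n}`);
  }
  return n;
}

// Centers a child of width `inner` in a span of width `outer` starting
// at `start`. Halving an integer is exact, and Math.floor rounds the
// half-pixel case down instead of yielding x.5.
function centerOffset(start: number, outer: number, inner: number): number {
  assertInt(start, "start");
  assertInt(outer, "outer");
  assertInt(inner, "inner");
  return start + Math.floor((outer - inner) / 2);
}

// Splits `total` pixels into `parts` integer widths. The remainder goes
// to the leftmost columns, so the widths always sum to `total`.
function splitEvenly(total: number, parts: number): number[] {
  assertInt(total, "total");
  assertInt(parts, "parts");
  if (parts <= 0 || total < 0) {
    throw new RangeError("parts must be positive and total non-negative");
  }
  const extra = total % parts;
  const base = (total - extra) / parts; // exact: numerator divides evenly
  return Array.from({ length: parts }, (_, i) => base + (i < extra ? 1 : 0));
}
```

For example, `splitEvenly(100, 3)` yields widths 34, 33, 33 rather than three copies of 33.33…, so the serialized SVG contains no rounding-dependent decimals.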

New risks (M2 backlog):

Without an embed surface (web component / npm), adoption is gated on people finding the playground.
The grammar is locked, but any new shape (e.g. an entity for entity-relationship diagrams) reopens the parser; a backward-compatibility policy is needed before M2.

Original open Qs:

Q1: Should we ship a Mermaid → Diagram-Engine transpiler? → ❌ dropped, the syntaxes are too different and the conversion would re-introduce Mermaid’s footguns.
Q2: Theming support? → ❌ dropped, conflicts with constraint #5 (no hidden state).
Q3: Should the playground support sharing via URL? → ✅ yes, source is encoded into the URL fragment (no backend).
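
The URL-fragment sharing in Q3 could look roughly like the sketch below. The `src=` key and both function names are assumptions; the playground's actual encoding is not specified in this document.

```typescript
const FRAGMENT_KEY = "src="; // hypothetical fragment key

// Encodes diagram source into the fragment of a playground URL.
// Any existing fragment on baseUrl is replaced.
function toShareUrl(baseUrl: string, source: string): string {
  const hashIndex = baseUrl.indexOf("#");
  const base = hashIndex === -1 ? baseUrl : baseUrl.slice(0, hashIndex);
  return `${base}#${FRAGMENT_KEY}${encodeURIComponent(source)}`;
}

// Recovers the source from a shared URL, or null if none is present
// or the fragment is not valid percent-encoding.
function fromShareUrl(url: string): string | null {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1) return null;
  const fragment = url.slice(hashIndex + 1);
  if (!fragment.startsWith(FRAGMENT_KEY)) return null;
  try {
    return decodeURIComponent(fragment.slice(FRAGMENT_KEY.length));
  } catch {
    return null; // malformed escape sequence
  }
}
```

Browsers do not send the fragment to the server in HTTP requests, which is why this approach needs no backend: the static page reads the fragment and renders client-side.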

10. Definition of Done

M1 Done: ✅ Public playground live, 4-section docs published, 94% first-shot success on the LLM eval set, byte-deterministic output, custom-icon support shipped.

M2 Done (distribution):

⏳ npm package published (@diagram-engine/core)
⏳ Embeddable web component (<diagram-engine>)
⏳ At least one external consumer (Claude skill / VS Code extension / blog post) using the engine in production
⏳ Backward-compatibility policy for grammar additions
