Diagram-Engine — PRD
Size M · P1 · Developer tool
Status: ✅ M1 shipped (public playground + 4-section docs); see Implementation for build details
Originally planned: 3 weekends / Actual: 11 phases over ~6 weeks part-time
1. Problem
LLMs are increasingly asked to emit diagrams inline in chat: architecture sketches, sequence flows, state machines, dependency graphs. The current tools (Mermaid, PlantUML, Graphviz/DOT, D2) were all designed with humans as the primary author. When an LLM emits them, three failure modes recur:
- Whitespace and bracket sensitivity: a missing `]` or off-by-one indent produces a silent broken render.
- Ambiguous error messages: “unexpected token at line 4 col 17” is unactionable for an LLM doing one-shot generation.
- Layout drift: same input across versions produces different output; LLMs can’t anchor on what worked last time.
Pain: Inline-in-chat diagrams from LLMs fail to render or render badly often enough that users have learned to ask for a textual description instead. The medium has been written off as unreliable.
Why now: The frontier of agent UX is “show, don’t tell.” A diagram engine that an agent can drive with 94%+ first-shot success unlocks visual reasoning in chat without an extra round-trip.
2. Goal & Success Metrics
Goal: An LLM, given a brief like “draw a 4-service architecture with a DB and a cache,” emits Diagram-Engine source that renders to a clean, readable SVG on the first attempt, in under 200 tokens.
Metrics (targets vs. achieved):
| Metric | Target M1 | Achieved | Note |
|---|---|---|---|
| First-shot render success (LLM-emitted) | ≥85% | 94% | Measured on a 50-prompt mixed eval (architecture / sequence / state) |
| Median tokens per diagram | ≤250 | 180 | vs Mermaid ~350–500 for the same shape |
| Playground p50 render | <250 ms | 120 ms | parse → layout → SVG, client-side |
| Validator-error-to-repair rate | ≥75% | 88% | LLM fixes the diagram on one retry given the validator error |
| Determinism | byte-identical | ✅ | Same input → identical SVG bytes |
3. User journey
Primary user is the LLM, not the human. The human reads the output; the LLM authors the input.
- A user asks Claude (or any LLM): “Show me how OAuth PKCE works as a sequence diagram.”
- The LLM emits a short Diagram-Engine source block inline.
- The chat client (or a downstream renderer) detects the block, calls the Diagram-Engine parser+renderer (client-side, no backend).
- SVG appears inline; the user sees the diagram without leaving the conversation.
- If the source had a validator error, the validator returns a named-token error + an expected-shape hint + a working example, and the LLM repairs the source on a single retry.
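The repair step in the journey above can be sketched roughly as follows. This is a minimal illustration, not the engine's API: the `ValidatorError` field names, `repairPrompt`, `generateWithOneRetry`, and the `emit` / `validate` callbacks are all hypothetical names chosen to mirror the three parts of the error described above (named token, expected-shape hint, working example).

```typescript
// Hypothetical shapes: field and function names are assumptions,
// not the real Diagram-Engine API.
interface ValidatorError {
  token: string;    // the named token that failed
  expected: string; // expected-shape hint
  example: string;  // a small working snippet
}

type ValidationResult =
  | { ok: true }
  | { ok: false; error: ValidatorError };

// Stand-in for an LLM call: given a prompt, returns diagram source.
type Emit = (prompt: string) => string;

// Turn a validator error into a repair prompt the model can act on.
function repairPrompt(source: string, e: ValidatorError): string {
  return [
    `The diagram source failed validation at token "${e.token}".`,
    `Expected: ${e.expected}`,
    `Working example: ${e.example}`,
    "Original source:",
    source,
  ].join("\n");
}

// One initial attempt plus at most one repair retry.
function generateWithOneRetry(
  brief: string,
  emit: Emit,
  validate: (src: string) => ValidationResult,
): { source: string; attempts: number; ok: boolean } {
  const first = emit(brief);
  const r1 = validate(first);
  if (r1.ok) return { source: first, attempts: 1, ok: true };

  const second = emit(repairPrompt(first, r1.error));
  return { source: second, attempts: 2, ok: validate(second).ok };
}
```

Capping the loop at two attempts matches the single-retry behaviour described above; a caller that sees `ok: false` after the retry would presumably fall back to a textual answer.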
The public playground at the project site is the adoption surface for humans — paste a snippet, see the render, copy the URL.
4. Scope (MoSCoW) — final
Must — DONE:
- ✅ Recursive-descent parser for a single, locked grammar
- ✅ Deterministic layout engine (no floating-point drift, no theme files)
- ✅ SVG renderer for the 12 base shapes (rect, round-rect, cylinder, queue, cloud, actor, lifeline, diamond, hex, ellipse, doc, comment)
- ✅ Validator with one-shot repairable errors (named token + expected shape + working example)
- ✅ Architecture / sequence / state diagram support
- ✅ Public playground (Astro + CF Pages)
- ✅ 4-section docs (syntax / examples / layout rules / API)
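To make the parser and validator items above concrete, here is a minimal recursive-descent sketch over a toy grammar. The grammar (`group NAME { ... }` and `A -> B`), the function names, and the error fields are assumptions made for illustration; the locked Diagram-Engine grammar is documented in Implementation and is not reproduced here.

```typescript
// Toy grammar (NOT the real Diagram-Engine grammar):
//   diagram := stmt*
//   stmt    := "group" IDENT "{" stmt* "}" | IDENT "->" IDENT
type Stmt =
  | { kind: "edge"; from: string; to: string }
  | { kind: "group"; name: string; body: Stmt[] };

// Error carrying the offending token and what was expected instead.
function parseError(token: string, expected: string): Error {
  return Object.assign(new Error(`got "${token}", expected ${expected}`), {
    token,
    expected,
  });
}

function parseDiagram(text: string): Stmt[] {
  // Tokens: arrows, braces, identifiers, or any single stray character.
  const toks: string[] = text.match(/->|[{}]|\w+|\S/g) || [];
  let pos = 0;
  const peek = (): string => (pos < toks.length ? toks[pos] : "<eof>");
  const next = (): string => {
    const t = peek();
    pos++;
    return t;
  };
  const ident = (what: string): string => {
    const t = next();
    if (!/^\w+$/.test(t)) throw parseError(t, what);
    return t;
  };
  const expect = (lit: string): void => {
    const t = next();
    if (t !== lit) throw parseError(t, `"${lit}"`);
  };

  // One function per grammar rule; groups recurse into stmt().
  function stmt(): Stmt {
    if (peek() === "group") {
      next();
      const name = ident("group name");
      expect("{");
      const body: Stmt[] = [];
      while (peek() !== "}" && peek() !== "<eof>") body.push(stmt());
      expect("}");
      return { kind: "group", name, body };
    }
    const from = ident("node id");
    expect("->");
    const to = ident("node id");
    return { kind: "edge", from, to };
  }

  const out: Stmt[] = [];
  while (peek() !== "<eof>") out.push(stmt());
  return out;
}
```

Because each rule knows exactly which token it wanted, the error can name the token and the expected shape directly, which is the property the validator relies on.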
Should — DONE:
- ✅ Auto-clustering (R7) — nodes connected by ≥3 edges form an implicit group
- ✅ Lane width inheritance (R12) — vertical lanes size from widest child + 24px
- ✅ Edge label overlap avoidance (R19) — re-route or shorten before overlap
- ✅ Visual style rules R31–R37 — single coherent aesthetic without theming
- ✅ Custom inline SVG icon support — late add in P10
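Two of the layout rules above are simple enough to sketch. The lane-width function follows R12 as stated (widest child + 24px). The clustering function shows only one possible reading of R7, namely that a pair of nodes joined by at least three edges forms an implicit group; the actual rule is defined in the layout rule catalog, and all names here are hypothetical.

```typescript
const LANE_PADDING_PX = 24; // from R12: widest child + 24px

// R12 sketch: a lane is as wide as its widest child plus fixed padding.
function laneWidth(childWidths: number[]): number {
  const widest = childWidths.reduce((m, w) => Math.max(m, w), 0);
  return widest + LANE_PADDING_PX;
}

type Edge = { from: string; to: string };

// R7 sketch under ONE assumed reading: a pair of distinct nodes joined by
// >= threshold edges (direction ignored) becomes an implicit group.
function implicitPairs(edges: Edge[], threshold = 3): string[][] {
  const counts = new Map<string, number>();
  for (const { from, to } of edges) {
    if (from === to) continue; // assumption: self-loops are ignored
    const key = [from, to].sort().join("\u0000");
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Sort keys so the result does not depend on edge order in the source.
  const keys = Array.from(counts.entries())
    .filter(([, n]) => n >= threshold)
    .map(([key]) => key)
    .sort();
  return keys.map((k) => k.split("\u0000"));
}
```

For example, `laneWidth([120, 80, 200])` yields 224, and three edges between `a` and `b` (in any direction) put that pair in one group.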
Could — partial:
- ⏸️ npm publish — deferred to M2, no external consumer yet
- ⏸️ Inline-in-chat plugin (Claude / ChatGPT integration) — deferred to M2
- ⏸️ Embeddable web component (`<diagram-engine>`): deferred to M2
- ⏸️ Dark mode: playground only, not part of the deterministic render output
Won’t (M1) — kept out:
- Multiple dialects or syntax flavors (one grammar, one render path)
- Server-side rendering (client-side only; no backend)
- Plugins / themes / CSS overrides (every visual property is set in the source text)
- Diff/animation between diagrams (out of scope; static SVG only)
- Manual layout hints (`rank`, `position`, etc.): auto-layout must be good enough on its own
5. Architecture (final)
Single render path: text → AST → layout → SVG. No alternate paths, no plugins. Parser, layout engine, and renderer are three TypeScript modules shipped together as one bundle. See Architecture for the pipeline diagram.
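The single render path can be illustrated with a toy end-to-end pipeline. Everything below is a sketch under stated assumptions: the `a -> b` line syntax is a placeholder rather than the Diagram-Engine grammar, and the type and function names (`Ast`, `Box`, `parse`, `layout`, `render`, `toSvg`) are hypothetical.

```typescript
// Toy pipeline: text -> AST -> layout -> SVG.
// The "a -> b" syntax is a placeholder, NOT the real grammar.
interface Ast { nodes: string[]; edges: [string, string][] }
interface Box { id: string; x: number; y: number; w: number; h: number }
interface Layout { boxes: Box[]; edges: [Box, Box][] }

function parse(text: string): Ast {
  const nodes: string[] = [];
  const edges: [string, string][] = [];
  const see = (id: string): void => {
    if (nodes.indexOf(id) < 0) nodes.push(id);
  };
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    const m = /^(\w+)\s*->\s*(\w+)$/.exec(line);
    if (!m) throw new Error(`expected "<id> -> <id>", got "${line}"`);
    see(m[1]);
    see(m[2]);
    edges.push([m[1], m[2]]);
  }
  return { nodes, edges };
}

function layout(ast: Ast): Layout {
  // Left-to-right flow in first-seen order, integer coordinates only.
  const boxes: Box[] = ast.nodes.map((id, i) => ({
    id, x: 20 + i * 140, y: 20, w: 100, h: 40,
  }));
  const byId = new Map(boxes.map((b) => [b.id, b] as [string, Box]));
  const edges = ast.edges.map(
    ([a, b]) => [byId.get(a)!, byId.get(b)!] as [Box, Box],
  );
  return { boxes, edges };
}

function render(l: Layout): string {
  const rects = l.boxes.map((b) =>
    `<rect x="${b.x}" y="${b.y}" width="${b.w}" height="${b.h}" fill="none" stroke="black"/>` +
    `<text x="${b.x + b.w / 2}" y="${b.y + 25}" text-anchor="middle">${b.id}</text>`);
  const lines = l.edges.map(([a, b]) =>
    `<line x1="${a.x + a.w}" y1="${a.y + a.h / 2}" x2="${b.x}" y2="${b.y + b.h / 2}" stroke="black"/>`);
  return `<svg xmlns="http://www.w3.org/2000/svg">${rects.concat(lines).join("")}</svg>`;
}

// The only entry point: one path, no options.
const toSvg = (text: string): string => render(layout(parse(text)));
```

Each stage is a pure function of the previous stage's output, so the same source string always produces the same SVG string in this sketch.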
6. Tech Stack — final choices
| Layer | Original spec | Implemented | Reason for change |
|---|---|---|---|
| Language | TypeScript | TypeScript ✓ | Static types matter for AST/visitor patterns |
| Parser | Tree-sitter | Hand-written recursive descent | Tree-sitter grammar churn would have leaked into the LLM-facing surface; hand-roll keeps grammar locked |
| Layout | ELK.js | Custom layout | ELK is powerful but its output isn’t byte-deterministic across versions, breaking constraint #3 |
| Renderer | HTML5 Canvas | SVG | SVG is scalable, copy-pasteable, and diffable; canvas would force a raster output |
| Site | Next.js | Astro | Static-first, lighter, faster cold-start on CF Pages |
| Hosting | Vercel | Cloudflare Pages | Free, fast, plays well with the Cloudflare Tunnel preview workflow |
| Distribution | npm package | Playground first, npm deferred | Real adoption comes from people pasting into the textarea, not reading a README |
Cost posture: Zero recurring cost. Astro site is static; CF Pages free tier; client-side render means no compute bill. Custom domain via the existing Cloudflare zone.
7. Milestones — actual
| Phase | What shipped |
|---|---|
| P1 | Grammar lock + recursive-descent parser + AST shape |
| P2 | Layout engine v0 — left-to-right flow only |
| P3 | SVG renderer + 12 base shapes |
| P4 | Validator + error catalog (named-token errors, expected-shape hints, working-example snippets) |
| P5 | Sequence diagrams (lifelines, async messages, activation bars) |
| P6 | State machines + self-loops + initial/final pseudostates |
| P7 | Auto-clustering (R7) + lane width inheritance (R12) |
| P8 | Edge routing + label-overlap avoidance (R19) |
| P9 | Visual style rules R31–R37 (corner radius, stroke weight, icon padding, color tokens) |
| P10 | Custom inline SVG icon support |
| P11 | Public playground site (Astro) + 4-section docs + CF Pages deploy |
M1 DoD passed:
- ✅ 94% first-shot LLM render success (target 85%)
- ✅ 180 median tokens per diagram (target ≤250)
- ✅ 120 ms p50 render (target <250)
- ✅ Public playground live, byte-deterministic output verified across browsers
8. Cost & Quota
| Item | Free tier? | Actual usage |
|---|---|---|
| Cloudflare Pages | ✅ | Static site, a few hundred requests/day |
| Cloudflare Tunnel (preview) | ✅ | Used for previewing dev branches |
| Custom domain | ✅ | Existing Cloudflare zone |
| Build minutes | ✅ | <1 min/build via Astro static export |
No recurring spend. No backend, no database, no auth, no rate limit needed.
9. Risks & open questions — outcomes
Original risks:
- “Custom layout engine will be too hard” → mitigated by an 11-phase incremental approach; the R31–R37 visual rules were added late, only after the structural rules were stable.
- “Determinism will be hard to maintain” → resolved by avoiding floating-point math in layout decisions; all coordinates are integer.
- “LLM eval set is subjective” → mitigated by fixing a 50-prompt held-out eval before tuning the grammar.
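The integer-only approach to determinism can be sketched as below. This is an illustration of the general technique, not the engine's layout code; `centerX`, `distribute`, and `sameOutput` are hypothetical helpers.

```typescript
// Floating-point sums depend on evaluation order, so two code paths that
// "should" agree can serialize to different coordinate strings.
const leftFirst = (0.1 + 0.2) + 0.3;
const rightFirst = 0.1 + (0.2 + 0.3);
const floatsDisagree = leftFirst !== rightFirst;

// Integer sketch: keep every coordinate in whole pixels and make
// rounding explicit wherever a division occurs.
function centerX(x: number, width: number): number {
  return x + Math.floor(width / 2);
}

// Split `total` px across n >= 1 slots; leftover pixels go to the
// first slots so the parts always sum back to `total`.
function distribute(total: number, n: number): number[] {
  const base = Math.floor(total / n);
  const extra = total % n;
  return Array.from({ length: n }, (_, i) => base + (i < extra ? 1 : 0));
}

// Identical JavaScript strings encode to identical UTF-8 bytes, so string
// equality is enough for a byte-identity check on two renders.
function sameOutput(a: string, b: string): boolean {
  return a === b;
}
```

With integer inputs, `distribute(100, 3)` gives `[34, 33, 33]`, which sums to exactly 100 and leaves no fractional remainder to serialize.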
New risks (M2 backlog):
- Without an embed surface (web component / npm), adoption is gated on people finding the playground.
- The grammar is locked, but any new shape (e.g. an entity for entity-relationship diagrams) reopens the parser; a backward-compatibility policy is needed before M2.
Original open Qs:
- Q1: Should we ship a Mermaid → Diagram-Engine transpiler? → ❌ dropped: the syntaxes are too different and the conversion would re-introduce Mermaid’s footguns.
- Q2: Theming support? → ❌ dropped: it conflicts with constraint #5 (no hidden state).
- Q3: Should the playground support sharing via URL? → ✅ yes, source is encoded into the URL fragment (no backend).
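Sharing via the URL fragment (Q3) might look something like the sketch below. The `src=` parameter name, the helper names, and the example base URL are assumptions; the playground's actual encoding is not specified in this document.

```typescript
// Hypothetical helpers: the real playground may use a different
// parameter name or a compressed encoding.
function toShareUrl(base: string, source: string): string {
  // encodeURIComponent escapes "#", "&" and "=", so the source cannot
  // break out of its parameter.
  return `${base}#src=${encodeURIComponent(source)}`;
}

function fromShareUrl(url: string): string | null {
  const hash = url.indexOf("#");
  if (hash < 0) return null;
  for (const part of url.slice(hash + 1).split("&")) {
    if (part.indexOf("src=") === 0) {
      return decodeURIComponent(part.slice("src=".length));
    }
  }
  return null;
}
```

Browsers do not send the fragment to the server in HTTP requests, which is why this kind of scheme needs no backend.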
10. Definition of Done
M1 Done: ✅ Public playground live, 4-section docs published, 94% first-shot success on the LLM eval set, byte-deterministic output, custom-icon support shipped.
M2 Done (distribution):
- ⏳ npm package published (`@diagram-engine/core`)
- ⏳ Embeddable web component (`<diagram-engine>`)
- ⏳ At least one external consumer (Claude skill / VS Code extension / blog post) using the engine in production
- ⏳ Backward-compatibility policy for grammar additions
See also
- Implementation — DSL grammar, layout rule catalog, perf numbers, repro steps
- Architecture — pipeline diagram, error model, deploy topology
- Notes — chronological decision log