Diagram-Engine — PRD

Product spec: JTBD, design constraints, scope, milestones, and success metrics for an AI-first text-to-diagram DSL.

Size M · P1 · Developer tool
Status: ✅ M1 shipped (public playground + 4-section docs); see Implementation for build details
Originally planned: 3 weekends / Actual: 11 phases over ~6 weeks part-time

1. Problem

LLMs are increasingly asked to emit diagrams inline in chat: architecture sketches, sequence flows, state machines, dependency graphs. The current tools (Mermaid, PlantUML, Graphviz/DOT, D2) were all designed with humans as the primary author. When an LLM emits their syntax, three failure modes recur:

  1. Whitespace and bracket sensitivity: a missing ] or an off-by-one indent silently produces a broken render.
  2. Ambiguous error messages: “unexpected token at line 4 col 17” is not actionable for an LLM doing one-shot generation.
  3. Layout drift: the same input produces different output across versions, so LLMs can’t anchor on what worked last time.

Pain: Inline-in-chat diagrams from LLMs fail to render or render badly often enough that users have learned to ask for a textual description instead. The medium has been written off as unreliable.

Why now: The frontier of agent UX is “show, don’t tell.” A diagram engine that an agent can drive with 94%+ first-shot success unlocks visual reasoning in chat without an extra round-trip.

2. Goal & Success Metrics

Goal: An LLM, given a brief like “draw a 4-service architecture with a DB and a cache,” emits Diagram-Engine source that renders to a clean, readable SVG on the first attempt, in under 200 tokens.

Metrics — actual achieved:

| Metric | Target M1 | Achieved | Note |
|---|---|---|---|
| First-shot render success (LLM-emitted) | ≥85% | 94% | Measured on a 50-prompt mixed eval (architecture / sequence / state) |
| Median tokens per diagram | ≤250 | 180 | vs Mermaid ~350–500 for the same shape |
| Playground p50 render | <250 ms | 120 ms | parse → layout → SVG, client-side |
| Validator-error-to-repair rate | ≥75% | 88% | LLM fixes the diagram on one retry given the validator error |
| Determinism | byte-identical | ✅ verified across browsers | Same input → identical SVG bytes |

3. User journey

Primary user is the LLM, not the human. The human reads the output; the LLM authors the input.

  1. A user asks Claude (or any LLM): “Show me how OAuth PKCE works as a sequence diagram.”
  2. The LLM emits a short Diagram-Engine source block inline.
  3. The chat client (or a downstream renderer) detects the block and calls the Diagram-Engine parser and renderer (client-side, no backend).
  4. SVG appears inline; the user sees the diagram without leaving the conversation.
  5. If the source had a validator error, the validator returns a named-token error + an expected-shape hint + a working example, and the LLM repairs the source on a single retry.
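The repair step in the journey above can be sketched as a validate-then-retry loop. This is a minimal illustration, not the shipped API: the `ValidatorError` field names, `toRepairPrompt`, and `emitWithOneRetry` are hypothetical, and the LLM call is modeled as a synchronous function for brevity (a real client would be async).

```typescript
// Hypothetical error shape: the three parts named in the spec
// (named token, expected-shape hint, working example). Field names are assumptions.
interface ValidatorError {
  token: string;         // the named token that failed, e.g. "arrow"
  expectedShape: string; // what the validator expected at that position
  example: string;       // a working snippet the LLM can copy from
}

type ValidationResult =
  | { ok: true }
  | { ok: false; error: ValidatorError };

// Turn a validator error into the prompt used for the single retry.
function toRepairPrompt(source: string, error: ValidatorError): string {
  return [
    `The diagram source failed validation at token "${error.token}".`,
    `Expected: ${error.expectedShape}`,
    `Working example: ${error.example}`,
    "Original source:",
    source,
    "Return only the corrected source.",
  ].join("\n");
}

// One attempt plus at most one repair retry, mirroring step 5.
function emitWithOneRetry(
  generate: (prompt: string) => string,
  validate: (source: string) => ValidationResult,
  brief: string,
): { source: string; ok: boolean; attempts: number } {
  const first = generate(brief);
  const firstResult = validate(first);
  if (firstResult.ok) return { source: first, ok: true, attempts: 1 };

  const repaired = generate(toRepairPrompt(first, firstResult.error));
  return { source: repaired, ok: validate(repaired).ok, attempts: 2 };
}
```

Capping the loop at one retry keeps the round-trip cost bounded; a source that still fails after the retry is reported as failed rather than retried again.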

The public playground at the project site is the adoption surface for humans — paste a snippet, see the render, copy the URL.

4. Scope (MoSCoW) — final

Must — DONE:

  • ✅ Recursive-descent parser for a single, locked grammar
  • ✅ Deterministic layout engine (no floating-point drift, no theme files)
  • ✅ SVG renderer for the 12 base shapes (rect, round-rect, cylinder, queue, cloud, actor, lifeline, diamond, hex, ellipse, doc, comment)
  • ✅ Validator with one-shot repairable errors (named token + expected shape + working example)
  • ✅ Architecture / sequence / state diagram support
  • ✅ Public playground (Astro + CF Pages)
  • ✅ 4-section docs (syntax / examples / layout rules / API)
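To illustrate the recursive-descent approach in the list above, here is a minimal sketch over a made-up two-rule grammar. The grammar, the `Edge` type, and all function names are assumptions for illustration only; the real locked grammar is described in Implementation. The error message names the offending token and the expected shape, in the spirit of the validator requirement.

```typescript
// Hypothetical mini-grammar (NOT the real Diagram-Engine grammar):
//   diagram := edge*
//   edge    := IDENT "->" IDENT
interface Edge {
  from: string;
  to: string;
}

// Split the source into arrows, identifiers, and single stray characters.
function tokenize(source: string): string[] {
  return source.match(/->|[A-Za-z_][A-Za-z0-9_]*|\S/g) ?? [];
}

function parseDiagram(source: string): Edge[] {
  const tokens = tokenize(source);
  let pos = 0;

  const peek = (): string => tokens[pos] ?? "<end>";

  // Raise an error that names the found token and the expected shape.
  function fail(expected: string): never {
    throw new Error(`expected ${expected} but found "${peek()}"`);
  }

  // IDENT
  function ident(): string {
    const tok = peek();
    if (!/^[A-Za-z_]/.test(tok)) fail("an identifier");
    pos += 1;
    return tok;
  }

  // A fixed literal such as "->"
  function expectLiteral(literal: string): void {
    if (peek() !== literal) fail(`"${literal}"`);
    pos += 1;
  }

  // edge := IDENT "->" IDENT
  function edge(): Edge {
    const from = ident();
    expectLiteral("->");
    const to = ident();
    return { from, to };
  }

  // diagram := edge*
  const edges: Edge[] = [];
  while (pos < tokens.length) edges.push(edge());
  return edges;
}
```

Each grammar rule maps to one function, which is what makes a hand-written parser easy to keep locked: adding a rule means adding a function, not regenerating a parser.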

Should — DONE:

  • ✅ Auto-clustering (R7) — nodes connected by ≥3 edges form an implicit group
  • ✅ Lane width inheritance (R12) — vertical lanes size from widest child + 24px
  • ✅ Edge label overlap avoidance (R19) — re-route or shorten before overlap
  • ✅ Visual style rules R31–R37 — single coherent aesthetic without theming
  • ✅ Custom inline SVG icon support — late add in P10
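As a small worked example of one rule in the list above, lane width inheritance (R12) can be sketched as follows. This assumes the 24 px is added once to the widest child, as the rule is worded; whether the shipped engine splits it per side is not stated here. `Box` and `laneWidth` are hypothetical names.

```typescript
// Assumption: 24 px is added once to the widest child's width (R12 as worded).
const LANE_PADDING_PX = 24;

interface Box {
  id: string;
  width: number; // integer pixels
}

// A lane is as wide as its widest child plus the fixed padding.
function laneWidth(children: readonly Box[]): number {
  let widest = 0;
  for (const child of children) {
    if (child.width > widest) widest = child.width;
  }
  return widest + LANE_PADDING_PX;
}
```

For children of width 80 and 120, the lane comes out at 144 px; under this sketch an empty lane collapses to the padding alone.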

Could — partial:

  • ⏸️ npm publish — deferred to M2, no external consumer yet
  • ⏸️ Inline-in-chat plugin (Claude / ChatGPT integration) — deferred to M2
  • ⏸️ Embeddable web component (<diagram-engine>) — deferred to M2
  • ⏸️ Dark mode — playground only, not part of the deterministic render output

Won’t (M1) — kept out:

  • Multiple dialects or syntax flavors (one grammar, one render path)
  • Server-side rendering (client-side only; no backend)
  • Plugins / themes / CSS overrides (every visual property is set in the source text)
  • Diff/animation between diagrams (out of scope; static SVG only)
  • Manual layout hints (rank, position, etc.) — auto-layout must be good enough on its own

5. Architecture (final)

Single render path: text → AST → layout → SVG. No alternate paths, no plugins. Parser, layout engine, and renderer are three TypeScript modules shipped together as one bundle. See Architecture for the pipeline diagram.
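The single render path can be read as plain function composition. The sketch below uses stand-in stages to show the shape only: the `Ast`, `PlacedNode`, and `Layout` types, the stage names, and the fixed node sizes are all assumptions, and only the order text → AST → layout → SVG comes from the spec.

```typescript
// Stage types are assumptions for illustration.
interface Ast {
  nodes: string[];
}
interface PlacedNode {
  id: string;
  x: number;
  y: number;
  w: number;
  h: number;
}
interface Layout {
  nodes: PlacedNode[];
}

// Stand-in parser: one node id per non-empty line.
function parseStub(source: string): Ast {
  const nodes = source
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return { nodes };
}

// Stand-in layout: a left-to-right row on an integer grid.
function layoutStub(ast: Ast): Layout {
  return {
    nodes: ast.nodes.map((id, i) => ({ id, x: 20 + i * 140, y: 20, w: 120, h: 40 })),
  };
}

// Stand-in renderer: one <rect> per placed node.
function renderSvgStub(layout: Layout): string {
  const rects = layout.nodes
    .map((n) => `<rect x="${n.x}" y="${n.y}" width="${n.w}" height="${n.h}"/>`)
    .join("");
  return `<svg xmlns="http://www.w3.org/2000/svg">${rects}</svg>`;
}

// The single render path: no branches, plugins, or alternate routes.
function renderPipeline(source: string): string {
  return renderSvgStub(layoutStub(parseStub(source)));
}
```

Because each stage is a pure function of the previous stage's output, rendering the same source twice yields the same string, which is the property the determinism metric depends on.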

6. Tech Stack — final choices

| Layer | Original spec | Implemented | Reason for change |
|---|---|---|---|
| Language | TypeScript | TypeScript ✓ | Static types matter for AST/visitor patterns |
| Parser | Tree-sitter | Hand-written recursive descent | Tree-sitter grammar churn would have leaked into the LLM-facing surface; hand-roll keeps grammar locked |
| Layout | ELK.js | Custom layout | ELK is powerful but its output isn’t byte-deterministic across versions, breaking constraint #3 |
| Renderer | HTML5 Canvas | SVG | SVG is scalable, copy-pasteable, and diffable; canvas would force a raster output |
| Site | Next.js | Astro | Static-first, lighter, faster cold-start on CF Pages |
| Hosting | Vercel | Cloudflare Pages | Free, fast, plays well with the Cloudflare Tunnel preview workflow |
| Distribution | npm package | Playground first, npm deferred | Real adoption comes from people pasting into the textarea, not reading a README |

Cost posture: Zero recurring cost. Astro site is static; CF Pages free tier; client-side render means no compute bill. Custom domain via the existing Cloudflare zone.

7. Milestones — actual

| Phase | What shipped |
|---|---|
| P1 | Grammar lock + recursive-descent parser + AST shape |
| P2 | Layout engine v0 (left-to-right flow only) |
| P3 | SVG renderer + 12 base shapes |
| P4 | Validator + error catalog (named-token errors, expected-shape hints, working-example snippets) |
| P5 | Sequence diagrams (lifelines, async messages, activation bars) |
| P6 | State machines + self-loops + initial/final pseudostates |
| P7 | Auto-clustering (R7) + lane width inheritance (R12) |
| P8 | Edge routing + label-overlap avoidance (R19) |
| P9 | Visual style rules R31–R37 (corner radius, stroke weight, icon padding, color tokens) |
| P10 | Custom inline SVG icon support |
| P11 | Public playground site (Astro) + 4-section docs + CF Pages deploy |

M1 DoD passed:

  • ✅ 94% first-shot LLM render success (target 85%)
  • ✅ 180 median tokens per diagram (target ≤250)
  • ✅ 120 ms p50 render (target <250)
  • ✅ Public playground live, byte-deterministic output verified across browsers

8. Cost & Quota

| Item | Free tier? | Actual usage |
|---|---|---|
| Cloudflare Pages | | Static site, a few hundred requests/day |
| Cloudflare Tunnel (preview) | | Used for previewing dev branches |
| Custom domain | | Existing Cloudflare zone |
| Build minutes | | <1 min/build via Astro static export |

No recurring spend. No backend, no database, no auth, no rate limit needed.

9. Risks & open questions — outcomes

Original risks:

  • “Custom layout engine will be too hard” → mitigated by 11-phase incremental approach; R31–R37 visual rules added late only after structural rules were stable.
  • “Determinism will be hard to maintain” → resolved by avoiding floating-point math in layout decisions; all coordinates are integer.
  • “LLM eval set is subjective” → mitigated by fixing a 50-prompt held-out eval before tuning the grammar.
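The integer-coordinate mitigation above can be sketched with two small helpers. Both are hypothetical, not the engine's actual code: `midpoint` shows one way to keep derived coordinates on the integer grid, and `assertIntegerCoords` is a guard that could run at a stage boundary. JavaScript numbers are doubles, so "integer" here means integral values; halving an integer difference is exact for values in the safe-integer range, so the truncation is deterministic.

```typescript
// Midpoint that always lands on the integer grid: any half-pixel remainder is
// truncated toward `a`, so the emitted coordinate never carries a fraction.
function midpoint(a: number, b: number): number {
  return a + Math.trunc((b - a) / 2);
}

// Hypothetical guard: reject any coordinate that has drifted off the grid.
function assertIntegerCoords(coords: readonly number[]): void {
  for (const c of coords) {
    if (!Number.isInteger(c)) {
      throw new Error(`non-integer coordinate: ${c}`);
    }
  }
}
```

With integral coordinates, serializing to SVG is a plain integer-to-string conversion, which removes float formatting as a source of byte differences between runs or browsers.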

New risks (M2 backlog):

  • Without an embed surface (web component / npm), adoption is gated on people finding the playground.
  • The grammar is locked, but any new shape (e.g. ERD entity-relationship) reopens the parser; need a backward-compatibility policy before M2.

Original open Qs:

  • Q1: Should we ship a Mermaid → Diagram-Engine transpiler? → ❌ dropped: the syntaxes are too different, and the conversion would re-introduce Mermaid’s footguns.
  • Q2: Theming support? → ❌ dropped: it conflicts with constraint #5 (no hidden state).
  • Q3: Should the playground support sharing via URL? → ✅ yes, source is encoded into the URL fragment (no backend).
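The URL-sharing outcome in Q3 can be sketched as a pair of encode/decode helpers. The encoding scheme is an assumption: this sketch uses plain percent-encoding under a hypothetical `#src=` key, while the real playground may use something else (for example compression plus base64). Because the fragment is never sent to the server, this works without a backend.

```typescript
// Hypothetical fragment key; the real playground's key is not specified here.
const FRAGMENT_KEY = "#src=";

// Build a shareable URL by placing the encoded source in the fragment.
function toShareUrl(baseUrl: string, source: string): string {
  const withoutFragment = baseUrl.split("#")[0] ?? baseUrl;
  return `${withoutFragment}${FRAGMENT_KEY}${encodeURIComponent(source)}`;
}

// Recover the source from a share URL, or null if no source is present.
// Note: decodeURIComponent throws on malformed escapes; callers may want to catch.
function fromShareUrl(url: string): string | null {
  const index = url.indexOf(FRAGMENT_KEY);
  if (index === -1) return null;
  return decodeURIComponent(url.slice(index + FRAGMENT_KEY.length));
}
```

A round trip such as `fromShareUrl(toShareUrl("https://example.com/playground", src))` returns the original `src`; the base URL here is a placeholder.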

10. Definition of Done

M1 Done: ✅ Public playground live, 4-section docs published, 94% first-shot success on the LLM eval set, byte-deterministic output, custom-icon support shipped.

M2 Done (distribution):

  • ⏳ npm package published (@diagram-engine/core)
  • ⏳ Embeddable web component (<diagram-engine>)
  • ⏳ At least one external consumer (Claude skill / VS Code extension / blog post) using the engine in production
  • ⏳ Backward-compatibility policy for grammar additions

See also

  • Implementation — DSL grammar, layout rule catalog, perf numbers, repro steps
  • Architecture — pipeline diagram, error model, deploy topology
  • Notes — chronological decision log