Diagram-Engine: text-to-diagram designed for AI agents, not humans

Why Mermaid/PlantUML break down when an LLM is the author. JTBD, syntax design constraints, and 37 layout rules that made the output good enough for a public playground.

TL;DR — I built Diagram-Engine, a text-to-diagram DSL whose primary user is an AI agent, not a human. The constraint changes everything: shorter tokens, fewer footguns, deterministic layout, and a validator that returns errors an LLM can fix in one shot. Live playground + 4-section docs. This post is the PM rationale, not the syntax tutorial.

JTBD (job to be done)

When an AI assistant needs to produce a diagram inline in chat (architecture, sequence, state machine), the assistant wants to emit a few lines of text that render to a clean image, so that the user gets a diagram without leaving the conversation or installing tools.

The hidden user here is the LLM itself. Mermaid, PlantUML, Graphviz, and D2 are all designed for human authors. LLMs can technically use them, but their failure modes are hostile to an LLM.

Why existing tools fail AI authors

| Tool | LLM-hostile failure mode |
|---|---|
| Mermaid | Whitespace-sensitive, ambiguous error messages, 5+ dialects for similar shapes |
| PlantUML | Verbose, server-side rendering, layout unpredictable across versions |
| Graphviz / DOT | Powerful but layout requires manual hinting, hard to express “list of services” |
| D2 | Best of the bunch, but still optimized for human editing flow |

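The shared anti-pattern: when the LLM emits invalid syntax, the error is either silent (broken render) or human-flavored (“unexpected token at line 4 col 17”). An LLM can’t reliably iterate on either: the first gives it nothing to act on, and the second names a position without saying what was expected there.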

Design constraints I locked in before writing any code

  1. Single render path: text → AST → SVG. No alternate dialects, no plugins.
  2. Errors are repairable in one shot: every validator error names the offending token, the expected shape, and a working example.
  3. Layout is deterministic: same input → identical output, byte for byte. No floating-point layout drift.
  4. Token budget: a 10-node architecture diagram fits in <200 tokens. Mermaid for the same diagram is 350–500.
  5. No hidden state: every visual property is set in the text. No theme files, no CSS overrides.

These are PM constraints, not engineering ones. Each rules out a class of LLM failures.
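To make constraint 2 concrete, here is a minimal sketch of what a one-shot-repairable error could look like. It is not Diagram-Engine's actual validator: the `RepairableError` fields, the `validate_line` helper, and the lowercase-identifier rule are all assumptions made for illustration, and the toy grammar covers only `a -> b` chains.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: node names are lowercase identifiers. The real grammar may differ.
IDENT = re.compile(r"[a-z][a-z0-9_]*")
EXAMPLE = "web -> api"


@dataclass(frozen=True)
class RepairableError:
    """Hypothetical error shape: offending token, expected shape, working example."""
    line: int       # 1-based line number of the offending token
    token: str      # the exact text the validator rejected
    expected: str   # what the validator wanted at that position
    example: str    # a known-good line the author can copy

    def to_prompt(self) -> str:
        # One compact block that can be fed straight back to the model.
        return (
            f"line {self.line}: unexpected token {self.token!r}\n"
            f"expected: {self.expected}\n"
            f"example: {self.example}"
        )


def validate_line(line_no: int, text: str) -> Optional[RepairableError]:
    """Check one line of a toy 'a -> b -> c' grammar; return the first error or None."""
    tokens = text.split()
    for i, tok in enumerate(tokens):
        if i % 2 == 0 and not IDENT.fullmatch(tok):
            return RepairableError(line_no, tok, "a node name such as api", EXAMPLE)
        if i % 2 == 1 and tok != "->":
            return RepairableError(line_no, tok, "the arrow ->", EXAMPLE)
    if tokens and len(tokens) % 2 == 0:
        # Every token was valid, so an even count means the chain ends on an arrow.
        return RepairableError(line_no, tokens[-1], "a node name after the arrow", EXAMPLE)
    return None
```

Feeding it the Mermaid-style arrow `api --> cache` yields an error that names `-->`, states that `->` was expected, and includes `web -> api` as a line to copy, which is the three-part shape the constraint asks for.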

The 37 layout rules

The hard part wasn’t the parser. It was layout quality.

A diagram engine where the LLM author can’t see the output has to produce a good layout on the first try, every time. I shipped 37 layout rules across 11 phases. Examples:

  • R7: Nodes connected by ≥3 edges form an implicit cluster — auto-group them.
  • R12: Vertical lanes inherit width from their widest child + 24px padding.
  • R19: Edge labels never overlap nodes — route around or shorten.
  • R31–R37: Visual style rules (corner radius, stroke weight, icon padding) that match a single coherent aesthetic without theming.

Each rule eliminates one category of “this diagram is technically correct but visually terrible” output.
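As an illustration of how small an individual rule can be, R12 might reduce to something like the sketch below. The function name is hypothetical, and two details are assumptions the post does not settle: that the 24px is total padding rather than per side, and that an empty lane collapses to the padding alone. Widths are kept as integers so the arithmetic has no floating-point drift.

```python
from typing import List

LANE_PADDING_PX = 24  # from R12; assumed to be total padding, not per side


def lane_width(child_widths: List[int]) -> int:
    """Width of a vertical lane: its widest child plus fixed padding (R12)."""
    if not child_widths:
        # Assumption: a lane with no children is just the padding.
        return LANE_PADDING_PX
    return max(child_widths) + LANE_PADDING_PX
```

For children 120px, 96px, and 148px wide, this gives a 172px lane.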

What an AI-first DSL looks like in practice

Compare a 4-service architecture in Mermaid (LLM-emitted, real example):

graph LR
  A[Web] --> B[API]
  B --> C[(Postgres)]
  B --> D[Cache]

vs. the same in Diagram-Engine:

web -> api -> postgres
api -> cache

Diagram-Engine needs 8 tokens where Mermaid needs 21. There are no bracket shapes to memorize, and no direction hint is needed because left-to-right is the default.
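The chained-arrow form carries the same edges as the Mermaid version. A rough sketch of how such chains could be expanded into an edge list follows; `parse_edges` is a hypothetical helper, not the engine's parser, and it handles only the arrow syntax shown in the example above.

```python
from typing import List, Tuple


def parse_edges(source: str) -> List[Tuple[str, str]]:
    """Expand 'a -> b -> c' chains into (from, to) pairs, in source order."""
    edges: List[Tuple[str, str]] = []
    for line in source.splitlines():
        nodes = [part.strip() for part in line.split("->")]
        if len(nodes) < 2 or not all(nodes):
            # Blank or malformed line; a real validator would report it instead.
            continue
        # Pair each node with its successor: [a, b, c] -> (a, b), (b, c)
        edges.extend(zip(nodes, nodes[1:]))
    return edges


print(parse_edges("web -> api -> postgres\napi -> cache"))
# [('web', 'api'), ('api', 'postgres'), ('api', 'cache')]
```

The three pairs match the three `-->` edges in the Mermaid source.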

Numbers after 6 weeks

| Metric | Value |
|---|---|
| First-shot render success (LLM-emitted) | 94% |
| Median tokens per diagram | 180 |
| Playground p50 render | 120ms |
| Validator-error-to-repair rate | 88% (1 retry) |

What I’d tell a PM building dev tools for AI authors

  1. Design for the agent, not the human. Human-friendly DSLs (Markdown, Mermaid) optimize for typing. AI-friendly DSLs optimize for first-shot correctness and one-shot repairability.
  2. Errors are a UX surface. An error message that an LLM can’t act on is worse than a silent failure — at least silent failures get logged.
  3. Determinism is a feature. If the same input ever produces a different output, you’ve added a debugging cost that compounds with every user.
  4. Layout quality > syntax expressiveness. A small DSL with great auto-layout beats a powerful DSL where every diagram needs hinting.
  5. Ship a playground first. External validation comes from people pasting your DSL into a textarea, not from reading docs.
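Point 3 is cheap to enforce in a test suite. The sketch below hashes repeated renders and checks that they agree byte for byte. Both renderers are stand-ins written for this example and `is_deterministic` is a hypothetical helper; a real check would call the engine's own render function, which the post does not show.

```python
import hashlib
import itertools
from typing import Callable


def is_deterministic(render: Callable[[str], bytes], source: str, runs: int = 5) -> bool:
    """Render the same source several times and compare SHA-256 digests."""
    digests = {hashlib.sha256(render(source)).hexdigest() for _ in range(runs)}
    return len(digests) == 1


def stable_render(source: str) -> bytes:
    # Stand-in renderer: output depends only on the input text.
    return f"<svg><title>{source}</title></svg>".encode("utf-8")


_ids = itertools.count()


def leaky_render(source: str) -> bytes:
    # Stand-in for a renderer that leaks hidden state (here, a running element id).
    return f'<svg id="d{next(_ids)}"><title>{source}</title></svg>'.encode("utf-8")


print(is_deterministic(stable_render, "web -> api"))  # True
print(is_deterministic(leaky_render, "web -> api"))   # False
```

Auto-generated ids, timestamps, and unordered iteration are typical sources of the drift the second renderer imitates.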

Live at the public playground. Docs in 4 sections (syntax / examples / layout rules / API). npm publish deferred — distribution today is the playground and the inline-in-chat use case.