Most personal blogs are invisible to AI agents. The content is there, but it is locked inside opaque HTML with no structured entry points. Search engines learned to crawl decades ago. Agents are learning now, and the sites that meet them halfway will be the ones that get read, cited, and integrated.

This guide walks through the practical steps to make a personal blog or long-form essay archive fully agent-readable, using benmilne.com as a working reference implementation.

The discovery layer

Agents need to find your site and understand what it offers before they can use it. This starts with well-known files (a serving sketch follows the list):

  • llms.txt and llms-full.txt at the root. These are plain-text files that describe the site, its API, and its content for LLM crawlers. Think of them as robots.txt for the age of agents.
  • agents.md — a Markdown file with instructions for autonomous agents: what tools are available, what the conventions are, how to search and cite.
  • OpenAPI spec — a machine-parseable description of every API endpoint.
  • MCP server card — the Model Context Protocol defines a standard for exposing tools to AI assistants. A server card at /.well-known/mcp/server-card.json tells MCP-compatible clients what your server can do.
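
As a minimal sketch, a Cloudflare Worker can serve these discovery files straight from its fetch handler. The server-card fields and llms.txt body below are illustrative placeholders, not benmilne.com's actual values:

    // Sketch: serving discovery files from a Cloudflare Worker.
    // All names and field values here are placeholders.
    const SERVER_CARD = {
      name: "example-blog",  // hypothetical server name
      version: "1.0.0",
      description: "Essay archive with search and retrieval tools",
    };

    export default {
      async fetch(request: Request): Promise<Response> {
        const { pathname } = new URL(request.url);
        if (pathname === "/.well-known/mcp/server-card.json") {
          return Response.json(SERVER_CARD);
        }
        if (pathname === "/llms.txt") {
          return new Response("# Example Blog\n\nPersonal essays, searchable via /api.\n", {
            headers: { "content-type": "text/plain; charset=utf-8" },
          });
        }
        return new Response("Not found", { status: 404 });
      },
    };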

Content negotiation

The same URL should serve different representations based on what the client asks for. On benmilne.com, every essay URL supports three formats via the Accept header:

  • text/html (default) — the rendered page for browsers.
  • application/json — a structured JSON object with title, date, body HTML, categories, tags, and metadata.
  • text/markdown — the raw essay source, clean and ready for LLM context windows.

This means an agent can fetch any essay as markdown without parsing HTML. A browser gets the designed reading experience. Same URL, different representations.
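
A minimal sketch of the negotiation logic, assuming a hypothetical loadEssay() helper that pulls both the rendered HTML and the raw Markdown from storage:

    // Sketch: Accept-header negotiation for one essay URL.
    // loadEssay() is a hypothetical helper; the Essay shape is assumed.
    interface Essay {
      title: string;
      date: string;
      html: string;      // rendered page body
      markdown: string;  // raw essay source
      tags: string[];
    }
    declare function loadEssay(slug: string): Promise<Essay>;

    async function serveEssay(request: Request, slug: string): Promise<Response> {
      const essay = await loadEssay(slug);
      const accept = request.headers.get("accept") ?? "";

      if (accept.includes("text/markdown")) {
        // Agents: raw source, ready for a context window.
        return new Response(essay.markdown, {
          headers: { "content-type": "text/markdown; charset=utf-8" },
        });
      }
      if (accept.includes("application/json")) {
        // Structured metadata plus body.
        return Response.json(essay);
      }
      // Browsers: the designed reading experience.
      return new Response(essay.html, {
        headers: { "content-type": "text/html; charset=utf-8" },
      });
    }

The agent's side is just a request header: send Accept: text/markdown and skip the scraping entirely.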

Structured data

JSON-LD gives search engines and agents a typed understanding of your content. For a personal essay site, the key schema types are (an Article example follows the list):

  • Person — who you are, with sameAs links to your profiles.
  • Article — one per essay, with headline, date, author reference.
  • WebSite — the site itself.
  • Service — if you expose an API, describe it as a service.
  • BreadcrumbList — helps agents understand navigation hierarchy.
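
As one hedged example, the Article markup for an essay page might be generated like this; the author name and domain are placeholders, not benmilne.com's real markup:

    // Sketch: building Article JSON-LD for an essay page.
    // Author name and domain are placeholders.
    function articleJsonLd(essay: { title: string; date: string; slug: string }): string {
      return JSON.stringify({
        "@context": "https://schema.org",
        "@type": "Article",
        headline: essay.title,
        datePublished: essay.date,
        author: { "@type": "Person", name: "Author Name" },
        url: `https://example.com/essays/${essay.slug}`,
      });
    }
    // Embedded in the page head as:
    //   <script type="application/ld+json">{ ... }</script>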

A public API

An API transforms a blog from a collection of HTML pages into a queryable archive. The minimum viable set:

  • Paginated post listing (GET /api/posts)
  • Single post by slug (GET /api/posts/{slug})
  • Full-text search (GET /api/search?q=)
  • Category and tag filtering
  • Site metadata endpoint

No authentication required for read operations. Include an attribution field in every response so agents know how to cite the source.
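
One plausible shape for the single-post endpoint, with the attribution field in place; every field name here is an assumption, only the route and the attribution idea come from the text above:

    // Sketch: a plausible shape for GET /api/posts/{slug}.
    // Field names are assumptions, not the documented schema.
    interface PostResponse {
      slug: string;
      title: string;
      date: string;
      body_markdown: string;
      attribution: string; // how to cite this post, e.g. author plus canonical URL
    }

    async function fetchPost(slug: string): Promise<PostResponse> {
      const res = await fetch(`https://benmilne.com/api/posts/${slug}`);
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return (await res.json()) as PostResponse;
    }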

MCP tools

The Model Context Protocol lets AI assistants discover and call tools on your server via JSON-RPC. Registering tools like search_posts, list_posts, and get_post makes your content natively accessible from ChatGPT, Claude, and any MCP-compatible client.
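
A stripped-down sketch of the JSON-RPC dispatch, assuming a hypothetical searchPosts() backed by the archive; a real MCP server also implements initialize, tools/list, and the rest of the handshake:

    // Sketch: minimal JSON-RPC handling for a tools/call request.
    // searchPosts() is hypothetical; real MCP servers implement the
    // full protocol (initialize, tools/list, error handling, ...).
    type JsonRpcRequest = {
      jsonrpc: "2.0";
      id: number;
      method: string;
      params?: { name?: string; arguments?: Record<string, unknown> };
    };
    declare function searchPosts(query: string): Promise<unknown>;

    async function handleMcp(request: Request): Promise<Response> {
      const rpc = (await request.json()) as JsonRpcRequest;

      if (rpc.method === "tools/call" && rpc.params?.name === "search_posts") {
        const result = await searchPosts(String(rpc.params.arguments?.query ?? ""));
        return Response.json({ jsonrpc: "2.0", id: rpc.id, result });
      }
      // Anything unhandled gets a standard JSON-RPC error.
      return Response.json({
        jsonrpc: "2.0",
        id: rpc.id,
        error: { code: -32601, message: "Method not found" },
      });
    }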

Streaming for agents

For search-heavy workflows, an SSE (Server-Sent Events) endpoint lets agents receive results incrementally instead of waiting for the full response. This is especially useful for large archives where search may take time.
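
A sketch of what that endpoint can look like on a Worker, assuming a hypothetical runSearch() async iterator over matching posts:

    // Sketch: streaming search results as Server-Sent Events.
    // runSearch() is a hypothetical async iterator over matches.
    declare function runSearch(q: string): AsyncIterable<{ slug: string; title: string }>;

    async function serveSearchStream(request: Request): Promise<Response> {
      const q = new URL(request.url).searchParams.get("q") ?? "";
      const { readable, writable } = new TransformStream();
      const writer = writable.getWriter();
      const encoder = new TextEncoder();

      // Emit one SSE event per result as it is found.
      // (In a Worker, wrap this background task in ctx.waitUntil.)
      (async () => {
        for await (const hit of runSearch(q)) {
          await writer.write(encoder.encode(`data: ${JSON.stringify(hit)}\n\n`));
        }
        await writer.write(encoder.encode("event: done\ndata: {}\n\n"));
        await writer.close();
      })();

      return new Response(readable, {
        headers: {
          "content-type": "text/event-stream",
          "cache-control": "no-cache",
        },
      });
    }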

Implementation reference

benmilne.com implements everything described above as a single Cloudflare Worker backed by a D1 database.

The architecture is intentionally simple. No framework, no build step for the runtime, no external dependencies for serving content. The Worker is the framework.