Most personal blogs are invisible to AI agents. The content is there, but it is locked inside opaque HTML with no structured entry points. Search engines learned to crawl decades ago. Agents are learning now, and the sites that meet them halfway will be the ones that get read, cited, and integrated.
This guide walks through the practical steps to make a personal blog or long-form essay archive fully agent-readable, using benmilne.com as a working reference implementation.
The discovery layer
Agents need to find your site and understand what it offers before they can use it. This starts with well-known files:
- `llms.txt` and `llms-full.txt` at the root — plain-text files that describe the site, its API, and its content for LLM crawlers. Think of them as `robots.txt` for the age of agents.
- `agents.md` — a Markdown file with instructions for autonomous agents: what tools are available, what the conventions are, how to search and cite.
- OpenAPI spec — a machine-parseable description of every API endpoint.
- MCP server card — the Model Context Protocol defines a standard for exposing tools to AI assistants. A server card at `/.well-known/mcp/server-card.json` tells MCP-compatible clients what your server can do.
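A minimal `llms.txt` could look like the sketch below. The headings and descriptions are illustrative, not the actual file served by benmilne.com; the endpoint paths come from the API section of this guide.

```text
# Site Name

> One-sentence description of the site and its author, written for LLM crawlers.

## Content
- Long-form essays, each available as HTML, JSON, or Markdown at the same URL.

## API
- GET /api/posts — paginated post listing
- GET /api/posts/{slug} — single post
- GET /api/search?q= — full-text search
```

`llms-full.txt` follows the same idea but typically inlines the full content rather than just describing it.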
Content negotiation
The same URL should serve different representations based on what the client asks for. On benmilne.com, every essay URL supports three formats via the `Accept` header:
- `text/html` (default) — the rendered page for browsers.
- `application/json` — a structured JSON object with title, date, body HTML, categories, tags, and metadata.
- `text/markdown` — the raw essay source, clean and ready for LLM context windows.
This means an agent can fetch any essay as markdown without parsing HTML. A browser gets the designed reading experience. Same URL, different representations.
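The negotiation logic can be sketched in a few lines. `pickRepresentation` is a hypothetical helper name, not the site's actual code; the point is that agent-friendly formats are checked first and browsers fall through to HTML.

```javascript
// Choose a response format from the client's Accept header.
// Agents asking for markdown or JSON get it; everything else
// (including typical browser Accept strings) falls through to HTML.
function pickRepresentation(acceptHeader) {
  const accept = (acceptHeader || "").toLowerCase();
  if (accept.includes("text/markdown")) return "text/markdown";
  if (accept.includes("application/json")) return "application/json";
  return "text/html";
}

// In a Worker fetch handler, this value would select the serializer:
// const format = pickRepresentation(request.headers.get("Accept"));
```

A substring check is deliberately loose; a production version might parse quality values (`q=`), but for this use case the three-way branch covers the real traffic.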
Structured data
JSON-LD gives search engines and agents a typed understanding of your content. For a personal essay site, the key schema types are:
- Person — who you are, with `sameAs` links to your profiles.
- Article — one per essay, with headline, date, and author reference.
- WebSite — the site itself.
- Service — if you expose an API, describe it as a service.
- BreadcrumbList — helps agents understand navigation hierarchy.
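For the Article type, a minimal JSON-LD object looks like the following. All values are placeholders, not content from benmilne.com.

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Essay title",
  "datePublished": "2024-01-01",
  "author": {
    "@type": "Person",
    "name": "Author Name",
    "sameAs": ["https://example.com/profile"]
  }
}
```

This goes in a `<script type="application/ld+json">` tag in the page head, one per essay.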
A public API
An API transforms a blog from a collection of HTML pages into a queryable archive. The minimum viable set:
- Paginated post listing (`GET /api/posts`)
- Single post by slug (`GET /api/posts/{slug}`)
- Full-text search (`GET /api/search?q=`)
- Category and tag filtering
- Site metadata endpoint
No authentication required for read operations. Include an attribution field in every response so agents know how to cite the source.
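A listing response with an attribution field might be shaped like this. The exact field names and wording are illustrative, not the site's actual schema.

```json
{
  "posts": [
    { "slug": "example-essay", "title": "Example Essay", "date": "2024-01-01" }
  ],
  "page": 1,
  "attribution": "Content from benmilne.com. Cite the original essay URL."
}
```

Putting the attribution inline in every response, rather than only in documentation, means an agent that fetched a single endpoint still knows how to credit the source.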
MCP tools
The Model Context Protocol lets AI assistants discover and call tools on your server via JSON-RPC. Registering tools like `search_posts`, `list_posts`, and `get_post` makes your content natively accessible from ChatGPT, Claude, and any MCP-compatible client.
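A tool invocation is an ordinary JSON-RPC request using MCP's `tools/call` method. The tool name matches the registration above; the argument name `query` is an assumption about this particular tool's schema.

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_posts",
    "arguments": { "query": "agents" }
  }
}
```

Clients first call `tools/list` to discover what is registered, then issue calls like this one.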
Streaming for agents
For search-heavy workflows, an SSE (Server-Sent Events) endpoint lets agents receive results incrementally instead of waiting for the full response. This is especially useful for large archives where search may take time.
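The SSE wire format is simple enough to frame by hand. The sketch below, with illustrative names (`sseEvent`, `streamResults`), shows how a Worker could push each search hit as its own event and signal completion at the end.

```javascript
// Frame one SSE message: an event name, a data line, and a blank line.
function sseEvent(event, data) {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Build a stream that emits one "result" event per hit, then a "done" event.
function streamResults(results) {
  const encoder = new TextEncoder();
  return new ReadableStream({
    start(controller) {
      for (const hit of results) {
        controller.enqueue(encoder.encode(sseEvent("result", hit)));
      }
      controller.enqueue(encoder.encode(sseEvent("done", { count: results.length })));
      controller.close();
    },
  });
}

// In a Worker handler, the stream is returned directly:
// return new Response(streamResults(hits), {
//   headers: { "Content-Type": "text/event-stream" },
// });
```

Emitting a terminal `done` event lets the agent close the connection deterministically instead of waiting for a timeout.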
Implementation reference
benmilne.com implements everything described above as a Cloudflare Worker backed by a D1 database.
The architecture is intentionally simple. No framework, no build step for the runtime, no external dependencies for serving content. The Worker is the framework.
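"The Worker is the framework" can be made concrete with a hand-rolled router. This is a sketch under assumptions, not the site's source: the route table and handler names are invented, and a real handler would dispatch on the returned name inside the Worker's `fetch` export.

```javascript
// Map a request path to a handler name. No framework, just string
// matching on the endpoints the API section describes.
function route(pathname) {
  if (pathname === "/api/posts") return "listPosts";
  if (/^\/api\/posts\/[^/]+$/.test(pathname)) return "getPost";
  if (pathname === "/api/search") return "search";
  return "renderPage"; // everything else is a regular page
}

// export default {
//   async fetch(request, env) {
//     const handler = route(new URL(request.url).pathname);
//     // dispatch to the matching handler, passing env.DB (the D1 binding)
//   },
// };
```

With a route table this small, a regex and two string comparisons replace an entire routing library, which is what "no external dependencies for serving content" buys.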