
Trusted Gateway

AI agents need to access the web, but the open web is hostile to automation. Pages block bots. Content rendered client-side by JavaScript is invisible to a plain HTTP fetch. Markup is inconsistent. Fetching the same URL twice might return completely different content. md-proxy is built to deal with all of this.

It's a high-performance content gateway that sits between AI agents and the web and absorbs these complexities, so agents receive clean, consistent, structured markdown.

Content Conversion

md-proxy converts web pages to clean markdown with YAML frontmatter carrying structured metadata: title, description, author, publication date, word count, language, and content hash. Six transform modes control what you get, ranging from full page content to navigation-only, headings-only, or a structured JSON crawl object.
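As a rough illustration of the frontmatter format, here is a minimal Python sketch that prepends a YAML block to converted markdown. The `build_frontmatter` helper and the field names (`published`, `word_count`, `content_hash`) are hypothetical, not md-proxy's actual schema. Because xxHash64 is not in Python's standard library, a 64-bit BLAKE2b digest stands in for the content hash.

```python
import hashlib
import json
from typing import Optional


def content_hash(markdown: str) -> str:
    """64-bit digest of the markdown body (BLAKE2b stand-in for xxHash64)."""
    return hashlib.blake2b(markdown.encode("utf-8"), digest_size=8).hexdigest()


def build_frontmatter(
    markdown: str,
    title: str,
    description: str = "",
    author: Optional[str] = None,
    published: Optional[str] = None,
    language: str = "en",
) -> str:
    """Return `markdown` prefixed with a YAML frontmatter block (hypothetical schema)."""
    fields = {
        "title": title,
        "description": description,
        "author": author,
        "published": published,
        # Naive whitespace count: markdown syntax such as "#" counts as a word.
        "word_count": len(markdown.split()),
        "language": language,
        "content_hash": content_hash(markdown),
    }
    lines = ["---"]
    for key, value in fields.items():
        if value is None:
            continue  # omit metadata the page did not provide
        # json.dumps emits double-quoted strings and bare integers,
        # both of which are valid YAML scalars.
        lines.append(f"{key}: {json.dumps(value)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + markdown


if __name__ == "__main__":
    print(build_frontmatter("# Hello\n\nSome text here.", title="Hello"))
```

Emitting values through `json.dumps` is a shortcut that avoids a YAML dependency; a real implementation would more likely use a YAML library.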

Intelligent Rendering

Not every page responds to a simple HTTP GET. md-proxy runs a persistent pool of Chromium browser instances and uses heuristic scoring to decide automatically whether a page needs headless rendering. The score draws on JavaScript indicators, content density, text-to-HTML ratio, and JSON-LD data. Pages are fetched statically when possible and rendered in Chromium only when needed.
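The rendering decision can be pictured as a weighted score over cheap signals taken from the static HTML. The sketch below uses Python's `html.parser`; the marker strings, weights, and 0.5 threshold are illustrative assumptions rather than md-proxy's actual tuning, and `needs_browser` is a hypothetical name.

```python
from html.parser import HTMLParser

# Strings that often indicate a client-side-rendered app shell (illustrative list).
SPA_MARKERS = ('id="root"', 'id="app"', "__NEXT_DATA__", "ng-version")


class _PageStats(HTMLParser):
    """Collects visible text, counts <script> tags, and notes JSON-LD blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts = []
        self.script_count = 0
        self.has_json_ld = False
        self._hidden_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_depth += 1
        if tag == "script":
            self.script_count += 1
            if dict(attrs).get("type") == "application/ld+json":
                self.has_json_ld = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.text_parts.append(data)


def needs_browser(html: str, threshold: float = 0.5) -> bool:
    """Return True if the static HTML looks like it needs headless rendering."""
    stats = _PageStats()
    stats.feed(html)
    stats.close()
    words = " ".join(stats.text_parts).split()
    text_to_html = len(" ".join(words)) / max(len(html), 1)

    score = 0.0
    if any(marker in html for marker in SPA_MARKERS):
        score += 0.4  # JavaScript framework indicators
    if stats.script_count >= 10:
        score += 0.2  # script-heavy page
    if len(words) < 50:
        score += 0.3  # low content density
    if text_to_html < 0.05:
        score += 0.2  # mostly markup, little visible text
    if stats.has_json_ld:
        score -= 0.2  # structured data is already present without rendering
    return score >= threshold
```

A combined score is used here rather than a single rule because any one signal (an `id="root"` element, for instance) also appears on server-rendered pages that need no browser.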

Multi-Tier Caching

An in-memory LRU cache (2GB by default) handles the hot path. Brotli-compressed S3 storage provides persistence with a configurable TTL (7 days by default), and asynchronous write-through keeps the two tiers in sync. Hits in the in-memory tier return in under a millisecond, and cached pages are served without another request to the origin server.
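A minimal sketch of the two-tier read and write paths, assuming a byte-bounded LRU in front of a key-value store. A plain dict stands in for S3, `zlib` stands in for Brotli (which is not in Python's standard library), and a single-worker thread pool models the asynchronous write-through. `TwoTierCache` and its methods are hypothetical names.

```python
import time
import zlib
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


class TwoTierCache:
    """Byte-bounded in-memory LRU in front of a TTL'd, compressed persistent tier."""

    def __init__(
        self,
        max_bytes: int = 2 * 1024**3,        # 2 GB hot tier
        ttl_seconds: float = 7 * 24 * 3600,  # 7-day persistence TTL
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.max_bytes = max_bytes
        self.ttl = ttl_seconds
        self.clock = clock
        self._lru: "OrderedDict[str, str]" = OrderedDict()
        self._lru_bytes = 0
        self._store: dict = {}  # key -> (expires_at, compressed bytes); stands in for S3
        self._writer = ThreadPoolExecutor(max_workers=1)

    def _remember(self, key: str, value: str) -> None:
        """Insert into the LRU, evicting least recently used entries over budget."""
        if key in self._lru:
            self._lru_bytes -= len(self._lru.pop(key).encode("utf-8"))
        self._lru[key] = value
        self._lru_bytes += len(value.encode("utf-8"))
        while self._lru_bytes > self.max_bytes and self._lru:
            _, evicted = self._lru.popitem(last=False)
            self._lru_bytes -= len(evicted.encode("utf-8"))

    def put(self, key: str, markdown: str) -> None:
        self._remember(key, markdown)
        blob = zlib.compress(markdown.encode("utf-8"))
        expires_at = self.clock() + self.ttl
        # Write-through to the persistent tier without blocking the caller.
        self._writer.submit(self._store.__setitem__, key, (expires_at, blob))

    def get(self, key: str) -> Optional[str]:
        if key in self._lru:
            self._lru.move_to_end(key)  # mark as most recently used
            return self._lru[key]
        entry = self._store.get(key)
        if entry is None or entry[0] <= self.clock():
            return None  # miss or expired: the caller fetches from origin
        markdown = zlib.decompress(entry[1]).decode("utf-8")
        self._remember(key, markdown)  # promote back into the hot tier
        return markdown

    def flush(self) -> None:
        """Wait for queued writes; one worker runs tasks in submission order."""
        self._writer.submit(lambda: None).result()
```

On a miss, a caller would fetch from the origin and call `put`. Because the persistent tier in the real system is S3 rather than process memory, entries evicted from the LRU (or lost on restart) can still be served without touching the origin until their TTL expires.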

Security by Design

md-proxy blocks SSRF attacks by resolving each target hostname and rejecting any address in a private IP range before making an outbound request. It respects robots.txt automatically and enforces multi-tier rate limiting: per-account limits, per-origin limits, and daily quotas. A hot-reloadable host allowlist and client IP denylist give operators fine-grained control without restarts.
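The resolve-then-check step can be sketched with Python's `socket` and `ipaddress` modules. `is_safe_target` is a hypothetical helper, and its injectable `resolver` parameter exists only so the example can be tested without real DNS. The blocked categories (private, loopback, link-local, multicast, reserved, unspecified) are a reasonable default, not md-proxy's documented list.

```python
import ipaddress
import socket
from typing import Callable, Set
from urllib.parse import urlsplit


def resolve_all(host: str, port: int) -> Set[str]:
    """Return every address the host resolves to (IPv4 and IPv6)."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def is_safe_target(url: str, resolver: Callable[[str, int], Set[str]] = resolve_all) -> bool:
    """Reject URLs whose host resolves to any non-public address."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        port = parts.port or (443 if parts.scheme == "https" else 80)
        addresses = resolver(parts.hostname, port)
    except (ValueError, OSError):
        return False  # bad port, DNS failure, or undecodable hostname
    if not addresses:
        return False
    for addr in addresses:
        ip = ipaddress.ip_address(addr.split("%")[0])  # drop any IPv6 zone id
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified):
            return False  # one internal address is enough to refuse
    return True
```

A gateway using this pattern should also connect to the exact address it validated instead of resolving the hostname a second time; otherwise a DNS rebinding attack can substitute an internal address between the check and the request.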

MCP Integration

Seven MCP tools expose md-proxy's capabilities directly to AI agents: fetch content, POST data, safely preview idempotent requests, extract action manifests from HTML forms, expand URI templates, check robots.txt compliance, and query system capabilities. All seven are available over stdio (local) or SSE (network) transports.
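Of the seven tools, the robots.txt compliance check is the simplest to illustrate. The sketch below uses Python's standard `urllib.robotparser`; `check_robots` and the `md-proxy` user-agent token are hypothetical and do not describe the tool's actual interface. This parser applies the first matching rule in file order, which can differ from the longest-match precedence defined in RFC 9309.

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt: str, url: str, user_agent: str = "md-proxy") -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network fetch is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(check_robots(rules, "https://example.com/private/report"))
    print(check_robots(rules, "https://example.com/blog/post"))
```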

Change Detection

md-proxy tracks content changes by hashing the markdown output with xxHash64 and maintains an S3-backed change log. Edge-Search polls this log and automatically re-indexes pages whose content has changed, with no manual intervention.
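A minimal sketch of hash-based change detection with a pollable log. `ChangeLog` is hypothetical: an in-memory list stands in for the S3-backed log, an integer cursor models how a consumer such as Edge-Search might track its position, and a 64-bit BLAKE2b digest stands in for xxHash64 (available in Python only through the third-party `xxhash` package).

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def digest64(markdown: str) -> str:
    """64-bit content digest (BLAKE2b stand-in for xxHash64)."""
    return hashlib.blake2b(markdown.encode("utf-8"), digest_size=8).hexdigest()


@dataclass
class ChangeLog:
    last_hash: Dict[str, str] = field(default_factory=dict)  # url -> latest digest
    events: List[dict] = field(default_factory=list)          # append-only change log

    def record(self, url: str, markdown: str, now: Optional[float] = None) -> bool:
        """Append an event if the markdown for url differs from the last version seen.

        A URL seen for the first time also counts as a change.
        """
        digest = digest64(markdown)
        if self.last_hash.get(url) == digest:
            return False  # unchanged content: nothing to re-index
        self.last_hash[url] = digest
        self.events.append({"url": url, "hash": digest,
                            "ts": time.time() if now is None else now})
        return True

    def poll(self, cursor: int) -> Tuple[List[dict], int]:
        """Return events recorded after cursor, plus the cursor to use next time."""
        return self.events[cursor:], len(self.events)
```

Hashing the markdown output rather than the raw HTML has a useful side effect: markup-only changes that leave the extracted content identical produce the same digest and do not trigger re-indexing.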
