Scraping capabilities have moved into the unified Data plan, metered per page alongside search, metadata, and Claude-backed PDF parsing. Existing Scrape subscribers keep their plans; new signups use Data.
Turn any URL into clean markdown, HTML, or JSON. Headless-browser rendering when it's needed, fast HTTP when it's not. Every response ships with citation metadata baked in, the same contract as Data.
curl -X POST https://api.katzilla.dev/scrape \
  -H "X-API-Key: kz_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://nyc.gov/site/council/legislation/…",
    "format": "markdown",
    "engine": "auto"
  }'
{
  "success": true,
  "data": {
    "markdown": "# Local Law 144 of 2021 …",
    "title": "Local Law 144",
    "links": ["…"]
  },
  "citation": {
    "source_url": "https://nyc.gov/…",
    "retrieved_at": "2026-04-17T12:00:00Z",
    "content_hash": "sha256:a1f3…"
  }
}
City council minutes, county records, state agency pages. Federal is in Data; state/local lives in HTML.
Pipeline pages, changelogs, press releases, status pages. Primary-source company info with no feed.
Trade pubs, Substacks, analyst notes. The interpretation layer on top of primary sources.
Your user or agent hands you a URL. "Read this and tell me about it" with clean structured output.
Smaller ministries, regional authorities, NGO reports. Beyond the aggregators Data already covers.
If it's on the web, Scrape can fetch it. Same API key as Data.
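For callers working in Python rather than curl, the request shown earlier could be issued roughly as follows. This is a minimal sketch using only the standard library; the endpoint, headers, and body fields are taken from the curl example, while the function names `build_scrape_request` and `scrape` and the 30-second timeout are assumptions made for illustration, not part of any Katzilla SDK.

```python
import json
import urllib.request

# Endpoint as shown in the curl example.
API_URL = "https://api.katzilla.dev/scrape"


def build_scrape_request(url: str, api_key: str, fmt: str = "markdown",
                         engine: str = "auto") -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example.

    Hypothetical helper: only the URL, headers, and body fields come
    from the documented example.
    """
    body = json.dumps({"url": url, "format": fmt, "engine": engine})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def scrape(url: str, api_key: str, **options) -> dict:
    """Send the request and decode the JSON response (does network I/O)."""
    request = build_scrape_request(url, api_key, **options)
    # The timeout value is an assumption, not a documented limit.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network.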
Per-URL, we pick HTTP or headless Playwright. You don't configure; we render what needs rendering.
Markdown (LLM-ready), clean HTML (archival), JSON with CSS selectors (structured extraction). Specify per call.
The Pro plan adds full-page screenshots and printable PDF capture. Useful for visual archival and UI testing.
Pro and Business plans route through rotating residential proxies when a site's being difficult.
POST /scrape/batch for an array of URLs. POST /scrape/crawl to follow links up to N pages deep.
Every response carries source_url, retrieved_at, content_hash. Same shape as Katzilla Data — one contract across products.
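Because the three citation fields are the same on every response, client code can parse them once into a typed record. The sketch below assumes the response shape from the sample response (a top-level `citation` object with string values and an ISO 8601 UTC timestamp); the `Citation` class and `parse_citation` function are illustrative names, not part of the API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Citation:
    """Typed view of the citation block (hypothetical client-side type)."""
    source_url: str
    retrieved_at: datetime
    content_hash: str


def parse_citation(response: dict) -> Citation:
    """Extract the citation fields from a decoded response body."""
    raw = response["citation"]
    # Swap the trailing "Z" for an explicit offset so fromisoformat()
    # also accepts the timestamp on Python versions before 3.11.
    timestamp = raw["retrieved_at"].replace("Z", "+00:00")
    return Citation(
        source_url=raw["source_url"],
        retrieved_at=datetime.fromisoformat(timestamp),
        content_hash=raw["content_hash"],
    )
```

A missing field raises `KeyError`, which surfaces a contract violation immediately rather than passing a partial citation downstream.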
Schedule a URL on a cron. Katzilla watches for content changes via content_hash. When the page updates, your webhook fires. Hobby gets 3 slots, Pro 25, Business 250.
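The hash-based change check described above can be mirrored on the client side, for example to deduplicate webhook deliveries. This is a sketch under stated assumptions, not Katzilla's implementation: the `ChangeTracker` class is hypothetical, and treating the first observation of a URL as a baseline rather than a change is a design choice made here. The slot limits are the ones quoted above.

```python
# Monitor slots per plan, as quoted in the text above.
PLAN_SLOTS = {"hobby": 3, "pro": 25, "business": 250}


class ChangeTracker:
    """Remembers the last content_hash per URL (hypothetical helper)."""

    def __init__(self) -> None:
        self._last_seen: dict[str, str] = {}

    def observe(self, url: str, content_hash: str) -> bool:
        """Record a hash and report whether it differs from the last one.

        The first observation of a URL only sets the baseline and is
        reported as "no change" (an assumption of this sketch).
        """
        previous = self._last_seen.get(url)
        self._last_seen[url] = content_hash
        return previous is not None and previous != content_hash


def slots_remaining(plan: str, active_monitors: int) -> int:
    """Return how many more monitors the given plan could hold."""
    return max(PLAN_SLOTS[plan.lower()] - active_monitors, 0)
```

Comparing hashes instead of page bodies keeps the stored state small: one short string per monitored URL.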
{
  "url": "https://sec.gov/cgi-bin/browse-edgar?action=…",
  "cadence": "hourly",
  "webhook": "https://yours.app/hooks/sec-new",
  "notify_on": "content_change",
  "format": "markdown"
}
These plans remain active for existing Scrape subscribers. New signups get scraping bundled into the unified Data plan; see the retrieval diagram for the per-page metering.