// Scrape is now part of Data

Scraping capabilities moved into the unified Data plan — metered per page alongside search, metadata, and Claude-backed PDF parsing. Existing Scrape subscribers keep their plans; new signups use Data.

See Data plan
Katzilla Scrape
web → structured

When the data lives
behind HTML, not an API.

Turn any URL into clean markdown, HTML, or JSON. We render in a headless browser when the page needs it and use fast HTTP when it doesn't. Every response ships with citation metadata baked in, under the same contract as Data.

markdown: default output
html · json: also supported
Playwright: render engine
// POST /scrape · api.katzilla.dev
curl -X POST https://api.katzilla.dev/scrape \
  -H "X-API-Key: kz_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://nyc.gov/site/council/legislation/…",
    "format": "markdown",
    "engine": "auto"
  }'

{
  "success": true,
  "data": {
    "markdown": "# Local Law 144 of 2021 …",
    "title": "Local Law 144",
    "links": ["…"]
  },
  "citation": {
    "source_url": "https://nyc.gov/…",
    "retrieved_at": "2026-04-17T12:00:00Z",
    "content_hash": "sha256:a1f3…"
  }
}
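
The same call can be sketched in Python with only the standard library. The endpoint, headers, and body fields are taken from the curl example above; the helper names are hypothetical, and treating a false "success" as an error is an assumption, since the page only shows a successful response.

```python
import json
import urllib.request

API_URL = "https://api.katzilla.dev/scrape"  # endpoint from the curl example


def build_scrape_request(url, api_key, fmt="markdown", engine="auto"):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {"url": url, "format": fmt, "engine": engine}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_scrape_response(body):
    """Return (data, citation) from a response body.

    Assumption: a response with "success" set to false should be treated
    as an error. The failure shape is not documented on this page.
    """
    doc = json.loads(body)
    if not doc.get("success"):
        raise RuntimeError("scrape failed")
    return doc["data"], doc["citation"]


# Sending is one call; left commented out so the sketch runs offline:
# req = build_scrape_request("https://example.com", "kz_...")
# with urllib.request.urlopen(req) as resp:
#     data, citation = parse_scrape_response(resp.read())
```

Keeping request building separate from sending makes the payload easy to inspect before it costs a scrape.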
[01/When to use it]

Use Scrape when the data's in a page, not an endpoint.

state + local gov

City council minutes, county records, state agency pages. Federal is in Data — state/local lives in HTML.

company pages

Pipeline pages, changelogs, press releases, status pages. Primary-source company info with no feed.

secondary coverage

Trade pubs, Substacks, analyst notes. The interpretation layer on top of primary sources.

ad-hoc URLs

Your user or agent hands you a URL. "Read this and tell me about it" with clean structured output.

international long tail

Smaller ministries, regional authorities, NGO reports. Beyond the aggregators Data already covers.

anything else

If it's on the web, Scrape can fetch it. Same API key as Data.

[02/What it does]

Auto-engine selection

Per-URL, we pick HTTP or headless Playwright. You don't configure; we render what needs rendering.

Three output formats

Markdown (LLM-ready), clean HTML (archival), JSON with CSS selectors (structured extraction). Specify per call.

Screenshot + PDF

Pro plan adds full-page screenshot and printable PDF capture. Useful for visual archival and UI testing.

Proxy rotation

Pro and Business plans route through rotating residential proxies when a site's being difficult.

Batch + crawl

POST /scrape/batch for an array of URLs. POST /scrape/crawl to follow links up to N pages deep.

Citation contract

Every response carries source_url, retrieved_at, content_hash. Same shape as Katzilla Data — one contract across products.
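
Because the three citation fields are fixed, client code can parse them into one small type and reuse it across products. A minimal sketch, assuming retrieved_at is always an ISO 8601 UTC timestamp like the one in the example response; the Citation class is hypothetical, not part of any Katzilla SDK.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Citation:
    source_url: str
    retrieved_at: datetime
    content_hash: str

    @classmethod
    def from_dict(cls, raw):
        """Parse the "citation" object of a response.

        Raises KeyError if any of the three documented fields is missing.
        """
        # Older Python versions reject a trailing "Z" in fromisoformat,
        # so rewrite it as an explicit UTC offset first.
        retrieved = datetime.fromisoformat(raw["retrieved_at"].replace("Z", "+00:00"))
        return cls(raw["source_url"], retrieved, raw["content_hash"])
```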

[03/Scheduled scrapes]

Fire and forget.

Schedule a URL on a cron. Katzilla watches for content changes via content_hash. When the page updates, your webhook fires. Hobby gets 3 slots, Pro 25, Business 250.

// POST /scrape/schedules
{
  "url": "https://sec.gov/cgi-bin/browse-edgar?action=…",
  "cadence": "hourly",
  "webhook": "https://yours.app/hooks/sec-new",
  "notify_on": "content_change",
  "format": "markdown"
}
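
The idea behind content_change can be illustrated with a hash comparison between runs. This is a sketch of the general technique, not Katzilla's implementation: the page does not say which bytes are hashed, so hashing the UTF-8 text of the output is an assumption, and both function names are hypothetical.

```python
import hashlib


def content_hash(text):
    """Hash content in the "sha256:<hex>" style seen in the citation example."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(previous_hash, current_text):
    """Return (changed, new_hash).

    A first run, where previous_hash is None, counts as a change.
    """
    new_hash = content_hash(current_text)
    return new_hash != previous_hash, new_hash
```

A receiver can apply the same check to the content_hash it stored from the last delivery, which guards against acting twice on a repeated webhook.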
[04/Pricing]

Scrape pricing (legacy — grandfathered)

These plans remain active for existing Scrape subscribers. New signups get scraping bundled into the unified Data plan — see the retrieval diagram for the per-page metering.

// free
$0/mo
1,000 scrapes / mo · 1 concurrent
Start free
// hobby
$12/mo
6,000 scrapes / mo · 3 concurrent · browser engine · 3 schedules
Go hobby
// pro
$39/mo
60,000 scrapes / mo · 15 concurrent · proxy rotation · screenshot + PDF · 25 schedules
Go pro
// business
$149/mo
300,000 scrapes / mo · 50 concurrent · priority support · 250 schedules
Go business
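
For comparing the paid tiers, the effective price per 1,000 scrapes follows from the numbers above. A quick sketch, assuming the full monthly quota is used; the prices and quotas are copied from the plan list, and the helper name is hypothetical.

```python
# (monthly price in USD, scrapes per month), copied from the plan list above.
PLANS = {
    "hobby": (12, 6_000),
    "pro": (39, 60_000),
    "business": (149, 300_000),
}


def cost_per_thousand(plan):
    """Effective USD per 1,000 scrapes when the whole quota is used."""
    price, quota = PLANS[plan]
    return price / quota * 1000
```

At full use this works out to about $2.00 per 1,000 scrapes on Hobby, $0.65 on Pro, and $0.50 on Business. Unused quota raises the effective rate.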