propublica/upton
A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.
A Rails engine supporting the discovery of web archives.
Wget-compatible web downloader and crawler.
Self-hosted webscraper.
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.
1 capture since 2026-06-09
Gemfile · ruby · 11 dependencies
wombat.gemspec · ruby · 0 dependencies
Gemfile.lock · ruby · 94 dependencies