s0rg/crawley
The unix-way web crawler
Extract web archive data using Wayback Machine and Common Crawl
Easy-to-use Web archiver
DuckDB extension to fetch pages from Wayback Machine & Common Crawl
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.