Pyx-Corp/spectrawl
The unified web layer for AI agents. Search (8 engines), stealth browse, auth, and act on 24 platforms. One npm install, self-hosted.
🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.
Fast Python web crawler for RAG and AI ingestion. Extracts clean Markdown from any site for LLMs and vector stores.
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
A tool for exploring, analyzing, transforming, recombining, and extracting data from WARC (Web ARChive) files.
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN