archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
A Rails engine supporting the discovery of web archives.
Web application for distributed compute analysis of Archive-It web archive collections.
Please note that the warc-indexer tool and code are now supported by NetArchiveSuite. The 'warc-indexer' directory and code in this repo are kept for reference only. For support and issues with 'warc-indexer', please contact NetArchiveSuite.
Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.
WarcDB: Web crawl data as SQLite databases.
An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)