harvard-lil/warcbench
A tool for exploring, analyzing, transforming, recombining, and extracting data from WARC (Web ARChive) files.
A robust web archive analytics toolkit.
Tool and library for handling Web ARChive (WARC) files.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
An extremely fast Python linter and code formatter, written in Rust.
A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ other formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via the CLI, REST API, or MCP server.
Java library for reading and writing WARC files with a typed API.