helgeho/WarcPartitioner
Partition (W)ARC Files by MIME Type and Year
A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz
An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)
Java library for reading and writing WARC files with a typed API
Tool and library for handling Web ARChive (WARC) files.
Streaming WARC/ARC library for fast web archive IO
Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)