helgeho/HadoopConcatGz
A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz
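Concatenated GZIP files such as *.warc.gz are commonly written as one gzip member per record. Every member begins with the bytes 1f 8b 08 (the RFC 1952 magic number plus the deflate method byte), and that is what makes splitting possible. A reader handed an arbitrary byte range can scan forward to the next member header and decompress whole members from there. The Python sketch below illustrates that idea under stated assumptions. It is not HadoopConcatGz's code or API. The functions `find_member_start`, `decompress_member` and `read_split` are hypothetical. A split is assumed to own every member whose header starts in [start, end). False magic-byte matches are assumed to be rejected by a failed decompression or CRC check.

```python
import gzip
import zlib

# Hypothetical helpers illustrating byte-range splitting of concatenated gzip
# input; an assumption-based sketch, not the HadoopConcatGz implementation.

GZIP_MAGIC = b"\x1f\x8b\x08"  # ID1, ID2 and CM=8 (deflate), per RFC 1952


def find_member_start(data: bytes, pos: int) -> int:
    """Offset of the first candidate gzip header at or after pos, or -1."""
    return data.find(GZIP_MAGIC, pos)


def decompress_member(data: bytes, offset: int) -> tuple[bytes, int]:
    """Decompress one gzip member at offset; return (payload, offset after it)."""
    # 16 + MAX_WBITS makes zlib parse the gzip header and verify the CRC32 trailer.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    payload = d.decompress(memoryview(data)[offset:])
    if not d.eof:
        raise zlib.error("incomplete gzip member")
    # Bytes after this member's trailer are left in unused_data.
    return payload, len(data) - len(d.unused_data)


def read_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Decompress every member whose header begins in [start, end).

    A member that starts before `end` is read to completion even if it runs
    past `end`; the next split skips it because its header lies before that
    split's start.
    """
    records = []
    pos = find_member_start(data, start)
    while 0 <= pos < end:
        try:
            payload, pos = decompress_member(data, pos)
        except zlib.error:
            # The magic bytes can also occur by chance inside compressed data;
            # treat this match as false and keep scanning after it.
            pos = find_member_start(data, pos + 1)
            continue
        records.append(payload)
    return records


# Usage: one gzip member per record, read back as three independent splits.
records = [b"record %d\n" % i for i in range(5)]
blob = b"".join(gzip.compress(r) for r in records)
cuts = [0, len(blob) // 3, 2 * len(blob) // 3, len(blob)]
recovered = [r for a, b in zip(cuts, cuts[1:]) for r in read_split(blob, a, b)]
print(recovered == records)
```

Resynchronizing on the header and validating by decompression needs no index, at the cost of some wasted work on false matches. Because each split only starts members whose header it owns, every record is decompressed exactly once across all splits.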
Partition (W)ARC Files by MIME Type and Year
An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)
Command-line tools and libraries for handling and manipulating WARC files (and HTTP contents)
Tool and library for handling Web ARChive (WARC) files.
Java library for reading and writing WARC files with a typed API
Streaming WARC/ARC library for fast web archive IO