nla/httrack2warc
Converts HTTrack crawls to WARC files
An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)
Please note that the warc-indexer tool and its code are now maintained by NetArchiveSuite. The 'warc-indexer' directory in this repo is kept for reference only. For warc-indexer support and issues, please contact NetArchiveSuite.
WarcDB: Web crawl data as SQLite databases.
Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium, with or without a head
Partition (W)ARC Files by MIME Type and Year
Streaming WARC/ARC library for fast web archive IO