peterk/warcworker
A dockerized, queued, high-fidelity web archiver based on Squidwarc
Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium, with or without a head
Parse and create Web ARChive (WARC) files with Node.js
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)
The unix-way web crawler
A search interface and Wayback Machine for the UKWA Solr-based warc-indexer framework