ArchiveTeam/grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

  • Stars: 1,572
  • Forks: 154
  • Commits: 1,174
  • Language: Python
  • Awesome lists: 1

Similar repositories

N0taN3rd/Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head

174 stars · JavaScript · 1 awesome list

ArchiveTeam/wpull

Wget-compatible web downloader and crawler.

609 stars · HTML · 1 awesome list

s0rg/crawley

The unix-way web crawler

340 stars · Go · 2 awesome lists

netarchivesuite/solrwayback

A search interface and wayback machine for the UKWA Solr based warc-indexer framework.

144 stars · Java · 1 awesome list

harvard-lil/scoop

🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.

198 stars · JavaScript · 1 awesome list

Pyx-Corp/spectrawl

The unified web layer for AI agents. Search (8 engines), stealth browse, auth, and act on 24 platforms. One npm install, self-hosted.

24 stars · JavaScript · 1 awesome list

Tracked growth

2 captures since 2026-05-23

Latest capture 2026-05-31 03:01

[Charts: stars history (total stars) and commits history (default branch commits)]

Metadata

  • Created: 2015-02-05
  • First commit: 2015-02-05
  • Last pushed: 2025-05-23
  • Archived: no
  • Stack detected: none
  • License: NOASSERTION

AI development signals

No AI development config files detected.