harvard-lil/scoop

๐Ÿจ High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.

  • Stars: 198
  • Forks: 13
  • Commits: 1255
  • Language: JavaScript
  • Awesome lists: 1

Similar repositories

Pyx-Corp/spectrawl

The unified web layer for AI agents. Search (8 engines), stealth browse, auth, and act on 24 platforms. One npm install, self-hosted.

24 stars
JavaScript 1 awesome list

AIMLPM/markcrawl

Fast Python web crawler for RAG and AI ingestion. Extracts clean Markdown from any site for LLMs and vector stores.

2 stars
Python 1 awesome list

N0taN3rd/Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head

174 stars
JavaScript 1 awesome list

ArchiveTeam/grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

1572 stars
Python 1 awesome list

harvard-lil/warcbench

A tool for exploring, analyzing, transforming, recombining, and extracting data from WARC (Web ARChive) files.

14 stars
Python 1 awesome list

unclecode/crawl4ai

🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN

67645 stars
Python 2 awesome lists

Tracked growth

2 captures since 2026-05-23

Latest capture 2026-05-31 03:01

[Chart: Stars history (total stars)]

[Chart: Commits history (default branch commits)]

Metadata

  • Created: 2022-09-20
  • First commit: 2022-09-20
  • Last pushed: 2025-09-03
  • Archived: no
  • Stack detected: none
  • License: MIT

AI development signals

No AI development config files detected.