GitHub projects from awesome lists
Search names, descriptions, topics, tags, and stacks, then tune results by ecosystem, freshness, health, and cross-list signal.
Go package and CLI tool for saving a web page as a single HTML file
Snapshots a web page to get it as a static, self-contained HTML document.
🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.
Extract web archive data using Wayback Machine and Common Crawl
Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Tool and library for handling Web ARChive (WARC) files.
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at Internet Archive.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
A search interface and wayback machine for the UKWA Solr-based warc-indexer framework.
A robust web archive analytics toolkit
Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in this repo is now only for reference. For support and issues of 'warc-indexer', please communicate with NetArchiveSuite.
🐋 One-Click User Instigated Preservation
Parse and create Web ARChive (WARC) files with Node.js
📚 A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity
Offline-first web browser
A Memento Aggregator CLI and Server in Go
A command-line tool and Python library for archiving data from Facebook using the Graph API.
A collection of tools for archiving and analysing the internet.
Various Jupyter notebooks about Common Crawl data
Easy-to-use Web archiver
A dockerized, queued high fidelity web archiver based on Squidwarc
Java library for reading and writing WARC files with a typed API
⚙️ A Rust library for reading and writing WARC files
Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user access to the copy
Converts WARC files to static HTML
Convert HTTP Archive (HAR) to Web ARChive (WARC) format
npm package and CLI tool for saving a web page as a single HTML file
A Rails engine supporting the discovery of web archives.
Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.