GitHub projects from awesome lists
Search names, descriptions, topics, tags, and stacks, then tune results by ecosystem, freshness, health, and cross-list signal.
A whirlwind tour of Common Crawl's data using Python
Web archiving using Google Chrome
Web archive index server based on RocksDB
Prototype Solr-powered web archive exploration UI
Download and attach provenance to public datasets
Converts HTTrack crawls to WARC files
Command-line tool and Rust library for handling Web ARChive (WARC) files
No description.
Example notebooks for working with web archives using the Archives Unleashed Toolkit, and with the derivatives the Toolkit generates
An easy-to-use and highly customizable crawler that lets you create your own small web archives (WARC/CDX)
Wget with a Lua extension
Simple script to convert web resources into a single WARC file
Web Archiving Course
Create Robust Links from within Zotero
Web application for distributed compute analysis of Archive-It web archive collections.
DuckDB extension to fetch pages from Wayback Machine & Common Crawl
Go readers for the ARC and WARC web archive formats
A tool for detecting viruses and NSFW material in WARC files
No description.
A data retrieval & exploration protocol designed to investigate science and policy processes
Internet Archive's Sparkling Data Processing Library
A client for the Archive-It and Webrecorder WASAPI Data Transfer API
Object Resource Stream and CDXJ Drafts
A tool for exploring, analyzing, transforming, recombining, and extracting data from WARC (Web ARChive) files.
No description.
Playback webpages from Wayback Machine
Tika-based link (URL) extractor for httpreserve
A tool to summarize web archive holdings
The UKWA Heritrix3 custom modules and Docker builder.
Web archive deduplication tools