GitHub projects from awesome lists
Search names, descriptions, topics, tags, and stacks, then tune results by ecosystem, freshness, health, and cross-list signal.
An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.
CLI implementation of httpreserve that can test links and retrieve Internet Archive replacements
No description.
A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz
Java application to download WARCs from WASAPI
JWAT Tools
A Jupyter notebook illustrating the basics of Common Crawl's datasets.
DuckDB extension for parsing WARC files
Java Web Archive Toolkit
A whirlwind tour of Common Crawl's data using Java
Parse CDXJ (https://github.com/oduwsdl/ORS/wiki/CDXJ) files with Node.js
Partition (W)ARC Files by MIME Type and Year
Convert a bag-nabit dataset stored in a ZIP into a full-content WARC.