internetarchive/Sparkling
Internet Archive's Sparkling Data Processing Library
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)
An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.
Web application for distributed compute analysis of Archive-It web archive collections.
ArchivesSpace, the archives management tool