helgeho/ArchiveSpark
An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.
Internet Archive's Sparkling Data Processing Library
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
Web application for distributed compute analysis of Archive-It web archive collections.
PySpark + Scikit-learn = Sparkit-learn