twitter/summingbird
Streaming MapReduce with Scalding and Storm
A Scala API for Cascading
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
Storehaus is a library that makes it easy to work with asynchronous key-value stores.
Schedoscope is a scheduling framework for painfree agile development, testing, (re)loading, and monitoring of your datahub, lake, or whatever you choose to call your Hadoop data warehouse these days.
PySpark + Scikit-learn = Sparkit-learn
Python library for time series forecasting using scikit-learn compatible models, statistical methods, and foundation models