Awesome Bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

oxnr/awesome-bigdata | #awesome #awesome-list #bigdata #data #data-analytics #data-science #data-stream #data-visualization #data-warehouse #database #distributed-database #series-database #stream-processing #streaming-data #visualize-data
List stars: 14,418
README repos: 211
Indexed repos: 183
List commits: 592
Forks: 2,585
Open issues: 3

Tracked list growth

GitHub stars and default-branch commits for oxnr/awesome-bigdata.

Latest scan 2026-06-03 10:49

[Chart: Likes history (GitHub stars)]

[Chart: Commits history (default branch commits)]

Indexed repositories

183 repos currently saved from this list.

Latest repo push 2026-06-03


metabase/metabase

The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data :bar_chart:

Clojure | Angular, Express, Next.js | Bun, npm | #analytics #bi #business-intelligence #businessintelligence | pushed 2026-06-03 | 42,472 commits | first commit 2015-02-02 | 4 list mentions | AI dev signals
apache/airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Python | Celery, Cobra, FastAPI | Go modules, Gradle | #airflow #apache #apache-airflow #automation | pushed 2026-06-03 | 38,722 commits | first commit 2014-10-06 | 4 list mentions | AI dev signals
pingcap/tidb

TiDB is built for agentic workloads that grow unpredictably, with ACID guarantees and native support for transactions, analytics, and vector search. No data silos. No noisy neighbors. No infrastructure ceiling.

Go | #agent #agent-context #agent-memory #agentic | pushed 2026-06-01 | 28,141 commits | first commit 2015-09-06 | 2 list mentions | AI dev signals
spotify/luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Python | #hadoop #luigi #orchestration-framework #python | pushed 2026-05-19 | 4,314 commits | 3 list mentions
tikv/tikv

Distributed transactional key-value database, originally created to complement TiDB

Rust | Cargo | #cncf #consensus #distributed-transactions #hacktoberfest | pushed 2026-06-03 | 8,431 commits | first commit 2016-01-07 | 2 list mentions | AI dev signals
spotify/annoy

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

C++ | #approximate-nearest-neighbor-search #c-plus-plus #golang #locality-sensitive-hashing | pushed 2025-10-29 | 911 commits | first commit 2013-02-20 | 3 list mentions
h2oai/h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Jupyter Notebook | #automl #big-data #data-science #deep-learning | pushed 2026-05-30 | 32,791 commits | first commit 2014-03-03 | 3 list mentions | AI dev signals
numenta/nupic-legacy

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

Python | pytest, Tornado | pip | #artificial-intelligence #hierarchical-temporal-memory #machine-intelligence #neocortex | pushed 2024-12-03 | 6,627 commits | first commit 2013-04-05 | 1 list mention
aimhubio/aim

Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.

Python | FastAPI, pytest, React | npm, PEP 517 | #ai #data-science #data-visualization #experiment-tracking | pushed 2026-06-02 | 2,265 commits | first commit 2019-05-31 | 3 list mentions