Awesome Bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

oxnr/awesome-bigdata #awesome #awesome-list #bigdata #data #data-analytics #data-science #data-stream #data-visualization #data-warehouse #database #distributed-database #series-database #stream-processing #streaming-data #visualize-data
List stars: 14,417
README repos: 211
Indexed repos: 183
List commits: 592
Forks: 2,585
Open issues: 3

Tracked list growth

GitHub stars and default-branch commits for oxnr/awesome-bigdata.

Latest scan 2026-06-02 10:49

[Charts: Likes history (GitHub stars); Commits history (default-branch commits)]

Indexed repositories

183 repos currently saved from this list.

Latest repo push 2026-06-02

Age filters use known first-commit dates and exclude repositories that have not synced that data yet.

metabase/metabase

The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data :bar_chart:

Clojure | Angular, Express, Next.js | Bun, npm | #analytics #bi #business-intelligence #businessintelligence | pushed 2026-06-02 | 42,435 commits | first commit 2015-02-02 | 4 list mentions | AI dev signals
apache/airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Python | Celery, Cobra, FastAPI | Go modules, Gradle | #airflow #apache #apache-airflow #automation | pushed 2026-06-02 | 38,680 commits | first commit 2014-10-06 | 4 list mentions | AI dev signals
pingcap/tidb

TiDB is built for agentic workloads that grow unpredictably, with ACID guarantees and native support for transactions, analytics, and vector search. No data silos. No noisy neighbors. No infrastructure ceiling.

Go | #agent #agent-context #agent-memory #agentic | pushed 2026-06-01 | 28,141 commits | first commit 2015-09-06 | 2 list mentions | AI dev signals
spotify/luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Python | #hadoop #luigi #orchestration-framework #python | pushed 2026-05-19 | 4,314 commits | 3 list mentions
tikv/tikv

Distributed transactional key-value database, originally created to complement TiDB

Rust | #cncf #consensus #distributed-transactions #hacktoberfest | pushed 2026-06-01 | 8,430 commits | first commit 2016-01-07 | 2 list mentions | AI dev signals
spotify/annoy

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

C++ | #approximate-nearest-neighbor-search #c-plus-plus #golang #locality-sensitive-hashing | pushed 2025-10-29 | 911 commits | first commit 2013-02-20 | 3 list mentions
h2oai/h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Jupyter Notebook | #automl #big-data #data-science #deep-learning | pushed 2026-05-30 | 32,791 commits | first commit 2014-03-03 | 3 list mentions | AI dev signals
numenta/nupic-legacy

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

Python | pytest, Tornado | pip | #artificial-intelligence #hierarchical-temporal-memory #machine-intelligence #neocortex | pushed 2024-12-03 | 6,627 commits | first commit 2013-04-05 | 1 list mention