Awesome List
A curated list of awesome big data frameworks, resources and other awesomeness.
GitHub stars and default-branch commits for oxnr/awesome-bigdata.
183 repos currently saved from this list.
An Open Source Machine Learning Framework for Everyone
Apache Superset is a Data Visualization and Data Exploration Platform
Apache ECharts is a powerful, interactive charting and data visualization library for the browser
scikit-learn: machine learning in Python
Deep Learning for humans
The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
Apache Spark - A unified analytics engine for large-scale data processing
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
A library for efficient similarity search and clustering of dense vectors.
TiDB is built for agentic workloads that grow unpredictably, with ACID guarantees and native support for transactions, analytics, and vector search. No data silos. No noisy neighbors. No infrastructure ceiling.
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios
Data Apps & Dashboards for Python. No JavaScript Required.
matplotlib: plotting with Python
high-performance graph database for real-time use cases
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.
Open-source JavaScript charting library behind Plotly and Dash
VictoriaMetrics: fast, cost-effective monitoring solution and time series database
Distributed transactional key-value database, originally created to complement TiDB
An orchestration platform for the development, production, and observation of data assets.
Apache Pulsar - distributed pub-sub messaging system
An open-source graph database
An embedded key/value database for Go.
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
Apache Druid: a high performance real-time analytics database.
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
A fast, light-weight proxy for memcached and redis
A JavaScript library aimed at visualizing graphs of thousands of nodes and edges
A visualization grammar.
Deep Learning in JavaScript. Train Convolutional Neural Networks (or ordinary ones) in your browser.
YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
Real-time Geospatial and Geofencing
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
Easy & Flexible Alerting With Elasticsearch
Simple feed-forward neural network in JavaScript
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
The Open Source Feature Store for AI/ML
The leader in Customer Data Infrastructure
Gephi - The Open Graph Viz Platform
A damn-sexy, open source real-time dashboard builder for IoT and other web mashups. A free open-source alternative to Geckoboard.
Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
The live data layer for apps and AI agents. Create up-to-the-second views into your business, just using SQL.
Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.
Agentic BI. Analytics at the speed of code ⚡️
A simple, distributed task scheduler and runner with a web based UI.