Awesome List
A curated list of awesome big data frameworks, resources and other awesomeness.
GitHub stars and default-branch commits for oxnr/awesome-bigdata.
183 repos currently saved from this list.
Cubism.js: A JavaScript library for time series visualization.
BuntDB is an embeddable, in-memory key/value database for Go with custom indexing and geospatial support
Business intelligence made simple
The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text.
Privacy and Security focused Segment-alternative, in Golang and React
🍰 Progressive <svg> pie, donut, bar and line charts
Deploy and manage containers (including Docker) on top of Apache Mesos at scale.
Riak is a decentralized datastore from Basho Technologies.
Scribe is a server for aggregating log data streamed in real time from a large number of servers.
Open Source AI Infra & Engineering Control Plane
Get your data in RAM. Get compute close to data. Enjoy the performance.
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
In-memory dimensional time series database.
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
🛠 All-in-one web-based IDE specialized for machine learning and data science.
A Scala API for Cascading
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
DEPRECATED: Data collection and processing made easy.
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Beringei is a high performance, in-memory storage engine for time series data.
PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models (CIKM 2021)
The Baidu File System.
Web UI for PrestoDB.
A graph visualization library using web workers and jQuery
A crazy fast analytical database, built on bitmaps. Perfect for ML applications. Learn more at: http://docs.featurebase.com/. Start a Docker instance: https://hub.docker.com/r/featurebasedb/featurebase
Kubernetes-native platform to run massively parallel data/streaming jobs
GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.
Open source framework for processing, monitoring, and alerting on time series data
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
🌀 Rapidly build feature-rich data portals using a modern frontend framework. Native CKAN support. OpenMetadata and Git compatible.
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
Microsoft Graph Engine
Streaming MapReduce with Scalding and Storm
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
A Graph Traversal Language (no longer active - see Apache TinkerPop)
An Internet-Scale Database.
ActorDB distributed SQL database
Secor is a service implementing Kafka log persistence
A large-scale entity and relation database supporting aggregation of properties
Distributed object store
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
Fast scalable time series database
Elassandra = Elasticsearch + Apache Cassandra
Blazingly fast analytics database that will rapidly devour all of your data.
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Dynamic HTML5 visualization
Bloomberg's distributed RDBMS
AgensGraph, a transactional graph database based on PostgreSQL
HiBench is a big data benchmark suite.
In-memory NoSQL database with ACID transactions, Raft consensus, and Redis API