Awesome List
A curated list of awesome big data frameworks, resources and other awesomeness.
GitHub stars and default-branch commits for oxnr/awesome-bigdata.
183 repos currently saved from this list.
HyperDex is a scalable, searchable key-value store
An open source event analytics platform
Memcache on SSD
SQL-based streaming analytics platform at scale
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
EliasDB is a graph-based database.
Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
Bistro is a flexible distributed scheduler, a high-performance framework supporting multiple paradigms while retaining ease of configuration, management, and monitoring.
An Alert Management Web Application
Twemcache is the Twitter Memcached
CPU and GPU-accelerated Machine Learning Library
Fast multilayer perceptron neural network library for iOS and Mac OS X
A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself. New implementation in http://github.com/probcomp/bayeslite
Fast and reliable message broker built on top of Kafka.
Time-series database
Netflix's distributed Data Pipeline
📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform)
Build platforms that flexibly mix SQL, batch, and stream processing paradigms
A probabilistic data structure service and storage
GhostDB is a distributed, in-memory, general purpose key-value data store that delivers microsecond performance at any scale.
Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)
Compose complex, data-driven visualizations from reusable charts and components with d3
See GitLab: https://gitlab.com/Project-FiFo/DalmatinerDB/dalmatinerdb
🦎 A multi-protocol edge & service proxy. Seamlessly interface web apps, IoT clients, & microservices to Apache Kafka® via declaratively defined, stateless APIs.
Banana for Solr - A Port of Kibana
A distributed system designed to ingest and process time series data
A flexible, partial, out-of-order and real-time typeahead search library
Map-Reduce for Clojure
Distributed database specialized in exporting key/value data from Hadoop
An Erlang implementation of Redis
SiriDB is a highly scalable, robust and super fast time series database. Built from the ground up, SiriDB uses a unique mechanism to operate without a global index and allows server resources to be added on the fly. SiriDB's unique query language includes dynamic grouping of time series for easy analysis over large amounts of time series.
Spring XD makes it easy to solve common big data problems such as data ingestion and export, real-time analytics, and batch workflow orchestration
Storehaus is a library that makes it easy to work with asynchronous key value stores
An open-source columnar data format designed for fast, real-time analytics on big data.
No description.
Graviton Database: ZFS for key-value stores.
An extensible Java framework for building event-driven applications that break up XML and non-XML data into chunks for data integration
DEPRECATED. Zeppelin has moved to Apache. Please make pull requests there.
Substation is a toolkit for routing, normalizing, and enriching security event and audit logs.
High performance distributed data processing engine
Accumulo backed time series database
A Relational Database Backed by Apache Kafka
Phoebus is a distributed framework for large scale graph processing written in Erlang.
Flexible and Extensible Machine Learning in Ruby
Real-time search/indexing system
Scalable Machine Learning in Scalding
Next-generation web analytics processing with Scala, Spark, and Parquet.
Sparrow scheduling platform (U.C. Berkeley).
Serverless proxy for Spark cluster
Erlang LSM BTree Storage