Open highlighted repo slot
Put your repository first
Promote a GitHub repo at the top of Awesome repository list views for 7 days.
Awesome List
A curated list of awesome big data frameworks, ressources and other awesomeness.
GitHub stars and default-branch commits for oxnr/awesome-bigdata.
183 repos currently saved from this list.
Open highlighted repo slot
Promote a GitHub repo at the top of Awesome repository list views for 7 days.
High Throughput Real-time Stream Processing Framework
A distributed Spark/Scala implementation of the isolation forest and extended isolation forest algorithms for unsupervised outlier detection, featuring support for scalable training and ONNX export for easy cross-platform inference.
Hadoop MapReduce in idiomatic Clojure.
InfiniDB Data Warehouse
The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021).
One database for likes, views, follows — pre-computed, served in real-time
Hadoop log aggregator and dashboard
The DB that's replicated, sharded and transactional.
Haeinsa is linearly scalable multi-row, multi-table transaction library for HBase
Apache Tephra: Transactions for HBase.
RDF-Centric Map/Reduce Framework and Freebase data conversion tool
Key Value Database in .Net with Object DB Layer, RPC, dynamic IL and much more
Statistical Workload Injector for MapReduce - Project at UC Berkeley AMP Lab
An convenient R tool for manipulating tables in PostgreSQL type databases and a wrapper of Apache MADlib.
No description.
🌲 Configuration flaws detector for Hadoop, MongoDB, MySQL, and more!
Schedoscope is a scheduling framework for painfree agile development, testing, (re)loading, and monitoring of your datahub, lake, or whatever you choose to call your Hadoop data warehouse these days.
Legacy Chartist Repo for old gh-pages
Hadoop Data Integration with various databases, ftp servers, salesforce. Incremental update, dedup, append, merge your data on Hadoop.
Mirror of Apache Slider
Big Data Science Swiss Army Knife - http://www.tuktu.io --
Wide NoSQL benchmark for RocksDB, LevelDB, Redis, WiredTiger and MongoDB extending the Yahoo Cloud Serving Benchmark
Tuple MapReduce for Hadoop: Hadoop API made easy
Develop streaming applications for IBM Streams in Python, Java & Scala.
Client-side AES-256-GCM zero-knowledge encryption library. Browser-native Web Crypto API, no dependencies. Powers FileShot.io.
SNA team homepage
No description.
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, xDC replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding. Enterprise version is at seaweedfs.com.
DocId set compression and set operation library
Web UI for Impala
NetLytics is a Hadoop-powered framework for performing advanced analytics on various kinds of networks logs
No description.
A column-oriented approach to feature engineering. Feature engineering and machine learning: together at last!