GitHub projects from awesome lists
Search names, descriptions, topics, tags, and stacks, then tune results by ecosystem, freshness, health, and cross-list signal.
Get your data in RAM. Get compute close to data. Enjoy the performance.
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
In-memory dimensional time series database.
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
🛠 All-in-one web-based IDE specialized for machine learning and data science.
A Scala API for Cascading
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
DEPRECATED: Data collection and processing made easy.
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Beringei is a high performance, in-memory storage engine for time series data.
PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models (CIKM 2021)
The Baidu File System.
Web UI for PrestoDB.
A graph visualization library using web workers and jQuery
A crazy fast analytical database, built on bitmaps. Perfect for ML applications. Learn more at: http://docs.featurebase.com/. Start a Docker instance: https://hub.docker.com/r/featurebasedb/featurebase
Kubernetes-native platform to run massively parallel data/streaming jobs
GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.
Open source framework for processing, monitoring, and alerting on time series data
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
🌀 AI-native framework for building data portals. Scaffold a full portal from a brief and load datasets in minutes with agentic skills — any backend (CKAN, GitHub, Frictionless).
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
Microsoft Graph Engine
Streaming MapReduce with Scalding and Storm
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
A Graph Traversal Language (no longer active - see Apache TinkerPop)
An Internet-Scale Database.
ActorDB distributed SQL database
Secor is a service implementing Kafka log persistence
A large-scale entity and relation database supporting aggregation of properties
Distributed object store