GitHub projects from awesome lists
Search names, descriptions, topics, tags, and stacks, then tune results by ecosystem, freshness, health, and cross-list signal.
Apache Spark - A unified analytics engine for large-scale data processing
♞ lichess.org: the forever free, adless and open source chess server ♞
A Git platform powered by Scala with easy installation, high extensibility & GitHub API compatibility
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
The leader in Customer Data Infrastructure
Simple and Distributed Machine Learning
Deploy and manage containers (including Docker) on top of Apache Mesos at scale.
TheHive is a Collaborative Case Management Platform, now distributed as a commercial version
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
In-memory dimensional time series database.
A Scala API for Cascading
Streaming MapReduce with Scalding and Storm
Cortex: a Powerful Observable Analysis and Active Response Engine
GeoTrellis is a geographic data processing engine for high performance applications.
Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
CPU and GPU-accelerated Machine Learning Library
Rudder is a configuration and security automation platform. Manage your Cloud, hybrid or on-premises infrastructure in a simple, scalable and dynamic way.
groovy kind of love
Storehaus is a library that makes it easy to work with asynchronous key value stores
MOVED - The project is still under development but this page is deprecated.
Family Accounting Tool
Serverless proxy for Spark cluster
A distributed Spark/Scala implementation of the isolation forest and extended isolation forest algorithms for unsupervised outlier detection, featuring support for scalable training and ONNX export for easy cross-platform inference.
A lightweight framework for writing REST services in Scala.
Scala client for OpenAI API and other major LLM providers
The DB that's replicated, sharded and transactional.
An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
No description.
List editor for power users, backed by a self-hosted server