Search Awesome Repositories

apache/spark

Apache Spark - A unified analytics engine for large-scale data processing

AI dev

Stack

Scala Bundler Maven npm

GitHub topics

#big-data #java #jdbc #python #r #scala

Updated: 2026-06-03
Lists: 4 list mentions
First commit: 2010-03-29
History: 7 history points
License: Apache-2.0
Issues: 421 open

Stars: 43,385
Forks: 29,209
Commits: 48,390
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

lichess-org/lila

♞ lichess.org: the forever free, adless and open source chess server ♞

AI dev

Stack

Scala Svelte npm pnpm

GitHub topics

#chess #free-software #functional-programming #game #lichess #non-profit

Updated: 2026-06-11
Lists: 2 list mentions
First commit: 2012-02-19
History: 3 history points
License: AGPL-3.0
Issues: 1,266 open

Stars: 18,347
Forks: 2,689
Commits: 76,319
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

gitbucket/gitbucket

A Git platform powered by Scala with easy installation, high extensibility & GitHub API compatibility

Stack

Scala

GitHub topics

#git #gitbucket #scala #scalatra

Updated: 2026-06-03
Lists: 2 list mentions
First commit: 2013-04-10
History: 3 history points
License: Apache-2.0
Issues: 325 open

Stars: 9,371
Forks: 1,266
Commits: 6,355
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

delta-io/delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Stack

Scala Astro Vite Maven PEP 517 pip

GitHub topics

#acid #analytics #big-data #delta-lake #spark

Updated: 2026-06-04
Lists: 1 list mention
First commit: 2019-04-12
History: 2 history points
License: Apache-2.0
Issues: 1,509 open

Stars: 8,833
Forks: 2,107
Commits: 5,196
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

snowplow/snowplow

The leader in Customer Data Infrastructure

Stack

Scala

GitHub topics

#analytics #data #data-collection #data-pipeline #marketing-analytics #product-analytics

Updated: 2026-06-08
Lists: 1 list mention
First commit: 2012-03-01
History: 3 history points
License: Apache-2.0
Issues: 59 open

Stars: 7,013
Forks: 1,176
Commits: 6,929
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

microsoft/SynapseML

Simple and Distributed Machine Learning

AI dev

Stack

Scala React npm PEP 517 pip

GitHub topics

#ai #apache-spark #azure #big-data #cognitive-services #data-science

Updated: 2026-05-30
Lists: 1 list mention
First commit: 2017-06-02
History: 2 history points
License: MIT
Issues: 394 open

Stars: 5,229
Forks: 860
Commits: 1,790
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

d2iq-archive/marathon

Deploy and manage containers (including Docker) on top of Apache Mesos at scale.

Archived

Stack

Scala pytest Bundler npm pip

GitHub topics

#dcos #dcos-orchestration-guild

Updated: 2022-09-08
Lists: 1 list mention
First commit: 2013-06-25
History: 21 history points
License: Apache-2.0
Issues: 29 open

Stars: 4,036
Forks: 832
Commits: 6,969
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

TheHive-Project/TheHive

TheHive is a Collaborative Case Management Platform, now distributed as a commercial version

Archived

Stack

Scala npm

GitHub topics

#analyzer #api #cortex #dfir #digital-forensics #free

Updated: 2025-07-25
Lists: 1 list mention
First commit: 2018-09-02
History: 2 history points
License: AGPL-3.0
Issues: 834 open

Stars: 3,923
Forks: 690
Commits: 2,756
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

awslabs/deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Stack

Scala Maven

GitHub topics

#dataquality #scala #spark #unit-testing

Updated: 2026-05-29
Lists: 1 list mention
First commit: 2018-08-07
History: 2 history points
License: Apache-2.0
Issues: 92 open

Stars: 3,618
Forks: 584
Commits: 362
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub

Netflix/atlas

In-memory dimensional time series database.

AI dev

Stack

Scala Vite npm

Updated: 2026-06-13
Lists: 1 list mention
First commit: 2014-08-05
History: 3 history points
License: Apache-2.0
Issues: 9 open

Stars: 3,550
Forks: 349
Commits: 2,930
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

GitHub

twitter/scalding

A Scala API for Cascading

Stack

Scala

Updated: 2023-05-28
Lists: 2 list mentions
First commit: 2012-01-10
History: 3 history points
License: Apache-2.0
Issues: 317 open

Stars: 3,522
Forks: 697
Commits: 4,195
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

twitter/summingbird

Streaming MapReduce with Scalding and Storm

Archived

Stack

Scala

Updated: 2022-01-19
Lists: 2 list mentions
First commit: 2012-09-25
History: 3 history points
License: Apache-2.0
Issues: 162 open

Stars: 2,125
Forks: 256
Commits: 1,793
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

TheHive-Project/Cortex

Cortex: a Powerful Observable Analysis and Active Response Engine

Stack

Scala npm

GitHub topics

#analysis #analyzer #api #cortex #cyber-threat-intelligence #dfir

Updated: 2026-05-20
Lists: 1 list mention
First commit: 2017-02-01
History: 2 history points
License: AGPL-3.0
Issues: 170 open

Stars: 1,586
Forks: 258
Commits: 724
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

Website GitHub

locationtech/geotrellis

GeoTrellis is a geographic data processing engine for high performance applications.

Stack

Scala React npm Yarn

Updated: 2026-05-10
Lists: 1 list mention
First commit: 2016-11-25
History: 3 history points
License: NOASSERTION
Issues: 248 open

Stars: 1,372
Forks: 361
Commits: 2,520
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

TIBCOSoftware/snappydata

Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster

Stack

Scala Gradle pip

GitHub topics

#analytics #memory-database #scale #snappydata #spark #stream

Updated: 2022-11-21
Lists: 1 list mention
First commit: 2015-05-13
History: 20 history points
License: NOASSERTION
Issues: 117 open

Stars: 1,033
Forks: 198
Commits: 4,151
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

BIDData/BIDMach

CPU and GPU-accelerated Machine Learning Library

Stack

Scala Maven

Updated: 2022-10-04
Lists: 2 list mentions
First commit: 2012-10-22
History: 3 history points
License: BSD-3-Clause
Issues: 67 open

Stars: 919
Forks: 170
Commits: 3,024
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

GitHub

Normation/rudder

Rudder is a configuration and security automation platform. Manage your Cloud, hybrid or on-premises infrastructure in a simple, scalable and dynamic way.

Stack

Scala React Warp Cargo Maven npm

GitHub topics

#auditing #automation #compliance #configuration-management #continuous-configuration #devops

Updated: 2026-06-11
Lists: 1 list mention
First commit: 2011-10-06
History: 3 history points
License: GPL-3.0
Issues: Disabled

Stars: 680
Forks: 87
Commits: 28,810
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

tumblr/collins

groovy kind of love

Stack

Scala pip

Updated: 2021-03-01
Lists: 1 list mention
First commit: 2011-10-28
History: 3 history points
License: Apache-2.0
Issues: 67 open

Stars: 575
Forks: 96
Commits: 2,471
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

twitter/storehaus

Storehaus is a library that makes it easy to work with asynchronous key value stores

Stack

Scala

Updated: 2020-07-17
Lists: 1 list mention
First commit: 2013-01-22
History: 3 history points
License: Apache-2.0
Issues: 78 open

Stars: 465
Forks: 81
Commits: 960
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

GitHub

foursquare/twofishes

MOVED - The project is still under development but this page is deprecated.

Stack

Scala pip

Updated: 2019-01-15
Lists: 1 list mention
First commit: 2012-03-02
History: 3 history points
License: NOASSERTION
Issues: 15 open

Stars: 434
Forks: 61
Commits: 1,581
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

nymanjens/facto

Family Accounting Tool

Stack

Scala npm

GitHub topics

#accounting #family #personal #tool #website

Updated: 2026-02-08
Lists: 1 list mention
First commit: 2016-04-23
History: 3 history points
License: NOASSERTION
Issues: 6 open

Stars: 350
Forks: 7
Commits: 2,913
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

GitHub

Hydrospheredata/mist

Serverless proxy for Spark cluster

Stack

Scala pytest pip

GitHub topics

#apache-spark #api #big-data #serverless

Updated: 2026-04-13
Lists: 2 list mentions
First commit: 2016-02-01
History: 3 history points
License: Apache-2.0
Issues: 32 open

Stars: 325
Forks: 70
Commits: 2,020
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

linkedin/isolation-forest

A distributed Spark/Scala implementation of the isolation forest and extended isolation forest algorithms for unsupervised outlier detection, featuring support for scalable training and ONNX export for easy cross-platform inference.

Stack

Scala pytest Gradle PEP 517 pip

GitHub topics

#anomaly-detection #isolation-forest #linkedin #machine-learning #onnx #outlier-detection

Updated: 2026-06-12
Lists: 2 list mentions
First commit: 2019-08-12
History: 3 history points
License: NOASSERTION
Issues: 1 open

Stars: 259
Forks: 54
Commits: 101
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

GitHub

d2iq-archive/chaos

A lightweight framework for writing REST services in Scala.

Archived

Stack

Scala

GitHub topics

#dcos #dcos-orchestration-guild

Updated: 2019-04-15
Lists: 1 list mention
First commit: 2013-06-25
History: 2 history points
License: Apache-2.0
Issues: 12 open

Stars: 249
Forks: 35
Commits: 224
Star growth, last 7 days: No 7-day history
Commit velocity, last 7 days: No 7-day history

GitHub

cequence-io/openai-scala-client

Scala client for OpenAI API and other major LLM providers

AI dev

Stack

Scala

GitHub topics

#anthropic #anthropic-api #aws-bedrock #chatgpt #gemini #gemini-ai

Updated: 2026-06-03
Lists: 1 list mention
First commit: 2023-01-25
History: 2 history points
License: MIT
Issues: 9 open

Stars: 247
Forks: 37
Commits: 1,175
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

Treode/store

The DB that's replicated, sharded and transactional.

Stack

Scala npm

Updated: 2015-10-31
Lists: 1 list mention
First commit: 2013-09-25
History: 3 history points
License: Apache-2.0
Issues: 0 open

Stars: 177
Forks: 21
Commits: 910
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

helgeho/ArchiveSpark

An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.

Stack

Scala

GitHub topics

#archivespark #internet-archive #spark #spark-framework #warc #web-archiving

Updated: 2025-10-08
Lists: 1 list mention
First commit: 2015-08-06
History: 3 history points
License: MIT
Issues: 5 open

Stars: 161
Forks: 19
Commits: 154
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

GitHub

archivesunleashed/aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stack

Scala Maven

GitHub topics

#analysis #apache-spark #big-data #big-data-analytics #dataframe #digital-humanities

Updated: 2025-12-05
Lists: 1 list mention
First commit: 2013-07-13
History: 3 history points
License: Apache-2.0
Issues: 5 open

Stars: 158
Forks: 34
Commits: 1,032
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

amplab/velox-modelserver

No description.

Stack

Scala Maven

Updated: 2017-04-17
Lists: 1 list mention
First commit: 2014-07-14
History: 3 history points
License: Apache-2.0
Issues: 23 open

Stars: 110
Forks: 26
Commits: 154
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

Website GitHub

nymanjens/piga

List editor for power users, backed by a self-hosted server

Stack

Scala npm

GitHub topics

#selfhosted #todolist #website

Updated: 2026-04-11
Lists: 1 list mention
First commit: 2018-07-07
History: 3 history points
License: NOASSERTION
Issues: 1 open

Stars: 101
Forks: 1
Commits: 1,315
Star growth, last 7 days: 0 (0.0%)
Commit velocity, last 7 days: 0 (0.0%)

GitHub
