Awesome

GitHub projects from awesome lists

Repos indexed: 9,945
Awesome lists tracked: 76
Current results: 183

183 repos shown

List: awesome-bigdata

tarantool/tarantool

Get your data in RAM. Get compute close to data. Enjoy the performance.

Lua #appserver #database #disk #in-memory #lua 1 awesome list 19821 commits first commit 2010-08-12 2 history points updated 2026-05-29

★ 3,638

Website ↗ GitHub ↗

apache/incubator-heron

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter

Java npm pip #heron #messaging #streaming archived 1 awesome list 3308 commits first commit 2015-12-19 11 history points updated 2023-03-01

★ 3,634

Website ↗ GitHub ↗

Netflix/atlas

In-memory dimensional time series database.

Scala 1 awesome list 2899 commits first commit 2014-08-05 2 history points updated 2026-04-21

★ 3,553

GitHub ↗

bruin-data/ingestr

ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

Go gRPC Go Vite Vue Go modules npm PEP 517 #bigquery #copy-database #data-ingestion #data-integration #data-pipeline AI dev signals 1 awesome list 2358 commits first commit 2024-02-12 3 history points updated 2026-06-02

★ 3,550

Website ↗ GitHub ↗

ml-tooling/ml-workspace

🛠 All-in-one web-based IDE specialized for machine learning and data science.

Jupyter Notebook #anaconda #data-analysis #data-science #data-visualization #deep-learning 3 awesome lists 847 commits 2 history points updated 2024-07-26

★ 3,538

Website ↗ GitHub ↗

twitter/scalding

A Scala API for Cascading

Scala 2 awesome lists 4195 commits first commit 2012-01-10 2 history points updated 2023-05-28

★ 3,522

Website ↗ GitHub ↗

apache/linkis

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.

Java Spring Boot Vite Vue Maven npm #application-manager #context-service #engine #hive #hive-table 1 awesome list 4315 commits first commit 2019-07-23 11 history points updated 2026-06-03

★ 3,410

Website ↗ GitHub ↗

mozilla-services/heka

DEPRECATED: Data collection and processing made easy.

Go archived 1 awesome list 4118 commits first commit 2012-10-10 2 history points updated 2024-01-23

★ 3,403

Website ↗ GitHub ↗

WeBankFinTech/DataSphereStudio

DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Java #airflow #atlas #azkaban #dataworks #davinci 1 awesome list 12523 commits first commit 2019-11-24 2 history points updated 2025-11-04

★ 3,258

Website ↗ GitHub ↗

facebookarchive/beringei

Beringei is a high performance, in-memory storage engine for time series data.

C++ CMake archived 1 awesome list 182 commits first commit 2016-11-18 12 history points updated 2018-07-11

★ 3,155

GitHub ↗

benedekrozemberczki/pytorch_geometric_temporal

PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models (CIKM 2021)

Python #deep-learning #gcn #gnn #graph-convolution #graph-convolutional-networks 3 awesome lists 2062 commits first commit 2020-06-27 4 history points updated 2026-05-30

★ 2,983

GitHub ↗

baidu/bfs

The Baidu File System.

C++ 1 awesome list 2139 commits first commit 2014-11-13 2 history points updated 2018-12-03

★ 2,849

GitHub ↗

airbnb/airpal

Web UI for PrestoDB.

Java archived 1 awesome list 448 commits first commit 2014-05-07 2 history points updated 2021-05-20

★ 2,750

Website ↗ GitHub ↗

samizdatco/arbor

a graph visualization library using web workers and jQuery

JavaScript 2 awesome lists 12 commits first commit 2011-01-12 2 history points updated 2020-04-10

★ 2,660

Website ↗ GitHub ↗

FeatureBaseDB/featurebase

A crazy fast analytical database, built on bitmaps. Perfect for ML applications. Learn more at: http://docs.featurebase.com/. Start a Docker instance: https://hub.docker.com/r/featurebasedb/featurebase

Go Cobra Echo Gin gRPC Go Go modules npm Yarn #big-data #bitmap #database #go #index archived 1 awesome list 5576 commits first commit 2013-10-16 10 history points updated 2024-02-21

★ 2,527

Website ↗ GitHub ↗

numaproj/numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs

Rust #data-processing #hacktoberfest #k8s #kubernetes #map-reduce 1 awesome list 1791 commits first commit 2022-05-20 2 history points updated 2026-05-29

★ 2,487

Website ↗ GitHub ↗

griddb/griddb

GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.

C++ CMake .NET SDK Maven #bigdata #database #fast #griddb #iot 1 awesome list 500 commits first commit 2016-02-24 12 history points updated 2026-03-19

★ 2,471

Website ↗ GitHub ↗

influxdata/kapacitor

Open source framework for processing, monitoring, and alerting on time series data

Go #kapacitor #monitoring #time-series 1 awesome list 2056 commits first commit 2015-08-31 2 history points updated 2026-05-26

★ 2,368

GitHub ↗

benedekrozemberczki/karateclub

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Python #2vec #community-detection #deepwalk #gcn #graph-clustering 3 awesome lists 2319 commits first commit 2019-12-05 4 history points updated 2024-07-17

★ 2,281

Website ↗ GitHub ↗

datopian/portaljs

🌀 AI-native framework for building data portals. Scaffold a full portal from a brief and load datasets in minutes with agentic skills — any backend (CKAN, GitHub, Frictionless).

TypeScript Express Next.js React Tailwind CSS npm Yarn #ai #ai-agents #ckan #data-fabric #data-management-platform AI dev signals 1 awesome list 3050 commits first commit 2011-03-09 10 history points updated 2026-06-01

★ 2,280

Website ↗ GitHub ↗

apache/gobblin

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

Java Gradle pip #apache #data #ingestion #management #replication 1 awesome list 6552 commits first commit 2014-02-04 11 history points updated 2026-06-01

★ 2,265

Website ↗ GitHub ↗

microsoft/GraphEngine

Microsoft Graph Engine

C# CMake .NET SDK #distributed-computing #dotnet #graph-engine #graph-query-language #in-memory-computations 1 awesome list 1795 commits first commit 2017-02-09 10 history points updated 2024-10-08

★ 2,252

GitHub ↗

twitter/summingbird

Streaming MapReduce with Scalding and Storm

Scala archived 2 awesome lists 1793 commits first commit 2012-09-25 2 history points updated 2022-01-19

★ 2,126

Website ↗ GitHub ↗

mara/mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

Python pytest PEP 517 pip #data #data-integration #etl #pipeline #postgresql 1 awesome list 172 commits first commit 2018-04-08 12 history points updated 2023-12-15

★ 2,085

GitHub ↗

tinkerpop/gremlin

A Graph Traversal Language (no longer active - see Apache TinkerPop)

Java 1 awesome list 1227 commits first commit 2009-11-07 2 history points updated 2021-08-16

★ 1,952

Website ↗ GitHub ↗

baidu/tera

An Internet-Scale Database.

C++ #baidu #bigtable #c-plus-plus #data #database 1 awesome list 2252 commits first commit 2014-03-26 2 history points updated 2024-06-05

★ 1,904

GitHub ↗

biokoda/actordb

ActorDB distributed SQL database

Erlang 1 awesome list 397 commits first commit 2014-01-21 2 history points updated 2022-11-10

★ 1,889

GitHub ↗

pinterest/secor

Secor is a service implementing Kafka log persistence

Java #kafka 1 awesome list 3338 commits first commit 2014-04-15 2 history points updated 2026-03-10

★ 1,859

GitHub ↗

gchq/Gaffer

A large-scale entity and relation database supporting aggregation of properties

Java #accumulo #aggregation #big-data #graph #graph-database archived 1 awesome list 7332 commits first commit 2015-12-14 2 history points updated 2025-06-06

★ 1,794

GitHub ↗

linkedin/ambry

Distributed object store

Java AI dev signals 1 awesome list 4223 commits first commit 2013-09-11 2 history points updated 2026-05-30

★ 1,788

Website ↗ GitHub ↗
