Awesome List
Curated list of the best truly open-source AI projects, models, tools, and infrastructure.
GitHub stars and default-branch commits for alvinreal/awesome-opensource-ai.
767 repos currently saved from this list.
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models
Dead simple FLUX LoRA training UI with LOW VRAM support
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.
Train, inspect, edit, automate, and export 3D Gaussian Splatting scenes from a single native application.
Scalable and efficient data transformation framework - backwards compatible with dbt.
Visualize and compare datasets, target values and associations, with one line of code.
(Crystal is now Nimbalyst) Run multiple Codex and Claude Code AI sessions in parallel git worktrees. Test, compare approaches & manage AI-assisted development workflows in one desktop app.
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Open-source, local, self-hosted voice assistant; an alternative to Amazon Echo and Google Home
Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).
Postgres extension for vector search (DiskANN), complements pgvector for performance and scale. Postgres OSS licensed.
RAG Web UI is an intelligent dialogue system based on RAG (Retrieval-Augmented Generation) technology.
OneTrainer is a one-stop solution for all your Diffusion training needs.
An open-source visual programming environment for battle-testing prompts to LLMs.
The Security Toolkit for LLM Interactions
Fast, accurate, lightweight Python library for generating state-of-the-art embeddings
A framework for prompt tuning using Intent-based Prompt Calibration
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
Deepnote is a drop-in replacement for Jupyter with an AI-first design, sleek UI, new blocks, and native data integrations. Use Python, R, and SQL locally in your favorite IDE, then scale to Deepnote cloud for real-time collaboration, Deepnote agent, and deployable data apps. https://deepnote.com/
Elegant easy-to-use neural networks + scientific computing in JAX. https://docs.kidger.site/equinox/
Data manipulation and transformation for audio signal processing, powered by PyTorch
🦜💬 Web app for interacting with any LangGraph agent (PY & TS) via a chat interface.
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
PyTorch native quantization and sparsity for training and inference
A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
The hub for EleutherAI's work on interpretability and learning dynamics
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali
A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible and transparent evaluation of foundation models, including large language models (LLMs) and multimodal models.
pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tidb.ai
A library for debugging/inspecting machine learning classifiers and explaining their predictions
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.
The pretty much "official" DSPy framework for TypeScript
Ruler — apply the same rules to all coding agents
Minimalistic large language model 3D-parallelism training
Probabilistic programming with NumPy powered by JAX for autograd and JIT compilation to GPU/TPU/CPU.
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
nanoflann: a C++11 header-only library for Nearest Neighbor (NN) search with KD-trees
llama.cpp fork with additional SOTA quants and improved performance
The memory-first coding agent
Modular Python framework for AI agents and workflows with chain-of-thought reasoning, tools, and memory.
Kubernetes-native Job Queueing
Maid is a free and open source application for interfacing with llama.cpp models locally, and with Anthropic, DeepSeek, Ollama, Mistral and OpenAI models remotely.
[NeurIPS 2025] OmniSVG is the first family of end-to-end multimodal SVG generators that leverage pre-trained Vision-Language Models (VLMs), capable of generating complex and detailed SVGs, from simple icons to intricate anime characters.
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Management, Vault, Playground. 🚀💻 Integrates with 50+ LLM Providers, VectorDBs, Agent Frameworks and GPUs.
An Open Standard for lineage metadata collection