Open highlighted repo slot
Put your repository first
Promote a GitHub repo at the top of Awesome repository list views for 7 days.
Awesome-list intelligence for GitHub
Discover projects curated by awesome-list maintainers, then narrow them by stars, age, freshness, archive status, language, topics, generated tags, detected stacks, package managers, and source list.
Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
The modern replacement for Jupyter Notebooks
CLI task management & automation tool
A Julia machine learning framework
Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
Machine learning with dataframes
Keras community contributions
MLBox is a powerful Automated Machine Learning Python library.
The official code repository for the second edition of the O'Reilly book Generative Deep Learning: Teaching Machines to Paint, Write, Compose and Play.
Python library for time series forecasting using scikit-learn compatible models, statistical methods, and foundation models
moDel Agnostic Language for Exploration and eXplanation
Visual data prep powered by Python
Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
AIDE: AI-Driven Exploration in the Space of Code. The machine learning engineering agent that automates AI R&D.
Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.
Lightning fast data version control system for structured and unstructured machine learning datasets. We aim to make versioning datasets as easy as versioning code.
[CONTRIBUTORS WELCOME] Generalized Additive Models in Python
Epsilla is a high performance Vector Database Management System
👩🏫 👨🏫 The open-source curriculum of Enki!
A Python library powered by Language Models (LLMs) for conversational data discovery and analysis.
Easy pipelines for pandas DataFrames.
RAGLight is a modular framework for Retrieval-Augmented Generation (RAG). It makes it easy to plug in different LLMs, embeddings, and vector stores, and now includes seamless MCP integration to connect external tools and data sources.
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It can also run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.
Core meta for awesome-public-datasets. Contribute new data here!
Datasets & Analyses for Formula 1 World Championship
Spatial Representations for Artificial Intelligence - a Python library toolkit for geospatial machine learning focused on creating embeddings for downstream tasks