Awesome List
Probably the best curated list of data science software in Python.
GitHub stars and default-branch commits for krzjoa/awesome-python-data-science.
351 repos currently saved from this list.
An Open Source Machine Learning Framework for Everyone
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Open Source Computer Vision Library
Apache ECharts is a powerful, interactive charting and data visualization library for the browser
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
Extremely fast Query Engine for DataFrames, written in Rust
A toolkit for developing and comparing reinforcement learning algorithms.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Caffe: a fast open framework for deep learning.
Visualizer for neural network, deep learning and machine learning models
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.
A game theoretic approach to explain the output of any machine learning model.
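PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)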
Graph Neural Network Library for PyTorch
matplotlib: plotting with Python
Open standard for machine learning interoperability
Interactive Data Visualization in the browser, from Python
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
Train transformer language models with reinforcement learning.
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Datasets, Transforms and Models specific to Computer Vision
Network Analysis in Python
🎨 Python Echarts Plotting Library
🦉 Data Versioning and ML Experiments
Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
Image augmentation for machine learning experiments.
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
NLTK Source
A toolkit for making real world machine learning and data analysis applications in C++
A very simple framework for state-of-the-art Natural Language Processing (NLP)
Python package built to ease deep learning on graphs, on top of existing DL frameworks.
A hyperparameter optimization framework
Statistical data visualization in Python
Parallel computing with task scheduling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
Lime: Explaining the predictions of any machine learning classifier
An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
Low-code framework for building custom LLMs, neural networks, and other AI models
Always know what to expect from your data.
Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Statsmodels: statistical modeling and econometrics in Python
LAVIS - A One-stop Library for Language-Vision Intelligence
NumPy & SciPy for GPU
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
An elegant PyTorch deep reinforcement learning library.
Modin: Scale your Pandas workflows by changing a single line of code