dirty-cat/dirty_cat
Machine learning on dirty tabular data (legacy clone of skrub)
Machine learning with dataframes
scikit-learn: machine learning in Python
Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
PySpark + Scikit-learn = Sparkit-learn
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.