Awesome List
A topic-centric list of HQ open datasets.
GitHub stars and default-branch commits for awesomedata/awesome-public-datasets.
75 repos currently saved from this list.
Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
A repository of data on coronavirus cases and deaths in the U.S.
World countries in JSON, YAML, CSV and XML. Any help is welcome!
🌐 List of all countries with names and ISO 3166-1 codes in all languages and data formats.
List of Dirty, Naughty, Obscene, and Otherwise Bad Words
FMA: A Dataset For Music Analysis
Bruteforce database
ATP Tennis Rankings, Results, and Stats
🗺 High Quality GeoJSON maps programmatically generated.
World’s single largest Internet domains dataset
Twitter NLP Tools
Uber trip data from a freedom of information request to NYC's Taxi & Limousine Commission
The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
Tate Collection metadata
Collection of open data resources for traffic information
MOVED - The project is still under development but this page is deprecated.
⚽️ Extract, prepare and publish Transfermarkt datasets.
Core meta for awesome-public-datasets. Contribute new data here!
source{d} datasets ("big code") for source code analysis and machine learning on source code
Data for Automatic Keyphrase Extraction Task
WTA Tennis Rankings, Results, and Stats
Large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain
Ultra-deep search for novel viruses
This repo is designed to gather bike share data best practices AND socialize a list of open and free tools to hack on bike share data. This grows from Council Member Brad Lander introducing Int. No 1117-2013 on 24 July 2013. This is a local law to amend the administrative code of the City of New York, in relation to requiring the compilation of Citi Bike usage data.
Collection Data for Cooper Hewitt, Smithsonian Design Museum
All-Age-Faces (AAF) Database.
No description.
This data set includes Landsat 8 images and their manually extracted pixel-level ground truths for cloud detection.
The Turing Change Point Dataset - A collection of time series for the evaluation and development of change point detection algorithms
Global Biotic Interactions provides access to existing species interaction datasets
Java-Based Context-aware Recommendation Library
The first realistic and public dataset with rare undesirable real events in oil wells.
American Gut open-access data and IPython notebooks
Datasets of the daily Twitter output of Congress.
Proton Exchange Membrane (PEM) Fuel Cell Dataset
COVID-19 datasets are constructed entirely from primary (government and public agency) sources
Lemons quality control dataset
Mia collection metadata
Raw data extracted, cleaned, and normalized from the national situation reports on the SARS-CoV2 (COVID-19) health emergency issued by SNGRE, MSP, Registro Civil, and INEC.
A repo containing various data (demographics, employment, etc.) in JSON form.
An air travel dataset consisting of user reviews from Skytrax (www.airlinequality.com)
This repo contains a set of Arabic newspaper articles along with metadata, extracted from various Saudi newspapers.
The Washington Post's analysis of NOAA climate change data for the contiguous United States
Cube++ is a novel dataset collected for the illumination estimation problem. It has 4890 raw 18-megapixel images, each with a SpyderCube color target in the scene, manually labelled categories, and ground-truth illumination chromaticities.
Simple but fast reverse geocoding up to city granularity level
No description.
🗳️+👀 A platform to protect elections in a disinformation world.
The tracebase appliance-level power consumption data set
No description.
Accumulated shadow data computed for New York City