src-d/datasets
source{d} datasets ("big code") for source code analysis and machine learning on source code
Steam Co-Review Network: 82K game catalog + 48K-node co-review network from 128M user reviews
Cross-platform attention tracking - Wikipedia pageviews, GDELT events, Google Trends (2025 weekly)
FIPS-keyed county-level US inequality data - food deserts, healthcare, housing, veterans
Disability demographics, web accessibility compliance, assistive technology usage, and related data
JanusGraph: an open-source, distributed graph database
⚽️ Extract, prepare and publish Transfermarkt datasets.