Here are some recommended places to find real datasets for a project that builds your data science skills. This list is not exhaustive, but it can provide some direction.
Repositories
- Data.gov
- California Open Data
- Los Angeles Open Data
- NYC OpenData
- Data is Plural
- UCI Machine Learning Repository
- Harvard Dataverse
- UN data
- Million Song Dataset
- IPUMS survey data from around the world
- CORGIS: The Collection of Really Great, Interesting, Situated Datasets
- PISA data
- Museum of Modern Art
- Google Dataset Search
Tips
Some of these sites host thousands of datasets, so we recommend using their filters when possible and appropriate. For example, California Open Data has many spatial datasets. If you want tabular data instead, you can use the filter options to consider only CSV, TXT, and XLSX files.
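If you have already downloaded a batch of files, you can apply the same format filter locally. The Python sketch below keeps only CSV, TXT, and XLSX files, judging by file extension. The file names are hypothetical, and the sketch assumes each extension reflects the file's actual format.

```python
from pathlib import PurePosixPath

# Extensions treated as tabular, matching the CSV, TXT, and XLSX filter described above.
TABULAR_FORMATS = {".csv", ".txt", ".xlsx"}

def keep_tabular(filenames):
    """Return only the names whose extension (case-insensitive) is a tabular format."""
    return [name for name in filenames
            if PurePosixPath(name).suffix.lower() in TABULAR_FORMATS]

# Hypothetical file names, e.g. from a folder of downloaded resources.
downloads = ["parks.geojson", "budget_2023.CSV", "roads.shp", "notes.txt", "schools.xlsx"]
print(keep_tabular(downloads))
```

Lowercasing the suffix lets `budget_2023.CSV` pass the filter. The spatial formats in the list (`.geojson`, `.shp`) are dropped.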