by boxplot, Dec 18, 2018, 11:02 pm

A list of freely available data on the web. The first list covers the sites we think are best for accessing quality datasets; below it are additional sources, organized by category.

Best Sources

Kaggle

By far our personal favorite! There are dozens if not hundreds of quality datasets available here.

ICPSR

You have to create an account, but it’s free, and they have a pretty extensive list of datasets you can download directly from the site.

US Open Data

The open data site for the United States government. There are dozens if not hundreds of datasets linked here.

Yelp Open Data

This is a single dataset (but a large one) offered by Yelp.

General Social Survey

This is another single dataset (and another large one!) offered by the University of Chicago.

Federal Committee on Statistical Methodology

You’ll have to work a bit harder to use these datasets: you choose the release, then choose the data tab, and even then you may get the data in a relatively dirty format that you have to clean yourself. But there are a lot of options on this site, and more datasets are added regularly.
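What “cleaning” involves depends on the release, but a few steps come up again and again with exported tables. Here is a minimal sketch in Python (standard library only), assuming the data has been saved as CSV text. The helper names `clean_rows` and `to_number` are hypothetical, and the rules (stripped cells, lowercased headers, dropped blank rows, removed thousands separators) are illustrative choices, not anything the FCSM site prescribes.

```python
import csv
import io


def to_number(cell):
    """Return the cell as a float if it parses as one, else the original string."""
    candidate = cell.replace(",", "")  # drop thousands separators like "1,234"
    try:
        return float(candidate)
    except ValueError:
        return cell


def clean_rows(raw_text):
    """Parse CSV text and apply a few common cleanup steps.

    Assumes the first non-empty row is the header.
    """
    reader = csv.reader(io.StringIO(raw_text))
    # Strip stray whitespace around every cell.
    rows = [[cell.strip() for cell in row] for row in reader]
    # Drop rows where every cell is empty (common in exported spreadsheets).
    rows = [row for row in rows if any(row)]
    if not rows:
        return []
    # Normalize header names: lowercase, spaces replaced by underscores.
    header = [name.lower().replace(" ", "_") for name in rows[0]]
    records = []
    for row in rows[1:]:
        # Pad short rows so every record has every column.
        row = row + [""] * (len(header) - len(row))
        records.append({name: to_number(cell) for name, cell in zip(header, row)})
    return records


if __name__ == "__main__":
    # A made-up messy sample: padded cells, a blank row, a quoted "1,234".
    sample = 'State , Total Count\nOhio,"1,234"\n,\n Iowa ,56\n'
    for record in clean_rows(sample):
        print(record)
```

Values that don’t look numeric (such as “n/a”) are left as strings, so you can decide case by case how to treat missing-value codes.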

Police Data Initiative

A list of datasets from local police departments.

Makeover Monday

This is the website for a data visualization book. They make the data easy to access, but some datasets may duplicate ones from other sources on this page (like Data.World).

KDNuggets

They have a few pages on their website that offer lists of datasets: “Datasets for Data Mining and Data Science”, “Data: APIs, Hubs, Marketplaces, and Platforms”, and “Data: Government, State, City, Local and Public”.

Data.World

You have to dig a bit to find the datasets on this site, and you may need to create an account to access them or to see the full list.

This Quora Post

Someone asked this exact question on Quora and got a pretty extensive list of data sources in response!

Stats2Labs

This site maintains a sorted list of datasets.

A Facebook Page

Do you manage, or know someone who manages, a Facebook page? Maybe a local nonprofit? You can easily download the page’s data as a .csv.

Other Sources by Category

GOVERNMENT DATA

  • FEC Candidate Spending – Dataset

CITY-SPECIFIC DATA

FINANCE DATA

EDUCATION DATA

  • NAEP – National Assessment of Educational Progress, aka “The Nation’s Report Card”. NAEP is the largest nationally representative and continuing assessment of what America’s students know and can do in various subject areas.

HEALTHCARE DATA

  • The Behavioral Risk Factor Surveillance System (BRFSS). The nation’s premier system of health-related telephone surveys, which collect data about U.S. residents’ health-related risk behaviors, chronic health conditions, and use of preventive services. It completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world.

SOCIAL MEDIA DATA

  • Google Trends
    • Tip: Refine your search, click the icon in the upper right-hand corner of the page, and download the data as a CSV.
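The downloaded file is not always a clean table. Trends exports have typically started with a short preamble line (such as `Category: All categories`) and a blank line above the real header row, which trips up a naive CSV reader. Below is a minimal Python sketch under that assumption. `parse_trends_csv` is a hypothetical helper, and the export layout may vary, so check the first few lines of your file before relying on it.

```python
import csv
import io


def parse_trends_csv(raw_text):
    """Parse a Google Trends CSV export into (header, rows).

    Assumption: any preamble and blank lines come before the header, and the
    header is the first row with at least two columns (date + search term).
    """
    rows = list(csv.reader(io.StringIO(raw_text)))
    start = next((i for i, row in enumerate(rows) if len(row) >= 2), None)
    if start is None:
        raise ValueError("no header row with two or more columns found")
    header = rows[start]
    data = []
    for row in rows[start + 1:]:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        date = row[0]
        # Trends shows very low interest as "<1"; mapping it to 0 is an
        # assumption made for this sketch.
        values = [0 if v.strip() == "<1" else int(v) for v in row[1:]]
        data.append((date, values))
    return header, data


if __name__ == "__main__":
    # A made-up sample in the assumed export layout.
    sample = (
        "Category: All categories\n"
        "\n"
        "Week,python: (United States)\n"
        "2024-01-07,75\n"
        "2024-01-14,<1\n"
    )
    header, data = parse_trends_csv(sample)
    print(header)
    print(data)
```

Detecting the header by column count, rather than skipping a fixed number of lines, keeps the sketch working if the preamble is longer or shorter than expected.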

SPORTS DATA

WEATHER DATA

RELATIONAL DATA

OTHER

  • Astrostatistics.org – Interested in astrostatistics? This source is for you! It’s a list of astrostatistics-related datasets.
  • Pewresearch.org – Pew is well known for its surveys, and it publishes some of the results. However, you will need to create an account to access the data, and most datasets are in .sav format (the file format used by SPSS software), so you may need to convert them before you can use them.
