Want to try predicting business categories with a fancy clustering algorithm? How about predicting star ratings using sentiment analysis? Or maybe you want to build a cool visualization of great local businesses?
Yelp is providing the business data and reviews for the 250 businesses closest to each of 30 universities, for students and academics to explore and research. We've provided some examples to get you started. You can check out the source on our GitHub page.
To access the dataset, you'll need an active Yelp account and access to the Yelp API, and you must agree to the dataset access agreement. Once you've completed all those steps, you can download the dataset from this page.
To request access, you'll need a valid Yelp API token, called a YWSID.
Create a Yelp API Account
Usage of this dataset is governed by the Academic Dataset Terms of Use.
The dataset is a single gzip-compressed file containing one JSON object per line. Every object contains a 'type' field, which tells you whether it is a business, a user, or a review.
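As a starting point, here is a minimal sketch of reading a file in the format described above. It assumes the file is UTF-8 encoded; the function names `load_objects` and `group_by_type` are our own, not part of any Yelp tooling.

```python
import gzip
import json
from collections import defaultdict


def load_objects(path):
    """Yield one parsed JSON object per non-empty line of a gzip file.

    Assumption: the file is UTF-8 encoded text once decompressed.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def group_by_type(objects):
    """Bucket objects by their 'type' field ('business', 'user', 'review')."""
    groups = defaultdict(list)
    for obj in objects:
        groups[obj["type"]].append(obj)
    return groups
```

You could then call `group_by_type(load_objects(path))` with the path to your downloaded file and look up `groups["business"]`, `groups["review"]`, or `groups["user"]`. Reading line by line keeps memory use low until you decide to group everything into lists.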
Business objects contain basic information about local businesses. The 'business_id' field can be used with the Yelp API to fetch even more information for visualizations, but note that you'll still need to comply with the API TOS. The fields are as follows:
{
'type': 'business',
'business_id': (a unique identifier for this business),
'name': (the full business name),
'neighborhoods': (a list of neighborhood names, might be empty),
'full_address': (localized address),
'city': (city),
'state': (state),
'latitude': (latitude),
'longitude': (longitude),
'stars': (star rating, rounded to half-stars),
'review_count': (review count),
'photo_url': (photo url),
'categories': [(localized category names)],
'open': (is the business still open for business?),
'schools': (nearby universities),
'url': (yelp url)
}
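To get a feel for the business fields, here is a small sketch that counts how many businesses fall into each category. It assumes 'open' is a boolean and 'categories' is a list of strings, which the field list above suggests but does not spell out; `category_counts` is a name we made up for illustration.

```python
from collections import Counter


def category_counts(businesses, only_open=True):
    """Count businesses per category name.

    Assumptions: 'open' is a boolean and 'categories' is a list of strings.
    A business with several categories is counted once under each of them.
    """
    counts = Counter()
    for business in businesses:
        if only_open and not business.get("open", False):
            continue
        counts.update(business.get("categories", []))
    return counts
```

Calling `category_counts(businesses).most_common(10)` would list the ten most frequent categories among businesses that are still open.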
Review objects contain the review text, the star rating, and information on votes Yelp users have cast on the review. Use user_id to associate this review with others by the same user. Use business_id to associate this review with others of the same business.
{
'type': 'review',
'business_id': (the identifier of the reviewed business),
'user_id': (the identifier of the authoring user),
'stars': (star rating, integer 1-5),
'text': (review text),
'date': (date, formatted like '2011-04-19'),
'votes': {
'useful': (count of useful votes),
'funny': (count of funny votes),
'cool': (count of cool votes)
}
}
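Because every review carries a business_id, you can group reviews by business. The sketch below computes the mean review rating per business from the reviews in the dataset; `mean_review_stars` is a hypothetical helper name. The result may differ from the business object's 'stars' field, which is rounded to half-stars.

```python
from collections import defaultdict


def mean_review_stars(reviews):
    """Return {business_id: mean of the integer 'stars' over its reviews}."""
    totals = defaultdict(lambda: [0, 0])  # business_id -> [sum of stars, review count]
    for review in reviews:
        entry = totals[review["business_id"]]
        entry[0] += review["stars"]
        entry[1] += 1
    return {bid: star_sum / count for bid, (star_sum, count) in totals.items()}
```

The same grouping pattern works with user_id if you want per-user statistics instead.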
User objects contain aggregate information about a single user across all of Yelp (including businesses and reviews not in this dataset).
{
'type': 'user',
'user_id': (unique user identifier),
'name': (first name, last initial, like 'Matt J.'),
'review_count': (review count),
'average_stars': (floating point average, like 4.31),
'votes': {
'useful': (count of useful votes across all reviews),
'funny': (count of funny votes across all reviews),
'cool': (count of cool votes across all reviews)
}
}
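Since user objects describe activity across all of Yelp, it can help to check how much of a user's history is actually in the dataset. Here is a rough sketch, with `dataset_coverage` as a hypothetical name, that compares each user's 'review_count' with the number of review objects attributed to them.

```python
from collections import Counter


def dataset_coverage(users, reviews):
    """Return {user_id: fraction of the user's reviews present in the dataset}.

    Users whose 'review_count' is zero are skipped to avoid dividing by zero.
    Assumption: 'review_count' and the review objects were captured at about
    the same time; if not, a fraction could come out slightly above 1.0.
    """
    in_dataset = Counter(review["user_id"] for review in reviews)
    coverage = {}
    for user in users:
        total = user["review_count"]
        if total > 0:
            coverage[user["user_id"]] = in_dataset[user["user_id"]] / total
    return coverage
```

A low fraction is a hint that the user's 'average_stars' and vote totals mostly reflect reviews you can't see here.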
Yelp's dataset includes information for businesses near these 30 schools:
Don't see your school on the list? Any other feedback? Send us an email at dataset@yelp.com.