The data file McDonalds-Yelp-Sentiment-DFE.csv was obtained from https://www.crowdflower.com
The dataset contains 1525 observations of 10 variables:
A sentiment analysis of negative McDonald’s reviews. Contributors were given reviews culled from low-rated McDonald’s from random metro areas and asked to classify why the locations received low reviews. Options given were:
The column names of the data:
## [1] "X_unit_id" "X_golden"
## [3] "X_unit_state" "X_trusted_judgments"
## [5] "X_last_judgment_at" "policies_violated"
## [7] "policies_violated.confidence" "city"
## [9] "policies_violated_gold" "review"
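A minimal sketch of how the file could be loaded and its columns listed. The inline `csv_text` sample and the variable name `reviews` are assumptions made only so the snippet runs without the real file; with the actual data, `read.csv("McDonalds-Yelp-Sentiment-DFE.csv")` would take its place.

```r
# Tiny inline stand-in for the CSV (hypothetical rows, not the real data).
csv_text <- 'X_unit_id,city,policies_violated,policies_violated.confidence,review
1,Atlanta,"RudeService","1","Terrible service."
2,Atlanta,"na","0.6667","It was fine."
3,Chicago,"BadFood","1","Cold fries."'

# With the real file: reviews <- read.csv("McDonalds-Yelp-Sentiment-DFE.csv")
reviews <- read.csv(text = csv_text, stringsAsFactors = TRUE)

dim(reviews)    # number of observations and variables
names(reviews)  # column names
```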
Summary of the data. The content of the ‘review’ column is too long, so we omit it here:
## X_unit_id X_golden X_unit_state X_trusted_judgments
## Min. :679455653 Mode :logical finalized:1525 Min. :3
## 1st Qu.:679456040 FALSE:1525 1st Qu.:3
## Median :679456428 NA's :0 Median :3
## Mean :679456981 Mean :3
## 3rd Qu.:679456819 3rd Qu.:3
## Max. :679501402 Max. :3
##
## X_last_judgment_at policies_violated policies_violated.confidence
## 2/21/15 0:36: 108 na :295 1 :679
## 2/21/15 0:38: 81 RudeService :177 1.0\n1.0 : 93
## 2/21/15 0:22: 72 SlowService :127 : 54
## 2/21/15 0:29: 72 OrderProblem:116 0.6667 : 18
## 2/21/15 0:25: 54 BadFood :101 1.0\n0.6667 : 17
## 2/21/15 0:13: 45 ScaryMcDs : 71 1.0\n1.0\n1.0: 11
## (Other) :1093 (Other) :638 (Other) :653
## city policies_violated_gold
## Las Vegas :409 Mode:logical
## Chicago :219 NA's:1525
## Los Angeles:167
## New York :165
## Atlanta :130
## Houston :105
## (Other) :330
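A summary like the one above, leaving out the free-text column, could be produced along these lines. The small `reviews` data frame is a hypothetical miniature of the data, used only to keep the sketch runnable.

```r
# Hypothetical miniature of the data; the real frame has 1525 rows.
reviews <- data.frame(
  city = factor(c("Atlanta", "Atlanta", "Chicago")),
  policies_violated = factor(c("RudeService", "na", "BadFood")),
  review = c("Terrible service.", "It was fine.", "Cold fries."),
  stringsAsFactors = FALSE
)

# Keep every column except the long free-text one, then summarise.
no_review <- reviews[, setdiff(names(reviews), "review"), drop = FALSE]
summary(no_review)
```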
The first rows of the data:
## X_unit_id X_golden X_unit_state X_trusted_judgments X_last_judgment_at
## 1 679455653 FALSE finalized 3 2/21/15 0:36
## 2 679455654 FALSE finalized 3 2/21/15 0:27
## 3 679455655 FALSE finalized 3 2/21/15 0:26
## 4 679455656 FALSE finalized 3 2/21/15 0:27
## 5 679455657 FALSE finalized 3 2/21/15 0:27
## 6 679455658 FALSE finalized 3 2/21/15 0:13
## policies_violated policies_violated.confidence city
## 1 RudeService\nOrderProblem\nFilthy 1.0\n0.6667\n0.6667 Atlanta
## 2 RudeService 1 Atlanta
## 3 SlowService\nOrderProblem 1.0\n1.0 Atlanta
## 4 na 0.6667 Atlanta
## 5 RudeService 1 Atlanta
## 6 BadFood\nSlowService 0.7111\n0.6444 Atlanta
## policies_violated_gold
## 1 NA
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
And the content of the ‘review’ column:
## [1] I'm not a huge mcds lover, but I've been to better ones. This is by far the worst one I've ever been too! It's filthy inside and if you get drive through they completely screw up your order every time! The staff is terribly unfriendly and nobody seems to care.
## [2] Terrible customer service. ξI came in at 9:30pm and stood in front of the register and no one bothered to say anything or help me for 5 minutes. ξThere was no one else waiting for their food inside either, just outside at the window. ξ I left and went to Chickfila next door and was greeted before I was all the way inside. This McDonalds is also dirty, the floor was covered with dropped food. Obviously filled with surly and unhappy workers.
## 1518 Levels: "And on the seventh day, he forsook rest, but opened THE FIRST McDONALDS to quest his famished soul, yea." ξI may be exaggerating on the power of Mickey Ds, but only cause they can sue. Anyway, this particular McDonalds is inside the Wal Mart on Forest Ln., and forms a nice reprieve before or after your purchases. No drive-thru, but there is another McDonalds just down the road, on Abrams, that does. Anywho, don't go here often, folks. It's nice for a forgotten lunch or a celebratory match, but don't make it a habit. Deuces, doc. ...
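Some fields carry no useful information for this analysis, such as the bookkeeping columns ‘X_unit_id’, ‘X_golden’, ‘X_unit_state’, ‘X_trusted_judgments’ and ‘X_last_judgment_at’, and ‘policies_violated_gold’, which is entirely NA. We drop them.
Let’s look at the data after cleaning: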
## [1] 1525 4
## [1] "policies_violated" "policies_violated.confidence"
## [3] "city" "review"
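One way to arrive at the four remaining columns is to select them by name. This is a sketch on a hypothetical three-row frame; the names `reviews`, `keep` and `cleaned` are assumptions, while the column names are taken from the output above.

```r
# Hypothetical three-row frame with the same ten columns as the real data.
reviews <- data.frame(
  X_unit_id = 1:3,
  X_golden = FALSE,
  X_unit_state = "finalized",
  X_trusted_judgments = 3L,
  X_last_judgment_at = "2/21/15 0:36",
  policies_violated = c("RudeService", "na", "BadFood"),
  policies_violated.confidence = c("1", "0.6667", "1"),
  city = c("Atlanta", "Atlanta", "Chicago"),
  policies_violated_gold = NA,
  review = c("Terrible service.", "It was fine.", "Cold fries."),
  stringsAsFactors = FALSE
)

# Keep only the columns that are used later in the analysis.
keep <- c("policies_violated", "policies_violated.confidence", "city", "review")
cleaned <- reviews[, keep]

dim(cleaned)
names(cleaned)
```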
Summary of the data:
## policies_violated policies_violated.confidence city
## na :295 1 :679 Las Vegas :409
## RudeService :177 1.0\n1.0 : 93 Chicago :219
## SlowService :127 : 54 Los Angeles:167
## OrderProblem:116 0.6667 : 18 New York :165
## BadFood :101 1.0\n0.6667 : 17 Atlanta :130
## ScaryMcDs : 71 1.0\n1.0\n1.0: 11 Houston :105
## (Other) :638 (Other) :653 (Other) :330
How many records does each city have?
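The counts per city can be obtained with `table()`. The sketch below uses a short hypothetical `city` vector; on the real data it would be the ‘city’ column.

```r
# Hypothetical city values standing in for the real 'city' column.
city <- c("Las Vegas", "Chicago", "Las Vegas", "Atlanta", "Las Vegas", "Chicago")

# Count records per city and order from most to least frequent.
city_counts <- sort(table(city), decreasing = TRUE)
city_counts

# A bar chart of the same counts (uncomment to draw it):
# barplot(city_counts, las = 2)
```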
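There is a relationship between ‘policies_violated’ and ‘policies_violated.confidence’: each label listed in ‘policies_violated’ has a confidence value at the same position in ‘policies_violated.confidence’. We should create new fields based on the labels in ‘policies_violated’, filled with the matching values from ‘policies_violated.confidence’.
First, we remove the missing values. Records with policies_violated = ‘na’ or policies_violated = ‘’ carry no meaning, so we drop them. The two outputs below give the dimensions of each of these subsets: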
## [1] 295 4
## [1] 54 4
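The filtering step could look roughly like this. It is a sketch on a hypothetical four-row frame, and the names `cleaned`, `is_na_label`, `is_empty` and `labelled` are assumptions.

```r
# Hypothetical rows: one labelled, one 'na', one empty, one multi-label.
cleaned <- data.frame(
  policies_violated = c("RudeService", "na", "", "BadFood\nSlowService"),
  policies_violated.confidence = c("1", "0.6667", "", "0.7111\n0.6444"),
  stringsAsFactors = FALSE
)

# Flag the two kinds of records that carry no label.
is_na_label <- cleaned$policies_violated == "na"
is_empty <- cleaned$policies_violated == ""
sum(is_na_label)  # how many 'na' records
sum(is_empty)     # how many empty records

# Keep only the records that have at least one real label.
labelled <- cleaned[!(is_na_label | is_empty), ]
nrow(labelled)
```

Subsetting this way keeps the original row names, which would be consistent with row 4 being absent from the later outputs.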
Let’s see the result
## [1] "policies_violated" "policies_violated.confidence"
## [3] "city" "review"
## [5] "RudeService" "OrderProblem"
## [7] "Filthy" "SlowService"
## [9] "BadFood" "ScaryMcDs"
## [11] "MissingFood" "Cost"
## [13] "na"
## RudeService OrderProblem Filthy SlowService BadFood ScaryMcDs
## 1 0 0 0 0 0 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## 7 0 0 0 0 0 0
## MissingFood Cost na
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 5 0 0 0
## 6 0 0 0
## 7 0 0 0
## [1] "policies_violated" "policies_violated.confidence"
## [3] "city" "review"
## [5] "RudeService" "OrderProblem"
## [7] "Filthy" "SlowService"
## [9] "BadFood" "ScaryMcDs"
## [11] "MissingFood" "Cost"
## RudeService OrderProblem Filthy SlowService BadFood ScaryMcDs
## 1 1.0 0.6667 0.6667 0 0 0
## 2 1 0 0 0 0 0
## 3 0 1.0 0 1.0 0 0
## 5 1 0 0 0 0 0
## 6 0 0 0 0.6444 0.7111 0
## 7 0 0 0 0.6562 0 0.6562
## MissingFood Cost
## 1 0 0
## 2 0 0
## 3 0 0
## 5 0 0
## 6 0 0
## 7 0 0
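The expansion shown above, with one column per label holding its confidence, can be sketched as follows. This is not necessarily how the report did it: the sample rows are copied from the printed head of the data, the variable names are assumptions, and the confidences are stored as numbers here whereas the output above appears to keep them as text.

```r
# Rows taken from the printed head of the data (labels and confidences
# are newline-separated inside each cell).
labelled <- data.frame(
  policies_violated = c("RudeService\nOrderProblem\nFilthy", "RudeService",
                        "BadFood\nSlowService"),
  policies_violated.confidence = c("1.0\n0.6667\n0.6667", "1", "0.7111\n0.6444"),
  stringsAsFactors = FALSE
)

# Split each cell into its individual labels and confidences.
labels_list <- strsplit(labelled$policies_violated, "\n", fixed = TRUE)
conf_list <- strsplit(labelled$policies_violated.confidence, "\n", fixed = TRUE)

# One numeric column per distinct label, initialised to 0.
categories <- unique(unlist(labels_list))
for (cat_name in categories) labelled[[cat_name]] <- 0

# Fill in the confidences; assumes labels and confidences line up
# one-to-one within each row.
for (i in seq_len(nrow(labelled))) {
  labs <- labels_list[[i]]
  confs <- as.numeric(conf_list[[i]])
  for (k in seq_along(labs)) labelled[[labs[k]]][i] <- confs[k]
}

labelled[, categories]
```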
Next:
Let’s see the result:
## RudeService OrderProblem Filthy SlowService BadFood ScaryMcDs
## 1 0.8370684 0.8590107 0.9361200 0.8636744 0.8655926 0.7032000
## 2 0.8856359 0.8812910 0.8800533 0.8869545 0.8445385 0.8957571
## 3 0.8353583 0.8759167 0.7656571 0.8304500 0.9057643 0.8073900
## 4 0.8446567 0.8720238 0.9164625 0.8834364 0.8355577 0.7997000
## 5 0.8641813 0.8832980 0.8514467 0.8560000 0.8407037 0.8460267
## 6 0.8886724 0.8328750 1.0000000 0.8966667 0.9103333 0.6654000
## MissingFood Cost city
## 1 0.7445800 0.8905667 Atlanta
## 2 0.7487538 0.7504200 Las Vegas
## 3 0.6752333 0.7980600 Dallas
## 4 0.6909000 0.7322167 Portland
## 5 0.6639600 0.8399625 Chicago
## 6 0.8921333 0.8996000 Cleveland
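The report does not show how this per-city table was computed. One plausible reading, offered only as an assumption, is that each cell is the mean of the non-zero confidences for that label within a city. A sketch of that calculation on hypothetical numbers, with `wide`, `mean_nonzero` and `by_city` as illustrative names:

```r
# Hypothetical wide-format data: one confidence column per label.
wide <- data.frame(
  city = c("Atlanta", "Atlanta", "Chicago", "Chicago"),
  RudeService = c(1, 0.5, 0, 0.8),
  BadFood = c(0, 0.7, 0.6, 0),
  stringsAsFactors = FALSE
)

# Mean over the records where the label was actually assigned (value > 0).
# This rule is an assumption, not something stated in the report.
mean_nonzero <- function(x) {
  x <- x[x > 0]
  if (length(x) == 0) NA_real_ else mean(x)
}

# Average each label column within each city.
by_city <- aggregate(wide[, c("RudeService", "BadFood")],
                     by = list(city = wide$city),
                     FUN = mean_nonzero)
by_city
```

Note that `aggregate()` puts the city column first and sorts by city, unlike the table above, so the original was probably built differently.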
In progress!