Ph.D. student David Vinson and Professor Rick Dale discovered people use richer language in Yelp reviews when they are giving businesses extreme ratings — either one or five stars. Patrons giving businesses three-star ratings tend to use plainer language that doesn’t convey as much information, Vinson said.
“It makes sense,” said Vinson, who is interested in information and information theory. “If I’m trying to communicate rich information, I’m going to use uncommon language.”
The paper “Valence Constrains the Information Density of Messages” was selected by Yelp as a round-two winner for the Yelp Dataset Challenge, which asks researchers to use a dataset in an innovative way. Vinson and Dale received a $5,000 prize, though it was the complex and well-organized data that attracted them to the contest.
The free, 300-megabyte dataset — roughly 330,000 reviews of more than 15,000 Phoenix businesses — has been downloaded by thousands of students around the world, the company said.
“A team of our data mining engineers selected David’s paper from many qualified submissions based on its rigor, applicability and novelty,” said Scott Clark, Yelp software engineer and Dataset Challenge director. “We are excited to research applications of the technique within our infrastructure.”
Past winning papers came from researchers with Virginia Tech, University of Toronto, Stanford University, UC Berkeley and Carnegie Mellon University.
The work is an example of how a 21st-century university allows researchers to look for answers in innovative places, such as crowd-sourced review databases.
Vinson and Dale used a computer script to crunch the data. The script measured the variety of words used in each review, a task that is practical only with a large dataset and fast processors. The dataset contained 33 million words, 243,240 of them unique. The researchers then compared the word choice in each review against its rating, from one to five stars.
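For readers curious what such a script might look like, here is a minimal Python sketch. It is not the researchers’ code, and the article does not say which measure they used. The sketch assumes one common information-theoretic choice: a word’s information is its surprisal, the negative base-2 logarithm of its frequency across all reviews, so rarer words carry more bits. The function names and the (stars, text) input format are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def mean_information_by_rating(reviews):
    """Average bits per word for each star rating.

    `reviews` is an iterable of (stars, text) pairs. This input format is
    an assumption made for illustration, not the Yelp dataset's schema.
    """
    tokenized = [(stars, tokenize(text)) for stars, text in reviews]

    # Word frequencies over the whole corpus (a unigram model).
    counts = Counter(word for _, words in tokenized for word in words)
    total = sum(counts.values())

    per_rating = defaultdict(list)
    for stars, words in tokenized:
        if not words:
            continue  # skip reviews with no recognizable words
        # Surprisal of each word: rarer words carry more bits.
        bits = [-math.log2(counts[word] / total) for word in words]
        per_rating[stars].append(sum(bits) / len(bits))

    # Average the per-review scores within each star rating.
    return {
        stars: sum(scores) / len(scores)
        for stars, scores in sorted(per_rating.items())
    }
```

Called on a list of (stars, text) pairs, the function returns one average per rating. Under this assumed measure, the pattern the researchers describe would show up as higher averages at one and five stars than at three.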
The researchers now want to know whether people’s emotional states also change how much information they communicate in other situations. If so, the finding could be important for understanding how people organize and share information.
“People’s attitudes may change the amount of information they encode,” Dale said.
Vinson and Dale plan to conduct more research with the dataset and submit the paper to academic journals for publication.