The limitations of online ratings and reviews

When I travel to the U.S. on business, I always turn to Yelp so I can find the best gym near me for a good workout. It never fails and I’m always satisfied.

Ori Reshef, head of data science, Clicktale

Crowd-sourced rating systems like Yelp and TripAdvisor have become part of many of our lives. Many of us rarely venture out to a new restaurant, or stay at a hotel, without first checking the rating on a crowd-sourced review app or website to see what others think about it. But the evolution of ratings has started to reveal a dark side. Let’s have a look at where we started and where we’re going.

The roots of ratings

Sites like Yelp and TripAdvisor grew out of the rating systems of the last century. These include Michelin guides, which first started rating restaurants in the 1920s; film grading that also began in the 1920s; and hotel star rating systems that got going in the 1950s. Historically, these ratings and others like them were based on an objective list of criteria and carried out by experts in their fields. Like their counterparts today, the focus of these ratings was to give customers information to help them choose the product or experience that was right for them.

Businesses saw in ratings the potential to learn more about their products and services from their customers, and to leverage that information to improve. The Customer Satisfaction Score (CSAT) model entered the business world in the 1980s, offering a scientific and psychological model for measuring the gap between what customers expect before using a product or service and what they experience afterward. The idea was to retain customers, reduce churn and increase value. The Net Promoter Score (NPS) index, introduced in 2003, gauges customer loyalty. Major companies use NPS along with CSAT and other indexes to try to measure customer perceptions of product health.
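For readers who have not worked with NPS, here is a minimal sketch of how the score is usually computed. It assumes the common convention of a 0 to 10 "how likely are you to recommend us" question, with 9 and 10 counted as promoters and 0 through 6 as detractors. The function name and the sample ratings are made up for illustration.

```python
def net_promoter_score(ratings):
    """Return an NPS between -100 and 100 from 0-10 survey ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    # Common convention (an assumption here): 9-10 promoters, 0-6 detractors.
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Passives (7-8) count toward the total but toward neither group.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical survey responses, not real data.
sample = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
nps = net_promoter_score(sample)
print(nps)
```

Note how much the single number hides: ten very different customers collapse into one score, with no hint of why any of them answered the way they did.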

But these indexes are problematic. Customers need to opt in to be surveyed, and they tend to opt in when they’re experiencing extremes of love or hate. Meanwhile, those in the middle remain unaccounted for. What’s worse is that companies can manipulate results in a variety of ways, for instance, by knowing exactly when to survey people to achieve the best results. Overall, it’s difficult for businesses to act on results because no actual causes are associated with customers’ attitudes.
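The opt-in problem is easy to see in a toy simulation. The sketch below assumes a made-up population in which most customers are lukewarm, and made-up response rates in which the angriest and happiest customers are far more likely to answer a survey. None of these numbers come from real data; they only illustrate the mechanism.

```python
import random


def simulate_opt_in(n=10000, seed=0):
    """Return (population, respondents) satisfaction scores on a 1-5 scale."""
    rng = random.Random(seed)
    # Hypothetical population: 60% sit in the middle, 10% at the extremes.
    population = rng.choices([1, 2, 3, 4, 5], weights=[5, 15, 60, 15, 5], k=n)
    # Hypothetical opt-in rates: extremes respond far more often.
    respond_prob = {1: 0.50, 2: 0.10, 3: 0.02, 4: 0.10, 5: 0.50}
    respondents = [s for s in population if rng.random() < respond_prob[s]]
    return population, respondents


def extreme_share(scores):
    """Fraction of scores that are 1 (hate) or 5 (love)."""
    return sum(1 for s in scores if s in (1, 5)) / len(scores)


population, respondents = simulate_opt_in()
pop_share = extreme_share(population)
resp_share = extreme_share(respondents)
print(f"extremes in population:  {pop_share:.0%}")
print(f"extremes in survey data: {resp_share:.0%}")
```

Under these assumptions the extremes make up roughly a tenth of customers but about half of the survey responses, which is exactly the distortion described above.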

The power of the crowd

The evolution of rating systems accelerated with the advent of the internet and social media. Almost overnight, anyone with a data plan could be a critic of, well, anything: businesses, books, hotels, restaurants, taxi drivers, and the list goes on.

Though rooted in new technology, these rating systems serve the same two purposes as their predecessors: guidance for customers and information for businesses. They also give “the crowd” tremendous influence. The mere fact that any anonymous user can have power over a business is where it starts getting scary.

One low rating too many and an Uber driver can be deactivated. For every star a restaurant loses on Yelp, revenue falls by 5 to 10 percent; reach one star, and you can pack up your pots and pans because your restaurant will likely go out of business.

When it came out that the parents of Ahmad Khan Rahami, charged with setting bombs in New York and New Jersey, were the owners of a small New Jersey restaurant, the crowd took revenge via Yelp. Reviewers rated the restaurant not on its food but on the owners’ relationship to their son, even though the family had earlier alerted police. The result was 13 pages of one-star reviews that forced the family to close their restaurant. This is just one of many politically motivated acts of revenge on businesses via Yelp.

Big brother gets in on the act

Looking across to China, we see another example of how ratings can turn nefarious when the government gets involved. Alibaba and the Chinese government are working together to create a social credit system that will eventually give every Chinese citizen a trustworthiness rating. By mining Big Data, the government will create a score for each citizen based on their online activity (purchases, payments), medical history, education, social behavior and more. Good scores can increase people’s chances of finding a good job, meeting a partner on a dating site, getting VIP hotel reservations without a deposit, and so on. Bad scores could prevent them from receiving a bank loan, signing a lease, or finding a life partner. Here, quite easily, a person’s entire life can be ruined by an unfavorable rating.

Transforming ratings into a force for improvement

With the quantity of data and our ability to store and process it growing every minute at a phenomenal rate, and open source algorithms democratizing data processing, the possibilities are endless. The data exists. What we choose to do with it can ultimately be either Orwellian or Utopian.

When all you have are ratings numbers, your takeaway is, by definition, reductive. While these ratings may be useful to users, they are of limited value to companies. For complex human behavior, such as interactions on a web or mobile site, you need more complex summary data on which to base corrective action.

As head of data science at Clicktale, I think long and hard about how customer data can be used as a force for good, while protecting individuals and helping the businesses they frequent improve. Here are some lessons I’ve learned and think others could benefit from:

1) Always use an automated, unbiased method (with objective criteria) that can measure user satisfaction levels on digital channels. The best way to accurately measure satisfaction is by not mentioning that this is what you are measuring. In other words, focus on what people are doing (their behavior on your site or app) and not only on what they are saying. For example, take all the customers who left your sales funnel, go back to the data you have, and use machine learning or other methods to find patterns. Let the data teach you why it happened, instead of asking questions of your data.

2) Use qualitative and quantitative methods in conjunction to gain more accurate results.

3) Use more than one index. If you don’t have the resources to go over all the data, use CSAT, CES (Customer Effort Score), and Voice of Customer. Combine your data to find contradictions. Don’t forget that you’re looking at the extremes of the satisfaction curve. In a normal distribution curve (see example below), you’ll have extremes on both sides. If you want to shrink the group of dissatisfied customers on the left, you have to move the whole curve to the right so that fewer people fall below the line.
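The point in lesson 3 about moving the whole curve can be put into rough numbers. The sketch below assumes satisfaction is normally distributed on a hypothetical 0 to 10 scale, and that anyone scoring below 4 counts as dissatisfied. The mean, spread and threshold are illustrative assumptions, not measurements.

```python
from statistics import NormalDist


def dissatisfied_share(mean, stdev, threshold):
    """Fraction of customers whose satisfaction falls below the threshold."""
    return NormalDist(mu=mean, sigma=stdev).cdf(threshold)


# Hypothetical numbers: the spread stays the same, only the mean moves.
before = dissatisfied_share(mean=6.0, stdev=1.5, threshold=4.0)
after = dissatisfied_share(mean=6.5, stdev=1.5, threshold=4.0)
print(f"dissatisfied before the shift: {before:.1%}")
print(f"dissatisfied after the shift:  {after:.1%}")
```

With these made-up figures, nudging the average up by half a point cuts the dissatisfied tail from about 9 percent to about 5 percent, even though no effort was aimed at the unhappy customers specifically.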

By literally seeing what makes customers happy, and what makes them frustrated, you’ll gain a far deeper understanding of what’s working (and what isn’t), far better than any rating could express. This is not to say that ratings are obsolete—for people looking for the best workout or restaurant in an unfamiliar city, resources like Yelp and TripAdvisor are crucial. But as a business, there is very little to learn from ratings scores. As the saying goes, talk is cheap—companies should pay more attention to what their customers are doing than to what they are saying.
