Plot Twist

Michael Wagstaff • 15 January 2024

Deriving insight from Trustpilot data

Ratings and review sites are a well-known source of insight for brands. Previous articles, such as our piece on harnessing publicly available data, have shown the valuable insight that can be derived from techniques such as sentiment analysis.


In this article we give the topic a slightly different twist by looking at the insight that can be derived from analysing the behaviour of reviewers.


In the first part we discuss tree plots, a visual technique that provides insight into the buying behaviour of reviewers. In the second part, we look at how regression analysis can be used to identify drivers of loyalty.


Why is a tree plot good for insight generation?

A tree plot, also known as a treemap, is a data visualisation technique used to display hierarchical data using nested rectangles. Picture a big box that represents all the data, divided into smaller boxes for each category, with the size of each box showing how big or important that category is. These boxes can be further split into even smaller ones to show sub-categories, letting you see at a glance how different parts of the data compare in size and importance.
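To make the nesting concrete, here is a minimal Python sketch of the simplest treemap layout, slice-and-dice. It is not necessarily the algorithm behind the graphic in this article (charting libraries usually offer squarified layouts that keep tiles closer to square), and the nested-dictionary input, the `Rect` type and the `slice_and_dice` function are illustrative assumptions. Each level splits its parent rectangle in proportion to the totals of its children, alternating between vertical and horizontal cuts, so every tile's area is proportional to its value.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# A node is either a leaf value (e.g. a review count) or a dict of child nodes.
Node = Union[float, Dict[str, "Node"]]


@dataclass
class Rect:
    label: str   # path such as "Shopping/Clothing"
    x: float
    y: float
    w: float
    h: float
    depth: int   # 0 = top-level category


def total(node: Node) -> float:
    """Sum all leaf values beneath a node."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return float(node)


def slice_and_dice(tree: Dict[str, Node], x: float, y: float, w: float, h: float,
                   depth: int = 0, prefix: str = "") -> List[Rect]:
    """Lay out a nested dict as rectangles whose areas are proportional to their totals."""
    rects: List[Rect] = []
    grand_total = total(tree)
    offset = 0.0  # fraction of the parent rectangle already used
    for label, child in tree.items():
        share = total(child) / grand_total
        if depth % 2 == 0:
            # Even depths: cut the parent into vertical strips, side by side.
            cx, cy, cw, ch = x + offset * w, y, share * w, h
        else:
            # Odd depths: cut the parent into horizontal bands, stacked.
            cx, cy, cw, ch = x, y + offset * h, w, share * h
        offset += share
        name = f"{prefix}/{label}" if prefix else label
        rects.append(Rect(name, cx, cy, cw, ch, depth))
        if isinstance(child, dict):
            rects.extend(slice_and_dice(child, cx, cy, cw, ch, depth + 1, name))
    return rects
```

For example, `slice_and_dice({"A": {"a1": 30, "a2": 10}, "B": 60}, 0, 0, 100, 100)` gives tile "A" a width of 40, and its child "a1" fills the top 75% of that tile.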


When applied to customer reviews, a tree plot can provide insights into the brand's audience and their interests or buying behaviours by showing the categories that a brand's reviewers also review. 


  1. Customer Interests: By seeing the other categories that customers of a brand review, that brand can understand what other products or services their customers are interested in. This can guide cross-promotional activities, partnerships, or expansions.
  2. Market Positioning: The brand can see how it's positioned in the market in relation to other sectors. If brand reviewers often review high-end clothing stores, for example, it might suggest that the brand appeals to a more affluent demographic.
  3. Cross-Selling Opportunities: Understanding the other interests of their customers can help the brand identify opportunities to cross-sell related products or services.
  4. Targeted Marketing: Insights from the tree plot can inform targeted marketing strategies. If a significant number of customers are also reviewing health and wellness services, for instance, marketing campaigns could be tailored to this interest.
  5. Customer Segmentation: The brand can segment its customer base more effectively by using the insights to create specific profiles based on the other categories of products or services they review.
  6. Competitive Analysis: It can provide an understanding of the competitive landscape by showing what other types of businesses attract the brand's customers.
  7. Product Development: Insights into what other categories customers are interested in can inform new product development strategies.


Tree Plot case study method


To illustrate the insight that a tree plot can provide, we have applied the technique to reviews left on Trustpilot for supplements and sports nutrition brand Myprotein. 


To generate the tree plot, we downloaded the review histories of all Myprotein reviewers to identify which other brands they had reviewed. From this we were able to plot the categories in which they shop and their relative importance. Trustpilot uses a hierarchical system to categorise brands, and places Myprotein in the Beauty & Well-being category and in the Personal Care and Fitness and Nutrition sub-categories.
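The aggregation step can be sketched as follows, assuming the downloaded histories are available as `(reviewer_id, brand)` pairs and Trustpilot's categories as a lookup from each brand to its category path. Both input formats and the function names are hypothetical, since the article does not describe its data structures. Each review that a Myprotein reviewer left for another brand is counted once under its full path, and `rollup` totals those counts at any level of the hierarchy, which is exactly what the tiles of a tree plot need.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Path = Tuple[str, ...]  # e.g. (category, sub-category, sub-sub-category)


def category_tree_counts(reviews: Sequence[Tuple[str, str]],
                         brand_paths: Dict[str, Path],
                         focal_brand: str = "Myprotein") -> Counter:
    """Count the other reviews left by the focal brand's reviewers,
    keyed by (category, ..., brand) so every level of the tree can be rolled up."""
    focal_reviewers = {reviewer for reviewer, brand in reviews if brand == focal_brand}
    counts: Counter = Counter()
    for reviewer, brand in reviews:
        if reviewer in focal_reviewers and brand != focal_brand:
            # Brands missing from the lookup are kept visible rather than dropped.
            path = brand_paths.get(brand, ("Unclassified",))
            counts[path + (brand,)] += 1
    return counts


def rollup(counts: Counter, depth: int) -> Counter:
    """Aggregate leaf counts to the first `depth` levels of the hierarchy."""
    totals: Counter = Counter()
    for path, n in counts.items():
        totals[path[:depth]] += n
    return totals
```

This counts reviews; counting distinct reviewers per brand instead would stop one prolific reviewer from inflating a tile, and is a reasonable alternative design choice.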


The categorisation enables comparisons to be made with direct rivals in the supplement and sports nutrition market, but also more widely with brands in adjacent markets. It should be noted that the categorisations used are those recorded by Trustpilot, and some brands may be incorrectly assigned; for example, JD Sports is assigned to the Events and Entertainment/Wedding & Party/Gift Shops categories. This and other obviously miscategorised brands were manually reclassified.
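One way to keep manual reclassification transparent is to hold the corrections in a separate override table and apply them to a copy of Trustpilot's lookup, logging every change. This is a sketch of that idea, not the author's process; the function name is hypothetical, and the corrected path for JD Sports is an illustrative guess because the article does not say where the brand was moved.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]

# Trustpilot's own assignment for JD Sports, as quoted in the article.
TRUSTPILOT_PATHS: Dict[str, Path] = {
    "JD Sports": ("Events and Entertainment", "Wedding & Party", "Gift Shops"),
}

# Hypothetical manual correction; the article does not say where JD Sports was moved.
MANUAL_OVERRIDES: Dict[str, Path] = {
    "JD Sports": ("Shopping and Fashion", "Clothing and Underwear", "Clothing Shops"),
}


def apply_reclassifications(source: Dict[str, Path],
                            overrides: Dict[str, Path]) -> Tuple[Dict[str, Path], List[str]]:
    """Return a corrected brand -> category-path lookup and an audit log of changes.

    The source lookup is copied, not modified, so the original Trustpilot
    categories remain available for checking each manual decision.
    """
    corrected = dict(source)
    log: List[str] = []
    for brand, new_path in overrides.items():
        old_path = corrected.get(brand)
        if old_path != new_path:
            corrected[brand] = new_path
            old_label = " / ".join(old_path) if old_path else "(unclassified)"
            log.append(f"{brand}: {old_label} -> {' / '.join(new_path)}")
    return corrected, log


corrected_paths, changes = apply_reclassifications(TRUSTPILOT_PATHS, MANUAL_OVERRIDES)
```

Re-running the function on already corrected data produces an empty log, which makes it safe to apply at every refresh of the review download.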

The Myprotein Tree Plot

The interactive graphic below shows the relative importance of each category and sub-category to Myprotein reviewers. Clicking on a tile drills down to the next level, all the way to individual brands.


For Myprotein, the Shopping and Fashion category has the highest share of reviews left by people who also reviewed Myprotein. Within that, the Clothing and Underwear sub-category is the most important, and within this sub-category Clothing Shops is the most important. Superdry, Next, Very and ASOS are the most commonly reviewed brands. This makes sense because, as well as selling protein powders and nutritional supplements, Myprotein also sells activewear. Immediately, therefore, we see the brands that are in the minds and shopping carts of people who buy Myprotein products.
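Reading the tree plot from the top down, as in the paragraph above, amounts to repeatedly asking which children of the current tile are largest. Below is a minimal sketch, assuming review counts are held in a dictionary keyed by full category path ending in the brand (a hypothetical format). Calling `drill_down(counts)` ranks the top-level categories, and passing a path such as `("Shopping and Fashion", "Clothing and Underwear", "Clothing Shops")` ranks the brands within that tile.

```python
from collections import Counter
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def drill_down(leaf_counts: Dict[Path, int], node: Path = ()) -> List[Tuple[str, int, float]]:
    """Rank the children of `node` by review count, with each child's share of the node.

    `leaf_counts` maps full paths such as (category, sub-category, ..., brand)
    to review counts; `node` is the path of the tile being inspected.
    """
    depth = len(node)
    children: Counter = Counter()
    for path, n in leaf_counts.items():
        # Only paths that sit strictly below `node` contribute to its children.
        if len(path) > depth and path[:depth] == node:
            children[path[depth]] += n
    node_total = sum(children.values())
    return [(label, n, n / node_total) for label, n in children.most_common()]
```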


We also see the importance of more tangential markets. For example, online markets, price comparison websites and jewellery shops all feature among the reviews left by Myprotein reviewers. This is useful because from these we can start to infer some characteristics of Myprotein purchasers that we cannot get directly from Trustpilot. For example, the use of price comparison sites suggests people who are keen on saving money. The Baby Shop tile suggests a young family demographic. In other categories we see airport car parking brands, which point to people going on holidays abroad. Some categories point to lifestyles and others to life stages, which is valuable for segmentation. The tree plot also identifies other brands that are important to Myprotein shoppers, which is useful information for partnerships and brand associations.


It's important to remember that the data discussed above relates to Myprotein customers who leave reviews on Trustpilot. They may not be representative of all Myprotein customers, but they can reasonably be treated as indicative of them.


Using Trustpilot data to identify the drivers of loyalty

We can also use Trustpilot data to identify factors that can influence loyalty. We define a reviewer as loyal when they leave more than one review for the brand, with a minimum gap of ten days between reviews. We chose this gap, based on a wide-scale analysis of reviews over time, to cover the situation where a person edits or adds to their initial review because there was a problem.
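The loyalty definition can be expressed as a small function. This is a sketch under two assumptions the article leaves open: a gap of exactly ten days counts, and each later review is measured against the last review that was counted rather than the immediately preceding one, so a quick follow-up edit does not reset the clock. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Optional

# Assumption: reviews at least ten days apart count as separate purchase occasions.
MIN_GAP = timedelta(days=10)


def count_review_occasions(review_dates: Iterable[date], min_gap: timedelta = MIN_GAP) -> int:
    """Count reviews that fall at least `min_gap` after the last counted review.

    Reviews closer together than `min_gap` are treated as edits or follow-ups
    to the earlier review rather than as a new purchase.
    """
    occasions = 0
    last_counted: Optional[date] = None
    for d in sorted(review_dates):
        if last_counted is None or d - last_counted >= min_gap:
            occasions += 1
            last_counted = d
    return occasions


def is_loyal(review_dates: Iterable[date], min_gap: timedelta = MIN_GAP) -> bool:
    """A reviewer is loyal if they have at least two separate review occasions."""
    return count_review_occasions(review_dates, min_gap) >= 2
```

For example, reviews on 1 and 6 January are treated as one occasion, whereas adding a review on 13 January makes the reviewer loyal, because that review comes twelve days after the counted one on 1 January.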


Using logistic regression, we modelled the relationship between the categories that Myprotein reviewers also leave reviews for and whether they leave multiple reviews of Myprotein products (a proxy for repeat purchases). For the regression, we used data from all reviewers from when the brand's Trustpilot account was established until late 2023. We used the lowest level of the review category hierarchy, as we believed that would give us the most detailed analysis.


While some potentially important variables were unavailable (e.g. age, income and location), we were able to generate a gender variable using a text analysis model. We included in the regression two factors that are also related to repeat reviewing of Myprotein products: gender (men are more likely to leave multiple Myprotein reviews) and review frequency (people who review more often are more likely to leave multiple Myprotein reviews).
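The model described above, a binary outcome (repeat Myprotein reviewer or not) regressed on category indicators plus gender and review frequency, can be sketched without third-party packages by fitting logistic regression with Newton-Raphson. The article does not say which software was used, and the function names and column layout here are illustrative assumptions. In practice a statistics package such as statsmodels (`statsmodels.api.Logit`) would also report the standard errors and confidence intervals needed for the table that follows.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 max_iter: int = 25, tol: float = 1e-10) -> List[float]:
    """Maximum-likelihood logistic regression; returns [intercept, beta_1, ..., beta_k]."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend an intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                       # X^T (y - p): gradient of the log-likelihood
        info = [[0.0] * k for _ in range(k)]   # X^T W X: Fisher information
        for row, target in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (target - p) * row[i]
                for j in range(k):
                    info[i][j] += w * row[i] * row[j]
        step = _solve(info, grad)  # Newton step: info^{-1} @ grad
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

Each row of `X` would hold one reviewer's predictors, for example `[reviewed_gym, is_male, n_reviews]` (hypothetical column names), and `y` would be 1 for a loyal reviewer under the ten-day rule. `math.exp(beta[1])` is then the odds ratio for the first category, holding the other predictors fixed. With a single binary predictor, the fitted odds ratio equals the cross-product ratio of the 2×2 table, which makes a handy sanity check.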


The top five drivers are shown in the table below. The results are expressed as odds ratios, where the odds of an outcome are its probability divided by the probability that it does not happen. Each odds ratio compares the odds that a Myprotein reviewer who has also reviewed a brand in the given category leaves multiple Myprotein reviews with the corresponding odds for a Myprotein reviewer who has not reviewed in that category, after accounting for gender and review frequency. A value above 1 means higher odds of repeat reviewing.
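Because odds ratios are easy to misread as "X times more likely", a small worked conversion helps. The sketch assumes a baseline repeat-review rate of 10%, a hypothetical figure because the article does not report one, and shows that an odds ratio of 1.85 (the Gym estimate) raises the probability to about 17%, not 18.5%. The gap between the odds ratio and the ratio of probabilities widens as the baseline rate rises.

```python
def apply_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Convert a baseline probability into the probability implied by an odds ratio."""
    if not 0.0 < p_baseline < 1.0:
        raise ValueError("p_baseline must be strictly between 0 and 1")
    baseline_odds = p_baseline / (1.0 - p_baseline)  # odds = p / (1 - p)
    new_odds = baseline_odds * odds_ratio            # odds ratios multiply odds, not probabilities
    return new_odds / (1.0 + new_odds)               # convert odds back to a probability


# Hypothetical baseline: 10% of comparable reviewers leave repeat reviews.
p_gym = apply_odds_ratio(0.10, 1.85)
print(f"{p_gym:.3f}")  # prints 0.171, roughly 1.7x the baseline rather than 1.85x
```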


The table below presents the mean estimate of each odds ratio for the customers included in the regression, together with the lower and upper bounds of its 95% confidence interval.


Category | Odds ratio (mean) | Lower 95% bound | Upper 95% bound
Gym | 1.85 | 1.59 | 2.15
Fitness and Nutrition | 1.60 | 1.50 | 1.71
Exercise Equipment Shop | 1.43 | 1.27 | 1.62
Activewear | 1.41 | 1.29 | 1.54
Health Food Shop | 1.41 | 1.22 | 1.62
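Logistic regression estimates its coefficients on the log-odds scale, so the bounds in the table can be translated back to a coefficient and standard error for each category. The sketch below assumes the intervals are standard Wald intervals, exp(beta ± 1.96 × SE), which the article does not state. Under that assumption the centre of each interval on the log scale reproduces the reported odds ratio to two decimal places for all five rows, and every z value is well above 1.96, consistent with all lower bounds exceeding 1.

```python
import math
from statistics import NormalDist

# Odds ratios from the table above: (mean, lower 95% bound, upper 95% bound).
TABLE = {
    "Gym": (1.85, 1.59, 2.15),
    "Fitness and Nutrition": (1.60, 1.50, 1.71),
    "Exercise Equipment Shop": (1.43, 1.27, 1.62),
    "Activewear": (1.41, 1.29, 1.54),
    "Health Food Shop": (1.41, 1.22, 1.62),
}

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def implied_log_scale(low: float, high: float):
    """Recover the log-odds coefficient and its standard error from a 95% interval,
    assuming the interval is symmetric on the log scale (Wald type)."""
    log_low, log_high = math.log(low), math.log(high)
    beta = (log_low + log_high) / 2        # centre of the interval on the log scale
    se = (log_high - log_low) / (2 * Z95)  # half-width divided by the z value
    return beta, se


for category, (or_mean, low, high) in TABLE.items():
    beta, se = implied_log_scale(low, high)
    z = beta / se
    print(f"{category:<25} OR={or_mean:.2f} check={math.exp(beta):.2f} "
          f"beta={beta:.3f} se={se:.3f} z={z:.1f}")
```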


These odds ratios suggest that reviewers who are active in categories related to gym, fitness and nutrition, exercise equipment, activewear and health food are more likely to leave multiple reviews for Myprotein. This indicates a stronger engagement with the brand among customers interested in these lifestyle areas.


It is also important to keep in mind that other factors may be at play, whether a variable not included in the regression (e.g. age or location) or an artificial influence on the customer reviews. One such artificial influence could be a surge in reviews for a particular company driven by a deliberate company strategy rather than by more people naturally buying the product and leaving reviews. The categories these customers also review may therefore be subject to an unforeseen time dependence: a certain category may be particularly popular at the same time as the company's strategy, or the company may be actively trying to build a customer base within a particular niche. Some of this could potentially be mitigated by looking at results over several different time periods, which is something we would do in a more thorough analysis.


Despite the caveats, the information derived from the analysis is valuable. Myprotein could use it to further target and tailor its marketing to these segments to encourage repeat purchases and reviews.


Overall

Tree plot and regression analyses of Trustpilot data have some limitations. The major limitation is the lack of demographic data, such as age and location, that might better explain purchase and review behaviour. Inferences can be made from the categories purchased, but these will only ever give partial demographic insight.


Another drawback is that many reviewers leave only one review, so there may be too few reviews in some of the smaller sub-categories to derive any meaningful insight. Issues of mis-categorisation also need to be addressed.

And finally, as this is a surface-level analysis rather than a comprehensive one, there are other routes for analysis, such as investigating reviews over different time periods, looking into seasonality or examining the topics expressed in review content, to name a few.


Despite these limitations, this case study has shown the depth of insight that can be derived from Trustpilot data and how that information can be used to explain the market and a brand's place in it, who buys from the brand and what other brands they buy. It can also give a good steer on which category buyers to target and which brands are ideal for associations and partnerships.

