Posted: 27 Oct. 2015 10 min. read

The diversity algorithm: rethinking your customer segments

Approaches to customer segmentation have evolved beyond categorising people into broad demographic-based groups like ‘women’ or ‘youth’.

Advances in data capture and analysis help to segment customers in detailed and unique ways, enabling brands to personalise the customer experience. This article gives a 101 on using algorithms and discusses how they can be applied to customer diversity to create competitive advantage in a customer-first world.

Over the last two years, the digital world has been deploying feature after feature intended for diverse customers. For example, Facebook and Google+ changed ‘gender’ from a binary option to a free text field; Apple and WhatsApp added emojis with multiple skin tones; Samsung launched a video call centre with staff trained in sign language to cater for customers with a hearing disability.

For these high-profile brands, any change to an interface or a channel requires money, effort from specialist teams and executive endorsement. But how did these brands know that personalisation would make a difference to the customer experience and deliver a return on investment?

Led by Dr Pedro Quelhas Brito (University of Porto), researchers in Portugal explored the topic of personalisation by applying a mix of algorithms to large sets of customer data to find subgroups that were valuable, meaning the subgroups were i) distinct, ii) high volume and iii) adequately detailed so that the marketing or product development team could act with confidence.

By analysing large volumes of data from a global e-retailer of custom-made shirts, they were able to pinpoint the buying patterns of very specific customer segments, e.g. overweight men in the UK are more likely to want to buy white shirts for work, and women in France are more likely to use vouchers for discretionary spending on evening tops.


This research aimed to explore how analysis of large sets of customer data can find valuable groups of customers. The researchers used data from the retailer, whose key offering of shirt customisation “renders the production and logistics processes of the company very complex”.

“[It] also presents complex challenges to the company, specifically those with regard to achieving and maintaining its profitability, efficiency and productivity goals while keeping up with the needs and trends of its customers in order to satisfy them.”

The researchers hypothesised that a detailed analysis of their data would provide “different and complementary perspectives on the customers” to the extent that the results would be useful and actionable for the business to optimise their products and how they sell them.


The researchers used ‘data mining’ techniques to identify patterns and trends in data collected from past customer interactions.

They used 10,755 customer orders, which contained many possible data variables including:

  • Product characteristics (what they bought): Type of fabric, Fabric colour, Collar type, Fabric structure
  • Demographic and biometric data (who they are): Gender, Age group, Collar size, Body Mass Index (BMI)
  • Geographic data (where they live): Country
  • Psychographic (how they behave): Lifestyle/Purpose
  • Behavioural (why they buy): Price sensitivity

With two algorithms called ‘clustering’ and ‘subgroup discovery’ they looked for valuable customer subgroups.

i) Clustering
This algorithm assigns a ‘representative’ (called a medoid, which gives the k-medoids method its name) to each cluster, then assesses how similar every other record is to each representative. As similar records join a cluster, its representative is realigned. The process then repeats until distinct clusters can be observed.
This was executed in two stages, first by clustering just the product data, then by adding the customer-related variables.
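The loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than the researchers' implementation: the mismatch-count dissimilarity, the function names and the six toy orders are all assumptions made for the example, and the article does not say which similarity measure or starting points the study used.

```python
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]  # one order, described by categorical attributes


def mismatches(a: Record, b: Record) -> int:
    """Assumed dissimilarity: the number of attributes on which two orders differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(records: Sequence[Record], medoids: Sequence[int]) -> List[int]:
    """Give each order the index of the cluster whose medoid it most resembles."""
    return [
        min(range(len(medoids)), key=lambda c: mismatches(r, records[medoids[c]]))
        for r in records
    ]


def k_medoids(
    records: Sequence[Record], initial_medoids: Sequence[int], max_iter: int = 100
) -> Tuple[List[int], List[int]]:
    """Alternate between assigning orders and realigning each representative."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(records, medoids)
        realigned = []
        for c, current in enumerate(medoids):
            members = [i for i, label in enumerate(labels) if label == c]
            if not members:  # keep the old representative for an empty cluster
                realigned.append(current)
                continue
            # The new medoid is the member closest, in total, to the other members.
            realigned.append(
                min(members, key=lambda i: sum(mismatches(records[i], records[j]) for j in members))
            )
        if realigned == medoids:  # nothing moved, so the clusters are stable
            break
        medoids = realigned
    return medoids, assign(records, medoids)


# Invented toy orders: (fabric colour, collar type, purpose)
orders = [
    ("white", "classic", "work"),
    ("white", "classic", "work"),
    ("white", "button-down", "work"),
    ("black", "cutaway", "evening"),
    ("black", "cutaway", "evening"),
    ("navy", "cutaway", "evening"),
]
medoids, labels = k_medoids(orders, initial_medoids=[2, 5])
print(medoids, labels)
```

Starting from two arbitrary orders, the representatives shift towards the most typical order in each group, which is the ‘realignment’ step in action. A medoid is always a real order from the data, so each cluster can be described by an actual purchase rather than by an average.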

ii) Subgroup Discovery
The researchers used an algorithm (called CN2-SD) which aims to “discover subgroups of the population that are statistically more interesting and unusual”. This algorithm requires a ‘target variable’, which was chosen in consultation with the managers. All the data are then compared against the target variable, and the outcome is a series of “rules” which can be split into categories of ‘interesting for marketing’ and ‘interesting for design’.
The target variable for this investigation was Body Mass Index (BMI), based on the managers’ experience with market demand.
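To give a feel for how such a rule can be scored, the sketch below computes weighted relative accuracy, the measure CN2-SD uses to rate candidate rules. It is a simplified illustration under stated assumptions: the ten orders, the field names and the helper `wracc` are invented for the example, and the full algorithm also searches over candidate rules and re-weights orders already covered, which is not shown here.

```python
from typing import Callable, Dict, List

Order = Dict[str, str]


def wracc(
    orders: List[Order],
    condition: Callable[[Order], bool],
    is_target: Callable[[Order], bool],
) -> float:
    """Weighted relative accuracy of the rule 'condition -> target'.

    Score = share of orders covered by the rule
            * (target rate inside the subgroup - target rate overall).
    """
    covered = [o for o in orders if condition(o)]
    if not orders or not covered:
        return 0.0
    coverage = len(covered) / len(orders)
    rate_overall = sum(is_target(o) for o in orders) / len(orders)
    rate_in_subgroup = sum(is_target(o) for o in covered) / len(covered)
    return coverage * (rate_in_subgroup - rate_overall)


# Invented toy data: ten orders, five of them from overweight customers.
toy_orders: List[Order] = (
    [{"country": "UK", "collar": "cutaway", "bmi": "overweight"}]
    + [{"country": "UK", "collar": "classic", "bmi": "overweight"}] * 3
    + [{"country": "UK", "collar": "classic", "bmi": "normal"}]
    + [{"country": "FR", "collar": "classic", "bmi": "overweight"}]
    + [{"country": "FR", "collar": "classic", "bmi": "normal"}] * 4
)


def overweight(o: Order) -> bool:
    return o["bmi"] == "overweight"


# Broad rule: covers half the orders, 80% of which hit the target.
broad = wracc(toy_orders, lambda o: o["country"] == "UK", overweight)
# Narrow rule: 100% hit the target, but it covers a single order.
narrow = wracc(
    toy_orders,
    lambda o: o["country"] == "UK" and o["collar"] == "cutaway",
    overweight,
)
print(round(broad, 3), round(narrow, 3))  # about 0.15 and 0.05
```

The narrow rule is ‘purer’ but describes only one order, so it scores lower than the broad rule. This is how the measure balances how unusual a subgroup is against how many customers it contains.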


The analysis resulted in six market segments and 49 rules that provided some detailed groupings of the customers. The following are some key observations from each model.

i) Clustering model groupings

Age, fit and pocket: The older the customer, the stronger the preference for a loose-fitting shirt and for a top pocket.

Lifestyle and geography: Customers in France used the company to buy evening tops, whereas customers in Germany would seek out smart casual shirts, and customers from other locations would generally look for work shirts.

Price sensitivity and gender: Female customers would primarily shop directly from the retailer, rather than through one of its affiliates, and with a voucher.

ii) Subgroup Discovery model groupings:

Interesting for marketing:

  1. Customers from the UK tend to be overweight
  2. Customers who made their purchases at a certain affiliate were significantly overweight
  3. Customers aged 35 to 44 tend to be overweight

Interesting for design:

  1. Those with a collar size less than 36cm who choose back yoke contrast tend to be overweight
  2. Those who choose curved hems and a particular fabric tend to be overweight


This research has several implications.

1) The segments and rules provide actionable insights on the retailer’s customers
By identifying their customers with this level of detail, the marketing and design teams have clear and supported direction when it comes to predicting and designing for future customer experiences.

2) Volume is a key aspect of customer subgrouping
When looking for patterns across a large number of variables, the subgroup discovery algorithm inherently factors in volume when deciding whether a group is valuable. Volume is what can turn a subgroup from interesting trivia into a key opportunity.

3) These approaches to understanding customers are relevant across many industries
“The types of variables used in the segmentation made here (product characteristics, demographic and biometric, geographic, psychographic and behavioural) are common in many other business areas, such as banking and automotive. Thus, the process can easily be adapted.”

4) Cooperation between a data expert and the domain expert is crucial
The application of the subgroup discovery algorithm needs help from someone who understands the business environment. Their selection of the target variable helps focus the analysis. The researchers concluded that “the close involvement of the domain experts is essential for the success of the project.”

5) Competitive advantage awaits
A robust analysis of customer data “allow[s] companies to be more efficient and responsive to customer requests and gain a competitive advantage”.

Leaders who look beyond traditional segments and analyse their data in a serious and open-minded way will get the multidimensional view of their customers that is needed to design truly personalised experiences. As organisations become more sophisticated in this “age of choice”1, data and inclusive thinking are key to winning over customers’ hearts and minds.

For more information contact Tom Champion.

To read the full research paper, including details of the algorithms, see Brito, P. Q., Soares, C., Almeida, S., Monte, A., Byvoet, M. (2015) Customer segmentation in a large database of an online customized fashion business, Robotics and Computer Integrated Manufacturing, Vol 36, Pages 93-100.

1 Deloitte Media Consumer Survey 2015

This blog was authored by Tom Champion.
