How to find patterns in supply chain data?
Obtaining insight by using clustering techniques
This case study describes how clustering techniques can be applied to gain insight into data. Clustering, in general, aims to divide a dataset into groups with similar properties, which is often very difficult to do manually when a large number of variables is involved.
As an example, consider the case of a coverage rate. Figure 1a shows that the coverage pattern of a single order makes it easy to identify what drives the performance of that particular order. The coverage rate decreases sharply at the end of the ‘Consolidation for export’ and ‘Shipping from export terminal’ processes, which is a reason to investigate these processes further.
With a large number of coverage patterns, however, it becomes more complicated. A single coverage pattern provides the insight you need for a single order, but going through every order individually is time-consuming and cumbersome, and it is very easy to lose track of the big picture and get caught up in details. This is shown in Figure 1b.
Obtain insight into the coverage patterns
How can we obtain insight into the coverage patterns of a large number of orders? Averaging the coverage patterns over a long time period, for example one year, would seem like a logical step, but this inevitably leads to a loss of information, as shown in Figure 1c. This panel shows the average coverage of one year of data. While the coverage rate varies a bit over the year, the overall information in this graph is limited. For example, it is hard to see in which month or for which product the coverage rate decreased or increased.
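To make the loss of information concrete, here is a minimal sketch using made-up numbers rather than the data behind Figure 1. It assumes each coverage pattern is a row of coverage rates measured at five consecutive process steps. The two shapes, `early_drop` and `late_drop`, are hypothetical.

```python
import numpy as np

# Hypothetical coverage rates at five consecutive process steps.
early_drop = np.array([1.0, 0.6, 0.6, 0.6, 0.6])  # drops right after step 1
late_drop = np.array([1.0, 1.0, 1.0, 0.6, 0.6])   # drops only after step 3

# 50 orders of each shape, stacked into one (100, 5) array.
patterns = np.vstack([early_drop] * 50 + [late_drop] * 50)

# Averaging over all orders blends the two shapes together.
average = patterns.mean(axis=0)
print(average)
```

The averaged pattern is roughly 1.0, 0.8, 0.8, 0.6, 0.6: a gradual decline that none of the individual orders actually shows. The fact that there are two distinct groups, each with its own sharp drop, can no longer be read from the average.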
Much more information can be obtained if we automatically group the patterns into buckets, where each bucket contains patterns of similar shape. This can be achieved by using k-means clustering. With this method, the first step is to randomly select k points to be the center points, where each point represents a single coverage pattern.
Then you assign each of the other points to its nearest center point, that is, the center whose coverage pattern it most resembles. After that, you compute the actual center of each group, which is the mean of the points assigned to it, and use these k centers as the new center points. These last two steps are repeated until the center points no longer change.
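The procedure described above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the implementation used in the case study. It assumes that each coverage pattern is a row of coverage rates, that similarity is measured by Euclidean distance, and that the center of a group is the mean of its patterns, as in standard k-means. The function name `kmeans` and the synthetic patterns are hypothetical.

```python
import numpy as np

def kmeans(patterns, k, max_iter=100, seed=0):
    """Group the rows of `patterns` (one coverage pattern per row) into k clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly select k distinct patterns as the initial center points.
    centers = patterns[rng.choice(len(patterns), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every pattern to its nearest center (Euclidean distance).
        distances = np.linalg.norm(patterns[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each center to the mean of the patterns assigned to it
        # (an empty group keeps its previous center).
        new_centers = np.array([
            patterns[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop once the center points no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic example: two made-up pattern shapes with a little noise.
rng = np.random.default_rng(1)
early_drop = np.array([1.0, 0.6, 0.6, 0.6, 0.6])
late_drop = np.array([1.0, 1.0, 1.0, 0.6, 0.6])
patterns = np.vstack([
    early_drop + rng.normal(0, 0.02, size=(50, 5)),
    late_drop + rng.normal(0, 0.02, size=(50, 5)),
])

labels, centers = kmeans(patterns, k=2)
print(labels)
print(centers.round(2))
```

On this synthetic data the first 50 patterns end up in one cluster and the last 50 in the other, and each returned center approximates one of the two underlying shapes. In practice the result depends on the random starting points and on the choice of k, so it is common to run the algorithm several times, or to use a library implementation such as scikit-learn's `KMeans`.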
The end result is a set of clusters, each containing points that are similar to each other. Since each point represents a coverage pattern, every cluster is a group of similar coverage patterns. Two of the resulting clusters are shown in Figure 1d. It is clear that each cluster contains coverage patterns of similar shape, so a cluster now represents a coverage pattern that is shared by several orders.