SONAR: Find labelling errors in databases

Case study

Case 2 out of 16 projects of applied AI

Just under a year ago, Deloitte was approached by a major retailer. Its range consisted of over 30,000 products, and the commodity codes provided by suppliers had to be checked manually for around 600 new products every month. In addition, information had to be entered relating to the VAT rate and any local levies, such as the battery tax that applies in Belgium for products containing batteries. It was not unusual for something to go wrong when it came to this labelling. The retailer asked Deloitte for assistance in checking the information entered by human staff members.

“Previously, we would have carried out spot checks,” says Gerhard Smit, information architect and data analyst at Deloitte. “But then we thought, can’t we automate the checks?” Within one week, he and his team members created a proof of concept: Similarity Observant Network Analytics Report, or SONAR for short. It is a tool that predicts the likelihood that the entered information relating to VAT, the commodity code and local levies in a product database is correct.

Comparing data

It works like this: a client supplies a data file containing as many details as possible, such as the commercial product description, the VAT rate, the commodity code and an indication of whether or not each local levy applies. The file also contains, for example, the barcode and other information that helps establish the nature of the product.

SONAR compares this information against a customs database containing all commodity codes, a textual description for each commodity code, and the applicable rate of VAT. The comparison results in a percentage to indicate the likelihood that the label added by the client is correct. If a label is more than 80 per cent likely to be incorrect, for example, the product can be checked by a person.
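The comparison step can be illustrated with a small sketch. The article does not describe SONAR's actual algorithm, so everything here is an assumption made for illustration: the token-overlap (Jaccard) similarity, the function names `label_confidence` and `needs_review`, and the three-entry reference table, whose codes and descriptions are simplified stand-ins for a real customs database.

```python
import re

# Hypothetical, simplified reference table: commodity code -> description.
# A real customs database would hold many more entries, plus VAT rates.
REFERENCE = {
    "8512": "electrical lighting or signalling equipment for cycles or motor vehicles",
    "8517": "telephone sets including smartphones for cellular networks",
    "8506": "primary cells and primary batteries",
}


def tokens(text):
    """Lower-case a description and split it into a set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity between two descriptions, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def label_confidence(description, entered_code, reference=REFERENCE):
    """Share of the total similarity that falls on the entered code.

    This is an assumed scoring rule, not SONAR's: 1.0 means only the
    entered code resembles the description, 0.0 means it does not at all.
    """
    scores = {code: jaccard(description, text) for code, text in reference.items()}
    total = sum(scores.values())
    if total == 0:
        return 0.0
    return scores.get(entered_code, 0.0) / total


def needs_review(description, entered_code, threshold=0.8):
    """Flag a product when the label is more than `threshold` likely to be wrong."""
    return 1.0 - label_confidence(description, entered_code) > threshold


if __name__ == "__main__":
    product = "LED lighting set for cycles"
    for code in REFERENCE:
        print(code, round(label_confidence(product, code), 2), needs_review(product, code))
```

In this sketch only products whose estimated error likelihood exceeds the threshold are routed to a person, mirroring the 80 per cent example above. A production system would presumably use richer features (barcode, levy flags) and a trained model rather than plain word overlap.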

A great deal of label-related work is simple, but new, innovative products often require additional attention. “Legislation often fails to keep up with reality,” remarks Smit. “Take smartphones. Should we classify them as a phone, or as a navigation system, for example?” Such cases need to be assessed by an expert. SONAR allows the checking of the vast majority of products to be automated, so that additional attention can be paid to the difficult cases.

Bicycle lights

The SONAR team went to a shop together with the client to test the tool, and carried out a random check on a shelf of bicycle lights. In the case of one bicycle light, SONAR indicated that something was likely to be incorrect regarding the battery tax. “Upon closer inspection, it turned out that there was indeed a small battery included in the packaging, although that wasn’t included in the description,” recalls Smit. “We thought it was highly amusing: something we had built within a week had an immediate impact.”

SONAR was developed for a client, but Smit believes the technology is generic enough to be applied to other problems. It works particularly well with databases containing at least 2,500 products, and a reference database must be available. “SONAR allows you to check the information entered by humans far more quickly and accurately,” asserts Smit. “And the best part about it is, the more often you use the technology and the more product information that becomes available, the more accurate the results will be.”

*) This case is part of a series of 16 Artificial Intelligence projects from Deloitte. The cases in the series, in no particular order:

  1. TAX-I: A virtual legal research assistant
  2. AI Benchmark 
  3. SONAR: Find labelling errors in databases
  4. Transaction detector with regard to the Dutch work cost regulations
  5. GRAPA: assistance with risk strategies
  6. Chatbot as a handy search tool for the online technical library
  7. Argus: an eye for detail
  8. PostNL: optimising delivery times
  9. Virtual assistants: beyond the hype
  10. HR agent Edgy: the future of Human Resources
  11. Using machine learning to assess risks for insurance policies
  12. Predicting payment behaviour
  13. DocQMiner: contract analysis performed in no time at all
  14. Combating welfare fraud with machine learning
  15. Using machine learning and network analytics to search for a needle in a haystack
  16. Clustering unstructured information in BrainSpace
