More Effective Deep Learning with Deep Label

Deloitte’s QA for AI offering

Deloitte’s QA for AI tool “Deep Label” alleviates a major source of inaccuracy for deep neural networks: mislabeled input data.

The Need

Deep Learning has arguably been the driving force of AI over the past decades, and nearly all of AI’s major economically significant contributions can be attributed to it. Deep Neural Networks (DNNs) have enabled data scientists to uncover complex relationships between input data and target outputs – without theorizing about formulas or heuristics, letting the data speak for itself instead. Therein lies both the strength and the weakness of deep learning: strong results when data quality is high, misleading results when it is not. This is true of Machine Learning (ML) models in general, but Deep Learning is particularly sensitive to the quality of “labeling” or “tagging”.

Deep Learning fundamentally works by associating outcomes with complex combinations of input information. Outcomes can be simple and binary: “good risk” or “bad risk” (in the case of credit scoring), or “it is me” or “it is not me” (in the case of face recognition for security access). Outcomes can also be multi-class: “pedestrian” / “car” / “tree” / “road sign” (in the case of autonomous vehicles). The DNN learns these associations by being trained on a rich dataset – both deep (many observations) and wide (many variables).

What if the labels are wrong? How are outcomes labeled anyway, before they are used to train the Deep Neural Network?

The reality is that labels can be, and often are, wrong. Training data is generally labeled by human beings, often “gig economy” or crowd-sourced workers. Labeling quality can vary from person to person, by time of day or day of the week, or simply with the complexity of the data. With image data, subjectivity may even come into play: to you it looks like a rock, to your neighbor it looks like a pothole in the road.

While neural networks are relatively robust to occasional mislabeled examples, the greater the percentage of mislabeled data, the lower the accuracy of the resulting DNN-powered model. Depending on the nature of the mislabeling, this inaccuracy can show up in different ways. With systematic mislabeling, the model may be consistently wrong. With highly variable labeling quality, it may be unable to classify at all.


Our Solution: Deep Label

Deloitte’s QA for AI solution Deep Label takes a cutting-edge approach to isolating suspected mislabels in image and text classification tasks. The challenge of quality assurance for labeling is to achieve it in a fully automated fashion, without any prior knowledge of the underlying data and without requiring an understanding of the labels themselves.

Labeling is a form of abstraction that condenses detail into higher-level categories (in images: “that is a pencil”; in text: “that’s an article in French about nature”). Humans do this effortlessly by drawing on life experience. That experience is effectively a vast deep learning process over an enormous training set.

Deep Label records and exploits differences in the training dynamics of clean and mislabeled samples. It is based on the theory behind two recent research papers, which the Deloitte aiStudio dissected, tested, further developed, and turned into a new Python library. Experienced programmers can use Deep Label’s advanced capabilities by importing the required functions directly into their own Python code. The Deep Label application adds a graphical user interface, so a wider base of users can access the same functionality without coding skills.
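This page does not name the two papers or describe Deep Label’s internals. The following is therefore only a minimal sketch of what "recording training dynamics" can look like. It assumes a per-sample statistic in the spirit of the published Area Under the Margin (AUM) idea: at every epoch, record how far the logit of a sample’s assigned label lies above the largest competing logit, then average over training. Samples whose assigned label keeps losing are suspects. AUM is a swapped-in technique, not necessarily what Deep Label uses. The helper `train_and_record_margins` and all its parameters are hypothetical. A tiny NumPy softmax classifier stands in for a deep network so the example runs without a deep learning framework.

```python
import numpy as np

# Hypothetical helper, NOT part of Deep Label: trains a tiny linear softmax
# classifier (a stand-in for a deep network) and records, for every sample and
# every epoch, the margin between the logit of its assigned label and the
# largest competing logit, in the spirit of "Area Under the Margin" (AUM).
def train_and_record_margins(X, y, num_classes, epochs=50, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    margins = np.zeros((epochs, n))
    for epoch in range(epochs):
        logits = X @ W + b
        # Record each sample's margin before this epoch's update.
        assigned = logits[np.arange(n), y]
        competitors = np.where(onehot == 1, -np.inf, logits)
        margins[epoch] = assigned - competitors.max(axis=1)
        # One full-batch gradient step on the softmax cross-entropy loss.
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return margins

# Synthetic example: two well-separated clusters, 10 labels flipped on purpose.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
flipped = rng.choice(200, size=10, replace=False)
y[flipped] = 1 - y[flipped]

margins = train_and_record_margins(X, y, num_classes=2)
aum = margins.mean(axis=0)        # average margin over training
suspects = np.argsort(aum)[:10]   # lowest average margin = most suspicious
print(sorted(suspects.tolist()), sorted(flipped.tolist()))
```

A linear model cannot memorize flipped labels, so here their margins simply stay negative. In a real deep network, mislabeled samples tend to be fitted only late in training. The usual rationale for averaging over epochs rather than reading only the final epoch is to capture that difference in dynamics.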

Deep Label ranks all samples. The higher a sample sits in the ranking, the worse its assigned label fits it, and this mismatch correlates directly with the sample being mislabeled. Deep Label also learns a threshold that separates the predominant share of mislabeled data from correctly labeled data. As a result, only a fraction of the dataset needs to be reviewed to identify the vast majority of mislabels. The mislabeled data identified this way are demonstrably the samples most detrimental to training a Deep Neural Network with high, generalizable prediction accuracy on unseen data.
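The page does not spell out how Deep Label builds its ranking or learns its threshold. As a hedged illustration only, the sketch below assumes each sample already has a suspicion score, where higher means more likely mislabeled (for example, a negated average training margin). It uses Otsu’s method as a stand-in for the learned threshold. Otsu’s method is a standard technique that splits one-dimensional values into the two groups with maximal between-group variance. The function `rank_and_threshold` is hypothetical and not part of Deep Label.

```python
import numpy as np

# Hypothetical helper, NOT Deep Label's API. `scores`: one value per sample,
# higher = more suspicious. Returns the ranking (most suspicious first), the
# lowest score still flagged, and the indices of the flagged samples.
def rank_and_threshold(scores):
    scores = np.asarray(scores, dtype=float)
    if scores.size < 2:
        raise ValueError("need at least two samples to split")
    ranking = np.argsort(-scores, kind="stable")
    s = scores[ranking]                      # scores in descending order
    n = s.size
    k = np.arange(1, n)                      # candidate size of the flagged group
    csum = np.cumsum(s)
    mean_top = csum[:-1] / k                 # mean of the k highest scores
    mean_rest = (csum[-1] - csum[:-1]) / (n - k)
    # Otsu's criterion: weighted between-group variance for each split.
    between_var = (k / n) * (1 - k / n) * (mean_top - mean_rest) ** 2
    best_k = int(np.argmax(between_var)) + 1
    return ranking, s[best_k - 1], ranking[:best_k]

scores = [0.10, 0.20, 0.15, 0.90, 0.85, 0.05, 0.95, 0.12]
ranking, threshold, flagged = rank_and_threshold(scores)
print(ranking.tolist())   # [6, 3, 4, 1, 2, 7, 0, 5]
print(threshold)          # 0.85
print(flagged.tolist())   # [6, 3, 4]
```

Only the flagged indices would then go to a human reviewer, which is what keeps the review effort to a fraction of the dataset. Otsu’s criterion assumes two reasonably distinct groups of scores. When the scores overlap heavily, a mixture model or a set of known mislabels would be a more defensible way to set the cut-off.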


  • Improve the accuracy of existing deep learning models by ensuring training on correctly labeled input-output pairs.
  • Resolve instances of mislabeling (flagged by the Deep Label application) in a fraction of the time required to review the whole dataset.
  • Work without prior knowledge of the underlying data or of the historical associations between inputs and outputs.

Example Use Cases

  • Application in computer vision: autonomous vehicles, face recognition.
  • Application in natural language processing: suspicious mail detection, text classification, sentiment analysis.

David Thogmartin

Director | aiStudio

David Thogmartin leads the aiStudio internationally and the “Analytics, Data and Artificial Intelligence” practice for Risk Advisory in Germany. He has 20 years of professional experience in Analytics...