Deloitte’s Automated Machine Learning Solution: DNAi

Accelerating the Implementation of AI

By automating and optimizing the majority of the data science pipeline, DNAi gives Deloitte’s clients unparalleled modeling and data analytics support.

The Need

The irony of data science is the degree of trial and error it involves. Building the right ML model for the task at hand is a laborious process, highly iterative, and somewhat subjective. Much of the data science pipeline consists of preparatory steps, which account for up to 85% of the model development process.

To produce truly optimal solutions, many factors must be taken into account. The modeller must take great care to avoid the many design perils, ranging from under-fitting (the model misses real patterns in the data) to over-fitting (the model generalizes poorly to new data). To steer between these goalposts, the data scientist must ask questions such as: How many features provide the best predictive power? Which features are most important? Have they been properly regularized? Which are the most efficient optimization strategies? Which hyper-parameters need tuning, and how should they be tuned?

Questions such as these, and more, can be answered with experimentation and time. Under pressure to deliver within tight lead times, optimization becomes a greater challenge. Even without optimization, pre-processing, data exploration, and the necessary cleaning consume substantial resources. The rapidly evolving AI landscape presents the modeller with ever more options, quickly mushrooming into a wide spectrum of possible combinations. Business requirements that change over the course of the project only exacerbate the problem.

No more! With care, many of these steps can be automated, liberating the data scientist from late-night legwork and freeing the mind for implementation strategy and orchestration. Enter DNAi.

Our Solution: DNAi

Deloitte’s machine learning rapid prototyping tool “DNAi” puts the science into the term “data science”, automating the majority of preparatory steps and thereby greatly accelerating the full machine learning process. It achieves this by applying advanced techniques such as genetic algorithms – letting models defined by feature chromosomes iteratively battle it out over epochs to reveal a winning combination.
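The idea of feature chromosomes competing over epochs can be illustrated with a minimal sketch. It is not DNAi's implementation, which this page does not describe. The toy data, the fitness function, the selection scheme, and all names (`make_data`, `fitness`, `evolve`) are assumptions chosen for illustration. Each chromosome is a list of 0/1 flags marking which features a model may use; the fittest chromosomes survive and recombine from one generation to the next.

```python
import random

def make_data(rng, n_rows=200, n_features=6):
    """Toy data (assumption): the target depends only on features 0 and 2."""
    rows = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    target = [r[0] + r[2] for r in rows]
    return rows, target

def fitness(chromosome, rows, target, penalty=0.01):
    """Higher is better: negative mean squared error of a 'sum the selected
    features' predictor, minus a small cost per selected feature."""
    mse = 0.0
    for r, y in zip(rows, target):
        pred = sum(x for x, bit in zip(r, chromosome) if bit)
        mse += (pred - y) ** 2
    mse /= len(rows)
    return -mse - penalty * sum(chromosome)

def evolve(rows, target, rng, pop_size=20, generations=30,
           mutation_rate=0.1, n_elite=2):
    n_features = len(rows[0])
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda c: fitness(c, rows, target), reverse=True)
        history.append(fitness(ranked[0], rows, target))
        next_gen = [c[:] for c in ranked[:n_elite]]   # elitism: keep the best
        parents = ranked[: pop_size // 2]             # truncation selection
        while len(next_gen) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)        # single-point crossover
            child = mum[:cut] + dad[cut:]
            # Flip each gene with a small probability (mutation).
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=lambda c: fitness(c, rows, target))
    return best, history

rng = random.Random(0)  # fixed seed so runs are repeatable
rows, target = make_data(rng)
best, history = evolve(rows, target, rng)
print("selected features:", [i for i, bit in enumerate(best) if bit])
```

Because the two best chromosomes are carried over unchanged, the best fitness never decreases from one generation to the next. In a real pipeline the fitness function would typically be a cross-validated model score rather than this toy predictor.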

DNAi automates well above 50% of the data science pipeline, encompassing such activities as data imputation (for missing data), feature selection, feature engineering & regularization, algorithm selection and hyper-parameter tuning. Using DNAi, the data scientist can efficiently and methodically test hypotheses, analyse results and implement learnings in rapid succession – more quickly and reliably arriving at an optimal design.
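Two of the activities listed above, data imputation and hyper-parameter tuning, can be sketched in a few lines. This is an illustrative assumption, not a description of how DNAi performs these steps: the sketch uses column-mean imputation and a plain grid search over the regularization strength of a one-feature ridge regression, and every name and value in it (`impute_column_means`, `fit_ridge_1d`, the sample rows, the candidate `lams`) is hypothetical.

```python
from statistics import mean

def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = [mean(r[j] for r in rows if r[j] is not None)
                 for j in range(n_cols)]
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge slope for a single feature, no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return mean((w * x - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical sample: columns are (feature, target); None marks a missing value.
raw = [[1.0, 2.0], [None, 4.1], [3.0, None], [4.0, 8.2]]
filled = impute_column_means(raw)

xs = [r[0] for r in filled]
ys = [r[1] for r in filled]
train_x, train_y = xs[:3], ys[:3]   # fit on the first three rows
valid_x, valid_y = xs[3:], ys[3:]   # validate on the held-out row

# Grid search: score every candidate regularization strength on validation data.
lams = [0.0, 0.1, 1.0, 10.0]
scores = {lam: mse(fit_ridge_1d(train_x, train_y, lam), valid_x, valid_y)
          for lam in lams}
best_lam = min(scores, key=scores.get)
print("best lambda:", best_lam)
```

An automated tool would replace the fixed grid with a smarter search and the single held-out row with proper cross-validation; the structure of "fill gaps, fit, score, pick the best setting" stays the same.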

DNAi proves its value from the start, allowing the data scientist to quickly pilot first results and gain an appreciation for the intricacies of the problem to be solved. Because it streamlines the preparatory steps, its value is multiplied with each successive revision and refinement of the model. The efficiency value of DNAi is eclipsed only by its optimization capabilities. Accuracy and performance levels compare very favourably in tests against numerous competitor AutoML solutions.

Advantages/Benefits

  • ML models can be generated overnight with 80% accuracy before manual optimisation / tuning
  • Accuracy on par with leading AutoML solutions
  • Reduced man hours required to run a full “data science pipeline”
  • Quickly arriving at indicative pilot results – accelerating development / feedback cycles
  • Reproducibility of the process and results

Example Use Cases

  • Behavioural scoring for client fraud risk
  • Stockpile prediction (coal stockpiles)
  • Determining the combination of factors behind hospital patient satisfaction
  • Imputation of missing edition information to ensure software license compliance

David Thogmartin

Director | aiStudio

David Thogmartin leads the aiStudio internationally and the “Analytics, Data and Artificial Intelligence” practice for Risk Advisory in Germany. He has 20 years of professional experience in Analytics...