Enhancing the quality and accuracy of OCR results

Deloitte's Optical Character Recognition tool "DocuMend" helps with identifying and fixing OCR issues

The Deloitte tool uses a sophisticated ensemble approach to counter OCR problems, packaged in an intuitive, no-code graphical interface. DocuMend first identifies issues and then gives users a quick and easy way to correct them.

The Need

Text mining tools powered by natural language processing (NLP) have become indispensable for modern businesses. The quality of NLP solutions depends heavily on the accuracy of the optical character recognition (OCR) engine used. OCR is a technology based on sophisticated computer vision algorithms that recognize characters in printed books, handwritten papers or images, in all possible fonts, sizes, and orientations. With this technology, companies can quickly transform document images into electronic, machine-readable text, the input for many downstream processes, including text mining solutions.

OCR accuracy has improved over the years, and many consider it a solved problem. Nevertheless, errors still occur in practice and, if left undetected, can skew results, especially where machine learning NLP algorithms are involved. Sources with tiny font sizes, blurred copies or colored paper can trip up OCR algorithms. Human readers easily detect the resulting text errors because they can generally infer that something is wrong. NLP algorithms, however, can completely misinterpret the text, causing the text mining tool to fail and requiring manual work to find and correct the failed conversion.

Our solution: DocuMend

Deloitte DocuMend identifies issues by assessing the accuracy of OCR layers at both the document level and the individual word level. Users may choose among several OCR engines and set the confidence threshold for OCR quality. The accuracy assessment is overlaid onto the original text in the PDF as a kind of textual heat map, ranking the OCR-processed words from lowest to highest confidence in OCR quality.
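To make the idea of word-level confidence and a user-set threshold concrete, here is a minimal sketch in Python. It is not DocuMend's implementation: the `OcrWord` type, the 0.0 to 1.0 confidence scale, the use of a mean as the document-level score, and the sample values are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class OcrWord:
    """One recognized word (hypothetical structure, not a DocuMend type)."""
    text: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (full confidence)
    page: int


def document_confidence(words):
    """Document-level score, assumed here to be the mean word confidence."""
    return mean(w.confidence for w in words) if words else 0.0


def rank_by_confidence(words):
    """Order words from lowest to highest confidence, as in a heat-map ranking."""
    return sorted(words, key=lambda w: w.confidence)


def words_below_threshold(words, threshold):
    """Return the words under the user-set threshold, least confident first."""
    return [w for w in rank_by_confidence(words) if w.confidence < threshold]


# Made-up sample data for illustration only.
sample = [
    OcrWord("Deloitte", 0.98, 1),
    OcrWord("rn0dern", 0.41, 1),
    OcrWord("business", 0.93, 1),
    OcrWord("0CR", 0.62, 2),
]

flagged = words_below_threshold(sample, threshold=0.80)
print([w.text for w in flagged])  # ['rn0dern', '0CR']
```

Raising the threshold flags more words for review, while lowering it restricts attention to the least reliable ones.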

Until now, quality control of OCR required humans to proofread, finding errors through either context or intuition. DocuMend takes advantage of multiple OCR engines to cross-validate, highlighting discrepancies between the engines as likely sources of error. To correct the identified errors, users navigate through the textual heat-map document or work directly from a list in which identified errors are ranked from most to least certain, down to the user-specified threshold.
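Cross-validation between engines can be illustrated with a small Python sketch. How DocuMend actually aligns and compares engine outputs is not described here, so this example swaps in a generic technique: it aligns the word sequences from two engines using the standard library's `difflib.SequenceMatcher` and reports every span where they disagree. The function name `find_discrepancies` and the two sample outputs are hypothetical.

```python
from difflib import SequenceMatcher


def find_discrepancies(tokens_a, tokens_b):
    """Align two engines' word lists and return the spans where they differ.

    Each result is (position in tokens_a, words from engine A, words from engine B).
    """
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    discrepancies = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert' marks a disagreement
            discrepancies.append((i1, tokens_a[i1:i2], tokens_b[j1:j2]))
    return discrepancies


# Made-up outputs from two hypothetical OCR engines for the same line of text.
engine_a = "The quality of NLP solutions depends on OCR accuracy".split()
engine_b = "The qua1ity of NLP solutions depends on 0CR accuracy".split()

for position, from_a, from_b in find_discrepancies(engine_a, engine_b):
    print(position, from_a, from_b)
# 1 ['quality'] ['qua1ity']
# 7 ['OCR'] ['0CR']
```

A disagreement does not reveal which engine is right, only that the word deserves a closer look, which is why such spans are natural candidates for user review.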

Advantages/Benefits

Automated assessment of OCR quality across multi-page documents
Visual feedback via word-level confidence heat maps
Intuitive, no-code graphical interface allows business users to quickly navigate documents and gain confidence in the quality of their OCR layers
Direct user interaction to correct errors on the spot

Example Use Cases

Improve quality of CV screeners for AI-supported HR recruiting processes
Improve reliability of fit-and-proper screening for banks with regulators
Increase confidence in digital document processing adoption within the business
Lay a strong foundation for countless AI applications making use of text-mining / NLP

Your Contact

David Thogmartin

aiStudio | AI & Data Analytics

dthogmartin@deloitte.de

+49 211 87722336

David Thogmartin leads the aiStudio internationally and the “AI & Data Analytics” practice for Risk Advisory in Germany. He has 20 years of professional experience in Analytics and Digitization, large...
