
Unleashing the Power of Words

Deloitte’s WordsWorth Text-Mining Solution

Deloitte’s WordsWorth is a state-of-the-art, cloud-capable text-mining offering that spans scanning, named-entity recognition, document mark-up, semantic search, text translation and summarization, and table and invoice extraction.

The Need

Today’s digital world has produced a Cambrian explosion of documents, which are ever easier to create and to transmit. While offices worldwide have indeed become increasingly “paperless”, paper reports have only partially been superseded by electronic exchanges of structured, tabular data. Instead, unstructured narratives have migrated from paper to PDF or equivalents. The volume of textual documents has soared, lifted by improved editing tools, automated text generation, and a dramatic increase in the options an author has to disseminate a message: emails, chats, blogs, social media, cloud drives and collaborative document-sharing suites, to name a few. Furthermore, multiple providers in each of these channels compete to best serve the human need to tell a story, to instruct or explain, or simply to express views.

The proliferation of documents, types and formats poses a significant challenge to the reader. It is increasingly difficult to discern useful signals from spurious noise, or even facts from opinion. The digitization of processes and businesses and the “always-on” reachability through mobile devices have raised expectations for quick results. There is simply not enough time to wade through the flood of documents to discern which are important or reliable. Fortunately, machines armed with Natural Language Processing (NLP) algorithms can help. NLP promises efficiency, quality and exhaustive coverage in working with unstructured, textual documents.

Our Solution

Built on pre-trained, universally applicable language models and enhanced with case-specific vocabulary, the text-mining solution WordsWorth excels in the accurate interpretation of text documents. It achieves this by combining the most advanced underlying methods from multiple cloud providers (AWS, Azure, GCP) with the flexibility of multiple, dedicated open-source algorithms. Users can interact with the functionality in two ways: through the intuitive graphical user interface (GUI) or through dedicated Python libraries, which may be invoked via the command line or embedded within custom applications.

WordsWorth performs a wide spectrum of text-mining services:

OCR – converts scanned text into machine-readable flowing text
language detection & translation – covers all European languages, plus Chinese and Japanese
topic modeling – discerns whether a document is relevant to the reader’s subject of interest
named entity recognition – identifies and extracts names, places, etc., classifying them as personally identifiable information (or not)
table recognition – identifies tables within the document, exportable into standard formats (.xlsx, .csv)
invoice extraction – finds all relevant invoice elements and maps them to their text content and coordinates in the original document
document mark-up – color-codes identified words, or blacks-out / anonymizes sensitive passages
version comparison – highlights differences between multiple versions of a document
semantic search – finds relevant passages associated with the meaning of input search words, beyond exact matches of key-word search
summarization – condenses documents by paraphrasing them while preserving the most relevant topics
export – converts into a variety of popular editable text files (format selected depending on content)
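
As an illustration of one of the services listed above, version comparison can be approximated with Python's standard `difflib` module. This is a minimal sketch for orientation only, not WordsWorth's implementation or API: the function name `highlight_changes` and the sample sentences are assumptions made for this example.

```python
import difflib


def highlight_changes(old_text: str, new_text: str) -> list[str]:
    """Return unified-diff lines describing how new_text differs from old_text.

    Hypothetical helper for illustration; not part of WordsWorth.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # Lines prefixed with "-" exist only in the old version,
    # lines prefixed with "+" exist only in the new version.
    diff = difflib.unified_diff(
        old_lines, new_lines, fromfile="v1", tofile="v2", lineterm=""
    )
    return list(diff)


if __name__ == "__main__":
    v1 = "Payment is due in 30 days.\nDelivery is free."
    v2 = "Payment is due in 14 days.\nDelivery is free."
    for line in highlight_changes(v1, v2):
        print(line)
```

A line-based diff like this only flags literal changes; recognizing passages that were reworded without changing their meaning would need the kind of semantic methods described in the list.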


Advantages/Benefits

Speed: quickly determine whether entire documents are relevant, quickly scan for passages of interest
Quality: find the best available cloud API or open-source Python library, rivaling human performance
Cost: process everything, avoiding the rework and errors that come with sampling or fatigue
Integration: work with documents as you would with other files

Example Use Cases

Tagging documents with metadata according to their contents in order to route them to the right recipient
Providing advanced concept search (or document similarity) to document management systems
Redacting (blacking out) sensitive information from confidential / legal documents
Structuring concepts (and associated quantities) from narratives into tables / databases
Populating systems with data automatically read from invoices / forms
Providing abstracts / summaries of large documents
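
To make the redaction use case above more concrete, the following sketch blacks out e-mail addresses and phone numbers with regular expressions from Python's standard library. It is a simplified assumption of how such a step could look, not the method WordsWorth uses: the patterns, the `redact` function and the sample contact details are all invented for this example, and a production system would rely on named entity recognition rather than fixed patterns.

```python
import re

# Deliberately simple, illustrative patterns; real-world data needs broader coverage.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s/-]{6,}\d")


def redact(text: str, mask: str = "█") -> str:
    """Replace e-mail addresses and phone numbers with mask characters.

    Hypothetical helper for illustration; not part of WordsWorth.
    """

    def black_out(match: re.Match) -> str:
        # Keep the original length so the document layout is preserved.
        return mask * len(match.group(0))

    for pattern in (EMAIL_PATTERN, PHONE_PATTERN):
        text = pattern.sub(black_out, text)
    return text


if __name__ == "__main__":
    sample = "Mail jane.doe@example.com or call +49 211 1234567."
    print(redact(sample))
```

Replacing each match with a mask of equal length keeps character positions stable, which matters when redactions have to be mapped back to coordinates in the original document.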


David Thogmartin

aiStudio | AI & Data Analytics

dthogmartin@deloitte.de

+49 211 87722336

David Thogmartin leads the aiStudio internationally and the “AI & Data Analytics” practice for Risk Advisory in Germany. He has 20 years of professional experience in Analytics and Digitization, large...
