Posted: 22 Feb. 2024 · 4 min. read

RAG is here: Ready to take the next step with Generative AI?

Topic: New Tech

Even though new generative AI use cases pop up almost by the hour, some organisations are still reluctant to implement AI applications powered by Large Language Models (LLMs) or Foundation Models. Why is that?

We have already seen how generative AI can translate and compose text quite convincingly; it is good – though not perfect – at everyday chat; it can write computer code very fast; and it can automate a vast number of cumbersome tasks that have traditionally been performed by humans. For example, last year IBM’s HR department claimed that it had saved 12,000 hours in 18 months (!) by using AI to automate systems that previously demanded laborious exchanges between managers and employees.

So, it is not the quality of generative AI that is holding organisations back. It is often risk, compliance, and security concerns. I get that. Depending on what kind of organisation you are, there may be limitations or even outright restrictions on which technologies and delivery models you are allowed to use. Some of the market’s most powerful and well-functioning generative AI models are cloud-based, and for some organisations that is a red flag. Even though cloud technology is widely acknowledged to be secure – in many cases more secure than most companies can ever hope to achieve in their own environment – internal and external factors can still rule out the use of cloud-based generative AI.

The good news is that there is a way to work around organisational cloud reluctance and still benefit from the undoubtedly huge potential of LLMs.

RAG on your preferred infrastructure
Suppliers are launching Retrieval-Augmented Generation (RAG) solutions that can not only mitigate risk, security, and compliance issues in generative AI, but also deliver a cost-effective way to get context-based and precise answers from an LLM.

RAG is a design pattern, implemented within an organisation, that augments an LLM with up-to-date data retrieved from internal knowledge bases, enterprise systems, and user applications such as the Microsoft suite. You can build and access your RAG setup on the infrastructure that you trust and prefer – be it cloud or not.
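To make the pattern concrete, here is a minimal sketch in Python of the retrieve-then-generate loop. The `call_llm` function is a hypothetical placeholder for whichever model endpoint your infrastructure allows, and the keyword-overlap scoring is a deliberately simple stand-in for real retrieval:

```python
# A minimal sketch of the RAG pattern: retrieve the most relevant internal
# documents for a query, then hand them to the LLM as context.
# `call_llm` is a hypothetical placeholder, not a specific vendor API.

def retrieve(query: str, knowledge_base: list[str], top_k: int = 3) -> list[str]:
    """Rank documents by naive keyword overlap with the query (placeholder scoring)."""
    terms = set(query.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def answer(query: str, knowledge_base: list[str]) -> str:
    """Augment the user's prompt with retrieved context before calling the model."""
    context = "\n".join(retrieve(query, knowledge_base))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)  # stands in for any hosted or on-premise model endpoint
```

Because retrieval and generation are separate steps, each can run on whatever infrastructure the organisation’s policies permit.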

That means organisations no longer need to factor in delivery model constraints related to generative AI; they can access it and use it according to their specific IT policies and procedures.

Ultimate knowledge sharing
The challenge with LLMs will always be how they operate – and, with that, how they tend to make up answers when their training data does not provide enough information to return a trustworthy response. From an organisational perspective, it is also expensive to train an LLM to make it useful for business purposes. When you use RAG, you optimise the results from the underlying LLM – without modifying it – by leveraging your own data. So, when a user writes a prompt, RAG enriches it with relevant internal context, which makes the answer more precise.
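One way to picture the “without modifying it” point: the grounding happens entirely in the prompt, never in the model’s weights. A sketch of an instruction template that tells the model to refuse rather than guess (the wording is illustrative, not a Deloitte or vendor standard):

```python
# Sketch: grounding lives in the prompt, not in the model weights.
# The template instructs the model to answer only from retrieved context
# and to refuse when the context is insufficient. Wording is illustrative.

GROUNDED_TEMPLATE = """You are answering from internal company documents.
Use ONLY the context below. If the context does not contain the answer,
reply: "I cannot find this in the knowledge base."

Context:
{context}

Question: {question}"""

def build_grounded_prompt(context: str, question: str) -> str:
    """Combine retrieved context and the user's question into one grounded prompt."""
    return GROUNDED_TEMPLATE.format(context=context, question=question)
```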

Without going further into the technical details, RAG overcomes the problem of feeding the LLM a massive amount of text to sift through. Most LLMs cannot handle more than about 200 pages at a time, so what if you want to query tens of thousands of pages? RAG uses a staggeringly effective digital representation of the semantic content (the meaning) of both the query and the data, so only the most relevant material reaches the model. This makes the ultimate knowledge-sharing process across the entire organisation feasible – it is very useful and hugely timesaving.
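That “digital representation of the semantic content” refers to embeddings: numeric vectors whose closeness approximates closeness in meaning. A minimal sketch, assuming a hypothetical `embed` model has already produced the vectors; real systems use a trained embedding model and a vector database rather than this linear scan:

```python
import math

# Sketch of semantic retrieval: query and documents are compared as
# embedding vectors, so matches are based on meaning rather than exact
# keywords.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec: list[float],
                    doc_vecs: dict[str, list[float]],
                    top_k: int = 5) -> list[str]:
    """Return the ids of the documents whose embeddings lie closest to the query's."""
    ranked = sorted(doc_vecs,
                    key=lambda d: cosine_similarity(query_vec, doc_vecs[d]),
                    reverse=True)
    return ranked[:top_k]
```

Only the handful of best-matching passages is passed to the LLM, which is how tens of thousands of pages stay queryable despite the model’s limited context window.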

Gain full control of your AI applications
At Deloitte, we are testing RAG in our own operations. We are running experiments on simulated ESG reporting, synthetic compliance tasks, and general language support. Instead of having colleagues go through thousands of documents, spreadsheets, notes, and other formats, it is faster to run the question through RAG and get an up-to-date answer straightaway. Think of it as chatting with a document that holds exactly the information you are desperately trying to find in a specific context right now.

That is another benefit of using RAG: up-to-date information. Whereas an LLM on its own may present out-of-date or generic information (which leads to incorrect answers and hence to “hallucination” issues), RAG offers an opportunity to add current data sources on top of the original training data to maintain relevancy.
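The freshness point follows from the architecture: keeping answers current means re-indexing documents, not retraining the model. A sketch, with `embed` again standing in for a hypothetical embedding call:

```python
# Sketch: adding or revising a document only updates the retrieval index;
# the LLM's weights never change. `embed` is a hypothetical embedding call.

doc_index: dict[str, list[float]] = {}

def add_document(doc_id: str, text: str) -> None:
    """Embed a new or revised document so the very next query can retrieve it."""
    doc_index[doc_id] = embed(text)
```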

While organisations are still trying to figure out how – I think it is fair to say that we can scratch “if” – they want to leverage generative AI to gain a competitive edge or become more productive, there are still areas they need to address, not least governance, risk, and compliance. Last year, IBM launched watsonx.governance to help organisations manage AI to meet safety and transparency regulations and policies, and to proactively detect and mitigate risk.

At Deloitte, we not only offer to implement our co-developed management platform; we can also deliver a set of tools for monitoring your generative AI model, with built-in digital workflows for the relevant personas, to make sure that information is automatically processed in a secure digital environment.

Author spotlight

Jacob Bock Axelsen

CTO

Jacob Bock Axelsen (Senior Manager) is CTO in Deloitte Risk Advisory, an expert in mathematical modelling, and a specialist in artificial intelligence. Jacob is educated in mathematics-economics (BSc), biophysics (MSc), and physics (PhD), with nine years of research experience abroad. His scientific background has proven useful in advising both private companies and public institutions on AI, AI governance, Quantum Computing, Organizational Network Analysis, Natural Capital Management, and much more. After six years at Deloitte, he has developed strong business acumen. He holds the IBM Champion title for the fourth year in a row and is part of Deloitte’s global quantum computing initiative.
