Four data and model quality challenges tied to generative AI

As gen AI programs scale, they’re spotlighting data integrity and model accuracy challenges that may require new strategic solutions to maintain overall software quality and trust

Generative artificial intelligence success requires leaders to do more than adopt technology: They need to collaborate seamlessly across data, engineering, and business teams throughout the software development life cycle.

Building on our first article in this research series, “How can organizations engineer quality software in the age of gen AI?,” where we examined ways to overcome software development life cycle (SDLC) challenges, this piece identifies four engineering obstacles that organizations could address to help enhance data and model quality and fully unlock gen AI’s potential, whether for SDLC use cases or beyond.1

From the need for clear data architecture to the management of uncontrolled model drift, these challenges underscore the importance of designing robust systems that address AI’s probabilistic nature while fostering trust and consistency. In some ways, AI and gen AI are raising the bar for quality data and changing the software engineering life cycle in ways that can enable the next generation of AI and gen AI–powered applications, such as AI agents.

In this second installment, we investigate how organizations can build AI and data environments that address gen AI data integration, privacy, and model accuracy needs. The research draws on an extensive literature review, specialist interviews, and analysis of survey data from Deloitte’s third- and fourth-quarter 2024 State of Generative AI surveys (see “Methodology”). We explore how technology leaders, especially data architects, engineers, scientists, and security leaders, can act to address four emerging challenges.
  1. Lack of clear data strategy: Gen AI strategies may struggle without a clear data architecture that cuts across types and modalities, accounting for data diversity and bias and refactoring data for probabilistic systems.
  2. Data system design limitations for probabilistic models: Data systems may not be designed for probabilistic models, which can make the cost of training and retraining high without data transformation that includes data ontologies, governance and trust-building actions, and the creation of data queries that reflect real-world scenarios.
  3. Inconsistent information retrieval, chunking, and integration across multimodal solutions: Retrieval solutions and multimodal approaches can introduce data integration and engineering challenges, which can be resolved by setting up processes for retrieval augmented generation (RAG) integration with human oversight and improving chunking and advanced retrieval methods at all integration points.
  4. Hallucinations and impaired trust throughout the model life cycle: Model opacity and hallucinations that impair trust can be mitigated with human oversight, emerging tools, and prompt engineering parameters and training. Uncontrolled model drift can threaten production quality. It can be managed by maintaining data freshness in production using prompts as a feedback loop, ensuring continuous adaptation to evolving data patterns.

There’s no one-size-fits-all approach, but a clear data strategy, optimized training data integrations, and continuous model tuning are all important components of engineering quality gen AI–enabled solutions. Failing to invest in them risks financial losses and reputational damage, and can impede the successful scaling of gen AI programs. Given that many organizations have made substantial gen AI investments, neglecting data and model integrity can turn potential gains in productivity, efficiency, and revenue growth into costly missteps.

Challenge 1: Gen AI strategies may struggle without a clear data architecture and regulatory alignment

Gen AI strategies often demand massive data sets from a variety of sources, including public and licensed external sources, synthetic and internal data sets, and multiple formats and content types (for example, image files, documents, code, and languages). Although accounting for these different types of data may create additional complexity, it is important to have a full picture of real-world conditions, which are often as diverse as the data sets themselves.

However, this data complexity can raise challenges. According to the analysis of Deloitte’s third quarter 2024 State of Generative AI in the Enterprise survey, organizations that use public, private, and open-source large language models (LLMs) are concerned about data privacy, security, data sovereignty, and regulations. These leaders are also worried about risk management, data governance, data breaches, inappropriate usage, and more (figure 1).2

Without adopting modern data infrastructure—such as vector databases to manage embeddings and semantic frameworks like knowledge graphs to establish context and relationships—organizations may face higher costs, slower deployment, and diminished performance in their gen AI initiatives. If an organization does not have a clearly established data architecture, these complexities can lead to a less-than-optimal data environment characterized by silos, static schemas, a lack of integration, and high training and retraining costs. While these issues are not specific to generative AI, gen AI applications can be especially sensitive to them, given that they are often multimodal solutions that require strong architecture principles.

Furthermore, real-time data processing is often vital for gen AI applications, similar to contextual AI, which refers to systems designed to comprehend and adapt to specific situational, environmental, or operational nuances.3 These systems rely on dynamic, contextual data such as user preferences, historical trends, and real-time conditions to deliver highly relevant responses or actions. In applications like chatbots and real-time translation tools, real-time data processing can allow the AI to not only respond to inputs but also adapt its responses based on factors like the user’s emotional tone, past interactions, and immediate needs. This can help create a more personalized and effective interaction, aligning with the core principles of contextual AI.

Beyond the challenge of modernizing data infrastructure, gen AI–enabled software programs are often further complicated by complex data security, access, transparency, and regulatory needs. For instance, stringent personal data regulations in some regions emphasize privacy compliance,4 while others focus on transparent and ethical data use.5 In Asia Pacific, Singapore has implemented the Model AI Governance Framework to raise data transparency, fairness, and accountability standards, impacting the work of data engineers.6

One challenge in regulated industries involves data reconciliation, which is traditionally simpler with structured data, where row counts between source and target can be compared. However, gen AI often involves unstructured data that is chunked and tokenized, making it difficult to confirm that no information was lost during transformation. For example, token counts between the original document and its chunked versions can be compared, but even this method may not be foolproof because of potentially overlapping chunks. These reconciliation gaps, along with regional nuances, can present complex challenges across data privacy and architecture, extending to gen AI models, including those often sourced from third parties. Moreover, as organizations seek to collaborate and share data across regions, gen AI solutions should operate with data sovereignty and regulatory standards in mind.
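To make this concrete, the sketch below shows one way a data engineering team might reconcile a tokenized, chunked document against its source. It is a minimal illustration, assuming the open-source tiktoken tokenizer and a simple fixed-size chunker with overlap; the chunk size, overlap, and encoding name are illustrative choices rather than recommendations.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk_tokens(text: str, chunk_size: int = 512, overlap: int = 64) -> list[list[int]]:
    """Split a document into fixed-size token chunks with a fixed overlap."""
    tokens = enc.encode(text)
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

def reconcile(text: str, chunks: list[list[int]], overlap: int = 64) -> bool:
    """Rebuild the token stream from the chunks (dropping each chunk's
    leading overlap) and compare it to the source document's tokens."""
    rebuilt = list(chunks[0])
    for chunk in chunks[1:]:
        rebuilt.extend(chunk[overlap:])
    return rebuilt == enc.encode(text)

doc = "..." * 1000  # stand-in for a real document
assert reconcile(doc, chunk_tokens(doc))
```

Rebuilding the token stream is a tighter check than comparing raw token counts, which double-count overlapping regions; even so, neither method proves that downstream transformations preserved meaning.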

What approach can leaders consider?

Developing multimodal architectures

A proper framework for data management is important for addressing the multifaceted challenges organizations face in their data transformation journey, which now also covers data for gen AI applications. Beyond the traditional considerations of data types, usage, structure, security, and governance that apply to all AI applications, gen AI solutions are unique in that they often require multimodal architectures. Early research underscores the significance of these architectures in enhancing gen AI performance and versatility, particularly for tasks requiring complex cross-modal reasoning and content generation. For instance, a study on multimodal gen AI models explores the integration of LLMs with diffusion models to handle multiple data modalities, emphasizing how such architectures expand the capabilities of gen AI systems beyond unimodal tasks. The study highlights how the combination of different data representations improves zero-shot learning and cross-modal content generation, making the models more adaptable to a broader range of tasks.7

Another study examining generalist multimodal AI frameworks provides a review of architectures designed to process and generate multimodal data, emphasizing the importance of modular components that can seamlessly switch between different types of input and output data. The research points out that effective data management strategies should align with the growing complexity of these architectures, helping to ensure that diverse data types can be efficiently integrated and managed.8 The following are emerging practices related to gen AI architecture.

  • Fine-tuned models: Develop and fine-tune models for specific vertical domains (for example, health care, finance, retail) to enhance their performance and relevance in those areas. This involves training models on domain-specific data and tasks to improve accuracy and effectiveness.
  • Vertical domain-based agents: Create specialized agents tailored to the needs of specific industries or domains. These agents leverage fine-tuned models to provide domain-specific insights and solutions, addressing unique challenges and requirements within each vertical.
  • Real-time processing: Design the system to support real-time processing and inference, particularly for applications requiring immediate responses.
  • Cross-modal reasoning: Develop models capable of leveraging the complementary strengths of different data types to perform complex tasks like image captioning, video summarization, and multimodal sentiment analysis.

Challenge 2: Legacy data environments are not designed for probabilistic systems like gen AI

Many organizations still rely on deterministic data systems (true/false, 0s and 1s), which can be fine for traditional AI’s rigid structures and predefined schemas. These systems are well suited for handling structured data and rule-based queries but are increasingly inadequate for the demands of modern AI, particularly gen AI. Organizations operating within such legacy frameworks may face challenges in integrating diverse and complex data sets, especially those involving unstructured or multimodal data and complex workloads.9 This misalignment can lead to costly preprocessing efforts, increased training time, and inefficiencies in handling real-world data.

Gen AI systems use probabilistic models and, therefore, require data to be standardized and vectorized differently to support semantic understanding and pattern inference. Tools like vector databases, which store and retrieve high-dimensional representations of data, and knowledge graphs, which provide a semantic framework to contextualize data, are vital in addressing these challenges.10
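The retrieval pattern that vector databases automate can be illustrated in a few lines. The sketch below is a simplified stand-in, assuming embeddings have already been computed; a production system would use a purpose-built vector store with approximate nearest-neighbor indexing rather than a brute-force scan.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Measure how closely two embedding vectors point in the same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vec: np.ndarray, index: dict[str, np.ndarray], k: int = 3) -> list[tuple[str, float]]:
    """Brute-force rank stored embeddings against a query embedding."""
    scores = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]

# Toy two-dimensional "embeddings"; real models produce hundreds or
# thousands of dimensions.
index = {"doc-a": np.array([0.1, 0.9]), "doc-b": np.array([0.8, 0.2])}
print(top_k(np.array([0.7, 0.3]), index, k=1))  # doc-b ranks closest
```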

What actions can organizations consider?

Use data ontologies and knowledge modeling to help optimize gen AI model performance and reduce context ambiguity

Ontologies can serve as a structured “language” to standardize concepts and relationships, much like how a glossary provides a consistent reference for understanding terminology, reducing redundancies, improving consistency, and simplifying integration and data preparation.11 While ontologies may not directly reduce training costs, they can play an important role in knowledge modeling during inferencing by improving the precision of context retrieval. LLMs often struggle to identify precise context, especially when working with extensive datasets where context overlap can reduce accuracy.

By implementing a well-defined taxonomy—and, when necessary, ontologies—to structure and tag data, organizations can narrow the surface area of search, helping enable more accurate and efficient inferencing. For example, in a use case involving querying 11,000 articles, accuracy declined significantly as the data set expanded due to context overlap. By introducing a taxonomy and tagging the articles, the surface area of the search was reduced, improving the precision of results.12 Similarly, tailored data ontologies—such as vectorized representations for knowledge graphs—can provide a structured approach to organizing data into standardized formats based on attributes, allowing gen AI models to interpret and integrate multimodal data sets more effectively.13 However, certain data formats, such as hierarchical or nested structures like JSON or XML files, can be challenging to vectorize and may require additional preprocessing steps.14 Different ontologies and semantic frameworks can help optimize gen AI workflows, improve inferencing accuracy, and unlock the full potential of contextualized meaningful outputs.
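As a hypothetical sketch of this tagging pattern, the snippet below filters a corpus by taxonomy labels before any semantic ranking runs, shrinking the surface area of search. The Article fields, tag values, and the rank_by_similarity helper referenced in the closing comment are illustrative assumptions, not a standard interface.

```python
from dataclasses import dataclass

@dataclass
class Article:
    doc_id: str
    text: str
    tags: set[str]  # taxonomy labels assigned at ingestion, e.g. {"finance", "risk"}

def candidates(corpus: list[Article], query_tags: set[str]) -> list[Article]:
    """Keep only articles whose taxonomy tags intersect the query's tags."""
    return [a for a in corpus if a.tags & query_tags]

# Semantic retrieval then runs only over the filtered subset, for example:
# rank_by_similarity(candidates(corpus, {"finance"}), query_embedding)
```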

Establish automated reasoning approaches

Automated reasoning is another emerging approach to enhancing and verifying model outputs, thereby mitigating issues like hallucinations. Cloud hyperscalers are launching products that include automated reasoning checks to validate the accuracy of responses from LLMs. These rigorous validation mechanisms employ formal logic to ensure outputs align with established facts, providing verifiable proof of correctness and constraining uncertain outputs.15 Increasingly, neurosymbolic AI approaches are being explored, blending neural and symbolic AI techniques to strengthen reasoning capabilities. Historically treated as separate schools of thought, neural and symbolic AI are now converging with gen AI advancements, providing more robust validation mechanisms. This trend is evident in recent innovations like Amazon Bedrock Guardrails, which introduced automated reasoning checks for LLMs in 2024.16 Agentic AI is one fast-emerging area where such solutions can drive more deterministic outputs and consistency in the actions that AI agents take.

Establish data queries that reflect real-world scenarios

Rather than risk allowing public LLMs to access private data, some organizations have explored using synthetic data, informed by real-world data, to supplement and expand their training, testing, and commercial application needs.17 For example, a leading health insurance company created a synthetic data platform in 2022 to generate two petabytes of synthetic data for fraud prevention, informed by probabilistic risk models to optimize data synthesis.18 Similarly, an energy company has utilized synthetic images to enhance grid inspection models trained via drones, combining probabilistic methods to assess synthetic data reliability with real-world inspection metrics.19

Such data is generated based on probabilities, replicating patterns in the real data it is modeled on. A word of caution when employing this strategy: Over-training on synthetic data can lead to models that do not generalize well to real-world scenarios, as synthetic data may not fully capture the nuances, variability, and complexities of real-world environments.20 To help manage this risk, decision-makers can employ techniques such as data auditing and scenario testing to gauge how well each data type aligns with their strategic objectives and operational constraints.
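The principle behind probability-based synthesis can be shown with a deliberately simple, single-column example: fit a distribution to real data, then sample new values that replicate its shape without copying any record. Production platforms model joint distributions and correlations across many fields; the lognormal choice here is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
# Stand-in for a real numeric column, such as claim amounts.
real_amounts = rng.lognormal(mean=4.0, sigma=0.8, size=10_000)

# Fit the distribution's parameters in log space, then sample synthetic
# values that match the real column's statistical pattern.
mu, sigma = np.log(real_amounts).mean(), np.log(real_amounts).std()
synthetic_amounts = rng.lognormal(mean=mu, sigma=sigma, size=10_000)
```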

Challenge 3: Data integration and engineering challenges come with RAG and multimodal needs

Retrieval augmented generation (RAG), a technique that combines information retrieval and text generation, can help bring external and internal authoritative knowledge bases to public generative models to supplement the model’s foundational training data.21 While important to many gen AI data sets, RAG can also introduce challenges around data integration,22 real-time retrieval relevance, and maintenance.23

Moreover, another pervasive data engineering challenge is document or data set chunking—in other words, breaking up inputs (documents, code, images) into smaller, more manageable pieces (“chunks”).24 While retrieval tools can bring in new data to help improve accuracy, chunking determines how that data is processed and produced. Inadequate chunking can cause gen AI–enabled systems to fail,25 potentially resulting in loss of context, redundant or irrelevant information, decreased coherence, slower performance, and higher resource consumption.26

How can leaders address these issues?

Use automated approaches to review the quality of RAG information

Some leaders have started to use automated approaches to review data pulled into the environment from RAG solutions.27 For example, agentic RAG, which integrates AI agents into the RAG pipeline, can orchestrate information retrieval beyond simple lookups, enhancing the system’s ability to handle complex queries and maintain context.28 Automated checks and human oversight can complement technological precision (figure 2).29
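One minimal form of such an automated review is a scoring gate on retrieved passages, with low scorers routed to a human. In the sketch below, score_relevance is an assumed callable (it could wrap embedding similarity or an LLM-as-judge evaluation), and the threshold and escalation hook are illustrative placeholders.

```python
def send_to_human_review(query: str, passages: list[str]) -> None:
    """Stub escalation hook: queue low-scoring passages for human oversight."""
    print(f"{len(passages)} passages flagged for review on query: {query!r}")

def review_retrieval(query: str, passages: list[str],
                     score_relevance, threshold: float = 0.7) -> list[str]:
    """Keep passages the automated scorer rates as relevant; flag the rest."""
    kept, flagged = [], []
    for passage in passages:
        (kept if score_relevance(query, passage) >= threshold else flagged).append(passage)
    if flagged:
        send_to_human_review(query, flagged)
    return kept
```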

Improve chunking and advanced retrieval methods at all integration points

Leaders can also break large data sets into manageable chunks that preserve relevant context for gen AI model inputs and outputs.30 Many chunking techniques can be considered based on format, integration points, and where and how the context is maintained.31 Figure 3 outlines various chunking, retrieval, RAG optimization, and efficiency techniques important for managing information processing systems. Each section highlights a specific set of techniques, along with their descriptions and use cases, providing a structured view for better decision-making and system design.

A thoughtful approach to chunking across various techniques can create structural consistency and contextual alignment across multiple modalities.32 This may mean chunking and retrieving information differently based on form factors, language, code, and more. For example, graph RAG promises to capture semantic nuances and contextual dependencies within data sets, while semantic chunking might focus on intelligent paragraphs or sentence-level hierarchical chunks.33 Chunking and retrieval-based techniques can be applied together to help improve results.34
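As one concrete illustration, the sketch below implements a simple form of sentence-level chunking with overlap: Sentences are grouped until a size budget is reached, and the last sentence of each chunk is carried into the next to preserve context across boundaries. The regex-based sentence splitter and character budget are simplifications of what production semantic chunkers do.

```python
import re

def sentence_chunks(text: str, max_chars: int = 800) -> list[str]:
    """Group sentences into chunks, repeating the boundary sentence as overlap."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        if current and sum(len(s) for s in current) + len(sentence) > max_chars:
            chunks.append(" ".join(current))
            current = current[-1:]  # overlap: carry the last sentence forward
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```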

Challenge 4: Model opacity and hallucinations impair trust

Model overfitting, a phenomenon where a model learns noise in its training data rather than the underlying patterns, can be an issue at the training phase.35 For instance, gen AI models trained on synthetic data can exhibit limited diversity, resulting in performance defects.36 Similarly, models trained on synthetic images instead of real-world data have been found to produce lower-quality outputs.37

Another issue is the occurrence of hallucinations—errors or biases that emerge when a model trained on generic data is applied to specific internal data sets, resulting in inaccurate outputs and unreliable performance.38 In July 2024, a study by researchers at a public university in China found that the hallucination rate for LLMs was between 20% and 30%.39 Recent advancements and the implementation of technologies like RAG have helped reduce this rate.40

Gen AI hallucinations can manifest in numerous forms, including visual, textual, and contextual deviations from factual accuracy, ranging from benign creative expressions to problematic misinformation.41 The challenge is further complicated by a lack of transparency in the training processes of public gen AI models. This opacity can make it difficult for organizations to fully trust or understand the underlying foundations of these models.42

Maintaining model accuracy and trust

To better understand the importance of model accuracy and trust as the primary reasons for selecting a gen AI model, we further analyzed executives’ responses to Deloitte’s third quarter 2024 State of Generative AI survey. The analysis revealed that organizations with a high emphasis on overall trust and accuracy prioritize concerns related to compliance with regulations, governance models, and risk management. These strategic focus areas can enhance the overall adoption and efficacy of AI initiatives within their organizations (figure 4). 

 


How can organizations work to combat model opacity and hallucinations?

Reduce hallucinations with human oversight and emerging tools

Strong, continuous human oversight is vital to tackling hallucinations, which stem from gaps in contextual awareness across digital and AI transformation efforts.43 Among respondents in a Deloitte digital transformation benchmarking survey, the chief technology officer is still the most common primary owner and driver of digital transformation.44 However, according to our interviews, many organizations are discussing whether a chief strategy, transformation, AI, or data officer may be needed to augment digital transformation efforts, given the scale of data transformation and quality review required of AI and gen AI solutions.45

Below the C-suite and at the task level, a human-in-the-loop approach brings in skilled individuals to oversee outputs and identify when they may deviate from expected outcomes.46 Once identified, the organization can use a hallucination ontology framework to categorize and document different types of hallucinations and their metadata for further analysis and solutions.47

Data and automation offer another approach. For example, automated prompt-writing tools can help reduce model hallucinations by generating queries that provide clear instructions and context. Google’s “generate prompt” feature creates ready-to-use prompts with helpful placeholders, guiding the model to produce more accurate and trustworthy outputs.48 Likewise, automated context-limiting features can restrict conversations based on set parameters (time, word count, lines of code, and so on),49 allowing time for human oversight and intervention.50
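A context-limiting guard of this kind can be as simple as a budget check that runs before each prompt reaches the model. The sketch below is illustrative only; the word and time budgets are arbitrary placeholders, and the pause-and-review behavior would depend on the application.

```python
import time

class ContextLimiter:
    """Enforce set session parameters (word budget, elapsed time) per conversation."""

    def __init__(self, max_words: int = 2_000, max_seconds: float = 600.0):
        self.max_words = max_words
        self.max_seconds = max_seconds
        self.words_used = 0
        self.started = time.monotonic()

    def admit(self, prompt: str) -> bool:
        """Return False once the session exceeds its word or time budget."""
        self.words_used += len(prompt.split())
        over_words = self.words_used > self.max_words
        over_time = time.monotonic() - self.started > self.max_seconds
        return not (over_words or over_time)

limiter = ContextLimiter()
if not limiter.admit("Summarize this quarter's incident reports."):
    pass  # pause the session and route the conversation to a human reviewer
```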

More advanced machine learning approaches are also emerging, including the use of adversarial solutions, unsupervised learning, and anomaly detection to detect potential inaccuracies and threats, enabling models to self-correct and improve their robustness without constant human involvement.51

Finally, leaders can put quality assurance controls in place. One example pairs recursive abstractive processing for tree-organized retrieval with conformal abstention (which bounds hallucination rates), integrating verified information and providing mechanisms to abstain from uncertain outputs.52 Other leaders have constrained a prompt’s scope to predefined parameters to keep responses grounded in factual information or known concepts.53
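The abstention half of that control can be sketched as a calibrated confidence threshold: On held-out examples, pick the cutoff below which the system declines to answer. The version below is a simplified heuristic rather than a full conformal-prediction procedure, and it assumes the model exposes a self-confidence score between 0 and 1.

```python
import numpy as np

def calibrate_threshold(calib_scores: np.ndarray, calib_correct: np.ndarray,
                        target_error: float = 0.05) -> float:
    """Pick the smallest confidence cutoff whose accepted answers err at most
    target_error on the calibration set."""
    for cutoff in np.sort(calib_scores):
        accepted = calib_scores >= cutoff
        if accepted.any() and (1 - calib_correct[accepted].mean()) <= target_error:
            return float(cutoff)
    return 1.0  # no cutoff qualifies: abstain from everything

def answer_or_abstain(confidence: float, threshold: float) -> str:
    """Release the answer only when confidence clears the calibrated bar."""
    return "ANSWER" if confidence >= threshold else "ABSTAIN"
```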

Contribute to prompt engineering parameters and training

Simple “summarize this” prompts may require multiple iterations to get the user’s intended results. From a usability standpoint, having to ask the same question 15 different ways can be taxing and can impair adoption. Moreover, during the training stage, this issue can contribute to considerable retraining costs, which leaders at RBMA, an ontology platform, estimated could be as much as US$2.8 million per major model iteration for their business.54 The computational overhead of processing repeated prompts during inference can also add up significantly: The company reports an average of 3.5 prompt iterations per user query at a cost of roughly US$0.03 per inference. Given that the company may handle 100,000 monthly requests, it could face an additional US$105,000 in monthly compute costs from inefficient prompting alone.55

A library of approved, pretrained prompts, together with set search and prompt instructions and limitations, can also help.56 Preconfigured prompts can reduce troubleshooting by limiting the universe to prompts that work, improving the user experience and lowering the workload footprint. One interviewee described a generative AI project where this approach yielded efficiencies of as much as 15 times compared to using nonstandardized or ad hoc prompts.57 Confidence in accuracy may dictate whether an organization allows gen AI systems to process only pretested and approved queries.58
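In its simplest form, such a library is a lookup of pretested templates that fails fast on anything outside the approved set. The template names and wording below are hypothetical.

```python
APPROVED_PROMPTS = {
    "summarize_report": (
        "Summarize the following report in five bullet points, citing only "
        "facts stated in the text. Do not speculate.\n\n{document}"
    ),
    "extract_risks": (
        "List every risk explicitly mentioned in the text below, one per "
        "line, with the sentence it came from.\n\n{document}"
    ),
}

def build_prompt(template_id: str, **fields: str) -> str:
    """Only approved templates reach the model; unknown IDs fail fast."""
    if template_id not in APPROVED_PROMPTS:
        raise KeyError(f"Prompt {template_id!r} is not in the approved library")
    return APPROVED_PROMPTS[template_id].format(**fields)

prompt = build_prompt("summarize_report", document="<report text>")
```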

Moreover, some emerging model transparency startups are offering solutions aimed at addressing this challenge. They draw on advanced statistical and explainability frameworks to dissect and analyze the decision-making processes of AI (and gen AI) models. Detailed interpretability reports identify potential biases, inconsistencies, or vulnerabilities in model outputs, providing greater transparency and trust in the underlying logic of AI systems.59 Given that it is impossible to predict every potential outcome of a probabilistic system like generative AI, transparency into how decisions are made and an audit trail of prompts and outputs can help build trust over time.

A model to consider

The rapid adoption of gen AI represents both a transformative opportunity and a strategic challenge for organizations across industries. This research has highlighted four emerging challenges at the intersection of data integrity, multimodal integration, model accuracy, and the governance frameworks required to sustain high-quality outcomes in gen AI–powered initiatives. Practical solutions such as data ontologies, advanced retrieval augmented generation techniques, contextual awareness, and human oversight are all important components of unlocking gen AI’s full potential while working to avoid costly missteps.

While leaders navigate these challenges and develop emerging solutions, it’s important not to lose sight of the tremendous progress happening with generative AI implementations. One important area is agentic AI solutions, which can drive actions analyzed, planned, and orchestrated by gen AI–enabled systems. Automated reasoning solutions are one way to make the processes that AI agents automate more deterministic. These agentic solutions should also integrate with the organization’s technology landscape to drive certain actions—for example, analyzing sales and prospect data to automatically trigger a meeting with customers. Trust will be important to allowing these solutions to work autonomously with human oversight. Organizations that take steps toward addressing these four challenges may be paving the way for success in each area.

Looking ahead, Part 3 of this series will highlight four risk dimensions that enterprises may face due to gen AI—including risks to the enterprise, gen AI capabilities, adversarial AI, and the marketplace—and the solutions and strategies that can help manage these risks. Together, these insights can empower leaders to engineer solutions responsibly and sustainably in the age of gen AI.

Methodology

The research combines qualitative and quantitative approaches to identify the top data and model quality challenges and leading practices emerging from gen AI implementations. The qualitative analysis is based on in-depth, structured interviews with 12 Deloitte leaders as well as one external specialist; structured thematic analysis of these interviews offers nuanced insights into key challenges and opportunities associated with gen AI adoption. These findings are complemented by an extensive literature review across academic and industry publications, original qualitative analysis that included interviews with 24 global leaders, and quantitative analysis of data from two large-scale surveys conducted for Deloitte’s State of Generative AI in the Enterprise reports: one fielded with 1,410 global leaders from September to October 2024 and another with 2,770 global leaders from April to June 2024. The analysis examined top concerns specific to organizations with high or very high LLM use (figure 4).

 

To explore the importance of model accuracy and trust as the primary reasons for selecting a gen AI model, we analyzed executives’ responses to Deloitte’s third quarter 2024 State of Generative AI survey and classified them by their primary concerns. We filtered the responses to include only organizations that prioritized these factors, then examined responses concerning the top five barriers to gen AI adoption. This ensured that the groups were mutually exclusive, focusing the analysis on two key aspects—organizations highly concerned with model accuracy and those highly concerned with overall trust in the model. Together, these methods provide a multidimensional view of emerging data and model quality challenges and solutions, supported by large-scale empirical evidence.


by

Ashish Verma

United States

Prakul Sharma

United States

Parth Patwari

United States

Diana Kearns-Manolatos

United States

Ahmed Alibage

United States

Endnotes

  1. Faruk Muratovic, Jasmeet Gill, Diana Kearns-Manolatos, and Ahmed Alibage, “How can organizations engineer quality software in the age of generative AI?,” Deloitte Insights, Oct. 28, 2024.
  2. Jim Rowan, Beena Ammanath, Costi Perricos, Brenna Sniderman, and David Jarvis, “Now decides next: Moving from potential to performance,” Aug. 20, 2024.
  3. Oliver Brdiczka, “Contextual AI: The next frontier of artificial intelligence,” Adobe Experience Cloud, April 9, 2019.
  4. Ana Mishova, “Data protection laws around the world: A global perspective,” GDPR Local, Aug. 16, 2024.
  5. Ibid.
  6. Personal Data Protection Commission Singapore and Infocomm Media Development Authority, “Model artificial intelligence governance framework: Second edition,” 2020.
  7. Hong Chen, Xin Wang, Yuwei Zhou, Bin Huang, Yipeng Zhang, Wei Feng, Houlun Chen, Zeyang Zhang, Siao Tang, and Wenwu Zhu, “Multi-modal generative AI: Multi-modal LLM, diffusion and beyond,” arXiv:2409.14993 (2024).
  8. Sai Munikoti, Ian Stewart, Sameera Horawalavithana, Henry Kvinge, Tegan Emerson, Sandra E. Thompson, and Karl Pazdernik, “Generalist multimodal AI: A review of architectures, challenges and opportunities,” arXiv:2406.05496 (2024).
  9. Captivate, “Connecting the dots: How ontologies and knowledge graphs empower AI,” podcast, Oct. 22, 2024.
  10. Aidan Hogan et al., “Knowledge graphs,” ACM Computing Surveys 54, no. 4 (2021): pp. 1–37.
  11. Muratovic, Gill, Kearns-Manolatos, and Alibage, “How can organizations engineer quality software in the age of generative AI?”
  12. SeongKu Kang, Shivam Agarwal, Bowen Jin, Dongha Lee, Hwanjo Yu, and Jiawei Han, “Improving retrieval in theme-specific applications using a corpus topical taxonomy,” in Proceedings of the ACM on Web Conference (2024): pp. 1497–1508.
  13. John Edge (operating partner, Broadhaven Capital Partners), video interview with the authors, Nov. 13, 2024.
  14. Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux, “Why do tree-based models still outperform deep learning on typical tabular data?,” Advances in Neural Information Processing Systems 35 (2022): pp. 507–520.
  15. Antje Barth, “Prevent factual errors from LLM hallucinations with mathematically sound automated reasoning checks (preview),” blog, Amazon Web Services, Dec. 3, 2024.
  16. YouTube, “AWS re:Invent 2024 - Introducing automated reasoning checks in Amazon Bedrock Guardrails (AIM393-NEW),” video, Dec. 9, 2024.
  17. Deloitte interview conducted on Aug. 29, 2024.
  18. Isabelle Bousquette, “Anthem looks to fuel AI efforts with petabytes of synthetic data,” The Wall Street Journal, May 17, 2022.
  19. NVIDIA, “Replicator,” Nov. 27, 2024.
  20. Deloitte interview conducted on Aug. 28, 2024.
  21. Amazon Web Services, “What is RAG (retrieval-augmented generation)?,” accessed January 2024.
  22. Deloitte interview conducted on Aug. 28, 2024.
  23. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela, “Retrieval-augmented generation for knowledge-intensive NLP tasks,” Advances in Neural Information Processing Systems 33 (2020): pp. 9459–9474.
  24. Lucas Mearian, “Biggest problems and best practices for generative AI rollouts,” Computerworld, April 2, 2024.
  25. Ibid.
  26. Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih, “Dense passage retrieval for open-domain question answering,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (2020): pp. 6769–6781.
  27. Jon Saad-Falcon, Omar Khattab, Christopher Potts, and Matei Zaharia, “ARES: An automated evaluation framework for retrieval-augmented generation systems,” arXiv:2311.09476 (2023).
  28. Subham Sharma, “How agentic RAG can be a game-changer for data processing and retrieval,” VentureBeat, Nov. 12, 2024.
  29. Deloitte interview conducted on Aug. 29, 2024.
  30. Deloitte interview conducted on July 19, 2024.
  31. Eugenia Anello, “How to improve RAG performance: 5 key techniques with examples,” DataCamp, April 12, 2024.
  32. Muratovic, Gill, Kearns-Manolatos, and Alibage, “How can organizations engineer quality software in the age of generative AI?”
  33. Deloitte interview conducted on July 28, 2024.
  34. Jason Wei, Nguyen Karina, Hyung Won Chung, Yunxin Joy Jiao, Spencer Papay, Amelia Glaese, John Schulman, and William Fedus, “Measuring short-form factuality in large language models,” arXiv:2411.04368 (2024).
  35. ITU Online, “What is overfitting?,” June 5, 2024.
  36. Ben Lutkevich, “Model collapse explained: How synthetic training data breaks AI,” TechTarget, July 7, 2023.
  37. Yanzhu Guo, Guokan Shang, Michalis Vazirgiannis, and Chloé Clavel, “The curious decline of linguistic diversity: Training language models on synthetic text,” arXiv:2311.09807 (2024).
  38. Marcellin Atemkeng, Sisipho Hamlomo, Brian Welman, Nicole Oyetunji, Pouya Ataei, and Jean Louis KE Fendji, “Ethics of software programming with generative AI: Is programming without generative AI always radical?,” arXiv:2408.10554 (2024).
  39. Zouying Cao, Yifei Yang, and Hai Zhao, “Autohall: Automated hallucination dataset generation for large language models,” arXiv:2310.00259 (2024).
  40. Patrice Béchard and Orlando Marquez Ayala, “Reducing hallucination in structured outputs via retrieval-augmented generation,” arXiv:2404.08189 (2024).
  41. NTT Data, “All hallucinations are not bad: Acknowledging gen AI’s constraints and benefits,” 2024.
  42. Deloitte interview conducted on July 30, 2024.
  43. Deloitte interview conducted on Aug. 8, 2024.
  44. Analysis of data from a Deloitte digital transformation benchmarking survey of 121 C-suite and director-level transformation executives, conducted September 2024.
  45. Deloitte interview conducted on Aug. 29, 2024.
  46. Navapat Nananukul and Mayank Kejriwal, “HALO: An ontology for representing hallucinations in generative models,” arXiv:2312.05209 (2023).
  47. Colby Hawker, “Use AI to build AI: Save time on prompt design with AI-powered prompt writing,” blog, Google Cloud, Nov. 15, 2024.
  48. Deloitte interview conducted on July 17, 2024.
  49. Deloitte interview conducted on July 30, 2024.
  50. Deloitte interview conducted on July 18, 2024.
  51. Sukriti Gupta, “6 techniques to reduce hallucinations in LLMs,” Analytics India Magazine, July 15, 2024.
  52. Deloitte interview conducted on July 17, 2024.
  53. Wei, Karina, Chung, Jiao, Papay, Glaese, Schulman, and Fedus, “Measuring short-form factuality in large language models.”
  54. Edge interview.
  55. Ibid.
  56. Codecademy, “Detecting hallucinations in generative AI,” accessed January 2024.
  57. Deloitte interview conducted on July 17, 2024.
  58. Deloitte interview conducted on Aug. 27, 2024.
  59. Nick Payton, “Announcing our seed funding to make AI safe, reliable and secure,” Distribution, Dec. 14, 2023.

Acknowledgments

Thank you to our Deloitte research advisory board, which was instrumental in shaping this research.

The authors extend special thanks to Rajib Deb and Bojan Ciric for their exceptional contributions, including their insightful reviews and technical expertise, which greatly enriched the quality and depth of this research.

Additionally, the authors would like to thank the many Deloitte and industry experts interviewed for this research, including, but not limited to, Scott Holcomb, Dean Sauer, David Kuder, Bill Roberts, Nirmala Pudota, Sanmitra Bhattacharya, Scott Pobiner, Rajib Deb, and Ed Bowen. We would also like to thank John Edge, operating partner at Broadhaven Capital Partners, for being interviewed; Negina Rood for supporting the secondary research for this report series; Iram Parveen for project support; and Brenna Sniderman for her insights.

Finally, we’d like to thank our editor, Corrie Commisso, and production editor, Prodyut Borah, for their collaboration and commitment to excellence.

Cover image by: Jim Slatton; Adobe Stock