Success with generative artificial intelligence asks leaders to do more than adopt technology: they need to collaborate seamlessly across data, engineering, and business teams throughout the software development life cycle.
Building on our first article in this research series, “How can organizations engineer quality software in the age of gen AI?,” where we examined ways to overcome software development life cycle (SDLC) challenges, this piece identifies four engineering obstacles that organizations could address to help enhance data and model quality and fully unlock gen AI’s potential, whether for SDLC use cases or beyond.1
From the need for clear data architecture to the management of uncontrolled model drift, these challenges underscore the importance of designing robust systems that address AI’s probabilistic nature while fostering trust and consistency. In some ways, AI and gen AI are raising the bar for quality data and changing the software engineering life cycle in ways that can enable the next generation of AI and gen AI–powered applications, such as AI agents.
In this second installment, we investigate how organizations can build AI and data environments that address gen AI data integration, privacy, and model accuracy needs. The research draws on an extensive literature review, specialist interviews, and analysis of the survey data from Deloitte’s third- and fourth-quarter 2024 State of Generative AI surveys (see “Methodology”). We explore how technology leaders, especially data architects, engineers, scientists, and security leaders, can act to address four emerging challenges.
There’s no one-size-fits-all approach, but a clear data strategy, optimized training data integrations, and continuous model tuning are all important components of engineering quality gen AI–enabled solutions. Falling short can risk financial losses and reputational damage, and can impede the successful scaling of gen AI programs. Given that many organizations have made substantial gen AI investments, neglecting data and model integrity can turn potential gains in productivity, efficiency, and revenue growth into costly missteps.
Gen AI strategies often demand massive data sets from a variety of sources, including public and licensed external sources, synthetic and internal data sets, and multiple formats and content types (for example, image files, documents, code, and languages). Although accounting for these different types of data may create additional complexity, it is important to have a full picture of real-world conditions, which are often as diverse as the data sets themselves.
However, this data complexity can raise challenges. According to the analysis of Deloitte’s third quarter 2024 State of Generative AI in the Enterprise survey, organizations that use public, private, and open-source large language models (LLMs) are concerned about data privacy, security, data sovereignty, and regulations. These leaders are also worried about risk management, data governance, data breaches, inappropriate usage, and more (figure 1).2
Without modern data infrastructure—such as vector databases to manage embeddings and semantic frameworks like knowledge graphs to establish context and relationships—organizations may face higher costs, slower deployment, and diminished performance in their gen AI initiatives. If an organization lacks a clearly established data architecture, these complexities can lead to a less-than-optimal data environment characterized by silos, static schemas, a lack of integration, and high training and retraining costs. While this challenge is not specific to generative AI, gen AI applications can be especially sensitive to it, given that they are often multimodal solutions requiring strong architectural principles.
Furthermore, real-time data processing is often vital for gen AI applications, similar to contextual AI, which refers to systems designed to comprehend and adapt to specific situational, environmental, or operational nuances.3 These systems rely on dynamic, contextual data such as user preferences, historical trends, and real-time conditions to deliver highly relevant responses or actions. In applications like chatbots and real-time translation tools, real-time data processing can allow the AI to not only respond to inputs but also adapt its responses based on factors like the user’s emotional tone, past interactions, and immediate needs. This can help create a more personalized and effective interaction, aligning with the core principles of contextual AI.
Beyond the challenge of modernizing data infrastructure, gen AI–enabled software programs are often further complicated by complex data security, access, transparency, and regulatory needs. For instance, stringent personal data regulations in some regions emphasize privacy compliance,4 while others focus on transparent and ethical data use.5 In Asia Pacific, Singapore has implemented the Model AI Governance Framework to raise data transparency, fairness, and accountability standards, impacting the work of data engineers.6
One challenge in regulated industries involves data reconciliation, which is traditionally simpler with structured data, where row counts between source and target can be compared. However, gen AI often involves unstructured data that is chunked and tokenized, making it difficult to confirm that no information was lost during transformation. For example, token counts between the original document and its chunked versions can be compared, but even this method may not be foolproof because chunks can overlap. These reconciliation gaps, along with regional nuances, can present complex challenges across data privacy and architecture, extending to gen AI models, including those often sourced from third parties. Moreover, as organizations seek to collaborate and share data across regions, gen AI solutions should operate with data sovereignty and regulatory standards in mind.
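To make the overlap problem concrete, here is a minimal sketch of overlap-aware token-count reconciliation. It assumes a simple whitespace tokenizer and a known, fixed overlap between adjacent chunks; a production pipeline would use the model’s own tokenizer and metadata from the chunking step.

```python
# A minimal sketch of token-count reconciliation for chunked documents.
# Assumes a whitespace tokenizer and a fixed overlap size (both simplifications);
# real pipelines would use the model's tokenizer and recorded chunk offsets.

def tokenize(text: str) -> list[str]:
    return text.split()

def reconcile_token_counts(source: str, chunks: list[str], overlap_tokens: int) -> bool:
    """Compare source token count against chunk totals, net of known overlap."""
    source_count = len(tokenize(source))
    chunk_total = sum(len(tokenize(c)) for c in chunks)
    # Each chunk boundary after the first repeats `overlap_tokens` tokens.
    expected_duplicates = overlap_tokens * max(len(chunks) - 1, 0)
    return chunk_total - expected_duplicates == source_count
```

Even a check like this only verifies counts, not content, which is why reconciliation for unstructured data typically combines several such heuristics.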
What approach can leaders consider?
A proper framework for data management is important for addressing the multifaceted challenges organizations face in their data transformation journey, which now also covers data for gen AI applications. Beyond the traditional considerations of data types, usage, structure, security, and governance that apply to all AI applications, gen AI solutions are unique in that they often require multimodal architectures. Early research underscores the significance of these architectures in enhancing gen AI performance and versatility, particularly for tasks requiring complex cross-modal reasoning and content generation. For instance, a study on multimodal gen AI models explores the integration of LLMs with diffusion models to handle multiple data modalities, emphasizing how such architectures expand the capabilities of gen AI systems beyond unimodal tasks. The study highlights how the combination of different data representations improves zero-shot learning and cross-modal content generation, making the models more adaptable to a broader range of tasks.7
Another study examining generalist multimodal AI frameworks provides a review of architectures designed to process and generate multimodal data, emphasizing the importance of modular components that can seamlessly switch between different types of input and output data. The research points out that effective data management strategies should align with the growing complexity of these architectures, helping to ensure that diverse data types can be efficiently integrated and managed.8 The following are emerging practices related to gen AI architecture.
Many organizations still rely on deterministic data systems (true/false, 0s and 1s), which can be adequate for traditional AI’s rigid structures and predefined schemas. These systems are well suited to handling structured data and rule-based queries but are increasingly inadequate for the demands of modern AI, particularly gen AI. Organizations operating within such legacy frameworks may face challenges in integrating diverse and complex data sets, especially those involving unstructured or multimodal data and complex workloads.9 This misalignment can lead to costly preprocessing efforts, increased training time, and inefficiencies in handling real-world data.
Gen AI systems use probabilistic models and, therefore, require data to be standardized and vectorized differently to support semantic understanding and pattern inference. Tools like vector databases, which store and retrieve high-dimensional representations of data, and knowledge graphs, which provide a semantic framework to contextualize data, are vital in addressing these challenges.10
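As an illustration, the following is a minimal sketch of vector-style storage and retrieval. The embed() function here is a hypothetical placeholder; a real deployment would use an embedding model and a dedicated vector database rather than an in-memory list.

```python
# A minimal sketch of vector storage and nearest-neighbor retrieval.
# embed() is a stand-in: it produces a deterministic pseudo-embedding for
# illustration only, not a semantic representation.
import numpy as np

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)  # unit-normalize so dot product = cosine similarity

class VectorStore:
    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        scores = [float(q @ v) for v in self.vectors]  # cosine similarity
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        return [self.texts[i] for i in top]
```

The design point is that retrieval ranks by semantic similarity rather than exact matching, which is what allows probabilistic models to find relevant context that a rule-based query would miss.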
What actions can organizations consider?
Ontologies can serve as a structured “language” to standardize concepts and relationships, much like how a glossary provides a consistent reference for understanding terminology, reducing redundancies, improving consistency, and simplifying integration and data preparation.11 While ontologies may not directly reduce training costs, they can play an important role in knowledge modeling during inferencing by improving the precision of context retrieval. LLMs often struggle to identify precise context, especially when working with extensive data sets where context overlap can reduce accuracy.
By implementing a well-defined taxonomy—and, when necessary, ontologies—to structure and tag data, organizations can narrow the surface area of search, helping enable more accurate and efficient inferencing. For example, in a use case involving querying 11,000 articles, accuracy declined significantly as the data set expanded due to context overlap. By introducing a taxonomy and tagging the articles, the surface area of the search was reduced, improving the precision of results.12 Similarly, tailored data ontologies—such as vectorized representations for knowledge graphs—can provide a structured approach to organizing data into standardized formats based on attributes, allowing gen AI models to interpret and integrate multimodal data sets more effectively.13 However, certain data formats, such as hierarchical or nested structures like JSON or XML files, can be challenging to vectorize and may require additional preprocessing steps.14 Different ontologies and semantic frameworks can help optimize gen AI workflows, improve inferencing accuracy, and unlock the full potential of contextualized meaningful outputs.
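A minimal sketch of taxonomy-scoped retrieval follows, assuming hypothetical tags assigned to each article at ingestion: filtering by tag narrows the search surface before the more expensive similarity ranking runs.

```python
# A minimal sketch of taxonomy-scoped retrieval. Tags are hypothetical
# labels assigned when articles are ingested; the intersection filter
# reduces the candidate pool before similarity search, mirroring the
# "narrow the surface area" approach described above.
from dataclasses import dataclass

@dataclass
class Article:
    text: str
    tags: set[str]

def scoped_search(query_tags: set[str], articles: list[Article]) -> list[Article]:
    # Keep only articles whose tags intersect the query's taxonomy terms.
    candidates = [a for a in articles if a.tags & query_tags]
    # A similarity ranking (as in the vector store sketch) would run here
    # on the reduced candidate set.
    return candidates
```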
Automated reasoning is another emerging approach to enhance and verify model outputs, thereby mitigating issues like hallucinations. Cloud hyperscalers are launching products that include automated reasoning checks to validate the accuracy of responses from LLMs. These rigorous validation mechanisms employ formal logic to ensure outputs align with established facts, providing verifiable proof of correctness and constraining uncertain outputs.15 Increasingly, neurosymbolic AI approaches are being explored, blending neural and symbolic AI techniques to strengthen reasoning capabilities. Historically treated as separate schools of thought, neural and symbolic AI are now converging with gen AI advancements, providing more robust validation mechanisms. This trend is evident in recent innovations like Amazon Bedrock Guardrails, which introduced automated reasoning checks for LLMs in 2024.16 Agentic AI is one fast-growing area where solutions are emerging to drive more deterministic outputs and consistency in the actions that AI agents take.
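To illustrate the flavor of such checks at a much smaller scale, here is a minimal sketch that validates a model’s structured output against hand-written business rules. The fields and rules are hypothetical; production offerings like those described above rely on formal logic solvers rather than ad hoc predicates.

```python
# A minimal sketch of a rule-based output check: a gen AI model's structured
# answer is validated against deterministic constraints before release.
# All field names and rules are illustrative assumptions.
def check_output(claim: dict) -> list[str]:
    violations: list[str] = []
    if claim.get("refund_amount", 0) > claim.get("purchase_amount", 0):
        violations.append("refund exceeds purchase amount")
    if claim.get("approved") and not claim.get("policy_id"):
        violations.append("approval issued without a policy reference")
    return violations  # empty list means the output passed every rule

# Usage: block or flag for human review any output with violations.
issues = check_output({"refund_amount": 500, "purchase_amount": 200, "approved": True})
```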
Rather than risk allowing public LLMs to access private data, some organizations have explored using synthetic data, informed by real-world data, to supplement and expand their training, testing, and commercial application needs.17 For example, a leading health insurance company created a synthetic data platform in 2022 to generate two petabytes of synthetic data for fraud prevention, informed by probabilistic risk models to optimize data synthesis.18 Similarly, an energy company has utilized synthetic images to enhance grid inspection models trained via drones, combining probabilistic methods to assess synthetic data reliability with real-world inspection metrics.19
Synthetic data is generated probabilistically, replicating statistical patterns in the source data. A word of caution when employing this strategy: over-training on synthetic data can lead to models that do not generalize well to real-world scenarios, as synthetic data may not fully capture the nuances, variability, and complexities of real-world environments.20 To help manage this risk, decision-makers can employ techniques such as data auditing and scenario testing to gauge how well each data type aligns with their strategic objectives and operational constraints.
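A minimal sketch of the idea follows, using a simple lognormal fit as a stand-in for a real generative model: fit a distribution to real values, sample synthetic ones, and audit whether key statistics are preserved before relying on the synthetic set.

```python
# A minimal sketch of probabilistic synthetic data generation plus a data
# audit. The lognormal fit is an illustrative simplification; real synthetic
# data platforms use far richer generative models, but the audit step
# (compare statistics before trusting the data) is the same idea.
import numpy as np

rng = np.random.default_rng(42)
real = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)       # stand-in for real records

mu, sigma = np.log(real).mean(), np.log(real).std()          # fit a lognormal to real data
synthetic = rng.lognormal(mean=mu, sigma=sigma, size=10_000) # sample synthetic records

# Data audit: compare summary statistics between real and synthetic sets.
for name, a, b in [("mean", real.mean(), synthetic.mean()),
                   ("p95", np.percentile(real, 95), np.percentile(synthetic, 95))]:
    print(f"{name}: real={a:,.1f} synthetic={b:,.1f}")
```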
Retrieval-augmented generation (RAG), a technique that combines information retrieval and text generation, can help bring external and internal authoritative knowledge bases to public generative models to supplement the model’s foundational training data.21 While important to many gen AI data sets, RAG can also introduce challenges around data integration,22 real-time retrieval relevancy, and maintenance.23
Moreover, another pervasive data engineering challenge is document or data set chunking—in other words, breaking up inputs (documents, code, images) into smaller, more manageable pieces (“chunks”).24 While retrieval tools can bring in new data to help improve accuracy, chunking determines how that data is structured for processing and generation. Inadequate chunking can cause gen AI–enabled systems to fail,25 potentially resulting in loss of context, redundant or irrelevant information, decreased coherence, slower performance, and higher resource consumption.26
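As a baseline illustration, here is a minimal sketch of fixed-size chunking with overlap; the chunk_size and overlap values are arbitrary, and production systems often chunk on semantic boundaries such as sentences or sections instead.

```python
# A minimal sketch of fixed-size chunking with overlap, a common baseline.
# The parameter values are illustrative; overlap carries a little context
# across chunk boundaries at the cost of some duplicated tokens.
def chunk(tokens: list[str], chunk_size: int = 200, overlap: int = 20) -> list[list[str]]:
    """Split a token list into overlapping chunks to preserve local context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]
```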
How can leaders address these issues?
Some leaders have started to use automated approaches to review data pulled into the environment from RAG solutions.27 For example, agentic RAG, which integrates AI agents into the RAG pipeline, can orchestrate retrieval steps that go beyond simple lookups, enhancing the system’s ability to handle complex queries and maintain context.28 Automated checks and human oversight can complement technological precision (figure 2).29
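A minimal sketch of what such an agentic retrieval loop might look like follows, where retrieve(), judge(), and generate() are hypothetical stand-ins for the retriever, an LLM-based sufficiency check, and the generator.

```python
# A minimal sketch of an agentic RAG loop: an agent judges whether the
# retrieved context is sufficient and refines the query before generating.
# retrieve, judge, and generate are hypothetical callables, not a real API.
def agentic_rag(question: str, retrieve, judge, generate, max_rounds: int = 3) -> str:
    query = question
    context: list[str] = []
    for round_num in range(max_rounds):
        context = retrieve(query)
        if judge(question, context):                      # is the context sufficient?
            break
        query = f"{question} (refinement round {round_num + 1})"  # refine and retry
    return generate(question, context)
```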
Leaders can also break large data sets into manageable chunks that preserve relevant context for gen AI model inputs and outputs.30 Many chunking techniques can be considered based on format, integration points, and where and how the context is maintained.31 Figure 3 outlines various chunking, retrieval, RAG optimization, and efficiency techniques important for managing information processing systems. Each section highlights a specific set of techniques, along with their descriptions and use cases, providing a structured view for better decision-making and system design.
A thoughtful approach to chunking across various techniques can create structural consistency and contextual alignment across multiple modalities.32 This may mean chunking and retrieving information differently based on form factors, language, code, and more. For example, graph RAG promises to capture semantic nuances and contextual dependencies within data sets, while semantic chunking might focus on intelligent paragraphs or sentence-level hierarchical chunks.33 Chunking and retrieval-based techniques can be applied together to help improve results.34
Model overfitting, a phenomenon where a model learns noise in the training data as though it were signal and fails to generalize, can be an issue at the training phase.35 For instance, gen AI models trained on synthetic data can exhibit limited diversity, resulting in performance defects.36 Similarly, models trained on synthetic images instead of real-world data have been found to produce lower-quality outputs.37
Another issue is the occurrence of hallucinations—errors or biases that emerge when a model trained on generic data is applied to specific internal data sets, resulting in inaccurate outputs and unreliable performance.38 In July 2024, a study conducted by a group of researchers from a public university based in China found that the rate of hallucination for LLMs was between 20% and 30%.39 Recent advancements and the implementation of technologies like RAG have helped reduce this rate.40
Gen AI hallucinations can vary and manifest in numerous forms, including visual, textual, and contextual deviations from factual accuracy, which can range from benign creative expressions to problematic misinformation.41 The challenge is further complicated by a lack of transparency in the training processes of public gen AI models. This opacity can make it difficult for organizations to fully trust or understand the underlying foundations of these models.42
To better understand the importance of model accuracy and trust as the primary reasons for selecting a gen AI model, we further analyzed executives’ responses to Deloitte’s third quarter 2024 State of Generative AI survey. The analysis revealed that organizations with a high emphasis on overall trust and accuracy prioritize concerns related to compliance with regulations, governance models, and risk management. These strategic focus areas can enhance the overall adoption and efficacy of AI initiatives within their organizations (figure 4).
How can organizations work to combat model opacity and hallucinations?
Strong, continuous human oversight is vital to tackling hallucinations, which stem from gaps in contextual awareness across digital and AI transformation efforts.43 Among respondents in a Deloitte digital transformation benchmarking survey, the chief technology officer is still the most common primary owner and driver of digital transformation.44 However, according to our interviews, many organizations are discussing whether a chief strategy, transformation, AI, or data officer may be needed to augment digital transformation efforts, given the scale of data transformation and quality review required of AI and gen AI solutions.45
Below the C-suite and at the task level, a human-in-the-loop approach brings in skilled individuals to oversee outputs and identify when they may deviate from expected outcomes.46 Once identified, the organization can use a hallucination ontology framework to categorize and document different types of hallucinations and their metadata for further analysis and solutions.47
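As one possible shape for such a framework, here is a minimal sketch of a hallucination record; the categories and fields are illustrative assumptions, not a standard taxonomy.

```python
# A minimal sketch of a hallucination ontology entry: each flagged output is
# categorized and documented with metadata for later analysis. Category
# names and fields are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class HallucinationRecord:
    category: str          # e.g., "factual", "contextual", "visual"
    model: str             # which model produced the output
    prompt: str            # the triggering prompt
    output_excerpt: str    # the hallucinated portion of the response
    reviewer: str          # the human-in-the-loop who flagged it
    detected_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    notes: str = ""
```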
Data and automation offer another approach. For example, automated prompt-writing tools can help reduce model hallucinations by generating queries that provide clear instructions and context. Google’s “generate prompt” feature creates ready-to-use prompts with helpful placeholders, guiding the model to produce more accurate and trustworthy outputs.48 Likewise, automated context-limiting features can restrict conversations based on set parameters (time, word count, lines of code, etc.),49 allowing time for human oversight and intervention.50
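A minimal sketch of context limiting by word budget follows; the budget is illustrative, and real systems would count model tokens and might summarize older turns rather than drop them.

```python
# A minimal sketch of automated context limiting: trim conversation history
# to a word budget before each model call. The budget value is illustrative.
def limit_context(messages: list[str], max_words: int = 500) -> list[str]:
    """Keep the most recent messages that fit within the word budget."""
    kept: list[str] = []
    budget = max_words
    for msg in reversed(messages):          # walk from newest to oldest
        words = len(msg.split())
        if words > budget:
            break                           # budget exhausted; drop older turns
        kept.append(msg)
        budget -= words
    return list(reversed(kept))             # restore chronological order
```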
More advanced machine learning approaches are also emerging, including the use of adversarial solutions, unsupervised learning, and anomaly detection to detect potential inaccuracies and threats, enabling models to self-correct and improve their robustness without constant human involvement.51
Finally, leaders can put quality assurance controls in place. One example pairs recursive abstractive processing for tree-organized retrieval with conformal abstention, which bounds hallucination rates; together, these can integrate verified information and provide mechanisms to abstain from uncertain outputs.52 Other leaders have constrained a prompt’s scope to predefined parameters to keep responses grounded in factual information or known concepts.53
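A minimal sketch of the abstention idea follows, assuming held-out examples with model confidence scores and correctness labels: calibrate the lowest threshold at which the answered subset stays within a target error bound, and abstain below it. This is a simplification of conformal methods, not a faithful implementation.

```python
# A minimal sketch of conformal-style abstention: calibrate a confidence
# threshold so that, among answered queries, the error rate stays within a
# target bound. Scores and labels are illustrative stand-ins for a real
# model's confidence estimates on a held-out calibration set.
import numpy as np

def calibrate_threshold(scores: np.ndarray, correct: np.ndarray,
                        max_error: float = 0.05) -> float:
    """Find the lowest threshold whose answered subset meets the error bound."""
    for t in np.sort(scores):               # try thresholds from most permissive up
        answered = scores >= t
        if answered.any() and (1 - correct[answered].mean()) <= max_error:
            return float(t)
    return float("inf")                     # abstain on everything if none qualifies

def answer_or_abstain(score: float, threshold: float) -> bool:
    return score >= threshold               # False means the system abstains
```

Choosing the lowest qualifying threshold maximizes how often the system answers while still honoring the error bound on the calibration data.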
Simple “summarize this” prompts may require multiple iterations to get the user’s intended results. From a usability standpoint, having to ask the same question 15 different ways can be taxing and can impair adoption. Moreover, during the training stage, this issue can contribute to considerable retraining costs, which leaders at RBMA, an ontology platform, estimated for their business could be as much as US$2.8 million per major model iteration.54 The computational overhead of processing repeated prompts during inference can also add up, with the company reporting an average of 3.5 prompt iterations per user query and costs of roughly US$0.03 per inference. Given that the company may handle 100,000 monthly requests, repeated prompting alone could drive roughly US$10,500 in monthly compute costs (100,000 requests × 3.5 iterations × US$0.03 per inference).55
A library of approved, pretested prompts, along with set search and prompt instructions and limitations, can also help.56 Preconfigured prompts can reduce this troubleshooting by limiting the universe to prompts that work, improving the user experience and lowering the workload footprint. One interviewee spoke about a generative AI project where this approach yielded efficiencies of as much as 15 times compared with using nonstandardized or ad hoc prompts.57 Confidence in accuracy may dictate whether the organization allows gen AI systems to process only pretested and approved queries.58
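A minimal sketch of such an approved-prompt library follows, where the template names and placeholder fields are hypothetical: users select from prompts known to work rather than improvising.

```python
# A minimal sketch of an approved-prompt library: preconfigured templates
# with placeholders. Template names and fields are illustrative assumptions.
APPROVED_PROMPTS = {
    "summarize_report": (
        "Summarize the following report in {num_bullets} bullet points, "
        "covering only content present in the text:\n\n{document}"
    ),
    "extract_risks": (
        "List the risks explicitly mentioned in this document, with a "
        "one-sentence description of each:\n\n{document}"
    ),
}

def build_prompt(name: str, **fields) -> str:
    """Instantiate an approved template; unknown names are rejected outright."""
    if name not in APPROVED_PROMPTS:
        raise KeyError(f"Prompt '{name}' is not in the approved library")
    return APPROVED_PROMPTS[name].format(**fields)

# Usage: only vetted templates ever reach the model.
prompt = build_prompt("summarize_report", num_bullets=5, document="...")
```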
Moreover, some emerging model transparency startups are offering solutions aimed at addressing this challenge. They draw on advanced statistical and explainability frameworks to dissect and analyze the decision-making processes of AI (and gen AI) models. Detailed interpretability reports identify potential biases, inconsistencies, or vulnerabilities in model outputs, providing greater transparency and trust in the underlying logic of AI systems.59 Given that it is impossible to predict every potential outcome of a probabilistic system like generative AI, transparency into how decisions are made and an audit trail of prompts and outputs can help build trust over time.
The rapid adoption of gen AI represents both a transformative opportunity and a strategic challenge for organizations across industries. This research has highlighted four emerging challenges at the intersection of data integrity, multimodal integration, model accuracy, and the governance frameworks required to sustain high-quality outcomes in gen AI–powered initiatives. Practical solutions such as data ontologies, advanced retrieval-augmented generation techniques, contextual awareness, and human oversight are all important components of unlocking gen AI’s full potential while working to avoid costly missteps.
While leaders are navigating these challenges and developing emerging solutions, it’s important not to lose sight of the tremendous progress happening with generative AI implementations. One important area is agentic AI solutions, which can drive actions analyzed, planned, and orchestrated by gen AI–enabled systems. Automated reasoning solutions are one way to make the processes that AI agents automate more deterministic. These agentic solutions should also integrate with the organization’s technology landscape to drive certain actions—for example, analyzing sales and prospect data to automatically trigger a meeting with customers. Trust is expected to be important to allowing these solutions to work autonomously with human oversight. Organizations that take steps toward addressing these four challenges may be paving the way for success in each area.
Looking ahead, Part 3 of this series will highlight four risk dimensions that enterprises may face due to gen AI—including risks to the enterprise, gen AI capabilities, adversarial AI, and the marketplace—and the solutions and strategies that can help manage these risks. Together, these insights can empower leaders to engineer solutions responsibly and sustainably in the age of gen AI.
The research combines qualitative and quantitative approaches to identify the top data and model quality challenges and leading practices emerging from gen AI implementations. The qualitative analysis is based on in-depth, structured interviews with 12 Deloitte leaders as well as one external specialist, where structured thematic analysis offers nuanced insights into key challenges and opportunities associated with gen AI adoption. These findings are complemented by an extensive literature review across academic and industry publications, original qualitative analysis that included interviews with 24 global leaders, and quantitative analysis of data from two large-scale surveys conducted for Deloitte’s State of Generative AI in the Enterprise reports: one with 1,410 global leaders from September to October 2024 and another with 2,770 global leaders from April to June 2024. The analysis examined top concerns specific to organizations with high or very high LLM use (figure 4).
To explore the importance of model accuracy and trust as the primary reasons for selecting a gen AI model, we analyzed executives’ responses to Deloitte’s third quarter 2024 State of Generative AI survey and classified them by their primary concerns. We filtered the responses to include only those organizations that prioritized these factors and then examined their responses concerning the top five barriers to gen AI adoption. This ensured that the groups were mutually exclusive, focusing the analysis on two key aspects: organizations highly concerned with model accuracy and those focused on overall trust in the model. Together, these methods provide a multidimensional view of emerging data and model quality challenges and solutions, supported by large-scale empirical evidence.