The importance of well-crafted rules cannot be overstated when it comes to unlocking the potential of any market. In the case of generative AI, the absence of clear regulatory conditions may cause vendors, enterprise customers, and end users to hesitate. However, the European Union (EU) is expected to set the stage for global regulation of generative AI in 2024, not only influencing its own markets but also serving as a template for other regions.
In 2024, two EU regulations are expected to help shape the growth of the generative AI market in the region and further afield: the General Data Protection Regulation (GDPR),1 which has been applicable since 2018, and the upcoming EU AI Act (AIA), expected to be agreed to in early 2024. As generative AI opens up debates on how to manage issues of individual consent, rectification, erasure, bias mitigation, and copyright usage, the industry’s trajectory could be shaped by how organizations and regulators view, enforce, and manage these areas of contention.
Despite potential challenges, collaboration in the form of open and transparent conversations between industry and regulators is likely to result in a pragmatic approach that balances regulatory compliance with fostering innovation in generative AI. This would continue the pattern of discussions held in 2023, which saw interventions by regulators in the European Union and other markets: vendors adjusted their approach to meet regulators’ requests, and regulators enabled innovation.2 If the industry addresses the concerns raised by EU regulations in 2024 while promoting the benefits of core technologies, the generative AI market is expected to continue to evolve productively.
This prediction focuses on EU regulation of generative AI, as it is likely to be among the first set of agreed-to regulations with a global impact.3 In recent years, there has been a clear “Brussels effect,”4 with EU regulations having global ramifications, and we expect a similar effect from EU regulations covering generative AI.5 The extraterritorial impact is likely to vary:
The majority of EU regulation pertaining to generative AI should become relatively clear by the first quarter of 2024.
In 2024, the direction of European regulation on generative AI is likely to become far clearer. At this time, the industry should have sight of the agreed text of the AIA, which complements the GDPR.8 All companies looking to offer or deploy generative AI solutions should monitor developments in the AIA while also maintaining compliance with the GDPR.
The process for final agreement is in three stages; at the time of writing, two stages had been finalized, with the third and final phase pending the outcome of the “trilogue” among the European Commission, the European Parliament, and the Council of the EU.
There are specific terms that are applicable to the European Union’s regulation of generative AI, and it is important to define these. The critical components and types of players that the European Union has defined for the purposes of regulating generative AI within the AIA are:
There are two types of entities that will be in scope:
This prediction will focus first on the GDPR, whose obligations are known, and then on the AIA, whose shape is forming but not yet finalized.
Generative AI is expected to need to comply with the GDPR on the processing of personal data. The GDPR, which came into effect in May 2018,15 defines the rights of “data subjects”—that is, individuals who can be identified from the personal data being processed.
A fundamental tenet of EU regulation is that any use of individuals’ personal data must rest on a valid legal basis, with the lawfulness of processing established for each processing activity.16
This requirement may seem to clash with the core approach of generative AI, which is based on foundation models. Each model is trained on massive quantities of raw data—generally, the more the better. A large proportion of this data—the exact share varying by model—may require consent under some interpretations of EU law. The largest foundation models may have been trained on petabytes (millions of gigabytes, or GB) of data.17 Earlier models, including GPT-3, were trained on 570 GB of data.18 Generative AI applications in any medium—text, image, code, or other—create content using the knowledge within each foundation model.
Given the vast number of people whose data may have been used, obtaining individual consent, where required, becomes a complex exercise. Furthermore, as each foundation model supports an effectively infinite number and range of applications, requesting permissions for each additional purpose is even more unrealistic.
However, obtaining individual consent might not be mandatory. “Legitimate interest” may prove to be a sufficient “lawful basis” that permits training of the foundation models that drive generative AI.19 A legitimate interest exists when there is a compelling reason for processing and processing the data is the only approach to achieve the desired outcome.20 Regulators will likely want to see that organizations have conducted the appropriate evaluations to ensure that claimed legitimate interests are balanced against individuals’ rights and freedoms.
Furthermore, obtaining individual permissions may well be considered a “disproportionate effort.” An acceptable middle way may be mass-market communication. This was one of the steps that the Italian regulator, the Garante, requested of OpenAI in April 2023 as a condition for reinstating its service.21 The Garante placed an obligation on the data controller (the nominated person responsible for the foundation model) to launch an awareness campaign in broadcast and online media, informing users that personal data may have been used and explaining how such data could be deleted via an online tool.
Regulators might view positively that the intent of training is specifically to create better inferential capability that can then be deployed in generative applications (such as OpenAI’s ChatGPT, Stability.ai’s DreamStudio, or Adobe’s Firefly).
The European Data Protection Board may provide more clarity on the issue of consent, among other contentious areas, in 2024.22
The GDPR includes a suite of rights with regard to personal data. If data is incorrect, an individual can ask for it to be corrected. If a data subject no longer wants their personal data to be associated with or processed by an organization, they can ask for it to be deleted. These rights have been well known since the GDPR came into force. Addressing such requests may cost organizations thousands of dollars.
The foundation models that underpin generative AI are trained on myriad websites that may contain errors. The training process is a single event during which errors can be absorbed into the model. Updating the model to reflect rectifications or other changes could be done most accurately by retraining the model, but this implies substantial costs and time.23
The approach likely to be used to satisfy this requirement is to apply negative feedback loops to fine-tune the model.24 If an original data point is determined to be wrong, the weighting applied to the erroneous data point can be changed to minimize the likelihood of its reappearing. Feedback loops are imperfect but may be considered appropriate. That said, it is not certain how this approach might work in the case of class action challenges, which may require large swathes of data to be deleted.
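As an illustration only, the sketch below applies this downweighting idea to a toy count-based language model in Python. The class ToyBigramModel, the apply_feedback method, and the penalty value are hypothetical; real foundation models would pursue an analogous effect through weighted fine-tuning of the network, not raw counts.

```python
from collections import defaultdict

class ToyBigramModel:
    def __init__(self):
        # Weight attached to each (context, next_word) observation.
        self.weights = defaultdict(float)

    def train(self, corpus):
        for sentence in corpus:
            tokens = sentence.split()
            for ctx, nxt in zip(tokens, tokens[1:]):
                self.weights[(ctx, nxt)] += 1.0

    def apply_feedback(self, ctx, nxt, penalty=0.9):
        # Negative feedback: shrink the weight of an output judged
        # erroneous so it becomes less likely to reappear.
        self.weights[(ctx, nxt)] *= (1.0 - penalty)

    def most_likely_next(self, ctx):
        candidates = {n: w for (c, n), w in self.weights.items() if c == ctx}
        return max(candidates, key=candidates.get) if candidates else None

model = ToyBigramModel()
model.train(["the earth is flat", "the earth is round", "the earth is flat"])
print(model.most_likely_next("is"))  # "flat": weight 2.0 beats "round" at 1.0
model.apply_feedback("is", "flat")   # a reviewer flags the output as wrong
print(model.most_likely_next("is"))  # "round" now wins, 1.0 vs. 0.2
```

The design point is that the flagged association is not deleted from the model; its weight is shrunk until an alternative becomes more likely, which is why feedback loops mitigate, rather than erase, an erroneous data point.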
The idea of data minimization is that the collection of personal information should be limited to what is strictly relevant and necessary to achieve a specific task, and as soon as the task is complete, the data should be deleted.25 This approach may seem at odds with foundation models, whose efficacy is related to how much data they are trained on, with more generally being better.
However, the principle of data minimization may still be compatible with generative AI if data is de-personalized, for example, by using approaches such as pseudonymization (swapping personal identifiers with placeholder data, which reduces, but does not eliminate, data protection risks) and anonymization (deleting identifiers, which means data is no longer “personal”).26 Using these approaches, the volume of training data can be maintained, though full anonymization may be challenging. Organizations should have an appropriate framework in place to assess what data is necessary and to explain and justify that determination to the regulator.
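For illustration, the Python sketch below contrasts the two approaches on a single record. The field names, the keyed HMAC-SHA256 tokenization, and the key handling are assumptions made for the example, not a regulator-endorsed recipe.

```python
import hashlib
import hmac

# Key held separately from the data; rotating or destroying it changes
# how reversible the pseudonymization is.
SECRET_KEY = b"store-and-rotate-separately"

def pseudonymize(record: dict, identifiers=("name", "email")) -> dict:
    """Swap personal identifiers for stable placeholder tokens."""
    out = dict(record)
    for field in identifiers:
        if field in out:
            token = hmac.new(SECRET_KEY, out[field].encode(), hashlib.sha256)
            out[field] = token.hexdigest()[:16]
    return out

def anonymize(record: dict, identifiers=("name", "email")) -> dict:
    """Delete identifiers outright, so the record is no longer 'personal'."""
    return {k: v for k, v in record.items() if k not in identifiers}

record = {"name": "Ada Lovelace", "email": "ada@example.com", "query": "weather"}
print(pseudonymize(record))  # identifiers replaced by repeatable tokens
print(anonymize(record))     # identifiers removed entirely
```

The trade-off tracks the GDPR definitions: the pseudonymized record remains linkable (the same person maps to the same token, and whoever holds the key can re-identify them), so risk is reduced rather than eliminated, whereas the anonymized record severs the link entirely.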
The size of foundation models is linked to statistical accuracy, which is an element of proposed EU regulation included in the AIA.27 In an AI context, accuracy refers to the quality of outputs generated. With a foundation model, the greater the volume of good training data, the more accurate the results should be.28
The next part of the prediction considers the possible impact of the AIA on generative AI.
As mentioned earlier, the EU Parliament finalized its position in June 2023, and this included specific regulation for generative AI. The final version of the AIA, expected in early 2024, may include variations to the Parliamentary position.
The Parliamentary agreement included the following elements:
Additionally, providers of foundation models (FMs) used in generative AI systems and providers who specialize an FM into a generative AI system should:
The AIA aims to minimize bias within AI systems. This includes the suppression of human bias. Foundation models may have been trained on biased content, such as text biased with regard to gender, race, or sexual preference.
Training data is also likely to include language biases, as most content is written in English, with additional biases resulting from the preponderance of content ingested from writers of a specific gender, ethnicity, social class, degree of education, and income group.29 Foundation models trained on historically biased data could, therefore, generate content that repeats or even accentuates those biases.
Regulators are likely to require that biases be mitigated via any of a variety of techniques, including weighting or the inclusion of synthetic data that can balance out bias.30 Data controllers—which could be both the AI developer and the AI deployer—are likely to be asked to document “traceability,” explaining the steps taken.31
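As a hedged illustration of the weighting technique, the sketch below computes inverse-frequency sample weights so that an under-represented group contributes equally to a training objective. The record layout and the group attribute are hypothetical stand-ins for whatever characteristics a developer is balancing.

```python
from collections import Counter

# Hypothetical labeled corpus: group "B" is under-represented.
samples = [
    {"text": "...", "group": "A"},
    {"text": "...", "group": "A"},
    {"text": "...", "group": "A"},
    {"text": "...", "group": "B"},
]

counts = Counter(s["group"] for s in samples)
total, n_groups = len(samples), len(counts)

for s in samples:
    # Inverse-frequency weighting: each group ends up contributing the
    # same total weight (total / n_groups) to a weighted training loss.
    s["weight"] = total / (n_groups * counts[s["group"]])

print({g: round(total / (n_groups * c), 2) for g, c in counts.items()})
# {'A': 0.67, 'B': 2.0} -- the single "B" sample counts three times as much
```

Documenting such a computation, along with the before-and-after group distributions, is one plausible form that the requested “traceability” record could take.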
In 2024, further clarity is likely to be needed regarding the use of copyright content.32
Existing EU law may permit usage of copyright data for training, specifically “instances of text and data mining that do not involve acts of reproduction or where the reproductions made fall under the mandatory exception for temporary acts of reproduction.”33 The AIA draft requires that copyright works used for training be listed.
The EU recently introduced, via the Digital Single Market Directive,34 permissions for the use of text and data mining for scientific research and for lawful commercial use, although for commercial use content owners have a right to “opt out” of that permission. Content owners, including several media companies, have exercised that right to opt their data out of AI training.35 As of April 2023, more than a billion items had been removed from a training set for the Stable Diffusion v3 model.36
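To make the mechanics concrete, the sketch below shows one plausible way a training pipeline could honor such opt-outs: filtering crawled records against a registry of opted-out URLs before training begins. The record format and the registry contents are hypothetical; actual vendor pipelines are not public.

```python
def filter_training_set(records, opted_out_urls):
    """Drop any crawled record whose source URL appears in the opt-out set."""
    kept, removed = [], 0
    for record in records:
        if record["url"] in opted_out_urls:
            removed += 1  # honor the rights holder's opt-out
        else:
            kept.append(record)
    return kept, removed

# Hypothetical opt-out registry and crawled records.
opted_out = {"https://example-news.com/article-1"}
crawled = [
    {"url": "https://example-news.com/article-1", "content": "..."},
    {"url": "https://open-blog.org/post-7", "content": "..."},
]

kept, removed = filter_training_set(crawled, opted_out)
print(f"kept {len(kept)} records, removed {removed} opted-out items")
```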
The AIA focuses on risk assessment of each application. This runs counter to the general-purpose nature of foundation models.
However, there may be a distinction between systemic foundation models (SFMs)—those whose impact represents a systemic risk—and others, following an approach used in the EU Digital Services Act when categorizing types of online platforms and search engines.37 Designation as an SFM is likely to be made according to the quantity of computing resources required to train the model, the type and cost of training inputs used, and the model’s likely market impact. SFMs are likely to face a greater degree of due diligence obligations.38
Another possible outcome is that the AIA may establish some baseline requirements applicable to all foundation models (e.g., around transparency and technical documentation), with additional requirements if foundation models are used for high-risk use cases.
European regulation matters. It is likely to have extraterritorial and regional impacts. At first glance, several existing principles of EU regulations that apply to digital services may have seemed to present major obstacles to the growth of the generative AI market. Indeed, some commentators may have expected generative AI to be incompatible with EU guidelines.
How generative AI will shape up in the years ahead, and what impact it could have, is still unknown. It may be several years before the scale and nature of its impact are certain. In 2024 and beyond, vendors and regulators are likely to aspire to collaborate to attain an outcome that works for consumers, enterprises, vendors, and society in general. Governments are acutely aware of the importance of enabling innovation in generative AI—for example, via regulatory sandboxes.39
In 2024, as generative AI applications evolve and the resulting legal challenges become clearer, the direction of the regulatory response may become more evident. Generative AI is likely to remain an emerging sector this year, which can make it hard for regulation to be explicit at this stage. Core questions will likely remain to be addressed, such as how responsibility is divided between providers and deployers of generative AI when each is a separate entity.