The work of the Danish Agency for Labour Market and Recruitment (STAR) lays the foundations for an efficient labour market policy that benefits citizens and businesses in Denmark. Deloitte has delivered an ambitious synthetic data test project to STAR's IT development organisation, STAR City.
STAR's primary vision is to move the test environment from on-premises to the cloud while safeguarding the privacy and security of citizen data, thereby complying with GDPR requirements. Deloitte helped develop a synthetic data model that can scale across STAR's applications and, not least, their complex data landscape. STAR also has the ambition to implement extensive modernisation programmes.
STAR is currently streamlining their development processes and technologies more broadly, and is thereby beginning their cloud journey. One of the technological frameworks STAR uses is a so-called digital twin of every individual older than 15 with a Danish CPR number. The digital twin is used to facilitate case handling in municipalities and to develop new evidence-based legislation within the employment sector. This involves extensive research and analysis that plays a crucial role for politicians in shaping employment policies.
Synthetic data is used to enable faster testing and earlier error detection, with Deloitte as the partner providing expertise in a complex environment. The benefits of a synthetic data model go beyond compliance with GDPR requirements, where testing can be conducted without citizen data; a cloud-based test environment also offers greater security and efficiency.
The use of synthetic data has enabled STAR to begin modernising their applications and to move their development processes to the cloud. To allow STAR's development tools and processes to be used in the cloud, STAR is transforming their test data into synthetic data, thereby ensuring compliance with GDPR.
STAR will be the first authority in the public sector to take this approach. The long-term vision is to develop a service that can be used across the public sector, with an innovative development model serving as the basis for a best-practice case.
In the first phase, Deloitte’s data scientists developed an MVP (minimum viable product) of the synthetic data generator model along with a user interface. Deloitte also developed and implemented software for automatic data model documentation and reporting tools to check data quality and privacy.
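To illustrate what such a data-quality report can look like in practice, the sketch below compares each column of a synthetic table against its original with a two-sample Kolmogorov-Smirnov test. The column names, data, and choice of test are illustrative assumptions only and do not describe STAR's actual tooling.

```python
# Illustrative sketch of a simple data-quality check: compare each numeric
# column of a synthetic dataset against the original with a two-sample
# Kolmogorov-Smirnov test. Column names and data are hypothetical.
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Hypothetical "real" and "synthetic" tables sharing the same schema.
real = pd.DataFrame({
    "age": rng.normal(45, 12, 5000),
    "weeks_on_benefits": rng.exponential(20, 5000),
})
synthetic = pd.DataFrame({
    "age": rng.normal(44, 13, 5000),
    "weeks_on_benefits": rng.exponential(22, 5000),
})

# One report row per column; a small KS statistic indicates that the
# synthetic column follows a distribution close to the original.
report = []
for column in real.columns:
    stat, p_value = ks_2samp(real[column], synthetic[column])
    report.append({"column": column,
                   "ks_statistic": round(float(stat), 3),
                   "p_value": round(float(p_value), 3)})

print(pd.DataFrame(report))
```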
Developing synthetic data as a new tech offering
Over the past three years, a multidisciplinary team has developed synthetic data as a new tech offering across Deloitte. Synthetic data allows our clients to reduce the need to work directly with sensitive data, thereby enhancing citizen privacy.
It also enables clients to generate supplementary data to train more scalable and robust AI models – a prerequisite for moving from today's local AI pilots to doing AI at scale. In short, synthetic data can help clients lower the barriers to building AI-first products while maintaining privacy. We expect GDPR's data minimisation requirement to be a major adoption driver in the public sector.
Gartner estimates that by 2024, 60 per cent of all data used in the development of AI will be synthetic. Jesper Kamstrup-Holm, leader of Deloitte’s public sector team, says: “Synthetic data is a catalyst for unlocking AI’s potential, empowering better AI model training, and reinforcing AI security. Early adopters are organisations working with sensitive data, but we expect synthetic data to gain much wider adoption as a key enabler for AI development and data-driven innovation – a trend further accelerated by the development in generative AI”.
Deloitte expects synthetic data projects to shift towards more integrated plays, where synthetic data is a "peak capability" – combined with existing strongholds such as software development, AI development, and test automation.
Innovative use of generative AI
The creation of synthetic data employs new and powerful techniques such as Generative Adversarial Networks (GANs), in which two AI models compete against each other – one trying to generate lifelike data and the other trying to differentiate between fake and real data. Through many iterations, the generative model learns to mimic the real data well enough to deceive the other model, without ever having had direct access to the real data. Using this approach, our team has generated synthetic images, tabular data, and even synthetic text in a project for Region Zealand. The team has also pioneered a standardised measurement tool for reidentification risk in all types of anonymised data, allowing Data Protection Officers (DPOs) to make qualified decisions on data sharing.
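As a minimal illustration of this adversarial setup (not the model built for STAR or Region Zealand), the sketch below trains a toy generator and discriminator in PyTorch on a one-dimensional Gaussian standing in for real data; all layer sizes and hyperparameters are illustrative assumptions.

```python
# Minimal GAN sketch (illustrative only): a generator learns to mimic a
# 1-D Gaussian "real data" distribution by competing with a discriminator.
import torch
import torch.nn as nn

torch.manual_seed(0)

LATENT_DIM = 8      # size of the random noise fed to the generator
BATCH_SIZE = 128
STEPS = 2000

# Generator: noise -> candidate "synthetic" sample
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

# Discriminator: sample -> probability that the sample is real
discriminator = nn.Sequential(
    nn.Linear(1, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

def real_batch():
    # "Real" data: samples from a normal distribution with mean 3, std 0.5
    return torch.randn(BATCH_SIZE, 1) * 0.5 + 3.0

for step in range(STEPS):
    # --- Train the discriminator to tell real from generated samples ---
    real = real_batch()
    noise = torch.randn(BATCH_SIZE, LATENT_DIM)
    fake = generator(noise).detach()  # generator is frozen in this step
    d_loss = (bce(discriminator(real), torch.ones(BATCH_SIZE, 1)) +
              bce(discriminator(fake), torch.zeros(BATCH_SIZE, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # --- Train the generator to fool the discriminator ---
    noise = torch.randn(BATCH_SIZE, LATENT_DIM)
    g_loss = bce(discriminator(generator(noise)), torch.ones(BATCH_SIZE, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# After training, the generator produces samples resembling the real data,
# even though it only ever received feedback through the discriminator.
with torch.no_grad():
    samples = generator(torch.randn(1000, LATENT_DIM))
print(f"synthetic mean={samples.mean().item():.2f}, std={samples.std().item():.2f}")
```

The same generator-versus-discriminator principle underlies GANs for tabular data, images, and text; only the network architectures and data encodings change.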