
Measuring fairness with GlassBox

A toolkit to create transparent and responsible AI

Any reasonable person would agree that companies should treat their customers fairly. Many anti-discrimination laws exist to enforce this. However, ensuring that Artificial Intelligence (AI) algorithms produce fair outcomes is a complicated matter. With GlassBox, Deloitte has developed a toolkit that allows companies to measure and objectively assess the fairness of their AI models.

The importance of fairness

Which applicants will be granted a loan or a mortgage, and which ones will be denied? Which customers are more likely to commit fraud and should be put under extra scrutiny? Many financial service providers have outsourced decisions like these to AI algorithms, says Bojidar Ignatov, manager Financial Risk Management at Deloitte. AI-powered algorithms can make business processes faster and more efficient, help to improve customer experience and optimise profits. But the use of AI comes with risks, says Ignatov. “AI models can get very complex, which can make it hard or even impossible to grasp how they arrive at a certain outcome. We call this the ‘black box’ of AI.”

These days, awareness of fairness is growing. With a mindset towards equal treatment for everyone, growing anti-discrimination movements and the enormous amounts of personal data being used in the corporate world, companies have to scrutinise the models they use in decision-making processes. “If they fail to do so, they risk breaching various anti-discrimination laws, which can result in fines of up to tens of millions of euros,” says Ignatov. “Worst of all, they risk losing the trust of their customers and suffering serious reputational damage. Discrimination is indefensible.”

Deciding what is fair in the context of AI-based decision-making tools, though, is not as easy as it may sound. Deloitte’s GlassBox toolkit can help companies create transparency and fairness in their AI models. Ignatov has been involved in developing the GlassBox toolkit. In a previous blog post, he explained how GlassBox can help to ‘open the black box’ and create transparency in AI. In this blog post, he delves into the question of how GlassBox helps to create fairness in the context of AI-powered algorithms.

Sensitive attributes and proxy variables

How can companies ensure that their use of AI models is considered fair? First, they should check whether their use of AI-powered algorithms complies with anti-discrimination laws, says Ignatov. “Various anti-discrimination laws prohibit discrimination based on age, gender, religion, citizenship, disability, pregnancy, colour or national origin,” he says. “And customers expect companies to adhere to the highest ethical standards, which increases the number of sensitive features beyond the legal ones, for example educational level or whether the customer has children or not.”

To avoid discrimination, AI-powered algorithms should therefore not base decisions on these sensitive attributes, he concludes. But this is not as simple as it may sound. “You need to be aware that models can unintentionally discriminate based on proxy variables,” explains Ignatov. Proxy variables are input variables that have a high correlation with a sensitive attribute. For instance, a company might work with client data with features such as gender, age, where the client lives and whether they watch Top Gear on a regular basis. “Gender and age can be excluded from the model, but sometimes, that’s not enough,” he says.

Where you live and whether you watch Top Gear might seem innocent, until it turns out that some residential areas are home to specific groups of people that might be discriminated against, or that Top Gear is mostly watched by men, which makes this feature strongly correlated with gender. “If you want to create fair outcomes, you need to pay attention not only to sensitive attributes, but also to proxy variables,” Ignatov concludes.
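To illustrate the idea, the sketch below flags candidate proxy variables by measuring how strongly each input feature is associated with a sensitive attribute. The column names, the example data and the 0.5 threshold are illustrative assumptions for this blog post, not part of the GlassBox toolkit:

```python
import pandas as pd

def flag_proxies(df: pd.DataFrame, sensitive: str, threshold: float = 0.5) -> list:
    """Return features whose absolute correlation with the sensitive attribute
    exceeds the chosen threshold and may therefore act as proxy variables."""
    # One-hot encode so that categorical features (e.g. residential area) can be checked too.
    features = pd.get_dummies(df.drop(columns=[sensitive]), drop_first=True).astype(float)
    target = pd.get_dummies(df[sensitive], drop_first=True).iloc[:, 0].astype(float)
    correlations = features.apply(lambda col: col.corr(target))
    return correlations[correlations.abs() > threshold].index.tolist()

# Hypothetical client data: 'watches_top_gear' turns out to track 'gender' closely.
clients = pd.DataFrame({
    "gender": ["M", "M", "F", "F", "M", "F"],
    "watches_top_gear": [1, 1, 0, 0, 1, 0],
    "income": [40, 61, 52, 39, 48, 55],
})
print(flag_proxies(clients, sensitive="gender"))  # ['watches_top_gear']
```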

Defining fairness

Deciding whether the application of AI-powered algorithms is fair or not is very case-specific. “Sometimes the use of sensitive attributes can be easily justified,” says Ignatov. “For instance, no one would argue that an algorithm that exclusively targets men with advertisements for men’s clothes is unfair.” But the lines blur fast, he warns. “In a credit application, should the same number of men and women get approved? If more men apply, should the approval rate be the same? What if the women that apply on average earn a lot more?” These questions do not necessarily have only one correct answer, he notes.

If companies want to create fair outcomes with AI-powered algorithms, they need to have a clear understanding of what they are aiming for, says Ignatov. “They need to define what fairness means for them in typical use cases. This is a delicate process that requires careful reflection.” How fairness is defined subsequently determines how fairness can be quantified, and which methods can be used.

Three types of fairness

In line with the online handbook Fairness and Machine Learning (by Solon Barocas, Moritz Hardt and Arvind Narayanan), the GlassBox toolkit uses three types of fairness: Independence, Separation and Sufficiency. Independence requires that the acceptance rate is equal in all groups. The second type, Separation, requires that all groups experience equal false negative rates and/or false positive rates. The third, Sufficiency, requires consistency of positive/negative predictive values across all groups.
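In the notation of Barocas, Hardt and Narayanan, with sensitive attribute A, actual outcome Y and model prediction Ŷ, the three criteria can be summarised as conditional-independence statements. The formulation below follows the handbook’s standard definitions for binary decisions, not any GlassBox-specific notation:

```latex
% Standard formulation for a binary prediction \hat{Y}, outcome Y and sensitive attribute A
\text{Independence:} \quad \hat{Y} \perp A
  \;\Longleftrightarrow\; P(\hat{Y}=1 \mid A=a) = P(\hat{Y}=1 \mid A=b) \;\; \text{for all groups } a, b
\text{Separation:} \quad \hat{Y} \perp A \mid Y
  \;\Longleftrightarrow\; \text{equal false positive and false negative rates across groups}
\text{Sufficiency:} \quad Y \perp A \mid \hat{Y}
  \;\Longleftrightarrow\; \text{equal positive and negative predictive values across groups}
```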

Which type of fairness applies depends on how a company defines fairness for a specific use case, says Ignatov. For example, think of a situation in which a company wants to use an AI-powered algorithm to screen CVs for leadership positions. One way to define fairness is to strive for an equal outcome for men and women. “In that case, you use the first type of fairness, and you have to calibrate your model to create a 50/50 outcome,” he says.

The company can also decide to give men and women equal opportunity, regardless of the outcome. “You can use the second type of fairness, Separation, for this view,” says Ignatov. “This requires that the model results in an equal proportion of ‘mistakes’ for both men and women.” The third type of fairness, Sufficiency, can be applied as well. “In that case, the score should reflect the candidate’s capability of doing the job. Men and women who receive the same score from the model should, on average, show the same realised outcome.”
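A minimal sketch of how these three criteria could be quantified for a binary decision (say, invite to interview = 1). The function and the example data are illustrative assumptions, not the GlassBox implementation:

```python
import numpy as np
import pandas as pd

def group_fairness_report(y_true, y_pred, group) -> pd.DataFrame:
    """Per-group statistics behind Independence (acceptance rate),
    Separation (false positive / false negative rates) and
    Sufficiency (positive predictive value)."""
    df = pd.DataFrame({"y_true": y_true, "y_pred": y_pred, "group": group})
    rows = {}
    for g, sub in df.groupby("group"):
        tp = ((sub.y_true == 1) & (sub.y_pred == 1)).sum()
        fp = ((sub.y_true == 0) & (sub.y_pred == 1)).sum()
        fn = ((sub.y_true == 1) & (sub.y_pred == 0)).sum()
        tn = ((sub.y_true == 0) & (sub.y_pred == 0)).sum()
        rows[g] = {
            "acceptance_rate": (tp + fp) / len(sub),                            # Independence
            "false_positive_rate": fp / (fp + tn) if fp + tn else np.nan,       # Separation
            "false_negative_rate": fn / (fn + tp) if fn + tp else np.nan,       # Separation
            "positive_predictive_value": tp / (tp + fp) if tp + fp else np.nan, # Sufficiency
        }
    return pd.DataFrame(rows).T

# Hypothetical CV-screening outcomes for two groups:
report = group_fairness_report(
    y_true=[1, 0, 1, 1, 0, 0, 1, 0],
    y_pred=[1, 0, 1, 0, 0, 1, 1, 0],
    group=["M", "M", "M", "M", "F", "F", "F", "F"],
)
print(report)
```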

Within each type of fairness, various methods can be used to mathematically represent fairness. Depending on the use case, well-known methods can be applied, such as Demographic Parity, Group Fairness, Disparate Impact, Statistical Parity, Equal Opportunity, Equalised Odds, and others. “These result in fairness metrics, which can be used to guarantee fair outcomes of AI-driven algorithms,” says Ignatov.

Fairness metrics

By now we know that AI-driven algorithms can be biased. We have argued that companies need to carefully think about how they want to define fairness in the context of a specific use case. And they must select a method to measure fairness, which will result in a number. But what does that mean? How can companies use fairness metrics to ensure that their AI-driven algorithms create fair outcomes?

There is one important final step to be made, says Ignatov: companies have to define what fairness and discrimination mean to them in statistical terms. “It’s not only about calculating, but also about setting the limits,” Ignatov says. “Which outcomes are still considered acceptable, which outcomes are unacceptable, and what are the consequences?” The GlassBox toolkit contains methodology that statistically defines limits for all fairness metrics, he explains. “This makes it very clear what you are aiming for, and whether you are living up to your own ambitions.”
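As an illustration of such a limit, the sketch below checks a single metric, the disparate impact ratio, against a pre-agreed acceptance band. The 0.8–1.25 band follows the common ‘four-fifths’ convention and is only an assumption for this example, not a GlassBox default:

```python
def disparate_impact_ratio(acceptance_rate_protected: float,
                           acceptance_rate_reference: float) -> float:
    """Ratio of acceptance rates between the protected group and the reference group."""
    return acceptance_rate_protected / acceptance_rate_reference

def within_limits(ratio: float, lower: float = 0.8, upper: float = 1.25) -> bool:
    """A symmetric acceptance band: ratios outside it trigger an investigation."""
    return lower <= ratio <= upper

ratio = disparate_impact_ratio(0.35, 0.50)  # e.g. 35% vs 50% approval rate
print(f"ratio = {ratio:.2f}",
      "-> acceptable" if within_limits(ratio) else "-> outside agreed limits")
```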

Last but not least, companies have to adopt a continuous evaluation process to ensure that the outcomes of AI-driven algorithms remain fair over time. This is particularly true for algorithms that use machine learning and periodically recalibrate themselves based on fresh data, says Ignatov. “Machine learning adds extra complexity to AI-powered algorithms. Once a machine learning algorithm has been running and learning for a while, companies need to check whether it is still working as intended and whether the outcomes are still considered fair.”
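Such a recurring check could look like the sketch below, which recomputes the fairness metrics on fresh data after every retraining or scoring cycle and flags any breach of the agreed limits. The model object, data loader, metric function and alerting hook are all hypothetical placeholders, not GlassBox components:

```python
from datetime import date

def periodic_fairness_check(model, fetch_fresh_batch, compute_metrics, limits, alert):
    """Recompute fairness metrics on the latest data and flag any breach of the agreed limits."""
    features, y_true, group = fetch_fresh_batch()      # latest scored population
    y_pred = model.predict(features)                   # hypothetical model interface
    metrics = compute_metrics(y_true, y_pred, group)   # e.g. acceptance rate per group
    breaches = {name: value for name, value in metrics.items()
                if not (limits[name][0] <= value <= limits[name][1])}
    if breaches:
        alert(f"{date.today()}: fairness limits breached: {breaches}")
    return metrics, breaches
```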

Actions for companies

Creating fairness in AI algorithms is complicated, but it is possible to break it down and make it manageable, concludes Ignatov. Companies need to evaluate anti-discrimination laws, as well as their own policies and regulations, and check whether their models comply with these. They have to decide how to define fairness for a given use case, choose the right method to quantify that definition of fairness, and set the statistical limits of what is acceptable. And finally, they have to adopt a continuous monitoring process that ensures their AI algorithms remain under control.

Deloitte can offer guidance with all these steps, says Ignatov. “The GlassBox toolkit offers a wide variety of methods to create an objective standard to deal with the complex question of what is considered to be fair when using AI.”

Unbox the black box of AI with GlassBox

Deloitte has developed GlassBox: a toolkit that looks inside the proverbial ‘black box’ of AI-powered algorithms. GlassBox is designed to create transparency in Artificial Intelligence.

More information

For more information about Measuring fairness with GlassBox, please contact Roald Waaijer, Bojidar Ignatov or Benjamin Chroneer via the contact details below.
 
