
Concepts in AI fairness

Reflections on how this can be applied to machine learning algorithms

In this article, Paul Garel-Jones interviews Michelle Seng Ah Lee on AI fairness and how the concept can be reflected in the algorithms used by businesses.

Featured: Michelle Seng Ah Lee, AI Ethics Lead at Deloitte UK and PhD Candidate at Cambridge Computer Lab 

Interviewer: Paul Garel-Jones, UK Risk Analytics Lead Partner & Global Financial Services Co-Lead for Data & Analytics, Deloitte 

Introduction to the author

Michelle Seng Ah Lee is a Computer Science & Technology PhD candidate at the University of Cambridge in the Compliant and Accountable Systems group and a Senior Manager within Deloitte's Risk Advisory group focused on AI Ethics. Her research focuses on developing a framework for an end-to-end, context-aware evaluation of algorithmic fairness across the machine learning (ML) lifecycle.

Q: Why is algorithmic fairness an important topic now?

ML-driven products are increasingly being used to inform important decisions. Algorithms can predict who will default on a loan, who should be hired, and what price each customer is willing to pay for a product or service. An AI recruiting tool trained on historical data may be biased against women, and facial recognition software in self-driving cars has been found to be worse at detecting darker skin colours. When trained on biased data, algorithms can replicate those biases at an unprecedented scale, exacerbating existing inequalities.

Q: Can’t you just exclude this information, such as gender and race, from the data set?

In the past, it may have been possible to avoid liability through an approach we call “fairness through unawareness,” but much academic work has been done to show that this is inadequate. Algorithms, especially complex ones using ML, can learn much more from data than humans can digest, identifying unexpected patterns that drive their predictions. Those patterns are sometimes associated with who we are, including our race and gender. There is a risk that algorithms may identify patterns that exist because of past discriminatory and/or exclusionary practices.
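A minimal sketch of why “fairness through unawareness” can fail. The data, postcodes, and group labels below are entirely invented for illustration: a model trained only on a proxy feature (here, a hypothetical postcode) can still produce outcomes that differ sharply by a protected attribute it never saw, because the proxy is correlated with it.

```python
from collections import defaultdict

# Synthetic applicants: (postcode, protected_group, repaid_loan).
# The protected attribute is NOT given to the "model" below.
applicants = [
    ("A", "group_1", 1), ("A", "group_1", 1), ("A", "group_1", 0),
    ("A", "group_2", 1),
    ("B", "group_2", 0), ("B", "group_2", 0), ("B", "group_2", 1),
    ("B", "group_1", 0),
]

# A naive model scores applicants by their postcode's historical
# repayment rate -- the protected attribute is excluded entirely.
totals, repaid = defaultdict(int), defaultdict(int)
for postcode, _, outcome in applicants:
    totals[postcode] += 1
    repaid[postcode] += outcome
score = {p: repaid[p] / totals[p] for p in totals}

# Approval rate by protected group, even though the model never saw it:
approved = defaultdict(list)
for postcode, group, _ in applicants:
    approved[group].append(score[postcode] >= 0.5)
for group, decisions in sorted(approved.items()):
    print(group, sum(decisions) / len(decisions))  # 0.75 vs 0.25
```

Because postcode correlates with group membership in this toy data, dropping the protected attribute does not remove the disparity in approval rates.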

Q: Fairness seems like an intuitive notion. Why not just define what we mean by fairness and apply it to the model?

Scholars have introduced numerous definitions of fairness and their corresponding mathematical formalisations, such as equal odds, positive predictive parity, and counterfactual fairness. Practitioners have adopted these definitions to produce reports to show pass/fail results for each of these conditions.

However, in reality, it is not that simple. No algorithm can pass all of these tests, as these definitions are mathematically incompatible with one another. Choosing one requires forgoing another. Selecting a definition is problematic in itself because fairness is not a binary, absolute, one-size-fits-all condition. It is a complex notion debated among philosophers for millennia, from Aristotle to Rawls.

Q: Is fairness, then, difficult to define because it is so context-specific?

Yes, and there is considerable disagreement among us as consumers on what it means to be fair, and there are often multiple competing objectives to consider in a decision.

In one of my latest papers ("From fairness metrics to Key Ethics Indicators"), I write about the gaps between how computer scientists are defining fairness compared to how it is defined in ethical philosophy and in welfare economics. Both fields have long debated the contexts, exceptions, and nuances of fairness, which is not something we can distill into a mathematical formula.

Q: Can you give a real-life example of when two fairness definitions are at odds with one another?

Yes, the idea is that by choosing one metric, the model can become biased in other ways. An algorithm widely used to forecast future criminal behaviour came under scrutiny because black defendants were found to be twice as likely as white defendants to be incorrectly labelled as having a higher risk of repeat offending. However, the company that created the algorithm maintains that it is non-discriminatory because the rate of accuracy for its scores is identical for black and white defendants. While both perspectives sound fair, they are based on different perceptions of what fairness means, and it is mathematically impossible to meet both objectives at the same time.
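The tension can be reproduced with a few lines of arithmetic. The confusion-matrix counts below are invented, not taken from the real case: two groups have an identical positive predictive value (the vendor's notion of equally accurate scores), yet the group with the higher base rate suffers a much higher false positive rate (the critics' notion of bias).

```python
# PPV (how often a "high risk" label is correct) and FPR (how often
# a non-reoffender is wrongly labelled "high risk") for one group.
def ppv_and_fpr(tp, fp, fn, tn):
    return tp / (tp + fp), fp / (fp + tn)

ppv_1, fpr_1 = ppv_and_fpr(tp=6, fp=2, fn=2, tn=10)   # base rate 8/20
ppv_2, fpr_2 = ppv_and_fpr(tp=12, fp=4, fn=2, tn=4)   # base rate 14/22

print(ppv_1, ppv_2)  # 0.75 vs 0.75: equally "accurate" scores
print(fpr_1, fpr_2)  # ~0.17 vs 0.5: very unequal false positive rates
```

Both sides can point to a correct calculation; the disagreement is about which quantity fairness should equalise.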

Q: Is there a technical solution out there to fix unfairness?

There are “de-biasing” techniques, but they are unsuitable for most real-life use cases. Pre-processing techniques purge the data of bias before the algorithm is trained; post-processing techniques correct bias in the predictions after the algorithm is built. Both sacrifice accuracy for greater equity. The idea is that if the unwanted bias can be measured mathematically, it can be surgically removed. As I said before, it is difficult to formally define fairness mathematically, especially when the proxies of demographic features are intertwined with proxies of the outcome.
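As a sketch of one post-processing idea, the toy example below chooses group-specific decision thresholds on a model's scores so that selection rates are equalised. The scores, groups, and thresholds are all invented for illustration, and equalising selection rates is only one of many possible targets; note that moving a group's threshold away from the accuracy-optimal cut-off is exactly the accuracy-for-equity trade mentioned above.

```python
# Hypothetical model scores for two groups (higher = more likely approved).
scores = {
    "group_1": [0.9, 0.8, 0.7, 0.4, 0.3, 0.2],
    "group_2": [0.6, 0.5, 0.4, 0.3, 0.2, 0.1],
}

def selection_rate(vals, threshold):
    # Fraction of the group whose score clears the threshold.
    return sum(v >= threshold for v in vals) / len(vals)

# A single threshold of 0.5 selects the two groups at different rates:
print({g: selection_rate(v, 0.5) for g, v in scores.items()})

# Group-specific thresholds equalise the selection rate (here 50%):
thresholds = {"group_1": 0.7, "group_2": 0.4}
print({g: selection_rate(v, thresholds[g]) for g, v in scores.items()})
```

This only "fixes" the metric it targets; as discussed above, other fairness definitions may simultaneously get worse.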

Q: Are there non-technical solutions?

Yes, and they depend on where the bias comes from in the process. Bias can arise not only in the model-building stage but also from the data collection mechanism, differential treatment, and biased feedback. In such cases, de-biasing requires non-technical solutions, such as: a new outreach strategy, a change in processes, training of human decision-makers, and an analysis of whether the data is representative of all potential customers.

Q: If AI can be so biased, does that mean we shouldn’t use AI at all in decision-making?

Not at all. These problems are not unique to ML. The alternative to an ML model may be a worse model with poorer performance and a worse impact on minority groups. Fairness should be considered in relation to an alternative, rather than as an absolute goal. It is important to move away from the false simplicity of fairness as a mathematical condition and take seriously the practical and ethical trade-offs in each decision-making model.

Existing human decision-making can also be mired in cognitive biases that are challenging to track; by contrast, an algorithm is inherently auditable, and when the ethical and practical objectives are clearly defined, it is possible to test whether it achieves the desired outcome. This is an opportunity for leaders and regulators to meaningfully define and formalise what it means to implement a fair decision-making system.

Q: Finally, who would you say is accountable for ensuring the risk of unfair outcomes is sufficiently considered and addressed?

A data scientist often focuses on the key performance metrics provided by the business. The initiative in identifying and managing these risks needs to come from the top. The principles of fairness, transparency, and explainability are important, but they are only meaningful when operationalised into risk management processes. Only when AI is appropriately governed will leaders have the confidence to innovate.

By acknowledging that fairness is not a universal concept, my research goal is to make transparent the values, risks, and impacts underpinning each algorithm. Decision-makers can then decide what trade-offs they want to make and what controls and governance need to be put in place.
