Regulators’ perspectives on XAI
Most financial regulators do not mandate where and how banks can use black boxes, but some, such as Germany’s Federal Financial Supervisory Authority, have advised institutions to weigh the benefits of choosing a more complex model, and document why they decided against more interpretable options.17 In addition, financial watchdogs have recommended that banks run traditional models alongside sophisticated machine learning models, and assign analysts, or a “human in the loop,” to address major discrepancies that arise between the two.18
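To make this recommendation concrete, the sketch below shows one way such a parallel run might work: a traditional logistic regression and a gradient-boosted model score the same applications, and cases where the two disagree sharply are routed to an analyst rather than acted on automatically. The models, synthetic data, and disagreement threshold are illustrative assumptions, not regulatory requirements.

```python
# Minimal sketch of a "parallel run": a traditional model and a complex model
# score the same cases, and large disagreements go to a human reviewer.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for application data; a bank would use its own features.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_review, y_train, _ = train_test_split(X, y, test_size=0.25, random_state=0)

traditional = LogisticRegression(max_iter=1000).fit(X_train, y_train)
complex_model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Predicted probabilities from each model on new cases.
p_traditional = traditional.predict_proba(X_review)[:, 1]
p_complex = complex_model.predict_proba(X_review)[:, 1]

# Route large discrepancies to analyst review instead of acting automatically.
DISAGREEMENT_THRESHOLD = 0.25  # assumed tolerance; set per institution and use case
needs_review = np.abs(p_traditional - p_complex) > DISAGREEMENT_THRESHOLD
print(f"{needs_review.sum()} of {len(needs_review)} cases flagged for analyst review")
```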
Engaging with regulators will be important for banks to continue developing advanced AI applications, since oversight groups are increasing their scrutiny of machine learning in every corner of the globe. Bank regulators in North America,19 for example, have solicited feedback on banks’ explainability practices, and the degree to which their limitations can impact model risk management. Recently, five US agencies formally requested information on how banks manage AI risks, including when a “lack of explainability” raises uncertainty about the soundness and reliability of their machine learning models.20 In addition, the watchdog for consumer protection in financial services expanded its policing of discriminatory practices to include the ways in which banks use models and algorithms to advertise and sell products and services.21 Meanwhile, some policymakers in the European Union22 and Asia23 have passed regulations that allow customers to request an explanation for any decision generated by AI and learn how their personal data was used to determine it.
Many regulators are taking a pragmatic approach, relaying that there is no “one size fits all” formula to assess these trade-offs. Instead, they suggest that banks should weigh the purpose of the model, the environment in which it will be deployed, and the goal of explainability. Some have indicated that it may be acceptable for banks to use opaque models to test theories, anticipate liquidity needs, or identify trading opportunities, so long as they use more interpretable models when acting on predictions.24
Other instances where explainability may not be a priority include optical character recognition (OCR) systems that extract information from scanned documents and natural language processing technologies that wade through contracts and legal agreements.25 Similarly, banks may not need to seek a high degree of explainability for algorithms that yield accurate outcomes when identifying fraudulent transactions.26
A playbook for implementing XAI
Implementing XAI more broadly across the enterprise is a multifaceted, multistep process, requiring potential changes to data sources, model development, interfaces with various stakeholders, governance processes, and engagement with third-party vendors. However, this may be easier said than done, since there are no commonly accepted practices to delineate how much explainability is needed for different machine learning applications, or which techniques should be pursued in light of those considerations. Nevertheless, several goals should be central to banks’ implementation of XAI:
- XAI should facilitate an understanding of which variables or feature interactions impacted model predictions, and the steps a model has taken to reach a decision (see the sketch following this list).
- Explanations should provide information on a model’s strengths and weaknesses, as well as how it might behave in the future.27
- Users should be able to understand explanations—they should be intuitive and presented at a level of simplicity suited to the technical knowledge and vocabulary of the target audience.28
- In addition to insights on model behavior, XAI processes should shed light on the ways in which outcomes will be used by an organization.29
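As a simple illustration of the first goal, the sketch below uses permutation importance, which measures how much a trained model’s performance degrades when each feature is shuffled, to indicate which variables most influenced its predictions. The synthetic data and model choice are placeholders for illustration only.

```python
# Permutation importance: shuffle each feature and measure the drop in held-out
# accuracy to see which variables the model relies on most.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=8, n_informative=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = RandomForestClassifier(random_state=1).fit(X_train, y_train)

# Repeat the shuffling several times per feature for stable estimates.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=1)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: importance={result.importances_mean[i]:.3f} "
          f"(+/- {result.importances_std[i]:.3f})")
```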
Establishing XAI as a formal discipline can put banks on the fast track to achieving these objectives. This will likely mean introducing new policies and methods, from the premodeling stages to postdeployment monitoring and evaluation. It will also require every stakeholder who contributes to AI model development to act purposefully with each decision they make. For example, developers should apply explainability principles to their choice of training and input data for model prototypes. Instead of focusing solely on datasets that will maximize performance, they should also consider whether the input or training data may perpetuate hidden bias (e.g., historical lending data may favor certain demographics that had easier access to credit), whether the data contains customers’ personal information, and whether it spans a long enough timeframe to capture rare or unusual events.
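As one example of such a premodeling check, the minimal sketch below compares approval rates across demographic groups in a toy historical lending dataset; the column names and data are hypothetical. A large gap does not prove discrimination, but it signals that the data may encode unequal historical access to credit and warrants review before modeling.

```python
# Premodeling data check: compare historical approval rates across groups.
import pandas as pd

# Toy stand-in for historical lending data; in practice this would be the
# candidate training set with real (hypothetical here) "group" and "approved" columns.
history = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "approved": [1,   1,   1,   0,   1,   0,   0,   0],
})

approval_rates = history.groupby("group")["approved"].mean()
print(approval_rates)

# A wide gap is a signal of potential hidden bias, not a verdict.
gap = approval_rates.max() - approval_rates.min()
print(f"Approval-rate gap across groups: {gap:.2f}")
```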
Model development teams should also conduct a preliminary assessment of model performance and interpretability, to get a sense of how accurate the model will be compared to simpler and more traditional analysis methods. This deliberation should begin in the premodeling stage, so designers can tailor machine learning architecture to the target explanation. In some cases, banks may want the model to be transparent to all users, and will prioritize an interpretable design (“glass box,” or “ante-hoc explainability”30). In others, they may build a complex model, and either apply XAI techniques to the trained model (post-hoc explainability) or create a surrogate model that emulates its behavior with easier-to-follow reasoning.
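A minimal sketch of the surrogate approach, assuming a gradient-boosted classifier stands in for the complex model: a shallow decision tree is fit to the complex model’s predictions rather than the original labels, and its fidelity (how closely it tracks the complex model) is reported alongside its human-readable rules. The data and model choices are illustrative assumptions.

```python
# Post-hoc global surrogate: train an interpretable tree to imitate a black box.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=6, random_state=42)

# "Black box" model trained on the actual labels.
black_box = GradientBoostingClassifier(random_state=42).fit(X, y)

# Surrogate trained to reproduce the black box's outputs, not the original labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=42)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how closely the interpretable surrogate tracks the complex model.
fidelity = accuracy_score(black_box.predict(X), surrogate.predict(X))
print(f"Surrogate fidelity to the black-box model: {fidelity:.2%}")
print(export_text(surrogate, feature_names=[f"feature_{i}" for i in range(6)]))
```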
Either way, banks should assess which techniques and tools are most helpful in advancing explainability. There are several factors that should drive these decisions, leading with the target stakeholder: regulator, risk manager or auditor, business practitioner, or customer (figure 3). For example, underwriting officers can be served well by counterfactual explanations, which show the degree to which different aspects of a customer’s application should be tweaked to change the outcome (e.g., increase income by a certain amount to gain loan approval).31 Other bank employees may need “an explanation of an explanation,”32 or visualizations that map out patterns and flag anomalies in the data, such as groups of individuals that may be inappropriately segmented for marketing campaigns. There are also varying levels of explainability that should be taken into account (see sidebar, “Varying levels of explainability”).
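To illustrate, the sketch below generates a counterfactual explanation for a hypothetical two-feature underwriting model (income and debt, in thousands of dollars): it searches for the smallest income increase that flips a declined application to an approval. Production tools would search over many features and enforce plausibility constraints; everything here is an assumption for illustration.

```python
# Counterfactual explanation sketch: find the income change that flips a decision.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training data in $000s: approval grows with income and falls with debt.
income = rng.uniform(20, 150, size=1000)
debt = rng.uniform(0, 60, size=1000)
X = np.column_stack([income, debt])
y = (income - 1.5 * debt + rng.normal(0, 10, size=1000) > 30).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

applicant = np.array([[45.0, 25.0]])  # declined application: (income, debt) in $000s
print("Original decision:", "approve" if model.predict(applicant)[0] else "decline")

# Increase income in $1,000 steps until the predicted decision flips.
counterfactual = applicant.copy()
while model.predict(counterfactual)[0] == 0 and counterfactual[0, 0] < 200:
    counterfactual[0, 0] += 1
print(f"Counterfactual: raising income to about ${counterfactual[0, 0]:.0f}k "
      "would change the outcome to approval")
```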