I.

The need for explainable AI

When we ask artificial intelligence to perform a certain task, the algorithm learns what aspects of the information given to it (so-called features) are useful and how to use them to perform the task. For instance, when a bank algorithm is deciding whether or not to approve credit for a client, it may decide that the client’s monthly income is a relevant factor. However, although people sometimes compare neural networks to the brain, the process that algorithms and humans follow to determine what’s important for a task can actually be quite different. For instance, the algorithm may decide that gender is an important factor when granting credit, while banks in most countries would be legally forbidden from discriminating against clients based on gender. If this seems far-fetched, it really isn’t. Remember the Apple case in Chapter 2? Well, one of the people that the algorithm reportedly discriminated against was Janet Hill, the wife of the company’s co-founder, Steve Wozniak (1).

How do we know that Apple’s algorithm was using gender to make credit decisions? How do we know that it was discriminating? Well, in this case, Wozniak (and others) spoke to the media after comparing how the algorithm responded to their applications versus those of their partners, who had the same income and application data. But what if you’re affected by the algorithm and you’re not a big-shot public figure? There has to be a better way, right? This is where explainability and explainable AI methods come in. They help us understand how algorithms perform their tasks, without having to rely on real-world consequences or individuals’ experiences to get this information. If Apple had used one of the methods described in this chapter before deploying their credit card algorithm, maybe they could have avoided situations like this one. In practice, an explainable AI algorithm may do exactly what Steve did: feed different kinds of information to the algorithm and see how it reacts (the counterfactual explanations mentioned in section 3.2). But by doing this systematically and without requiring actual individual data, it can alert machine learning developers to issues before the algorithm is used in practice.
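
To make this concrete, here is a minimal sketch of that kind of probing. The credit model, feature names, and threshold logic below are invented purely for illustration; they are a hypothetical stand-in, not Apple’s actual system.

```python
# A minimal sketch of counterfactual-style probing. The model below is a
# hypothetical stand-in; its bias is hard-coded only so the probe has
# something to find.

def credit_model(applicant):
    """Hypothetical 'black-box' model: returns an approved credit limit."""
    limit = applicant["monthly_income"] * 10
    if applicant["gender"] == "female":  # an unwanted learned bias
        limit *= 0.5
    return limit

def probe_feature(model, applicant, feature, alternative_value):
    """Re-run the model with one feature changed and compare the outputs."""
    counterfactual = dict(applicant, **{feature: alternative_value})
    return model(applicant), model(counterfactual)

applicant = {"monthly_income": 5000, "gender": "female"}
original, changed = probe_feature(credit_model, applicant, "gender", "male")
print(f"Original limit: {original}, after flipping gender: {changed}")
# If the two limits differ, the protected feature influenced the decision,
# which is a red flag developers can catch before deployment.
```

Real XAI tools do this far more systematically, and over synthetic rather than real individuals’ data, but the underlying idea is the same.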

As you’ll see below, explainable AI (XAI) can be used to understand and improve algorithms by developers, but can also be an important tool to improve user experience. Sometimes people are more willing to use AI if they understand how it works. It can also have legal implications by making companies more accountable for the algorithms they use – they can no longer use the “black box” excuse if something goes wrong. In this context, it may seem that XAI is always useful. So you may be tempted to channel your inner Oprah Winfrey and say: “You get XAI, you get XAI, everybody gets XAI!”

Close-up of Oprah Winfrey with her hands spread out


However, XAI methods may also require a lot of processing power (often more than the AI algorithm itself) and can be complex to build, maintain, operate, and interpret. This involves both financial and environmental costs – so it’s also good to know when these methods are most useful. Therefore, this chapter will introduce you to XAI, the different methods for XAI, and when XAI methods should be used.

If you recall from the first chapter, XAI (and the evaluation of whether it’s needed) comes into the picture already at the stage of building a machine learning model, and again when training it.

Chart showing that explanation deployment happens in model training and XAI deployment happens in model deployment

Finally, it’s important to note that the balance between the performance of an AI model and its explainability can vary depending on the context of the application. For example, there are time-critical applications such as fraud detection where performance and immediate responses might be more important than explainability. Likewise, if a construction company deploys a machine learning algorithm to warn its workers when there is a safety hazard, it’s probably more important that the worker gets the warning immediately than that they receive an explanation of how the hazard was determined.

Commonly used AI algorithms as a basis for XAI

AI algorithms identify important data features from the samples used for learning. Here, a data feature is a pattern or characteristic in the data that helps the AI model make a decision. AI algorithms can learn in a supervised or unsupervised manner. As a reminder from Chapter 1, supervised learning means that the training data for the algorithm is labeled, i.e., mapped to the “correct” output. In unsupervised learning, the AI algorithm itself attempts to discover and learn from implicit patterns in the data.

While some “black-box” AI algorithms are very hard to interpret without additional tools, not all of them are like that. Some others, called “glass-box” models, are quite interpretable by design.

So why not just always use glass-box algorithms? Well, there’s usually a trade-off between complexity and interpretability:

  • Complex models may have higher accuracy on problems with complex and nonlinear relationships, but they’re harder to interpret. They also usually take more time to compute.

  • Highly interpretable models, on the other hand, are more straightforward and represent simpler relationships, making them easier to explain and compute. A human can understand their decisions or predictions just by inspecting the structure of the model, as the short sketch after this list illustrates.
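
As a concrete (and heavily simplified) illustration, here is a sketch of inspecting a glass-box model. The credit data and feature names are made up, and logistic regression stands in for whatever interpretable model you might actually choose.

```python
# A minimal sketch of reading a 'glass-box' model directly. The data and
# feature names are invented; logistic regression is just one example of
# an interpretable model.
from sklearn.linear_model import LogisticRegression

# Features: [monthly income in thousands, years as a customer].
X = [[2, 1], [3, 2], [5, 4], [6, 5], [1, 1], [7, 6]]
y = [0, 0, 1, 1, 0, 1]  # 1 = credit approved, 0 = rejected

model = LogisticRegression().fit(X, y)

# Each coefficient tells us how strongly a feature pushes the decision
# towards approval (positive) or rejection (negative).
for name, coefficient in zip(["income", "years_as_customer"], model.coef_[0]):
    print(f"{name}: {coefficient:+.2f}")
```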

As you probably guessed, the best AI algorithm depends on the application. Complex models aren’t always the most accurate. As a rule of thumb, start with the simplest model that gets the job done. For simple applications where explainability is important, choose an interpretable (glass-box) model. For problems that are both complex and require explainability, you can use a complex model and complement it with an XAI method, but that can come with the cost of extra intricacy and processing power requirements. We’ll explore several XAI methods in the next section (3.2).

But first, let’s examine four AI algorithms in more detail so you’ll get an idea of their workings and what the accuracy-interpretability tradeoff means in practice. Neural networks, support vector machines, decision trees, and random forests differ in their accuracy and interpretability but are all found in various real-world applications. Some of them also power XAI methods, acting as “surrogate” models, standing in to explain a more complex model.

Algorithm | Complexity | Interpretability
Neural networks | Very high | Very low
Random forest | High | Low
Support vector machine (SVM) | Medium | Medium
Decision trees | Low | High

Inspired by https://ieeexplore.ieee.org/document/8844682 (2)


Neural networks

Neural networks are a type of machine learning algorithm designed to loosely emulate the structure and function of the human brain, although there are significant differences. They’re used for a wide range of applications, including image and speech recognition, natural language processing, and predictive analytics.

Neural networks are composed of interconnected nodes called “neurons”, each of which computes an output from its inputs using simple parameters called “weights”. The output of one neuron can be used as an input for another neuron, often forming several layers with different roles. This layered structure allows the network to perform complex computations, learn deep features, and support sophisticated decision-making.

For a neural network to be useful, the neurons’ weights have to be carefully adjusted. This is done by feeding it with lots of training data. Considering each entry in the data, a training algorithm tweaks the weights, layer by layer, to minimize the error between the neural network’s predictions and the correct results.
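
The sketch below shows this training loop on a toy scale: a tiny hand-rolled network in NumPy rather than a real framework, trained on made-up fruit measurements purely for illustration.

```python
# A minimal sketch of training a tiny neural network with NumPy.
# The fruit data is invented; a real system would use a framework
# and far more data.
import numpy as np

rng = np.random.default_rng(0)

# Toy inputs: [redness, size]; label 1 = apple, 0 = pineapple.
X = np.array([[0.9, 0.2], [0.8, 0.3], [0.1, 0.9], [0.2, 0.8]])
y = np.array([[1.0], [1.0], [0.0], [0.0]])

# Weights of a network with one hidden layer of 3 neurons.
W1, W2 = rng.normal(size=(2, 3)), rng.normal(size=(3, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(2000):
    # Forward pass: inputs flow through the layers.
    hidden = sigmoid(X @ W1)
    output = sigmoid(hidden @ W2)

    # Backward pass: tweak the weights, layer by layer, to shrink the
    # error between the predictions and the correct labels.
    delta_out = (output - y) * output * (1 - output)
    grad_W2 = hidden.T @ delta_out
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)
    grad_W1 = X.T @ delta_hidden
    W1 -= 0.5 * grad_W1
    W2 -= 0.5 * grad_W2

print(output.round(2))  # predictions move towards the labels 1, 1, 0, 0
```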

A neural network where inputs color, size, and texture pass three layers and become an output that is pineapple

In theory, any decision made by a neural network could be explained simply by looking at the weights of its neurons, but there are two practical issues. First, it’s hard to understand the role of an individual neuron or what the numbers in its input or output mean. Second, a neural network can have an enormous number of neurons and connections. Google’s Switch Transformer (3), currently one of the biggest, has more than a trillion parameters (that’s 1,000,000,000,000). Still tempted to start digging into a neural network?

You can learn more about the basics of neural networks from our free online course Elements of AI and dive deeper into them in Building AI.

Support vector machine (SVM)

SVM is a supervised machine learning algorithm used for classification in cases where there are two target (or output) classes, meaning that it identifies the most likely class that a particular input belongs to. To illustrate this, consider a person trying to differentiate between fruits – in this case, pineapples and apples in a basket containing both types. Each fruit type is a class, and a single fruit is a data input; the basket is the collection of samples that the SVM uses to learn to tell the fruit types apart. Each sample is described by attributes (features) such as color, size, texture, and taste, and is labeled with the type of fruit it is. Using this feature information, the SVM draws a separation line called the decision boundary that divides the data samples into two groups. Despite this apparent simplicity, SVM isn’t inherently explainable: the decision boundary is linear, but it’s often drawn in a space with more dimensions than there are attributes in the data. To explain the outputs of an SVM model, we need help from the explainability methods we’ll present later on in this chapter.
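
Here’s a minimal sketch of this in code, using scikit-learn’s SVC. The fruit measurements are made up, and a linear kernel is chosen just to keep the decision boundary easy to picture.

```python
# A minimal sketch of an SVM fruit classifier; the measurements are invented.
from sklearn.svm import SVC

# Features: [texture score, weight in kg]; labels: the fruit type.
X = [[0.2, 0.15], [0.3, 0.20], [0.8, 1.10], [0.9, 1.30]]
y = ["apple", "apple", "pineapple", "pineapple"]

# A linear kernel keeps the decision boundary a straight line in the original
# feature space; other kernels implicitly map the data into more dimensions.
model = SVC(kernel="linear")
model.fit(X, y)

print(model.predict([[0.25, 0.18]]))  # -> ['apple']
```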

A chart where a bag of fruits becomes a table where apples and pineapples are listed based on texture and weight and given a label. From the table, an arrow points to a regression where apples are on one side and pineapples on the other side of the line.

Decision trees

Decision trees are one of the most commonly used ‘explainable by design’ machine learning methods. A decision tree is a supervised machine learning algorithm that can be used for both classification (selecting from a set of options) and regression (calculating a value). Decision trees are commonly implemented to support autonomous decision-making in a wide range of applications, including criminal justice and finance. The structure of a decision tree is rather simple. It starts from a point on top called the root, and from there it breaks down into branches connecting further points called nodes. The key idea is that the whole dataset used for learning can be represented as a tree. A tree contains decision nodes and leaf nodes. Decision nodes can split further into other branches until a leaf node is reached. The path from the root to a leaf node makes up the overall decision, which is given once the leaf node is reached. For instance, following our example about apples and pineapples in a basket, we can use a decision tree to predict the type of fruit in the basket based on its color and size. The decision starts at the root of the tree and passes through two decision nodes, one for color and one for size. After this, the decision path reaches a leaf node naming the type of fruit, and the decision is made by the algorithm.
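
The following sketch, on made-up fruit data, shows both sides of this: training a small decision tree with scikit-learn and then printing its rules, which is exactly what makes it ‘explainable by design’.

```python
# A minimal sketch of a decision tree on invented fruit data; printing the
# learned rules shows why a tree is readable by a human.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [is_red (1 or 0), size in kg]; labels: the fruit type.
X = [[1, 0.20], [1, 0.25], [0, 1.20], [0, 1.40]]
y = ["apple", "apple", "pineapple", "pineapple"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

print(export_text(tree, feature_names=["is_red", "size"]))  # the decision rules
print(tree.predict([[1, 0.30]]))  # -> ['apple']
```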

Chart where a bag of pineapples and apples becomes a table where the fruits are listed separately with columns of color and size. An arrow from the table points to a figure that shows a decision logic: if the color red is true and if the size of 0.2 is true, the output is an apple. If the color of red is false, the output is a pineapple.

Random forest

Random forest is a machine learning method that uses a collection of decision trees to make a decision. The key idea is that each decision tree is trained on a subset of the data, and the data features used in the learning process are selected randomly. In our fruit basket example, the algorithm would build several decision trees out of randomly picked fruits in the bag. The random forest method can be used for both classification and regression. For classification, the random forest predicts the fruit type by combining the votes of the different decision trees; for regression, we’d get an average output value of the different trees, for example, “the likelihood of this being an apple is on average 95%”. In other words, a decision made by a random forest is aggregated from the individual predictions of each decision tree in the collection, and the end decision is determined via majority voting or averaging – each decision tree has a ‘vote’ to give for the end result. The main advantage of this method is that it provides higher accuracy than a single decision tree, especially for high-dimensional datasets with a lot of features (for which a bag of fruits is not the best example, we must admit), and it usually performs better on unseen data. In real life, a random forest could be used, for example, to predict an average bank user’s behavior and the services they’ll use based on previous data (4). It’s not uncommon to see random forests containing hundreds of decision trees – and you can probably already guess what that means in terms of explainability. For explaining random forest models, we also need the explainability methods presented later on in this chapter.
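
Below is a minimal sketch of the same made-up fruit example with scikit-learn’s random forest: 100 trees, each trained on a random slice of the data, with the final class decided by a majority vote.

```python
# A minimal sketch of a random forest on invented fruit data.
from sklearn.ensemble import RandomForestClassifier

# Features: [is_red (1 or 0), size in kg]; labels: the fruit type.
X = [[1, 0.20], [1, 0.25], [1, 0.30], [0, 1.20], [0, 1.30], [0, 1.40]]
y = ["apple", "apple", "apple", "pineapple", "pineapple", "pineapple"]

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(forest.predict([[1, 0.30]]))        # majority vote -> ['apple']
print(forest.predict_proba([[1, 0.30]]))  # roughly the share of trees voting for each class
```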

A chart that shows how a random forest works. An input of color=red and size=0.3 is passed to three decision trees, giving three predictions: one pineapple and two apples. The predictions go through majority voting into a final prediction, which is an apple.

Next section
II. Types of explainable AI