II.

Types of explainable AI

So, now that we know why we need XAI, how do we actually get explanations out of these algorithms? Well, first of all, make sure that you dress in an intimidating manner, then ask your questions in a serious, assertive way. After all, algorithms won’t spill their secrets so easily… Actually, while this might be disappointing for the gangster movie fans among you, AI completes tasks based on mathematics and statistics – meaning explanations also need to come from mathematical and statistical methods. The notions behind XAI methods are often intuitive to us, like trying out different genders on credit card algorithms to see what makes the algorithm change its decisions, but it’s their mathematical and statistical nature that makes them applicable to different problems and their conclusions clear.

As discussed above, many machine learning algorithms are black boxes. They’re black boxes not only in the sense that many companies keep them secret but also in the sense that, even when the entire machine learning model is available online, the calculations are so complex that we can’t make sense of them just by looking at them. Therefore, to understand how algorithms work, XAI methods tend to work with the parts that we have control over (the data and features that are provided to the algorithm) and the parts that we can observe and understand (the outputs).

Imagine that you’re a YouTube influencer (if you don’t have to imagine, please link to this course in your videos!) and you want to understand how YouTube’s algorithm works and what makes a video successful. How would you approach it? Well, there are a few options:

  • You could try to split your videos into parts and see what parts are more successful

  • You could try adding or removing certain types of content in your videos to see what kind of content contributes to increasing views

  • You could try changing different aspects of the video (for example length, lighting, background music, thumbnail) to see what changes would make a video surpass a certain number of views

  • You could completely mess up parts of the video (for example in one video you only say made up words, in another one you put the image upside down) to see what actually matters for views and what doesn’t

  • You could look at all of the videos you have published, see which ones were successful and which ones weren’t, and try to figure out what they have in common

  • You could black-out or crop parts of your video to see what parts reduce the number of views

Some of these options sound more realistic than others, but they could all give interesting information about what YouTube is using to rank videos. Actually, each one of the approaches above corresponds to one of the model-agnostic XAI methods we discuss below. In order, the bullet points correspond to LIME, SHAP, counterfactual explanations, permutation feature importance, t-SNE, and occlusion sensitivity. We focus on these in this chapter because, besides being widely used, the intuition behind them is easy to understand, as you can see from the YouTube example. In addition to these model-agnostic methods, which can be used for any model, we also explore a few model-specific methods such as layer-wise relevance propagation and Grad-CAM, as well as a library that combines some of these approaches (ELI5).

As you read the names of the XAI methods, you may be thinking: “Wow, so many fancy names to impress my friends! But do I really need so many? LIME has a cool name, wouldn’t that be enough?” Well, if you’re taking this course to impress your friends, maybe LIME is indeed enough. However, if you really want to apply XAI methods, knowing a few and their specialties will help you choose a more suitable method for the task you’re working on. Again, each of these methods makes a data-based guess at how the algorithm works and simplifies it in the process. Because each method only gives you a specific insight into parts of the model, mastering and applying multiple methods is more likely to give you trustworthy information. If you can only look at what’s inside a building through its windows, you might as well peek from multiple windows to get a sense of what the inside looks like. Also, don’t let the YouTube example make you think that XAI is only useful from a user (influencer) perspective – it’s likely that YouTube itself needs to rely on these methods almost as much as its users do to understand how its recommender system works.

Model-agnostic and model-specific methods

XAI methods can analyze models in two ways: model-agnostic or model-specific. Model-agnostic methods aren’t limited by the characteristics of an AI algorithm and can be used to reveal the decision process of any black-box model. Simply put, they’re a generic tool for developers to analyze any model. Model-specific methods, on the other hand, are tailored to the characteristics of a particular type of algorithm and can provide a more detailed understanding of how that algorithm reaches its decisions. However, these methods require a higher degree of skill from developers to analyze the models.


Different XAI methods have varying perspectives on analyzing models and data. For example, the SHAP values of a model can be used to identify the positive and negative impacts of data features on the model outcome – for instance, do an applicant’s salary and credit score push a loan decision toward approval or rejection, and how large is their effect? Another example is counterfactual explanations, where different cases can be generated that show the inputs with which the model gives a specific result. XAI methods can also be categorized into global or local explanations.

Local and global explainers

Local explainers explain the model’s behavior in the making of a specific decision, whereas global explainers look at the model’s general behavior. To understand this further, let’s consider an example that highlights the difference between local and global explanations. Imagine that you’re using a book app that recommends a list of books to read. In this case, a local explanation of the AI model recommending the books can tell you why a particular book is suggested to you based on your past reading habits or lists that you’ve created. A global explanation, on the other hand, looks at all of the outcomes of a model and how it behaves in general, and at what parts of the input are the most significant for its outputs – for example, do a given user’s favorited books impact the generated recommendations more than their most commonly read genre? A global explanation puts the model inputs and their respective outcomes together in such a way that it can let you know how they all play a role in the book recommendations that the model generates. To summarize, global explainers give visibility into the model's behavior in general, whereas local explainers can be used to dig into specific cases of the model's usage and the reasoning behind them.

Explainable AI methods


Here we present some examples of the numerous methods out there. The decision about which of these tools to use depends on the situation and, for instance, the architecture that the model is built on. If you’re completely new to all of these methods, it’s good to simply know what methods are available so that you can study them more closely if needed.

Local Interpretable Model-agnostic Explanations (LIME) is a widely popular technique for interpreting the outputs of black-box models (5). It makes use of a local surrogate model. As the name suggests, LIME gives a local explanation, which means that it considers a subset of data when approximating explanations for model predictions – in other words, it explains a single case of the model's inputs and outputs at a time. LIME is easy to use and has gained major uptake because it can interpret outputs irrespective of the type of black-box model (meaning it’s model-agnostic). LIME explains the outcome generated by an AI model for a particular input (sample) by modifying the input and observing how the outcome changes, which reveals how important different parts of the input are for the outcome.

LIME can be applied to many different types of data. Let’s take a look at an example of explaining a model that’s used for the classification of images. As you can see in the illustration, to explain the model outcome for the image input, LIME takes the image and segments it into superpixels (groups of pixels) to reduce the image to smaller interpretable regions. LIME then generates a new dataset containing slightly perturbed, or modified, versions of these regions and feeds this new dataset into the black-box model to see what it would output for those instead. Next, it trains a simple interpretable model – this can be any model that is inherently explainable – on the perturbed superpixel data together with the corresponding outputs of the black-box model. The interpretable surrogate model can then be used to create explanations for the black-box model, showing which regions contribute most to the outcome provided by the AI model.
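To make this concrete, here is a minimal sketch using the open-source lime library. Because a runnable image example would need a trained image classifier, the sketch uses tabular data (the iris dataset) instead – the perturb-and-fit-a-surrogate idea is the same. The model and parameter choices are illustrative assumptions, not part of the example above.

```python
# A minimal sketch using the open-source `lime` library with scikit-learn.
# The dataset, model, and parameters are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Train a "black-box" model that we want to explain.
data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# The explainer perturbs samples around one instance and fits a simple,
# interpretable (linear) surrogate model to the black box's outputs locally.
explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    discretize_continuous=True,
)

# Explain a single prediction (a local explanation).
instance = data.data[0]
predicted = int(model.predict(instance.reshape(1, -1))[0])
explanation = explainer.explain_instance(
    instance, model.predict_proba, labels=(predicted,), num_features=4
)
for feature, weight in explanation.as_list(label=predicted):
    print(f"{feature}: {weight:+.3f}")
```

For images, the library’s lime_image.LimeImageExplainer follows the same steps, perturbing superpixels instead of table columns.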

SHAP (Shapley Additive Explanations) is a model-agnostic technique that identifies the importance of each feature in a certain prediction. Feature importances are computed with so-called Shapley values that originate from game theory. The concept is based on distributing a reward (here, a prediction of an AI model) fairly among players (here, the features). Since different players contribute different amounts to the team’s win, the reward each receives should reflect their contribution. Shapley values identify how features contribute to the overall decision by comparing the outputs produced when each feature is present with those produced when it is absent in the steps leading to the decision (6). The picture below shows how SHAP creates an explanation for an image input (note, though, that SHAP can also be used, and often is, on data instances in a table or words in a text document). To explain the AI model output for the input image using SHAP, the image is modified pixel by pixel: each pixel is treated as a feature whose importance to the output is derived. The Shapley value of each feature is calculated to establish its contribution to the output generated by the model. After deriving the contribution of each feature, SHAP generates a visual explanation: a pixelated version of the input image showing the important features (pixels) that contribute to the AI decision in red and the pixels not relevant to the AI decision in blue.
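As a concrete illustration, here is a minimal sketch using the open-source shap library on a tabular regression task; the dataset and model are illustrative assumptions, and the same idea applies to the loan example above and, with pixels or pixel groups as features, to images.

```python
# A minimal sketch using the open-source `shap` library on tabular data.
# The dataset and model are illustrative assumptions; for images, pixels
# (or groups of pixels) play the role of the features.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree-based models.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:1])   # explain the first sample

# One contribution per feature: positive values push the prediction up,
# negative values push it down; together with the base value they add up
# to the model's output for this sample.
for name, value in zip(X.columns, shap_values[0]):
    print(f"{name}: {value:+.2f}")
```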

Counterfactual explanations are a local, model-agnostic XAI method that explains an AI model’s output by answering the question “How should my input X be different to achieve my desired outcome Y?” (7). Let’s try to make sense of it with a diagram. The diagram shows a hypothetical use case where a person applies for a bank loan and the application is rejected. As the figure suggests, we can generate counterfactual explanations such as “had you increased your income by $10,000, you would have received the loan” or “had you increased your income by $5,000 and improved your credit score, you would have received the loan”. Both of these explanations show how the input features (annual income and credit score) would have to differ from the currently observed feature values in order for the model to generate the desired output (an approved loan).
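The idea can be sketched with a brute-force search: train a toy loan model and look for the smallest change to a rejected applicant’s features that flips the decision. The synthetic data, approval rule, search ranges, and “effort” measure below are all made up for illustration; dedicated libraries such as DiCE or Alibi generate counterfactuals far more systematically.

```python
# A minimal, hypothetical sketch of the counterfactual idea. The synthetic
# data, the approval rule, the search ranges, and the "effort" measure are
# all made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic applicants: [annual income in $1000s, credit score]
X = np.column_stack([rng.normal(60, 15, 500), rng.normal(650, 60, 500)])
y = ((X[:, 0] > 55) & (X[:, 1] > 640)).astype(int)   # made-up approval rule
model = LogisticRegression(max_iter=1000).fit(X, y)

applicant = np.array([50.0, 620.0])                  # a rejected applicant
print("original decision:", model.predict([applicant])[0])   # expected: 0

# Brute-force search over small feature increases for the "cheapest" change
# that flips the model's decision to an approval.
best = None
for extra_income in np.arange(0, 31, 1):             # up to +$30k
    for extra_score in np.arange(0, 101, 5):         # up to +100 points
        candidate = applicant + [extra_income, extra_score]
        if model.predict([candidate])[0] == 1:
            effort = extra_income / 30 + extra_score / 100
            if best is None or effort < best[0]:
                best = (effort, extra_income, extra_score)

if best is not None:
    print(f"counterfactual: raise income by ${best[1]:.0f}k and credit "
          f"score by {best[2]:.0f} points to get the loan approved.")
```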

Permutation feature importance (8) is a global, model-agnostic approach. It evaluates the predictive power of each individual input feature by generating a so-called “feature importance” score. This score is computed by randomly shuffling the feature’s values and observing the effect that this shuffling has on the prediction error. The diagram below illustrates the procedure. It starts after the AI model has been trained and some outcomes (predictions) have been generated with it; these serve as the baseline for later comparisons. After this initial training and observation, the next step is to measure the importance of a chosen feature. The feature is selected from the same dataset used for training and permuted (reshuffled); this step can be seen in the second image of the diagram, where feature 1 has been reshuffled. Next, predictions are made again on the reshuffled dataset and compared with the baseline to derive the importance of the feature. The importance score is obtained by averaging the differences between the baseline outcome and the new (permuted) outcome over several reshufflings. In the end, the importance scores are compared to understand which features are the most important.

Permutation feature importance. Inspiration taken from Permutation Feature Importance: Deep Dive
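In practice, scikit-learn ships an implementation of this procedure. Here is a minimal sketch; the dataset and model are illustrative assumptions.

```python
# A minimal sketch using scikit-learn's built-in permutation importance.
# The dataset and model are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature 10 times on held-out data and measure the average
# drop in accuracy compared with the unshuffled baseline.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Rank features by how much shuffling them hurts the model.
for idx in result.importances_mean.argsort()[::-1][:5]:
    print(f"{X.columns[idx]}: {result.importances_mean[idx]:.3f}")
```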

t-distributed Stochastic Neighbor Embedding (t-SNE) is an unsupervised statistical tool for dimensionality reduction (9). It maps high-dimensional data into an alternative low-dimensional representation (typically a two- or three-dimensional space), which makes visual interpretation of these data points easier (10).

Let's consider a simple example to understand why we do dimensionality reduction. Suppose you have a dataset containing information about houses, where each house is described by four features: size, number of rooms, price, and distance to the closest supermarket. Now, imagine you want to visualize the relationship between these houses and their prices. However, plotting a graph with four dimensions is challenging, as we can only visualize up to three dimensions. This means that we can't directly visualize the four features on a single graph.

This is where dimensionality reduction techniques come into play. By applying dimensionality reduction, we can compress the information from three of the features (size, number of rooms, and distance to the closest supermarket) into a lower-dimensional representation that can be visualized in one or two dimensions. This compressed representation allows us to explore the relationship between these three features and house prices in a way that humans can visualize.

By doing this, dimensionality reduction can help identify clusters of data points with similar features and can reveal the underlying structure of the data for model interpretation. The diagram below illustrates this with image data of dogs, horses, and cats, showing how an AI model’s view of the different animals can be laid out on a 2-D map.
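A minimal sketch with scikit-learn’s t-SNE implementation, using the small handwritten-digits dataset as a stand-in for the animal images (an illustrative assumption):

```python
# A minimal t-SNE sketch: reduce 64-dimensional digit images to a 2-D map.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()                                  # 1797 images, 64 features each
embedding = TSNE(n_components=2, random_state=0).fit_transform(digits.data)

print(digits.data.shape)   # (1797, 64) -> high-dimensional input
print(embedding.shape)     # (1797, 2)  -> 2-D map where similar digits land close together

# To visualize the map (optional, requires matplotlib):
# import matplotlib.pyplot as plt
# plt.scatter(embedding[:, 0], embedding[:, 1], c=digits.target, cmap="tab10")
# plt.show()
```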


Occlusion sensitivity is a model-agnostic, visualization-based explainability method. It generates local explanations independently of the underlying model by covering parts of the input (in the case of an image, groups of pixels) with an occlusion mask or patch and observing how the model’s class prediction changes. The difference between the model’s prediction for the occluded input image and for the original input image is shown as a saliency (feature) map or heatmap, and can also be expressed numerically using feature relevance metrics. The bigger the impact that masking a certain part of the image has on the predicted class, the more relevant that part is considered for the explanation. The figure below demonstrates the procedure with a pet classification task using a dog image. To explain the outcome of the model for the image using occlusion sensitivity, an occlusion mask is moved across the image to observe the impact on the probability of the class prediction. At the end, a heatmap is generated highlighting the regions that are relevant to the model outcome. The heatmap is visualized alongside its color spectrum scale, which indicates how much different regions contribute to the outcome of the model.

Occlusion sensitivity, inspired by Matthew D. Zeiler & Rob Fergus (11)
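The procedure is simple enough to sketch by hand. The snippet below slides a small occlusion patch over the tiny 8×8 digit images and uses a scikit-learn classifier as the “black box”; the dataset, patch size, and model are illustrative assumptions, and real implementations work the same way on larger images and CNNs.

```python
# A minimal, hypothetical occlusion-sensitivity sketch on 8x8 digit images,
# with a scikit-learn classifier standing in for the black-box model.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
model = LogisticRegression(max_iter=2000).fit(digits.data, digits.target)

image = digits.images[0]                     # an 8x8 image of a handwritten "0"
label = digits.target[0]
base_prob = model.predict_proba([image.ravel()])[0][label]

patch = 2                                    # size of the occlusion mask
heatmap = np.zeros((8 - patch + 1, 8 - patch + 1))
for row in range(heatmap.shape[0]):
    for col in range(heatmap.shape[1]):
        occluded = image.copy()
        occluded[row:row + patch, col:col + patch] = 0   # mask this region
        prob = model.predict_proba([occluded.ravel()])[0][label]
        heatmap[row, col] = base_prob - prob  # big drop = important region

print(np.round(heatmap, 2))   # larger values mark regions the model relies on
```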

Layer-wise Relevance Propagation (LRP) is a local, model-specific method for determining how each of a neural network’s inputs affects a single output. It was originally designed to create heatmaps showing which input pixels contribute most to an image classifier’s choice of class (12). Training an image classifier produces “weights” for each of the neural network’s connections. When the model is then used to classify a particular image, those connections get “activated” to different degrees. LRP uses these weights and activations to explain an individual outcome. To do this, it starts from the output and works backward, layer by layer, all the way through the network to the individual pixels of the input image. This “relevance propagation” assigns a relevance score to each pixel. Once the relevance scores have been propagated to the input layer, they can be shown as a heatmap whose coloring indicates the features that contributed the most to the classification decision made by the model.

Layer-wise Relevance Propagation (13). Inspiration taken from The Story of Heads.
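To give a feel for the backward pass, here is a minimal, hypothetical sketch of the epsilon-LRP rule applied to a tiny two-layer ReLU network with random weights; libraries such as Zennit or Captum implement LRP for real deep networks and images.

```python
# A minimal, hypothetical sketch of the epsilon-LRP rule on a tiny
# two-layer ReLU network (random weights, zero biases).
import numpy as np

def lrp_linear(a, W, b, relevance_out, eps=1e-6):
    """Redistribute the relevance of a linear layer's outputs to its inputs."""
    z = a @ W + b                                            # pre-activations
    s = relevance_out / (z + eps * np.where(z >= 0, 1.0, -1.0))  # stabilized ratios
    return a * (W @ s)                                       # relevance of each input

# A toy network: 3 inputs -> 4 hidden ReLU units -> 1 output score.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

x = np.array([1.0, -0.5, 2.0])
h = np.maximum(0, x @ W1 + b1)       # hidden activations
y = h @ W2 + b2                      # network output (the score to explain)

# Propagate the output score backwards, layer by layer, to the inputs.
R_hidden = lrp_linear(h, W2, b2, y)
R_input = lrp_linear(x, W1, b1, R_hidden)

print("output:", y)
print("input relevance scores:", R_input)   # which inputs mattered most
print("sum of relevances:", R_input.sum())  # approximately equals the output
```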

Gradient-weighted Class Activation Mapping (Grad-CAM) is a local, model-specific method used to explain convolutional neural networks (CNNs), which are commonly used for image classification. Similar to LRP, it produces heatmaps highlighting the image regions most important for the model’s prediction (14). However, its working mechanism is different. The last convolutional layer of a CNN image classifier contains “feature maps” representing learned features such as edges and other shapes. (See an example visualization of feature maps here.) For a particular input image, Grad-CAM weights these feature maps by the gradients of the predicted class with respect to them and combines them to decipher each image region’s relevance (15, 16, 17).

Grad-CAM
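A minimal, hypothetical sketch of this procedure in PyTorch, using an untrained torchvision ResNet-18 and a random tensor standing in for a preprocessed image (so it runs without downloads); libraries such as pytorch-grad-cam wrap the same steps up more conveniently.

```python
# A minimal, hypothetical Grad-CAM sketch with PyTorch and an untrained
# torchvision ResNet-18. A random tensor stands in for a real image.
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
activations, gradients = {}, {}

# Hook the last convolutional block to capture its feature maps and gradients.
def save_activation(module, inp, out):
    activations["maps"] = out
    out.register_hook(lambda grad: gradients.update({"maps": grad}))

model.layer4.register_forward_hook(save_activation)

image = torch.randn(1, 3, 224, 224)        # stand-in for a preprocessed image
scores = model(image)
class_idx = scores.argmax(dim=1).item()
scores[0, class_idx].backward()            # gradients of the chosen class score

# Weight each feature map by the average gradient flowing into it,
# then combine the maps and clip negatives to get the Grad-CAM heatmap.
maps = activations["maps"].squeeze(0).detach()            # (512, 7, 7)
weights = gradients["maps"].squeeze(0).mean(dim=(1, 2))   # one weight per map
cam = torch.relu((weights[:, None, None] * maps).sum(dim=0))

print(cam.shape)   # (7, 7) heatmap; in practice it's upsampled to the image size
```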

Explain it Like I’m 5 (ELI5) is an explainability tool that seeks to offer intuitive explanations for the outcomes of AI models in a human-friendly manner. It provides an explanation by considering the relevant features of the input that contribute to the final outcome of the model. The ELI5 library relies on explainable AI methods such as Shapley values, permutation feature importance, and LIME for generating explanations. The illustration below depicts the general procedure for generating explanations of AI model decisions. Suppose we trained an AI model to determine which of three varieties an iris flower belongs to: Iris setosa, Iris versicolor, or Iris virginica. We can select a particular outcome – a prediction from our dataset where the model predicts that an iris flower with a sepal length of 5.1 cm and a petal length of 1.4 cm is Iris setosa. We then pass the prediction and the associated features (in this case, sepal length and width as well as petal length and width) to ELI5, as in the illustration, to provide an explanation for the prediction. ELI5 returns the contribution of each feature (sepal length, sepal width, petal length, and petal width), which lets us see the most important features that led to the outcome generated by the AI model. This is shown with a positive value in the contribution column in the figure below.

ELI5. Inspiration taken from ELI5 documentation.
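In code, the workflow looks roughly like the sketch below, using the eli5 Python package with a scikit-learn model trained on the classic iris dataset; the model choice is an illustrative assumption, and eli5 needs a compatible scikit-learn version.

```python
# A minimal sketch with the `eli5` library and a scikit-learn model trained
# on the iris dataset. The model choice is an illustrative assumption.
import eli5
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data = load_iris()
model = LogisticRegression(max_iter=1000).fit(data.data, data.target)

# Explain a single prediction: per-feature contributions toward each class.
explanation = eli5.explain_prediction(
    model, data.data[0],
    feature_names=data.feature_names,
    target_names=list(data.target_names),
)
print(eli5.format_as_text(explanation))
```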

Surrogates and visualization

XAI methods produce explanations that help us understand the relevance of the data and the model structure. Different methods produce different types of explanations. Overall, XAI methods can be divided into two main categories based on what exactly is explained during the process:

  1. Surrogate model-based explainers generate explanations from an approximation of the black-box model: a simpler model trained to mimic the original model’s behavior. Surrogate models are mostly inherently interpretable.

  2. Visualization-based explainers use visualization techniques (heatmaps, graphs, etc.) on the original black-box model to explore its internal workings without building a surrogate representation.

Case study

Litter identification

XAI methods can characterize the reasoning logic of AI models by identifying and quantifying the features in the data that the AI model used to reach a decision. For instance, let’s assume that we want to build an AI model that can identify litter. This type of AI model could be used in autonomous vehicles that pick up litter in urban areas, or it could be integrated into cameras used in recycling plants to separate different materials automatically. To build this model, we need a dataset that depicts the most common litter that can be found in public areas. We could collect the data ourselves by taking pictures of litter in the streets, but this would take time. Luckily, a variety of datasets containing images of litter objects, like TrashNet, are already available online. Below is a figure that shows a few samples of litter from the TrashNet dataset.

Data samples taken from open TrashNet dataset https://github.com/garythung/trashnet

As we build our model, we face a few problems. First, while the litter objects are visible, there is background information that introduces noise into the model’s inference process. Second, some litter objects aren’t fully visible, meaning the AI model can’t learn a full representation of them – even with more samples. Third, litter objects can be crushed, crumpled, and mixed with environmental factors like mud, so the dataset can’t depict all the possible situations that may be encountered. Assuming that the AI model is built, we’re then interested in knowing which features in the data the AI model used for learning. By using XAI methods, we can identify these data features.

Features quantified using LIME

Here, we can see that the data features that the model used for learning come not just from the litter objects but also from some of the background. This is important for determining how robust our AI model’s decisions are. It also suggests that, for critical applications such as autonomous vehicles, this model may be risky to deploy: it could easily confuse litter with other objects and cause an accident, for example by mistaking a road line for litter.

XAI can also shed light on how changes in the data – whether intentional or accidental – affect the model. For instance, consider the case in which a photo of litter is taken but the file gets corrupted as it’s stored (as shown in the figure below):

From the figures above, we can observe that slight changes in the data used for training can cause the AI models to learn from different data features. As a result, if the AI model presents abnormal behavior that causes a problem, it’s then possible to trace the source that made the AI model produce the wrong prediction. This constitutes explainable AI because it allows us to understand what parts of the image the algorithm is using to make decisions. By changing the image, we can check if the algorithm still makes the right decisions and uses the correct parts of the image to identify trash. If the file corruption also changes the important aspects of the image, this is an indication that the model can probably be improved.

Next section
III. How reliable are XAI explanations?