Using Counterfactual Instances for XAI

Seungjun (Josh) Kim
8 min read · Aug 22, 2021


Counterfactual explanation is a straightforward yet powerful method for improving the explainability of machine learning models.

Intro

The biggest shortcoming of many machine learning models and neural networks is their “black-box” nature: which feature was most influential in the output predicted for a given instance? XAI, which stands for Explainable Artificial Intelligence, is the area of study that tries to tackle this black-box problem.

There are broadly two streams of approaches to XAI. One is to directly explain the internal principles of how a model works. The other is more of a post-hoc approach, where you try to explain how a particular output was generated or predicted. An intuitive way to do this is to provide an example of another instance that shares similar characteristics with the instance of interest. We call this “Case-Based or Example-Based Explanations”.

The book “Interpretable Machine Learning” by Christoph Molnar introduces five different methodologies that fall under the category of Example-Based Explanations. Here, we touch upon one of them: counterfactual explanations.

Counterfactual explanations aim to explain the model based on the following simple and clear statement:

“If X had not occurred, Y would not have occurred”.

Here, we can imagine X being some feature in the data and Y being the output or predicted value for some instance. Unlike prototypes, which are another method within the Example-Based Explanations category, counterfactuals do not have to be actual instances from the training data; they can be a new combination of feature values. [1]

Why Useful?

How are counterfactual explanations useful? They enable us to understand which features should be changed, and to what extent, in order to reach the outcome we desire. Christoph offers two scenarios that illustrate this point.

  • Case 1) Say there is a person named Peter who applied for a loan but got rejected. He wants to understand why it was rejected and how he can flip the outcome (i.e., get his loan application approved next time by improving his profile in some way). What is the smallest change to the features (income, number of credit cards, age, …) that would change the prediction from rejected to approved? [1]
  • Case 2) Anna rents out an apartment. She lets a model she built decide the rent. She expected the rent to be 1,000 euros or more, but the model tells her it is 900 euros. By tweaking only the feature values under her control (built-in kitchen yes/no, pets allowed yes/no, type of floor, etc.), she discovers that if she allows pets and installs windows with better insulation, she can charge 1,000 euros. Anna had intuitively worked with counterfactual instances to change the outcome. [1]

Requirements

What are the requirements for counterfactual instances? Christoph explains that there are four.

• A counterfactual instance should produce the predefined prediction as closely as possible.

• A counterfactual instance should be as similar as possible to the original instance with regard to its feature values.

• A counterfactual instance should change as few features as possible.

• A counterfactual instance should have feature values that are likely and realistic. (e.g., a counterfactual instance that tells Anna in Case 2 to change her age is neither realistic nor within her control.)

Methodologies for Generating Counterfactual Instances

How do we generate counterfactual instances to aid our decision making? One method everyone can probably think of is the brute-force approach: manually tinkering with various features and seeing how the predicted output changes. This may be the most straightforward way, but it can be very time consuming and inefficient. Chapter 6 of the aforementioned Interpretable Machine Learning book introduces loss-function-based methods, which define a loss and identify counterfactual instances that minimize it (similar to other algorithms, such as gradient descent, that rely on loss functions and optimization).
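To make the brute-force idea concrete, here is a minimal sketch that perturbs a single feature of a toy scikit-learn classifier and watches the predicted probability move. The data, model, and feature grid are made up purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data and model standing in for the real black box.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

x = X[0].copy()  # the instance we want to explain
print("original P(class 1):", model.predict_proba(x.reshape(1, -1))[0, 1])

# Manually tinker with feature 0 over a grid and watch the prediction change.
for value in np.linspace(x[0] - 2, x[0] + 2, num=9):
    x_cf = x.copy()
    x_cf[0] = value
    proba = model.predict_proba(x_cf.reshape(1, -1))[0, 1]
    print(f"feature 0 = {value:+.2f} -> P(class 1) = {proba:.3f}")
```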

Wachter’s Method

In Wachter’s method, the loss function L is defined as:

L(x, x', y', λ) = λ · (f^(x') − y')² + d(x, x')

More specifically, the first part of the loss, (f^(x') − y')², is the quadratic distance between the model prediction for the counterfactual instance x' and the desired, predefined outcome y'.

The second part, d(x, x'), is the distance between the instance x and the counterfactual x'. It is the sum over all p features of the feature-wise Manhattan distances, each weighted by the inverse median absolute deviation (MAD) of that feature:

d(x, x') = Σ_{j=1..p} |x_j − x'_j| / MAD_j

where the MAD of feature j is estimated over the data (indexed by i) as

MAD_j = median_i( |x_{i,j} − median_l(x_{l,j})| )

The proposed distance function has the advantage over the Euclidean distance that it is more robust to outliers. Scaling with the MAD is necessary to bring all the features to the same scale — it should not matter whether you measure the size of an apartment in square meters or square feet. [1]

Lastly, λ balances the distance in prediction (first term) against the distance in feature values (second term). A higher value of λ means that we prefer counterfactuals with predictions close to the desired outcome y', while a lower value means that we prefer counterfactual instances x' that are very similar to x in their feature values. If λ is very large, the instance with the prediction closest to y' will be selected, regardless of how far it is from x.
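As a rough illustration of how these two terms fit together, the sketch below implements the MAD-weighted Manhattan distance and the full loss in NumPy. The toy logistic predictor and all names here are my own assumptions, not Wachter's or Alibi's code.

```python
import numpy as np

def mad(X_train):
    """Median absolute deviation of each feature, estimated from training data."""
    med = np.median(X_train, axis=0)
    return np.median(np.abs(X_train - med), axis=0)

def mad_weighted_l1(x, x_cf, mad_values):
    """d(x, x') = sum over features of |x_j - x'_j| / MAD_j."""
    return np.sum(np.abs(x - x_cf) / mad_values)

def wachter_loss(x_cf, x, y_target, lam, predict_fn, mad_values):
    """L(x, x', y', lambda) = lambda * (f(x') - y')^2 + d(x, x')."""
    prediction_term = (predict_fn(x_cf) - y_target) ** 2
    distance_term = mad_weighted_l1(x, x_cf, mad_values)
    return lam * prediction_term + distance_term

# Tiny usage with a made-up logistic "model" on random data.
X_train = np.random.default_rng(0).normal(size=(200, 3))
f = lambda z: 1.0 / (1.0 + np.exp(-z @ np.array([1.0, -2.0, 0.5])))
print(wachter_loss(X_train[1], X_train[0], y_target=1.0, lam=0.5,
                   predict_fn=f, mad_values=mad(X_train)))
```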

The algorithm that generates the counterfactual instances is as follows:

1. Select an instance x to be explained, the desired outcome y', a tolerance ϵ, and a (low) initial value for λ.

2. Sample a random instance as initial counterfactual.

3. Optimize the loss with the initially sampled counterfactual as starting point.

4. While (f^(x') − y')² is greater than the tolerance ϵ:

  • Increase λ
  • Optimize the loss with the current counterfactual as starting point.
  • Return the counterfactual that minimizes the loss.

5. Repeat steps 2–4 and return the list of counterfactual instances or the one that minimizes the loss.
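Below is a from-scratch sketch of this procedure, with the loss folded into a few lines so the snippet stands on its own and scipy.optimize.minimize used as a generic optimizer over a toy logistic model. It mirrors the steps above but is not the Alibi implementation, and the cap on the number of λ increases is my own safeguard against a non-terminating loop.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
w = np.array([1.0, -2.0, 0.5])
predict = lambda z: 1.0 / (1.0 + np.exp(-z @ w))   # toy model f

# Wachter-style loss: lambda * (f(x') - y')^2 + MAD-weighted Manhattan distance.
mad = np.median(np.abs(X_train - np.median(X_train, axis=0)), axis=0)
def loss(x_cf, x, y_target, lam):
    return lam * (predict(x_cf) - y_target) ** 2 + np.sum(np.abs(x - x_cf) / mad)

x = X_train[0]                         # step 1: instance to explain ...
y_target, eps, lam = 1.0, 0.05, 0.1    # ... desired outcome, tolerance, low initial lambda

x_cf = rng.normal(size=3)                                    # step 2: random initial counterfactual
x_cf = minimize(loss, x_cf, args=(x, y_target, lam)).x       # step 3: first optimization

for _ in range(20):                                          # step 4 (capped instead of an open while)
    if (predict(x_cf) - y_target) ** 2 <= eps:
        break
    lam *= 2.0                                               # increase lambda
    x_cf = minimize(loss, x_cf, args=(x, y_target, lam)).x   # re-optimize from the current point

print("counterfactual:", np.round(x_cf, 3), "prediction:", round(float(predict(x_cf)), 3))
```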

Wachter's algorithm has been implemented in a Python package called Alibi. [2]

Implementation of Wachter's method: https://docs.seldon.io/projects/alibi/en/stable/methods/CF.html

Example of applying it to MNIST data: https://docs.seldon.io/projects/alibi/en/stable/examples/cf_mnist.html

This method has a clear disadvantage: it does not take into account the third and fourth requirements of counterfactual instances (see the Requirements section above).

Dandl’s Method of Generating Counterfactual Instances

Dandl's method overcomes this limitation of Wachter's approach by adding two components to the loss function, which correspond to the third and fourth requirements of counterfactual instances respectively. The original paper by Dandl et al. is listed in the references below. [3]

Dandl's method defines its loss function as a vector of four objective values [3]:

L(x, x', y', X^obs) = ( o1(f^(x'), y'), o2(x, x'), o3(x, x'), o4(x', X^obs) )

Objectives 1 and 2 play the same roles as the two terms in Wachter's method, except that Dandl uses different distance metrics: where Wachter uses a squared (Euclidean-style) distance for the prediction term and the MAD-weighted Manhattan distance for the feature term, Dandl uses the Manhattan distance for o1 and the Gower distance for o2. The two objectives are described as follows:

• Objective One (o1): the prediction of our counterfactual x' should be as close as possible to our desired prediction y'.
• Objective Two (o2): the counterfactual should be as similar as possible to our instance x.

The two additional objectives in Dandl's method that are not in Wachter's method are:

• Objective Three (o3): the counterfactual should change as few features as possible (sparse feature changes).
• Objective Four (o4): the counterfactual should have likely feature values/combinations.

In o4, Dandl infers how “likely” a data point is using the training data or another data set (denoted X^obs). Specifically, o4 measures the average Gower distance between x' and the nearest observed data point as a proxy for how plausible the counterfactual's feature values are.

But note that, compared to Wachter's method, there is no balancing term λ. Since we do not want to collapse the four objectives into a single objective by summing and weighting them, all four terms are optimized simultaneously. To do this, Dandl uses the Nondominated Sorting Genetic Algorithm, NSGA-II for short. NSGA-II is a nature-inspired algorithm that applies Darwin's law of the “survival of the fittest”: the fitness of a counterfactual is expressed by its vector of objective values (o1, o2, o3, o4), and the lower a counterfactual instance's four objective values are, the “fitter” it is. [1]
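To make the four objectives a little more tangible, here is a rough sketch of how they could be computed for purely numeric features, where the Gower distance reduces to range-normalized absolute differences. This is a simplification for illustration only (Dandl et al.'s actual implementation also handles categorical features), and the predictor and data below are made up.

```python
import numpy as np

def moc_objectives(x_cf, x, y_target, predict_fn, X_obs):
    """Four objective values (o1, o2, o3, o4) for a candidate counterfactual x_cf."""
    ranges = X_obs.max(axis=0) - X_obs.min(axis=0)         # feature ranges for Gower scaling
    gower = lambda a, b: np.mean(np.abs(a - b) / ranges)   # numeric-only Gower distance

    o1 = np.abs(predict_fn(x_cf) - y_target)               # prediction close to the target (L1)
    o2 = gower(x_cf, x)                                    # similar to the original instance
    o3 = np.sum(x_cf != x)                                 # number of changed features (sparsity)
    o4 = min(gower(x_cf, row) for row in X_obs)            # Gower distance to the nearest observed point
    return o1, o2, o3, o4

# Tiny usage with a made-up logistic "model" on random data.
rng = np.random.default_rng(0)
X_obs = rng.normal(size=(100, 3))
f = lambda z: 1.0 / (1.0 + np.exp(-z @ np.array([1.0, -2.0, 0.5])))
x = X_obs[0]
x_cf = x.copy()
x_cf[1] += 0.5  # candidate counterfactual that changes only feature 1
print(moc_objectives(x_cf, x, y_target=1.0, predict_fn=f, X_obs=X_obs))
```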

Advantages and Disadvantages

The biggest advantage of counterfactual explanations is their clarity. Unlike other XAI methods such as LIME, no additional assumptions are required to understand how the method works. From a more practical perspective, the counterfactual method does not require access to the data or the model. The Interpretable Machine Learning book explains that the counterfactual method only requires access to the model's prediction function, which could also work via a web API, for example. [1] This is attractive for companies that are audited by third parties or that offer explanations to users without disclosing the model or data; a company has an interest in protecting its model and data for trade-secret or data-protection reasons. Furthermore, the method also works with systems that do not use machine learning.

But counterfactual explanations do have a clear disadvantage: counterfactual instances are often not unique. There can be several of them, and they can contradict each other (e.g., one counterfactual instance tells the user to increase the values of features 1 and 2, while another tells the user to keep feature 1 as it is and decrease feature 2). In this case, the user needs to decide which counterfactual explanation to accept, depending on how actionable each one is, how many resources are available, what other real-world constraints exist, and so on.

References

[1] C. Molnar, Interpretable Machine Learning (2021)

[2] Counterfactual Instances, Alibi Package Documentation (2019)

[3] S. Dandl, C. Molnar, M. Binder, and B. Bischl, Multi-Objective Counterfactual Explanations (2020), Department of Statistics in LMU Munich
