Detection and mitigation
In the Trustworthy AI course, we learned that tackling bias can be split into two separate but interconnected activities: detection and mitigation. In this section, we’ll look at what this means in practice from a development perspective. Detection in this context can also be referred to as diagnosis, while mitigation can be referred to as a form of intervention. In order to mitigate, one must first be able to detect. Furthermore, even after taking a mitigating action, its effects have to be tested with further rounds of detection to ensure that a satisfactory result has been achieved.
Detection (diagnosis)
Diagnosing fairness in a system is an active process that can take place either during model development or in post-hoc testing once the model is already trained. The first step, however, is always to do an exploratory fairness analysis in the pre-processing stage. This can involve simple steps such as analyzing your dataset and asking yourself whether it’s balanced. An unbalanced dataset, caused by sampling bias or historical bias, is a major risk factor for producing a skewed algorithm. It’s therefore worth checking where the data is sourced from and what historical biases may exist within that context; this can also guide decisions about which pre-processing mitigation techniques to apply.
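As a minimal sketch of what such an exploratory check might look like, the snippet below (which assumes a small, hypothetical recruitment dataset with a gender column and a binary hired label) compares group sizes and positive-outcome rates across a protected attribute:

```python
import pandas as pd

# Hypothetical recruitment data: a protected attribute and a binary hiring outcome
df = pd.DataFrame({
    "gender": ["male", "male", "female", "male", "female", "male"],
    "hired":  [1, 0, 0, 1, 0, 1],
})

# How large is each group relative to the whole dataset?
print(df["gender"].value_counts(normalize=True))

# What is the hiring rate within each group?
print(df.groupby("gender")["hired"].mean())
```

A large gap in either of these numbers doesn’t prove the eventual model will be unfair, but it’s exactly the kind of imbalance that warrants a closer look before training.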
As for post-hoc testing, explainable AI (XAI) techniques that employ feature importance, such as LIME (Local Interpretable Model-agnostic Explanations) or SHAP (Shapley Additive Explanations), are very useful tools for analyzing the factors behind an algorithm’s decisions. Feature importance shows us how much weight different features carry in the final prediction value. XAI can also be used in the pre-processing stage for testing data balance; in this scenario, we’d employ a separate model that uses the protected classes in the dataset. We explore XAI analysis further in Chapter 3.
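To make the idea concrete, here is a minimal, hypothetical sketch of using SHAP to inspect how much a protected attribute contributes to a model’s predictions (the toy features, labels, and column names are made up for illustration, and the exact shape of the returned SHAP values can vary between shap versions):

```python
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

# Toy hiring data; "gender" stands in for a protected attribute
X = pd.DataFrame({
    "gender":           [0, 1, 0, 1, 0, 1, 0, 1],
    "years_experience": [2, 3, 5, 1, 7, 4, 6, 2],
    "education_level":  [1, 2, 3, 1, 2, 3, 2, 1],
})
y = [0, 1, 1, 0, 1, 1, 1, 0]  # historical hiring decisions

model = RandomForestClassifier(random_state=0).fit(X, y)

# SHAP estimates each feature's contribution to each individual prediction
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# A consistently large contribution from "gender" would be a red flag worth investigating
shap.summary_plot(shap_values, X)
```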
There are also many tools available to help you test the fairness of your model. Google’s What-If Tool is a popular tool for detecting bias via counterfactual methods that the user controls directly. With this tool, the user can manually edit the training data samples and see how these edits affect the algorithm’s predictions. Using the counterfactual fairness metric in this way, the user can explore different scenarios which they suspect might be vulnerable to bias. We’ll talk more about the What-If Tool, along with a few other online tools, in the next section.
Mitigation (intervention)
If you’ve studied the Trustworthy AI course, you might remember that mitigation of biases can take place in one of three phases of training: pre-processing, in-processing, or post-processing. Pre-processing methods are ways in which an algorithm’s training data can be modified to overcome issues of bias. By contrast, in-processing methods involve modifying the algorithm, instead of the data, to allow for a classifier model that accurately represents and acts according to a developer’s desired fairness goals. Finally, post-processing methods focus on modifying the outputs of an already trained algorithm.
Pre-processing approaches
Relabeling method
Relabeling is a method of directly intervening in the training data to mitigate bias by manually changing an undesired (biased) outcome into a desired (fair) one. In essence, this approach changes the labels of discriminated objects and adjusts outcome data in the training dataset so that the algorithm can learn from a more ideal dataset than what might be reflected in reality. For example, if the training data for a recruitment algorithm is based on a hiring record that’s known to have been biased in terms of gender, then ‘cleaning’ this data by adjusting outcomes for disadvantaged or discriminated parties is a step towards ensuring the algorithm won’t carry on the cycle of gender bias. To put it simply: if fewer women were hired in the past, relabeling could involve changing some rejected applications of women to accepted applications. This method, however, comes with some drawbacks. First, it relies on the developer being aware of the bias ahead of time. Second, it’s vulnerable to the developer’s own biases being introduced during the relabeling.
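A minimal sketch of this idea, assuming a hypothetical hiring dataset with a gender column and a binary hired label, might flip just enough rejected applications in the disadvantaged group to even out the hiring rates:

```python
import pandas as pd

# Hypothetical historical hiring data: 1 = hired, 0 = rejected
df = pd.DataFrame({
    "gender": ["female"] * 5 + ["male"] * 5,
    "hired":  [0, 0, 0, 0, 1,  1, 1, 1, 0, 1],
})

female_rate = df.loc[df["gender"] == "female", "hired"].mean()  # 0.2
male_rate = df.loc[df["gender"] == "male", "hired"].mean()      # 0.8

# How many female rejections to relabel so the hiring rates roughly match
n_flip = int(round((male_rate - female_rate) * (df["gender"] == "female").sum()))

# In practice you'd pick the candidates closest to the decision boundary;
# here we simply take the first n_flip rejected female applications
flip_idx = df[(df["gender"] == "female") & (df["hired"] == 0)].index[:n_flip]
df.loc[flip_idx, "hired"] = 1
```

Note that choosing which labels to flip is itself a judgment call, which is exactly where the developer’s own biases can creep in.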
Perturbation method
Perturbation means adding noise to data. This method can be used to help blind a model to certain bias-prone features, such as gender or ethnicity. A good example of this is facial detection models, such as smile detection, where male faces are usually less likely to be detected as smiling. (7) With enough perturbation, the features that usually lead to the image being classified as male go undetected, thus improving the chances of a smile being detected.
Noise in the data
In terms of data, noise can be defined as meaningless information caused by distortion or corruption of the data’s original or ‘true’ state. Replacing random letters in a sentence with 🍦, for example, is a type of noise. In the facial detection example, noise can be added by altering pixels and adding imperfections to the image. This way, the image becomes grainier or takes on different colors than the input image.
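As a simple illustration of pixel-level perturbation (a hypothetical sketch using NumPy, not any particular face detection pipeline), Gaussian noise can be added to an image array like this:

```python
import numpy as np

def perturb_image(image: np.ndarray, noise_level: float = 0.1) -> np.ndarray:
    """Add Gaussian pixel noise to an image with values in [0, 1]."""
    noise = np.random.normal(loc=0.0, scale=noise_level, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)

# Hypothetical 64x64 grayscale face image
face = np.random.rand(64, 64)
noisy_face = perturb_image(face, noise_level=0.2)
```

The higher the noise level, the harder it becomes for a model to rely on fine-grained features such as those associated with perceived gender.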
Reweighing
Reweighing is the process of assigning weights to the samples within a dataset. For fairness, this can be done as a way to ‘rebalance’ the dataset in a fairer way. Just like relabeling, this method looks at an already existing imbalance in the data and adjusts accordingly. In our recruitment example, reweighing would mean that samples with the label ‘woman’ would be given greater weight, in other words a higher influence during training, so that female prediction scores would be balanced against the male scores. With this method, the challenge is of course to find the ‘correct’ weights that produce the desired balance.
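One common way to compute such weights (a sketch of the expected-versus-observed frequency approach, reusing the same hypothetical hiring data as above) is to weight each combination of group and outcome:

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["female"] * 5 + ["male"] * 5,
    "hired":  [0, 0, 0, 0, 1,  1, 1, 1, 0, 1],
})

n = len(df)
weights = pd.Series(1.0, index=df.index)

# Weight each (group, outcome) combination by expected / observed frequency:
# underrepresented combinations (such as hired women) get weights above 1
for (g, y), group_df in df.groupby(["gender", "hired"]):
    p_expected = (df["gender"] == g).mean() * (df["hired"] == y).mean()
    p_observed = len(group_df) / n
    weights.loc[group_df.index] = p_expected / p_observed

# These weights can be passed to most scikit-learn estimators via the sample_weight argument
```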
Sampling method
Sampling or re-sampling is an approach by which bias is mitigated through better sampling of data. This can be done by something as simple as introducing a greater degree of diversity into the training dataset. The main objective of such an approach is to correct the training data in order to eliminate existing bias. For example, if a facial recognition dataset is mostly made up of white male faces, re-sampling would involve diversifying the dataset by introducing more female and darker-skinned faces into the mix.
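When collecting new data isn’t an option, a crude stand-in is to oversample the underrepresented group, as in this hypothetical sketch (the column names and proportions are made up for illustration):

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical face dataset metadata with an imbalanced attribute
df = pd.DataFrame({
    "image_id": range(10),
    "skin_tone": ["light"] * 8 + ["dark"] * 2,
})

majority = df[df["skin_tone"] == "light"]
minority = df[df["skin_tone"] == "dark"]

# Oversample the minority group (with replacement) to match the majority size;
# gathering genuinely new, diverse samples is preferable whenever possible
minority_upsampled = resample(minority, replace=True,
                              n_samples=len(majority), random_state=0)
balanced = pd.concat([majority, minority_upsampled])
print(balanced["skin_tone"].value_counts())
```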
In-processing approaches
Regularization and constraint optimization
In machine learning, regularization techniques are used to avoid overfitting by reducing the complexity of the learned model. For fairness, one or more penalty terms are added to penalize the model for discriminatory behavior. Constraint optimization techniques, in turn, build one or more fairness criteria into the loss function that is optimized during the learning process.
Fairness constraint
A fairness constraint is a mechanism for removing unwanted behavior from a system by (as the name suggests) constraining what the model can learn from sensitive features in the data that might otherwise result in discriminatory decisions. Fairness constraints are a powerful tool, but they also affect the performance of the model, and you don’t want to introduce so many constraints that the system loses wanted behavior along with the unwanted.
The key challenge in this method is that the fairness constraint may not be convex in nature, meaning that the function we're trying to optimize doesn’t have a clear global minimum to measure our optimization against, so achieving an optimum between the model performance and the fairness constraint may be difficult. As an illustrative example, you can easily see where the lowest point in a round bowl is, but it would take a lot more exploration and measuring to find the lowest point in an uneven and jagged canyon, and even then you wouldn’t know whether you’d found a local or the global bottom. In addition, the fairness constraint and the regularization parameters may be affected differently during the learning process, making the model learning process even more difficult. (5)

For an example of regularization in fairness, imagine our hiring example again. A model is trained on historical hiring data and deployed to assess incoming applications for a job. If the model is overfitted, it will perform very well on the training data but very poorly on new data that introduces variance. A model overfitted on biased data will therefore keep generating biased results and perform badly on underrepresented applicants; a regularizer may be applied to sensitive features such as gender to prevent this overfitting. The fairness constraints, in turn, constrain the learning of the model so that it applies better to a more general and diverse applicant population.
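To make this more tangible, here is a minimal, hypothetical sketch of adding a fairness penalty to an ordinary classification loss, using PyTorch and synthetic data (the penalty here is a simple demographic parity gap, and lam is the regularization strength that needs tuning):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Synthetic applicant features X, hiring labels y, and a binary sensitive attribute s
X = torch.randn(200, 5)
y = torch.randint(0, 2, (200,)).float()
s = torch.randint(0, 2, (200,)).float()

model = nn.Linear(5, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
bce = nn.BCEWithLogitsLoss()
lam = 1.0  # strength of the fairness penalty; choosing this is the hard part

for epoch in range(100):
    optimizer.zero_grad()
    logits = model(X).squeeze()
    scores = torch.sigmoid(logits)
    # Fairness penalty: difference in average predicted score between the two groups
    gap = (scores[s == 1].mean() - scores[s == 0].mean()).abs()
    loss = bce(logits, y) + lam * gap
    loss.backward()
    optimizer.step()
```

Increasing lam pushes the two groups’ average scores closer together, usually at some cost in raw predictive accuracy; this is the trade-off between performance and fairness described above.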
Adversarial learning
This is a method to test the fairness of a model by introducing an adversary that effectively combats or challenges the learning process of a model. Ordinarily, adversarial learning is used for testing model robustness against security or integrity attacks, where the adversary takes the role of an attacker. For example, imagine that you have a camera in a supermarket that detects how many customers are in the store and sends this count back to a central server. People’s faces are sensitive data that’s now at risk of leakage, so for privacy reasons you have an algorithm add noise to the data so that human bodies can be identified, without individual identifying features, before it’s sent forward. An adversary in this example would be a counter-model that tries to clean the data to see if identifying features can be restored. If the adversary fails in this task, then the model is performing well.

In the context of fairness, a set of fairness constraints needs to be introduced for the adversary to test. Adversaries come in the form of inputs that are meant to push and trick the model into making mistakes, thus testing how well the fairness metrics hold up for the model. If the adversary is able to predict that a sensitive class or variable (or something derived from one) is being used in a decision, it can penalize the model, and this feedback can then be used to improve the model. For example, consider a neural network that uses census data such as education, occupation, and work type, as well as sensitive features like race and gender, to predict income levels. We can create an adversarial network that tries to predict those sensitive features based on the first model’s income predictions. We then train the first model to predict income while exposing as little information about the sensitive features to the adversarial model as possible.
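The following is a minimal, hypothetical sketch of this setup in PyTorch with synthetic data: a predictor learns to predict income while an adversary tries to recover the sensitive attribute from the predictor’s outputs, and the predictor is penalized to the extent that the adversary succeeds (alpha controls how heavily that leakage is punished):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Synthetic census-style data: features X, income label y, sensitive attribute s (e.g. gender)
X = torch.randn(500, 8)
y = torch.randint(0, 2, (500,)).float()
s = torch.randint(0, 2, (500,)).float()

predictor = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
adversary = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
opt_p = torch.optim.Adam(predictor.parameters(), lr=0.01)
opt_a = torch.optim.Adam(adversary.parameters(), lr=0.01)
bce = nn.BCEWithLogitsLoss()
alpha = 1.0  # how strongly the predictor is punished for leaking the sensitive attribute

for epoch in range(200):
    # 1) Train the adversary to recover the sensitive attribute from the predictions
    opt_a.zero_grad()
    y_logits = predictor(X)
    adv_loss = bce(adversary(y_logits.detach()).squeeze(), s)
    adv_loss.backward()
    opt_a.step()

    # 2) Train the predictor to predict income while fooling the adversary
    opt_p.zero_grad()
    y_logits = predictor(X)
    pred_loss = bce(y_logits.squeeze(), y)
    adv_loss = bce(adversary(y_logits).squeeze(), s)
    (pred_loss - alpha * adv_loss).backward()
    opt_p.step()
```

If training succeeds, the adversary’s accuracy on the sensitive attribute drops towards chance level while the predictor retains as much income-prediction accuracy as possible.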
Post-processing approaches
Thresholding
Although there are several approaches to take in the post-processing phase, thresholding can be considered the most common one. This method sets threshold values using the equalized odds metric in order to find an acceptable balance between true and false positive rates. What this basically does is set a separate threshold for what counts as a positive prediction for different classes in the data. In this way, if a system is biased against a certain race or gender by giving them a lower prediction score for a job, loan, or credit approval, we can simply set a lower threshold to counter this. For example, the positive prediction threshold for the class ‘male’ may be 0.8, while the threshold for the class ‘female’ is 0.7, making it effectively easier for women to be approved by the system, with the aim of countering the system’s existing discrimination. The upside of this is that it can give the operator greater control over the fairness adjustment of the system. The downside, however, is that manual selection of the threshold values can be arbitrary and vulnerable to operator bias.
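Applied mechanically, group-specific thresholds are very little code; the hard (and contested) part is choosing the cutoffs. A hypothetical sketch, with made-up scores and thresholds:

```python
import numpy as np

def apply_group_thresholds(scores, groups, thresholds):
    """Approve an application when its score meets its group's threshold."""
    cutoffs = np.array([thresholds[g] for g in groups])
    return (np.asarray(scores) >= cutoffs).astype(int)

# Hypothetical model scores for loan applicants and their group labels
scores = [0.82, 0.75, 0.71, 0.65, 0.90]
groups = ["male", "female", "female", "male", "female"]

# Group-specific cutoffs chosen (e.g. via an equalized odds analysis) to balance error rates
thresholds = {"male": 0.8, "female": 0.7}
print(apply_group_thresholds(scores, groups, thresholds))  # [1 1 1 0 1]
```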