I.

Sources of bias in the evaluation methods for fairness

Bias, fairness (or unfairness), and discrimination are terms often mentioned in the context of trustworthy AI. If you haven’t studied the Trustworthy AI course, here’s a short summary of the rough definitions:

Terminology

Bias, discrimination, and fairness


Bias refers to a skewed or unbalanced perspective.

Discrimination is an effect of one’s action or outcome of a decision-making process.

Fairness is a desired quality of a system requiring it to avoid discrimination by ensuring that the system outcomes treat people in an equivalent manner. In machine learning, fairness is evaluated by comparing the model’s performance across groups defined by protected and unprotected attributes.


Bias induced during AI model generation workflow

As we know, AI models are parameterized, probability-based models learned from input data. Their performance therefore depends heavily on the quality and nature of the data used for training. This data may be generated by experiments under controlled conditions, or collected from a wide population uninfluenced by any experimental conditions (for example, data from social media). In either case, human factors can introduce biases into the data. Before the data is used for training AI models, it is typically preprocessed, for example for noise reduction, information extraction, and optimization purposes. This preprocessing can also introduce biases into the data, which propagate into the AI algorithms and lead to unfair or unbalanced decision-making. The workflow of AI model generation from the perspective of data is summarized in the figure below.

A process chart that shows how a sampled population from the earth is selected, made into a data set and the data set is fed to an AI model.

Model generation process and sources of bias

Bias can be introduced into the data in different ways, which are summarized below:

Bias introduced during the data collection process:

  • Measurement bias arises from how we choose, utilize, or report particular features while collecting the data. An example of this could be two factory workers reporting error rates in the production of an item, where one worker reports errors more stringently and more frequently than the other.

  • Population bias (also known as representation bias) arises when the representation, demographics, user characteristics, and statistics of the population used for model generation differ from those of the target user group. For example, data (or an AI model) from UK traffic can’t be used for traffic regulations (or for self-driving cars) in continental Europe, because the UK has left-hand traffic, unlike continental Europe’s right-hand traffic.

  • Historical bias is caused by existing social inequalities and biases in the real world, which are then reflected in the training data. For example, a study conducted in 2015 found that a Google image search for ‘CEO’ returned far more pictures of men, showing an inherent bias against women in the concept of ‘CEO’. One of the suggested reasons for this is that only 5% of the Fortune 500 CEOs were women in 2015. In 2022, they were still only at 15%. Thus, this historical imbalance in the data caused bias in the AI algorithm.

  • Temporal bias arises from behavioral changes and differences in population over a period of time. For example, the sudden emergence of an event (like an earthquake or accident), may change the discussion topics on social media platforms.

Bias introduced while pre-processing the data:

  • Omitted variable bias arises when one or more important features are left out during the model generation process. As an example, consider a user profiling tool that predicts the number of users at a given time, based on user experiences and the reasons users give when leaving the service. This prediction model might fail drastically after the launch of a competitor offering similar services at a lower cost, because this variable was never included in the model. Omitted variable bias commonly occurs because we can never monitor all the indicators of an event, whether due to unknown indicators, the high cost of monitoring, or other similar reasons.

  • Aggregation bias arises when conclusions drawn from an entire population are incorrectly applied to individuals or subgroups within it. Consider the example of diabetes patients, where HbA1c levels are commonly used to diagnose and monitor diabetes. This measure varies in complex ways across genders and ethnicities, so applying one interpretation to the whole population may cause poor performance of the model for some groups.

  • Sampling bias: Data sampling is a process for reducing redundant data by taking (for example, evenly spaced) samples from it. Consider a speaker recognition model, where a continuous audio signal is sampled at a rate chosen for the best performance. When the model was deployed on a device, resource constraints forced the data to be sampled at a lower rate than the ideal one. This caused a loss of information from the high-frequency (high-pitched) parts of the audio and led to performance differences between men’s and women’s voices when recognizing speakers, biasing the model against higher-pitched voices (a short sketch after this list illustrates the effect). (1.)

  • Linkage bias occurs when a model considers the user’s connections (links) in a social network, and the activities, connections, and interactions of that node misrepresent the actual behavior and intentions of the user. As an example, social media behavior is interpreted by an algorithm via the activity of the users: which links they click, which pages they visit, and which other users they’re connected to. One form of linkage bias occurs when a user’s engagement with content they dislike (clicking on upsetting or outrageous links) is taken into account for future recommendations. Engagement is engagement, and high engagement with certain content is taken as genuine interest, even if from the user’s (human) perspective this might not be so. Another form of linkage bias relates to communities: if an algorithm only takes into account the links between nodes (users) and not behavior (activity), this leads to bias against low-connected users. A user with fewer friends would have a harder time being included in a certain community by the algorithm if their behavior (clicking on links and content related to said community) isn’t sufficiently accounted for. (2, 3.)
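To make the sampling-rate example above concrete, here is a minimal Python sketch of what happens when an audio signal is sampled below twice its highest frequency. The tone frequency and the sampling rates are made-up values chosen purely for illustration; they are not part of the course material.

import numpy as np

# A 1-second, 3000 Hz tone stands in for high-pitched content in a voice recording.
tone_hz = 3000.0

def dominant_frequency(sample_rate_hz):
    # Sample the tone at the given rate and return the strongest frequency in its spectrum.
    t = np.arange(0, 1, 1 / sample_rate_hz)
    signal = np.sin(2 * np.pi * tone_hz * t)
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1 / sample_rate_hz)
    return freqs[np.argmax(spectrum)]

print(dominant_frequency(16000))  # ~3000 Hz: the tone is preserved at a sufficient sampling rate
print(dominant_frequency(4000))   # ~1000 Hz: sampled too slowly, the tone is aliased and misrepresented

In the speaker recognition example, this kind of information loss affects higher-pitched voices the most, which is how a purely technical choice turns into a bias in the model’s performance.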

Definitions and metrics for evaluating fairness

In the previous section, the term ‘fairness’ was described as a desired quality of a system, achieved when the system’s outcomes treat people equally. In AI modeling terms, this translates into giving equal opportunities to people regardless of age, gender, and so forth. The transition of fairness from a social problem to a modeling problem has produced a number of different definitions of fairness in the AI community, along with corresponding metrics. Below you may find a list of the definitions and metrics that we’ll use in this course:

Terminology

Nomenclature for the formulas in this section

Pr : Probability

ŷ : predicted output of the model

y : actual output

gi, gk : group identifier based on protected/sensitive class - e.g. gi = gender(man) and gk = gender(woman)

Example:
Pr(ŷ = 1|y = 1&gi) = the probability of the predicted output being ‘positive’, given that the actual output is ‘positive’ and the individual belongs to the group ‘man’.

Let’s look at these metrics through an example case from hiring where we have qualified and unqualified applicants. In these examples, gender is the protected/sensitive variable, where gi denotes individuals of the group ‘man’ and gk denotes individuals of the group ‘woman’.
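As a running illustration, the short sketches in the rest of this section use a small made-up sample of hiring decisions. The arrays and values below are illustrative assumptions for this text, not real hiring data, and the conditional probabilities from the nomenclature box are simply estimated as proportions within the matching rows.

import numpy as np

# Toy hiring sample (illustrative values only): 1 = hired/qualified, 0 = not hired/unqualified
y_hat = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 0])   # predicted output of the model
y     = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 0])   # actual output
g     = np.array(["man"] * 5 + ["woman"] * 5)      # protected attribute (group membership)

# Estimate Pr(ŷ = 1 | y = 1 & g = "man"): among qualified men, how often does the model hire?
mask = (y == 1) & (g == "man")
print(np.mean(y_hat[mask]))   # proportion of positive predictions among the selected rows

Each of the metric sketches below repeats this toy setup so that it can be run on its own.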

Equalized odds

The condition of equalized odds states that the probability of a positive prediction, given the actual class (whether positive or negative), should be equal for members of different groups. In other words, both the true positive rates and the false positive rates should match across groups. In the case of hiring, achieving equalized odds would mean that the likelihood of a qualified candidate being hired and the likelihood of an unqualified candidate not being hired are the same for men and women.

Pr(ŷ = 1|y = 1&gi) = Pr(ŷ = 1|y = 1&gk) & Pr(ŷ = 1|y = 0&gi) = Pr(ŷ = 1|y = 0&gk)
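A minimal sketch of checking this condition on the toy sample introduced above (the arrays are illustrative assumptions):

import numpy as np

y_hat = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 0])
y     = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 0])
g     = np.array(["man"] * 5 + ["woman"] * 5)

def positive_rate(group, actual):
    # Estimate Pr(ŷ = 1 | y = actual & g = group)
    return np.mean(y_hat[(y == actual) & (g == group)])

# Equalized odds asks both pairs of rates below to be equal across the groups
print("True positive rate: ", positive_rate("man", 1), positive_rate("woman", 1))
print("False positive rate:", positive_rate("man", 0), positive_rate("woman", 0))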

Equal opportunity

The equal opportunity condition states that members of different groups who actually belong to the positive class should have an equal probability of being classified as positive. Thus, the true positive rates should be equal for both groups. In the case of our example, this would mean that qualified female and male candidates are equally likely to get hired.

Pr(ŷ = 1|y = 1&gi) = Pr(ŷ = 1|y = 1&gk)
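The corresponding check only compares the true positive rates, again on the illustrative toy arrays:

import numpy as np

y_hat = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 0])
y     = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 0])
g     = np.array(["man"] * 5 + ["woman"] * 5)

# Equal opportunity: Pr(ŷ = 1 | y = 1 & g) should be equal for both groups
tpr = {group: np.mean(y_hat[(y == 1) & (g == group)]) for group in ("man", "woman")}
print(tpr)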

Overall accuracy equality

According to this condition, the accuracy of the predictions should be equal across groups. In machine learning, accuracy is calculated as the percentage of overall correct predictions. Thus, to satisfy this definition of fairness, the accuracy computed for each group separately should be equal. In our example, achieving overall accuracy equality would mean that the rate of correct decisions, qualified candidates being hired and unqualified candidates being rejected, is equal for men and women.

Pr(ŷ = 0|y = 0&gi) + Pr(ŷ = 1|y = 1&gi) = Pr(ŷ = 0|y = 0&gk) + Pr(ŷ = 1|y = 1&gk)
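A sketch of evaluating the formula above on the toy sample: the per-class correct-prediction rates are computed for each group and summed as written (the arrays are illustrative assumptions):

import numpy as np

y_hat = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 0])
y     = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 0])
g     = np.array(["man"] * 5 + ["woman"] * 5)

def correct_rate(group, cls):
    # Estimate Pr(ŷ = cls | y = cls & g = group)
    mask = (y == cls) & (g == group)
    return np.mean(y_hat[mask] == cls)

# Overall accuracy equality compares this sum between the groups
for group in ("man", "woman"):
    print(group, correct_rate(group, 0) + correct_rate(group, 1))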

Conditional use accuracy equality

This is a variant of the overall accuracy equality condition, where the accuracy for the positive and negative classes is assessed separately. According to this condition, the accuracy for the positive class and the accuracy for the negative class should each be equal across groups. In the example, this means that an applicant predicted to be a good hire should actually be qualified, and an applicant predicted to be a poor hire should actually be unqualified, at the same rates for men and women.

Pr(y = 1|ŷ = 1&gi) = Pr(y = 1|ŷ = 1&gk) & Pr(y = 0|ŷ = 0&gi) = Pr(y = 0|ŷ = 0&gk)
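Here the conditioning is reversed: we look at the applicants the model accepted or rejected and ask how often it was right. A sketch on the same illustrative toy arrays:

import numpy as np

y_hat = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 0])
y     = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 0])
g     = np.array(["man"] * 5 + ["woman"] * 5)

def predictive_value(group, cls):
    # Estimate Pr(y = cls | ŷ = cls & g = group)
    mask = (y_hat == cls) & (g == group)
    return np.mean(y[mask] == cls)

# Both the positive and the negative predictive values should match across groups
print("Positive class:", predictive_value("man", 1), predictive_value("woman", 1))
print("Negative class:", predictive_value("man", 0), predictive_value("woman", 0))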

Treatment equality

According to this condition, the ratio of false negatives to false positives should be the same for the different group categories. As a reminder, false negatives are qualified applicants who are deemed unqualified, and false positives are unqualified applicants who are deemed qualified. In the example, this would mean that the ratio of qualified applicants who don’t get hired to unqualified candidates who do get hired is equal for men and women.

Pr(ŷ = 1|y = 0&gi)/Pr(ŷ = 0|y = 1&gi) = Pr(ŷ = 1|y = 0&gk )/Pr(ŷ = 0|y = 1&gk)
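A sketch of the ratio in the formula above, computed per group on the toy arrays (illustrative assumptions; note that the denominator must be non-zero for the ratio to be defined):

import numpy as np

y_hat = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 0])
y     = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 0])
g     = np.array(["man"] * 5 + ["woman"] * 5)

def rate(group, actual, predicted):
    # Estimate Pr(ŷ = predicted | y = actual & g = group)
    mask = (y == actual) & (g == group)
    return np.mean(y_hat[mask] == predicted)

# Treatment equality compares this error-rate ratio between the groups
for group in ("man", "woman"):
    print(group, rate(group, 0, 1) / rate(group, 1, 0))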

Equalizing disincentives

The equalizing disincentives metric takes the difference between the true positive rate and the false positive rate and states that this difference should be equal for members of different groups. In our example, this would mean that the gap between the rate at which qualified applicants get hired and the rate at which unqualified applicants get hired is equally large for men and women.

Pr(ŷ = 1|y = 1&gi) − Pr(ŷ = 1|y = 0&gi) = Pr(ŷ = 1|y = 1&gk ) − Pr(ŷ = 1|y = 0&gk)
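A sketch of the difference in the formula above, computed per group on the same illustrative toy arrays:

import numpy as np

y_hat = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 0])
y     = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 0])
g     = np.array(["man"] * 5 + ["woman"] * 5)

# Equalizing disincentives compares (true positive rate - false positive rate) between groups
for group in ("man", "woman"):
    tpr = np.mean(y_hat[(y == 1) & (g == group)])   # Pr(ŷ = 1 | y = 1 & g)
    fpr = np.mean(y_hat[(y == 0) & (g == group)])   # Pr(ŷ = 1 | y = 0 & g)
    print(group, tpr - fpr)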

Fairness through unawareness

An algorithm is said to be fair as long as it doesn’t use protected attributes in the decision-making process. In our example, this would mean removing gender as an attribute from the model’s inputs when making decisions.
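In practice this simply means excluding the protected attribute from the model’s inputs. A minimal sketch, assuming a hypothetical applicant table whose column names and values are made up for illustration:

import pandas as pd

# Hypothetical applicant data; "gender" is the protected attribute
applicants = pd.DataFrame({
    "years_experience": [3, 7, 1, 5],
    "test_score":       [71, 88, 65, 90],
    "gender":           ["man", "woman", "woman", "man"],
})

# Fairness through unawareness: the model is trained only on the remaining columns
features = applicants.drop(columns=["gender"])
print(list(features.columns))   # the model never sees "gender" directly

Note that this only removes the attribute itself; other features that correlate with it can still carry the same information into the model.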

Fairness through awareness

An algorithm’s fairness is measured by the statistical distance between the outcomes of similar individuals (in our example, job applicants). This relies on two distance measures: one between two individuals and one between those individuals’ outcomes. The algorithm is considered fair if similar individuals receive similar outcomes, that is, if the distance between two individuals’ outcomes doesn’t exceed the distance between the individuals themselves. For our example, the algorithm is fair if the difference in qualification is reflected in the outcomes, so that more qualified individuals have a higher likelihood of being hired.
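A minimal sketch of this idea, assuming made-up qualification scores as the description of the individuals and made-up hiring probabilities as the model’s outcomes (both values and the choice of absolute difference as the distance are illustrative assumptions):

import numpy as np

# Hypothetical applicants described by a qualification score, and the model's hiring probabilities
qualification = np.array([0.90, 0.85, 0.20])   # applicants 0 and 1 are very similar
hire_prob     = np.array([0.80, 0.78, 0.15])   # the model's predicted probability of hiring

# Fairness through awareness: similar individuals should receive similar outcomes,
# i.e. the distance between two outcomes should not exceed the distance between the individuals.
n = len(qualification)
for i in range(n):
    for j in range(i + 1, n):
        d_individuals = abs(qualification[i] - qualification[j])
        d_outcomes = abs(hire_prob[i] - hire_prob[j])
        print(i, j, d_outcomes <= d_individuals)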

Statistical parity

This is one of the oldest and simplest definitions of fairness. Under this fairness definition, the algorithm is fair if the probability of an outcome is equal between individuals of two different groups under a protected class. In our example, statistical parity would mean that it’s as probable for a woman to be hired as it is for a man to be hired.

Pr(ŷ = 1|gi) = Pr(ŷ = 1|gk)
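A sketch of this check on the toy arrays (illustrative assumptions); note that the actual qualification y isn’t needed here, only the predictions and the group labels:

import numpy as np

y_hat = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 0])
g     = np.array(["man"] * 5 + ["woman"] * 5)

# Statistical parity: the overall rate of positive predictions should match across groups
hire_rate = {group: np.mean(y_hat[g == group]) for group in ("man", "woman")}
print(hire_rate)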

Counterfactual fairness

Counterfactual terms are based on ‘what-if’ scenarios. As such, counterfactual fairness is met if the predicted outcome stays the same when only the protected/sensitive attribute is changed. To be more specific, the prediction is fair as long as it doesn’t rely on the protected attribute or on any descendant of it, that is, any variable causally influenced by it. In our example, counterfactual fairness would mean that changing the applicant's gender doesn’t impact the outcome of the model.
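A simplified sketch of the ‘what-if’ check, assuming a hypothetical scoring function in place of a trained model (the function, its inputs, and the example applicant are all made up; a full counterfactual analysis would additionally use a causal model to account for attributes influenced by gender):

def hire_score(applicant):
    # Hypothetical stand-in for a trained model's predicted score
    experience = min(applicant["years_experience"], 10) / 10
    return 0.6 * applicant["test_score"] / 100 + 0.4 * experience

applicant      = {"years_experience": 6, "test_score": 82, "gender": "woman"}
counterfactual = {**applicant, "gender": "man"}   # change only the protected attribute

# Counterfactual fairness (informally): the two predictions below should be identical
print(hire_score(applicant))
print(hire_score(counterfactual))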

(4, 5, 6.)




Next section
II. Methods for assessing and documenting model bias and fairness