We used IBM’s AIF360 library to check for common types of bias and found the following:

- false positive rate ratio

Fairness does not end after producing AI outputs.
💡 Continue interacting with stakeholders to verify that their idea of fairness is correctly implemented. Keep a balance between justifying outputs as part of a fair process and accommodating constructive criticism. Do not over-rely on technical justification, and ensure meaningful human oversight whenever AI systems are deployed in decision-making, high-stakes, or rights-impacting contexts. Human oversight prevents over-reliance on imperfect models, catches context-specific errors, and enables ethical judgment, accountability, and recourse for affected people.
Each fairness metric provided by AIF360 is computed across 7 groups, each of which is compared to the rest of the population; group intersections are not accounted for. We check whether bias metrics exceed 0.05 on a 0-1 scale where 0 represents an unbiased system, or whether fairness metrics fall below 0.95 on a 0-1 scale where 1 represents a fair system.
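As a minimal sketch of this thresholding, assuming the metric values have already been computed (the metric lists and names below are illustrative examples taken from this report, not part of the AIF360 API):

```python
# Illustrative sketch of the reported threshold check.
BIAS_MAX = 0.05      # bias metrics: 0 means unbiased, flag values above 0.05
FAIRNESS_MIN = 0.95  # fairness metrics: 1 means fair, flag values below 0.95

bias_metrics = {
    "Statistical Parity Difference": 0.000,
    "Average Odds Difference": 0.000,
}
fairness_metrics = {
    "Disparate Impact": 1.000,
    "False Positive Rate Ratio": 1.000,
}

flagged = [m for m, v in bias_metrics.items() if abs(v) > BIAS_MAX]
flagged += [m for m, v in fairness_metrics.items() if v < FAIRNESS_MIN]
print("Flagged metrics:", flagged or "none")
```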
Considered groups are:

The dataset was loaded and preprocessed as follows:

```python
# install dependencies first, e.g. in a notebook:
#   %pip install --upgrade pandas
#   %pip install --upgrade mammoth_commons
import numpy as np
import pandas as pd
from mammoth_commons.externals import pd_read_csv
from mammoth_commons.datasets import CSV

# set parameters and load data (modify max_discrete as needed)
path = ...
max_discrete = 10
df = pd_read_csv(path, on_bad_lines="skip")

# identify numeric and categorical columns: real-valued columns with more
# than max_discrete distinct values are numeric, everything else categorical
num = [col for col in df if pd.api.types.is_any_real_numeric_dtype(df[col])]
num = [col for col in num if len(set(df[col])) > max_discrete]
num_set = set(num)
cat = [col for col in df if col not in num_set]

# wrap in a mammoth_commons CSV dataset; the last categorical column is the label
csv_dataset = CSV(df, num=num, cat=cat[:-1], labels=cat[-1])

# convert to numpy data; the original snippet left X undefined, so we assume
# here that the feature matrix is the one-hot-encoded non-label columns
X = pd.get_dummies(df.drop(columns=[cat[-1]]), columns=cat[:-1])
X = X.to_numpy().astype(np.float32)
y = df[cat[-1]]
```
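Here `max_discrete` controls how many distinct values a real-valued column may take before it is treated as numeric rather than categorical; the resulting `X` and `y` presumably feed the classifier whose predictions are scored in the table below.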
| Metric | Value |
|---|---|
| Accuracy | 0.885 |
| Average Abs Odds Difference | 0.000 |
| Average Odds Difference | 0.000 |
| Base Rate | 0.885 |
| Between All Groups CV | 0.026 |
| Between All Groups GEI | 0.000 |
| Between All Groups Theil Index | 0.000 |
| Bias Amplification | 2.819 |
| Coefficient of Variation | 0.286 |
| Disparate Impact | 1.000 |
| Equal Opportunity Difference | 0.000 |
| Equalized Odds Difference | 0.000 |
| Error Rate | 0.115 |
| False Discovery Rate | 0.115 |
| False Negative Rate | 0.000 |
| False Negative Rate Difference | 0.000 |
| False Omission Rate | 0.000 |
| False Omission Rate Difference | 0.000 |
| False Positive Rate | 1.000 |
| False Positive Rate Difference | 0.000 |
| False Positive Rate Ratio | 1.000 |
| Gen. Entropy Index | 0.041 |
| Gen. Equalized Odds Difference | 0.000 |
| Gen. False Negative Rate | 0.000 |
| Gen. False Positive Rate | 0.000 |
| Gen. True Negative Rate | 1.000 |
| Gen. True Positive Rate | 1.000 |
| Negative Predictive Value | 0.000 |
| Num False Negatives | 0 |
| Num False Positives | 521 |
| Num Gen. False Negatives | 0 |
| Num Gen. False Positives | 0 |
| Num Gen. True Negatives | 521 |
| Num Gen. True Positives | 4000 |
| Num Instances | 4521 |
| Num Negatives | 521 |
| Num Positives | 4000 |
| Num Pred. Negatives | 0 |
| Num Pred. Positives | 4521 |
| Num True Negatives | 0 |
| Num True Positives | 4000 |
| Positive Predictive Value | 0.885 |
| Selection Rate | 1.000 |
| Smoothed EDF | 1.226 |
| Statistical Parity Difference | 0.000 |
| Theil Index | 0.034 |
| True Negative Rate | 0.000 |
| True Positive Rate | 1.000 |
| True Positive Rate Difference | 0.000 |
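Several of the headline values above follow directly from the raw counts, and the count rows explain the pattern: with zero predicted negatives, the model labels every instance positive. A quick sanity check in plain Python (no AIF360 needed), using the counts from the table:

```python
# Reproduce headline table values from the raw confusion counts above.
tp, tn, fp, fn = 4000, 0, 521, 0
n = tp + tn + fp + fn                      # 4521 instances

accuracy = (tp + tn) / n                   # 0.885, matches the table
error_rate = (fp + fn) / n                 # 0.115
tpr = tp / (tp + fn)                       # 1.000: all positives recovered
fpr = fp / (fp + tn)                       # 1.000: all 521 negatives misclassified
selection_rate = (tp + fp) / n             # 1.000: every instance predicted positive
print(f"{accuracy:.3f} {error_rate:.3f} {tpr:.3f} {fpr:.3f} {selection_rate:.3f}")
```

In other words, the perfect parity scores (all difference metrics at 0.000) coexist with a classifier that never predicts the negative class, which is why the false positive rate is 1.000.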