What is this?
Responsible use
Analysis methodology
Data pipeline
For experts

We analysed how accuracy is distributed across a model's outputs on a tested dataset by comparing several protected groups pairwise. The assessment depends on specific parameters provided as inputs.

Fairness does not end after producing AI outputs.

💡 Continue interacting with stakeholders to verify that their idea of fairness is correctly implemented.
💡 Monitor the outputs of deployed systems by rerunning the analysis on updated models and datasets.
💡 Test model and dataset variations for multiple sensitive characteristics and parameters.

Keep a balance between justifying outputs as part of a fair process and accommodating constructive criticism. Do not over-rely on technical justification, and ensure meaningful human oversight whenever AI systems are deployed in decision-making, high-stakes, or rights-impacting contexts. Human oversight prevents overreliance on imperfect models, catches context-specific errors, and enables ethical judgment, accountability, and recourse for affected people.

The maximum relative difference in accuracy is obtained across all protected groups by comparing them pairwise. The result is considered biased if it lies at least 0.200 away from the ideal target that would indicate fairness. For example, the ideal target is 0 for differences between measure values, and 1 for values that should be large (e.g., the minimum accuracy across all groups). Some metrics have no known ideal values.
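One common reading of this computation takes the relative difference of each pair of group accuracies, scaled by the larger value of the pair, and keeps the largest. The sketch below illustrates that reading; it is not FairBench's internal implementation, and the per-group accuracies are made-up numbers:

```python
# made-up per-group accuracies, purely for illustration
accuracies = {"group A": 0.90, "group B": 0.75, "group C": 0.85}

def maxrel(values):
    """Largest relative difference across all pairs of values,
    where each pair's difference is scaled by its larger value."""
    return max(
        abs(a - b) / max(a, b)
        for a in values
        for b in values
        if max(a, b) > 0
    )

score = maxrel(list(accuracies.values()))
print(round(score, 3))  # 0 would indicate perfectly equal accuracies
```

Under the criterion above, this score would be flagged as biased only if it lies at least 0.200 away from the ideal value of 0.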

The analysis considered 12 protected groups:
marital divorced - education primary
marital divorced - education secondary
marital divorced - education tertiary
marital divorced - education unknown
marital single - education primary
marital single - education secondary
marital single - education tertiary
marital single - education unknown
marital married - education primary
marital married - education secondary
marital married - education tertiary
marital married - education unknown
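The twelve groups above are intersections of the marital and education attributes. A minimal sketch of how such intersectional groups can be enumerated with pandas follows; the toy dataframe and its column names are assumptions modelled on the list above:

```python
import pandas as pd

# toy stand-in for the bank dataset
df = pd.DataFrame({
    "marital": ["divorced", "single", "married", "single"],
    "education": ["primary", "tertiary", "secondary", "primary"],
})

# one boolean membership mask per (marital, education) intersection
groups = {
    f"marital {m} - education {e}": (df["marital"] == m) & (df["education"] == e)
    for m in df["marital"].unique()
    for e in df["education"].unique()
}

for name, mask in groups.items():
    print(name, "->", int(mask.sum()), "members")
```

A per-group accuracy is then computed by restricting model predictions and labels to each mask; those per-group values are what the pairwise comparison operates on.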

csv

tabular data with common formatting

path: /home/maniospas/Documents/mammoth-commons/data/bank.csv
max discrete: 10
Uses pandas to load a CSV file that contains numeric, categorical, and predictive data columns. The loader automatically detects characteristics of the dataset being loaded, namely the delimiter that separates the columns and whether each column contains numeric or categorical data. The last categorical column is used as the dataset label. To load the file while retaining more control over options (e.g., a subset of columns, a different label column), use the custom csv loader instead.
How can this data loader be replicated during AI creation? If you want to train a model using the same loading mechanism as this dataset, run the following Python script. It uses supporting methods from the lightweight mammoth-commons core to retrieve numpy arrays *X, y* holding the dataset features and categorical labels respectively.
% pip install --upgrade pandas
% pip install --upgrade mammoth_commons
import numpy as np
import pandas as pd
from mammoth_commons.externals import pd_read_csv
from mammoth_commons.datasets import CSV

# set parameters and load data (modify max_discrete as needed)
path = ...
max_discrete = 10
df = pd_read_csv(path, on_bad_lines="skip")

# identify numeric and categorical columns; numeric columns with few
# distinct values are treated as categorical
num = [col for col in df if pd.api.types.is_any_real_numeric_dtype(df[col])]
num = [col for col in num if len(set(df[col])) > max_discrete]
num_set = set(num)
cat = [col for col in df if col not in num_set]

# wrap in a CSV dataset and convert to numpy arrays; one-hot encoding
# of the categorical features is one reasonable choice, not the only one
csv_dataset = CSV(df, num=num, cat=cat[:-1], labels=cat[-1])
X = pd.get_dummies(df[num + cat[:-1]], columns=cat[:-1])
X = X.to_numpy().astype(np.float32)
y = df[cat[-1]].to_numpy()
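Once *X, y* are available, any standard training loop can follow. The sketch below uses scikit-learn's LogisticRegression on random stand-in arrays; this is an illustrative choice, not anything mandated by mammoth-commons:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# random stand-in arrays shaped like the X, y produced by the loader
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)).astype(np.float32)
y = rng.choice(["no", "yes"], size=100)

# fit a simple classifier and check training accuracy
model = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", model.score(X, y))
```

The trained model's predictions can then be fed, together with the sensitive columns, into the fairness analysis described above.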

FairBench

pairwise maxrel acc

This is the maximum relative difference in accuracy, obtained by comparing groups pairwise.

Caveats and recommendations

• This is a generic list of caveats that apply to all measures.
• Non-quantitative criteria may also impact perceived fairness.
• Carefully choose the criteria for deciding when measures are considered close to their ideal values.
• A single measure cannot decide whether a system is fair or biased without further investigation. At best, it can indicate the absence of a particular bias. However, different measures are often at odds with each other, even when they have similar optima.
• Consult with stakeholders to determine which social and legal criteria systems should follow. This translates to choosing measures appropriate for the operating context.

Distribution


Computations cover several cases.

class yes

This is the branch for class yes.
0.126 maxrel acc
Obtained from 12 values

class no

This is the branch for class no.
0.126 maxrel acc
Obtained from 12 values