Bias Mitigation Strategies
Introduction to Bias Mitigation
Bias mitigation encompasses techniques and strategies for reducing unfair bias in AI systems. These approaches can be applied at different stages of the machine learning pipeline, each with distinct advantages and trade-offs.
Effective bias mitigation requires understanding when and how bias enters the system, selecting appropriate techniques based on the specific context, and continuously monitoring results. No single approach works universally; practitioners must often combine multiple techniques.
Bias mitigation techniques are categorized by when they are applied in the ML pipeline: pre-processing (data level), in-processing (algorithm level), and post-processing (output level). Each category offers different trade-offs between fairness improvement and model performance.
The Bias Mitigation Pipeline
Mitigation strategies can be applied at three points in the machine learning lifecycle, each targeting different sources of bias.
- Pre-Processing: modify the training data
- In-Processing: modify the learning algorithm
- Post-Processing: modify the model outputs
Pre-Processing Techniques
Pre-processing methods modify the training data before it is used to train the model. These techniques aim to remove or reduce bias in the data itself, allowing standard algorithms to train on more equitable inputs.
Resampling / Reweighting
Adjust the distribution of training data by oversampling underrepresented groups, undersampling overrepresented groups, or assigning different weights to samples based on group membership.
Implementation Steps
- Calculate representation rates for each protected group
- Determine target distribution (e.g., equal representation)
- Apply oversampling (SMOTE, random duplication) or undersampling
- Alternatively, assign instance weights inversely proportional to group size
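The reweighting variant of these steps can be sketched in a few lines. `compute_group_weights` is an illustrative helper, not a library function; it assigns each instance a weight inversely proportional to its group's share of the data so that every group carries equal total influence.

```python
from collections import Counter

def compute_group_weights(groups):
    """Per-instance weights giving every group equal total influence."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    # weight = (1 / group_share) * (1 / num_groups), so weights average to 1
    return [n / (counts[g] * k) for g in groups]

groups = ["male"] * 8 + ["female"] * 2   # an 80% / 20% split
weights = compute_group_weights(groups)
print(weights[0], weights[-1])  # 0.625 2.5
```

These weights can be passed to most training APIs that accept per-sample weights (e.g. a `sample_weight` argument).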
Relabeling
Modify labels in the training data to achieve fairer distributions. This addresses historical bias encoded in labels by correcting labels that may reflect past discrimination.
Implementation Steps
- Identify instances near the decision boundary
- Analyze label distribution across protected groups
- Flip labels strategically to balance positive/negative rates
- Ensure relabeling does not compromise data integrity
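A minimal sketch of these steps, sometimes called "massaging": among the instances a preliminary model is least sure about, promote some disadvantaged-group negatives and demote an equal number of other-group positives. The function name and arguments here are illustrative.

```python
def relabel(scores, labels, groups, disadvantaged, n_flips):
    """Flip up to n_flips label pairs nearest the 0.5 decision boundary."""
    labels = list(labels)
    # candidates: disadvantaged negatives to promote, others' positives to demote,
    # each ordered by closeness to the boundary
    promote = sorted((i for i in range(len(labels))
                      if groups[i] == disadvantaged and labels[i] == 0),
                     key=lambda i: abs(scores[i] - 0.5))
    demote = sorted((i for i in range(len(labels))
                     if groups[i] != disadvantaged and labels[i] == 1),
                    key=lambda i: abs(scores[i] - 0.5))
    for i, j in zip(promote[:n_flips], demote[:n_flips]):
        labels[i], labels[j] = 1, 0  # paired flips keep the overall base rate
    return labels
```

Flipping in pairs preserves the overall positive rate while narrowing the gap between groups.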
Fair Representation Learning
Transform data into a new representation that preserves task-relevant information while removing information about protected attributes. The transformed data can then be used with any downstream classifier.
Implementation Steps
- Train an encoder to map data to latent representation
- Add adversarial constraint to prevent predicting protected attribute
- Optimize for both task utility and fairness
- Use transformed representation for downstream tasks
Disparate Impact Remover
Modify feature values to reduce their correlation with protected attributes while preserving rank-ordering within groups, so that feature distributions become more similar across groups.
Implementation Steps
- For each feature, compute distributions per group
- Define a repair level (0 = no change, 1 = full repair)
- Transform feature values toward median distribution
- Preserve relative ranking within each group
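A small sketch of the median-repair idea behind these steps (in the spirit of Feldman et al.'s disparate impact remover, though simplified): each value moves toward the cross-group median at its own within-group quantile, which preserves rank order inside each group. `repair_feature` is an illustrative name.

```python
import statistics

def repair_feature(values, groups, repair_level):
    """Partially repair one feature; repair_level 0 = no change, 1 = full repair."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    sorted_groups = {g: sorted(vs) for g, vs in by_group.items()}
    repaired = []
    for v, g in zip(values, groups):
        # within-group quantile of this value
        q = sorted_groups[g].index(v) / max(len(sorted_groups[g]) - 1, 1)
        # target: median across groups of the value at that quantile
        target = statistics.median(
            vs[round(q * (len(vs) - 1))] for vs in sorted_groups.values())
        repaired.append((1 - repair_level) * v + repair_level * target)
    return repaired
```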
Consider a hiring dataset where 80% of past hires were male. Simple reweighting would assign weight 0.625 to male examples (1/0.80 * 0.5) and weight 2.5 to female examples (1/0.20 * 0.5), giving equal effective influence to both groups during training.
In-Processing Techniques
In-processing methods modify the learning algorithm itself to incorporate fairness constraints during model training. These approaches directly optimize for both accuracy and fairness simultaneously.
Adversarial Debiasing
Train the predictor model jointly with an adversary that tries to predict the protected attribute from predictions. The predictor learns to make accurate predictions while preventing the adversary from inferring group membership.
Implementation Steps
- Build main classifier for prediction task
- Build adversary network to predict protected attribute
- Train classifier to maximize task accuracy
- Simultaneously train to minimize adversary accuracy
- Use gradient reversal or min-max optimization
Constrained Optimization
Add fairness constraints directly to the optimization objective. The model is trained to minimize loss subject to constraints on fairness metrics like demographic parity or equalized odds.
Implementation Steps
- Define fairness constraint mathematically
- Add constraint to optimization problem
- Use Lagrangian relaxation or barrier methods
- Tune constraint strength vs. accuracy trade-off
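The relaxation in the steps above can be illustrated with a demographic-parity penalty: instead of a hard constraint, add `lam` times the gap in positive-prediction rates to the training loss. Both helpers are illustrative sketches, not a specific library's API.

```python
def demographic_parity_gap(preds, groups):
    """Spread between groups' positive-prediction rates (0 = parity)."""
    rate = {}
    for g in set(groups):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        rate[g] = sum(members) / len(members)
    return max(rate.values()) - min(rate.values())

def penalized_loss(task_loss, preds, groups, lam):
    # larger lam enforces the constraint more strictly, at some cost to accuracy
    return task_loss + lam * demographic_parity_gap(preds, groups)
```

In a full Lagrangian treatment, `lam` itself would be updated during training rather than fixed.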
Prejudice Remover Regularizer
Add a regularization term to the loss function that penalizes the model for producing predictions that are correlated with protected attributes. Encourages independence between predictions and group membership.
Implementation Steps
- Compute mutual information between predictions and protected attribute
- Add penalty term proportional to mutual information
- Tune regularization strength parameter
- Train model with augmented loss function
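Mutual information is expensive to estimate in practice, so implementations often substitute a cheaper dependence measure. The sketch below uses squared covariance between scores and the protected attribute as such a proxy; this is an assumption for illustration, not the exact regularizer from the original prejudice remover.

```python
def covariance_penalty(scores, protected, strength):
    """Penalty that grows with the dependence between scores and group."""
    n = len(scores)
    ms = sum(scores) / n
    mp = sum(protected) / n
    # sample covariance between model scores and the protected attribute
    cov = sum((s - ms) * (p - mp) for s, p in zip(scores, protected)) / n
    return strength * cov ** 2
```

A zero penalty indicates no linear dependence; the penalty is simply added to the task loss during training.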
Meta-Fair Classifier
Learn a family of classifiers that span the fairness-accuracy trade-off space, allowing selection of the optimal classifier based on specific fairness requirements without retraining.
Implementation Steps
- Train classifier with parameterized fairness constraint
- Generate Pareto frontier of fairness-accuracy trade-offs
- Select operating point based on requirements
- Deploy chosen classifier configuration
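Generating the Pareto frontier in these steps amounts to sweeping the constraint strength and discarding dominated configurations. `train_and_eval` is a hypothetical callback that trains one model and returns its (accuracy, fairness gap) pair.

```python
def pareto_frontier(train_and_eval, lambdas):
    """Keep only settings not dominated on both accuracy and fairness."""
    points = [(lam, *train_and_eval(lam)) for lam in lambdas]
    frontier = []
    for lam, acc, gap in points:
        dominated = any(a >= acc and g <= gap and (a > acc or g < gap)
                        for _, a, g in points)
        if not dominated:
            frontier.append((lam, acc, gap))
    return frontier
```

An operating point is then chosen from the frontier according to the application's fairness requirements.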
```python
# Conceptual example: Adversarial Debiasing
# (cross_entropy is an assumed helper, e.g. from a deep learning framework)
class AdversarialDebiasing:
    def __init__(self, predictor, adversary, lambda_adv):
        self.predictor = predictor    # Main task classifier
        self.adversary = adversary    # Predicts protected attribute
        self.lambda_adv = lambda_adv  # Adversary weight

    def compute_loss(self, X, y, protected):
        pred = self.predictor(X)
        adv_pred = self.adversary(pred)
        # Task loss: minimize prediction error
        task_loss = cross_entropy(pred, y)
        # Adversary loss: minimize protected attribute prediction
        adv_loss = cross_entropy(adv_pred, protected)
        # Combined: good prediction + poor adversary performance
        return task_loss - self.lambda_adv * adv_loss
```
Post-Processing Techniques
Post-processing methods modify model predictions after the model has been trained. These approaches are model-agnostic and can be applied to any classifier, but they require access to protected attribute information at prediction time.
Threshold Adjustment
Apply different decision thresholds to different groups to equalize outcome rates or error rates. Simple but effective when group membership is known at prediction time.
Implementation Steps
- Train classifier normally to produce probability scores
- Calculate ROC curves separately for each group
- Select thresholds that achieve desired fairness metric
- Apply group-specific thresholds at prediction time
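The steps above reduce to choosing one threshold per group and applying them at prediction time. This sketch picks, for each group, the smallest threshold whose positive rate does not exceed a shared target; both function names are illustrative.

```python
def choose_threshold(scores, target_rate):
    """Smallest threshold whose positive-prediction rate <= target_rate."""
    for t in sorted(set(scores)):
        rate = sum(s >= t for s in scores) / len(scores)
        if rate <= target_rate:
            return t
    return max(scores)

def predict_with_group_thresholds(scores, groups, thresholds):
    # thresholds: mapping from group name to that group's cutoff
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]
```

A production version would pick thresholds from per-group ROC curves rather than a single target rate, but the mechanics are the same.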
Equalized Odds Post-Processing
Modify predictions to achieve equalized odds by solving a linear program that finds the optimal randomized mapping from predictions to outcomes while satisfying fairness constraints.
Implementation Steps
- Compute TPR and FPR for each group
- Formulate linear program with equalized odds constraints
- Find optimal randomized decision rule
- Apply probabilistic outcome assignment
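The first step above, computing per-group error rates, can be sketched directly; these TPR/FPR pairs are the quantities the equalized-odds linear program constrains to be equal. `group_rates` is an illustrative helper.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group (true positive rate, false positive rate)."""
    rates = {}
    for g in set(groups):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        pos = [i for i in idx if y_true[i] == 1]
        neg = [i for i in idx if y_true[i] == 0]
        tpr = sum(y_pred[i] for i in pos) / len(pos)
        fpr = sum(y_pred[i] for i in neg) / len(neg)
        rates[g] = (tpr, fpr)
    return rates
```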
Calibrated Equalized Odds
Combine calibration with equalized odds by finding group-specific probability mappings that achieve both calibrated scores and equal error rates across groups.
Implementation Steps
- Calibrate scores separately for each group
- Apply equalized odds optimization on calibrated scores
- Derive final score transformation
- Verify both calibration and equalized odds are satisfied
Reject Option Classification
For predictions near the decision boundary (uncertain predictions), favor outcomes that benefit disadvantaged groups. This focuses fairness interventions where they have the least impact on accuracy.
Implementation Steps
- Define uncertainty region around decision boundary
- For uncertain predictions, identify protected group
- Favor positive outcomes for disadvantaged groups
- Accept confident predictions without modification
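These steps fit in a few lines: inside an uncertainty band around the 0.5 boundary, disadvantaged-group instances receive the favorable label and others the unfavorable one, while confident predictions pass through unchanged. The function name and `band` parameter are illustrative.

```python
def reject_option_predict(scores, groups, disadvantaged, band=0.1):
    preds = []
    for s, g in zip(scores, groups):
        if abs(s - 0.5) <= band:           # uncertain region: intervene
            preds.append(1 if g == disadvantaged else 0)
        else:                              # confident region: keep prediction
            preds.append(int(s >= 0.5))
    return preds
```

Widening `band` strengthens the fairness intervention at a growing cost to accuracy.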
Post-processing techniques that explicitly use protected attributes to make different decisions for different groups may raise legal concerns under anti-discrimination laws in some jurisdictions. This approach, sometimes called "disparate treatment," requires careful legal analysis even when the intent is to achieve fairness.
Comparing Mitigation Approaches
Each category of mitigation techniques has distinct characteristics that make it more or less suitable for different situations.
| Aspect | Pre-Processing | In-Processing | Post-Processing |
|---|---|---|---|
| Model Agnostic | Yes - works with any algorithm | No - requires algorithm changes | Yes - works with any model |
| Protected Attribute Needed | During training only | During training | At prediction time |
| Accuracy Impact | Moderate - data modified | Controllable via trade-off | Can be minimal for uncertain cases |
| Implementation Complexity | Low to moderate | High - custom training | Low - applied to outputs |
| Best For | Historical data bias | Strong fairness requirements | Legacy models, quick fixes |
Selecting Mitigation Strategies
Choosing the right mitigation approach depends on several factors including the source of bias, model constraints, and deployment requirements.
Bias in Historical Data
When training data reflects past discrimination or underrepresentation of certain groups, pre-processing techniques such as reweighting or relabeling address the bias at its source.
Strict Fairness Requirements
When regulatory or ethical requirements demand that specific fairness metrics be achieved, in-processing techniques give the most direct control over the fairness-accuracy trade-off.
Legacy Model in Production
When you cannot retrain the model but need to improve the fairness of a deployed system, post-processing techniques adjust its outputs without touching the model itself.
Unknown Protected Attribute at Prediction Time
When the protected attribute is available during training but not at prediction time, prefer pre-processing or in-processing, since post-processing requires group membership at the moment a decision is made.
Best Practices for Bias Mitigation
- Start with data quality: Address data collection and labeling issues before applying algorithmic fixes
- Combine approaches: Use multiple techniques together for more robust results
- Monitor continuously: Fairness can degrade over time as data distributions shift
- Document trade-offs: Record decisions about fairness-accuracy trade-offs and their rationale
- Involve stakeholders: Include affected communities in decisions about fairness definitions and acceptable trade-offs
- Test on held-out data: Validate fairness improvements on data not used for tuning
Common Pitfalls to Avoid
- Optimizing single metric: Improving one fairness metric may worsen others
- Ignoring intersectionality: Fairness for individual groups does not guarantee fairness for intersectional subgroups
- Overfitting to training distribution: Fairness tuned on training data may not generalize
- Assuming static data: Distribution shifts can invalidate carefully tuned fairness interventions
- Treating mitigation as one-time: Bias mitigation requires ongoing monitoring and adjustment
Key Takeaways
- Bias mitigation techniques can be applied at pre-processing (data), in-processing (algorithm), and post-processing (output) stages
- Pre-processing methods are model-agnostic but modify the training data distribution
- In-processing methods provide fine control over fairness-accuracy trade-offs but require custom training
- Post-processing methods are quick to implement but may require protected attributes at prediction time
- No single technique works universally; selection depends on bias source, constraints, and requirements
- Effective mitigation often combines multiple approaches with continuous monitoring