Part 4 of 5

Bias Mitigation Strategies

Introduction to Bias Mitigation

Bias mitigation encompasses techniques and strategies for reducing unfair bias in AI systems. These approaches can be applied at different stages of the machine learning pipeline, each with distinct advantages and trade-offs.

Effective bias mitigation requires understanding when and how bias enters the system, selecting appropriate techniques based on the specific context, and continuously monitoring results. No single approach works universally; practitioners must often combine multiple techniques.

Key Framework

Bias mitigation techniques are categorized by when they are applied in the ML pipeline: pre-processing (data level), in-processing (algorithm level), and post-processing (output level). Each category offers different trade-offs between fairness improvement and model performance.

The Bias Mitigation Pipeline

Mitigation strategies can be applied at three points in the machine learning lifecycle, each targeting different sources of bias.

  • Pre-Processing: modify the training data
  • In-Processing: modify the learning algorithm
  • Post-Processing: modify the model outputs

Pre-Processing Techniques

Pre-processing methods modify the training data before it is used to train the model. These techniques aim to remove or reduce bias in the data itself, allowing standard algorithms to train on more equitable inputs.

Resampling / Reweighting

Pre-Processing

Adjust the distribution of training data by oversampling underrepresented groups, undersampling overrepresented groups, or assigning different weights to samples based on group membership.

Implementation Steps
  1. Calculate representation rates for each protected group
  2. Determine target distribution (e.g., equal representation)
  3. Apply oversampling (SMOTE, random duplication) or undersampling
  4. Alternatively, assign instance weights inversely proportional to group size
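The reweighting in step 4 can be sketched as follows. This is a minimal NumPy sketch; the function name and the default of equal target shares are illustrative assumptions.

```python
import numpy as np

def group_weights(groups, target=None):
    """Instance weights giving each group its target share of total influence.

    groups: array of group labels, one per sample
    target: dict mapping group -> desired share (defaults to equal shares)
    """
    labels, counts = np.unique(groups, return_counts=True)
    n = len(groups)
    if target is None:
        target = {g: 1.0 / len(labels) for g in labels}
    observed = {g: c / n for g, c in zip(labels, counts)}
    # Weight = desired share / observed share, so each group's weights
    # sum to its target share of the total
    per_group = {g: target[g] / observed[g] for g in labels}
    return np.array([per_group[g] for g in groups])
```

With an 80/20 split and equal target shares, the majority group gets 0.5 / 0.8 = 0.625 per instance and the minority group 0.5 / 0.2 = 2.5.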

Relabeling

Pre-Processing

Modify labels in the training data to achieve fairer distributions. This addresses historical bias encoded in labels by correcting labels that may reflect past discrimination.

Implementation Steps
  1. Identify instances near the decision boundary
  2. Analyze label distribution across protected groups
  3. Flip labels strategically to balance positive/negative rates
  4. Ensure relabeling does not compromise data integrity
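Steps 1–3 can be sketched as a simple procedure in the spirit of Kamiran and Calders' "massaging": promote the disadvantaged group's highest-scored negatives and demote the advantaged group's lowest-scored positives, keeping the overall positive rate fixed. Function and parameter names here are illustrative.

```python
import numpy as np

def massage_labels(scores, labels, groups, disadvantaged, n_flips):
    """Flip n_flips labels in each direction to narrow the positive-rate gap.

    Promotes the disadvantaged group's highest-scored negatives to positive
    and demotes the advantaged group's lowest-scored positives to negative,
    so the overall positive rate is unchanged.
    """
    labels = labels.copy()
    dis = groups == disadvantaged
    promote = np.where(dis & (labels == 0))[0]
    demote = np.where(~dis & (labels == 1))[0]
    # Candidates closest to the decision boundary on each side
    promote = promote[np.argsort(scores[promote])[::-1]][:n_flips]
    demote = demote[np.argsort(scores[demote])][:n_flips]
    labels[promote] = 1
    labels[demote] = 0
    return labels
```

Flipping the highest-scored negatives and lowest-scored positives targets instances near the decision boundary, where a label change is most plausible (step 1).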

Fair Representation Learning

Pre-Processing

Transform data into a new representation that preserves task-relevant information while removing information about protected attributes. The transformed data can then be used with any downstream classifier.

Implementation Steps
  1. Train an encoder to map data to latent representation
  2. Add adversarial constraint to prevent predicting protected attribute
  3. Optimize for both task utility and fairness
  4. Use transformed representation for downstream tasks
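A full adversarial encoder is too long for a short sketch. As a minimal linear stand-in, one can residualize each feature against the protected attribute, which removes linear (though not nonlinear) dependence from the representation:

```python
import numpy as np

def residualize(X, protected):
    """Remove the linear component of each feature predictable from `protected`.

    X: (n, d) feature matrix; protected: (n,) attribute values.
    Returns features whose linear correlation with `protected` is zero.
    """
    # Regress every feature on [intercept, protected] and keep the residuals
    A = np.column_stack([np.ones(len(protected)), protected])
    beta, *_ = np.linalg.lstsq(A, X, rcond=None)
    return X - A @ beta
```

The adversarial approach in the steps above generalizes this idea: instead of removing only linear dependence, the adversary penalizes any learnable dependence.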

Disparate Impact Remover

Pre-Processing

Modify feature values to reduce correlation with protected attributes while preserving rank-ordering within groups, making feature distributions more similar across groups.

Implementation Steps
  1. For each feature, compute distributions per group
  2. Define a repair level (0 = no change, 1 = full repair)
  3. Transform feature values toward median distribution
  4. Preserve relative ranking within each group
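The repair in steps 2–4 can be sketched with quantile mapping. This is a simplified sketch in the spirit of Feldman et al.'s disparate impact remover; the 101-point quantile grid and the median target distribution are implementation assumptions.

```python
import numpy as np

def repair(x, groups, level):
    """Move each group's feature distribution toward a common target.

    level: 0 = no change, 1 = full repair. Rank order within each group
    is preserved because the quantile map is monotone per group.
    """
    x = x.astype(float)
    out = x.copy()
    labels = np.unique(groups)
    # Target distribution: at each quantile, the median of the groups' values
    qs = np.linspace(0, 1, 101)
    group_q = np.array([np.quantile(x[groups == g], qs) for g in labels])
    target_q = np.median(group_q, axis=0)
    for g in labels:
        mask = groups == g
        vals = x[mask]
        # Within-group quantile of each value, then map to the target
        ranks = np.searchsorted(np.sort(vals), vals, side="right") / len(vals)
        repaired = np.interp(ranks, qs, target_q)
        out[mask] = (1 - level) * vals + level * repaired
    return out
```

At level 1 the groups' repaired distributions coincide; at level 0 the data is untouched, so the repair level tunes the fairness-utility trade-off.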

Practical Example: Reweighting

Consider a hiring dataset where 80% of past hires were male. Simple reweighting would assign weight 0.625 to male examples (1/0.80 * 0.5) and weight 2.5 to female examples (1/0.20 * 0.5), giving equal effective influence to both groups during training.

In-Processing Techniques

In-processing methods modify the learning algorithm itself to incorporate fairness constraints during model training, optimizing for accuracy and fairness simultaneously.

Adversarial Debiasing

In-Processing

Train the predictor model jointly with an adversary that tries to predict the protected attribute from predictions. The predictor learns to make accurate predictions while preventing the adversary from inferring group membership.

Implementation Steps
  1. Build main classifier for prediction task
  2. Build adversary network to predict protected attribute
  3. Train classifier to maximize task accuracy
  4. Simultaneously train to minimize adversary accuracy
  5. Use gradient reversal or min-max optimization

Constrained Optimization

In-Processing

Add fairness constraints directly to the optimization objective. The model is trained to minimize loss subject to constraints on fairness metrics like demographic parity or equalized odds.

Implementation Steps
  1. Define fairness constraint mathematically
  2. Add constraint to optimization problem
  3. Use Lagrangian relaxation or barrier methods
  4. Tune constraint strength vs. accuracy trade-off
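A penalty-method relaxation of steps 2–3 can be sketched for logistic regression, adding the squared demographic-parity gap of the predicted scores to the log-loss. This is a minimal sketch; the learning rate, step count, and soft-score parity gap are illustrative choices.

```python
import numpy as np

def train_fair_logreg(X, y, protected, lam=0.0, lr=0.1, steps=2000):
    """Logistic regression with a soft demographic-parity constraint.

    Minimizes log-loss + lam * (mean score gap between groups)^2,
    a penalty-method relaxation of the hard fairness constraint.
    """
    Xb = np.column_stack([np.ones(len(y)), X])
    w = np.zeros(Xb.shape[1])
    g1, g0 = protected == 1, protected == 0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Xb @ w))
        grad = Xb.T @ (p - y) / len(y)             # log-loss gradient
        gap = p[g1].mean() - p[g0].mean()          # parity gap on scores
        s = p * (1 - p)                            # sigmoid derivative
        dgap = (Xb[g1] * s[g1, None]).mean(0) - (Xb[g0] * s[g0, None]).mean(0)
        w -= lr * (grad + 2 * lam * gap * dgap)    # penalized gradient step
    return w

def parity_gap(w, X, protected):
    """Absolute difference in mean predicted score between groups."""
    Xb = np.column_stack([np.ones(len(protected)), X])
    p = 1 / (1 + np.exp(-Xb @ w))
    return abs(p[protected == 1].mean() - p[protected == 0].mean())
```

Raising `lam` tightens the constraint at the cost of accuracy, which is the trade-off tuning described in step 4.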

Prejudice Remover Regularizer

In-Processing

Add a regularization term to the loss function that penalizes the model for producing predictions that are correlated with protected attributes. Encourages independence between predictions and group membership.

Implementation Steps
  1. Compute mutual information between predictions and protected attribute
  2. Add penalty term proportional to mutual information
  3. Tune regularization strength parameter
  4. Train model with augmented loss function
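Step 1 can be illustrated with a discrete plug-in estimate of mutual information. In practice a differentiable surrogate is needed to train with this penalty; the sketch below only computes the quantity being penalized.

```python
import math
from collections import Counter

def mutual_information(pred, protected):
    """Plug-in estimate of I(pred; protected) in nats for discrete values."""
    n = len(pred)
    joint = Counter(zip(pred, protected))
    p_pred = Counter(pred)
    p_prot = Counter(protected)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), with the 1/n factors cancelled
        mi += p_ab * math.log(p_ab * n * n / (p_pred[a] * p_prot[b]))
    return mi
```

The regularized loss is then loss + lambda * mutual_information(predictions, protected), with lambda tuned as in step 3.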

Meta-Fair Classifier

In-Processing

Learn a family of classifiers that span the fairness-accuracy trade-off space, allowing selection of the optimal classifier based on specific fairness requirements without retraining.

Implementation Steps
  1. Train classifier with parameterized fairness constraint
  2. Generate Pareto frontier of fairness-accuracy trade-offs
  3. Select operating point based on requirements
  4. Deploy chosen classifier configuration

# Conceptual example: Adversarial Debiasing

import numpy as np

def cross_entropy(pred, target):
    """Mean binary cross-entropy for probability predictions."""
    eps = 1e-12
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

class AdversarialDebiasing:
    def __init__(self, predictor, adversary, lambda_adv):
        self.predictor = predictor    # Main task classifier
        self.adversary = adversary    # Predicts protected attribute from predictions
        self.lambda_adv = lambda_adv  # Weight on the adversary term

    def compute_loss(self, X, y, protected):
        pred = self.predictor(X)
        adv_pred = self.adversary(pred)

        # Task loss: penalize errors on the main prediction task
        task_loss = cross_entropy(pred, y)

        # Adversary loss: how well group membership is recovered from predictions
        adv_loss = cross_entropy(adv_pred, protected)

        # Combined predictor objective: accurate predictions AND a poorly
        # performing adversary (the adversary itself is trained separately
        # to minimize adv_loss, e.g. via gradient reversal)
        return task_loss - self.lambda_adv * adv_loss

Post-Processing Techniques

Post-processing methods modify model predictions after the model has been trained. These approaches are model-agnostic and can be applied to any classifier, but they require access to protected attribute information at prediction time.

Threshold Adjustment

Post-Processing

Apply different decision thresholds to different groups to equalize outcome rates or error rates. Simple but effective when group membership is known at prediction time.

Implementation Steps
  1. Train classifier normally to produce probability scores
  2. Calculate ROC curves separately for each group
  3. Select thresholds that achieve desired fairness metric
  4. Apply group-specific thresholds at prediction time
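Steps 3–4 can be sketched for the demographic-parity case, where each group's threshold is set to the quantile yielding a common selection rate. Equalizing error rates instead would use the per-group ROC curves from step 2. Function names here are illustrative.

```python
import numpy as np

def group_thresholds(scores, groups, target_rate):
    """Per-group score thresholds that yield the same selection rate.

    Each group's threshold is its (1 - target_rate) score quantile, so
    roughly target_rate of that group scores above it.
    """
    return {g: np.quantile(scores[groups == g], 1 - target_rate)
            for g in np.unique(groups)}

def decide(scores, groups, thresholds):
    """Apply the group-specific thresholds at prediction time."""
    return np.array([s > thresholds[g] for s, g in zip(scores, groups)])
```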

Equalized Odds Post-Processing

Post-Processing

Modify predictions to achieve equalized odds by solving a linear program that finds the optimal randomized mapping from predictions to outcomes while satisfying fairness constraints.

Implementation Steps
  1. Compute TPR and FPR for each group
  2. Formulate linear program with equalized odds constraints
  3. Find optimal randomized decision rule
  4. Apply probabilistic outcome assignment
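The linear program itself is beyond a short sketch, but step 1, the per-group rates that the constraints are built from, can be sketched as follows (assuming binary labels and hard predictions):

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """True- and false-positive rates per group.

    These (TPR, FPR) pairs parameterize the equalized odds constraints
    of the downstream optimization.
    """
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        yt, yp = y_true[m], y_pred[m]
        tpr = yp[yt == 1].mean() if (yt == 1).any() else float("nan")
        fpr = yp[yt == 0].mean() if (yt == 0).any() else float("nan")
        rates[g] = (tpr, fpr)
    return rates
```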

Calibrated Equalized Odds

Post-Processing

Combine calibration with equalized odds by finding group-specific probability mappings that achieve both calibrated scores and equal error rates across groups.

Implementation Steps
  1. Calibrate scores separately for each group
  2. Apply equalized odds optimization on calibrated scores
  3. Derive final score transformation
  4. Verify both calibration and equalized odds are satisfied

Reject Option Classification

Post-Processing

For predictions near the decision boundary (i.e., uncertain predictions), favor outcomes that benefit disadvantaged groups. This focuses fairness interventions where they have the least impact on accuracy.

Implementation Steps
  1. Define uncertainty region around decision boundary
  2. For uncertain predictions, identify protected group
  3. Favor positive outcomes for disadvantaged groups
  4. Accept confident predictions without modification
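The steps above can be sketched as a simple decision rule. The threshold and margin values below are illustrative assumptions.

```python
import numpy as np

def reject_option_predict(scores, groups, disadvantaged,
                          threshold=0.5, margin=0.1):
    """Standard thresholding, except inside the uncertainty band
    [threshold - margin, threshold + margin], where the disadvantaged
    group receives the positive outcome and other groups the negative one.
    """
    scores = np.asarray(scores)
    preds = (scores > threshold).astype(int)           # confident default
    uncertain = np.abs(scores - threshold) <= margin   # uncertainty region
    dis = np.asarray(groups) == disadvantaged
    preds[uncertain & dis] = 1
    preds[uncertain & ~dis] = 0
    return preds
```

Widening the margin strengthens the intervention at a growing cost to accuracy, since more confident predictions get overridden.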

Legal Consideration

Post-processing techniques that explicitly use protected attributes to make different decisions for different groups may raise legal concerns under anti-discrimination laws in some jurisdictions. This approach, sometimes called "disparate treatment," requires careful legal analysis even when the intent is to achieve fairness.

Comparing Mitigation Approaches

Each category of mitigation techniques has distinct characteristics that make it more or less suitable for different situations.

| Aspect | Pre-Processing | In-Processing | Post-Processing |
|---|---|---|---|
| Model agnostic | Yes - works with any algorithm | No - requires algorithm changes | Yes - works with any model |
| Protected attribute needed | During training only | During training | At prediction time |
| Accuracy impact | Moderate - data modified | Controllable via trade-off | Can be minimal for uncertain cases |
| Implementation complexity | Low to moderate | High - custom training | Low - applied to outputs |
| Best for | Historical data bias | Strong fairness requirements | Legacy models, quick fixes |

Selecting Mitigation Strategies

Choosing the right mitigation approach depends on several factors including the source of bias, model constraints, and deployment requirements.

Bias in Historical Data

When training data reflects past discrimination or underrepresentation of certain groups.

Resampling Relabeling Fair Representation

Strict Fairness Requirements

When regulatory or ethical requirements demand specific fairness metrics be achieved.

Constrained Optimization Adversarial Debiasing

Legacy Model in Production

When you cannot retrain the model but need to improve the fairness of a deployed system.

Threshold Adjustment Reject Option

Unknown Protected Attribute

When the protected attribute is not available at prediction time but is known during training.

Fair Representation Adversarial Debiasing

Best Practices for Bias Mitigation

Implementation Guidelines
  • Start with data quality: Address data collection and labeling issues before applying algorithmic fixes
  • Combine approaches: Use multiple techniques together for more robust results
  • Monitor continuously: Fairness can degrade over time as data distributions shift
  • Document trade-offs: Record decisions about fairness-accuracy trade-offs and their rationale
  • Involve stakeholders: Include affected communities in decisions about fairness definitions and acceptable trade-offs
  • Test on held-out data: Validate fairness improvements on data not used for tuning

Common Pitfalls to Avoid

  • Optimizing single metric: Improving one fairness metric may worsen others
  • Ignoring intersectionality: Fairness for individual groups does not guarantee fairness for intersectional subgroups
  • Overfitting to training distribution: Fairness tuned on training data may not generalize
  • Assuming static data: Distribution shifts can invalidate carefully tuned fairness interventions
  • Treating mitigation as one-time: Bias mitigation requires ongoing monitoring and adjustment

Key Takeaways

  • Bias mitigation techniques can be applied at pre-processing (data), in-processing (algorithm), and post-processing (output) stages
  • Pre-processing methods are model-agnostic but modify the training data distribution
  • In-processing methods provide fine control over fairness-accuracy trade-offs but require custom training
  • Post-processing methods are quick to implement but may require protected attributes at prediction time
  • No single technique works universally; selection depends on bias source, constraints, and requirements
  • Effective mitigation often combines multiple approaches with continuous monitoring