Explainability & Transparency
Introduction to AI Explainability
Explainability in AI refers to the ability to understand and communicate how an AI system makes decisions. As AI systems increasingly influence high-stakes decisions in healthcare, finance, criminal justice, and employment, the ability to explain these decisions becomes both an ethical imperative and a regulatory requirement.
Transparency encompasses broader disclosure practices, including documentation of AI system capabilities, limitations, training data, and intended uses. Together, explainability and transparency form the foundation of accountable AI.
The EU AI Act requires high-risk AI systems to be "sufficiently transparent to enable users to interpret the system's output and use it appropriately." GDPR Article 22 provides a right to "meaningful information about the logic involved" in automated decisions. These requirements make explainability a compliance necessity, not just a best practice.
The Interpretability Spectrum
AI models exist on a spectrum from inherently interpretable to fundamentally opaque. Understanding where a model falls on this spectrum helps determine appropriate explainability approaches.
Interpretable by Design
- Linear Models: Coefficients directly indicate feature importance and direction
- Decision Trees: Rule paths from root to leaf are human-readable
- Rule Lists: If-then rules that can be directly understood
- GAMs: Generalized Additive Models expose each feature's individual effect as a separately inspectable curve
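For a linear model, the fitted coefficients are themselves the explanation. A minimal sketch on synthetic data (the feature names and values are illustrative):

```python
import numpy as np

# Toy data: price = 5 + 3*size + 2*rooms (noise-free for clarity)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 5 + 3 * X[:, 0] + 2 * X[:, 1]

# Ordinary least squares: each coefficient states how much one unit
# of that feature moves the prediction, and in which direction
A = np.hstack([np.ones((200, 1)), X])  # prepend an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(coef)  # recovers [intercept, size effect, rooms effect]
```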
Post-hoc Explainability
When using complex models, post-hoc explanation methods provide insights into model behavior after training. These methods approximate the model's decision-making process in human-understandable terms.
Key Explainability Methods
Several techniques have emerged as standard approaches for explaining AI predictions. Each has strengths and appropriate use cases.
LIME
LIME (Local Interpretable Model-agnostic Explanations) explains individual predictions by fitting a simple, interpretable model (such as a linear regression) to the local neighborhood of the instance being explained.
How It Works
Perturbs the input, generates predictions for the perturbed samples, weights the samples by their proximity to the original instance, and fits an interpretable model to the weighted samples.
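The steps above can be sketched in a few lines. This is an illustrative from-scratch version, not the LIME library's implementation; `black_box`, the perturbation scale, and the kernel width are all stand-in choices:

```python
import numpy as np

def black_box(X):
    # Hypothetical opaque classifier, nonlinear in both features
    return (X[:, 0] ** 2 + np.sin(X[:, 1]) > 1.0).astype(float)

def lime_explain(x, predict, n_samples=5000, kernel_width=0.75, seed=0):
    """Fit a locally weighted linear surrogate around instance x."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the input around x
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.size))
    # 2. Query the black box on the perturbed samples
    yz = predict(Z)
    # 3. Weight samples by proximity to x (RBF kernel)
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)
    # 4. Weighted least squares -> local linear explanation
    A = np.hstack([np.ones((n_samples, 1)), Z])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, yz * sw.ravel(), rcond=None)
    return coef[1:]  # per-feature local weights (intercept dropped)

weights = lime_explain(np.array([1.0, 0.0]), black_box)
print(weights)  # near this point, feature 0 should carry more weight
```

The surrogate is only valid near `x`; repeating the fit at a different instance generally yields different weights, which is exactly what "local" means here.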
SHAP
SHAP (SHapley Additive exPlanations) uses game-theoretic Shapley values to fairly attribute a prediction to each feature, providing both local and global explanations with theoretical guarantees.
How It Works
Calculates the marginal contribution of each feature by considering all possible feature coalitions, fairly distributing the prediction among features.
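For a small number of features, the coalition enumeration described above can be done exactly. This sketch uses one common convention (absent features are set to a baseline value) and an illustrative toy model, not the shap library:

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for model f at instance x.
    Features outside a coalition are held at the baseline value."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                # Shapley weight for a coalition of this size
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i, without = baseline.copy(), baseline.copy()
                for j in S:
                    with_i[j] = x[j]
                    without[j] = x[j]
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (f(with_i) - f(without))
    return phi

f = lambda v: 2 * v[0] + 3 * v[1] + v[0] * v[1]  # toy model with an interaction
x, base = np.array([1.0, 2.0]), np.zeros(2)
phi = shapley_values(f, x, base)
print(phi, f(x) - f(base))  # efficiency: attributions sum to f(x) - f(baseline)
```

The brute-force enumeration is exponential in the number of features, which is why practical SHAP implementations rely on sampling or model-specific shortcuts.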
Attention Visualization
For transformer models, attention visualization renders the model's attention weights to show which parts of the input it focuses on when making a prediction.
How It Works
Extracts attention weight matrices from transformer layers and visualizes them as heatmaps showing relationships between input elements.
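A minimal single-head sketch of the weight extraction above, with random query/key matrices standing in for a real transformer layer (the tokens and dimensions are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(Q, K):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d))."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d))

# Toy single head: 4 tokens, 8-dim queries and keys
rng = np.random.default_rng(0)
tokens = ["The", "cat", "sat", "down"]
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))

A = attention_weights(Q, K)  # row = query token, column = key token
assert np.allclose(A.sum(axis=1), 1.0)  # each row is a probability distribution

# A text "heatmap": the key token each query attends to most
for t, row in zip(tokens, A):
    print(f"{t:5s} -> {tokens[int(row.argmax())]:5s} {np.round(row, 2)}")
```

In a real model these matrices come from the learned projections inside each layer and head; visualization tools render the same `A` matrix as a colored heatmap per head.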
Counterfactual Explanations
Counterfactual explanations (what-if analysis) explain decisions by showing what minimal changes to the input would result in a different prediction, providing actionable insights.
How It Works
Searches for the smallest perturbation that changes the prediction, revealing which features are most critical to the decision.
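One simple way to implement the search above is greedy coordinate steps guided by the model's score; the linear credit score here is an illustrative stand-in, and real systems use more careful optimizers with plausibility constraints:

```python
import numpy as np

def score(v):
    # Hypothetical credit score; the decision is "approve" when score > 0
    return 1.0 * v[0] + 0.5 * v[1] - 2.0

def counterfactual(x, score, step=0.05, max_iter=1000):
    """Greedy search for a small perturbation that flips sign(score(x))."""
    target = -np.sign(score(x))  # the opposite decision
    cf = x.copy()
    for _ in range(max_iter):
        if np.sign(score(cf)) == target:
            return cf
        # Take the single-feature step that moves the score most toward the target
        best_gain, best_cf = -np.inf, None
        for i in range(len(x)):
            for s in (step, -step):
                c = cf.copy()
                c[i] += s
                gain = target * (score(c) - score(cf))
                if gain > best_gain:
                    best_gain, best_cf = gain, c
        cf = best_cf
    return None  # no counterfactual found within the budget

x = np.array([1.5, 0.0])  # denied applicant (score = -0.5)
cf = counterfactual(x, score)
print(x, "->", cf)  # the small change that would flip the decision
```

Because the search favors the cheapest score-improving move, the features it touches are exactly the ones most critical to the decision, which is what makes the result actionable.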
Comparing Explainability Methods
| Method | Scope | Model Types | Best For |
|---|---|---|---|
| LIME | Local (per prediction) | Any model | Explaining individual decisions |
| SHAP | Local and Global | Any model | Feature importance with theory |
| Attention | Local | Transformers only | NLP and sequence tasks |
| Counterfactuals | Local | Any model | Actionable recourse |
| Integrated Gradients | Local | Differentiable models | Deep learning attribution |
| Partial Dependence | Global | Any model | Understanding feature effects |
No explanation method perfectly captures model behavior. LIME explanations can be unstable across similar inputs. Attention weights may not reflect true causal importance. SHAP values can be computationally expensive. Always use multiple methods and validate explanations against domain knowledge.
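The table above also lists Integrated Gradients, which attributes a prediction by integrating gradients along a straight path from a baseline to the input. A minimal sketch using numerical gradients on a toy differentiable function (the `f` here is illustrative):

```python
import numpy as np

def integrated_gradients(f, x, baseline, steps=100, eps=1e-5):
    """Integrated Gradients: attribute f(x) - f(baseline) along a straight path.
    Gradients are taken numerically so any differentiable f works."""
    def grad(z):
        g = np.zeros_like(z)
        for i in range(len(z)):
            zp, zm = z.copy(), z.copy()
            zp[i] += eps
            zm[i] -= eps
            g[i] = (f(zp) - f(zm)) / (2 * eps)  # central difference
        return g
    alphas = (np.arange(steps) + 0.5) / steps  # midpoint rule on [0, 1]
    total = sum(grad(baseline + a * (x - baseline)) for a in alphas)
    return (x - baseline) * total / steps

f = lambda v: v[0] ** 2 + 3 * v[1]  # toy differentiable model
x, base = np.array([2.0, 1.0]), np.zeros(2)
attr = integrated_gradients(f, x, base)
print(attr, f(x) - f(base))  # completeness: attributions sum to the output change
```

The completeness property printed at the end is the method's key guarantee and a useful sanity check when validating explanations against domain knowledge.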
Model Cards
Model cards are standardized documentation for machine learning models, proposed by Mitchell et al. (2019). They provide transparency about model capabilities, limitations, and appropriate use cases, serving as a form of "nutrition label" for AI systems.
Model Card Template
Model Details
Intended Use
Performance Metrics
Ethical Considerations
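The template sections above can be captured as structured data so cards are versionable and machine-checkable. A minimal sketch with placeholder values (there is no real model behind these numbers):

```python
# Fields follow the Mitchell et al. (2019) model card sections;
# all concrete values below are illustrative placeholders
model_card = {
    "model_details": {
        "name": "credit-risk-v2",
        "type": "gradient-boosted trees",
        "version": "2.1.0",
        "date": "2024-01-15",
    },
    "intended_use": {
        "primary_uses": ["consumer credit pre-screening"],
        "out_of_scope": ["employment decisions", "insurance pricing"],
    },
    "performance_metrics": {
        "auc_overall": 0.87,
        "auc_by_group": {"group_a": 0.88, "group_b": 0.85},
    },
    "ethical_considerations": {
        "known_limitations": "undertested on thin-file applicants",
        "recourse": "appeals reviewed by a human underwriter",
    },
}

def render(card):
    """Print the card in a human-readable layout."""
    for section, fields in card.items():
        print(section.replace("_", " ").title())
        for key, value in fields.items():
            print(f"  {key}: {value}")

render(model_card)
```

Keeping the card as data rather than free text makes it easy to diff across model versions and to validate that required sections are present before release.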
Model cards support responsible AI by forcing developers to consider ethical implications, enabling users to make informed decisions about deployment, providing accountability documentation, and facilitating regulatory compliance under frameworks like the EU AI Act.
Datasheets for Datasets
Datasheets for datasets, proposed by Gebru et al. (2018), document the motivation, composition, collection process, and intended uses of datasets. This transparency helps downstream users understand potential biases and appropriate applications.
Datasheet Structure
Motivation
- Why was the dataset created?
- Who created it, and for whom?
- Who funded it?
Composition
- What do instances represent?
- How many instances total?
- What data is included?
- Is there sensitive information?
Collection Process
- How was data collected?
- Who collected it?
- Over what timeframe?
- Was consent obtained?
Preprocessing
- What cleaning was done?
- Was data filtered?
- Is raw data available?
Uses
- What tasks was it used for?
- What should it not be used for?
- Could its use have broader impacts?
Distribution & Maintenance
- How is it distributed?
- Under what license?
- Who maintains it?
- How to report errors?
Implementing Transparency in Practice
Transparency Levels by Audience
- End Users: Clear explanations of how AI affects them, what factors matter, and how to seek recourse
- Business Stakeholders: Model performance metrics, limitations, appropriate use cases, and risk factors
- Technical Teams: Detailed model architecture, training procedures, hyperparameters, and validation results
- Regulators: Compliance documentation, audit trails, fairness assessments, and impact evaluations
Building Transparent AI Systems
- Document from Start: Begin transparency documentation at project inception, not as an afterthought
- Version Everything: Track changes to data, models, and documentation over time
- Test Explanations: Validate that explanations are accurate and understandable to target audiences
- Provide Recourse: Clearly communicate how users can contest or appeal AI decisions
- Regular Updates: Keep documentation current as models are retrained or updated
A financial services company deploying a credit scoring model provides: (1) Model cards with performance by demographic group, (2) SHAP-based explanations for each decision showing top factors, (3) Counterfactual explanations showing what changes would improve scores, (4) Clear appeals process for contested decisions, (5) Quarterly model monitoring reports published publicly.
Regulatory Requirements for Transparency
EU AI Act Requirements
- Technical Documentation: High-risk AI must have comprehensive technical documentation describing capabilities and limitations
- User Instructions: Clear instructions for deployers on intended use, capabilities, and known limitations
- Logging: Automatic logging of AI system operation for traceability
- Human Oversight: Information enabling appropriate human oversight measures
GDPR Right to Explanation
GDPR Article 22 provides rights related to automated decision-making:
- Right to obtain human intervention in automated decisions
- Right to express views and contest decisions
- Right to "meaningful information about the logic involved"
- Right to know the significance and envisaged consequences
Key Takeaways
- Explainability enables understanding of AI decisions; transparency encompasses broader documentation and disclosure
- LIME provides local explanations by fitting interpretable models to prediction neighborhoods
- SHAP uses game theory to fairly attribute predictions to features with theoretical guarantees
- Counterfactual explanations show what changes would result in different outcomes, enabling recourse
- Model cards document model capabilities, limitations, and appropriate uses in a standardized format
- Datasheets for datasets document data provenance, composition, and intended uses
- Regulatory frameworks increasingly require transparency and explainability for high-risk AI systems