Part 6 of 6

Model Risks & Limitations


Introduction

AI models are not infallible - they have fundamental limitations and face numerous risks that can cause failures in production. Understanding these risks is essential for AI governance, as mitigation must be built into systems from the start, not added as an afterthought.

This part covers the major risk categories that affect AI systems: drift, adversarial attacks, the black box problem, inherent model limitations, and operational risks, closing with a framework for risk-based governance.

Model Drift

Performance degradation over time as the world changes

All models degrade over time. The patterns learned during training may not hold as circumstances change. This is one of the most common causes of AI system failures in production.

Types of Drift

  • Data Drift (Covariate Shift): The distribution of input data changes. Example: Customer demographics shift, product mix changes, seasonal patterns evolve.
  • Concept Drift: The relationship between inputs and outputs changes. Example: What constitutes "spam" evolves as tactics change.
  • Label Drift: The meaning or distribution of labels changes. Example: Categories are redefined, class balance shifts.

Mitigation Strategies

  • Continuous monitoring of prediction distributions
  • Regular comparison against ground truth when available
  • Statistical tests for distribution changes
  • Scheduled retraining pipelines
  • Trigger-based retraining when drift exceeds thresholds
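A common statistical test for data drift is the two-sample Kolmogorov-Smirnov test, which measures the largest gap between the empirical distributions of training-time and live inputs. A minimal pure-Python sketch (the function names and the 0.2 threshold are illustrative, not a standard):

```python
import bisect

def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of the two samples."""
    ref, cur = sorted(reference), sorted(current)

    def ecdf(sample, x):
        # fraction of the sample that is <= x
        return bisect.bisect_right(sample, x) / len(sample)

    # the gap between step-function ECDFs can only peak at an observed value
    candidates = ref + cur
    return max(abs(ecdf(ref, v) - ecdf(cur, v)) for v in candidates)

def drift_alert(reference, current, threshold=0.2):
    """Trigger-based retraining hook: fire when the KS statistic between
    training-time and live feature values exceeds a chosen threshold."""
    return ks_statistic(reference, current) > threshold
```

In practice a library routine (e.g. a two-sample KS test with p-values) is preferable; the point here is that drift detection reduces to comparing a reference window against a live window, one feature at a time.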

Real-World Example: COVID-19

The pandemic caused dramatic drift across virtually all AI systems. Demand forecasting models failed as purchasing patterns changed overnight. Fraud detection systems saw false positive rates spike as normal behavior shifted. Credit scoring models based on employment stability became unreliable.

Adversarial Attacks

Intentional manipulation to cause model failures

Adversarial attacks exploit the way models learn, crafting inputs that cause incorrect predictions while appearing normal to humans. These attacks reveal fundamental fragilities in how AI systems perceive the world.

Attack Types

  • Evasion Attacks: Modify inputs at inference time to cause misclassification. Example: Adding imperceptible noise to an image to change its predicted class.
  • Poisoning Attacks: Corrupt training data to influence model behavior (covered in Part 5).
  • Model Extraction: Query the model to reconstruct its parameters or training data.
  • Prompt Injection: For language models, craft inputs that override intended behavior.
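The mechanics of an evasion attack are easiest to see on a linear model, where the gradient of the score with respect to the input is just the weight vector. The sketch below applies a fast-gradient-sign-style perturbation (the weights and epsilon are made-up toy values; real attacks target deep networks via backpropagated gradients):

```python
def sign(v):
    return 1.0 if v > 0 else -1.0 if v < 0 else 0.0

def linear_score(w, b, x):
    # decision rule: classify as positive when w.x + b > 0
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def evasion_perturb(w, x, eps):
    """FGSM-style evasion for a linear scorer: the gradient of the score
    w.r.t. x is simply w, so nudge every feature by eps against the
    gradient's sign. No feature changes by more than eps."""
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]
```

With w = [0.4, -0.3, 0.2], b = 0 and x = [0.2, 0.0, 0.1], the clean score is about 0.1 (positive), yet an eps of 0.15 shifts the score by -0.135 and flips the classification, even though each feature moved by at most 0.15. This is the core fragility: many small, coordinated changes add up to a large change in the output.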

Mitigation Strategies

  • Adversarial training - include adversarial examples in training
  • Input validation and sanitization
  • Ensemble methods - harder to fool multiple models
  • Rate limiting and anomaly detection for queries
  • Defense in depth - multiple security layers

Physical-World Attacks

Adversarial attacks aren't limited to digital inputs. Researchers have demonstrated attacks using physical objects: stickers that cause stop signs to be misclassified, glasses that defeat facial recognition, and clothing patterns that make people invisible to detection systems. These have serious implications for safety-critical AI systems.

The Black Box Problem

Inability to explain why models make specific predictions

Deep learning models are often called "black boxes" because their internal decision-making processes are opaque. This creates challenges for accountability, debugging, regulatory compliance, and user trust.

Why It Matters

  • Regulatory Requirements: Laws like GDPR require "meaningful information about the logic involved" in automated decisions.
  • Debugging: Hard to fix errors if you don't understand why they occur.
  • Trust: Users and stakeholders may not accept decisions they can't understand.
  • Liability: Who is responsible when an unexplainable AI makes a harmful decision?

Approaches to Explainability

  • Use interpretable models where possible (trade-off with performance)
  • Post-hoc explanation methods (LIME, SHAP)
  • Attention visualization for transformer models
  • Concept-based explanations
  • Counterfactual explanations - "what would need to change"
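Counterfactual explanations are simplest to compute for linear models, where "what would need to change" has a closed form. A toy sketch, assuming a hypothetical two-feature loan scorer (the weights, feature meanings, and function name are illustrative):

```python
def counterfactual_change(w, b, x, feature):
    """Counterfactual explanation for a linear scorer w.x + b: how much
    would this single feature need to change to reach the decision
    boundary (score = 0), holding everything else fixed?"""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if w[feature] == 0:
        return None  # this feature cannot move the decision at all
    return -score / w[feature]
```

For example, with w = [0.5, -2.0] (income, debt ratio), b = -1.0 and an applicant x = [1.0, 0.8], the score is -2.1 (denied); the counterfactual on the income feature is +4.2, i.e. "income would need to rise by 4.2 units to reach the boundary". For nonlinear models, methods like LIME and SHAP approximate this kind of local reasoning instead of solving it exactly.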

The Explainability Trade-off

There's often a trade-off between model performance and interpretability. Simpler, more interpretable models (decision trees, linear models) may perform worse than complex deep learning models. Organizations must decide where on this spectrum their applications should fall based on risk and requirements.

Inherent Model Limitations

Beyond specific risks, AI models have fundamental limitations that should inform expectations and governance.

No True Understanding

Models learn statistical patterns, not true understanding. They can fail spectacularly on examples outside their training distribution, even when those examples seem trivial to humans.

Correlation vs. Causation

Models find correlations, which may be spurious. A model might learn that hospital admission predicts death - not because admission causes death, but because sick people are admitted.

Distribution Dependence

Models only work well on data similar to their training data. Performance degrades on out-of-distribution inputs, often without warning.

Brittleness

Small input changes can cause large output changes. Models lack the robustness of human perception and reasoning.

No Common Sense

Models lack the background knowledge humans use for reasoning. They can make errors that seem absurd to humans but are consistent with their training.

Uncertainty Blindness

Many models don't know what they don't know. They may produce confident predictions on inputs they've never seen before.
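One concrete source of this blindness is that softmax "confidence" is not calibrated uncertainty: the output always sums to 1 and usually has a dominant class, no matter how meaningless the input was. A small sketch (the example logits are arbitrary):

```python
import math

def softmax(logits):
    """Convert raw scores to a probability distribution. Note the output
    always sums to 1, even for logits produced from garbage input."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Feeding an out-of-distribution input through a classifier might yield logits like [4.0, 1.0, 0.5], and softmax will report roughly 93% "confidence" in the top class. Techniques such as calibration, ensembles, and explicit out-of-distribution detection exist precisely because this number cannot be trusted on its own.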

Operational Risks

Beyond model-specific risks, operational factors can cause AI system failures.

Common Operational Risks

  • Data Pipeline Failures: Broken ETL processes, schema changes, data source outages
  • Feature Store Issues: Stale features, incorrect calculations, missing values
  • Infrastructure Problems: Scaling failures, latency spikes, resource exhaustion
  • Deployment Errors: Wrong model version deployed, configuration mistakes
  • Monitoring Gaps: Failures go undetected because the right metrics aren't tracked
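A cheap defense against several of these failure modes is a schema check at the model's front door, so that broken pipelines produce loud validation errors instead of silent garbage predictions. A minimal sketch (the schema format and function name are illustrative):

```python
def validate_features(row, schema):
    """Pre-inference guard against pipeline failures: check that every
    expected feature is present, non-null, and of the declared type.
    Returns a list of problems; an empty list means the row is usable."""
    problems = []
    for name, expected_type in schema.items():
        if name not in row or row[name] is None:
            problems.append(f"missing or null feature: {name}")
        elif not isinstance(row[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}, "
                f"got {type(row[name]).__name__}"
            )
    return problems
```

An upstream schema change (say, an ID field that silently becomes a string) is then caught before inference rather than discovered weeks later in degraded metrics. Production feature stores and data-validation tools generalize this idea with range checks and distribution checks.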

MLOps Best Practices

Robust operational practices include: version control for data, code, and models; automated testing pipelines; canary deployments; comprehensive monitoring; runbooks for incident response; and regular chaos engineering to test resilience.

Risk Assessment Framework

A systematic approach to AI risk assessment helps prioritize mitigation efforts.

Risk Assessment Questions

  • Impact: What's the worst-case outcome of a model failure?
  • Frequency: How often might failures occur?
  • Detectability: How quickly would failures be noticed?
  • Reversibility: Can the harm from failures be undone?
  • Human Oversight: Is there human review before consequential actions?
  • Attack Surface: Who might want to attack this system, and how?
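These questions can be turned into a rough prioritization score. The sketch below borrows the risk-priority-number idea from FMEA (severity x occurrence x detection); the 1-10 scales, example systems, and ratings are illustrative assumptions, not calibrated values:

```python
def risk_priority(impact, frequency, detect_difficulty):
    """FMEA-style risk priority number. Each factor is rated 1 (low) to
    10 (high); detect_difficulty is high when failures would go
    unnoticed for a long time. Higher products get mitigated first."""
    for rating in (impact, frequency, detect_difficulty):
        if not 1 <= rating <= 10:
            raise ValueError("ratings must be between 1 and 10")
    return impact * frequency * detect_difficulty

def triage(systems):
    """Sort (name, impact, frequency, detect_difficulty) tuples by
    descending risk priority."""
    return sorted(systems, key=lambda s: risk_priority(*s[1:]), reverse=True)
```

For instance, a hypothetical loan-approval model rated (9, 3, 7) scores 189 and outranks a spell-checker rated (2, 6, 2) at 24, matching the intuition that governance effort should flow to the first system.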

Risk-Based Governance

Not all AI systems need the same level of governance. High-risk applications (healthcare, criminal justice, autonomous vehicles) require rigorous testing, monitoring, and human oversight. Lower-risk applications (content recommendations, spell check) can tolerate more automation. Match governance intensity to risk level.

Building Resilient AI Systems

Key Principles

  • Defense in Depth: Multiple layers of protection, not single points of failure
  • Graceful Degradation: Systems should fail safely, not catastrophically
  • Human-in-the-Loop: Keep humans involved in high-stakes decisions
  • Continuous Monitoring: Detect problems before they cause significant harm
  • Regular Testing: Probe systems for weaknesses before attackers do
  • Incident Response: Have plans ready for when (not if) things go wrong
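Graceful degradation and human-in-the-loop oversight can both be expressed as a thin wrapper around the model call. A minimal sketch, assuming a model that returns a (label, confidence) pair (the callables and the 0.7 floor are illustrative):

```python
def predict_with_fallback(model, features, fallback, confidence_floor=0.7):
    """Graceful degradation: trust the model only when it responds and is
    confident enough; otherwise fall back to a safe default rule (which
    may be 'route this case to a human')."""
    try:
        label, confidence = model(features)
    except Exception:
        return fallback(features)   # model unavailable: fail safe, not hard
    if confidence < confidence_floor:
        return fallback(features)   # low confidence: defer to the default
    return label
```

The same pattern covers outages (the model service raises), uncertainty (confidence below the floor), and high-stakes routing (the fallback escalates to a human reviewer) with one code path, which makes it easy to test and monitor.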

Key Takeaways

  • Model drift causes all models to degrade over time - continuous monitoring is essential
  • Adversarial attacks can manipulate models in ways that may be invisible to humans
  • The black box problem creates challenges for accountability and regulatory compliance
  • Models have fundamental limitations - no true understanding, correlation-based, distribution-dependent
  • Operational risks are as important as model risks
  • Risk assessment should consider impact, frequency, detectability, and reversibility
  • Resilient systems require defense in depth, graceful degradation, and human oversight