Part 4 of 5

Data Carving

🕑 150-180 minutes 📖 Advanced Level 📋 Module 4

Introduction

Data carving is a file recovery technique that extracts files from raw data without relying on file system metadata. When files are deleted, a volume is formatted, or the file system is damaged, carving can recover data by identifying file signatures and structures directly in the disk image.

📚 Learning Objectives

By the end of this part, you will understand the principles of data carving, recognize common file signatures (magic numbers), understand techniques for handling fragmented files, and use specialized carving tools like Scalpel and PhotoRec.

Data Carving Principles

Data carving works independently of the file system. Instead of using MFT entries, inodes, or file allocation tables, it searches raw disk data for recognizable patterns that indicate the start and end of files.

When Carving is Needed

  • File system is corrupted or damaged
  • Disk has been formatted
  • File system metadata has been overwritten
  • Recovering from unallocated space
  • MFT entries have been reused
  • Unknown or unsupported file system

Carving vs File System Recovery

Aspect | File System Recovery | Data Carving
Relies on metadata | Yes (MFT, inodes, etc.) | No
Recovers file names | Yes | No (usually)
Recovers timestamps | Yes | Sometimes (embedded in file)
Handles fragmentation | Yes (has data run info) | Difficult
Works after format | Limited | Yes

🔍 Best Practice

Always attempt file system recovery first. If MFT entries exist for deleted files, you'll get more complete information (file names, timestamps, paths). Use carving as a secondary technique for data that file system recovery cannot find.

File Signatures (Magic Numbers)

Most file formats have characteristic byte sequences, usually at the beginning (header) and sometimes at the end (footer). These signatures enable carving tools to identify file types and boundaries.

JPEG File Signature Structure

Header (FF D8 FF) | Image Data | Footer (FF D9)

Common File Signatures

File Type | Header (Hex) | Footer (Hex)
JPEG | FF D8 FF | FF D9
PNG | 89 50 4E 47 0D 0A 1A 0A | 49 45 4E 44 AE 42 60 82
GIF | 47 49 46 38 (GIF8) | 00 3B
PDF | 25 50 44 46 (%PDF) | 25 25 45 4F 46 (%%EOF)
ZIP/DOCX/XLSX | 50 4B 03 04 (PK..) | 50 4B 05 06 (start of the end-of-central-directory record, not the last bytes of the file)
RAR | 52 61 72 21 1A 07 | Variable
MP3 | FF FB or 49 44 33 (ID3) | None standard
MP4 | 00 00 00 xx 66 74 79 70 | None standard
Windows EXE | 4D 5A (MZ) | None standard
ELF (Linux) | 7F 45 4C 46 | None standard

💡 Office Documents

Modern Microsoft Office files (DOCX, XLSX, PPTX) are actually ZIP archives containing XML files. They share the same signature as ZIP files. Legacy Office formats (DOC, XLS, PPT) use the OLE Compound File format, whose signature begins D0 CF 11 E0 (the full 8-byte value is D0 CF 11 E0 A1 B1 1A E1).
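
As a rough illustration of how the signatures in the table can be matched, the following minimal sketch checks the first bytes of a block against a small list. The names SIGNATURES and identify are made up for this example, and the list covers only some of the formats above; it is not how any particular carving tool stores its signatures.

```python
# Hypothetical signature list: (label, offset of the magic bytes, magic bytes).
# Values come from the signature table in this section.
SIGNATURES = [
    ("jpeg", 0, b"\xff\xd8\xff"),
    ("png", 0, b"\x89PNG\r\n\x1a\n"),
    ("gif", 0, b"GIF8"),
    ("pdf", 0, b"%PDF"),
    ("zip", 0, b"PK\x03\x04"),        # also DOCX, XLSX, PPTX
    ("rar", 0, b"Rar!\x1a\x07"),
    ("ole", 0, b"\xd0\xcf\x11\xe0"),  # legacy DOC, XLS, PPT
    ("mp4", 4, b"ftyp"),              # preceded by a 4-byte box size
    ("elf", 0, b"\x7fELF"),
    ("exe", 0, b"MZ"),
]


def identify(block: bytes):
    """Return the label of the first matching signature, or None."""
    for label, offset, magic in SIGNATURES:
        if block[offset:offset + len(magic)] == magic:
            return label
    return None
```

Short signatures such as MZ match a lot of unrelated data, which is one reason carved output needs validation. A ZIP match also does not say whether the file is a plain archive or an Office document; that requires looking at the archive contents.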

Carving Techniques

Different carving approaches handle various scenarios with different trade-offs between speed, accuracy, and complexity.

Header-Footer Carving

The simplest technique: search for a header, then for the matching footer, and extract everything from the start of the header through the end of the footer.

  • Pros: Simple, fast, accurate file boundaries
  • Cons: Requires known footer, fails with fragmented files
  • Best for: JPEG, PNG, PDF files
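
A minimal sketch of header-footer carving over an in-memory buffer follows. The function name carve_header_footer and the max_size cap are assumptions for illustration; real tools stream from disk and apply more checks.

```python
def carve_header_footer(data: bytes, header: bytes, footer: bytes, max_size: int):
    """Return (offset, carved_bytes) for each header..footer span within max_size."""
    results = []
    pos = 0
    while True:
        start = data.find(header, pos)
        if start == -1:
            break
        # Search for the first footer after the header, inside the size cap.
        end = data.find(footer, start + len(header), start + max_size)
        if end != -1:
            stop = end + len(footer)
            results.append((start, data[start:stop]))
            pos = stop          # continue after the carved file
        else:
            pos = start + 1     # no footer in range: skip this header
    return results
```

Taking the first footer keeps the sketch simple but can cut a file short. A JPEG with an embedded thumbnail, for example, contains an earlier FF D9 that belongs to the thumbnail rather than to the full image.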

Header-Max Size Carving

Find the header, then extract a fixed maximum number of bytes from that point. Useful when files have no footer.

  • Pros: Works without footer signature
  • Cons: May include extra data, wastes space
  • Best for: MP3, AVI, executables
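
The same idea without a footer can be sketched as below. The name carve_header_max_size is hypothetical, and the right max_size depends on the file type and the case.

```python
def carve_header_max_size(data: bytes, header: bytes, max_size: int):
    """Return (offset, carved_bytes), taking up to max_size bytes from each header."""
    results = []
    pos = data.find(header)
    while pos != -1:
        # The slice is shorter than max_size if the image ends first.
        results.append((pos, data[pos:pos + max_size]))
        pos = data.find(header, pos + 1)
    return results
```

Every carved file here is max_size bytes long unless the image ends first, so the output usually carries trailing data that does not belong to the file.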

Structure-Based Carving

Parse internal file structure to determine size and validate content.

  • Pros: Most accurate, validates file integrity
  • Cons: Complex, slow, format-specific
  • Best for: ZIP, Office documents, databases
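
As one example of structure-based carving, PNG is convenient because every chunk records its own length and a CRC-32 over its type and data. The sketch below walks the chunks to find where the file ends and rejects it if a checksum fails. The function name png_length is an assumption; other formats need their own parsers.

```python
import struct
import zlib

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def png_length(data: bytes, start: int):
    """Return the byte length of a valid PNG beginning at start, else None."""
    if data[start:start + 8] != PNG_MAGIC:
        return None
    pos = start + 8
    while pos + 12 <= len(data):          # smallest chunk is 12 bytes
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body_end = pos + 8 + length
        if body_end + 4 > len(data):
            return None                   # chunk runs past the end of the data
        (stored_crc,) = struct.unpack(">I", data[body_end:body_end + 4])
        # The CRC covers the chunk type and the chunk data, not the length.
        if zlib.crc32(data[pos + 4:body_end]) != stored_crc:
            return None
        pos = body_end + 4
        if ctype == b"IEND":
            return pos - start
    return None
```

A failed CRC partway through often means the file is fragmented or partly overwritten at that point, which is useful information in itself.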

Whichever technique is used, the carving process follows the same five steps:

  1. Scan Raw Data: Search for file signatures sector by sector
  2. Identify Header: Match byte pattern to known file type
  3. Find Boundary: Locate footer or calculate size
  4. Extract File: Copy data to recovered file
  5. Validate: Verify file opens correctly

Handling Fragmented Files

Fragmentation is the biggest challenge in data carving. When files are not stored in contiguous clusters, simple header-footer carving fails because the data between header and footer includes unrelated content.

Fragmentation Scenarios

  • Bifragmented: File split into exactly two pieces
  • Multi-fragmented: File split across many non-contiguous areas
  • Interleaved: Multiple files' fragments mixed together

Advanced Carving Techniques

Semantic Carving

Uses understanding of file format structure to validate and reassemble fragments. For example, validating JPEG markers or ZIP central directory consistency.
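
To make the JPEG case concrete, the sketch below walks the marker segments from the start of the image up to the start-of-scan marker and checks that each segment length leads to another marker. The function name is hypothetical, and the check is deliberately simplified: it ignores optional fill bytes and does not inspect the compressed scan data.

```python
import struct


def jpeg_header_is_consistent(data: bytes) -> bool:
    """Check that JPEG segments chain correctly from SOI up to the SOS marker."""
    if data[:2] != b"\xff\xd8":           # SOI marker
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False                  # each segment must begin with 0xFF
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            return False                  # the length field counts its own 2 bytes
        if marker == 0xDA:                # SOS: compressed scan data follows
            return pos + 2 + length <= len(data)
        pos += 2 + length
    return False
```

A carved candidate that fails this walk was probably cut off or spliced with unrelated data before the image data even begins.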

Statistical Analysis

Analyzes entropy and statistical properties to identify boundaries between different file fragments.
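
One common statistic is Shannon entropy measured per block. A minimal sketch, assuming 512-byte blocks (the function names and the block size are choices made for this example):

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, up to 8.0 for uniform data."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(data: bytes, block_size: int = 512):
    """Entropy of each consecutive block; sharp jumps hint at content boundaries."""
    return [shannon_entropy(data[i:i + block_size])
            for i in range(0, len(data), block_size)]
```

Compressed or encrypted data tends to score close to 8 bits per byte, while text and sparse structures score much lower, so a sudden change between adjacent blocks suggests that a different file or fragment begins there.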

Graph-Based Reassembly

Treats fragments as nodes and uses content analysis to determine which fragments connect, building a graph of possible reassemblies.

Fragmentation Reality

Studies show that while many files are contiguous, a significant percentage are fragmented. On heavily used systems, fragmentation rates of 20-30% are common. However, most fragmented files have only 2-3 fragments, making partial recovery feasible with advanced tools.

Data Carving Tools

Several specialized tools exist for data carving, ranging from simple signature-based carvers to sophisticated analysis platforms.

PhotoRec

Powerful open-source carver supporting 480+ file formats. Works on disk images or directly on devices. Excellent for images, documents, and multimedia.

License: Free | Platform: Cross-Platform

Scalpel

Fast, efficient header-footer carver. Highly configurable with custom signature definitions. Based on Foremost.

License: Free | Platform: Linux

Foremost

The original header-footer carver, developed by special agents of the US Air Force Office of Special Investigations. Simple but effective. Good for basic carving tasks.

License: Free | Platform: Linux

Bulk Extractor

Extracts features like email addresses, URLs, credit card numbers, and embedded JPEG images. High-speed parallel processing.

License: Free | Platform: Cross-Platform

EnCase/FTK

Commercial forensic suites with integrated carving capabilities. Structure-aware carving with validation.

License: Commercial | Platform: Windows

Autopsy

Open-source forensic platform with PhotoRec integration. GUI-based carving with result organization.

License: Free | Platform: Cross-Platform

Configuring Custom Signatures

Most carving tools allow custom signature definition. Scalpel uses a configuration file format:

💡 Scalpel Configuration Example

# extension  case_sensitive  max_size  header  [footer]
jpg  y  5000000   \xff\xd8\xff      \xff\xd9
pdf  y  20000000  %PDF              %%EOF
doc  y  10000000  \xd0\xcf\x11\xe0

Practical Carving Workflow

A systematic approach to data carving maximizes recovery while managing large volumes of output.

Step 1: Target Selection

Identify what to carve from:

  • Full disk image - comprehensive but slow
  • Unallocated space only - faster, focuses on deleted data
  • Specific partitions - targeted recovery

Step 2: Configure File Types

Select file types relevant to the investigation. Carving everything generates massive output and takes longer.

Step 3: Run Carving Tool

Execute the carving process. This can take hours for large images.

Step 4: Review Results

Carving generates many false positives - fragments that match signatures but aren't valid files. Manually review or use validation scripts.

Step 5: Validate and Hash

Test that carved files open correctly. Calculate hashes for evidence tracking.
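
For the hashing half of this step, a minimal sketch using SHA-256 is shown below. The function names and the choice of SHA-256 are assumptions; use whatever algorithm your lab's procedures require.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large carved files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_manifest(directory) -> dict:
    """Map each file's path (relative to directory) to its SHA-256 digest."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Writing the resulting manifest into the case notes makes it possible to show later that a carved file has not changed since recovery.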

🔍 Managing Carving Output

Carving can produce thousands of files, many of which are partial or corrupted. Organize output by file type, use duplicate detection to reduce volume, and focus validation on file types most relevant to your case. Document which carved files were actually usable.
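
Duplicate detection can be as simple as grouping carved files by content hash. The sketch below assumes the carved files are small enough to read whole; group_duplicates is a hypothetical helper name.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_duplicates(directory) -> dict:
    """Map each SHA-256 digest shared by two or more files to those files' paths."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only digests that occur more than once.
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Exact hashing only catches byte-identical copies. Two carves of the same file that were cut at different lengths will hash differently and still need manual review.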

Carving Limitations

Understanding carving limitations helps set appropriate expectations and choose the right recovery approach.

Technical Limitations

  • No file names: Carved files get sequential names, not original names
  • No timestamps: Unless embedded in file itself (like EXIF)
  • No folder structure: Files recovered as flat collection
  • Fragmentation: Severely fragmented files often unrecoverable
  • Overwritten data: Cannot recover data that's been overwritten

Format-Specific Issues

  • Encrypted files: Carved but useless without decryption key
  • Compressed archives: Partial recovery rarely usable
  • Database files: Need complete structure for queries
  • Video containers: Complex structures often fragment badly

SSD Carving Challenges

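On SSDs with TRIM enabled, deleted data is often truly gone. The TRIM command tells the SSD which blocks no longer hold live data; the controller is then free to erase them during garbage collection, and reads of trimmed blocks often return zeros even before that happens. This leaves nothing for carving to find. SSD carving may only succeed for very recently deleted files or files on partitions where TRIM wasn't active.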

📚 Key Takeaways
  • Data carving recovers files using signatures, not file system metadata
  • File signatures (magic numbers) identify file types and boundaries
  • Header-footer carving works well for files like JPEG and PDF with known endings
  • Fragmented files are the biggest challenge - advanced techniques partially address this
  • PhotoRec and Scalpel are powerful free carving tools
  • Carved files lack original names, timestamps, and folder structure
  • Carving produces many false positives - validation is essential
  • SSDs with TRIM make carving largely ineffective for deleted data