Metadata Analysis

Document and image metadata often reveals internal usernames, software versions, file paths, and author information that isn’t visible in the document itself.


Document Metadata (Exiftool)

# Single file
exiftool document.pdf
exiftool photo.jpg
 
# Recursive directory
exiftool -r downloaded_docs/
 
# Extract specific fields
exiftool -Author -Creator -Software -Producer document.pdf
 
# Output as CSV
exiftool -csv *.pdf > metadata.csv
 
# Batch extract and search for usernames
exiftool -Author -Creator -r . | grep -i "author\|creator" | sort -u
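
The author names harvested above can be turned into email address candidates. The following is a minimal sketch, not a definitive method: the function name `email_candidates`, the domain `example.com`, and the three address formats are assumptions for illustration, and the target's real naming convention has to be confirmed separately.

```shell
#!/usr/bin/env bash
# Read author names or usernames (one per line) on stdin and print
# candidate email addresses for the domain given as $1.
# The formats (first.last, flast, first) are assumed conventions.
email_candidates() {
  local domain="$1"
  awk -v d="$domain" '
    NF >= 2 {
      first = tolower($1); last = tolower($NF)
      print first "." last "@" d            # jane.doe@...
      print substr(first, 1, 1) last "@" d  # jdoe@...
      print first "@" d                     # jane@...
    }
    NF == 1 { print tolower($1) "@" d }     # single token: treat as a username
  ' | LC_ALL=C sort -u
}
```

One possible way to feed it, using exiftool's value-only output: `exiftool -r -q -s3 -Author -Creator . | email_candidates example.com`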

Key fields to extract:

Field                          What it reveals
Author / Creator               Internal username → email address candidate
Last Modified By               Most recent editor
Software / Producer            OS and application versions → CVE lookup
Internal paths                 UNC paths, share names, internal hostnames
Company                        Confirm target, naming conventions
Creation / Modification dates  Document age, timezone
GPS coordinates                Location of image capture

FOCA

Windows GUI tool that automates document harvesting and metadata extraction for a target domain.

# GitHub: https://github.com/ElevenPaths/FOCA
# Workflow:
# 1. Enter target domain
# 2. FOCA searches Google/Bing for indexed documents (PDF, DOCX, XLSX, PPT)
# 3. Downloads and extracts metadata automatically
# 4. Displays usernames, software versions, paths, emails

Image EXIF Data

# Extract GPS from photo
exiftool -GPSLatitude -GPSLongitude -GPSAltitude photo.jpg
 
# Extract all EXIF from image
exiftool -all photo.jpg
 
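# Strip EXIF before sharing (OPSEC)
# By default exiftool keeps the untouched original as photo.jpg_original;
# -overwrite_original avoids leaving that copy behind.
exiftool -all= -overwrite_original photo.jpg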

GPS to Google Maps:

# -n prints signed decimal degrees instead of deg/min/sec (negative = S or W)
exiftool -n -GPSLatitude -GPSLongitude photo.jpg
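
The conversion and the map lookup can be sketched in a few lines of shell. The helper names `dms_to_decimal` and `maps_url` are hypothetical, and the `?q=LAT,LON` query form is a commonly used Google Maps URL pattern assumed here rather than taken from the text above.

```shell
#!/usr/bin/env bash
# Convert degrees, minutes, seconds and a hemisphere letter to signed
# decimal degrees: decimal = deg + min/60 + sec/3600, negated for S or W.
dms_to_decimal() {
  awk -v d="$1" -v m="$2" -v s="$3" -v h="$4" 'BEGIN {
    v = d + m / 60 + s / 3600
    if (h == "S" || h == "W") v = -v
    printf "%.6f\n", v
  }'
}

# Read "LAT LON" pairs (signed decimal degrees, one pair per line) on stdin
# and print a map link for each pair.
maps_url() {
  while read -r lat lon; do
    printf 'https://www.google.com/maps?q=%s,%s\n' "$lat" "$lon"
  done
}
```

For example, `exiftool -n -q -p '$GPSLatitude $GPSLongitude' photo.jpg | maps_url` would print one link per geotagged file, while `dms_to_decimal 51 30 2.5 N` covers coordinates copied by hand in deg/min/sec form.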

PDF Metadata

# pdfinfo
pdfinfo document.pdf
 
# Strings — extract readable text including hidden metadata
strings document.pdf | grep -i "author\|creator\|producer"
 
# pdf-parser.py (Didier Stevens)
pdf-parser.py --search author document.pdf

Video Metadata

# ffprobe (part of ffmpeg)
ffprobe -v quiet -print_format json -show_format video.mp4
 
# MediaInfo
mediainfo video.mp4

Protecting Against Metadata Leakage

(Defensive perspective: how defenders limit what the techniques above can recover.)

  • Strip metadata before publishing: exiftool -all= file
  • In LibreOffice, enable Tools → Options → LibreOffice → Security → Options… → “Remove personal information on saving”
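  • PDF sanitisation: exiftool's PDF edits are reversible because the old metadata stays in the file, so rewrite the file afterwards: exiftool -all:all= input.pdf, then qpdf --linearize input.pdf output.pdf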
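
A quick way to confirm that stripping worked is to re-read the published file and look for leftover fields. The sketch below assumes the short tag-name output of `exiftool -s`. The function name `check_clean` and the list of tags treated as sensitive are illustrative choices, not a complete policy.

```shell
#!/usr/bin/env bash
# Read `exiftool -s` output on stdin, print any line whose tag name is in
# the (assumed) sensitive list, and return non-zero if one was found.
check_clean() {
  ! grep -Ei '^(Author|Creator|LastModifiedBy|Producer|Software|Company|GPS[A-Za-z]*)[[:space:]]*:'
}
```

A possible invocation is `exiftool -s -a output.pdf | check_clean || echo "metadata still present"`.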

See Also