AI Image Detection with Grad-CAM
Detecting AI-Generated Images Using Deep Learning & Interpretability
VGG16 · EfficientNet-B0 · ResNet18 · Grad-CAM · Transfer Learning
DS-5220 Deep Learning | Vanderbilt University | Fall 2024
Developed and evaluated deep learning models for automatic detection of AI-generated images. Compared three CNN architectures on the CIFAKE dataset (120K images) and used Gradient-weighted Class Activation Mapping (Grad-CAM) to provide interpretability, revealing how models distinguish real photographs from synthetic images.
Key Achievement
- Best accuracy (EfficientNet-B0): 98.42%
- ROC-AUC score: 0.9987
- Parameter efficiency: EfficientNet-B0 uses about 26x fewer parameters than VGG16 (5.3M vs. 138M)
The Problem
With the rise of generative AI models like Stable Diffusion and DALL-E, distinguishing real photographs from AI-generated images has become increasingly important for combating misinformation. This project explores whether deep learning can reliably detect synthetic images and, crucially, whether interpretability methods can explain why an image is classified as fake.
My Contribution: VGG16 Model
I was responsible for implementing and evaluating the VGG16 architecture, a classic deep CNN with 138M parameters.
- Accuracy: 97.94%
- F1-Score: 0.9794
- ROC-AUC: 0.9981
- Parameters: 138M
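For readers unfamiliar with the three metrics reported here, the following is a minimal pure-Python sketch of how accuracy, F1, and ROC-AUC are defined for a binary real-vs-fake classifier. It is not the project's evaluation code: the function names, the toy labels and scores, the 0.5 decision threshold, and the choice of "fake" as the positive class (label 1) are all assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive example is scored above a random
    negative one (ties count as half). Quadratic time, so toy sizes only."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = fake, 0 = real; scores are predicted P(fake).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.667
print(f"f1       = {f1_score(y_true, y_pred):.3f}")  # f1       = 0.667
print(f"roc_auc  = {roc_auc(y_true, scores):.3f}")   # roc_auc  = 0.889
```

Accuracy and F1 depend on the chosen threshold, while ROC-AUC summarizes ranking quality across all thresholds, which is why it can be noticeably higher than accuracy for the same model.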
Grad-CAM Insights from VGG16:
- Fake images: Model focuses on diffuse background textures and unnatural artifacts
- Real images: Attention concentrated on object outlines and natural details
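The heatmaps behind these observations come from the Grad-CAM computation, which can be sketched in a few lines of NumPy. This is a simplified illustration rather than the project's implementation: the function name `grad_cam` and the toy arrays are hypothetical, and it assumes the last convolutional layer's activations and the gradients of the class score with respect to them have already been captured (in practice, typically via framework hooks).

```python
import numpy as np


def grad_cam(activations, gradients):
    """Core Grad-CAM step for a single image.

    activations: (K, H, W) feature maps from the chosen conv layer.
    gradients:   (K, H, W) gradient of the class score w.r.t. those maps.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # Channel weights: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)   # shape (H, W)
    # ReLU keeps only regions that push the class score up.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example (made-up numbers): two 2x2 feature maps.
activations = np.array([[[1.0, 0.0], [0.0, 0.0]],
                        [[0.0, 0.0], [0.0, 2.0]]])
gradients = np.array([[[1.0, 1.0], [1.0, 1.0]],       # channel 0 supports the class
                      [[-1.0, -1.0], [-1.0, -1.0]]])  # channel 1 opposes it

heatmap = grad_cam(activations, gradients)
print(heatmap)  # only the top-left cell is highlighted
```

For visualization, the low-resolution heatmap is usually upsampled to the input image size and overlaid on the image, which is how regions such as background textures or object outlines become visible.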
Comparative Analysis: 3 Architectures
- EfficientNet-B0 (Kanu Shetkar): 5.3M parameters | Winner: Best accuracy with fewest params
- VGG16, my model (Roshan Sivakumar): 138M parameters | Classic architecture, strong performance
- ResNet18 (Beema Rajan): 11M parameters | Modern baseline with skip connections
Key Insights
- Bigger is not always better: VGG16 (138M params) did not outperform EfficientNet-B0 (5.3M params)
- Efficiency matters: Compound scaling in modern architectures provides significant advantages
- Interpretability is crucial: Grad-CAM revealed all models learned meaningful, distinguishable features
- All models highly effective: Every architecture achieved >96% accuracy on AI detection
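To make the size comparison concrete, the sketch below derives the VGG16 parameter count from its layer configuration and compares it with the 5.3M figure quoted for EfficientNet-B0. It assumes the standard ImageNet configuration of VGG16 (3x3 convolutions with biases, 224x224 input, 1000-class head); a model with a replaced binary classification head would have a somewhat different total. The names `VGG16_CONV`, `VGG16_FC`, and `vgg16_param_count` are illustrative.

```python
# Standard VGG16 layout: numbers are conv output channels, "M" is max-pooling.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
# Fully connected layers as (inputs, outputs); 512 * 7 * 7 is the flattened
# feature map for a 224x224 input after five 2x2 poolings.
VGG16_FC = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]


def vgg16_param_count(in_channels=3):
    """Count weights and biases of the standard VGG16 configuration."""
    total = 0
    channels = in_channels
    for item in VGG16_CONV:
        if item == "M":
            continue  # pooling layers have no learnable parameters
        total += channels * item * 3 * 3 + item  # 3x3 kernels + biases
        channels = item
    for fan_in, fan_out in VGG16_FC:
        total += fan_in * fan_out + fan_out      # weight matrix + biases
    return total


vgg16_params = vgg16_param_count()
efficientnet_b0_params = 5.3e6  # figure quoted in this write-up
ratio = vgg16_params / efficientnet_b0_params

print(f"VGG16 parameters: {vgg16_params:,}")  # VGG16 parameters: 138,357,544
print(f"Ratio: {ratio:.1f}x")                 # Ratio: 26.1x
```

Most of the VGG16 total sits in the first fully connected layer (about 103M of the 138M), which helps explain why architectures that avoid large dense layers can be so much smaller.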
Dataset: CIFAKE
- Images: 120K
- Resolution: 32×32
- Real/Fake Split: 50/50
- Synthetic source: Stable Diffusion (SD)
Real images from CIFAR-10, synthetic images generated with Stable Diffusion
Technologies Used
Team
- Roshan Sivakumar - VGG16
- Kanu Shetkar - EfficientNet-B0
- Beema Rajan - ResNet18