FISB: Feature-based Image Stitching Benchmark
Image stitching — sewing multiple photographs into a single panorama — sounds like a solved problem. Your phone does it. Google Street View does it. But when I tried to evaluate different stitching algorithms systematically during my time at IIT Jodhpur, I found that there was no standard benchmark to compare them against. So we built one.
This work, done with my friend Abhishek Rajora, addresses that gap: a benchmarking study of classical feature-based stitching pipelines, accompanied by a custom dataset of 49 scenes.
The Pipeline
Feature-based stitching has four sequential stages, and the choice made at each stage affects every stage after it:
- Feature Detection — find interesting keypoints in each image (SIFT, AKAZE, BRISK, ORB)
- Feature Matching — match keypoints across image pairs (Brute Force, KNN, FLANN)
- Geometric Transformation — estimate the homography via RANSAC
- Blending — merge the warped images seamlessly (Alpha, Gaussian, MultiBand, Seamless Clone)
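To make the geometric stage concrete, here is a minimal NumPy sketch of RANSAC homography estimation. It is an illustration, not the code used in the benchmark: the function names, the 3-pixel threshold, and the iteration count are my own assumptions, and a production pipeline would more likely call a library routine such as OpenCV's `cv2.findHomography` with the `cv2.RANSAC` flag. Coordinate normalization is omitted for brevity.

```python
import numpy as np


def fit_homography(src, dst):
    """Direct Linear Transform: H (up to scale) with dst ~ H @ src, N >= 4 pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)


def project(h, pts):
    """Apply homography h to an (N, 2) array of points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]


def ransac_homography(src, dst, n_iters=500, thresh=3.0, seed=0):
    """Return (H, inlier_mask); thresh is the reprojection error in pixels.

    n_iters, thresh and seed are illustrative defaults, not benchmark settings.
    """
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    with np.errstate(divide="ignore", invalid="ignore"):
        for _ in range(n_iters):
            # Fit a candidate model to a minimal sample of 4 correspondences.
            sample = rng.choice(len(src), size=4, replace=False)
            h = fit_homography(src[sample], dst[sample])
            err = np.linalg.norm(project(h, src) - dst, axis=1)
            inliers = err < thresh  # NaN errors from degenerate samples count as outliers
            if inliers.sum() > best.sum():
                best = inliers
    if best.sum() < 4:
        raise ValueError("RANSAC found fewer than 4 inliers")
    # Refit on every inlier of the best candidate.
    return fit_homography(src[best], dst[best]), best
```

Given `src` and `dst` as `(N, 2)` arrays of matched keypoint coordinates, `ransac_homography(src, dst)` returns the estimated matrix together with a boolean mask of the matches it trusted. This is where noisy matches from the previous stage get filtered out.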
What We Found
More features ≠ better stitching
ORB produced 11,876 matches, roughly 3.2 times SIFT's 3,707, yet its SSIM was 0.52 against SIFT's 0.96. Most of ORB's matches are redundant or noisy, so its speed comes at a heavy cost in accuracy.
| Detector | Matches | SSIM | PSNR (dB) |
|---|---|---|---|
| SIFT | 3707 | 0.96 | 22.26 |
| AKAZE | 4210 | 0.89 | 20.62 |
| BRISK | 3798 | 0.92 | 18.54 |
| ORB | 11876 | 0.52 | 8.87 |
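For readers unfamiliar with the two quality metrics in the table, the sketch below shows how they can be computed with NumPy. The function names are mine, and `global_ssim` is a deliberate simplification: it evaluates the SSIM formula once over the whole image, whereas standard implementations average it over local windows. I am also assuming 8-bit images (peak value 255). This is not necessarily how the numbers in the table were produced.

```python
import math

import numpy as np


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(test, dtype=np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def global_ssim(reference, test, max_value=255.0):
    """Single-window SSIM over the whole image (simplified; see note above)."""
    x = np.asarray(reference, dtype=np.float64)
    y = np.asarray(test, dtype=np.float64)
    # Stabilizing constants from the usual SSIM definition.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return float(((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2))
                 / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))
```

Both functions take a reference image and a stitched result of the same shape. PSNR is unbounded above, while SSIM tops out at 1.0 for identical images, which is why a drop from 0.96 to 0.52 is such a large gap.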
Seamless Clone wins decisively
Among blenders, Seamless Clone came out on top (SSIM 0.96, PSNR 22.26), ahead of MultiBand and far ahead of Alpha and Gaussian. The reason is interesting: unlike Alpha or Gaussian blending, which operate on pixel intensities, Seamless Clone works in the gradient domain, so it preserves edge transitions even when the images were taken under different lighting.
| Blender | SSIM | PSNR (dB) |
|---|---|---|
| Alpha | 0.55 | 7.98 |
| Gaussian | 0.49 | 12.23 |
| MultiBand | 0.92 | 18.49 |
| Seamless Clone | 0.96 | 22.26 |
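The intensity-versus-gradient distinction is easier to see in one dimension. The toy below is my own construction, not part of the benchmark: two "images" show the same signal, but the right one is brighter by a constant 40 levels. A hard cut leaves a 40-level step at the seam, a feathered alpha blend smears that step into a brightness ramp, and the gradient-domain blend stitches the gradients and integrates them, so the exposure offset disappears. In 2D the integration step becomes a Poisson solve, which is what I assume a routine like OpenCV's `cv2.seamlessClone` does internally.

```python
import numpy as np

n, seam = 200, 100
scene = 100.0 + 50.0 * np.sin(np.linspace(0.0, 3.0 * np.pi, n))
left = scene.copy()    # exposure A
right = scene + 40.0   # exposure B: same content, brighter by a constant

# Hard cut: left pixels before the seam, right pixels from the seam onward.
hard_cut = np.where(np.arange(n) < seam, left, right)

# Alpha (feathered) blend: weight ramps from 0 to 1 over a 40-sample band.
w = np.clip((np.arange(n) - (seam - 20)) / 40.0, 0.0, 1.0)
alpha = (1.0 - w) * left + w * right

# Gradient-domain blend: stitch the gradients, then integrate from the
# first left pixel so absolute brightness is anchored to exposure A.
grad = np.where(np.arange(n - 1) < seam, np.diff(left), np.diff(right))
gradient_domain = left[0] + np.concatenate([[0.0], np.cumsum(grad)])


def seam_jump(img):
    """Step across the seam beyond what the scene itself contains."""
    return abs((img[seam] - img[seam - 1]) - (scene[seam] - scene[seam - 1]))


for name, img in [("hard cut", hard_cut), ("alpha", alpha),
                  ("gradient domain", gradient_domain)]:
    print(f"{name}: seam jump {seam_jump(img):.3f}, "
          f"max deviation from scene {np.abs(img - scene).max():.3f}")
```

The toy only illustrates the mechanism. It says nothing about the benchmark scores, which also depend on alignment errors and real scene content.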
FLANN trades accuracy for speed — unfavorably at small scale
FLANN uses approximate nearest-neighbor search (KD-trees for float descriptors such as SIFT, LSH for binary descriptors such as ORB). On a small dataset, the approximation costs more in match quality than it saves in time. FLANN remains useful for real-time applications at scale, but for a careful offline pipeline, Brute Force matching with SIFT was optimal.
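For contrast with FLANN's approximation, this is roughly what exact brute-force matching computes, sketched in NumPy. The function name and the 0.75 ratio threshold (Lowe's ratio test) are assumptions on my part rather than settings taken from the study. The sketch uses Euclidean distance, which suits SIFT-style float descriptors; binary descriptors such as ORB would use Hamming distance instead.

```python
import numpy as np


def brute_force_ratio_match(desc_a, desc_b, ratio=0.75):
    """Exact 2-nearest-neighbor matching with a ratio test.

    desc_a: (N, D) float descriptors, desc_b: (M, D) float descriptors, M >= 2.
    Returns a list of (index_in_a, index_in_b) pairs that pass the test.
    """
    # All pairwise squared distances via |a|^2 + |b|^2 - 2 a.b, O(N * M).
    d2 = ((desc_a ** 2).sum(axis=1)[:, None]
          + (desc_b ** 2).sum(axis=1)[None, :]
          - 2.0 * desc_a @ desc_b.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negatives from rounding
    order = np.argsort(d2, axis=1)[:, :2]  # two closest candidates per row
    rows = np.arange(len(desc_a))
    best = np.sqrt(d2[rows, order[:, 0]])
    second = np.sqrt(d2[rows, order[:, 1]])
    # Keep a match only if it is clearly better than the runner-up.
    keep = best < ratio * second
    return [(int(i), int(order[i, 0])) for i in np.flatnonzero(keep)]
```

Every descriptor is compared against every other, so the nearest neighbor is always the true one. The cost is quadratic, which is negligible on a 49-scene dataset and is exactly what FLANN is designed to avoid on much larger ones.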
The Dataset
No public benchmark existed for image stitching at the time. We created a 49-scene dataset covering:
- Rotation, perspective shifts, zoom, viewpoint change, illumination variation
- Both real and digitally synthesized images
- Indoor and outdoor scenes
We also evaluated on Google Landmarks. Dataset available here.
Optimal Configuration (small dataset, offline use)
SIFT + Brute Force Matcher + Seamless Clone Blending
MultiBand is a strong runner-up. Full report here — GitHub repo here.
