CS180 Project 4: Image Warping and Mosaicing & Feature Matching for Autostitching
Author: Nicolas Rault-Wang (nraultwang at berkeley.edu)
Credit to Notion for this template.
1. Recovering a Homography
We begin by explaining how we estimate the homography matrix $H$ that maps image $I_A$ into the geometry of image $I_B$.
Two images $I_A$ and $I_B$ taken from the same center of projection are related by a homography. To estimate this transform, we first need to label a set of $n$ feature correspondence points between both images.
Note that $n$ must be at least 4 because homographies have 8 degrees of freedom (9 matrix entries minus 1 for scale invariance), and each pair of correspondences provides two constraints on $H$.
1a. Establishing Point Correspondences
The figure below shows some example pairs of correspondence points. Well-defined edges are excellent features to correspond because they are easy to identify in both images.
1b. Estimating the Homography
While a minimum of 4 correspondence pairs is required to solve for $H$, such a minimal system is very sensitive to noise. To produce a more robust homography, we use more than 4 correspondence pairs to form an over-determined system, then use least squares to estimate the free parameters of $H$.
Working in homogeneous coordinates, the derivation of the two equations that an arbitrary correspondence pair $(p, p')$ imposes on the free parameters of $H$ is straightforward.
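Writing the entries of $H$ as $h_1, \dots, h_8$ with the bottom-right entry fixed to 1 (removing the scale ambiguity), the relation $p' \sim Hp$ for $p = (x, y, 1)^T$ and $p' = (x', y', 1)^T$ expands, after dividing out the homogeneous scale $h_7 x + h_8 y + 1$, to

$$h_1 x + h_2 y + h_3 - h_7 x x' - h_8 y x' = x'$$

$$h_4 x + h_5 y + h_6 - h_7 x y' - h_8 y y' = y'$$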
Stacking all $2n$ equations into a $2n \times 8$ feature matrix $A$ and target vector $b$ gives

$$Ah = b,$$
where we have unrolled the free parameters of $H$ into the vector $h = (h_1, \dots, h_8)^T$.
For well-chosen correspondence points, the columns of $A$ are linearly independent, making $A$ have full column rank. Hence, the least-squares estimate of $h$, and thus $H$, is uniquely given by the normal equations:

$$\hat{h} = (A^T A)^{-1} A^T b$$
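For concreteness, here is a minimal NumPy sketch of this estimation step (the function name and array conventions are our own, not the project's exact code):

```python
import numpy as np

def compute_homography(pts_src, pts_dst):
    """Least-squares homography from (n, 2) arrays of (x, y) points, n >= 4."""
    n = len(pts_src)
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for i, ((x, y), (xp, yp)) in enumerate(zip(pts_src, pts_dst)):
        # Two constraint rows per correspondence, as derived above.
        A[2 * i] = [x, y, 1, 0, 0, 0, -x * xp, -y * xp]
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -x * yp, -y * yp]
        b[2 * i], b[2 * i + 1] = xp, yp
    # Solve the over-determined system A h = b in the least-squares sense.
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```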
2. Warping Images with a Homography
In all following sections, we’ll use the term “canvas” to refer to an axis-aligned bounding box that is large enough to contain all points in the image of the homography transform.
2a. Warping One Image
With the homography $H$ relating $I_A$ and $I_B$, we map $I_A$ into the geometry of $I_B$ with an inverse warping procedure:
- Determine the minimum dimensions of the canvas by warping the corners of $I_A$ into the geometry of $I_B$.
- For every canvas pixel $p'$ in the quadrilateral formed from the warped corners of $I_A$, compute the point $p = H^{-1} p'$ in the geometry of $I_A$ by inverting the homography.
- In general, $p$ will not lie exactly on a grid point of $I_A$, so we interpolate the RGBA color vector at $p$ from the grid points of $I_A$. Our code uses scipy.interpolate.RegularGridInterpolator to do this.
- After completing the previous two steps, we have a color value at each canvas point $p'$, completing the process of inverse-warping $I_A$ to the canvas with $H^{-1}$ and producing $I_A$ in the geometry of $I_B$.
- To avoid edge artifacts, we use a two-level Laplacian pyramid to smoothly blend the warped $I_A$ with $I_B$.
- We found that a blend mask created from the Euclidean distance transforms of the warped $I_A$ and of $I_B$ works very well. Our code uses scipy.ndimage.distance_transform_edt to do this (see the sketch after this list).
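Here is a minimal sketch of the warp and the blend mask, assuming $H$ maps source pixel coordinates $(x, y)$ into canvas coordinates and that images are float RGBA arrays; the function names are our own:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator
from scipy.ndimage import distance_transform_edt

def inverse_warp(img, H, canvas_shape):
    """Inverse-warp img onto a canvas_shape = (height, width) canvas."""
    h_out, w_out = canvas_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    # Map every canvas pixel p' back to the source point p = H^{-1} p'.
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts
    src = src[:2] / src[2]  # divide out the homogeneous scale
    h_in, w_in = img.shape[:2]
    interp = RegularGridInterpolator(
        (np.arange(h_in), np.arange(w_in)), img,
        bounds_error=False, fill_value=0.0)
    # The interpolator takes (row, col) queries; points that fall outside
    # the source image stay transparent via fill_value.
    out = interp(np.stack([src[1], src[0]], axis=-1))
    return out.reshape(h_out, w_out, -1)

def edt_blend_mask(alpha_a, alpha_b):
    """Per-pixel weight for image A from Euclidean distance transforms
    of the two images' alpha footprints on the shared canvas."""
    da = distance_transform_edt(alpha_a > 0)
    db = distance_transform_edt(alpha_b > 0)
    return da / (da + db + 1e-8)
```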
2b. Image Rectification
An application of homography warping is undoing (rectifying) the perspective distortion of rectangular objects in images. This is theoretically possible because two images of a planar rectangular-looking object, like a tile or poster, are related by a homography.
Below, we show some examples of this operation. The image on the left is the original and the image on the right is the result of rectification.
3. Creating Panoramas
We can extend our one-image warping procedure to create panoramas. Since two images with the same center of projection are related by a homography, we can form a panorama by warping a set of images sharing a center of projection into the geometry of one particular image in the set. This can be implemented by repeatedly applying the one-image warp to grow the panorama one image at a time, as sketched below.
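A sketch of this loop, where get_correspondences and warp_and_blend are hypothetical stand-ins for our correspondence labeling and the one-image warp-and-blend of part 2:

```python
def build_panorama(images):
    """Grow a panorama one image at a time (hypothetical helpers assumed)."""
    pano = images[0]
    for img in images[1:]:
        # Correspondences between the new image and the current panorama.
        pts_img, pts_pano = get_correspondences(img, pano)
        H = compute_homography(pts_img, pts_pano)  # least squares, part 1b
        pano = warp_and_blend(img, pano, H)  # inverse warp + blend, part 2
    return pano
```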
4. Auto Stitching Panoramas
We’ll follow the paper “Multi-Image Matching using Multi-Scale Oriented Patches” by Brown et al. to automate the process of finding correspondence points between two images.
4a. Harris Interest Point Detector
Our Harris corner detector applies a fixed-size window at 7 image scales to search for interest points, making the corner-detection process scale-invariant. See the Bells and Whistles section below for more details.
4b. Adaptive Non-Maximal Suppression (ANMS)
For each Harris point $x_i$ with corner response $f(x_i)$, ANMS first computes the “suppression radius” $r_i$: the minimum distance between $x_i$ and any point $x_j$ with much greater corner response, i.e., any $x_j$ satisfying $f(x_i) < c_{\text{robust}} \, f(x_j)$. Then, ANMS sorts all points by $r_i$ and returns the top $n$ points with the largest suppression radii. The points that make it through this filter have strong corner responses and are spread evenly over the image.
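A minimal vectorized sketch of ANMS; $c_{\text{robust}} = 0.9$ and the number of retained points are assumed parameter values:

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """coords: (N, 2) corner locations; strengths: (N,) Harris responses.
    Note: the (N, N) distance matrix limits this sketch to moderate N."""
    dists = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    # Point j suppresses point i only if f(x_i) < c_robust * f(x_j).
    suppresses = strengths[:, None] < c_robust * strengths[None, :]
    radii = np.where(suppresses, dists, np.inf).min(axis=1)
    # Keep the n_keep points with the largest suppression radii.
    return coords[np.argsort(-radii)[:n_keep]]
```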
4c. Create MOPS Feature Descriptors
Next, we extract a 40x40 pixel region around each corner point remaining after ANMS. See the Bells and Whistles section below for more details about how this feature is extracted.
4d. Lowe Outlier Rejection
Lowe outlier rejection exploits the observation that a descriptor’s correct match is much closer to it than any other descriptor: a descriptor whose nearest neighbor is not clearly closer than its second-nearest neighbor likely describes a feature visible in only one image, so we reject it.
We implemented this test by computing the L2 distance between each pair of MOPS descriptors from images A and B, then keeping only the pairs of descriptors with (1-nearest-neighbor distance) / (2-nearest-neighbor distance) less than some threshold. This process also allowed us to create an initial set of correspondence points between A and B, as indicated by point colors above.
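A sketch of the ratio test on flattened descriptors (the 0.8 threshold is an assumed value):

```python
import numpy as np

def lowe_match(desc_a, desc_b, ratio=0.8):
    """desc_a: (Na, d), desc_b: (Nb, d) flattened MOPS descriptors.
    Returns (i, j) index pairs that pass the ratio test."""
    # Pairwise L2 distances between all descriptors from A and B.
    d = np.linalg.norm(desc_a[:, None] - desc_b[None, :], axis=-1)
    nn = np.argsort(d, axis=1)
    rows = np.arange(len(desc_a))
    d1 = d[rows, nn[:, 0]]  # distance to 1-nearest neighbor
    d2 = d[rows, nn[:, 1]]  # distance to 2-nearest neighbor
    return [(i, nn[i, 0]) for i in np.flatnonzero(d1 / d2 < ratio)]
```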
4e. Random Sampling and Consensus (RANSAC)
Random Sampling and Consensus (RANSAC) can be used to find a robust set of correspondence pairs for homography estimation. Each iteration does the following:
1. Sample a set of 4 correspondences between A and B without replacement.
2. Exactly compute the homography $H$ from these 4 points.
3. Apply $H$ to the set of all candidate correspondence points from part 4d.
4. Identify the inlier correspondences: pairs $(p, p')$ such that $\|Hp - p'\| < \epsilon$ for a small threshold $\epsilon$.
We repeat steps 1-4 many times and return the largest set of inliers found.
Since the inlier set returned by RANSAC contains no large outliers, applying least squares to just these inliers tends to produce a more robust estimate of $H$.
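A sketch of the loop, reusing compute_homography from part 1b; the iteration count and pixel threshold are assumed values:

```python
import numpy as np

def ransac_homography(pts_a, pts_b, n_iters=1000, eps=2.0):
    """pts_a, pts_b: (N, 2) candidate correspondences from the ratio test."""
    best = np.array([], dtype=int)
    pts_a_h = np.column_stack([pts_a, np.ones(len(pts_a))])  # homogeneous
    for _ in range(n_iters):
        idx = np.random.choice(len(pts_a), 4, replace=False)
        H = compute_homography(pts_a[idx], pts_b[idx])  # exact 4-point fit
        proj = pts_a_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]  # back to inhomogeneous coordinates
        inliers = np.flatnonzero(np.linalg.norm(proj - pts_b, axis=1) < eps)
        if len(inliers) > len(best):
            best = inliers
    # Refit with least squares on the largest inlier set.
    return compute_homography(pts_a[best], pts_b[best]), best
```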
4f. Result
5. Sample Auto-Stitched Mosaics
6. Coolest thing I learned
I think the idea that images are really a pencil of rays, and that a camera image just captures a subset of it, was the coolest thing I learned through this project. Taking photographs and feeding them into my pipeline helped me understand the importance of this premise, because I got bad results when I moved even a little between taking photos. I’d be interested in learning whether there are any ways to generalize this abstraction of images to extract depth from images with different centers of projection.
7. Bells & Whistles
4B: Scale-Invariant Corner Detection
At high resolution, a large corner might not fit inside the Harris detector’s window, causing it to look like an edge rather than a corner. However, if we analyze smaller and smaller versions of this image, eventually this large corner will fit inside the Harris detector’s window, resulting in a corner detection.
So, we can make our corner detector scale-invariant by applying the Harris detector to the input image shrunk by increasing powers of two. This method enables us to find both the smallest and largest corners in an image.
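A sketch of the pyramid search, assuming a grayscale float image and scikit-image’s Harris utilities:

```python
import numpy as np
from skimage.feature import corner_harris, corner_peaks
from skimage.transform import rescale

def multiscale_harris(img, n_scales=7):
    """img: grayscale float image. Returns (row, col) detections from all
    scales, mapped (approximately) back to full-resolution coordinates."""
    all_coords = []
    for s in range(n_scales):
        shrunk = rescale(img, 0.5 ** s, anti_aliasing=True)
        coords = corner_peaks(corner_harris(shrunk), min_distance=1)
        all_coords.append(coords * 2 ** s)  # undo the downscaling
    return np.vstack(all_coords)
```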
4B: Rotationally Invariant Feature Descriptors
- We use inverse warping to extract a 40x40 image patch centered at each Harris corner and aligned with the smoothed image gradient at that point.
- Because the patches are aligned with the image gradient and Harris corners are rotationally invariant, these oriented features will also be rotationally invariant.
- This property is useful because it makes our matching procedure robust to rotations about a camera’s z-axis.
- At right are some examples of the procedure, showing how a 40x40 region is extracted from the image, then downsampled and normalized to produce a robust MOPS feature descriptor (sketched below).
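A sketch of the extraction, assuming a grayscale float image; the smoothing sigma and sampling grid are assumed values, not our exact pipeline:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def mops_descriptor(img, corner, window=40, size=8):
    """img: grayscale float image; corner: (row, col) Harris point."""
    # Orientation from the smoothed image gradient at the corner.
    gy, gx = np.gradient(gaussian_filter(img, 4.5))
    r, c = corner
    theta = np.arctan2(gy[r, c], gx[r, c])
    # Sample a size x size grid spanning the rotated window x window region.
    lin = np.linspace(-window / 2, window / 2, size)
    xx, yy = np.meshgrid(lin, lin)
    rows = r + xx * np.sin(theta) + yy * np.cos(theta)
    cols = c + xx * np.cos(theta) - yy * np.sin(theta)
    desc = map_coordinates(img, [rows, cols], order=1)
    return (desc - desc.mean()) / (desc.std() + 1e-8)  # bias/gain normalize
```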
4B: Panorama Recognition
In this section, we’ll describe our method for automatically forming groups of images that can be plausibly stitched into a panorama. A drawback of the auto-stitching method described in part 4 is the need to manually specify which images are part of a panorama.
Our approach to automatic image grouping relies on a key observation: pairs of images from the same panorama produce many RANSAC inliers, while unrelated pairs produce very few. Our method proceeds as follows:
- As input, our system takes a collection of images without any information about how they are related.
- We demonstrate our method on a collection of 9 images containing three triplets that should be grouped into 3 separate panoramas.
- Using our section 4 code, we compute the number of RANSAC inliers between each pair of images $I_i$ and $I_j$ to build a complete weighted graph $G$ with $n$ nodes and $\binom{n}{2}$ edges, where
- Node $v_i$ represents image $I_i$.
- Edge $(v_i, v_j)$ is weighted by the number of inliers between $I_i$ and $I_j$.
- Applying our key observation, we form a pruned graph $G'$ by removing any edges with weight less than 6.
- Finally, we run Depth-First Search (DFS) on $G'$ to obtain its connected components (a sketch of this grouping step appears after this list).
- The images within each connected component can now be automatically stitched together using our mosaic stitching code from part 3, as shown below.
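A sketch of the grouping step, where count_inliers is a hypothetical wrapper that runs our part-4 matching pipeline on an image pair and returns its RANSAC inlier count:

```python
def group_panoramas(images, min_inliers=6):
    """Return lists of image indices, one list per plausible panorama."""
    n = len(images)
    adj = [[] for _ in range(n)]
    # Keep only edges whose inlier count meets the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if count_inliers(images[i], images[j]) >= min_inliers:
                adj[i].append(j)
                adj[j].append(i)
    # Iterative DFS to collect connected components.
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                comp.append(u)
                stack.extend(adj[u])
        groups.append(comp)
    return groups
```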