
CS180 Project 4: Image Warping and Mosaicing & Feature Matching for Autostitching

Author: Nicolas Rault-Wang (nraultwang at berkeley.edu)

Credit to Notion for this template.

1. Recovering a Homography

We begin by explaining how we estimate the homography transformation matrix $H$ used to map $A$ into the geometry of $B$.

Two images $A$ and $B$ with the same center of projection are related by a homography. To estimate this transform, we first need to label a set of $N \geq 4$ feature correspondence points $(x_j, y_j) \leftrightarrow (x_j^\prime, y_j^\prime)$ between both images.

Note that $N$ must be at least 4 because a homography has 8 degrees of freedom (9 matrix entries minus 1 for scale invariance), and each pair of correspondences provides two constraints on $H$.

1a. Establishing Point Correspondences

The figure below shows some example pairs of correspondence points. Well-defined edges are excellent features to correspond because they are easy to identify in both images.

1b. Estimating the Homography

While a minimum of 4 correspondence pairs is required to solve for $H$, such a minimal system is very sensitive to noise. To produce a more robust homography, we use more than 4 correspondence pairs to form an over-determined system, then use least squares to estimate the free parameters of $H$.

Working in homogeneous coordinates, the derivation of two equations constraining the free parameters of $H$ from an arbitrary correspondence pair $(x_j, y_j) \leftrightarrow (x_j^\prime, y_j^\prime)$ is straightforward:

$$\begin{bmatrix}a&b&c\\d&e&f\\g&h&1\end{bmatrix}\begin{bmatrix}x_j\\y_j\\1\end{bmatrix} = w \begin{bmatrix}x_j^\prime\\y_j^\prime\\1\end{bmatrix}$$

$$\begin{cases}ax_j + by_j + c = wx_j^\prime\\dx_j + ey_j + f = wy_j^\prime\\gx_j + hy_j + 1 = w\end{cases} \implies \begin{cases}ax_j + by_j + c = (gx_j + hy_j + 1)\, x_j^\prime \\ dx_j + ey_j + f = (gx_j + hy_j + 1)\, y_j^\prime \end{cases}$$

$$\begin{cases} ax_j + by_j + c + 0d + 0e + 0f - gx_j x_j^\prime - hy_j x_j^\prime = x_j^\prime \\ 0a + 0b + 0c + dx_j + ey_j + f - gx_j y_j^\prime - hy_j y_j^\prime = y_j^\prime \end{cases}$$

Stacking all $2N$ equations into a $2N \times 8$ feature matrix $X$ and a $2N \times 1$ target vector $\vec y$ gives

$$\underbrace{\begin{bmatrix}x_1&y_1&1&0&0&0&-x_1x_1^\prime&-y_1x_1^\prime \\0&0&0&x_1&y_1&1&-x_1y_1^\prime&-y_1y_1^\prime\\&&&\vdots&&&&\\x_j&y_j&1&0&0&0&-x_jx_j^\prime&-y_jx_j^\prime \\0&0&0&x_j&y_j&1&-x_jy_j^\prime&-y_jy_j^\prime\\&&&\vdots&&&&\\x_N&y_N&1&0&0&0&-x_Nx_N^\prime&-y_Nx_N^\prime \\0&0&0&x_N&y_N&1&-x_Ny_N^\prime&-y_Ny_N^\prime \end{bmatrix}}_{X} \vec \theta= \underbrace{\begin{bmatrix} x_1^\prime \\ y_1^\prime\\\vdots\\x_j^\prime \\ y_j^\prime\\\vdots\\x_N^\prime \\ y_N^\prime\end{bmatrix}}_{\vec y}$$

where we have unrolled $H$ into $\vec \theta = \begin{bmatrix}a&b&c&d&e&f&g&h\end{bmatrix}^T$.

For $N \geq 4$ well-chosen correspondence points $(x_j, y_j) \leftrightarrow (x_j^\prime, y_j^\prime)$, the columns of $X$ are linearly independent, so $X$ has full column rank. Hence, the least-squares estimate of $\vec\theta$, and thus of $H$, is uniquely given by

$$\vec\theta^* = (X^TX)^{-1}X^T\vec y$$
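A minimal numpy sketch of this least-squares setup is shown below; the function name and data layout are our own for illustration, not taken from the project code.

```python
import numpy as np

def compute_homography(pts_src, pts_dst):
    """Estimate the 3x3 homography mapping pts_src -> pts_dst.

    pts_src, pts_dst: (N, 2) arrays of (x, y) correspondence points, N >= 4.
    Assumes the correspondences are in general position so X has full column rank.
    """
    pts_src = np.asarray(pts_src, dtype=float)
    pts_dst = np.asarray(pts_dst, dtype=float)
    N = pts_src.shape[0]

    X = np.zeros((2 * N, 8))
    y = np.zeros(2 * N)
    for j, ((xj, yj), (xp, yp)) in enumerate(zip(pts_src, pts_dst)):
        X[2 * j]     = [xj, yj, 1, 0, 0, 0, -xj * xp, -yj * xp]
        X[2 * j + 1] = [0, 0, 0, xj, yj, 1, -xj * yp, -yj * yp]
        y[2 * j]     = xp
        y[2 * j + 1] = yp

    theta, *_ = np.linalg.lstsq(X, y, rcond=None)   # [a b c d e f g h]
    H = np.append(theta, 1.0).reshape(3, 3)         # fix the scale: H[2, 2] = 1
    return H
```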

2. Warping Images with a Homography

In all following sections, we’ll use the term “canvas” to refer to an $xy$-axis-aligned bounding box that is large enough to contain all points in the image of the homography transform.

2a. Warping One Image

With the homography $H$ relating $A$ and $B$, we map $A$ into $B$ with an inverse warping procedure:

  1. Determine the minimum dimensions of the canvas by warping the corners of $A$ into $B$.
  2. For every canvas pixel $(i^\prime, j^\prime)$ in the quadrilateral formed from the warped corners of $A$, compute the point $(x, y)$ in the geometry of $A$ by inverting the homography:
    $$\begin{bmatrix}x \\y\\1\end{bmatrix} = H^{-1}\begin{bmatrix}i^\prime \\j^\prime\\1\end{bmatrix}$$
  3. In general, $(x, y)$ will not lie exactly on a grid point of $A$, so we interpolate the RGBA color vector $C(i^\prime, j^\prime)$ at $(x, y)$ from the grid points of $A$. Our code uses scipy.interpolate.RegularGridInterpolator to do this.
  4. After completing steps 2 and 3, we have the color value $C(i^\prime, j^\prime)$ at each canvas point $(i^\prime, j^\prime)$, which completes the inverse warp of $A$ onto the canvas shared with $B$ and produces $A^\prime$ in the geometry of $B$.
  5. To avoid edge artifacts, we use a two-level Laplacian pyramid to smoothly blend $A^\prime$ and $B$.
    - We found that a mask created from the Euclidean distance transform on $A^\prime$ and $B$ works very well. Our code uses scipy.ndimage.distance_transform_edt to compute it.

The canvas is computed by transforming the corners of $A$ with the homography $H$.
$A^\prime$: the result of warping $A$ into the common canvas via the homography $H$.
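The sketch below illustrates the inverse-warp loop (steps 2 and 3) with numpy and RegularGridInterpolator. It is a simplification under our own naming: it ignores the canvas offset and the alpha/blending handling that the full pipeline tracks.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def warp_image(img, H, canvas_shape):
    """Inverse-warp `img` onto a canvas using the homography H (img -> canvas)."""
    H_out, W_out = canvas_shape

    # Every canvas pixel (x', y') in homogeneous coordinates.
    xs, ys = np.meshgrid(np.arange(W_out), np.arange(H_out))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])

    # Map canvas pixels back into the source image with H^{-1},
    # then divide out the homogeneous scale w.
    src = np.linalg.inv(H) @ pts
    x_src, y_src = src[0] / src[2], src[1] / src[2]

    # Interpolate each channel at the (generally non-integer) source points.
    rows, cols = np.arange(img.shape[0]), np.arange(img.shape[1])
    query = np.stack([y_src, x_src], axis=-1)   # interpolator expects (row, col)
    out = np.stack(
        [RegularGridInterpolator((rows, cols), img[..., c],
                                 bounds_error=False, fill_value=0.0)(query)
         for c in range(img.shape[2])], axis=-1)
    return out.reshape(H_out, W_out, img.shape[2])
```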

2b. Image Rectification

An application of homography warping is undoing (rectifying) the perspective distortion of rectangular objects in images. This is possible because any two images of a planar object, such as a tile or poster, are related by a homography: we define correspondences between the object’s four distorted corners and the corners of an axis-aligned rectangle, then warp.
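As a hypothetical usage example (the corner coordinates and target size below are made up, and the helper functions are the sketches from sections 1b and 2a, not the actual project code):

```python
import numpy as np

# Four clicked corners (x, y) of a tilted poster in the photo, listed
# clockwise from the top-left, and the axis-aligned rectangle to map them to.
poster_corners = np.array([[120,  80], [430,  60], [455, 390], [105, 410]])
target_w, target_h = 400, 300
target_corners = np.array([[0, 0], [target_w, 0],
                           [target_w, target_h], [0, target_h]])

H = compute_homography(poster_corners, target_corners)  # sketch from section 1b
rectified = warp_image(image, H, (target_h, target_w))  # sketch from section 2a;
                                                        # `image` is the loaded photo
```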

Below, we show some examples of this operation. The image on the left is the original and the image on the right is the result of rectification.

Rectification of the rectangular windows of the building.

Rectification of the bathroom floor tiles.
Rectification of the signs.

3. Creating Panoramas

We can extend our one-image warping procedure to create panoramas. Since two images with the same center of projection are related by a homography, we can form a panorama by warping a set of images with the same center of projection into a particular image in this set. This can be implemented by recursively applying the one-image warp to grow a panorama one image at a time.
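A sketch of this chaining is shown below; the helper names are ours, and stitch_pair stands in for the warp-and-blend procedure of section 2a.

```python
def build_mosaic(images, correspondences):
    """Grow a panorama one image at a time (sketch; helper names are ours).

    `images` is ordered so that consecutive images overlap, and
    `correspondences[k]` holds matched points (in the current mosaic,
    in images[k]) for each k >= 1.
    """
    mosaic = images[0]
    for k in range(1, len(images)):
        pts_mosaic, pts_new = correspondences[k]
        H = compute_homography(pts_new, pts_mosaic)   # images[k] -> mosaic geometry
        mosaic = stitch_pair(mosaic, images[k], H)    # warp, then blend (section 2a)
    return mosaic
```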

The Wall
View from Doe
Etcheverry Alley
Doe at Night
VLSB at Sunset

4. Auto Stitching Panoramas

We’ll follow the paper “Multi-Image Matching using Multi-Scale Oriented Patches” by Brown et al. to automate the process of finding correspondence points between two images.

4a. Harris Interest Point Detector

Step 1. Compute Harris interest points

The Harris corner detector uses a fixed-size window to search 7 image scales for interest point detections. This makes the corner detection process scale-invariant. See the Bells and Whistles section below for more details.

4b. Adaptive Non-Maximal Suppression (ANMS)

Step 2. Use Adaptive Non-Maximal Suppression (ANMS) to filter all but 750 interest points.

For each Harris point $x_i$, ANMS first computes the suppression radius $r_i$: the minimum distance from $x_i$ to any point $x_j$ whose corner response is much greater, i.e. satisfying $f(x_i) < 0.9\, f(x_j)$. Then, ANMS sorts all points by $r_i$ and returns the top $N = 750$ points with the largest suppression radii. The points that make it through this filter have strong corner responses and are spread evenly over the image.
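A straightforward $O(M^2)$ sketch of ANMS in numpy (function and parameter names are ours):

```python
import numpy as np

def anms(coords, strengths, n_keep=750, c_robust=0.9):
    """Adaptive non-maximal suppression (sketch).

    coords: (M, 2) Harris point coordinates; strengths: (M,) corner responses.
    A point's suppression radius is its distance to the nearest point whose
    response is sufficiently stronger (f_i < c_robust * f_j).
    """
    M = len(coords)
    radii = np.full(M, np.inf)
    for i in range(M):
        stronger = strengths[i] < c_robust * strengths        # points that dominate i
        if np.any(stronger):
            d2 = np.sum((coords[stronger] - coords[i]) ** 2, axis=1)
            radii[i] = np.sqrt(d2.min())
    keep = np.argsort(-radii)[:n_keep]                         # largest radii first
    return coords[keep]
```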

4c. Create MOPS Feature Detectors

Next, we extract a 40x40 pixel region around each corner point remaining after ANMS. See the Bells and Whistles section below for more details about how this feature is extracted.

4d. Lowe Outlier Rejection

Inliers according to Lowe outlier rejection. Corresponding points have the same color.

Lowe outlier rejection relies on the observation that truly corresponding feature descriptors are much closer to each other than to any other descriptor; this lets us reject descriptors that likely appear in only one image.

We implemented this test by computing the L2 distance between each pair of MOPS descriptors from images A and B, then keeping only the pairs of descriptors with (1-nearest-neighbor distance) / (2-nearest-neighbor distance) less than some threshold. This process also allowed us to create an initial set of correspondence points between A and B, as indicated by point colors above.
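A sketch of this ratio test on flattened descriptors; the threshold value here is an assumption for illustration, not necessarily the one used in our pipeline.

```python
import numpy as np

def lowe_matches(desc_a, desc_b, ratio=0.6):
    """Return (A index, B index) pairs passing Lowe's ratio test (sketch)."""
    # Pairwise L2 distances between every descriptor in A and in B.
    d2 = (np.sum(desc_a**2, axis=1)[:, None]
          + np.sum(desc_b**2, axis=1)[None, :]
          - 2.0 * desc_a @ desc_b.T)
    d = np.sqrt(np.maximum(d2, 0.0))

    order = np.argsort(d, axis=1)
    nn1, nn2 = order[:, 0], order[:, 1]            # best and second-best match in B
    rows = np.arange(len(desc_a))
    keep = d[rows, nn1] / d[rows, nn2] < ratio     # 1-NN / 2-NN distance ratio

    return [(i, nn1[i]) for i in rows[keep]]
```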

4e. Random Sampling and Consensus (RANSAC)

Homography inliers found by 10k iterations of 4-point RANSAC. Corresponding points have the same color.

Random Sampling and Consensus (RANSAC) can be used to find a robust set of correspondence pairs for homography estimation. Each iteration performs the following steps:

  1. Sample a set of 4 correspondences $\{(\vec x_0, \vec x_0^\prime), (\vec x_1, \vec x_1^\prime), (\vec x_2, \vec x_2^\prime), (\vec x_3, \vec x_3^\prime)\}$ between A and B without replacement.
  2. Exactly compute the homography $H$ from these points.
  3. Apply $H$ to the full set of correspondence points produced by the Lowe matching step (4d).
  4. Identify the inlier correspondences: all $(\vec x_k, \vec x_k^\prime)$ such that $\|\vec x_k^\prime - H\vec x_k\|_2^2 < \epsilon$.
  5. Repeat steps 1–4 many times and return the largest set of inliers found.

Since the set returned by RANSAC contains no large outliers, applying least squares to just these inliers tends to produce a much more robust estimate of $H$.
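A sketch of 4-point RANSAC built on the earlier homography sketch; the inlier threshold value is an assumption for illustration.

```python
import numpy as np

def ransac_homography(pts_a, pts_b, n_iters=10_000, eps=4.0, rng=None):
    """4-point RANSAC: return (refit H, indices of the largest inlier set)."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(pts_a)
    pts_a_h = np.column_stack([pts_a, np.ones(N)])       # homogeneous coordinates
    best_inliers = np.array([], dtype=int)

    for _ in range(n_iters):
        idx = rng.choice(N, size=4, replace=False)
        H = compute_homography(pts_a[idx], pts_b[idx])   # exact fit to the 4 samples

        proj = (H @ pts_a_h.T).T
        proj = proj[:, :2] / proj[:, 2:3]                # back to inhomogeneous coords
        err2 = np.sum((proj - pts_b) ** 2, axis=1)
        inliers = np.flatnonzero(err2 < eps)

        if len(inliers) > len(best_inliers):
            best_inliers = inliers

    # Refit H by least squares on the largest inlier set.
    return compute_homography(pts_a[best_inliers], pts_b[best_inliers]), best_inliers
```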

4f. Result

Stitched image using the homography estimated from the 19 RANSAC inlier correspondence pairs.

5. Sample Auto-Stitched Mosaics

Evening Glade
Doe at Night
Eucalyptus Grove
Blue Building

6. Coolest thing I learned

I think the idea that images are really a pencil of rays and that a camera image just captures a subset of it was the coolest thing I learned through this project. Taking photographs and feeding them into my pipeline helped me understand the importance of this premise because I got bad results when I moved even a little between taking photos. I’d be interested in learning whether there are any ways to easily generalize this abstraction of images to extract depth from images with different centers of projection.

7. Bells & Whistles

4B: Scale-Invariant Corner Detection

Searching for corners at multiple scales allows the Harris detector to find large corners. The large corner is only detected when the original image is shrunk by a factor of 4.

At high resolution, a large corner might not fit inside the Harris detector’s window, causing it to look like an edge rather than a corner. However, if we analyze smaller and smaller versions of this image, eventually this large corner will fit inside the Harris detector’s window, resulting in a corner detection.

So, we can make our corner detector scale invariant by applying the Harris detector to the input image shrunk by increasing powers of two. This method enables us to find both the smallest and largest corners in an image.
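A sketch of this multi-scale search using skimage; our project’s own Harris implementation may differ, so the skimage calls here are purely illustrative.

```python
import numpy as np
from skimage.feature import corner_harris, corner_peaks
from skimage.transform import rescale

def multiscale_harris(gray, n_levels=7):
    """Run the Harris detector on a pyramid of downscaled images (sketch).

    Corners found at coarser levels are mapped back to full resolution by
    multiplying by the downscale factor. For small images, fewer levels
    may be usable before the pyramid becomes too coarse.
    """
    corners = []
    for level in range(n_levels):
        scale = 2 ** level
        small = rescale(gray, 1.0 / scale, anti_aliasing=True)
        response = corner_harris(small)
        peaks = corner_peaks(response, min_distance=1)   # (row, col) detections
        corners.append(peaks * scale)                    # back to full-resolution coords
    return np.vstack(corners)
```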

4B: Rotationally Invariant Feature Descriptors

  • We use inverse warping to extract a $40\times40$ image patch centered at each Harris corner and aligned with the smoothed image gradient at that point.
  • Because the patches are aligned with the image gradient and Harris corners are rotationally invariant, these oriented features are also rotationally invariant.
  • This property is useful because it makes our matching procedure robust to rotations about the camera’s z-axis.
  • At right are some examples of the procedure, showing how a 40×40 region is extracted from the image, then downsampled and normalized to produce a robust MOPS feature descriptor.
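A sketch of the oriented extraction for a single corner; the helper names and the blur sigma are our own assumptions, not the project’s exact parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def mops_descriptor(gray, corner, patch=40, out=8):
    """Extract one oriented MOPS-style descriptor (sketch).

    `corner` is an integer (row, col) location. The 40x40 patch is rotated to
    the smoothed gradient direction at the corner, sampled down to 8x8, and
    bias/gain normalized.
    """
    blurred = gaussian_filter(gray, sigma=4.5)           # assumed blur amount
    gy, gx = np.gradient(blurred)
    cy, cx = corner
    theta = np.arctan2(gy[cy, cx], gx[cy, cx])           # patch orientation

    # Sample a rotated out x out grid spanning `patch` pixels around the corner
    # (spacing of patch/out pixels yields the 8x8 descriptor directly).
    s = patch / out
    offsets = (np.arange(out) - out / 2 + 0.5) * s
    u, v = np.meshgrid(offsets, offsets)
    rot_x = cx + u * np.cos(theta) - v * np.sin(theta)
    rot_y = cy + u * np.sin(theta) + v * np.cos(theta)
    desc = map_coordinates(blurred, [rot_y.ravel(), rot_x.ravel()], order=1)

    desc = (desc - desc.mean()) / (desc.std() + 1e-8)    # bias/gain normalization
    return desc.reshape(out, out)
```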

4B: Panorama Recognition

In this section, we’ll describe our method for automatically forming groups of images that can plausibly be stitched into a panorama. A drawback of our auto-stitching method described in part 4 is the need to manually specify which images are part of a panorama.

Our approach to automatic image grouping relies on the following observation:

💡 Two images should be stitched together only if they have many RANSAC inliers.

  • As input, our system takes a collection of $N$ images $I_0, I_1, \dots, I_{N-1}$ without any information about how they are related.
    • We demonstrate our method on a collection of 9 images containing three triplets that should be grouped into 3 separate panoramas.

Input: A collection of images without grouping information
  1. Using our section 4 code, we compute the number of RANSAC inliers between each pair of images $I_j$ and $I_k$ to build a complete weighted graph $G$ with $N$ nodes $0, 1, \dots, N-1$ and $\binom{N}{2}$ edges, where
    1. node $k$ represents image $I_k$, and
    2. edge $(j, k)$ is weighted by the number of inliers between $I_j$ and $I_k$.
  2. Applying our key observation, we form a graph $G^\prime$ by removing any edges with weight less than 6.
  3. Finally, we run Depth-First Search (DFS) on $G^\prime$ to obtain its connected components $C_0, C_1, \dots, C_m$.
    • The images within each connected component can now be automatically stitched together using our mosaic-stitching code from part 3, as shown below.

Step 1: Compute the adjacency matrix of $G$.
Step 2: Prune edges of $G$ with weight less than 6 to obtain $G^\prime$. Step 3: Run DFS on $G^\prime$ to recover the connected components.
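A sketch of steps 1–3 in Python; num_ransac_inliers is a hypothetical stand-in for the full section 4 pipeline (Harris → ANMS → MOPS → Lowe → RANSAC) returning an inlier count.

```python
def group_panoramas(images, min_inliers=6):
    """Group images into panorama candidates via connected components (sketch)."""
    n = len(images)
    adj = [[] for _ in range(n)]

    # Build G and prune weak edges in one pass: keep an edge only if the
    # pair shares at least `min_inliers` RANSAC inliers.
    for j in range(n):
        for k in range(j + 1, n):
            if num_ransac_inliers(images[j], images[k]) >= min_inliers:
                adj[j].append(k)
                adj[k].append(j)

    # Depth-first search to collect the connected components of G'.
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(adj[node])
        groups.append(component)
    return groups
```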

8. Original Images

Part 4A

The Wall

Doe at Night

View from Doe

VLSB at Sunset

Etcheverry Alley

Part 4B

Evening Glade

Eucalyptus Grove

Blue Building

Doe @ Night