
CS180 Project 4: Image Warping and Mosaicing & Feature Matching for Autostitching

Author: Nicolas Rault-Wang (nraultwang at berkeley.edu)

Credit to Notion for this template.

1. Recovering a Homography

We begin by explaining how we estimate the homography transformation matrix $H$ used to map $A$ into the geometry of $B$.

Two images $A$ and $B$ with the same center of projection are related by a homography. To estimate this transform, we first need to label a set of $N \geq 4$ feature correspondence points $(x_j, y_j) \leftrightarrow (x_j^\prime, y_j^\prime)$ between both images.

Note that $N$ must be at least 4 because a homography has 8 degrees of freedom (9 matrix entries minus 1 for scale invariance), and each pair of correspondences provides two constraints on $H$.

1a. Establishing Point Correspondences

The figure below shows some example pairs of correspondence points. Well-defined edges are excellent features to correspond because they are easy to identify in both images.

1b. Estimating the Homography

While a minimum of 4 correspondence pairs is required to solve for $H$, such a minimal system is very sensitive to noise. To produce a more robust homography, we use more than 4 correspondence pairs to form an over-determined system, then use least squares to estimate the free parameters of $H$.

Working in homogeneous coordinates, the derivation of two equations constraining the free parameters of $H$ from an arbitrary correspondence pair $(x_j, y_j) \leftrightarrow (x_j^\prime, y_j^\prime)$ is straightforward:

$$\begin{bmatrix}a&b&c\\d&e&f\\g&h&1\end{bmatrix}\begin{bmatrix}x_j\\y_j\\1\end{bmatrix} = w \begin{bmatrix}x_j^\prime\\y_j^\prime\\1\end{bmatrix}$$

$$\begin{cases}ax_j + by_j + c = wx_j^\prime\\dx_j + ey_j + f = wy_j^\prime\\gx_j + hy_j + 1 = w\end{cases} \implies \begin{cases}ax_j + by_j + c = (gx_j + hy_j + 1)\, x_j^\prime \\ dx_j + ey_j + f = (gx_j + hy_j + 1)\, y_j^\prime \end{cases}$$

$$\begin{cases} ax_j + by_j + c + 0d + 0e + 0f - gx_j x_j^\prime - hy_j x_j^\prime = x_j^\prime \\ 0a + 0b + 0c + dx_j + ey_j + f - gx_j y_j^\prime - hy_j y_j^\prime = y_j^\prime \end{cases}$$

Stacking all $2N$ equations into a $2N \times 8$ feature matrix $X$ and a $2N \times 1$ target vector $\vec y$ gives

$$\underbrace{\begin{bmatrix}x_1&y_1&1&0&0&0&-x_1x_1^\prime&-y_1x_1^\prime \\0&0&0&x_1&y_1&1&-x_1y_1^\prime&-y_1y_1^\prime\\&&&\vdots&&&&\\x_j&y_j&1&0&0&0&-x_jx_j^\prime&-y_jx_j^\prime \\0&0&0&x_j&y_j&1&-x_jy_j^\prime&-y_jy_j^\prime\\&&&\vdots&&&&\\x_N&y_N&1&0&0&0&-x_Nx_N^\prime&-y_Nx_N^\prime \\0&0&0&x_N&y_N&1&-x_Ny_N^\prime&-y_Ny_N^\prime \end{bmatrix}}_{X} \vec \theta= \underbrace{\begin{bmatrix} x_1^\prime \\ y_1^\prime\\\vdots\\x_j^\prime \\ y_j^\prime\\\vdots\\x_N^\prime \\ y_N^\prime\end{bmatrix}}_{\vec y}$$

where we have unrolled $H$ into $\vec \theta = \begin{bmatrix}a&b&c&d&e&f&g&h\end{bmatrix}^T$.

For $N \geq 4$ well-chosen correspondence points $(x_j, y_j) \leftrightarrow (x_j^\prime, y_j^\prime)$, the columns of $X$ are linearly independent, so $X$ has full column rank. Hence, the least-squares estimate of $\vec\theta$, and thus of $H$, is uniquely given by

$$\vec\theta^* = (X^TX)^{-1}X^T\vec y$$
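A minimal numpy sketch of this least-squares setup is shown below; the function name and data layout are our own for illustration, not taken from the project code.

```python
import numpy as np

def compute_homography(pts_src, pts_dst):
    """Estimate the 3x3 homography mapping pts_src -> pts_dst.

    pts_src, pts_dst: (N, 2) arrays of (x, y) correspondence points, N >= 4.
    Assumes the correspondences are in general position so X has full column rank.
    """
    pts_src = np.asarray(pts_src, dtype=float)
    pts_dst = np.asarray(pts_dst, dtype=float)
    N = pts_src.shape[0]

    X = np.zeros((2 * N, 8))
    y = np.zeros(2 * N)
    for j, ((xj, yj), (xp, yp)) in enumerate(zip(pts_src, pts_dst)):
        X[2 * j]     = [xj, yj, 1, 0, 0, 0, -xj * xp, -yj * xp]
        X[2 * j + 1] = [0, 0, 0, xj, yj, 1, -xj * yp, -yj * yp]
        y[2 * j]     = xp
        y[2 * j + 1] = yp

    theta, *_ = np.linalg.lstsq(X, y, rcond=None)   # [a b c d e f g h]
    H = np.append(theta, 1.0).reshape(3, 3)         # fix the scale: H[2, 2] = 1
    return H
```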

2. Warping Images with a Homography

In all following sections, we’ll use the term “canvas” to refer to an $xy$-axis-aligned bounding box that is large enough to contain all points in the image of the homography transform.

2a. Warping One Image

With the homography $H$ relating $A$ and $B$, we map $A$ into $B$ with an inverse warping procedure:

  1. Determine the minimum dimensions of the canvas by warping the corners of $A$ into $B$.
  2. For every canvas pixel $(i^\prime, j^\prime)$ in the quadrilateral formed from the warped corners of $A$, compute the point $(x, y)$ in the geometry of $A$ by inverting the homography:
    $$\begin{bmatrix}x \\y\\1\end{bmatrix} = H^{-1}\begin{bmatrix}i^\prime \\j^\prime\\1\end{bmatrix}$$
  3. In general, $(x, y)$ will not lie exactly on a grid point of $A$, so we interpolate the RGBA color vector $C(i^\prime, j^\prime)$ at $(x, y)$ from the grid points of $A$. Our code uses scipy.interpolate.RegularGridInterpolator to do this.
  4. After completing steps 2 and 3, we have the color value $C(i^\prime, j^\prime)$ at each canvas point $(i^\prime, j^\prime)$, which completes the inverse warp of $A$ onto the canvas shared with $B$ and produces $A^\prime$ in the geometry of $B$.
  5. To avoid edge artifacts, we use a two-level Laplacian pyramid to smoothly blend $A^\prime$ and $B$.
    - We found that a mask created from the Euclidean distance transform on $A^\prime$ and $B$ works very well. Our code uses scipy.ndimage.distance_transform_edt to compute it.

The canvas is computed by transforming the corners of $A$ with the homography $H$.
$A^\prime$: the result of warping $A$ into the common canvas via the homography $H$.
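The sketch below illustrates the inverse-warp loop (steps 2 and 3) with numpy and RegularGridInterpolator. It is a simplification under our own naming: it ignores the canvas offset and the alpha/blending handling that the full pipeline tracks.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def warp_image(img, H, canvas_shape):
    """Inverse-warp `img` onto a canvas using the homography H (img -> canvas)."""
    H_out, W_out = canvas_shape

    # Every canvas pixel (x', y') in homogeneous coordinates.
    xs, ys = np.meshgrid(np.arange(W_out), np.arange(H_out))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])

    # Map canvas pixels back into the source image with H^{-1},
    # then divide out the homogeneous scale w.
    src = np.linalg.inv(H) @ pts
    x_src, y_src = src[0] / src[2], src[1] / src[2]

    # Interpolate each channel at the (generally non-integer) source points.
    rows, cols = np.arange(img.shape[0]), np.arange(img.shape[1])
    query = np.stack([y_src, x_src], axis=-1)   # interpolator expects (row, col)
    out = np.stack(
        [RegularGridInterpolator((rows, cols), img[..., c],
                                 bounds_error=False, fill_value=0.0)(query)
         for c in range(img.shape[2])], axis=-1)
    return out.reshape(H_out, W_out, img.shape[2])
```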

2b. Image Rectification

An application of homography warping is undoing (rectifying) the perspective distortion of rectangular objects in images. This is possible because any two images of a planar object, such as a tile or poster, are related by a homography: we define correspondences between the object’s four distorted corners and the corners of an axis-aligned rectangle, then warp.
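As a hypothetical usage example (the corner coordinates and target size below are made up, and the helper functions are the sketches from sections 1b and 2a, not the actual project code):

```python
import numpy as np

# Four clicked corners (x, y) of a tilted poster in the photo, listed
# clockwise from the top-left, and the axis-aligned rectangle to map them to.
poster_corners = np.array([[120,  80], [430,  60], [455, 390], [105, 410]])
target_w, target_h = 400, 300
target_corners = np.array([[0, 0], [target_w, 0],
                           [target_w, target_h], [0, target_h]])

H = compute_homography(poster_corners, target_corners)  # sketch from section 1b
rectified = warp_image(image, H, (target_h, target_w))  # sketch from section 2a;
                                                        # `image` is the loaded photo
```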

Below, we show some examples of this operation. The image on the left is the original and the image on the right is the result of rectification.

Rectification of the rectangular windows of the building.

Rectification of the bathroom floor tiles.
Rectification of the signs.

3. Creating Panoramas

We can extend our one-image warping procedure to create panoramas. Since two images with the same center of projection are related by a homography, we can form a panorama by warping a set of images with the same center of projection into a particular image in this set. This can be implemented by recursively applying the one-image warp to grow a panorama one image at a time.
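A sketch of this chaining is shown below; the helper names are ours, and stitch_pair stands in for the warp-and-blend procedure of section 2a.

```python
def build_mosaic(images, correspondences):
    """Grow a panorama one image at a time (sketch; helper names are ours).

    `images` is ordered so that consecutive images overlap, and
    `correspondences[k]` holds matched points (in the current mosaic,
    in images[k]) for each k >= 1.
    """
    mosaic = images[0]
    for k in range(1, len(images)):
        pts_mosaic, pts_new = correspondences[k]
        H = compute_homography(pts_new, pts_mosaic)   # images[k] -> mosaic geometry
        mosaic = stitch_pair(mosaic, images[k], H)    # warp, then blend (section 2a)
    return mosaic
```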

The Wall
View from Doe
Etcheverry Alley
Doe at Night
VLSB at Sunset

4. Auto Stitching Panoramas

We’ll follow the paper “Multi-Image Matching using Multi-Scale Oriented Patches” by Brown et al. to automate the process of finding correspondence points between two images.

4a. Harris Interest Point Detector

Step 1. Compute Harris interest points

The Harris corner detector uses a fixed-size window to search 7 image scales for interest point detections. This makes the corner detection process scale-invariant. See the Bells and Whistles section below for more details.

4b. Adaptive Non-Maximal Suppression (ANMS)

Step 2. Use Adaptive Non-Maximal Suppression (ANMS) to filter all but 750 interest points.

For each Harris point $x_i$, ANMS first computes the suppression radius $r_i$: the minimum distance from $x_i$ to any point $x_j$ whose corner response is much greater, i.e. satisfying $f(x_i) < 0.9\, f(x_j)$. Then, ANMS sorts all points by $r_i$ and returns the top $N = 750$ points with the largest suppression radii. The points that make it through this filter have strong corner responses and are spread evenly over the image.
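A straightforward $O(M^2)$ sketch of ANMS in numpy (function and parameter names are ours):

```python
import numpy as np

def anms(coords, strengths, n_keep=750, c_robust=0.9):
    """Adaptive non-maximal suppression (sketch).

    coords: (M, 2) Harris point coordinates; strengths: (M,) corner responses.
    A point's suppression radius is its distance to the nearest point whose
    response is sufficiently stronger (f_i < c_robust * f_j).
    """
    M = len(coords)
    radii = np.full(M, np.inf)
    for i in range(M):
        stronger = strengths[i] < c_robust * strengths        # points that dominate i
        if np.any(stronger):
            d2 = np.sum((coords[stronger] - coords[i]) ** 2, axis=1)
            radii[i] = np.sqrt(d2.min())
    keep = np.argsort(-radii)[:n_keep]                         # largest radii first
    return coords[keep]
```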

4c. Create MOPS Feature Detectors

Next, we extract a 40x40 pixel region around each corner point remaining after ANMS. See the Bells and Whistles section below for more details about how this feature is extracted.

4d. Lowe Outlier Rejection

Inliers according to Lowe outlier rejection. Corresponding points have the same color.

Lowe outlier rejection relies on the observation that truly corresponding feature descriptors are much closer to each other than to any other descriptor; this lets us reject descriptors that likely appear in only one image.

We implemented this test by computing the L2 distance between each pair of MOPS descriptors from images A and B, then keeping only the pairs of descriptors with (1-nearest-neighbor distance) / (2-nearest-neighbor distance) less than some threshold. This process also allowed us to create an initial set of correspondence points between A and B, as indicated by point colors above.
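A sketch of this ratio test on flattened descriptors; the threshold value here is an assumption for illustration, not necessarily the one used in our pipeline.

```python
import numpy as np

def lowe_matches(desc_a, desc_b, ratio=0.6):
    """Return (A index, B index) pairs passing Lowe's ratio test (sketch)."""
    # Pairwise L2 distances between every descriptor in A and in B.
    d2 = (np.sum(desc_a**2, axis=1)[:, None]
          + np.sum(desc_b**2, axis=1)[None, :]
          - 2.0 * desc_a @ desc_b.T)
    d = np.sqrt(np.maximum(d2, 0.0))

    order = np.argsort(d, axis=1)
    nn1, nn2 = order[:, 0], order[:, 1]            # best and second-best match in B
    rows = np.arange(len(desc_a))
    keep = d[rows, nn1] / d[rows, nn2] < ratio     # 1-NN / 2-NN distance ratio

    return [(i, nn1[i]) for i in rows[keep]]
```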

4e. Random Sampling and Consensus (RANSAC)

Homography inliers found by 10k iterations of 4-point RANSAC. Corresponding points have the same color.

Random Sampling and Consensus (RANSAC) can be used to find a robust set of correspondence pairs for homography estimation. Each iteration performs the following steps:

  1. Sample a set of 4 correspondences $\{(\vec x_0, \vec x_0^\prime), (\vec x_1, \vec x_1^\prime), (\vec x_2, \vec x_2^\prime), (\vec x_3, \vec x_3^\prime)\}$ between A and B without replacement.
  2. Exactly compute the homography $H$ from these points.
  3. Apply $H$ to the full set of correspondence points produced by the Lowe matching step (4d).
  4. Identify the inlier correspondences: all $(\vec x_k, \vec x_k^\prime)$ such that $\|\vec x_k^\prime - H\vec x_k\|_2^2 < \epsilon$.
  5. Repeat steps 1–4 many times and return the largest set of inliers found.

Since the set returned by RANSAC contains no large outliers, applying least squares to just these inliers tends to produce a much more robust estimate of $H$.
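A sketch of 4-point RANSAC built on the earlier homography sketch; the inlier threshold value is an assumption for illustration.

```python
import numpy as np

def ransac_homography(pts_a, pts_b, n_iters=10_000, eps=4.0, rng=None):
    """4-point RANSAC: return (refit H, indices of the largest inlier set)."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(pts_a)
    pts_a_h = np.column_stack([pts_a, np.ones(N)])       # homogeneous coordinates
    best_inliers = np.array([], dtype=int)

    for _ in range(n_iters):
        idx = rng.choice(N, size=4, replace=False)
        H = compute_homography(pts_a[idx], pts_b[idx])   # exact fit to the 4 samples

        proj = (H @ pts_a_h.T).T
        proj = proj[:, :2] / proj[:, 2:3]                # back to inhomogeneous coords
        err2 = np.sum((proj - pts_b) ** 2, axis=1)
        inliers = np.flatnonzero(err2 < eps)

        if len(inliers) > len(best_inliers):
            best_inliers = inliers

    # Refit H by least squares on the largest inlier set.
    return compute_homography(pts_a[best_inliers], pts_b[best_inliers]), best_inliers
```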

4f. Result

Stitched image using the homography estimated from the 19 RANSAC inlier correspondence pairs.

5. Sample Auto-Stitched Mosaics

Evening Glade
Doe at Night
Eucalyptus Grove
Blue Building

6. Coolest thing I learned

I think the idea that images are really a pencil of rays and that a camera image just captures a subset of it was the coolest thing I learned through this project. Taking photographs and feeding them into my pipeline helped me understand the importance of this premise because I got bad results when I moved even a little between taking photos. I’d be interested in learning whether there are any ways to easily generalize this abstraction of images to extract depth from images with different centers of projection.

7. Bells & Whistles

4B: Scale-Invariant Corner Detection

Searching for corners at multiple scales allows the Harris detector to find large corners. The large corner is only detected when the original image is shrunk by a factor of 4.

At high resolution, a large corner might not fit inside the Harris detector’s window, causing it to look like an edge rather than a corner. However, if we analyze smaller and smaller versions of this image, eventually this large corner will fit inside the Harris detector’s window, resulting in a corner detection.

So, we can make our corner detector scale invariant by applying the Harris detector to the input image shrunk by increasing powers of two. This method enables us to find both the smallest and largest corners in an image.
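A sketch of this multi-scale search using skimage; our project’s own Harris implementation may differ, so the skimage calls here are purely illustrative.

```python
import numpy as np
from skimage.feature import corner_harris, corner_peaks
from skimage.transform import rescale

def multiscale_harris(gray, n_levels=7):
    """Run the Harris detector on a pyramid of downscaled images (sketch).

    Corners found at coarser levels are mapped back to full resolution by
    multiplying by the downscale factor. For small images, fewer levels
    may be usable before the pyramid becomes too coarse.
    """
    corners = []
    for level in range(n_levels):
        scale = 2 ** level
        small = rescale(gray, 1.0 / scale, anti_aliasing=True)
        response = corner_harris(small)
        peaks = corner_peaks(response, min_distance=1)   # (row, col) detections
        corners.append(peaks * scale)                    # back to full-resolution coords
    return np.vstack(corners)
```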

4B: Rotationally Invariant Feature Descriptors

  • We use inverse warping to extract a $40\times40$ image patch centered at each Harris corner and aligned with the smoothed image gradient at that point.
  • Because the patches are aligned with the image gradient and Harris corners are rotationally invariant, these oriented features are also rotationally invariant.
  • This property is useful because it makes our matching procedure robust to rotations about the camera’s z-axis.
  • At right are some examples of the procedure, showing how a 40×40 region is extracted from the image, then downsampled and normalized to produce a robust MOPS feature descriptor.
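A sketch of the oriented extraction for a single corner; the helper names and the blur sigma are our own assumptions, not the project’s exact parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def mops_descriptor(gray, corner, patch=40, out=8):
    """Extract one oriented MOPS-style descriptor (sketch).

    `corner` is an integer (row, col) location. The 40x40 patch is rotated to
    the smoothed gradient direction at the corner, sampled down to 8x8, and
    bias/gain normalized.
    """
    blurred = gaussian_filter(gray, sigma=4.5)           # assumed blur amount
    gy, gx = np.gradient(blurred)
    cy, cx = corner
    theta = np.arctan2(gy[cy, cx], gx[cy, cx])           # patch orientation

    # Sample a rotated out x out grid spanning `patch` pixels around the corner
    # (spacing of patch/out pixels yields the 8x8 descriptor directly).
    s = patch / out
    offsets = (np.arange(out) - out / 2 + 0.5) * s
    u, v = np.meshgrid(offsets, offsets)
    rot_x = cx + u * np.cos(theta) - v * np.sin(theta)
    rot_y = cy + u * np.sin(theta) + v * np.cos(theta)
    desc = map_coordinates(blurred, [rot_y.ravel(), rot_x.ravel()], order=1)

    desc = (desc - desc.mean()) / (desc.std() + 1e-8)    # bias/gain normalization
    return desc.reshape(out, out)
```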

4B: Panorama Recognition

In this section, we’ll describe our method for automatically forming groups of images that can plausibly be stitched into a panorama. A drawback of our auto-stitching method described in part 4 is the need to manually specify which images are part of a panorama.

Our approach to automatic image grouping relies on the following observation:

💡 Two images should be stitched together only if they have many RANSAC inliers.

  • As input, our system takes a collection of $N$ images $I_0, I_1, \dots, I_{N-1}$ without any information about how they are related.
    • We demonstrate our method on a collection of 9 images containing three triplets that should be grouped into 3 separate panoramas.

Input: A collection of images without grouping information
  1. Using our section 4 code, we compute the number of RANSAC inliers between each pair of images $I_j$ and $I_k$ to build a complete weighted graph $G$ with $N$ nodes $0, 1, \dots, N-1$ and $\binom{N}{2}$ edges, where
    1. node $k$ represents image $I_k$, and
    2. edge $(j, k)$ is weighted by the number of inliers between $I_j$ and $I_k$.
  2. Applying our key observation, we form a graph $G^\prime$ by removing any edges with weight less than 6.
  3. Finally, we run Depth-First Search (DFS) on $G^\prime$ to obtain its connected components $C_0, C_1, \dots, C_m$.
    • The images within each connected component can now be automatically stitched together using our mosaic-stitching code from part 3, as shown below.

Step 1: Compute the adjacency matrix of $G$.
Step 2: Prune edges of $G$ with weight less than 6 to obtain $G^\prime$. Step 3: Run DFS on $G^\prime$ to recover the connected components.
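A sketch of steps 1–3 in Python; num_ransac_inliers is a hypothetical stand-in for the full section 4 pipeline (Harris → ANMS → MOPS → Lowe → RANSAC) returning an inlier count.

```python
def group_panoramas(images, min_inliers=6):
    """Group images into panorama candidates via connected components (sketch)."""
    n = len(images)
    adj = [[] for _ in range(n)]

    # Build G and prune weak edges in one pass: keep an edge only if the
    # pair shares at least `min_inliers` RANSAC inliers.
    for j in range(n):
        for k in range(j + 1, n):
            if num_ransac_inliers(images[j], images[k]) >= min_inliers:
                adj[j].append(k)
                adj[k].append(j)

    # Depth-first search to collect the connected components of G'.
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(adj[node])
        groups.append(component)
    return groups
```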

8. Original Images

Part 4A

The Wall

Doe at Night

View from Doe

VLSB at Sunset

Etcheverry Alley

Part 4B

Evening Glade

Eucalyptus Grove

Blue Building

Doe @ Night