EPITA 2022 MLRF practice_03-01_ORB_AR v2023-05-30_183716 by Joseph CHAZALON

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Practice 03 part 01: Augmented Reality using ORB matching¶

We will demonstrate a simple technique, light enough to run on an old smartphone, which detects an instance of a known document in a video frame and overlays some dynamic content over this document in the frame.

We will use an excerpt of a dataset we created for a fun little app a few years ago, which allows children to point at a songbook page and play the associated song using a tablet. This is illustrated below.

AR example

This is much like marker-based Augmented Reality (AR), where the marker is a complex image.

This approach requires preparing a document model prior to matching documents within frames.

We will proceed in 6 steps:

  1. detect keypoints from the model and display them;
  2. compute the descriptors for the model image;
  3. detect the keypoints for a frame image, compute the descriptors and display them;
  4. create matchers and index descriptors;
  5. estimate the homography from the model to the frame;
  6. project a modified image over the document in the frame.

The resources for this session are packaged directly within this notebook's archive: you can access them under the resources/ folder:

  • model.png: the model image we will use;
  • frame_0010.jpeg: a frame image extracted from a video.

0. Import module, load resources¶

In [1]:
# deactivate buggy jupyter completion
%config Completer.use_jedi = False
In [2]:
import numpy as np
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
import os
In [29]:
cv2.__version__
Out[29]:
'4.0.0'

I tested this lab session using OpenCV 4.0.0. Beware of API breaks with version 5!

In [3]:
PATH_TO_RESOURCES = "./resources"
In [4]:
model_img = cv2.imread(os.path.join(PATH_TO_RESOURCES, "model.png"))
model_img.shape, model_img.dtype
Out[4]:
((1654, 2340, 3), dtype('uint8'))
In [5]:
# to remain sane
def bgr2rgb(img):
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

Here is our model image¶

In [6]:
plt.imshow(bgr2rgb(model_img), cmap='gray')
Out[6]:
<matplotlib.image.AxesImage at 0x7f4a68fc0d30>

We need to convert it to grayscale to extract ORB keypoints from it.

In [7]:
model_img_gray = cv2.cvtColor(model_img, cv2.COLOR_BGR2GRAY)
In [8]:
plt.imshow(model_img_gray, cmap='gray')
Out[8]:
<matplotlib.image.AxesImage at 0x7f4a68ee15f8>

Here is the frame we will process¶

In [9]:
frame_img = cv2.imread(os.path.join(PATH_TO_RESOURCES, "frame_0010.jpeg"))
frame_img.shape, frame_img.dtype
Out[9]:
((1080, 1920, 3), dtype('uint8'))
In [13]:
plt.imshow(bgr2rgb(frame_img))
Out[13]:
<matplotlib.image.AxesImage at 0x7f4a68e0cd68>

We also need to convert it to grayscale, for the same reason.

In [14]:
frame_img_gray = cv2.cvtColor(frame_img, cv2.COLOR_BGR2GRAY)
In [15]:
plt.imshow(frame_img_gray, cmap='gray')
Out[15]:
<matplotlib.image.AxesImage at 0x7f4a68d74470>

1. Detect and draw keypoints (model image)¶

First, we will detect and display some keypoints using the ORB method.

work

Complete the creation of the ORB object below, setting parameters appropriately.

Tips:

  • ORB keypoint detection and description are performed with the same ORB object.
  • Create this ORB object using cv2.ORB.create(...).
  • You need to select appropriate parameters. The default parameters may not give the best possible results.
  • In particular, we need:
    • a few thousand features;
    • several levels (as the object in the frame may appear several times smaller than in the original model);
    • to select the Harris score to stabilize the results (scoreType=cv2.ORB_HARRIS_SCORE).
In [16]:
# Run me!
cv2.ORB.create?
In [17]:
# TODO create the ORB detector and descriptor
# orb = cv2.ORB.create(...) # FIXME
In [18]:
# prof
orb = cv2.ORB.create(nfeatures=2000,
                     scaleFactor=1.2,
                     nlevels=10,
                     edgeThreshold=5,
                     firstLevel=0,
                     WTA_K=2,
                     scoreType=cv2.ORB_HARRIS_SCORE,
                     patchSize=15)
work

Now you can detect keypoints from the model image.

Tips:

  • Use the orb.detect() method.
  • Use the graylevel image.
In [19]:
# TODO detect keypoints
# model_kpts = # FIXME
# len(model_kpts)
In [20]:
#prof
model_kpts = orb.detect(model_img_gray)
len(model_kpts)
Out[20]:
2000
work

Display the keypoints using the function we provide below.

Expected result:

In [21]:
# because the function from OpenCV's python wrapper is buggy
def draw_keypoints(color_image, keypoints, color=(0,255,0)):
    '''
    Display keypoints in some color over an image.

    Parameters
    ----------
    color_image: ndarray, shape=(rows, cols, 3 channels)
        color image in BGR order

    keypoints: list of cv2.KeyPoint
        keypoints detected in the image

    color: tuple of uint8 (optional)
        color of the keypoints to drawn, in BGR order
    '''
    if color_image.ndim != 3:
        raise ValueError(
            "draw_keypoints: parameter `color_image` must be a... (wait for it) color image!")
    draw = color_image.copy()
    for k in keypoints:
        # KeyPoint objects also expose class_id, octave and response (unused here)
        angle = np.deg2rad(k.angle)  # KeyPoint.angle is expressed in degrees
        pt_x, pt_y = k.pt
        pt_int = int(pt_x), int(pt_y)
        size = k.size
        cv2.circle(draw, pt_int, int(size), color)
        # draw a short segment showing the keypoint orientation
        pt2 = int(pt_x + np.sin(angle)*size), int(pt_y + np.cos(angle)*size)
        cv2.line(draw, pt_int, pt2, color, thickness=2)
    plt.imshow(bgr2rgb(draw))
In [22]:
# TODO draw the keypoints detected in the model image
# draw_keypoints(...) # FIXME
In [23]:
# prof
draw_keypoints(model_img, model_kpts)

2. Compute descriptors (model image)¶

work

Compute the descriptors for each of the keypoints we previously detected.

Tips:

  • Use the ORB.compute() method.
In [24]:
# TODO compute the descriptors
# model_kpts, model_desc = ... # FIXME
# len(model_kpts), model_desc.shape
In [25]:
# prof
model_kpts, model_desc = orb.compute(model_img_gray, model_kpts)
len(model_kpts), model_desc.shape, model_desc.dtype
Out[25]:
(2000, (2000, 32), dtype('uint8'))
work

What is the size (in bytes) of an ORB descriptor?

TODO your answer here

Storing an ORB descriptor takes ... bytes (without indexing overhead).

PROF

Storing an ORB descriptor takes 32 bytes (without indexing overhead).
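
As a quick sanity check, this can be read directly from the model_desc array computed above (a small sketch): each row is one descriptor made of 32 uint8 values, i.e. 256 bits.

# Sketch: bytes (and bits) per ORB descriptor, read from the descriptor matrix
bytes_per_descriptor = model_desc.shape[1] * model_desc.dtype.itemsize  # 32 columns of 1 byte each
bytes_per_descriptor, bytes_per_descriptor * 8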

3. Detect and compute keypoints from the frame¶

work

Using the ORB.detectAndCompute() method, perform keypoint detection and description in a single step.

Tips:

  • This function requires a mask but we do not need it. Set mask=None.

Expected result of draw_keypoints():

In [26]:
# TODO detect keypoints and compute descriptors for the frame
# frame_kpts, frame_descr = # FIXME
# len(frame_kpts), frame_descr.shape
In [27]:
# prof
frame_kpts, frame_descr = orb.detectAndCompute(frame_img_gray, mask=None)
len(frame_kpts), frame_descr.shape
Out[27]:
(2000, (2000, 32))
In [28]:
# Run me!
draw_keypoints(frame_img, frame_kpts)
work

What are the regions where keypoints are detected?

TODO your answer here

PROF

Keypoints are detected in textured areas. Uniform areas do not allow extracting any discriminative element.

4. Create a matcher and index model descriptors¶

A matcher object is used to compare two sets of descriptors.

The relevant OpenCV documentation is available at the DescriptorMatcher documentation page.

Overview¶

There are two matchers available in OpenCV:

  • the brute-force matcher, which performs an $m \times n$ match of each of the $m$ descriptors in the first set against each of the $n$ descriptors in the second set;
  • the FLANN-based matcher, which performs fast approximate nearest neighbor search using an indexing structure.

In both cases, we need to specify the distance the matcher will use to compare descriptors. There are several built-in norms:

  • cv2.NORM_INF: $\|X-Y\|_{L_{\infty}} = \max _i | X_i - Y_i|$ where $V_i$ is the $i$-th component of vector $V$.
  • cv2.NORM_L1: $\| X-Y \| _{L_1} = \sum _i | X_i - Y_i|$
  • cv2.NORM_L2: $\| X-Y \| _{L_2} = \sqrt{\sum_i (X_i - Y_i)^2}$
  • cv2.NORM_L2SQR: $\| X-Y \| _{L_{2S}} = \sum_i (X_i - Y_i)^2$
  • cv2.NORM_HAMMING: Calculates the Hamming distance between the arrays (counts the non-zero bits of $X_i \oplus Y_i$, where $\oplus$ is a bit-wise "exclusive or").
  • cv2.NORM_HAMMING2: Similar to NORM_HAMMING, but in the calculation, each two bits of the input sequence will be added and treated as a single bit to be used in the same calculation as NORM_HAMMING (only useful if you set the WTA_K parameter of ORB to something else than 2).
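
To make the Hamming distance concrete for our binary descriptors, here is a small sketch (it reuses two arbitrary rows of model_desc computed earlier; the NumPy version is only there to show what cv2.norm computes):

# Sketch: the Hamming distance between two ORB descriptors is the number of differing bits
d1, d2 = model_desc[0], model_desc[1]
hamming_numpy = int(np.count_nonzero(np.unpackbits(np.bitwise_xor(d1, d2))))
hamming_opencv = cv2.norm(d1, d2, cv2.NORM_HAMMING)
hamming_numpy, hamming_opencv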

Brute force (BF) matcher¶

It has only one parameter besides the distance function: crossCheck. It allows performing a symmetry test, i.e. keeping only descriptor pairs where each descriptor is the closest to the other one in its own set, or more formally: $$ \{ (\hat{d_i},\hat{d_j}) \mid \hat{d_j} = \underset{d_j \in D_2}{\mathrm{argmin}} \operatorname{dist}(\hat{d_i}, d_j) \land \hat{d_i} = \underset{d_i \in D_1}{\mathrm{argmin}} \operatorname{dist}(d_i, \hat{d_j}) \}; $$ otherwise, we get the following set, $\forall d_i \in D_1$: $$ \{ (d_i,\hat{d_j}) \mid \hat{d_j} = \underset{d_j \in D_2}{\mathrm{argmin}} \operatorname{dist}(d_i, d_j) \}. $$

We recommend creating a BF matcher using cv2.BFMatcher_create(normType, crossCheck).

FLANN-based matcher¶

FLANN stands for Fast Library for Approximate Nearest Neighbors.

The FLANN-based matcher is much more complex than the BF one, as it can use multiple indexing strategies (which may or may not be compatible with your descriptor type!) which have, in turn, parameters to be set.

This matcher may be faster than the brute-force matcher when matching against a large train collection.

Good (but old) documentation is available for the OpenCV 2.4 implementation.

OpenCV supports several indexing algorithms:

  • Linear: the index will perform a linear, brute-force search.

    • algorithm code: FLANN_INDEX_LINEAR = 0
    • no extra parameters
  • KD-Trees: the index constructed will consist of a set of randomized kd-trees which will be searched in parallel.

    • algorithm code: FLANN_INDEX_KDTREE = 1
    • extra parameters: trees, the number of parallel kd-trees to use. Good values are in the range [1..16].
  • K-Means: the index constructed will be a hierarchical k-means tree.

    • algorithm code: FLANN_INDEX_KMEANS = 2
    • extra parameters include the branching factor to use for the hierarchical k-means tree, and parameters for k-means initialization and computation.

  • Composite: the index created combines the randomized kd-trees and the hierarchical k-means tree.

    • algorithm code: FLANN_INDEX_COMPOSITE = 3
    • extra parameters include both previous parameters sets.
  • KD-Tree: the index is constructed using a single KD-tree.

    • algorithm code: FLANN_INDEX_KDTREE_SINGLE = 4
    • extra parameters: None, apparently
  • Hierarchical clustering: Documentation is missing, but it seems to be a classical hierarchical clustering.

    • algorithm code: FLANN_INDEX_HIERARCHICAL = 5
    • extra parameters: unclear
  • LSH (Locality Sensitive Hashing): the index created uses multi-probe LSH.

    • algorithm code: FLANN_INDEX_LSH = 6
    • This indexing algorithm is compatible with ORB's binary descriptors!
    • extra parameters:
      • table_number: the number of hash tables to use (between 10 and 30 usually).
      • key_size: the size of the hash key in bits (between 10 and 20 usually).
      • multi_probe_level: the number of bits to shift to check for neighboring buckets (0 is regular LSH, 2 is recommended).
    • Also, this tutorial suggests to use the following parameters:
      • table_number = 6
      • key_size = 12
      • multi_probe_level = 1
  • and even more…

This matcher also has search parameters (like whether to sort the results), but there is very little reason to change the default values; a small optional sketch is given after the creation example below.

To create a FLANN-based matcher, we recommend the following technique:

# Create a dictionary for indexing parameters:
flann_index_params = dict(algorithm=6,  # LSH
                          table_number=6,  # LSH parameters
                          key_size=12,
                          multi_probe_level=1)
# Then create the matcher
matcher = cv2.FlannBasedMatcher(indexParams=flann_index_params)
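
If you do want to set the search parameters explicitly, they can be passed the same way. Here is a small optional sketch; checks bounds the number of index cells visited per query (higher means slower but more accurate):

# Optional sketch: explicit search parameters for the FLANN-based matcher
flann_search_params = dict(checks=50)
matcher = cv2.FlannBasedMatcher(indexParams=flann_index_params,
                                searchParams=flann_search_params)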
work

Create a BF matcher and FLANN matcher.

Hint: keep in mind that your ORB descriptors will be binary.

In [26]:
# TODO
# matcher_BF = ...
# matcher_FLANN = ...
In [27]:
# prof

# BF
matcher_BF = cv2.BFMatcher_create(normType=cv2.NORM_HAMMING, crossCheck=True)

# FLANN
# Create a dictionary for indexing parameters:
flann_index_params = dict(algorithm=6,  # LSH
                          table_number=6,  # LSH parameters
                          key_size=12,
                          multi_probe_level=1)
# Then create the matcher
matcher_FLANN = cv2.FlannBasedMatcher(indexParams=flann_index_params)

Indexation¶

While it is possible to directly call matcher.match(descriptors1, descriptors2), we usually index descriptors before matching them.

This is useful in real conditions for the case we are working on: we have to match each frame against every possible model (there were several songs available), so this allows us to:

  1. perform indexing only once;
  2. handle multiple models and therefore perform object detection (however the pipeline is a bit more complex).

This is performed using the matcher.add(list_of_list_of_descriptors) method, which adds sets of descriptors for several training (or "model") images.

The index then retains for each single descriptor:

  • its value (indexed);
  • the id of the training image.

We will therefore distinguish between:

  • train descriptors, provided upon training;
  • query descriptors, provided upon matching.
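
For instance, with several song models, the indexing step could look like the following sketch (desc_song_a and desc_song_b are hypothetical descriptor arrays, each computed like model_desc above; the imgIdx attribute of the resulting matches then tells which model a match belongs to):

# Sketch: index the descriptors of several models at once (desc_song_a/b are hypothetical arrays)
multi_matcher = cv2.BFMatcher_create(normType=cv2.NORM_HAMMING, crossCheck=True)
multi_matcher.add([desc_song_a, desc_song_b])  # one entry per training ("model") image
multi_matcher.train()  # builds the index (a no-op for the BF matcher, required for FLANN)
# Later, for each video frame:
# frame_matches = multi_matcher.match(frame_descriptors)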
work

Index the descriptors of the model image.

Tips:

  • Use matcher.add().
  • Do it for both matchers (we will compare them).
  • add() takes a list of list of descriptors!
In [28]:
# TODO
# matcher_BF.add(...)
# matcher_FLANN.add(...)
In [29]:
# prof
matcher_BF.add([model_desc])
matcher_FLANN.add([model_desc])

5. Match descriptors and estimate the homography¶

We are now ready to match descriptors.

We suggest using a FLANN-based matcher so that we can perform a ratio test.

Matching descriptors¶

Matching descriptors is performed using one of the following functions:

  • matcher.match(query_descriptors): actually knnMatch with k=1.
  • matcher.knnMatch(query_descriptors, k=...): Performs a K-nearest neighbor search for a given query point using the index.
    • Without symmetry test, this will return a match for each query descriptor as long as there is at least one descriptor in the train set.
    • Useful for ratio test with k > 1.
    • Note that k > 1 is not possible with BF matcher when crossCheck is True!
  • matcher.radiusMatch(query_descriptors, maxDistance=...): Performs a radius nearest neighbor search for each query point, i.e. returns only results within the specified radius.
    • Useful when we have a background model which allows us to set a threshold.

5.1 Simple match¶

work

Compute the matches between the frame descriptors and the model descriptors using the FLANN matcher.

Tips:

  • Use the matcher.match() method to avoid getting a list of 1-element tuples as a result.
In [30]:
# TODO compute the matches
# matches = ...
# len(matches)
In [31]:
# prof
matches = matcher_FLANN.match(frame_descr)
len(matches)
Out[31]:
2000

Display the matches¶

Here is a simple way to display the matches using cv2.drawMatches(). We could keep only the closest matches, but we will keep this simple for now.

In [32]:
# Run me
def draw_matches(img1, kpts1, img2, kpts2, matches, color=(0,0,255), title=""):
    '''img1 and img2 are color images.'''
    img_matches = np.empty((max(img1.shape[0], img2.shape[0]),
                           img1.shape[1]+img2.shape[1], 
                           3), 
                           dtype=np.uint8)
    img_matches = cv2.drawMatches(img1, kpts1, img2, kpts2, 
                          matches, 
                          img_matches,
                          matchColor=color,
                          flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
    plt.figure(figsize=(12,4))
    plt.imshow(bgr2rgb(img_matches))
    plt.title(title + " - %d matches" % (len(matches),))
work

Draw those first matches (frame → model) in RED.

Expected result of draw_matches():

In [33]:
# TODO draw the first matches
# draw_matches(...)
In [34]:
# prof
draw_matches(model_img, model_kpts, 
             frame_img, frame_kpts, 
             matches, 
             color=(0,0,255), 
             title="Frame → Model")

5.2 Symmetry test¶

Let us now use the BF matcher to ask for a cross check.

work

Compute the matches using a symmetry test and display them in BLUE.

Expected result of draw_matches():

In [35]:
# TODO
# matches = ...
# draw_matches(...)
In [36]:
# prof
# bruteforce version with crosscheck: fewer matches!
matches = matcher_BF.match(frame_descr, model_desc)

draw_matches(model_img, model_kpts, 
             frame_img, frame_kpts, 
             matches, 
             color=(255,0,0), 
             title="Frame ⇔ Model")

5.3 Ratio test¶

Let's stop using the BF matcher now, and use the FLANN matcher for what remains.

work

Compute the matches using the FLANN-based matcher, asking for the 2 nearest neighbors.

Hint: matches will contain a list of pairs of matches, as opposed to single matches in previous steps.

In [37]:
# TODO
# matches = ...
In [38]:
# prof
matches = matcher_FLANN.knnMatch(frame_descr, k=2)
len(matches), len(matches[0])
Out[38]:
(2000, 2)

Match results¶

The result of a matches = matcher.match*(query_descriptors) call is a list of DMatch objects. A DMatch object has the following attributes:

  • DMatch.distance: Distance between descriptors. The lower, the better it is.
  • DMatch.trainIdx: Index of the descriptor in train descriptors
  • DMatch.queryIdx: Index of the descriptor in query descriptors
  • DMatch.imgIdx: Index of the train image.
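
For instance, here is a small sketch of how these attributes can be read, using a fresh flat list of matches rather than the k=2 pairs computed just above:

# Sketch: inspect the DMatch attributes of the best match of a plain match() call
flat_matches = matcher_FLANN.match(frame_descr)
best = min(flat_matches, key=lambda m: m.distance)
best.distance, best.queryIdx, best.trainIdx, best.imgIdx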
work

Filter the matches using a ratio test.

Tips:

  • This means: for each pair of matches $(\texttt{m},\texttt{n}) \in M$ (best and second-best neighbors), keep $\texttt{m}$ only if $\texttt{m.distance} < \texttt{n.distance} \times T$, where $T$ is the ratio test value.
In [39]:
# TODO filter matches
# good_matches = ...
# len(good_matches)
In [40]:
# prof
RATIO_TEST_VALUE = 0.75
good_matches = [m1 for m1, m2 in matches if m1.distance < m2.distance * RATIO_TEST_VALUE]
# FIXME newer versions of OpenCV may return only 1 element in the match!?!? Need to check len(...)
len(good_matches)
Out[40]:
202
work

Draw those good matches (frame → model) with ratio test in CYAN.

Expected result of draw_matches():

In [41]:
# TODO
# draw_matches(...)
In [42]:
# prof
draw_matches(model_img, model_kpts, 
             frame_img, frame_kpts, 
             good_matches, 
             color=(255,255,0), 
             title="Frame → Model (ratio test)")
work

Compare the filtering of the symmetry test and the ratio test: which one rejects more matches?

TODO write your answer here

PROF

The ratio test filters out more matches. It is cheaper to compute and more reliable.

Recommended by David Lowe.

5.4 Geometric validation¶

Finally, using the good matches we computed with the ratio test, we can estimate the perspective transform from the model to the frame (in this direction, because we will project a modified model image over the scene/frame).

First we need to build two corresponding lists of point coordinates, for the source and for the destination.

work

Extract the point coordinates of the good matches to build two lists of corresponding coordinates in the model and frame coordinate systems.

Tips:

  • Recover the index of the keypoints using either m.trainIdx or m.queryIdx.
  • Extract the point coordinates from each keypoint using kpts[INDEX].pt.
In [43]:
# prof
pts_mdl = []
pts_frame = []
# for m in good_matches:
#     # TODO
len(pts_mdl), len(pts_frame)
Out[43]:
(0, 0)
In [44]:
# prof
pts_mdl = []
pts_frame = []
for m in good_matches:
    pts_mdl.append(model_kpts[m.trainIdx].pt)
    pts_frame.append(frame_kpts[m.queryIdx].pt)
len(pts_mdl), len(pts_frame)
Out[44]:
(202, 202)

As the RANSAC implementation in OpenCV requires float numbers, we will convert our coordinates.

In [45]:
# Run me
pts_mdl, pts_frame = np.float32(pts_mdl), np.float32(pts_frame)

We are now ready to estimate the homography using RANSAC.

work

Use cv2.findHomography() to estimate the homography from the model to the frame.

Tips:

  • The constant to use the RANSAC method is cv2.RANSAC.
  • 3 is a good value for the RANSAC reprojection error threshold, which rejects a point pair $i$ if
$$ \| \texttt{dstPoints} _i - \texttt{convertPointsHomogeneous} ( \texttt{H} * \texttt{srcPoints} _i) \|_2 > \texttt{ransacReprojThreshold}. $$
In [46]:
cv2.findHomography?
In [47]:
# TODO
# H, pts_inliers_mask = cv2.findHomography(...)
# H
In [48]:
# Note: there probably are keypoint duplicates (same coordinates) at different octaves
H, pts_inliers_mask = cv2.findHomography(pts_mdl, pts_frame, cv2.RANSAC, 3.0)
H
Out[48]:
array([[ 4.12308437e-01, -5.34564698e-02,  3.95275174e+02],
       [ 1.53889198e-02,  3.13182834e-01,  2.09917143e+02],
       [ 3.69411999e-05, -7.92524624e-05,  1.00000000e+00]])
In [49]:
# prof
# sanity check: we usually want at least 15 inliers for the homography to be trustworthy
np.count_nonzero(pts_inliers_mask)
Out[49]:
129
work

Filter the good matches to keep only the RANSAC inliers.

Tips:

  • pts_inliers_mask indicates which point pairs are inliers.
In [50]:
# TODO 
# matches_ransac_inliers = ...
# len(matches_ransac_inliers)
In [51]:
# prof
matches_ransac_inliers = [gm for gm, ok in zip(good_matches, pts_inliers_mask) if ok == 1]
len(matches_ransac_inliers)
Out[51]:
129
work

Draw those good inlier matches (frame → model) with ratio test and RANSAC in GREEN.

Expected result of draw_matches():

In [52]:
# TODO
# draw_matches(...)
In [53]:
# prof
draw_matches(model_img, model_kpts, 
             frame_img, frame_kpts, 
             matches_ransac_inliers, 
             color=(0,255,0), 
             title="Frame → Model (ratio test + RANSAC)")

6. Simple AR¶

Finally, we can project some image over the frame.

Model quadrilateral¶

work

Define an array of shape (1, 4, 2) and type np.float32 to represent the coordinates of the 4 corners of the model.

In [54]:
# TODO
# model_quad = np.float32([[[0, 0],
#                          ...]])
In [55]:
# prof
model_quad = np.float32([[[0, 0],
                         [model_img.shape[1]-1, 0],
                         [model_img.shape[1]-1, model_img.shape[0]-1],
                         [0, model_img.shape[0]-1]]])
model_quad
Out[55]:
array([[[   0.,    0.],
        [2339.,    0.],
        [2339., 1653.],
        [   0., 1653.]]], dtype=float32)

Frame quadrilateral¶

work

Now use cv2.perspectiveTransform() to compute the coordinates of the model corners in the frame coordinate system.

In [56]:
# TODO
# frame_quad = cv2.perspectiveTransform(...)
# frame_quad
In [57]:
# prof
frame_quad = cv2.perspectiveTransform(model_quad, H)
frame_quad
Out[57]:
array([[[ 395.27518,  209.91714],
        [1251.5259 ,  226.35364],
        [1330.6464 ,  799.2486 ],
        [ 353.1797 ,  837.29803]]], dtype=float32)

Draw the object outline¶

We can now draw the detected object over the frame.

Expected result:

In [58]:
dbg_img = frame_img.copy()
cv2.polylines(dbg_img, np.int32(frame_quad), True, (0, 255, 0), 10)
plt.imshow(bgr2rgb(dbg_img))
Out[58]:
<matplotlib.image.AxesImage at 0x7f7860b9de10>
In [79]:
# prof
# extra illustration for the lecture
draw_matches(model_img, model_kpts, 
             dbg_img, frame_kpts, 
             matches_ransac_inliers, 
             color=(0,255,0), 
             title="Frame → Model (ratio test + RANSAC)")

Project a modified model image onto the scene (the frame)¶

Let us use a very simple modified model image, to indicate we detected it:

In [59]:
model_img_modified = np.uint8(model_img * (1,1,0))
plt.imshow(bgr2rgb(model_img_modified))
Out[59]:
<matplotlib.image.AxesImage at 0x7f785c02e4e0>
work

Use cv2.warpPerspective() to project model_img_modified into the frame's coordinate system.

Tips:

  • Warning: the dsize takes a tuple(int, int) in the form (num_columns, num_rows), and not (rows, cols) as in the shape of a row-major NumPy array!

Expected output:

In [60]:
cv2.warpPerspective?
In [61]:
# TODO
# warped_img = cv2.warpPerspective(...)
# plt.imshow(bgr2rgb(warped_img))
In [62]:
# prof
warped_img = cv2.warpPerspective(model_img_modified, 
                                 H,
                                 (frame_img.shape[1], frame_img.shape[0])) # xy coord, not shape!!!
plt.imshow(bgr2rgb(warped_img))
Out[62]:
<matplotlib.image.AxesImage at 0x7f7854f854a8>

We need to use a mask to blend this warped image with the original frame.

work

Create a mask with np.zeros and fill the right region using cv2.fillPoly().

Expected output:

In [63]:
cv2.fillPoly?
In [64]:
# TODO
# warped_img_msk= np.zeros(...)
# warped_img_msk = cv2.fillPoly(...)
# plt.imshow(warped_img_msk)
In [65]:
# prof
warped_img_msk= np.zeros(frame_img.shape[:2], dtype=np.uint8)
warped_img_msk = cv2.fillPoly(warped_img_msk, np.int32(frame_quad), 255)
plt.imshow(warped_img_msk)
Out[65]:
<matplotlib.image.AxesImage at 0x7f7854f67358>
work

Finally, overlay the modified image over the frame.

Expected output:

In [66]:
# TODO
In [67]:
# prof
frame_ar = frame_img.copy()
frame_ar[warped_img_msk>0] = warped_img[warped_img_msk>0]
plt.imshow(bgr2rgb(frame_ar))
Out[67]:
<matplotlib.image.AxesImage at 0x7f7854ec0eb8>
In [ ]:
 

BONUS Mobile document scanner¶

Assuming you have the coordinates of the four corners of the document in the frame (they are in frame_quad.squeeze()) and that it is a landscape A4 page (its corners are in model_quad.squeeze()), create a dewarped (cropped, without perspective) document image.

Said differently: knowing the model shape and the coordinates of the object in the frame (input image), produce the corresponding cropped image (output image).

Hints:

  • use cv2.getPerspectiveTransform

Extra kudos:

  • adjust the size of the output image depending on the area it covers in the frame.
In [68]:
H_inv = cv2.getPerspectiveTransform(frame_quad.squeeze(), model_quad.squeeze())
H_inv
Out[68]:
array([[ 2.53803668e+00,  1.70295108e-01, -1.03897076e+03],
       [-5.87479569e-02,  3.06044481e+00, -6.19218226e+02],
       [-9.84140923e-05,  2.36256993e-04,  1.00000000e+00]])
In [69]:
# More fun: invert H (previously computed)
# NOTE: it may be more stable to recompute the homography, asking for the inverse direction
# NOTE2: usually we find the corners then we dewarp the image (no model, we just know the document aspect ratio),
#        so we use cv2.getPerspectiveTransform(from_4_points, to_4_points)

H_inv = np.linalg.inv(H)
H_inv /= H_inv[2,2]  # because we know H_inv[2,2] should be equal to 1
H_inv
Out[69]:
array([[ 2.53803661e+00,  1.70295061e-01, -1.03897072e+03],
       [-5.87479464e-02,  3.06044461e+00, -6.19218185e+02],
       [-9.84140372e-05,  2.36256867e-04,  1.00000000e+00]])
In [70]:
frame_dewarped = cv2.warpPerspective(frame_img, 
                                 H_inv,
                                 (model_img.shape[1], model_img.shape[0])) # xy coord, not shape!!!
plt.imshow(bgr2rgb(frame_dewarped))
Out[70]:
<matplotlib.image.AxesImage at 0x7f7854ea0dd8>

But this involves strong interpolation; we could keep a smaller image, based on the size of the region detected in the frame. We will compute an optimal size and adjust the homography.

In [71]:
# destination base size
dst_size = np.int32(np.ptp(frame_quad.squeeze(), axis=0))
dst_size
Out[71]:
array([977, 627], dtype=int32)
In [72]:
# model aspect ratio
model_ar = model_img.shape[1] / model_img.shape[0]
model_ar
Out[72]:
1.414752116082225
In [73]:
# take a mean surface with the same AR
dst_size[0] = (dst_size[0] + dst_size[1]*model_ar) // 2
dst_size[1] = dst_size[0] // model_ar
dst_size, dst_size[0] / dst_size[1]
Out[73]:
(array([932, 658], dtype=int32), 1.4164133738601823)

We need to introduce a scaling in H.

In [74]:
scale_factor = dst_size[1] / model_img.shape[0]  # yes, rowcol vs xy coordinates…
scale_factor
Out[74]:
0.3978234582829504
In [75]:
# For the lazy or when in doubt with homographies…
H_scaling = cv2.getPerspectiveTransform(
    np.float32([[0.,0.], [0.,1.], [1.,1.], [1.,0.]]), # base in frame's referential
    np.float32([[0.,0.], [0.,scale_factor], [scale_factor,scale_factor], [scale_factor,0]])) # in target's ref
H_scaling
Out[75]:
array([[0.39782345, 0.        , 0.        ],
       [0.        , 0.39782345, 0.        ],
       [0.        , 0.        , 1.        ]])
In [76]:
H_scaling.dot(H_inv)  # apply H_inv then H_scaling
Out[76]:
array([[ 1.00969049e+00,  6.77473691e-02, -4.13326918e+02],
       [-2.33713109e-02,  1.21751664e+00, -2.46339516e+02],
       [-9.84140372e-05,  2.36256867e-04,  1.00000000e+00]])
In [77]:
H_scaling @ H_inv  # @ is matrix multiplication
Out[77]:
array([[ 1.00969049e+00,  6.77473691e-02, -4.13326918e+02],
       [-2.33713109e-02,  1.21751664e+00, -2.46339516e+02],
       [-9.84140372e-05,  2.36256867e-04,  1.00000000e+00]])
In [78]:
frame_dewarped = cv2.warpPerspective(frame_img, 
                                 H_scaling @ H_inv,
                                 tuple(dst_size)) # xy coord, not shape!!!
plt.imshow(bgr2rgb(frame_dewarped))
Out[78]:
<matplotlib.image.AxesImage at 0x7f7854e11978>

And now we have an image whose size is close to the document's area in the frame.

Note that if the perspective distortion is very low, it is usually better to crop the image (without any perspective correction) to avoid introducing interpolation errors. This makes a noticeable difference when running an OCR engine on the resulting image.
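
In that low-perspective case, a plain axis-aligned crop of the detected quadrilateral could look like the following minimal sketch (it reuses frame_quad computed above):

# Sketch: crop the axis-aligned bounding box of the detected document, without any warping
x, y, w, h = cv2.boundingRect(np.int32(frame_quad.squeeze()))
frame_cropped = frame_img[y:y+h, x:x+w]
plt.imshow(bgr2rgb(frame_cropped))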