EPITA 2021 MLRF practice_03-01_ORB_AR v2021-06-02_172455 by Joseph CHAZALON

Creative Commons License This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

Practice 03 part 01: Augmented Reality using ORB matching

We will demonstrate a simple technique, light enough to run on an old smartphone, which detects an instance of a known document in a video frame and overlays some dynamic content over this document in the frame.

We will use an excerpt of a dataset we created for a fun little app a few years ago, which lets children point a tablet at a songbook page to play the associated song. This is illustrated below.

AR example

This is much like marker-based Augmented Reality (AR), where the marker is a complex image.

This approach requires preparing a document model prior to matching documents within frames.

We will proceed in 6 steps:

  1. detect keypoints from the model and display them;
  2. compute the descriptors for the model image;
  3. detect the keypoints for a frame image, compute the descriptors and display them;
  4. create matchers and index descriptors;
  5. estimate the homography from the model to the frame;
  6. project a modified image over the document in the frame.

The resources for this session are packaged directly within this notebook's archive: you can access them under the `resources/` folder.

0. Import module, load resources

Here is our model image

We need to convert it to grayscale to extract ORB keypoints from it.

Here is the frame we will process

We also need to convert it to grayscale, for the same reason.
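For reference, here is a minimal sketch of the loading and conversion steps (the filenames under `resources/` are placeholders, not necessarily the ones shipped with the notebook):

```python
import cv2
import numpy as np

# Load the model and frame images (BGR color); filenames are hypothetical.
model_img = cv2.imread("resources/model.jpg")
frame_img = cv2.imread("resources/frame.jpg")

# ORB works on single-channel images, so convert both to grayscale.
model_gray = cv2.cvtColor(model_img, cv2.COLOR_BGR2GRAY)
frame_gray = cv2.cvtColor(frame_img, cv2.COLOR_BGR2GRAY)
```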

1. Detect and draw keypoints (model image)

First, we will detect and display some keypoints using the ORB method.

work **Complete the creation of the ORB object below, setting parameters appropriately.**

*Tips:*
- ORB keypoint detection and description are performed with the same `ORB` object.
- Create this `ORB` object using `cv2.ORB.create(...)`.
- You need to select appropriate parameters: the default ones may not give the best possible results. In particular, we need:
  - a few thousand features;
  - several pyramid levels (as the object may appear several times smaller in the frame than in the original model);
  - the Harris score, to stabilize the results (`scoreType=cv2.ORB_HARRIS_SCORE`).
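A possible parameterization, as a hedged sketch (the exact values are a suggestion, not the official solution):

```python
# Roughly a few thousand features, several pyramid levels, Harris score.
orb = cv2.ORB.create(nfeatures=2000,
                     nlevels=8,
                     scaleFactor=1.2,
                     scoreType=cv2.ORB_HARRIS_SCORE)
```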
work **Now you can detect keypoints from the model image.**

*Tips:*
- Use the `orb.detect()` method.
- Use the graylevel image.
work **Display the keypoints using the function we provide below.**
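A sketch of the detection and display steps; the notebook provides its own display helper, so `cv2.drawKeypoints()` is used here only as a stand-in:

```python
# Detect keypoints on the grayscale model image (no mask).
kpts_model = orb.detect(model_gray, None)

# Stand-in display: draw rich keypoints (position, scale, orientation).
vis = cv2.drawKeypoints(model_img, kpts_model, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
```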

Expected result:

2. Compute descriptors (model image)

work **Compute the descriptors for each of the keypoints we previously detected.**

*Tips:*
- Use the `ORB.compute()` method.
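A minimal sketch, reusing the keypoints detected above:

```python
# compute() may refine/filter the keypoints, so keep the returned list too.
kpts_model, desc_model = orb.compute(model_gray, kpts_model)
print(desc_model.shape, desc_model.dtype)  # expected: (n_keypoints, 32), uint8
```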
work **What is the size (in bytes) of an ORB descriptor?**

TODO your answer here

Storing an ORB descriptor takes ... bytes (without indexing overhead).

PROF

Storing an ORB descriptor takes 32 bytes (without indexing overhead).

2. Detect and compute keypoints from the frame

work **Using the `ORB.detectAndCompute()` method, perform keypoint detection and description in a single step.**

*Tips:*
- This function requires a mask argument but we do not need it: set `mask=None`.
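A minimal sketch for the frame image:

```python
# Detection + description in one call; no mask is needed here.
kpts_frame, desc_frame = orb.detectAndCompute(frame_gray, None)
```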

Expected result of draw_keypoints():

work **What are the regions where keypoints are detected?**

TODO your answer here

PROF

Keypoints are detected in textured areas. Uniform areas do not allow the extraction of any discriminative element.

3. Create a matcher and index model descriptors

A matcher object is used to compare two sets of descriptors.

The relevant OpenCV documentation is available at the DescriptorMatcher documentation page.

Overview

There are two matchers available in OpenCV: the brute-force matcher (`cv2.BFMatcher`) and the FLANN-based matcher (`cv2.FlannBasedMatcher`).

In both cases, we need to specify the distance the matcher will use to compare descriptors. There are several built-in norms, e.g. `cv2.NORM_L2` for floating-point descriptors and `cv2.NORM_HAMMING` for binary descriptors such as ORB's.

Brute force (BF) matcher

It has only one parameter besides the distance function: `crossCheck`. It enables a symmetry test, i.e. it keeps only the descriptor pairs in which each descriptor is the closest to the other one within its own set, or more formally:

$$ \{ (\hat{d_i},\hat{d_j}) \mid \hat{d_j} = \underset{d_j \in D_2}{\mathrm{argmin}} \operatorname{dist}(\hat{d_i}, d_j) \land \hat{d_i} = \underset{d_i \in D_1}{\mathrm{argmin}} \operatorname{dist}(d_i, \hat{d_j}) \}; $$

otherwise, we get the following set, $\forall d_i \in D_1$:

$$ \{ (d_i,\hat{d_j}) \mid \hat{d_j} = \underset{d_j \in D_2}{\mathrm{argmin}} \operatorname{dist}(d_i, d_j) \}. $$

We recommend creating a BF matcher using `cv2.BFMatcher_create(normType, crossCheck)`.

FLANN-based matcher

FLANN stands for Fast Library for Approximate Nearest Neighbors.

The FLANN-based matcher is much more complex than the BF one, as it can use multiple indexing strategies (which may or may not be compatible with your descriptor type!) which have, in turn, parameters to be set.

This matcher may be faster than the brute-force one when matching against a large train collection.

Good (but old) documentation is available for the OpenCV 2.4 implementation.

OpenCV supports several FLANN indexing algorithms, e.g. linear scan, randomized KD-trees, hierarchical k-means, composite indexes, and LSH (the right choice for binary descriptors like ORB's).

This matcher also has search parameters (like whether to sort the results), but there is very little reason to change the default values.

To create a FLANN-based matcher, we recommend the following technique:

```python
# Create a dictionary for indexing parameters:
flann_index_params = dict(algorithm=6,  # 6 == FLANN_INDEX_LSH
                          table_number=6,  # LSH parameters
                          key_size=12,
                          multi_probe_level=1)
# Then create the matcher
matcher = cv2.FlannBasedMatcher(indexParams=flann_index_params)
```
work **Create a BF matcher and FLANN matcher.**

Hint: keep in mind that your ORB descriptors will be binary.
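A hedged sketch of both constructions for binary (ORB) descriptors, hence the Hamming norm and the LSH index:

```python
# Brute-force matcher with Hamming distance and the symmetry test enabled.
bf_matcher = cv2.BFMatcher_create(normType=cv2.NORM_HAMMING, crossCheck=True)

# FLANN-based matcher with an LSH index (suitable for binary descriptors).
flann_index_params = dict(algorithm=6,  # FLANN_INDEX_LSH
                          table_number=6,
                          key_size=12,
                          multi_probe_level=1)
flann_matcher = cv2.FlannBasedMatcher(indexParams=flann_index_params)
```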

Indexation

While it is possible to directly call `matcher.match(descriptors1, descriptors2)`, we usually index descriptors before matching them.

This is useful in real conditions for the case we are working on: we have to match each frame against every possible model (there were several songs available), so this allows us to:

  1. perform indexing only once;
  2. handle multiple models and therefore perform object detection (however the pipeline is a bit more complex).

This is performed using the `matcher.add(list_of_list_of_descriptors)` method, which adds sets of descriptors for several training (or "model") images.

The index then retains, for each single descriptor, which training image it comes from and its position within that image's descriptor set.

We will therefore distinguish between the *query* descriptors (computed on the frame we want to analyze) and the *train* descriptors (computed on the model image(s) and added to the matcher's index).

work **Index the descriptors of the model image.**

*Tips:*
- Use `matcher.add()`.
- Do it for both matchers (we will compare them).
- **`add()` takes a list of lists of descriptors!**
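A minimal sketch of the indexing step (variable names such as `desc_model` are placeholders from the earlier sketches):

```python
# add() expects a list of descriptor arrays: one array per training image.
bf_matcher.add([desc_model])
flann_matcher.add([desc_model])
flann_matcher.train()  # optional here: builds the FLANN index up front
```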

4. Match descriptors and estimate the homography

We are now ready to match descriptors.

We suggest using a FLANN-based matcher so as to be able to perform a ratio test later.

Matching descriptors

Matching descriptors is performed using one of the following methods: `match()` (returns the single best match for each query descriptor), `knnMatch()` (returns the k best matches), or `radiusMatch()` (returns all matches closer than a given distance threshold).

4.1 Simple match

work **Compute the matches between the frame descriptors and the model descriptors using the FLANN matcher.**

*Tips:*
- Use the `matcher.match()` method to avoid getting a list of one-element tuples as a result.
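A minimal sketch, assuming the model descriptors were indexed with `add()` as above so only the query (frame) descriptors are passed:

```python
# One best match per frame (query) descriptor, against the indexed model.
matches = flann_matcher.match(desc_frame)
print(len(matches))
```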

Display the matches

Here is a simple way to display the matches using `cv2.drawMatches()`. We could keep only the closest matches, but we will keep this simple for now.

work **Draw those first matches (frame → model) in RED.**

Expected result of draw_matches():

4.2 Symmetry test

Let us now use the BF matcher to ask for a cross check.

work **Compute the matches using a symmetry test and display them in BLUE.**
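A sketch using the brute-force matcher created earlier with `crossCheck=True`, so the symmetry test is applied automatically:

```python
# Only matches that are mutual nearest neighbors are returned.
matches_sym = bf_matcher.match(desc_frame)
```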

Expected result of draw_matches():

4.3 Ratio test

Let's stop using the BF matcher now, and use the FLANN matcher for what remains.

work **Compute the matches using the FLANN-based matcher, asking for the 2 nearest neighbors.**

Hint: matches will contain a list of pairs of matches, as opposed to single matches in previous steps.
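A minimal sketch:

```python
# For each frame descriptor, retrieve its 2 nearest model descriptors.
knn_matches = flann_matcher.knnMatch(desc_frame, k=2)
```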

Match results

The result of a `matches = matcher.match*(query_descriptors)` call (where `match*` stands for `match()`, `knnMatch()` or `radiusMatch()`) is a list of `DMatch` objects. A `DMatch` object has the following attributes: `queryIdx` (index of the query descriptor/keypoint), `trainIdx` (index of the train descriptor/keypoint), `imgIdx` (index of the train image), and `distance` (distance between the two descriptors).

work **Filter the matches using a ratio test.**

*Tips:*
- For each pair of matches $(\texttt{m}, \texttt{n})$ returned by `knnMatch()`, where $\texttt{m}$ is the best match and $\texttt{n}$ the second best, keep $\texttt{m}$ only if $\texttt{m.distance} < T \cdot \texttt{n.distance}$, where $T$ is the ratio test threshold (typically between 0.7 and 0.8).
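A hedged sketch of the filtering (with LSH indexing, some query descriptors may get fewer than 2 neighbors, hence the length check; the threshold value is illustrative):

```python
RATIO_THRESHOLD = 0.75  # illustrative value

good_matches = []
for pair in knn_matches:
    if len(pair) < 2:
        continue  # not enough neighbors to apply the test
    m, n = pair
    if m.distance < RATIO_THRESHOLD * n.distance:
        good_matches.append(m)
```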
work **Draw those good matches (frame → model) with ratio test in CYAN.**

Expected result of draw_matches():

work **Compare the filtering of the symmetry test and the ratio test: which one rejects more matches?**

TODO write your answer here

PROF

The ratio test filters out more matches. It is cheaper to compute and more reliable.

Recommended by David Lowe.

4.4 Geometric validation

Finally, using the good matches obtained with the ratio test, we can estimate the perspective transform from the model to the frame (in this direction, because we will project a modified model image over the scene/frame).

First we need to build two corresponding lists of point coordinates, for the source and for the destination.

work **Extract the point coordinates of the good matches to build two lists of corresponding coordinates in the model and frame referentials.**

*Tips:*
- Recover the index of the keypoints using either `m.trainIdx` or `m.queryIdx`.
- Extract the point coordinates from each keypoint using `kpts[INDEX].pt`.
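A sketch of the extraction, assuming the frame descriptors were used as the query and the model descriptors were indexed (so `queryIdx` points into the frame keypoints and `trainIdx` into the model keypoints):

```python
# Model (source) and frame (destination) coordinates, one row per good match.
src_pts = np.float32([kpts_model[m.trainIdx].pt for m in good_matches])
dst_pts = np.float32([kpts_frame[m.queryIdx].pt for m in good_matches])
```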

As the RANSAC implementation in OpenCV requires float numbers, we will convert our coordinates.

We are now ready to estimate the homography using RANSAC.

work **Use `cv2.findHomography()` to estimate the homography from the model to the frame.**

*Tips:*
- The constant to use the RANSAC method is `cv2.RANSAC`.
- `3` is a good value for the RANSAC reprojection error threshold, which rejects a point pair $i$ if $$ \| \texttt{dstPoints}_i - \texttt{convertPointsHomogeneous}(\texttt{H} \cdot \texttt{srcPoints}_i) \|_2 > \texttt{ransacReprojThreshold}. $$
work **Filter the good matches to keep only the RANSAC inliers.**

*Tips:*
- `pts_inliers_mask` indicates which point pairs are inliers.
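A sketch of both steps (homography estimation, then inlier filtering):

```python
# Estimate the model -> frame homography with RANSAC (3 px reprojection threshold).
H, pts_inliers_mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 3)

# Keep only the matches RANSAC considered as inliers.
inlier_matches = [m for m, keep in zip(good_matches, pts_inliers_mask.ravel())
                  if keep]
```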
work **Draw those good inlier matches (frame → model) with ratio test and RANSAC in GREEN.**

Expected result of draw_matches():

5. Simple AR

Finally, we can project some image over the frame.

Model quadrilateral

work **Define an array of shape `(1, 4, 2)` and type `np.float32` to represent the coordinates of the 4 corners of the model.**

Frame quadrilateral

work **Now use `cv2.perspectiveTransform()` to compute the coordinates of the model corners within the frame referential.**
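A sketch covering both the model quadrilateral and its projection into the frame (the corner order is illustrative):

```python
h, w = model_gray.shape  # model size in pixels

# Corners of the model image, shape (1, 4, 2), float32.
model_quad = np.float32([[[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]]])

# Project the corners into the frame referential using the homography H.
frame_quad = cv2.perspectiveTransform(model_quad, H)
```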

Draw the object outline

We can now draw the detected object over the frame.

Project a modified model image on the scene (the frame)

Let us use a very simple modified model image, to indicate we detected it:

work **Use `cv2.warpPerspective()` to project `model_img_modified` onto the frame's referential.**

*Tips:*
- **Warning:** the `dsize` parameter takes a `tuple(int, int)` in the form `(num_columns, num_rows)`, and **not** `(rows, cols)` as in the shape of a row-major NumPy array!
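A minimal sketch; note the `(width, height)` order of `dsize`:

```python
frame_h, frame_w = frame_img.shape[:2]

# Warp the modified model image into the frame referential.
warped = cv2.warpPerspective(model_img_modified, H, dsize=(frame_w, frame_h))
```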

Expected output:

We need to use a mask to blend this warped image with the original frame.

work **Create a mask with `np.zeros` and fill the right region using `cv2.fillPoly()`.**

Expected output:

work **Finally, overlay the modified image over the frame.**
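A hedged sketch covering the two previous steps (mask creation, then blending); there are several valid ways to blend, this one uses `np.where`:

```python
# Binary mask of the detected document region in the frame.
mask = np.zeros((frame_h, frame_w), dtype=np.uint8)
cv2.fillPoly(mask, [np.int32(frame_quad.reshape(-1, 2))], 255)

# Keep the warped pixels inside the mask, the original frame elsewhere.
result = np.where(mask[:, :, None] > 0, warped, frame_img)
```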

Expected output:

BONUS Mobile document scanner

Assume you have the coordinates of the four corners of the document in the frame (they are in `frame_quad.squeeze()`) and that it is a landscape A4 page (its corners are in `model_quad.squeeze()`): create a dewarped (cropped, perspective-free) document image.

Said differently: knowing the model shape, from the coordinates of the object in the frame (input image), produce the cropped image shown below (output image).

Hints:

Extra kudos:

But this involves a lot of interpolation; we could keep a smaller image, based on the size of the region detected in the frame. We will compute an optimal surface and adjust the homography accordingly.

We need to introduce a scaling in H.

And now we have an image very close to the frame area.

Note that if the perspective distortion is very small, it is usually better to crop the image (without any perspective correction) to avoid introducing interpolation errors. This makes a noticeable difference when running an OCR engine on the resulting image.