EPITA 2022 MLRF practice_03-01_ORB_AR v2023-05-30_183716 by Joseph CHAZALON
This work is licensed under a Creative Commons Attribution 4.0 International License.
We will demonstrate a simple technique, light enough to run on an old smartphone, which detects an instance of a known document in a video frame and overlays some dynamic content over this document in the frame.
We will use an excerpt of a dataset we created for a funny little app a few years ago, which allows children to point at a songbook page and play the associated song using a tablet. This is illustrated below.
This is much like marker-based Augmented Reality (AR), where the marker is a complex image.
This approach requires preparing a document model prior to matching documents within frames.
We will proceed in 5 steps:
1. load the model and frame images;
2. detect keypoints and compute ORB descriptors;
3. index and match the descriptors (with filtering);
4. estimate the homography between the model and the frame using RANSAC;
5. project some content over the frame.
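Before diving in, here is a rough, hedged sketch of how these steps could be chained for a single video frame once every piece below is implemented. The function and argument names (process_frame, overlay_bgr, …) are illustrative only, not part of the notebook resources or of OpenCV; in the real app this would run in a loop over the video frames.
# Hedged sketch: how the pieces of this notebook could be combined for one frame.
# All helper names below are illustrative; only the cv2/np calls are real.
import numpy as np
import cv2

def process_frame(frame_bgr, orb, matcher, model_kpts, model_shape, overlay_bgr,
                  ratio=0.75, min_inliers=15):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    kpts, descr = orb.detectAndCompute(gray, mask=None)         # keypoints + descriptors
    if descr is None:
        return frame_bgr
    knn = matcher.knnMatch(descr, k=2)                           # match against the indexed model
    good = [m[0] for m in knn
            if len(m) == 2 and m[0].distance < ratio * m[1].distance]
    if len(good) < min_inliers:
        return frame_bgr                                         # not enough evidence, skip overlay
    src = np.float32([model_kpts[m.trainIdx].pt for m in good])  # model coordinates
    dst = np.float32([kpts[m.queryIdx].pt for m in good])        # frame coordinates
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)   # model -> frame homography
    if H is None or np.count_nonzero(inliers) < min_inliers:
        return frame_bgr
    rows, cols = model_shape[:2]
    quad = cv2.perspectiveTransform(
        np.float32([[[0, 0], [cols - 1, 0], [cols - 1, rows - 1], [0, rows - 1]]]), H)
    warped = cv2.warpPerspective(overlay_bgr, H, (frame_bgr.shape[1], frame_bgr.shape[0]))
    mask = np.zeros(frame_bgr.shape[:2], np.uint8)
    cv2.fillPoly(mask, np.int32(quad), 255)
    out = frame_bgr.copy()
    out[mask > 0] = warped[mask > 0]                             # blend the warped overlay in
    return out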
The resources for this session are packaged directly within this notebook's archive; you can access them under the resources/ folder:
model.png: the model image we will use;
frame_0010.jpeg: a frame image extracted from a video.
# deactivate buggy jupyter completion
%config Completer.use_jedi = False
import numpy as np
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
import os
cv2.__version__
'4.0.0'
I tested this lab session using OpenCV 4.0.0. Beware of API breaks with version 5!
PATH_TO_RESOURCES = "./resources"
model_img = cv2.imread(os.path.join(PATH_TO_RESOURCES, "model.png"))
model_img.shape, model_img.dtype
((1654, 2340, 3), dtype('uint8'))
# to remain sane
def bgr2rgb(img):
return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(bgr2rgb(model_img), cmap='gray')
<matplotlib.image.AxesImage at 0x7f4a68fc0d30>
We need to convert it to grayscale to extract ORB keypoints from it.
model_img_gray = cv2.cvtColor(model_img, cv2.COLOR_BGR2GRAY)
plt.imshow(model_img_gray, cmap='gray')
<matplotlib.image.AxesImage at 0x7f4a68ee15f8>
frame_img = cv2.imread(os.path.join(PATH_TO_RESOURCES, "frame_0010.jpeg"))
frame_img.shape, frame_img.dtype
((1080, 1920, 3), dtype('uint8'))
plt.imshow(bgr2rgb(frame_img))
<matplotlib.image.AxesImage at 0x7f4a68e0cd68>
We also need to convert it to grayscale, for the same reason.
frame_img_gray = cv2.cvtColor(frame_img, cv2.COLOR_BGR2GRAY)
plt.imshow(frame_img_gray, cmap='gray')
<matplotlib.image.AxesImage at 0x7f4a68d74470>
First, we will detect and display some keypoints using the ORB method.
Complete the creation of the ORB object below, setting parameters appropriately.
Tips:
Check the documentation of the ORB object.
Create the ORB object using cv2.ORB.create(...).
Keep the default score type (scoreType=cv2.ORB_HARRIS_SCORE).
# Run me!
cv2.ORB.create?
# TODO create the ORB detector and descriptor
# orb = cv2.ORB.create(...) # FIXME
# prof
orb = cv2.ORB.create(nfeatures=2000,
scaleFactor=1.2,
nlevels=10,
edgeThreshold=5,
firstLevel=0,
WTA_K=2,
scoreType=cv2.ORB_HARRIS_SCORE,
patchSize=15)
Now you can detect keypoints from the model image.
Tips:
Use the orb.detect() method.
# TODO detect keypoints
# model_kpts = # FIXME
# len(model_kpts)
#prof
model_kpts = orb.detect(model_img_gray)
len(model_kpts)
2000
Display the keypoints using the function we provide below.
Expected result:
# because the function from OpenCV's python wrapper is buggy
def draw_keypoints(color_image, keypoints, color=(0,255,0)):
    '''
    Display keypoints in some color over an image.

    Parameters
    ----------
    color_image: ndarray, shape=(rows, cols, 3 channels)
        color image in BGR order
    keypoints: list of cv2.KeyPoint
        keypoints detected in the image
    color: tuple of uint8 (optional)
        color of the keypoints to draw, in BGR order
    '''
    if color_image.ndim != 3:
        raise ValueError(
            "draw_keypoints: parameter `color_image` must be a... (wait for it) color image!")
    draw = color_image.copy()
    for k in keypoints:
        angle_rad = np.deg2rad(k.angle)  # KeyPoint.angle is in degrees
        pt_x, pt_y = k.pt
        pt_int = int(pt_x), int(pt_y)
        size = k.size
        # a circle showing the keypoint scale, plus a segment showing its orientation
        cv2.circle(draw, pt_int, int(size), color)
        pt2 = int(pt_x + np.sin(angle_rad)*size), int(pt_y + np.cos(angle_rad)*size)
        cv2.line(draw, pt_int, pt2, color, thickness=2)
    plt.imshow(bgr2rgb(draw))
# TODO draw the keypoints detected in the model image
# draw_keypoints(...) # FIXME
# prof
draw_keypoints(model_img, model_kpts)
Compute the descriptors for each of the keypoints we previously detected.
Tips:
Use the ORB.compute() method.
# TODO compute the descriptors
# model_kpts, model_desc = ... # FIXME
# len(model_kpts), model_desc.shape
# prof
model_kpts, model_desc = orb.compute(model_img_gray, model_kpts)
len(model_kpts), model_desc.shape, model_desc.dtype
(2000, (2000, 32), dtype('uint8'))
What is the size (in bytes) of an ORB descriptor?
TODO your answer here
Storing an ORB descriptor takes ... bytes (without indexing overhead).
PROF
Storing an ORB descriptor takes 32 bytes (without indexing overhead).
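You can verify this directly on the computed descriptor matrix (a quick sanity check, not part of the original exercise):
# Each ORB descriptor is one row of the matrix: 32 uint8 values = 32 bytes = 256 bits.
model_desc.shape[1] * model_desc.dtype.itemsize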
Using the ORB.detectAndCompute() method, perform keypoint detection and description in a single step.
Tips:
Use mask=None.
Expected result of draw_keypoints():
# TODO detect keypoints and compute descriptors for the frame
# frame_kpts, frame_descr = # FIXME
# len(frame_kpts), frame_descr.shape
# prof
frame_kpts, frame_descr = orb.detectAndCompute(frame_img_gray, mask=None)
len(frame_kpts), frame_descr.shape
(2000, (2000, 32))
# Run me!
draw_keypoints(frame_img, frame_kpts)
What are the regions where keypoints are detected?
TODO your answer here
PROF
Keypoints are detected in textured areas. Uniform areas do not allow extracting any discriminative element.
A matcher object is used to compare two sets of descriptors.
The relevant OpenCV documentation is available at the DescriptorMatcher documentation page.
There are two matchers available in OpenCV: the brute-force (BF) matcher and the FLANN-based matcher.
In both cases, we need to specify the distance the matcher will use to compare descriptors. There are several built-in norms, including NORM_L1 and NORM_L2 (for float descriptors), NORM_HAMMING (for binary descriptors such as ORB), and NORM_HAMMING2 (like NORM_HAMMING, but in the calculation each two bits of the input sequence are added and treated as a single bit; only useful if you set the WTA_K parameter of ORB to something other than 2).
The BF (brute-force) matcher has only one parameter besides the distance function: crossCheck. It allows performing a symmetry test, i.e. keeping only descriptor pairs where each one is the closest to the other one in its set, or more formally:
$$
\{
(\hat{d_i},\hat{d_j}) \mid
\hat{d_j} = \underset{d_j \in D_2}{\mathrm{argmin}} \operatorname{dist}(\hat{d_i}, d_j)
\land
\hat{d_i} = \underset{d_i \in D_1}{\mathrm{argmin}} \operatorname{dist}(d_i, \hat{d_j})
\},
$$
otherwise, we get the following set, $\forall d_i \in D_1$:
$$
\{
(d_i,\hat{d_j}) \mid
\hat{d_j} = \underset{d_j \in D_2}{\mathrm{argmax}} \operatorname{score}(d_i, d_j)
\}.
$$
We recommend creating a BF matcher using cv2.BFMatcher_create(normType, crossCheck).
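To make the crossCheck behavior concrete, here is a small NumPy sketch of the symmetry test above on toy binary descriptors. This is only an illustration of the formula: the BF matcher performs this internally when crossCheck=True, and the helper names below are ours, not OpenCV's.
# Illustration of the symmetry (cross-check) test on toy uint8-packed descriptors.
import numpy as np

def hamming_dist_matrix(D1, D2):
    '''Pairwise Hamming distances between two sets of binary (uint8-packed) descriptors.'''
    x = np.bitwise_xor(D1[:, None, :], D2[None, :, :])   # XOR every pair of descriptors
    return np.unpackbits(x, axis=2).sum(axis=2)           # popcount over the bytes

def cross_check_matches(D1, D2):
    '''Keep only pairs (i, j) where d_i and d_j are mutual nearest neighbors.'''
    dist = hamming_dist_matrix(D1, D2)
    best_j = dist.argmin(axis=1)  # for each d_i in D1, closest d_j in D2
    best_i = dist.argmin(axis=0)  # for each d_j in D2, closest d_i in D1
    return [(i, j) for i, j in enumerate(best_j) if best_i[j] == i]

# Toy example with random 32-byte descriptors (same layout as ORB)
rng = np.random.default_rng(0)
D1 = rng.integers(0, 256, size=(5, 32), dtype=np.uint8)
D2 = rng.integers(0, 256, size=(7, 32), dtype=np.uint8)
cross_check_matches(D1, D2)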
FLANN stands for Fast Library for Approximate Nearest Neighbors.
The FLANN-based matcher is much more complex than the BF one, as it can use multiple indexing strategies (which may or may not be compatible with your descriptor type!) which have, in turn, parameters to be set.
This matcher may be faster than the brute-force matcher when matching against a large train collection.
Good but old documentation is available for the OpenCV 2.4 implementation.
OpenCV supports several indexing algorithms:
Linear: the index will perform a linear, brute-force search.
KD-Trees: the index constructed will consist of a set of randomized kd-trees which will be searched in parallel. Its trees parameter sets the number of parallel kd-trees to use; good values are in the range [1..16].
K-Means: the index constructed will be a hierarchical k-means tree, with parameters for K-Means initialization and computation.
Composite: the index created combines the randomized kd-trees and the hierarchical k-means tree.
KD-Tree: the index is constructed using a single KD-tree.
Hierarchical clustering: documentation is missing, but it seems to be a classical hierarchical clustering.
LSH (Locality Sensitive Hashing): the index created uses multi-probe LSH. Its parameters are table_number, the number of hash tables to use (between 10 and 30 usually); key_size, the size of the hash key in bits (between 10 and 20 usually); and multi_probe_level, the number of bits to shift to check for neighboring buckets (0 is regular LSH, 2 is recommended). Typical values for ORB descriptors are table_number = 6, key_size = 12 and multi_probe_level = 1.
and even more…
This matcher also has search parameters (like whether to sort the results), but there is very little reason to change the default values.
To create a FLANN-based matcher, we recommend to use the following technique:
# Create a dictionary for indexing parameters:
flann_index_params= dict(algorithm = 6, # LSH
table_number = 6, # LSH parameters
key_size = 12,
multi_probe_level = 1)
# Then create the matcher
matcher = cv2.FlannBasedMatcher(indexParams=flann_index_params)
Create a BF matcher and FLANN matcher.
Hint: keep in mind that your ORB descriptors will be binary.
# TODO
# matcher_BF = ...
# matcher_FLANN = ...
# prof
# BF
matcher_BF = cv2.BFMatcher_create(normType=cv2.NORM_HAMMING, crossCheck=True)
# FLANN
# Create a dictionary for indexing parameters:
flann_index_params = dict(algorithm=6, # LSH
table_number=6, # LSH parameters
key_size=12,
multi_probe_level=1)
# Then create the matcher
matcher_FLANN = cv2.FlannBasedMatcher(indexParams=flann_index_params)
While it is possible to directly call matcher.match(descriptors1, descriptors2), we usually index descriptors before matching them.
This is useful in real conditions for the case we are working on: we have to match each frame against every possible model (there were several songs available), so this allows us to index all model descriptors once and to identify which model each frame matches.
This is performed using matcher.add(list_of_list_of_descriptors), which adds sets of descriptors for several training (or "model") images.
The index then retains, for each single descriptor, the training image it belongs to and its position within that image's descriptor set.
We will therefore distinguish between train ("model") descriptors, which are indexed, and query ("frame") descriptors, which are matched against the index.
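As an illustration of multi-model indexing (hedged: the second descriptor set below is random fake data, since the resources only contain one model), the imgIdx attribute of the returned matches identifies which indexed model each match comes from:
# Hedged illustration: fake a second model's descriptors with random bytes,
# just to show the mechanics of indexing several models at once.
fake_second_model_desc = np.random.randint(0, 256, size=(500, 32), dtype=np.uint8)
multi_matcher = cv2.FlannBasedMatcher(indexParams=dict(algorithm=6,  # LSH
                                                       table_number=6,
                                                       key_size=12,
                                                       multi_probe_level=1))
multi_matcher.add([model_desc, fake_second_model_desc])  # one descriptor array per model image
demo_matches = multi_matcher.match(frame_descr)
sorted(set(m.imgIdx for m in demo_matches))  # 0 for the real model, possibly 1 for the fake one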
Index the descriptors of the model image.
Tips:
matcher.add()
.# TODO
# matcher_BF.add(...)
# matcher_FLANN.add(...)
# prof
matcher_BF.add([model_desc])
matcher_FLANN.add([model_desc])
We are now ready to match descriptors.
We suggest using a FLANN-based matcher, to be able to perform a ratio test.
Matching descriptors is performed using one of the following functions:
matcher.match(query_descriptors): actually is knnMatch with k=1.
matcher.knnMatch(query_descriptors, k=...): performs a k-nearest neighbor search for each query descriptor using the index. You will need k > 1 for the ratio test. Beware: k > 1 is not possible with the BF matcher when crossCheck is True!
matcher.radiusMatch(query_descriptors, maxDistance=...): performs a radius nearest neighbor search for each query descriptor, i.e. returns only results within the specified radius.

Compute the matches between the frame descriptors and the model descriptors using the FLANN matcher.
Tips:
Use the matcher.match() method to avoid having a list of tuples of 1 element as result.
# TODO compute the matches
# matches = ...
# len(matches)
# prof
matches = matcher_FLANN.match(frame_descr)
len(matches)
2000
Here is a simple way to display the matches using cv2.drawMatches().
We could keep only the closest matches, but we will keep this simple for now.
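If you did want to keep only the closest matches at this point, a simple approach (a sketch, with N chosen arbitrarily) is to sort them by DMatch.distance:
# Optional: keep only the N matches with the smallest descriptor distance (N is arbitrary here).
N_BEST = 50
best_matches = sorted(matches, key=lambda m: m.distance)[:N_BEST]
len(best_matches)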
# Run me
def draw_matches(img1, kpts1, img2, kpts2, matches, color=(0,0,255), title=""):
    '''img1 and img2 are color images.'''
    img_matches = np.empty((max(img1.shape[0], img2.shape[0]),
                            img1.shape[1] + img2.shape[1],
                            3),
                           dtype=np.uint8)
    img_matches = cv2.drawMatches(img1, kpts1, img2, kpts2,
                                  matches,
                                  img_matches,
                                  matchColor=color,
                                  flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
    plt.figure(figsize=(12,4))
    plt.imshow(bgr2rgb(img_matches))
    plt.title(title + " - %d matches" % (len(matches),))
Draw those first matches (frame → model) in RED.
Expected result of draw_matches():
# TODO draw the first matches
# draw_matches(...)
# prof
# query image (frame) and keypoints first, train image (model) second, to be consistent with DMatch indices
draw_matches(frame_img, frame_kpts,
             model_img, model_kpts,
             matches,
             color=(0,0,255),
             title="Frame → Model")
Let us now use the BF matcher to ask for a cross check.
Compute the matches using a symmetry test and display them in BLUE.
Expected result of draw_matches():
# TODO
# matches = ...
# draw_matches(...)
# prof
# bruteforce version with crosscheck: fewer matches!
matches = matcher_BF.match(frame_descr, model_desc)
draw_matches(frame_img, frame_kpts,
             model_img, model_kpts,
             matches,
             color=(255,0,0),
             title="Frame ⇔ Model")
Let's stop using the BF matcher now, and use the FLANN matcher for what remains.
Compute the matches using the FLANN-based matcher, asking for the 2 nearest neighbors.
Hint: matches will contain a list of pairs of matches, as opposed to single matches in the previous steps.
# TODO
# matches = ...
# prof
matches = matcher_FLANN.knnMatch(frame_descr, k=2)
len(matches), len(matches[0])
(2000, 2)
The result of a matches = matcher.match*(query_descriptors) line is a list of DMatch objects.
A DMatch object has the following attributes:
DMatch.distance: distance between descriptors (the lower, the better).
DMatch.trainIdx: index of the descriptor in the train descriptors.
DMatch.queryIdx: index of the descriptor in the query descriptors.
DMatch.imgIdx: index of the train image.
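For instance, you can peek at the attributes of the first (best, second best) pair returned by knnMatch (a quick sanity check, assuming two neighbors were returned for this query):
# Inspect the best and second-best candidates for the first frame descriptor
m1, m2 = matches[0]
(m1.queryIdx, m1.trainIdx, m1.imgIdx, m1.distance), m2.distance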
Filter the matches using a ratio test.
Tips:
Keep a match only if the distance of the best candidate is significantly smaller than the distance of the second best one (a classical ratio is 0.75).
# TODO filter matches
# good_matches = ...
# len(good_matches)
# prof
RATIO_TEST_VALUE = 0.75
# Keep a match only when the best candidate is clearly closer than the second best one.
# Note: newer versions of OpenCV may return fewer than 2 neighbors for some queries, hence the length check.
good_matches = [m[0] for m in matches
                if len(m) == 2 and m[0].distance < m[1].distance * RATIO_TEST_VALUE]
len(good_matches)
202
Draw those good matches (frame → model) with ratio test in CYAN.
Expected result of draw_matches():
# TODO
# draw_matches(...)
# prof
draw_matches(frame_img, frame_kpts,
             model_img, model_kpts,
             good_matches,
             color=(255,255,0),
             title="Frame → Model (ratio test)")
Compare the filtering of the symmetry test and the ratio test: which one rejects more matches?
TODO write your answer here
PROF
The ratio test filters out more matches. It is cheaper to compute and more reliable.
Recommended by David Lowe.
Finally, using the good matches we computed using the ratio test, we can estimate the perspective transform between the model and the frame (in this direction, because we will project a modified model image over the scene/frame).
First we need to build two corresponding lists of point coordinates, for the source and for the destination.
Extract the point coordinates of the good matches to build two lists of corresponding coordinates, in the model and in the frame referentials.
Tips:
The index of the keypoint corresponding to a match m is given by m.trainIdx or m.queryIdx.
The coordinates of a keypoint are available in kpts[INDEX].pt.
.# prof
pts_mdl = []
pts_frame = []
# for m in good_matches:
# # TODO
len(pts_mdl), len(pts_frame)
(0, 0)
# prof
pts_mdl = []
pts_frame = []
for m in good_matches:
    pts_mdl.append(model_kpts[m.trainIdx].pt)
    pts_frame.append(frame_kpts[m.queryIdx].pt)
len(pts_mdl), len(pts_frame)
(202, 202)
As the RANSAC implementation in OpenCV requires floating-point numbers, we will convert our coordinates.
# Run me
pts_mdl, pts_frame = np.float32(pts_mdl), np.float32(pts_frame)
We are now ready to estimate the homography using RANSAC.
Use cv2.findHomography()
to estimate the homography from the model to the frame.
Tips:
Use the cv2.RANSAC method.
3 is a good value for the RANSAC reprojection error threshold, which rejects a point pair if its reprojection error (in pixels) exceeds this value.
cv2.findHomography?
# TODO
# H, pts_inliers_mask = cv2.findHomography(...)
# H
# Note: there probably are keypoint duplicates (same coordinates) at different octaves
H, pts_inliers_mask = cv2.findHomography(pts_mdl, pts_frame, cv2.RANSAC, 3.0)
H
array([[ 4.12308437e-01, -5.34564698e-02, 3.95275174e+02], [ 1.53889198e-02, 3.13182834e-01, 2.09917143e+02], [ 3.69411999e-05, -7.92524624e-05, 1.00000000e+00]])
# prof
# sanity check: we usually want at least 15 inliers for the homography to be trustworthy
np.count_nonzero(pts_inliers_mask)
129
Filter the good matches to keep only the RANSAC inliers.
Tips:
pts_inliers_mask indicates which point pairs are inliers.
# TODO
# matches_ransac_inliers = ...
# len(matches_ransac_inliers)
# prof
matches_ransac_inliers = [gm for gm, ok in zip(good_matches, pts_inliers_mask) if ok == 1]
len(matches_ransac_inliers)
129
Draw those good inlier matches (frame → model) with ratio test and RANSAC in GREEN.
Expected result of draw_matches():
# TODO
# draw_matches(...)
# prof
draw_matches(frame_img, frame_kpts,
             model_img, model_kpts,
             matches_ransac_inliers,
             color=(0,255,0),
             title="Frame → Model (ratio test + RANSAC)")
Finally, we can project some image over the frame.
Define an array of shape (1, 4, 2) and type np.float32 to represent the coordinates of the 4 corners of the model.
# TODO
# model_quad = np.float32([[[0, 0],
# ...]])
# prof
model_quad = np.float32([[[0, 0],
[model_img.shape[1]-1, 0],
[model_img.shape[1]-1, model_img.shape[0]-1],
[0, model_img.shape[0]-1]]])
model_quad
array([[[ 0., 0.], [2339., 0.], [2339., 1653.], [ 0., 1653.]]], dtype=float32)
Now use cv2.perspectiveTransform()
to compute the coordinates of the model corners within the frame referential.
# TODO
# frame_quad = cv2.perspectiveTransform(...)
# frame_quad
# prof
frame_quad = cv2.perspectiveTransform(model_quad, H)
frame_quad
array([[[ 395.27518, 209.91714], [1251.5259 , 226.35364], [1330.6464 , 799.2486 ], [ 353.1797 , 837.29803]]], dtype=float32)
dbg_img = frame_img.copy()
cv2.polylines(dbg_img, np.int32(frame_quad), True, (0, 255, 0), 10)
plt.imshow(bgr2rgb(dbg_img))
<matplotlib.image.AxesImage at 0x7f7860b9de10>
# prof
# extra illustration for the lecture
draw_matches(dbg_img, frame_kpts,
             model_img, model_kpts,
             matches_ransac_inliers,
             color=(0,255,0),
             title="Frame → Model (ratio test + RANSAC)")
Let us use a very simple modified model image, to indicate we detected it:
model_img_modified = np.uint8(model_img * (1,1,0))
plt.imshow(bgr2rgb(model_img_modified))
<matplotlib.image.AxesImage at 0x7f785c02e4e0>
Use cv2.warpPerspective()
to project model_img_modified
onto the frame's referential.
Tips:
dsize takes a tuple (int, int) in the form (num_columns, num_rows), and not (rows, cols) as in the shape of a row-major NumPy array!
Expected output:
cv2.warpPerspective?
# TODO
# warped_img = cv2.warpPerspective(...)
# plt.imshow(bgr2rgb(warped_img))
# prof
warped_img = cv2.warpPerspective(model_img_modified,
H,
(frame_img.shape[1], frame_img.shape[0])) # xy coord, not shape!!!
plt.imshow(bgr2rgb(warped_img))
<matplotlib.image.AxesImage at 0x7f7854f854a8>
We need to use a mask to blend this warped image with the original frame.
Create a mask with np.zeros and fill the right region using cv2.fillPoly().
Expected output:
cv2.fillPoly?
# TODO
# warped_img_msk= np.zeros(...)
# warped_img_msk = cv2.fillPoly(...)
# plt.imshow(warped_img_msk)
# prof
warped_img_msk= np.zeros(frame_img.shape[:2], dtype=np.uint8)
warped_img_msk = cv2.fillPoly(warped_img_msk, np.int32(frame_quad), 255)
plt.imshow(warped_img_msk)
<matplotlib.image.AxesImage at 0x7f7854f67358>
Finally, overlay the modified image over the frame.
Expected output:
# TODO
# prof
frame_ar = frame_img.copy()
frame_ar[warped_img_msk>0] = warped_img[warped_img_msk>0]
plt.imshow(bgr2rgb(frame_ar))
<matplotlib.image.AxesImage at 0x7f7854ec0eb8>
Assuming you have the four coordinates of the corners of the document in the frame (they are in frame_quad.squeeze()), and that it is a landscape A4 page (you have its corners in model_quad.squeeze()), create a dewarped (cropped, without perspective) document image.
Said differently: knowing the model shape and the coordinates of the object in the frame,
produce the following cropped image:
Hints:
Use cv2.getPerspectiveTransform.
Extra kudos: try to obtain the same result by inverting the previously computed H.
H_inv = cv2.getPerspectiveTransform(frame_quad.squeeze(), model_quad.squeeze())
H_inv
array([[ 2.53803668e+00, 1.70295108e-01, -1.03897076e+03], [-5.87479569e-02, 3.06044481e+00, -6.19218226e+02], [-9.84140923e-05, 2.36256993e-04, 1.00000000e+00]])
# More fun: invert H (previously computed)
# NOTE: it may be more stable to recompute the homography, asking for the inverse direction
# NOTE2: usually we find the corners then we dewarp the image (no model, we just know the document aspect ratio),
# so we use cv2.getPerspectiveTransform(from_4_points, to_4_points)
H_inv = np.linalg.inv(H)
H_inv /= H_inv[2,2] # because we know H_inv[2,2] should be equal to 1
H_inv
array([[ 2.53803661e+00, 1.70295061e-01, -1.03897072e+03], [-5.87479464e-02, 3.06044461e+00, -6.19218185e+02], [-9.84140372e-05, 2.36256867e-04, 1.00000000e+00]])
frame_dewarped = cv2.warpPerspective(frame_img,
H_inv,
(model_img.shape[1], model_img.shape[0])) # xy coord, not shape!!!
plt.imshow(bgr2rgb(frame_dewarped))
<matplotlib.image.AxesImage at 0x7f7854ea0dd8>
But this introduces a lot of interpolation (the detected region is upsampled to the full model size); we could keep a smaller image, using the shape of the region detected in the frame. We will compute an optimal surface and adjust the homography.
# destination base size
dst_size = np.int32(np.ptp(frame_quad.squeeze(), axis=0))
dst_size
array([977, 627], dtype=int32)
# model aspect ratio
model_ar = model_img.shape[1] / model_img.shape[0]
model_ar
1.414752116082225
# take a mean surface with the same AR
dst_size[0] = (dst_size[0] + dst_size[1]*model_ar) // 2
dst_size[1] = dst_size[0] // model_ar
dst_size, dst_size[0] / dst_size[1]
(array([932, 658], dtype=int32), 1.4164133738601823)
We need to introduce a scaling in H.
scale_factor = dst_size[1] / model_img.shape[0] # yes, rowcol vs xy coordinates…
scale_factor
0.3978234582829504
# For the lazy or when in doubt with homographies…
H_scaling = cv2.getPerspectiveTransform(
np.float32([[0.,0.], [0.,1.], [1.,1.], [1.,0.]]), # base in frame's referential
np.float32([[0.,0.], [0.,scale_factor], [scale_factor,scale_factor], [scale_factor,0]])) # in target's ref
H_scaling
array([[0.39782345, 0. , 0. ], [0. , 0.39782345, 0. ], [0. , 0. , 1. ]])
H_scaling.dot(H_inv) # apply H_inv then H_scaling
array([[ 1.00969049e+00, 6.77473691e-02, -4.13326918e+02], [-2.33713109e-02, 1.21751664e+00, -2.46339516e+02], [-9.84140372e-05, 2.36256867e-04, 1.00000000e+00]])
H_scaling @ H_inv # @ is matrix multiplication
array([[ 1.00969049e+00, 6.77473691e-02, -4.13326918e+02], [-2.33713109e-02, 1.21751664e+00, -2.46339516e+02], [-9.84140372e-05, 2.36256867e-04, 1.00000000e+00]])
frame_dewarped = cv2.warpPerspective(frame_img,
H_scaling @ H_inv,
tuple(dst_size)) # xy coord, not shape!!!
plt.imshow(bgr2rgb(frame_dewarped))
<matplotlib.image.AxesImage at 0x7f7854e11978>
And now we have an image whose size is very close to that of the region detected in the frame.
Note that if the perspective distortion is very low, it is usually better to crop the image (without any perspective correction) to avoid introducing interpolation errors. This makes a noticeable difference when running an OCR on the resulting image.
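For reference, here is a minimal sketch of such a plain crop, using the axis-aligned bounding box of the detected corners (no resampling involved):
# Plain crop, no perspective correction: just the bounding box of the detected quadrilateral.
x, y, w, h = cv2.boundingRect(np.int32(frame_quad.squeeze()))
frame_cropped = frame_img[y:y+h, x:x+w]
plt.imshow(bgr2rgb(frame_cropped))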