EPITA 2021 MLRF practice_01-04_twinit-part1 v2021-05-17_160644 by Joseph CHAZALON
Make sure you read and understand everything, and complete all the required actions.
Required actions are preceded by the following sign:
Perform a couple of checks…
# deactivate buggy jupyter completion
%config Completer.use_jedi = False
Import all the modules we are going to use.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import cv2
import skimage
import skimage.io  # skimage submodules must be imported explicitly
Twin it! is a poster game with many "bubbles". They are all different, except for a few pairs. The goal is to find the pairs.
All artwork is copyrighted by the original author, Thomas Vuarchex.
We won't tell you, at first, how many bubbles there are nor how many matching pairs there are.
Here is a downsampled (compressed — don't use it) version of the original poster.
To get started, we will use the simplest available solution:
But first, you need the original image. It is available in the twin_it.tar.gz file, at the following path: "twin_it/twin_it_200dpi.png". We also provided a downsampled version: "twin_it/twin_it_50dpi.png".
You may want to resize your base image using the cv2.resize function or the skimage.transform.resize function.
# TODO read and display image
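For instance, here is a minimal downsampling sketch (assuming full_img holds the loaded poster; the 0.5 scale factor is an arbitrary choice, not part of the exercise):
# OpenCV: fx/fy are scale factors; INTER_AREA is well suited to shrinking
small_cv = cv2.resize(full_img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
# scikit-image: note it returns a float image in [0, 1] by default
from skimage.transform import resize as sk_resize
small_sk = sk_resize(full_img, (full_img.shape[0] // 2, full_img.shape[1] // 2), anti_aliasing=True)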
# prof
full_img = skimage.io.imread("res/twin_it/twin_it_50dpi.png")
plt.figure(figsize=(15, 15))
plt.imshow(full_img)
plt.show()
full_img.shape, full_img.dtype
((2068, 1442, 3), dtype('uint8'))
You can manually select a template using NumPy slicing, as you already know.
# TODO select a pattern to use as template and display it
# prof
template = full_img[255:290,730:805,:]
plt.figure()
plt.imshow(template)
plt.show()
template.shape
(35, 75, 3)
We will look for areas in the image which look like the current template. As we previously said, we will use a very basic technique which is based on a simple difference computation:
We will use OpenCV's matchTemplate() function because it provides more variants than scikit-image's match_template() equivalent (although the latter has better padding management).
You may want to have a look at these two tutorials:
The available methods for OpenCV's matchTemplate()
are ($R$ is the result image, $I$ is the base image, $T$ is the template, and $(r,c)$ are coordinates):
cv2.TM_SQDIFF: the sum of the squared differences between pixel values
$$R(r,c) = \sum\limits_{r',c'}(T(r',c') - I(r+r', c+c'))^2$$
cv2.TM_SQDIFF_NORMED: the normalized sum of the squared differences between pixel values
$$R(r,c) = \frac{\sum\limits_{r',c'}(T(r',c') - I(r+r', c+c'))^2}{\sqrt{\sum\limits_{r',c'}T(r',c')^2 \cdot \sum\limits_{r',c'}I(r+r', c+c')^2}}$$
cv2.TM_CCORR: the cross-correlation between pixel values
$$R(r,c) = \sum\limits_{r',c'}(T(r',c') \cdot I(r+r', c+c'))$$
cv2.TM_CCORR_NORMED: the normalized cross-correlation between pixel values
$$R(r,c) = \frac{\sum\limits_{r',c'}(T(r',c') \cdot I(r+r', c+c'))}{\sqrt{\sum\limits_{r',c'}T(r',c')^2 \cdot \sum\limits_{r',c'}I(r+r', c+c')^2}}$$
cv2.TM_CCOEFF: the correlation coefficient between pixel values
$$R(r,c) = \sum\limits_{r',c'}(T'(r',c') \cdot I'(r+r', c+c'))$$
where
$$\begin{array}{l} T'(r',c') = T(r',c') - \frac{1}{w \cdot h}\sum\limits_{r'',c''}T(r'',c'') \\ I'(r+r',c+c') = I(r+r',c+c') - \frac{1}{w \cdot h}\sum\limits_{r'',c''}I(r+r'', c+c'') \end{array}$$
cv2.TM_CCOEFF_NORMED: the normalized correlation coefficient between pixel values
$$R(r,c) = \frac{\sum\limits_{r',c'}(T'(r',c') \cdot I'(r+r', c+c'))}{\sqrt{\sum\limits_{r',c'}T'(r',c')^2 \cdot \sum\limits_{r',c'}I'(r+r', c+c')^2}}$$
We will briefly explain those equations during the session.
For now, you just have to note two points.
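To make the first equation concrete, here is a minimal brute-force NumPy sketch (toy arrays, illustrative only) that reproduces the TM_SQDIFF map and checks it against OpenCV:
# Brute-force TM_SQDIFF on tiny random arrays, mirroring the equation above
I = np.random.randint(0, 255, (10, 12)).astype(np.float32)  # base image
T = np.random.randint(0, 255, (3, 4)).astype(np.float32)    # template
R = np.zeros((I.shape[0] - T.shape[0] + 1, I.shape[1] - T.shape[1] + 1), np.float32)
for r in range(R.shape[0]):
    for c in range(R.shape[1]):
        diff = T - I[r:r+T.shape[0], c:c+T.shape[1]]
        R[r, c] = np.sum(diff ** 2)
R_cv = cv2.matchTemplate(I, T, cv2.TM_SQDIFF)
print(np.allclose(R, R_cv))  # should print True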
Let us now compare these methods.
# TODO compute and display the matching "maps"
tm_methods = [
(cv2.TM_SQDIFF, "SQDIFF"),
(cv2.TM_SQDIFF_NORMED, "SQDIFF_NORMED"),
(cv2.TM_CCORR, "CCORR"),
(cv2.TM_CCORR_NORMED, "CCORR_NORMED"),
(cv2.TM_CCOEFF, "CCOEFF"),
(cv2.TM_CCOEFF_NORMED, "CCOEFF_NORMED")]
plt.figure(figsize=(15,15))
plt.subplot(1, 1+len(tm_methods), 1)
plt.imshow(full_img)
plt.axis("off")
plt.title("Base image")
for ii, (method, name) in enumerate(tm_methods):
res = full_img # FIXME use cv2.matchTemplate
plt.subplot(1, 1+len(tm_methods), 2+ii)
plt.imshow(res)
plt.axis("off")
plt.title(name)
plt.show()
#prof
tm_methods = [
(cv2.TM_SQDIFF, "SQDIFF"),
(cv2.TM_SQDIFF_NORMED, "SQDIFF_NORMED"),
(cv2.TM_CCORR, "CCORR"),
(cv2.TM_CCORR_NORMED, "CCORR_NORMED"),
(cv2.TM_CCOEFF, "CCOEFF"),
(cv2.TM_CCOEFF_NORMED, "CCOEFF_NORMED")]
plt.figure(figsize=(15,15))
plt.subplot(1, 1+len(tm_methods), 1)
plt.imshow(full_img)
plt.axis("off")
plt.title("Base image")
for ii, (method, name) in enumerate(tm_methods):
res = cv2.matchTemplate(full_img, template, method)
plt.subplot(1, 1+len(tm_methods), 2+ii)
plt.imshow(res)
plt.axis("off")
plt.title(name)
plt.show()
We now need to locate the area with the best match. We can use two methods to get the coordinates of such a pixel:
minMaxLoc(), which returns minVal, maxVal, minLoc, maxLoc;
argmax or argmin (depending on the method we use), combined with unravel_index, which converts the flat index returned by argmax/argmin into the appropriate coordinates.
We will need to remove the region around the original patch to avoid finding the exact same patch!
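For reference, both options look like this (a sketch, assuming res is a matching map where lower is better, e.g. TM_SQDIFF):
# Option 1: OpenCV helper; note that locations are returned as (x, y), i.e. (col, row)
minVal, maxVal, minLoc, maxLoc = cv2.minMaxLoc(res)
# Option 2: NumPy; argmin returns a flat index, unravel_index converts it to (row, col)
best_r, best_c = np.unravel_index(np.argmin(res), res.shape)
print(minLoc, (best_c, best_r))  # the two locations should agree (up to ties)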
# TODO display cropped matches and position in maps
# change this to activate region removal
blank_patch_region = False
# coordinates of the area to erase
r_top = 0 # FIXME
c_left = 0 # FIXME
r_bottom = 1000 # FIXME
c_right = 1000 # FIXME
tm_methods = [
# method, name, lower_is_better
(cv2.TM_SQDIFF, "SQDIFF", True),
(cv2.TM_SQDIFF_NORMED, "SQDIFF_NORMED", True),
(cv2.TM_CCORR, "CCORR", False),
(cv2.TM_CCORR_NORMED, "CCORR_NORMED", False),
(cv2.TM_CCOEFF, "CCOEFF", False),
(cv2.TM_CCOEFF_NORMED, "CCOEFF_NORMED", False)]
# Opt. blank the region around the original patch
full_img_2 = None
if blank_patch_region:
raise NotImplementedError("You still need to implement this!")
# save the image!
full_img_2 = None # TODO
# erase the region
# TODO
# Display the original area, to control
plt.figure()
plt.imshow(full_img[:,:,:]) # FIXME
plt.title("Removed region")
plt.show()
else:
full_img_2 = full_img
plt.figure(figsize=(15,10))
for ii, (method, name, lower_is_better) in enumerate(tm_methods):
# compute matching map
res = np.zeros(full_img_2.shape[:2]) # FIXME use cv2.matchTemplate
# retrieve the location of the best value
# FIXME use cv2.minMaxLoc
bestVal, bestLoc = np.NaN, (0, 0)
# Display patch
ax = plt.subplot(3, len(tm_methods), 1+ii)
bestLoc_c, bestLoc_r = bestLoc # the order is reversed, this is terribly stupid
endLoc_r, endLoc_c = np.array((bestLoc_r, bestLoc_c)) + template.shape[:2]
patch = full_img[bestLoc_r:endLoc_r, bestLoc_c:endLoc_c, ...]
plt.imshow(patch)
plt.axis("off")
plt.title(name)
# Display region in map
plt.subplot(3, len(tm_methods), 1+ii+len(tm_methods))
plt.imshow(res)
plt.plot((bestLoc_c+endLoc_c)//2, (bestLoc_r+endLoc_r)//2,
'o', markeredgecolor='r', markerfacecolor='none', markersize=10)
plt.axis("off")
plt.title("%0.3f" % bestVal)
# Display a zoomed-out region
ax = plt.subplot(3, len(tm_methods), 1+ii+len(tm_methods)*2)
zoom_margin = 100
zoom = full_img[max(0, bestLoc_r-zoom_margin):min(endLoc_r+zoom_margin, full_img.shape[0]),
max(0, bestLoc_c-zoom_margin):min(endLoc_c+zoom_margin, full_img.shape[1]),
...]
plt.imshow(zoom)
plt.axis("off")
plt.title("BTL: %d:%d" % bestLoc) # BTL = Best Top Left
plt.show()
#prof
blank_patch_region = True
# coordinates of the area to erase
r_top = 220
c_left = 700
r_bottom = 300
c_right = 850
tm_methods = [
# method, name, lower_is_better
(cv2.TM_SQDIFF, "SQDIFF", True),
(cv2.TM_SQDIFF_NORMED, "SQDIFF_NORMED", True),
(cv2.TM_CCORR, "CCORR", False),
(cv2.TM_CCORR_NORMED, "CCORR_NORMED", False),
(cv2.TM_CCOEFF, "CCOEFF", False),
(cv2.TM_CCOEFF_NORMED, "CCOEFF_NORMED", False)]
# Opt. blank the region around the original patch
full_img_2 = None
if blank_patch_region:
# save the image!
full_img_2 = full_img.copy()
# erase the region
full_img_2[r_top:r_bottom,c_left:c_right,:] = (0,0,0)
# Display the original area, to control
plt.figure()
plt.imshow(full_img[r_top:r_bottom,c_left:c_right,:])
plt.title("Removed region")
plt.show()
else:
full_img_2 = full_img
plt.figure(figsize=(15,10))
for ii, (method, name, lower_is_better) in enumerate(tm_methods):
# compute matching map
res = cv2.matchTemplate(full_img_2, template, method)
# retrieve the location of the best value
minVal, maxVal, minLoc, maxLoc = cv2.minMaxLoc(res)
bestVal, bestLoc = (minVal, minLoc) if lower_is_better else (maxVal, maxLoc)
# Display patch
ax = plt.subplot(3, len(tm_methods), 1+ii)
bestLoc_c, bestLoc_r = bestLoc # the order is reversed, this is terribly stupid
endLoc_r, endLoc_c = np.array((bestLoc_r, bestLoc_c)) + template.shape[:2]
patch = full_img[bestLoc_r:endLoc_r, bestLoc_c:endLoc_c, ...]
plt.imshow(patch)
plt.axis("off")
plt.title(name)
# Display region in map
plt.subplot(3, len(tm_methods), 1+ii+len(tm_methods))
plt.imshow(res)
plt.plot((bestLoc_c+endLoc_c)//2, (bestLoc_r+endLoc_r)//2,
'o', markeredgecolor='r', markerfacecolor='none', markersize=10)
plt.axis("off")
plt.title("%0.3f" % bestVal)
# Display a zoomed-out region
ax = plt.subplot(3, len(tm_methods), 1+ii+len(tm_methods)*2)
zoom_margin = 100
zoom = full_img[max(0, bestLoc_r-zoom_margin):min(endLoc_r+zoom_margin, full_img.shape[0]),
max(0, bestLoc_c-zoom_margin):min(endLoc_c+zoom_margin, full_img.shape[1]),
...]
plt.imshow(zoom)
plt.axis("off")
plt.title("BTL: %d:%d" % bestLoc) # BTL = Best Top Left
plt.show()
OK, we have started to work on something. We are at the point where we may have ideas about things to try: use grayscale images instead of color images, iterate over the best matches to look for a relevant match, suppress local maxima close to another local maximum…
But wait.
First, let us "save" where we are.
Write some notes here.
(prof)
We tried several pattern matching techniques. We manually selected a pattern and checked whether those methods were capable of finding it in the image. They all are, except the CCORR method.
If we remove the area around the patch in the original image, replacing it with black pixels, and try to look for similar patterns, then we obtain an approximate location of another similar patch.
Now that we have kept track of what we are doing, we can try to think a bit more about what we are actually doing.
Write some criticism here.
(prof)
While it was fun to start trials right away, we do not have a sound experimental setup:
PS: and for the CCORR approach, we should subtract the template's mean value from it to avoid matching areas with higher intensities in the image.
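A possible fix, sketched below (untested here): cv2.matchTemplate also accepts float32 inputs, so we can zero-mean the template before running CCORR.
# Zero-mean the template so bright areas no longer dominate plain cross-correlation
img_f = full_img.astype(np.float32)
tpl_f = template.astype(np.float32)
tpl_f -= tpl_f.mean()
res_ccorr = cv2.matchTemplate(img_f, tpl_f, cv2.TM_CCORR)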
We will now build an experimental setup which will be useful for the next session.
This is what you should always do first, to perform experiment-driven research — much like test-driven development:
To complete this session, we will now start over and make sure we have all the necessary pre-requisites to perform sound experiments.
To facilitate your work, we already segmented all the bubbles and gave them an id. If we have time, we'll discuss how we did that and/or show you the code (in two words: simple thresholding and two connected components labellings). You can now use images like these:
All the files are located here: twin_it/bubbles_200dpi/bNNN.png, where NNN is a zero-padded number between 001 and 391.
We will also help you by telling you that those two bubbles (b044 and b092) have twins (there are more twins though!):
and that those two bubbles (b001 and b096) do not have twins:
You will now try to match pairs of isolated bubbles. This removes the risk of matching in-between content and allows us to have a precise and simple evaluation: for a given test bubble, we should know what the twin bubble is, if there is any.
You will need a little trick to be able to perform pattern matching on images which have approximately the same size: you will have to pad the base image (one of the two bubbles being matched) in order to cope with possible texture translation. You can use the cv2.copyMakeBorder() function.
The goal is to ensure that the base image has borders large enough to contain the largest patterns (both horizontally and vertically).
As image border is an important concept, we will illustrate the possible borders values before going further. This section is copied from the original OpenCV documentation.
The function cv2.copyMakeBorder() takes the following arguments:
src: input image
top, bottom, left, right: border width, in number of pixels, in the corresponding directions
borderType: flag defining what kind of border to add. It can be one of the following types:
cv2.BORDER_CONSTANT: adds a constant colored border, like this: xxxxx|abcdefgh|xxxxx. The value should be given as the next argument.
cv2.BORDER_REFLECT: the border will be a mirror reflection of the border elements, like this: fedcba|abcdefgh|hgfedcb
cv2.BORDER_REFLECT_101 or cv2.BORDER_DEFAULT: same as above, but with a slight change, like this: gfedcb|abcdefgh|gfedcba
cv2.BORDER_REPLICATE: the last element is replicated throughout, like this: aaaaaa|abcdefgh|hhhhhhh
cv2.BORDER_WRAP: circular copy of the rows and columns from the other side of the image, like this: cdefgh|abcdefgh|abcdefg
value: color of the border if border type is cv2.BORDER_CONSTANT
Below is a sample code demonstrating all these border types for better understanding:
import cv2
import numpy as np
from matplotlib import pyplot as plt
BLUE = [255, 0, 0]
img1 = cv2.imread('opencv-logo.png')
replicate = cv2.copyMakeBorder(img1, 10, 10, 10, 10, cv2.BORDER_REPLICATE)
reflect = cv2.copyMakeBorder(img1, 10, 10, 10, 10, cv2.BORDER_REFLECT)
reflect101 = cv2.copyMakeBorder(img1, 10, 10, 10, 10, cv2.BORDER_REFLECT_101)
wrap = cv2.copyMakeBorder(img1, 10, 10, 10, 10, cv2.BORDER_WRAP)
constant = cv2.copyMakeBorder(img1, 10, 10, 10, 10, cv2.BORDER_CONSTANT, value=BLUE)
plt.subplot(231), plt.imshow(img1, 'gray'), plt.title('ORIGINAL')
plt.subplot(232), plt.imshow(replicate, 'gray'), plt.title('REPLICATE')
plt.subplot(233), plt.imshow(reflect, 'gray'), plt.title('REFLECT')
plt.subplot(234), plt.imshow(reflect101, 'gray'), plt.title('REFLECT_101')
plt.subplot(235), plt.imshow(wrap, 'gray'), plt.title('WRAP')
plt.subplot(236), plt.imshow(constant, 'gray'), plt.title('CONSTANT')
plt.show()
This produces the result below. (The image is displayed with matplotlib, so the RED and BLUE channels are interchanged.)
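If you want the colors displayed correctly, convert from BGR to RGB before calling imshow, for instance:
plt.imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))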
We can make use of Jupyter's magic to quickly get a list of all bubble images, and load them all.
# TODO read bubble images
bubble_files = [] # FIXME
bubble_images = []
for bf in bubble_files:
pass # FIXME
#prof
# read and resize bubble images
bubble_files = !ls res/twin_it/bubbles_200dpi/b*.png | sort
bubble_images = []
for bf in bubble_files:
img = cv2.imread(bf)
img_small = cv2.resize(img, None, None, 0.25, 0.25, cv2.INTER_AREA)
bubble_images.append(img_small)
print(len(bubble_images))
plt.figure()
plt.imshow(bubble_images[0])
plt.show()
print(bubble_images[0].shape, bubble_images[0].dtype)
# Note that when we load images with OpenCV, channels are in BGR order.
391
(98, 197, 3) uint8
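Note that the shell magic used above is Jupyter-specific. A portable alternative (assuming the same path layout) uses Python's glob module, where sorted() replaces the call to sort:
import glob
bubble_files = sorted(glob.glob("res/twin_it/bubbles_200dpi/b*.png"))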
It is more efficient to consider the bubble under test as the base image, and pad it once and for all.
# TODO display the test images
test_image_ids_withtwin = [] # FIXME
test_image_ids_notwin = [] # FIXME
plt.figure()
for ii, img_id in enumerate(test_image_ids_withtwin + test_image_ids_notwin):
plt.subplot(1, 4, 1+ii)
plt.imshow(cv2.cvtColor(bubble_images[img_id], cv2.COLOR_BGR2RGB))
plt.title(bubble_files[img_id].split('/')[-1])
plt.axis('off')
plt.show()
#prof
# select the images
test_image_ids_withtwin = [43, 91] # b044 and b092
test_image_ids_notwin = [0, 95] # b001 and b096
plt.figure()
for ii, img_id in enumerate(test_image_ids_withtwin + test_image_ids_notwin):
plt.subplot(1, 4, 1+ii)
plt.imshow(cv2.cvtColor(bubble_images[img_id], cv2.COLOR_BGR2RGB))
plt.title(bubble_files[img_id].split('/')[-1])
plt.axis('off')
plt.show()
# TODO add a black border around the other images
max_bubble_shape = None # FIXME
test_image_withtwin = [] # FIXME
test_image_notwin = [] # FIXME
plt.figure(figsize=(16, 16))
for ii, img in enumerate(test_image_withtwin + test_image_notwin):
plt.subplot(1, 4, 1+ii)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()
#prof
# Read all bubble shapes, make an array of shape (391, 2 -- not 3!), and compute maximum along axis 0
max_rows, max_cols = np.max(np.array([img.shape[:2] for img in bubble_images]), axis=0)
max_rows, max_cols
(109, 197)
#prof
# Prepare the images
BLACK = (0, 0, 0)
test_image_withtwin = [
cv2.copyMakeBorder(bubble_images[img_id],
max_rows//2, max_rows//2, max_cols//2, max_cols//2, # // 2 because we'll crop the templates
cv2.BORDER_CONSTANT, value=BLACK)
for img_id in test_image_ids_withtwin]
test_image_notwin = [
cv2.copyMakeBorder(bubble_images[img_id],
max_rows//2, max_rows//2, max_cols//2, max_cols//2, # // 2 because we'll crop the templates
cv2.BORDER_CONSTANT, value=BLACK)
for img_id in test_image_ids_notwin]
plt.figure(figsize=(16, 16))
for ii, img in enumerate(test_image_withtwin + test_image_notwin):
plt.subplot(1, 4, 1+ii)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()
We can now compute the matching of each bubble against the others, with the method of our choice.
# TODO find the best matches for all test images
#prof
num_matches = 10
tm_methods = [
# method, name, lower_is_better
(cv2.TM_SQDIFF, "SQDIFF", True),
(cv2.TM_SQDIFF_NORMED, "SQDIFF_NORMED", True),
(cv2.TM_CCORR, "CCORR", False),
(cv2.TM_CCORR_NORMED, "CCORR_NORMED", False),
(cv2.TM_CCOEFF, "CCOEFF", False),
(cv2.TM_CCOEFF_NORMED, "CCOEFF_NORMED", False)]
def bid2name(img_id):
name = bubble_files[img_id].split('/')[-1]
return name[:-4]
for method, name, lower_is_better in tm_methods:
plt.figure(figsize=(15,10))
print("Results with method %s (%s is better)" % (name, "lower" if lower_is_better else "higher"))
for ii, (test_bubble_id, test_bubble) in enumerate(zip(
test_image_ids_withtwin + test_image_ids_notwin,
test_image_withtwin + test_image_notwin)):
# test_bubble is a padded image
# compute the scores
best_results = np.empty((len(bubble_files)))
for jj, (b_path, b_img) in enumerate(zip(bubble_files, bubble_images)):
# b_img is the template
if jj == test_bubble_id:
# write skipping value
best_results[jj] = np.infty if lower_is_better else 0
continue
# crop the pattern to avoid having black regions around it
b_img_row, b_img_cols = b_img.shape[:2]
b_img = b_img[b_img_row//4:(3*b_img_row)//4, b_img_cols//4:(3*b_img_cols)//4,:]
# compute matching map
res = cv2.matchTemplate(test_bubble, b_img, method)
# retrieve the location of the best value
minVal, maxVal, minLoc, maxLoc = cv2.minMaxLoc(res)
bestVal, _bestLoc = (minVal, minLoc) if lower_is_better else (maxVal, maxLoc)
# store the best value
best_results[jj] = bestVal
# analyse the scores
best_matches_id = np.argsort(best_results)
if not lower_is_better:
# reverse if needed
best_matches_id = best_matches_id[::-1]
# display the query
plt.subplot(4, num_matches+1, ii*(num_matches+1) + 1)
plt.imshow(cv2.cvtColor(bubble_images[test_bubble_id], cv2.COLOR_BGR2RGB))
plt.title("QUERY\n%s" % bid2name(test_bubble_id))
plt.axis('off')
# display the best matches
for kk, best_id in enumerate(best_matches_id[:num_matches]):
best_val = best_results[best_id]
best_img = bubble_images[best_id]
best_bname = bid2name(best_id)
plt.subplot(4, num_matches+1, ii*(num_matches+1) + 2 + kk)
plt.imshow(cv2.cvtColor(best_img, cv2.COLOR_BGR2RGB))
plt.title("%s\n%.3f" % (best_bname, best_val))
plt.axis('off')
plt.show()
Results with method SQDIFF (lower is better)
Results with method SQDIFF_NORMED (lower is better)
Results with method CCORR (higher is better)
Results with method CCORR_NORMED (higher is better)
Results with method CCOEFF (higher is better)
Results with method CCOEFF_NORMED (higher is better)
We now have enough information to draw some conclusions about this approach.
TODO write your answers here
(prof)
First, for each method, the normalized version seems to work better.
While the first result for each bubble with a twin is relevant, it may not be possible to detect the bubbles with twins just by looking at the best value, because the value associated with the best match of a bubble without a twin may be better than the value associated with the best match of a bubble with a twin.
Furthermore, we only tested for a few images and have no guarantee about the generalization power of our approach.
Several bubbles are present in the best-matching results for all queries. Such "attractors" are very common with approaches like this one, where elements with average color or texture get matched with many other elements, and pollute the results.
Given our results, we may not directly obtain the matching pairs but presenting the best matches for each bubble can significantly reduce the human time needed to solve the problem.
There are many possible improvements to the current approach.
First, it is common for pattern matching techniques to subtract the average gray value of an image before matching parts of it. In our case, the global average gray value may not match the average gray value of each bubble, and we should mask the background. However, this would be an important point to test.
Also, regarding the values produced by the matching, we could try to normalize the values in the big similarity matrix we can compute over each pair of bubbles, as sketched below. We could also pre-filter bubbles according to their main colors.
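For instance, a hypothetical sketch (assuming scores is a (391, 391) matrix holding the best matching value for each pair of bubbles): a simple min-max normalization per row would make the scores of different query bubbles comparable.
# Hypothetical: rescale each row of the pairwise score matrix to [0, 1]
row_min = scores.min(axis=1, keepdims=True)
row_max = scores.max(axis=1, keepdims=True)
scores_norm = (scores - row_min) / (row_max - row_min + 1e-9)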
Despite those possible improvements, this approach would still be limited to detecting matching textures under very mild perturbations: here we can only cope with translation and some slight noise and illumination variation. More robust techniques are necessary to tackle more challenging cases, like ones with strong illumination changes, rotations, scaling, perspective, noise, etc.
Finally, the current approach is very slow and cannot be used with thousands of images.
Congratulations, you just reached the end of this session!