Revision as of 09:52, 13 July 2016

EvaLTex (Evaluating Text Localization) is a unified evaluation framework used to measure the performance of text detection and text segmentation algorithms. It takes as input text objects represented either by rectangle coordinates or by irregular masks. The output consists of a set of scores, at local and global levels, and a visual representation of the behavior of the analysed algorithm through quality histograms.

For more details on the evaluation protocol, read the paper published in the Image and Vision Computing Journal (IVC) and the Ph.D. thesis. Details on the visual representation of the evaluation can be found in the article published in the Proceedings of the International Conference on Document Analysis and Recognition (ICDAR).

Please cite the IVC paper in all publications that use the EvaLTex tool and the ICDAR paper in all publications that use the histogram representation.

Evaluation performance measurements

Local evaluation

For each GT object $G_{i}$ matched by a detection $D_{j}$, we assign two quality measures, Coverage (Cov) and Accuracy (Acc):

Cov is the ratio of the matched area to the area of the GT object:

$Cov_{i}={\frac {Area(G_{i}\bigcap D_{j})}{Area(G_{i})}}$

Acc is the ratio of the matched area to the area of the detection:

$Acc_{i}={\frac {Area(G_{i}\bigcap D_{j})}{Area(D_{j})}}$

The two quality metrics are adapted to the type of matching (one-to-one, one-to-many, many-to-one or many-to-many). For more details, refer to the paper published in the Image and Vision Computing Journal and to Chapter 3 of the Ph.D. thesis.
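For axis-aligned boxes given as (x, y, width, height), as in the .txt formats described under Input format, both ratios reduce to rectangle-intersection arithmetic. The Python sketch below is illustrative only: it covers the one-to-one case, the function names are hypothetical rather than part of EvaLTex, and the adapted formulas for the other matching types are not reproduced.

```python
def intersection_area(a, b):
    """Overlap area of two axis-aligned boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = min(ax + aw, bx + bw) - max(ax, bx)
    dy = min(ay + ah, by + bh) - max(ay, by)
    # Disjoint boxes give a negative extent on at least one axis.
    return max(dx, 0) * max(dy, 0)


def coverage_accuracy(gt, det):
    """One-to-one Cov and Acc of a GT box matched by a detection box."""
    inter = intersection_area(gt, det)
    cov = inter / (gt[2] * gt[3])    # matched area / GT area
    acc = inter / (det[2] * det[3])  # matched area / detection area
    return cov, acc


# GT "Tiredness" and detection 2 from the img_1 examples: Cov = 1.0, Acc is about 0.978.
print(coverage_accuracy((38, 43, 882, 172), (34, 40, 886, 175)))
```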

Global evaluation

The global evaluation consists of a set of measurements: a global recall $R_{G}$, a quantitative recall $R_{quant}$, a qualitative recall $R_{qual}$, a global precision $P_{G}$, a quantitative precision $P_{quant}$, a qualitative precision $P_{qual}$, a Split metric and an overall F-Score. In addition, the tool provides two histogram representations of the local qualities and a derived pair of metrics ($R_{EMD}$ and $P_{EMD}$) computed using histogram distances. These metrics use the following notation:

$N_{G}$ = number of GT objects in the image/dataset
$TP$ = number of true positives (GT objects that were detected)
$FP$ = number of false positives (detections with no correspondence in the GT)

Recall. The global Recall ($R_{G}$) measures the amount of detected text. It is the average coverage over all GT objects and can be written as the product of two terms:

 $R_{G}={\frac {\sum _{i=1}^{N_{G}}Cov_{i}}{N_{G}}}={\frac {TP}{N_{G}}}\cdot {\frac {\sum _{i=1}^{N_{G}}Cov_{i}}{TP}}$

The left term of the product is the ratio between the number of true positives and the total number of GT objects. We interpret this ratio as the quantitative recall $R_{quant}$, since it gives the percentage of detected GT objects regardless of their coverage:

$R_{quant}={\frac {TP}{N_{G}}}$

The second term is obtained by averaging the coverage rates of the detected GT objects. We call it the qualitative recall $R_{qual}$, as it characterizes the mean covered surface of the detected GT objects:

$R_{qual}={\frac {\sum _{i=1}^{N_{G}}Cov_{i}}{TP}}$
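A minimal sketch of the recall decomposition, assuming (hypothetically) that the Cov values of all $N_{G}$ GT objects are available as a list, with undetected objects at Cov = 0 and every object with Cov > 0 counted as a true positive. EvaLTex's own matching rules, including its threshold option, may classify true positives differently.

```python
def recall_scores(coverages):
    """Return (R_G, R_quant, R_qual) from the Cov values of all N_G GT objects.

    Assumption for this sketch: undetected GT objects appear with Cov = 0,
    and any object with Cov > 0 is counted as a true positive.
    """
    n_g = len(coverages)
    if n_g == 0:
        return 0.0, 0.0, 0.0
    tp = sum(1 for c in coverages if c > 0)
    total_cov = sum(coverages)
    r_g = total_cov / n_g                        # average coverage over all GT objects
    r_quant = tp / n_g                           # share of detected GT objects
    r_qual = total_cov / tp if tp else 0.0       # mean coverage of detected objects
    return r_g, r_quant, r_qual
```

As a check on the decomposition, the dataset-level output shown under Output format gives $R_{quant} \cdot R_{qual} = 0.792747 \times 0.958352 \approx 0.759731$, which is the reported global Recall.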

Precision. Applying the same reasoning, we obtain the following decomposition of the global Precision $P_{G}$:

 $P_{G}={\frac {\sum _{i=1}^{N_{G}}Acc_{i}}{TP+FP}}={\frac {TP}{TP+FP}}\cdot {\frac {\sum _{i=1}^{N_{G}}Acc_{i}}{TP}}$

Here again, the left term of the product gives the percentage of detections that have a correspondence in the GT. We therefore call it the quantitative precision $P_{quant}$:

$P_{quant}={\frac {TP}{TP+FP}}$

The right term is the average accuracy obtained from matching the detection set against the GT. We refer to it as the qualitative precision $P_{qual}$:

$P_{qual}={\frac {\sum _{i=1}^{N_{G}}Acc_{i}}{TP}}$
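The precision decomposition can be sketched the same way. This hypothetical helper assumes it receives the Acc values of the $TP$ matched GT objects (unmatched GT objects contribute no accuracy) together with the false-positive count.

```python
def precision_scores(accuracies, fp):
    """Return (P_G, P_quant, P_qual).

    accuracies: Acc values of the TP matched GT objects (one per true positive).
    fp: number of detections with no correspondence in the GT.
    """
    tp = len(accuracies)
    total_acc = sum(accuracies)
    n_det = tp + fp
    p_g = total_acc / n_det if n_det else 0.0   # global precision
    p_quant = tp / n_det if n_det else 0.0      # share of detections with a GT match
    p_qual = total_acc / tp if tp else 0.0      # mean accuracy of matched detections
    return p_g, p_quant, p_qual
```

With the dataset-level values shown under Output format, $P_{quant} \cdot P_{qual} = 0.712241 \times 0.972411 \approx 0.692591$, the reported global Precision.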

Split. The Split metric evaluates the level of GT fragmentation in a dataset and is computed as:

$S={\frac {\sum _{i=1}^{N_{G}}\left({\frac {1}{1+\ln(s_{i})\cdot \ln(s_{i})}}\cdot 0.6+0.4\right)}{N_{G}}}$ , where $s_{i}$ = number of detections matching $G_{i}$

The Split measure can be used as an individual metric or integrated into the Recall computation. For more details, refer to the paper published in the Image and Vision Computing Journal and to Chapter 3 of the Ph.D. thesis.
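A sketch of the Split formula, reading the per-object term as $0.6/(1+\ln^{2}(s_{i})) + 0.4$, so that an unsplit GT object ($s_{i}=1$) scores 1 and the score decreases towards 0.4 as fragmentation grows. Because $\ln(0)$ is undefined and this page does not say how undetected objects are handled, the hypothetical helper below only accepts $s_{i} \geq 1$.

```python
import math


def split_score(split_counts):
    """Split S from s_i, the number of detections matching each GT object.

    Assumption for this sketch: every s_i is >= 1; undetected objects are
    rejected because ln(0) is undefined.
    """
    if not split_counts or any(s < 1 for s in split_counts):
        raise ValueError("expected a non-empty list of counts >= 1")
    terms = [0.6 / (1 + math.log(s) * math.log(s)) + 0.4 for s in split_counts]
    return sum(terms) / len(split_counts)
```

For example, a GT object split into two detections contributes about 0.805 instead of 1.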

F-Score. As an overall metric, we use the well-known F-Score, defined as:

$F_{G}={\frac {2\cdot R_{G}\cdot P_{G}}{R_{G}+P_{G}}}$
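A direct transcription of the formula (the zero guard is an addition of this sketch). Plugging in the dataset-level Recall and Precision from the example under Output format reproduces the reported FScore.

```python
def f_score(recall, precision):
    """Harmonic mean of recall and precision; 0 when both are 0."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Dataset-level Recall=0.759731 and Precision=0.692591 from the example output.
print(round(f_score(0.759731, 0.692591), 6))  # 0.724609
```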

Quality histograms.

Coverage histogram

Accuracy histogram

EMD metrics.

$R_{EMD}=1-EMD({\widetilde {h}}_{Cov},{\widetilde {h}}_{O})$
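For one-dimensional histograms with the same total mass, the Earth Mover's Distance equals the L1 distance between their cumulative histograms. The sketch below uses that identity under two assumptions not stated on this page: the ground distance between bins $i$ and $j$ is $|i-j|/(n-1)$, and the reference histogram $\widetilde{h}_{O}$ is simply passed in by the caller. It is not guaranteed to reproduce the EMD values printed by EvaLTex.

```python
from itertools import accumulate


def emd_1d(h1, h2):
    """EMD between two normalized histograms defined over the same bins.

    Assumption for this sketch: ground distance |i - j| / (n - 1), so moving
    all mass from the first bin to the last one costs exactly 1.
    """
    if len(h1) != len(h2) or len(h1) < 2:
        raise ValueError("histograms must have the same length (at least 2 bins)")
    # In 1-D, the EMD is the summed absolute difference of the cumulative histograms.
    diff = sum(abs(a - b) for a, b in zip(accumulate(h1), accumulate(h2)))
    return diff / (len(h1) - 1)


def r_emd(h_cov, h_ref):
    """R_EMD = 1 - EMD(h_Cov, h_ref), with h_ref standing in for h_O."""
    return 1 - emd_1d(h_cov, h_ref)
```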

Input format

The framework takes as input .txt files containing the coordinates of the text bounding boxes and, depending on the evaluation task, labeled images containing text masks. To unify the input format

Text detection tasks

Text detection results can be represented either by bounding boxes or by masks. For text detection with bounding boxes, a .txt file is enough. If the text objects are represented by irregular masks, an additional labeled image is needed.

Text box representation

GT format. The GT format contains the following attributes:

img name
image height, image width
text object

ID: unique text object ID
region ID: ID of the region the object belongs to
"transcription": can be empty
text reject: flag that decides whether the text object is taken into account; f (default) counts it, t excludes it from the evaluation
x: x coordinate of the bounding box
y: y coordinate of the bounding box
width: width of the bounding box
height: height of the bounding box

e.g.: img_1.txt

img_1
960,1280
1,1,"Tiredness",f,38,43,882,172
2,2,"kills",f,275,264,390,186
3,3,"A",f,0,699,77,131
4,3,"short",f,128,705,355,134
5,3,"break",f,542,710,396,131
6,4,"could",f,87,884,370,137
7,4,"save",f,517,919,314,105
8,5,"your",f,166,1095,302,136
9,5,"life",f,530,1069,213,137
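A GT file can be read with a regular expression over the records, which works whether the records are separated by spaces or by newlines. This is a hypothetical reader, not EvaLTex's own parser; it assumes transcriptions contain no double quotes and keeps the two image-size values in the order they appear in the file.

```python
import re
from dataclasses import dataclass

# One GT record: ID, region ID, "transcription", reject flag (f/t), x, y, width, height.
GT_RECORD = re.compile(r'(\d+),(\d+),"([^"]*)",([ft]),(-?\d+),(-?\d+),(\d+),(\d+)')


@dataclass
class GTObject:
    obj_id: int
    region_id: int
    transcription: str
    rejected: bool  # True when the text reject flag is "t"
    box: tuple      # (x, y, width, height)


def parse_gt(text):
    """Parse GT .txt contents into (image name, image size values, objects)."""
    parts = text.split(maxsplit=2)
    name = parts[0]
    # The two size values, kept in the order they appear in the file.
    size = tuple(int(v) for v in parts[1].split(","))
    rest = parts[2] if len(parts) > 2 else ""
    objects = [
        GTObject(int(i), int(r), t, flag == "t", (int(x), int(y), int(w), int(h)))
        for i, r, t, flag, x, y, w, h in GT_RECORD.findall(rest)
    ]
    return name, size, objects
```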

Detection format. The detection .txt file format contains the following attributes:

img name
text object

ID: unique text object ID
"transcription": can be empty
x: x coordinate of the bounding box
y: y coordinate of the bounding box
width: width of the bounding box
height: height of the bounding box

e.g.: img_1.txt

img_1
1,"",272,264,392,186
2,"",34,40,886,175
3,"",168,1082,300,148
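Detection files have no size line and no region or reject fields, so a reader needs a shorter record pattern. As with any illustration on this page, the function is hypothetical and assumes transcriptions contain no double quotes.

```python
import re

# One detection record: ID, "transcription", x, y, width, height.
DET_RECORD = re.compile(r'(\d+),"([^"]*)",(-?\d+),(-?\d+),(\d+),(\d+)')


def parse_detections(text):
    """Return (image name, [(id, transcription, (x, y, w, h)), ...])."""
    parts = text.split(maxsplit=1)
    name = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    detections = [
        (int(i), t, (int(x), int(y), int(w), int(h)))
        for i, t, x, y, w, h in DET_RECORD.findall(rest)
    ]
    return name, detections
```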

Text mask representation

If a more precise evaluation is needed, text masks can also be evaluated. In addition to the .txt files described above, this requires a set of labeled images for both the GT and the detection.
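With labeled images, Cov and Acc are computed on pixel counts instead of rectangle areas. The sketch below represents each labeled image as a 2-D list of integers and assumes, hypothetically, that every pixel of an object carries that object's ID as its label; the actual label encoding used by EvaLTex is not specified here.

```python
def mask_coverage_accuracy(gt_labels, det_labels, g, d):
    """Cov and Acc between GT object g and detection d in two labeled images.

    gt_labels / det_labels: same-size 2-D lists of ints, one label per pixel.
    Assumption for this sketch: each object's pixels carry its ID as label.
    """
    gt_area = det_area = inter = 0
    for gt_row, det_row in zip(gt_labels, det_labels):
        for a, b in zip(gt_row, det_row):
            in_gt, in_det = a == g, b == d
            gt_area += int(in_gt)
            det_area += int(in_det)
            inter += int(in_gt and in_det)
    return inter / gt_area, inter / det_area
```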

The advantage of masks over rectangles is that they can represent text strings not only in horizontal or vertical configurations, but also tilted, circular, curved or in perspective. In such cases, a rectangular representation can disturb the matching process: a detection can unintentionally match a GT object because of the text's varying orientation (inclined, curved, circular).

The irregular mask annotation disables the use of the region tag. When dealing with rectangular boxes, regions are generated automatically from the coordinates of the GT objects: a region is the bounding box enclosing several smaller boxes. This is why regions cannot be generated automatically when masks are annotated irregularly.
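For rectangular GT, a region box can be sketched as the enclosing box of all objects sharing a region ID. Grouping by the region ID field of the GT file is an assumption of this illustration; the page does not detail how EvaLTex builds regions internally.

```python
def region_boxes(objects):
    """Enclosing box of each region.

    objects: iterable of (region_id, (x, y, width, height)) pairs.
    Returns {region_id: (x, y, width, height)}.
    """
    extents = {}
    for region_id, (x, y, w, h) in objects:
        # Start from the object's own extent the first time a region is seen.
        x0, y0, x1, y1 = extents.get(region_id, (x, y, x + w, y + h))
        extents[region_id] = (min(x0, x), min(y0, y), max(x1, x + w), max(y1, y + h))
    return {r: (x0, y0, x1 - x0, y1 - y0) for r, (x0, y0, x1, y1) in extents.items()}
```

Applied to region 3 of the img_1 GT example ("A", "short", "break"), it yields the box (0, 699, 938, 142).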

Original image with curved text

Labeled GT masks

Text segmentation tasks

To evaluate text segmentation tasks, we use the same mask input format as for text detection. In text segmentation, each mask represents a character rather than an entire word.

Original image

Labeled GT characters

Labeled segmentation result example

Text segmentation GT format. As for text detection with masks, evaluating text segmentation requires, in addition to the .txt file, a labeled image in which each character has its own label. Each GT object is a character. Character-level GT objects cannot be grouped into regions, so each text object has a different region tag. The x, y, width and height fields define the bounding box of each character.

e.g.: img_1.txt

img_1
960,1280
1,1,"",f,384,43,101,166
2,2,"",f,142,44,46,164
3,3,"",f,38,47,106,163
4,4,"",f,192,80,71,126
5,5,"",f,269,80,100,131
6,6,"",f,501,81,97,126
7,7,"",f,721,81,97,131
...
16,16,"",t,97,703,53,16
...
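Since each character's x, y, width and height describe its bounding box, those values can be derived from a labeled image. The sketch below assumes, hypothetically, a 2-D list of integer labels with 0 as background, and measures width and height in pixels.

```python
def label_bounding_boxes(labels, background=0):
    """(x, y, width, height) of every non-background label in a 2-D labeled image."""
    extents = {}
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            if label == background:
                continue
            x0, y0, x1, y1 = extents.get(label, (x, y, x, y))
            extents[label] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    # Inclusive pixel extents converted to width/height in pixels.
    return {k: (x0, y0, x1 - x0 + 1, y1 - y0 + 1) for k, (x0, y0, x1, y1) in extents.items()}
```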

Text segmentation result format. The result consists of a labeled image, in the same format as the one used for the GT, and a detection .txt file containing the bounding box of each segmented connected component.

e.g.: img_1.txt

img_1
1,"",383,42,103,167
2,"",142,43,49,167
3,"",35,44,112,168
4,"",268,79,101,132
5,"",194,81,71,124
6,"",500,81,99,127
7,"",612,81,100,131
8,"",721,81,97,131
9,"",824,82,99,133
10,"",344,883,29,135
11,"",387,886,65,135
...

Output format

The evaluation results are given as .txt files in two forms: a file with the results obtained on the entire dataset and a file with the results for each image in the dataset. Unlike the dataset file, each per-image (local) file also lists the statistics (Cov, Acc and Split) of every GT object in the image.

Global evaluation for an entire dataset

EvaLTex statistics
General
Number of GTs =6410
Number of detections = 6338
Number of false positives =1890
Number of true positives =4678

EvaLTex statistics summarize the number of GT objects, detections, false positives and true positives in the dataset.

Global results
Recall=0.759731
Recall_noSplit=0.760799
Precision=0.692591
Split=0.791221
FScore=0.724609
FScore_noSplit=0.725095

The global scores are the default Recall (with integrated Split), the Recall without integrated Split, the Precision, the Split, the default FScore (with integrated Split) and the FScore without integrated Split.

Quantity results
Recall=0.792747
Precision=0.712241

Quantity results only count the detected GT objects (recall) and the detections with a match in the GT (precision), regardless of their coverage or accuracy.

Quality results
Recall=0.958352
Recall_noSplit=0.959699
Precision=0.972411
Coverage histogram = {0.214201, 0.00491442, 0.00423657, 0.00491442, 0.00491442, 0.00525335, 0.00610066, 0.0132181, 0.0250805, 0.717167}
Accuracy histogram = {0.288977, 0.00137028, 0.00091352, 0.0022838, 0.00365408, 0.00471985, 0.00517661, 0.0103532, 0.0235993, 0.658952}

In addition to the quality recall and precision, the quality results contain two histograms representing the coverage and accuracy distributions over the dataset, e.g.:

Coverage histogram

Accuracy histogram
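The histograms in the example output have ten values that sum to about 1. A normalized ten-bin histogram of quality values can be sketched as follows; the equal-width binning, and placing a value of exactly 1.0 in the last bin, are assumptions of this illustration rather than documented EvaLTex behavior.

```python
def quality_histogram(values, bins=10):
    """Normalized histogram of quality values (Cov or Acc) in [0, 1].

    Assumption for this sketch: equal-width bins, with 1.0 in the last bin.
    """
    if not values:
        return [0.0] * bins
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return [c / len(values) for c in counts]
```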

EMD results
Recall=0.702796
Precision=0.696302
FScore=0.699534

As an alternative to the global scores, two quality scores can also be computed from the Coverage and Accuracy histograms using the Earth Mover's Distance (EMD).

Local evaluation .txt file for each image

The local evaluation file, generated for each image of the dataset, has the same format as the file generated for the entire dataset.

EvaLTex statistics - image img_1
General
Number of GTs =43
Number of detections = 19
Number of false positives =1
Number of true positives =18
Global results
Recall=0.414803
Recall_noSplit=0.414803
Precision=0.921798
Split=0.428571
FScore=0.572144
FScore_noSplit=0.572144
Quantity results
Recall=0.967873
Precision=0.947368
Quality results
Recall=0.428571
Recall_noSplit=0.967873
Precision=0.973009
Coverage histogram = {0.571429, 0, 0, 0, 0, 0, 0.0238095, 0, 0.0238095, 0.380952}
Accuracy histogram = {0.288977, 0.00137028, 0.00091352, 0.0022838, 0.00365408, 0.00471985, 0.00517661, 0.0103532, 0.0235993, 0.658952}
EMD results
Recall=0.420952
Recall_noSplit=0.420952
Precision=0.926316
FScore=0.578853
FScore_noSplit=0.578853

In addition, it contains the coverage, accuracy and split values of every GT object in the image.

Local evaluation
GT object 1 Coverage = 1 Accuracy = 0.991792 Split = 1
GT object 2 Coverage = 0.809862 Accuracy = 0.994543 Split = 1
GT object 3 Coverage = 1 Accuracy = 0.954386 Split = 1
GT object 4 Coverage = 0.998092 Accuracy = 0.967474 Split = 1
GT object 5 Coverage = 1 Accuracy = 0.993222 Split = 1
GT object 6 Coverage = 1 Accuracy = 0.960362 Split = 1
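To post-process the per-object statistics, a hypothetical reader can match the "GT object ... Coverage = ... Accuracy = ... Split = ..." pattern inferred from the example above. It tolerates either spaces or line breaks between fields but, as a sketch, does not handle numbers in scientific notation.

```python
import re

LOCAL_RECORD = re.compile(
    r"GT object (\d+)\s+Coverage = ([\d.]+)\s+Accuracy = ([\d.]+)\s+Split = ([\d.]+)"
)


def parse_local(text):
    """Map GT object ID to (Coverage, Accuracy, Split) from a per-image results file."""
    return {
        int(i): (float(c), float(a), float(s))
        for i, c, a, s in LOCAL_RECORD.findall(text)
    }
```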

Run the evaluation

The executable EvaLTex takes as input two .txt files, one for the GT and one for the detection/segmentation.

Usage

./EvaLTex -g gt.txt -d det.txt -o output_dir/ [options]
-v/--verbose : values in {1, 2, 3, 4}
-m/--mask image_gt/ image_det/
-t/--treshold
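When EvaLTex is driven from a script, the command line can be assembled from the options in the usage line. This hypothetical wrapper assumes the verbosity level is passed as a separate argument after -v; the threshold option is left out because its argument is not documented here.

```python
import subprocess


def evaltex_command(gt, det, out_dir, verbose=None, mask_dirs=None, binary="./EvaLTex"):
    """Build an EvaLTex argument list from the options in the usage line."""
    cmd = [binary, "-g", gt, "-d", det, "-o", out_dir]
    if verbose is not None:
        if verbose not in (1, 2, 3, 4):
            raise ValueError("verbose must be 1, 2, 3 or 4")
        # Assumed syntax: the level follows the flag as a separate argument.
        cmd += ["-v", str(verbose)]
    if mask_dirs is not None:
        gt_dir, det_dir = mask_dirs
        cmd += ["-m", gt_dir, det_dir]
    return cmd


def run_evaltex(*args, **kwargs):
    """Run EvaLTex; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(evaltex_command(*args, **kwargs), check=True)
```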

Resources

Evaluation Datasets

Text detection

Text segmentation

ICDAR 2013 Natural scene: ground truth .txt & labeled images
ICDAR 2013 Born-digital: ground truth .txt & labeled images

Source code available soon

Credits

EvaLTex was written by Ana Stefania CALARASANU. Please send any suggestions, comments or bug reports to calarasanu@lrde.epita.fr.


From LRDE