Difference between revisions of "Evaltex"

From LRDE

Revision as of 11:37, 8 July 2016

EvaLTex (Evaluating Text Localization) is a unified evaluation framework used to measure the performance of text detection and text segmentation algorithms. It takes as input text objects represented either by rectangle coordinates or by irregular masks. The output consists of a set of scores, at local and global levels, and a visual representation of the behavior of the analysed algorithm through quality histograms.

For more details on the evaluation protocol, read the scientific paper published in the Image and Vision Computing journal. Details on the visual representation of the evaluation can be found in the article published in the Proceedings of the International Conference on Document Analysis and Recognition.

Please cite the IVC paper in all publications that use the EvaLTex tool and the ICDAR paper in all publications that use the histogram representation.

Evaluation performance measurements

Local evaluation

For each matched GT object we assign two quality measures, Coverage (Cov) and Accuracy (Acc):

  • Cov is the ratio of the matched area to the GT object area
  • Acc is the ratio of the matched area to the detection area
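As an illustration, the two measures can be computed from box areas. The following is a minimal sketch (not the EvaLTex implementation), assuming axis-aligned (x, y, width, height) boxes and a single detection matched to the GT object:

```python
# Illustrative sketch of Cov and Acc for one matched GT/detection box pair.
# Boxes are (x, y, width, height) tuples; not part of the EvaLTex code base.

def intersection_area(gt, det):
    """Area of overlap between two (x, y, w, h) boxes (0 if disjoint)."""
    x1, y1, w1, h1 = gt
    x2, y2, w2, h2 = det
    iw = min(x1 + w1, x2 + w2) - max(x1, x2)
    ih = min(y1 + h1, y2 + h2) - max(y1, y2)
    return max(iw, 0) * max(ih, 0)

def coverage_accuracy(gt, det):
    """Cov = matched area / GT area; Acc = matched area / detection area."""
    matched = intersection_area(gt, det)
    cov = matched / (gt[2] * gt[3])
    acc = matched / (det[2] * det[3])
    return cov, acc
```

For example, a detection that covers the bottom 40 of a 100x50 GT box gets Cov = 0.8 (part of the GT is missed) and Acc = 1.0 (the detection contains no background).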
Global evaluation

The Recall (R) measures the amount of detected text. We compute three variants: a global R; a quantitative R_quant that counts the number of detected objects (regardless of the matched area); and a qualitative R_qual that corresponds to the rate of the detected text area with respect to the number of true positives (TP).

The Precision (P) measures the rate of detections that have a match in the GT. Similarly to R, we compute three variants: a global P; a quantitative P_quant that counts the number of valid detections (regardless of the matched area); and a qualitative P_qual that corresponds to the rate of the detected text area with respect to the total number of detections, computed as the sum of true positives (TP) and false positives (FP).
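The way the variants combine can be sketched numerically. This is an illustrative reading, assuming the global score is the product of the quantitative and qualitative components and that FScore is the harmonic mean of Recall and Precision; the numbers below are taken from the sample global statistics in the Output format section, and the exact matching rules are defined in the IVC paper:

```python
# Illustrative sketch (assumptions: global = quantitative * qualitative,
# FScore = harmonic mean of Recall and Precision).

def global_score(quantity, quality):
    """Combine the quantitative and qualitative components."""
    return quantity * quality

def f_score(recall, precision):
    """Harmonic mean of recall and precision."""
    return 2 * recall * precision / (recall + precision)

recall = global_score(0.792747, 0.958352)     # ~0.759731
precision = global_score(0.712241, 0.972411)  # ~0.692591
fscore = f_score(recall, precision)           # ~0.724609
```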


Input format

The framework takes as input .txt files containing the coordinates of the text bounding boxes and, depending on the evaluation task, labeled images containing text masks. To unify the input format, the expected file layouts are described below.

Text detection tasks

Text detection results can be represented through boxes and masks. For text detection tasks using bounding boxes, a .txt file is enough. If the text objects are represented by irregular masks, then an additional labeled image will be needed.

Boxes

Ground truth format

The GT format contains the following attributes:

  • img name
  • image height, image width
  • text object
  • ID: unique text object ID
  • region ID: ID of the region to which the object belongs
  • "transcription": can be empty
  • text reject: flag deciding whether a text object is counted; f (default) or t (not taken into account)
  • x: x coordinate of the bounding box
  • y: y coordinate of the bounding box
  • width: width of the bounding box
  • height: height of the bounding box

e.g.

img_1

960,1280
1,1,"Tiredness",f,38,43,882,172
2,2,"kills",f,275,264,390,186
3,3,"A",f,0,699,77,131
4,3,"short",f,128,705,355,134
5,3,"break",f,542,710,396,131
6,4,"could",f,87,884,370,137
7,4,"save",f,517,919,314,105
8,5,"your",f,166,1095,302,136
9,5,"life",f,530,1069,213,137
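A hypothetical reader for this layout can make the field order concrete. The class and function names below are illustrative only, following the attribute list above; this is not the official EvaLTex parser:

```python
# Hypothetical parser sketch for the GT .txt layout shown above.
from dataclasses import dataclass

@dataclass
class GTObject:
    obj_id: int
    region_id: int
    transcription: str
    reject: bool   # True when the 'text reject' flag is 't'
    x: int
    y: int
    width: int
    height: int

def parse_object(line):
    """Parse one 'ID,region ID,"transcription",reject,x,y,width,height' line."""
    head, rest = line.split(',"', 1)          # head = 'ID,region ID'
    transcription, tail = rest.split('",', 1)  # text between the quotes
    obj_id, region_id = map(int, head.split(","))
    reject, x, y, w, h = tail.split(",")
    return GTObject(obj_id, region_id, transcription,
                    reject == "t", int(x), int(y), int(w), int(h))

def parse_gt(lines):
    """Parse one image block: name, 'height,width', then one object per line."""
    name = lines[0].strip()
    height, width = map(int, lines[1].split(","))
    objects = [parse_object(l) for l in lines[2:] if l.strip()]
    return name, (height, width), objects
```

The detection format differs only in having no region ID and no reject flag, so a real reader would handle both variants.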

Detection result format

The detection .txt file format contains the following attributes:

  • img name
  • text object
  • ID: unique text object ID
  • "transcription": can be empty
  • x: x coordinate of the bounding box
  • y: y coordinate of the bounding box
  • width: width of the bounding box
  • height: height of the bounding box

e.g.

img_1

1,"",272,264,392,186
2,"",34,40,886,175
3,"",168,1082,300,148


Masks

If a more precise evaluation is needed, one can also evaluate the detection of text masks. (TODO: add explanations.)

Ground truth format
Detection result format

Text segmentation tasks

Ground truth format

Similarly to text detection tasks using masks, evaluating text segmentation uses, in addition to the .txt file, a labeled image in which each character is labeled differently. Each GT object is represented by a character. Character-level GT objects cannot be grouped into regions, so each text object has a different region tag. The x, y, width and height fields define the bounding box of each character.

e.g.

img_1

960,1280
1,1,"",f,384,43,101,166
2,2,"",f,142,44,46,164
3,3,"",f,38,47,106,163
4,4,"",f,192,80,71,126
5,5,"",f,269,80,100,131
6,6,"",f,501,81,97,126
7,7,"",f,721,81,97,131

Detection result format

Output format

The evaluation results are given in two forms:

  • global evaluation for an entire dataset

EvaLTex statistics
General
Number of GTs =6410
Number of detections = 6338
Number of false positives =1890
Number of true positives =4678

Global results
Recall=0.759731
Recall_noSplit=0.760799
Precision=0.692591
Split=0.791221
FScore=0.724609

FScore_noSplit=0.725095

Quantity results
Recall=0.792747
Precision=0.712241

Quality results
Recall=0.958352
Recall_noSplit=0.959699
Precision=0.972411
Coverage histogram = {0.214201, 0.00491442, 0.00423657, 0.00491442, 0.00491442, 0.00525335, 0.00610066, 0.0132181, 0.0250805, 0.717167}
Accuracy histogram = {0.288977, 0.00137028, 0.00091352, 0.0022838, 0.00365408, 0.00471985, 0.00517661, 0.0103532, 0.0235993, 0.658952}

EMD results
Recall=0.702796
Precision=0.696302
FScore=0.699534


  • local evaluation .txt file for each image

EvaLTex statistics - image img_1
General
Number of GTs =43
Number of detections = 19
Number of false positives =1
Number of true positives =18

Global results
Recall=0.414803
Recall_noSplit=0.414803
Precision=0.921798
Split=0.428571
FScore=0.572144

FScore_noSplit=0.572144

Quantity results
Recall=0.967873
Precision=0.947368

Quality results
Recall=0.428571
Recall_noSplit=0.967873
Precision=0.973009
Coverage histogram = {0.571429, 0, 0, 0, 0, 0, 0.0238095, 0, 0.0238095, 0.380952}
Accuracy histogram = {0.288977, 0.00137028, 0.00091352, 0.0022838, 0.00365408, 0.00471985, 0.00517661, 0.0103532, 0.0235993, 0.658952}

EMD results
Recall=0.420952
Recall_noSplit=0.420952
Precision=0.926316
FScore=0.578853

FScore_noSplit=0.578853

Local evaluation
GT object 1
Coverage = 1 Accuracy = 0.991792 Split = 1
GT object 2
Coverage = 0.809862 Accuracy = 0.994543 Split = 1
GT object 3
Coverage = 1 Accuracy = 0.954386 Split = 1
GT object 4
Coverage = 0.998092 Accuracy = 0.967474 Split = 1
GT object 5
Coverage = 1 Accuracy = 0.993222 Split = 1
GT object 6
Coverage = 1 Accuracy = 0.960362 Split = 1
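The Coverage and Accuracy histograms above can be derived from these per-object scores. A minimal sketch, assuming ten equal-width bins over [0, 1] (a score of exactly 1 falls in the last bin), normalized by the number of scored GT objects; this is an illustration, not the EvaLTex implementation:

```python
# Illustrative histogram builder: 10 equal-width bins over [0, 1],
# normalized so the bin values sum to 1.

def score_histogram(values, bins=10):
    counts = [0] * bins
    for v in values:
        idx = min(int(v * bins), bins - 1)  # clamp v == 1.0 into the last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]
```

For instance, with the six per-object coverages listed above (1, 0.809862, 1, 0.998092, 1, 1), five objects fall in the last bin and one in the [0.8, 0.9) bin.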

Run the evaluation

The executable EvaLTex takes as input two .txt files, one for the GT and one for the detection/segmentation.

Usage

./EvaLTex -g gt.txt -d det.txt -o output_dir/ [options]
-v/--verbose : verbosity level, one of 1, 2, 3 and 4
-m/--mask image_gt/ image_det/
-t/--threshold

Datasets

ICDAR 2013

Born-digital
  • ground truth .txt
  • labeled images
Natural scene
  • ground truth .txt
  • labeled images

Downloads

Credits

EvaLTex was written by Ana Stefania CALARASANU. Please send any suggestions, comments or bug reports to calarasanu@lrde.epita.fr

