Revision as of 14:45, 7 July 2016

EvaLTex (Evaluating Text Localization) is a unified evaluation framework used to measure the performance of text detection and text segmentation algorithms. It takes as input text objects represented either by rectangle coordinates or by irregular masks. The output consists of a complex set of scores, at local and global levels, and a visual representation of the behavior of the analysed algorithm.

For more details on the evaluation protocol, see the paper published in the Image and Vision Computing journal. Details on the visual representation of the evaluation can be found in the article published in the Proceedings of the International Conference on Document Analysis and Recognition.

Please cite the Image and Vision Computing paper in all publications that use the EvaLTex tool.

Performance measurements

Local evaluation

For each matched GT object we assign two quality measures, Coverage (Cov) and Accuracy (Acc):

Cov is the ratio of the matched area to the GT object area.
Acc is the ratio of the matched area to the detection area.
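
For rectangular objects, the two local measures can be illustrated with a short Python sketch. It is not taken from the EvaLTex code: it assumes a one-to-one match, that the matched area is the plain intersection of the two boxes, and that (x, y) is the corner from which width and height extend. The names `Box`, `intersection_area` and `coverage_accuracy` are hypothetical.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned rectangle in the x, y, width, height order of the .txt files."""
    x: float
    y: float
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two rectangles (0.0 if they do not overlap)."""
    overlap_w = min(a.x + a.width, b.x + b.width) - max(a.x, b.x)
    overlap_h = min(a.y + a.height, b.y + b.height) - max(a.y, b.y)
    return max(overlap_w, 0.0) * max(overlap_h, 0.0)


def coverage_accuracy(gt: Box, det: Box) -> Tuple[float, float]:
    """Return (Cov, Acc) for one GT box matched to one detection box.

    Assumption: the matched area is the intersection of the two boxes.
    """
    matched = intersection_area(gt, det)
    cov = matched / gt.area if gt.area > 0 else 0.0
    acc = matched / det.area if det.area > 0 else 0.0
    return cov, acc


# Example: a GT box and a slightly larger, shifted detection box.
cov, acc = coverage_accuracy(Box(275, 264, 390, 186), Box(272, 264, 392, 186))
```

A detection that fully contains the GT box gives Cov = 1 but Acc < 1, while a detection lying entirely inside the GT box gives Acc = 1 but Cov < 1.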

Global evaluation

The Recall ($R$) measures the amount of detected text. We compute three measures: a global $R$, a quantitative $R$ that measures the proportion of detected objects among the $G$ GT objects (regardless of the matched area), and a qualitative $R$ that corresponds to the rate of the detected text area with respect to the number of true positives ($TP$).

$R_{G}={\frac {\sum Cov}{G}}$
$R_{quant}={\frac {TP}{G}}$
$R_{qual}={\frac {\sum Cov}{TP}}$
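
As an illustration, the three recall measures can be computed from the per-object Cov values with a few lines of Python. This is a minimal sketch, not the tool's implementation: it assumes one Cov value per matched GT object (so that $TP$ is the length of the list), and the function name `recall_measures` and the choice to return 0.0 for empty inputs are assumptions.

```python
from typing import Dict, Sequence


def recall_measures(coverages: Sequence[float], num_gt: int) -> Dict[str, float]:
    """Compute R_G, R_quant and R_qual.

    coverages: one Cov value per matched GT object, so TP = len(coverages).
    num_gt:    G, the total number of GT objects.
    """
    tp = len(coverages)
    cov_sum = sum(coverages)
    return {
        "R_G": cov_sum / num_gt if num_gt else 0.0,      # sum(Cov) / G
        "R_quant": tp / num_gt if num_gt else 0.0,       # TP / G
        "R_qual": cov_sum / tp if tp else 0.0,           # sum(Cov) / TP
    }


# Example: 3 of 4 GT objects detected, with varying coverage.
scores = recall_measures([1.0, 0.8, 0.6], num_gt=4)
```

It follows directly from the formulas that $R_{G}=R_{quant}\cdot R_{qual}$: the global score separates into how many objects were found and how well they were covered.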

The Precision ($P$) computes the rate of detections that have a match in the GT. Similarly to $R$, we compute three measures: a global $P$, a quantitative $P$ that measures the proportion of valid detections (regardless of the matched area) among all detections, counted as the sum of $TP$ and $FP$, and a qualitative $P$ that corresponds to the rate of the detected text area with respect to the number of true positives ($TP$).

$P_{G}={\frac {\sum Acc}{TP+FP}}$
$P_{quant}={\frac {TP}{TP+FP}}$
$P_{qual}={\frac {\sum Acc}{TP}}$
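
The global and quantitative precision can be sketched in the same way. The Python snippet below is illustrative only: it assumes one Acc value per true positive, takes the number of false positives as a separate argument, and returns 0.0 when there are no detections. The name `precision_measures` is hypothetical.

```python
from typing import Dict, Sequence


def precision_measures(accuracies: Sequence[float], num_fp: int) -> Dict[str, float]:
    """Compute P_G and P_quant.

    accuracies: one Acc value per true positive, so TP = len(accuracies).
    num_fp:     FP, the number of detections without a match in the GT.
    """
    tp = len(accuracies)
    total = tp + num_fp  # total number of detections
    return {
        "P_G": sum(accuracies) / total if total else 0.0,   # sum(Acc) / (TP + FP)
        "P_quant": tp / total if total else 0.0,            # TP / (TP + FP)
    }


# Example: two valid detections and two false positives.
scores = precision_measures([1.0, 0.5], num_fp=2)
```

Every false positive increases the denominator of both measures, so spurious detections lower the scores even when the valid detections are perfectly accurate.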

Input format

The framework takes as input .txt files containing the coordinates of the bounding boxes surrounding the text objects and binary images corresponding to the text object masks.

Ground truth (GT)

The GT files contain the reference against which the detection and segmentation results are compared. For text detection tasks using bounding boxes, a .txt file is enough. If the text objects are represented by irregular masks, an additional labeled image is needed.

Text detection

The GT format contains the following attributes:

img name
image height, image width
text object

ID: unique text object ID
region ID: ID of the region to which the object belongs
"transcription": can be empty
text reject: option that decides whether a text object is counted or not; can be set to f (default, the object is counted) or t (the object is not taken into account)
x: x coordinate of the bounding box
y: y coordinate of the bounding box
width: width of the bounding box
height: height of the bounding box

e.g.:

img_1 960,1280 1,1,"Tiredness",f,38,43,882,172 2,2,"kills",f,275,264,390,186 3,3,"A",f,0,699,77,131 4,3,"short",f,128,705,355,134 5,3,"break",f,542,710,396,131 6,4,"could",f,87,884,370,137 7,4,"save",f,517,919,314,105 8,5,"your",f,166,1095,302,136 9,5,"life",f,530,1069,213,137
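
A GT entry like the one above can be read with a small Python parser. This is a sketch under stated assumptions rather than the reader used by EvaLTex: it assumes fields are separated by whitespace (spaces or newlines), that transcriptions contain no double quotes, and that all numeric fields are non-negative integers. The names `GTObject` and `parse_gt` are hypothetical, and the two image-size values are kept in file order without interpreting them.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# One GT text object: ID, region ID, "transcription", reject flag, x, y, width, height.
GT_OBJECT = re.compile(r'(\d+),(\d+),"([^"]*)",([ft]),(\d+),(\d+),(\d+),(\d+)')
# Image name followed by the two image-size values.
GT_HEADER = re.compile(r'\s*(\S+)\s+(\d+),(\d+)')


@dataclass
class GTObject:
    obj_id: int
    region_id: int
    transcription: str
    reject: bool  # True when the flag is "t" (object not taken into account)
    x: int
    y: int
    width: int
    height: int


def parse_gt(text: str) -> Tuple[str, Tuple[int, int], List[GTObject]]:
    """Parse one GT entry into (image name, size values, text objects)."""
    header = GT_HEADER.match(text)
    if header is None:
        raise ValueError("missing image name or image size")
    name = header.group(1)
    size = (int(header.group(2)), int(header.group(3)))  # kept in file order
    objects = []
    # Scan for object records after the header only.
    for m in GT_OBJECT.finditer(text, header.end()):
        obj_id, region_id, transcription, reject, x, y, w, h = m.groups()
        objects.append(GTObject(int(obj_id), int(region_id), transcription,
                                reject == "t", int(x), int(y), int(w), int(h)))
    return name, size, objects


sample = 'img_1 960,1280 1,1,"Tiredness",f,38,43,882,172 2,2,"kills",f,275,264,390,186'
name, size, objects = parse_gt(sample)
```

Matching whole records with a regular expression, instead of splitting on spaces, keeps transcriptions that contain spaces intact.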

Text segmentation

To evaluate text segmentation we use, in addition to the .txt file, a labeled image in which each character has a different label. Each GT object is a single character. Character-level GT objects cannot be grouped into regions, so each text object has its own region tag. The x, y, width and height attributes define the bounding box of each character.

e.g.

img_1 960,1280 1,1,"",f,384,43,101,166 2,2,"",f,142,44,46,164 3,3,"",f,38,47,106,163 4,4,"",f,192,80,71,126 5,5,"",f,269,80,100,131 6,6,"",f,501,81,97,126 7,7,"",f,721,81,97,131
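
The relation between a labeled image and the per-character bounding boxes can be sketched in Python. The representation used here is an assumption made for illustration: the labeled image is a 2D list of integers, 0 marks the background, (x, y) is the top-left pixel of the box, and width and height are counted in pixels. The function name `label_bounding_boxes` is hypothetical.

```python
from typing import Dict, List, Tuple


def label_bounding_boxes(labels: List[List[int]]) -> Dict[int, Tuple[int, int, int, int]]:
    """Map each non-zero label to its bounding box as (x, y, width, height)."""
    extents: Dict[int, List[int]] = {}  # label -> [min_x, min_y, max_x, max_y]
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            if label == 0:  # assumed background value
                continue
            if label not in extents:
                extents[label] = [x, y, x, y]
            else:
                e = extents[label]
                e[0] = min(e[0], x)
                e[1] = min(e[1], y)
                e[2] = max(e[2], x)
                e[3] = max(e[3], y)
    return {
        label: (min_x, min_y, max_x - min_x + 1, max_y - min_y + 1)
        for label, (min_x, min_y, max_x, max_y) in extents.items()
    }


# Tiny labeled image with two characters (labels 1 and 2).
image = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 2],
    [0, 0, 0, 2, 2],
]
boxes = label_bounding_boxes(image)
```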

Detection/Segmentation

The detection .txt file format differs slightly from the GT format:

no image size
no region tag
no reject option

e.g.

img_1 1,"",272,264,392,186 2,"",34,40,886,175 3,"",168,1082,300,148
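
A detection entry can be parsed along the same lines. The Python sketch below is not the EvaLTex reader: it assumes whitespace-separated fields, transcriptions without double quotes, and non-negative integer coordinates. The names `Detection` and `parse_detections` are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple

# One detection: ID, "transcription", x, y, width, height
# (no image size, region tag or reject option).
DET_OBJECT = re.compile(r'(\d+),"([^"]*)",(\d+),(\d+),(\d+),(\d+)')


class Detection(NamedTuple):
    obj_id: int
    transcription: str
    x: int
    y: int
    width: int
    height: int


def parse_detections(text: str) -> Tuple[str, List[Detection]]:
    """Parse one detection entry into (image name, detections)."""
    parts = text.split(None, 1)  # image name, then the rest of the entry
    if not parts:
        raise ValueError("empty detection entry")
    name = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    detections = []
    for m in DET_OBJECT.finditer(rest):
        obj_id, transcription, x, y, w, h = m.groups()
        detections.append(
            Detection(int(obj_id), transcription, int(x), int(y), int(w), int(h))
        )
    return name, detections


sample = 'img_1 1,"",272,264,392,186 2,"",34,40,886,175 3,"",168,1082,300,148'
name, detections = parse_detections(sample)
```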

Output

The output consists of a local evaluation (for each image) and a global evaluation (one XML file for the whole database).

Run the evaluation

Parameters to run the tool

Datasets

Downloads

Credits

EvaLTex was written by Ana Stefania CALARASANU. Please send any suggestions, comments or bug reports to calarasanu@lrde.epita.fr
