Difference between revisions of "Evaltex"
From LRDE
Revision as of 11:46, 7 July 2016
EvaLTex (Evaluating Text Localization) is a unified evaluation framework used to measure the performance of text detection and text segmentation algorithms. It takes as input text objects represented either by rectangle coordinates or by irregular masks. The output consists of a set of scores, at local and global levels, and a visual representation of the behavior of the analysed algorithm.
Input format
The framework takes as input .txt files containing the coordinates of the bounding boxes surrounding the text objects and binary images corresponding to the text object masks.
Ground truth (GT)
The GT files contain the reference against which the detection and segmentation results are compared. For text detection tasks using bounding boxes, a .txt file is enough. If the text objects are represented by irregular masks, an additional labeled image is needed.
Text detection
The GT format contains the following attributes:
- image name
- image height, image width
- text object
  - ID: unique text object ID
  - region ID: ID of the region to which the object belongs
  - "transcription": can be empty
  - text reject: flag that decides whether a text object is counted; set to f (default) to count it or to t to ignore it
  - x: x coordinate of the bounding box
  - y: y coordinate of the bounding box
  - width: width of the bounding box
  - height: height of the bounding box
e.g.:
img_1
960,1280
1,1,"Tiredness",f,38,43,882,172
2,2,"kills",f,275,264,390,186
3,3,"A",f,0,699,77,131
4,3,"short",f,128,705,355,134
5,3,"break",f,542,710,396,131
6,4,"could",f,87,884,370,137
7,4,"save",f,517,919,314,105
8,5,"your",f,166,1095,302,136
9,5,"life",f,530,1069,213,137
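The layout above can be read with a few lines of Python. This is only a minimal sketch, not part of EvaLTex: the names `TextObject` and `parse_gt` are hypothetical, and the two values on the second line are kept in file order without interpreting them.

```python
import csv
from dataclasses import dataclass


@dataclass
class TextObject:
    obj_id: int
    region_id: int
    transcription: str
    rejected: bool  # True when the "text reject" field is t
    x: int
    y: int
    width: int
    height: int


def parse_gt(text):
    """Parse the contents of one GT .txt file (hypothetical helper)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    name = lines[0]
    # Second line: the two image dimensions, kept in file order.
    size = tuple(int(v) for v in lines[1].split(","))
    objects = []
    # csv.reader copes with quoted transcriptions that contain commas.
    for row in csv.reader(lines[2:]):
        obj_id, region_id, transcription, reject, x, y, w, h = row
        objects.append(TextObject(int(obj_id), int(region_id), transcription,
                                  reject == "t", int(x), int(y), int(w), int(h)))
    return name, size, objects


sample = '''img_1
960,1280
1,1,"Tiredness",f,38,43,882,172
4,3,"short",f,128,705,355,134
'''
name, size, objects = parse_gt(sample)
print(name, size, len(objects))
```

Objects whose `rejected` field is true can then be filtered out before matching.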
Text segmentation
To evaluate text segmentation we use, in addition to the .txt file, a labeled image in which each character has a different label. Each GT object is a single character. Character-level GT objects cannot be grouped into regions, so each text object has its own region tag. The x, y, width and height define the bounding box of each character.
e.g.:
img_1
960,1280
1,1,"",f,384,43,101,166
2,2,"",f,142,44,46,164
3,3,"",f,38,47,106,163
4,4,"",f,192,80,71,126
5,5,"",f,269,80,100,131
6,6,"",f,501,81,97,126
7,7,"",f,721,81,97,131
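Because every character-level object must carry its own region tag, a segmentation GT file can be sanity-checked before evaluation. The snippet below is an illustrative sketch only; `shared_region_ids` is a hypothetical helper that works on (object ID, region ID) pairs already read from the file.

```python
from collections import Counter


def shared_region_ids(rows):
    """Return the region IDs used by more than one object.

    rows: iterable of (object_id, region_id) pairs.
    An empty result means every object has its own region tag.
    """
    counts = Counter(region_id for _obj_id, region_id in rows)
    return sorted(r for r, n in counts.items() if n > 1)


# Character-level GT: one region per object, so nothing is shared.
print(shared_region_ids([(1, 1), (2, 2), (3, 3)]))
# Word-level GT: objects 3, 4 and 5 all belong to region 3.
print(shared_region_ids([(3, 3), (4, 3), (5, 3)]))
```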
Mask representation
In addition to bounding boxes, text objects can also be represented using masks. EvaLTex takes as input binary images (white corresponds to text).
TODO add images
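To relate a mask to the bounding-box attributes used in the .txt files, one can compute the tight box around the white pixels. This is a sketch under assumptions: the mask is given as a list of pixel rows, white is encoded as 255 (an 8-bit convention not stated on this page), and `mask_bounding_box` is a hypothetical name.

```python
def mask_bounding_box(mask, white=255):
    """Return (x, y, width, height) of the white pixels, or None if there are none.

    mask: list of rows, each row a list of pixel values.
    white: value assumed to mark text pixels.
    """
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value == white:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


mask = [
    [0,   0,   0, 0],
    [0, 255, 255, 0],
    [0,   0, 255, 0],
]
print(mask_bounding_box(mask))
```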
Detection
Output
The output consists of a local evaluation (one per image) and a global evaluation (one XML file for the whole database).
TODO add local XML file and global XML file
Performance measurements
Local evaluation
For each matched GT object we assign two quality measures, Coverage (Cov) and Accuracy (Acc):
- Cov is the ratio of the matched area to the GT object area
- Acc is the ratio of the matched area to the detection area
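For rectangular objects, the two measures can be sketched as follows. The code assumes the simplest case: one GT box matched to one detection box, both given as (x, y, width, height) with non-zero area, and the matched area taken as their intersection. The function names are hypothetical and other match types are not covered.

```python
def intersection_area(a, b):
    """Area shared by two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)


def coverage_accuracy(gt, det):
    """Return (Cov, Acc) for one GT box matched to one detection box."""
    matched = intersection_area(gt, det)
    cov = matched / (gt[2] * gt[3])    # matched area over GT area
    acc = matched / (det[2] * det[3])  # matched area over detection area
    return cov, acc


# A detection covering the left half of the GT box and nothing else.
print(coverage_accuracy((0, 0, 10, 10), (0, 0, 5, 10)))
```

A detection that lies entirely inside the GT box has Acc equal to 1 while Cov reflects how much of the GT object was found.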
Recall
The Recall computes the amount of detected text. We compute 3 measures: a global Recall, a quantitative Recall that measures the number of detected objects (regardless of the matched area), and a qualitative Recall that corresponds to the rate of the detected text area with respect to the number of true positives.
Precision
The Precision computes the rate of detections that have a match in the GT. Similarly to the Recall, we compute 3 measures: a global Precision, a quantitative Precision that measures the number of valid detections (regardless of the matched area), and a qualitative Precision that corresponds to the rate of the detected text area with respect to the number of total detections, computed as the sum of true positives and false positives.
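The Recall and Precision descriptions can be illustrated with a small sketch. The formulas on this page are not fully legible, so everything here is an assumption rather than the EvaLTex definition: the quantitative score is taken as a count ratio, the qualitative score as the mean Cov (or Acc) over the true positives, and the global score as their product. The function names are hypothetical.

```python
def recall_scores(coverages, n_gt):
    """coverages: Cov of each matched GT object; n_gt: number of GT objects."""
    tp = len(coverages)
    quantitative = tp / n_gt if n_gt else 0.0      # share of GT objects detected
    qualitative = sum(coverages) / tp if tp else 0.0  # mean Cov of the matches
    return {"quantitative": quantitative,
            "qualitative": qualitative,
            "global": quantitative * qualitative}  # assumed combination


def precision_scores(accuracies, n_false_positives):
    """accuracies: Acc of each valid detection; n_false_positives: unmatched detections."""
    tp = len(accuracies)
    total = tp + n_false_positives                 # total number of detections
    quantitative = tp / total if total else 0.0
    qualitative = sum(accuracies) / tp if tp else 0.0  # assumed by symmetry with Recall
    return {"quantitative": quantitative,
            "qualitative": qualitative,
            "global": quantitative * qualitative}  # assumed combination


print(recall_scores([1.0, 0.5], n_gt=4))
print(precision_scores([1.0, 0.5], n_false_positives=2))
```

Separating the two factors shows whether a low global score comes from missed objects or from poorly covered ones.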
How to compute the measurements
Parameters to run the tool