Difference between revisions of "Publications/lazzara.13.ijdar"

Revision as of 18:51, 4 January 2018

Authors: Guillaume Lazzara, Thierry Géraud
Journal: International Journal of Document Analysis and Recognition (IJDAR)
Type: article
Publisher: Springer Berlin Heidelberg
Projects: Olena
Date: 2013-04-25

Abstract

This work is focused on the most commonly used binarization method: Sauvola's. It performs relatively well on classical documents. However, three main defects remain: the window parameter of Sauvola's formula do not fit automatically to the content; it is not robust to low contrasts; it is non-invariant with respect to contrast inversion. Thus, on documents such as magazines, the content may not be retrieved correctly which is crucial for indexing purpose. In this paper, we describe how to implement an efficient multiscale implementation of Sauvola's algorithm in order to guarantee good binarization for both small and large objects inside a single document without adjusting the window size to the content. We also describe on how to implement it in an efficient way, step by step. Pixel-based accuracy and OCR evaluations are performed on more than 120 documents. This implementation remains notably fast compared to the original algorithm. For fixed parameters, text recognition rates and binarization quality are equal or better than other methods on small and medium text and is significantly improved on large text. Thanks to the way it is implemented, it is also more robust on textured text and on image binarization. This implementation extends the robustness of Sauvola's algorithm by making the results almost insensible to the window size whatever the object sizes. Its properties make it usable in full document analysis toolchains.

Documents

Paper

Resources

Evaluation dataset

Online Demo

Source code (under GPL v2 licence) :

git clone git://git.lrde.epita.fr/olena -b papers/lazzara.13.ijdar && cd olena/scribo/scribo/binarization

Evaluation outputs (detailed hereinafter).
Evaluation outputs (detailed hereinafter).

This algorithm is also part of the generic image processing platform Olena

Outputs and Scores on classical datasets

Pixel-based Accuracy Evaluation

1. Sauvola MS_k

Dataset	Recall	p-Recall	Precision	F-measure (± std)	p-FM (± std)	PSNR	DRD	MPM	Outputs
DIBCO'09	88.4088	-	76.3713	78.08 (± 18.498)	-	15.2386	14.084	4.5068	Outputs	Outputs
H-DIBCO'10	52.9688	63.2666	83.6741	61.17 (± 25.326)	69.45 (± 26.880)	14.72	9.510	1.82	Outputs	Outputs
DIBCO'11	79.3667	-	83.9283	79.31 (± 11.711)	-	15.33	8.156	11.5355	Outputs	Outputs
H-DIBCO'12	66.5586	73.3397	88.0996	69.56 (± 18.316)	74.78 (± 17.063)	15.12	10.368	2.79	Outputs	Outputs
CMATERdb 6.1	82.1535	-	95.2257	87.20 (± 8.872)	-	17.10	5.265	0.2405	Outputs	Outputs

2. Sauvola MS_kx

Dataset	Recall	p-Recall	Precision	F-measure (± std)	pF-measure (± std)	PSNR	DRD	MPM	Outputs
DIBCO'09	95.1858	-	69.2059	76.85 (± 20.277)	-	14.52	18.516	8.9698	Outputs	Outputs
H-DIBCO'10	77.1109	88.8740	88.3846	80.03 (± 9.269)	87.07 (± 10.262)	16.36	6.896	3.42	Outputs	Outputs
DIBCO'11	89.7931	-	75.0818	79.70 (± 12.689)	-	14.91	11.669	20.4354	Outputs	Outputs
H-DIBCO'12	83.7305	84.6084	91.8082	81.77 (± 8.765)	86.41 (± 9.768)	16.51	8.369	5.34	Outputs	Outputs
CMATERdb 6.1	91.4903	-	91.6986	91.34 (± 4.232)	-	18.18	3.775	0.8408	Outputs	Outputs

These evaluation shows that our multiscale algorithm is equivalent to the classical Sauvola on historical documents. This result was expected as this algorithm is designed for magazines.

Outputs and Scores on LRDE's Document Binarization Dataset

Results have been produced using LRDE's Document Binarization Dataset (DBD) version 1.0.

Please refer to the article to know how they have been produced and what are the different document qualities.

Binary Outputs

Method	Clean documents	Scanned documents	Original documents
Sauvola	Outputs	Outputs	Outputs	Outputs	Outputs	Outputs
Sauvola MS_kx	Outputs	Outputs	Outputs	Outputs	Outputs	Outputs
Sauvola MS_k	Outputs	Outputs	Outputs	Outputs	Outputs	Outputs
Wolf	Outputs	Outputs	Outputs	Outputs	Outputs	Outputs
Otsu	Outputs	Outputs	Outputs	Outputs	Outputs	Outputs
Niblack	Outputs	Outputs	Outputs	Outputs	Outputs	Outputs
Kim	Outputs	Outputs	Outputs	Outputs	Outputs	Outputs
TMMS	Outputs	Outputs	Outputs	Outputs	Outputs	Outputs
Sauvola MsGb	Outputs	Outputs	Outputs	Outputs	Outputs	Outputs
Su 2011	Outputs	Outputs	Outputs (1)	Outputs (1)	Outputs (2)	Outputs (2)
Lelore	Outputs	Outputs	Outputs	Outputs	Outputs	Outputs

(1) Due to program crashes, one output image is missing (page 103).

(2) Due to program crashes, one output image is missing (pages 43, 46, 80, 90, 94, 151)

Pixel-based Accuracy Evaluation

Method	Precision	Recall	F-Measure	Time (s)
Sauvola MS_kx	0.97	0.94	94.97	170
Lelore	0.99	0.88	92.9	1625
Sauvola MS_k	0.97	0.89	92.10	170
TMMS	0.90	0.95	91.97	250
Wolf	0.99	0.85	91.38	125
Otsu	0.98	0.84	90.33	67
Sauvola	0.99	0.82	89.67	155
Kim	0.99	0.82	89.34	260
Sauvola MsGb	0.99	0.82	89.29	31h
Niblack	0.89	0.91	88.79	170
Su 2011	0.98	0.80	87.3	8800

This evaluation shows that state-of-the-art implementations, like Su 2011, do not handle magazines correctly. The main issues encountered by all the methods are related to large objects and the fact that parameters do not fit to the contents. Our multiscale approach succeeds in improving classical Sauvola algorithm results because large objects are correctly retrieved. The computation time corresponds to the approximate total time needed to compute the whole 125 document binarization. All time measures have been performed on the same computer with an Intel Xeon W3520@2,67Ghz with 6GB of RAM, except for Lelore' method of which time was measured on an Intel i7 860@2,8Ghz.

OCR-based Evaluation

Method	OCR error (%)
Set →	Clean documents			Scanned documents			Original documents
Subset →	S	M	L	S	M	L	S	M	L
Sauvola	2.62	2.61	6.00	5.49	3.87	7.75	2.62	2.61	6.90
Sauvola Ms_kx	2.59	2.21	4.83	5.14	2.74	5.68	2.54	2.28	5.25
Sauvola Ms_k	2.64	2.60	4.78	5.44	3.20	5.15	2.64	2.53	5.41
Wolf	2.60	2.42	5.04	5.14	3.43	6.53	2.61	2.52	6.48
Otsu	3.09	2.55	4.56	6.23	3.58	5.73	2.95	2.90	5.79
Niblack	2.68	2.28	6.79	4.96	5.15	12.79	2.68	2.28	7.80
Kim	2.79	3.01	5.47	7.03	5.08	7.80	2.78	3.01	6.37
TMMS	2.61	2.43	5.25	18.17	11.44	54.83	2.61	2.45	5.84
Sauvola MsGb	5.45	5.14	9.29	9.49	8.40	10.35	6.14	3.93	6.00
Su 2011	2.95	5.01	15.39	7.42	8.54 (3)	31.58 (3)	3.09 (4)	6.24 (4)	17.87 (4)
Lelore	2.46	2.21	4.88	8.01	3.44	8.65	2.51	2.22	4.88

(3) Due to program crashes, those scores do not include the results of 2 medium text lines out of 320 and of 57 small text lines out of 9551.

(4) Due to program crashes, those scores do not include the results of 9 large text lines out of 123, 9 medium text lines out of 320, 432 small text lines out of 9551.

References

[Fabrizio2009] J. Fabrizio, B. Marcotegui, M. Cord, Text segmentation in natural scenes using Toggle-Mapping, in the proceedings of the International Conference on Image Processing (ICIP), 2009, pp.2373-2376. DOI: 10.1109/ICIP.2009.5413435
[Farrahi2010] R. Farrahi Moghaddam, M. Cheriet, A multi-scale framework for adaptive binarization of degraded document images, Pattern Recognition 43, 2010, 2186-2198. DOI: 10.1016/j.patcog.2009.12.024 website.
[Kim2004] In-Jung Kim, Multi-window binarization of camera image for document recognition, in the proceedings of the International Workshop on Frontiers in Handwriting Recognition, 2004, pp. 323- 327. DOI: 10.1109/IWFHR.2004.70
[Lelore2011] T. Lelore, F. Bouchara, Super-resolved binarization of text based on the FAIR algorithm, in the proceedings of the International Conference on Document Analysis and Recognition, 2011, pp.839-843. DOI: 10.1109/ICDAR.2011.172
[Niblack1985] W. Niblack, An introduction to digital image processing, Strandberg Publishing Company Birkeroed, 1985, ISBN: 87-872-0055-4.
[Otsu1975] N. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man and Cybernetics, Vol. 9, No. 1., January 1979, pp. 62-66.
[Sauvola2000] J. Sauvola, M. Pietikäinen, Adaptive document image binarization, Pattern Recognition, Volume 33, Issue 2, February 2000, Pages 225-236, ISSN 0031-3203, DOI: 10.1016/S0031-3203(99)00055-2.
[Su2011] B. Su, S. Lu, C. L. Tan, Binarization of historical document images using the local maximum and minimum, in the proceedings of the International Workshop on Document Analysis Systems, pages 159-166, 2010, ISBN: 978-1-60558-773-8, DOI: 10.1145/1815330.1815351 website.
[Wolf2004] C. Wolf, J.-M. Jolion, extraction and recognition of artificial text in multimedia documents, Pattern Analysis & Applications, Springer London, Issn: 1433-7541, pp.309-326, volume: 6, issue: 4, DOI: 10.1007/s10044-003-0197-7.

Bibtex (lrde.bib)

@Article{	  lazzara.13.ijdar,
  author	= {Guillaume Lazzara and Thierry G\'eraud},
  title		= {Efficient Multiscale {S}auvola's Binarization},
  journal	= {International Journal of Document Analysis and Recognition
		  (IJDAR)},
  year		= 2014,
  month		= jun,
  volume	= 17,
  number	= 2,
  pages		= {105--123},
  publisher	= {Springer Berlin Heidelberg},
  doi		= {10.1007/s10032-013-0209-0},
  abstract	= {This work is focused on the most commonly used
		  binarization method: Sauvola's. It performs relatively well
		  on classical documents. However, three main defects remain:
		  the window parameter of Sauvola's formula do not fit
		  automatically to the content; it is not robust to low
		  contrasts; it is non-invariant with respect to contrast
		  inversion. Thus, on documents such as magazines, the
		  content may not be retrieved correctly which is crucial for
		  indexing purpose. In this paper, we describe how to
		  implement an efficient multiscale implementation of
		  Sauvola's algorithm in order to guarantee good binarization
		  for both small and large objects inside a single document
		  without adjusting the window size to the content. We also
		  describe on how to implement it in an efficient way, step
		  by step. Pixel-based accuracy and OCR evaluations are
		  performed on more than 120 documents. This implementation
		  remains notably fast compared to the original algorithm.
		  For fixed parameters, text recognition rates and
		  binarization quality are equal or better than other methods
		  on small and medium text and is significantly improved on
		  large text. Thanks to the way it is implemented, it is also
		  more robust on textured text and on image binarization.
		  This implementation extends the robustness of Sauvola's
		  algorithm by making the results almost insensible to the
		  window size whatever the object sizes. Its properties make
		  it usable in full document analysis toolchains.}
}

@@ Line 7: / Line 7: @@
 | volume = 17
 | number = 2
-| pages = 105 to 123
+| pages = 105–123
 | publisher = Springer Berlin Heidelberg
 | abstract = This work is focused on the most commonly used binarization method: Sauvola's. It performs relatively well on classical documents. However, three main defects remain: the window parameter of Sauvola's formula do not fit automatically to the content; it is not robust to low contrasts; it is non-invariant with respect to contrast inversion. Thus, on documents such as magazines, the content may not be retrieved correctly which is crucial for indexing purpose. In this paper, we describe how to implement an efficient multiscale implementation of Sauvola's algorithm in order to guarantee good binarization for both small and large objects inside a single document without adjusting the window size to the content. We also describe on how to implement it in an efficient way, step by step. Pixel-based accuracy and OCR evaluations are performed on more than 120 documents. This implementation remains notably fast compared to the original algorithm. For fixed parameters, text recognition rates and binarization quality are equal or better than other methods on small and medium text and is significantly improved on large text. Thanks to the way it is implemented, it is also more robust on textured text and on image binarization. This implementation extends the robustness of Sauvola's algorithm by making the results almost insensible to the window size whatever the object sizes. Its properties make it usable in full document analysis toolchains.

Difference between revisions of "Publications/lazzara.13.ijdar"

From LRDE

Revision as of 18:51, 4 January 2018

Contents

Abstract

Documents

Resources

Outputs and Scores on classical datasets

Pixel-based Accuracy Evaluation

Outputs and Scores on LRDE's Document Binarization Dataset

Binary Outputs

Pixel-based Accuracy Evaluation

OCR-based Evaluation

References

Bibtex (lrde.bib)