A novel method for segmenting and straightening of text lines in handwritten Telugu documents based on smearing and regression approach

  • Abstract

    In handwritten document images, segmenting text lines is very challenging because of variability in intra-baseline skew and in the inter-line distance between text lines. So far, no work has been reported in the literature on straightening handwritten Telugu text lines. Telugu is one of the most popular languages of India and is spoken by more than 80 million people, especially in South India. Telugu characters are mostly compound characters, which is why straightening Telugu documents is more challenging than straightening documents in European languages. This paper introduces a novel approach for segmenting and straightening text lines of handwritten Telugu documents based on a smearing and regression approach (SRA). The method first performs preprocessing and estimates parameters by dividing the Telugu script into connected components. A horizontal and vertical run-length smearing algorithm is then used to shape text lines. To identify text lines more precisely, cubic polynomial regression is used between the vertical midpoints of two blocks of compound handwritten Telugu characters. A simple logic derived from this regression completes the process. We tested the proposed algorithm on 1000 handwritten documents of three different kinds. The performance of the proposed method is evaluated using matchScore, detection rate, recognition accuracy and F-measure. The experimental results indicate the efficiency of the proposed method over existing methods.



  • Keywords

    Telugu Languages Text Lines; Compound Characters; Run Length Smearing; Cubic Polynomial Regression.


Article ID: 13286
DOI: 10.14419/ijet.v7i3.13286

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.