A novel method for segmenting and straightening of text lines in handwritten Telugu documents based on smearing and regression approach

  • Authors

    • MSLB. Subrahmanyam, JNTU Kakinada
    • V Vijaya Kumar
    • B Eswara Reddy
    2018-08-22
    https://doi.org/10.14419/ijet.v7i3.13286
  • Telugu Languages Text Lines, Compound Characters, Run Length Smearing, Cubic Polynomial Regression.
  • Segmenting text lines in handwritten document images is a very challenging task for reasons such as variability in baseline skew within a line and in the distance between adjacent text lines. So far, no work has been reported in the literature on straightening text lines in handwritten Telugu documents. Telugu is one of the most popular languages of India and is spoken by more than 80 million people, especially in South India. Telugu characters are mostly compound characters, which is why straightening Telugu documents is more challenging than straightening documents in European languages. This paper introduces a novel smearing and regression approach (SRA) for segmenting and straightening text lines of handwritten Telugu documents. The method first performs preprocessing and estimates parameters by dividing the Telugu script into connected components. A horizontal and vertical run-length smearing algorithm is then used to shape the text lines. To identify text lines more precisely, cubic polynomial regression is applied between the vertical midpoints of two blocks of compound handwritten Telugu characters. A simple logic derived from this completes the process. We tested the proposed algorithm on 1000 handwritten documents of three different kinds. The performance of the proposed method is evaluated using matchScore, detection rate, recognition accuracy and F-measure. The experimental results indicate the efficiency of the proposed method over existing methods.
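
The two building blocks named in the abstract, run-length smearing and cubic polynomial regression, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`smear_row`, `rlsa`, `fit_cubic`, `eval_cubic`), the thresholds, and the choice to combine the horizontal and vertical passes with a logical AND (as in the classic run-length smearing algorithm) are all assumptions made for illustration. The image is assumed to be a binary grid with 1 for ink and 0 for background.

```python
from typing import List, Sequence

def smear_row(row: Sequence[int], threshold: int) -> List[int]:
    """Fill background runs (0s) of length <= threshold lying between two ink pixels (1s)."""
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, px in enumerate(row):
        if px == 1:
            if last_ink is not None and 0 < i - last_ink - 1 <= threshold:
                for j in range(last_ink + 1, i):
                    out[j] = 1
            last_ink = i
    return out

def rlsa(image: Sequence[Sequence[int]], h_threshold: int, v_threshold: int) -> List[List[int]]:
    """Horizontal and vertical smearing, combined with AND (assumed combination rule)."""
    horiz = [smear_row(r, h_threshold) for r in image]
    cols = [smear_row(c, v_threshold) for c in zip(*image)]  # smear each column
    vert = [list(r) for r in zip(*cols)]                     # transpose back to rows
    return [[a & b for a, b in zip(hr, vr)] for hr, vr in zip(horiz, vert)]

def fit_cubic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares coefficients [c0, c1, c2, c3] of y = c0 + c1*x + c2*x^2 + c3*x^3.

    Needs at least four distinct x values; otherwise the system is singular.
    """
    n = 4
    # Augmented normal equations (A^T A | A^T y), where A[i][k] = xs[i] ** k.
    m = [
        [sum(x ** (r + c) for x in xs) for c in range(n)]
        + [sum(y * x ** r for x, y in zip(xs, ys))]
        for r in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def eval_cubic(coeffs: Sequence[float], x: float) -> float:
    """Evaluate the fitted cubic at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))
```

In a pipeline of this kind, `xs` and `ys` would hold the horizontal positions and vertical midpoints of the smeared blocks on one text line, and the fitted curve would serve as an estimate of that line's baseline. How the paper selects the blocks and thresholds is not stated in the abstract, so those details are left open here.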


  • How to Cite

    Subrahmanyam, M., Vijaya Kumar, V., & Eswara Reddy, B. (2018). A novel method for segmenting and straightening of text lines in handwritten Telugu documents based on smearing and regression approach. International Journal of Engineering & Technology, 7(3), 1846-1853. https://doi.org/10.14419/ijet.v7i3.13286