A Comprehensive Framework for OCR Web Services System for Arabic Calligraphy Documents

 
 
 
  • Abstract


    This paper describes a web services approach to document layout analysis for OCR systems, designed for integration with web-based applications through SOAP and REST interfaces. The proposed solution provides a common way to access different OCR systems: the web services are exposed through SOAP and REST interfaces over HTTP or HTTPS requests. Consequently, developers can interoperate without time-consuming code customization and without the barriers imposed by differing operating systems and programming languages.
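    The REST side of this design can be illustrated with a minimal sketch. The code below is not the paper's implementation. It assumes a hypothetical endpoint `POST /ocr/<engine>` that accepts raw page-image bytes and returns JSON. The two engines registered in `ENGINES` are hypothetical stubs standing in for real OCR systems. Only the Python standard library (`wsgiref`, `json`) is used, so any HTTP client, on any operating system and in any language, could call it.

```python
import json
from wsgiref.util import setup_testing_defaults  # used when exercising the app without a server


# Hypothetical OCR back ends: stubs that only report what they received.
def _stub_arabic_engine(image_bytes):
    return {"engine": "arabic-stub", "bytes_received": len(image_bytes)}


def _stub_latin_engine(image_bytes):
    return {"engine": "latin-stub", "bytes_received": len(image_bytes)}


# Registry mapping a URL path segment to an OCR back end (names are illustrative).
ENGINES = {
    "arabic": _stub_arabic_engine,
    "latin": _stub_latin_engine,
}


def _json(start_response, status, payload, extra_headers=()):
    """Serialize `payload` as a JSON HTTP response."""
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(body)))]
    headers.extend(extra_headers)
    start_response(status, headers)
    return [body]


def ocr_app(environ, start_response):
    """WSGI application exposing POST /ocr/<engine> as a REST endpoint."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) != 2 or parts[0] != "ocr":
        return _json(start_response, "404 Not Found",
                     {"error": "use POST /ocr/<engine>"})
    if environ.get("REQUEST_METHOD") != "POST":
        return _json(start_response, "405 Method Not Allowed",
                     {"error": "only POST is supported"}, [("Allow", "POST")])

    engine = ENGINES.get(parts[1])
    if engine is None:
        return _json(start_response, "404 Not Found",
                     {"error": f"unknown engine {parts[1]!r}"})

    # Read exactly CONTENT_LENGTH bytes of the uploaded page image.
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    image_bytes = environ["wsgi.input"].read(length)
    return _json(start_response, "200 OK", engine(image_bytes))
```

    For local experiments the app could be served with `wsgiref.simple_server.make_server("", 8000, ocr_app).serve_forever()` and called with, for example, `curl -X POST --data-binary @page.png http://localhost:8000/ocr/arabic`. Keeping every engine behind the same request/response contract is what lets clients switch OCR systems without changing their own code; HTTPS termination and a SOAP binding are outside the scope of this sketch.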

     

    The scientific scope of this paper covers three aspects:

    (1) the document categories included in the dataset, (2) the algorithms used at the document analysis level, and (3) the algorithms used for Arabic document image segmentation. In particular, the connected components method is used to remove page frames from old and calligraphy documents, and shadow noise in old and historical documents is removed using an adapted sparse algorithm.
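    The connected-components frame-removal step can be sketched as follows. The exact criterion used in the paper is not given, so this version assumes a simple heuristic. It labels 8-connected ink components with a breadth-first search. It then treats any component whose bounding box spans at least `min_span` of both the page height and the page width as the frame; `min_span` is a hypothetical parameter, 0.9 by default. The input is assumed to be an already binarized page, given as a list of rows of 0/1 values.

```python
from collections import deque


def remove_page_frame(img, min_span=0.9):
    """Return a copy of a binary image (1 = ink) with frame-like components erased.

    Assumption: a component is the page frame if its bounding box covers at
    least `min_span` of both the page height and the page width.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    out = [row[:] for row in img]  # leave the input untouched

    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first search collects one 8-connected component.
            comp = []
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))

            # Bounding-box extent as a fraction of the page size.
            ys = [p[0] for p in comp]
            xs = [p[1] for p in comp]
            span_h = (max(ys) - min(ys) + 1) / h
            span_w = (max(xs) - min(xs) + 1) / w
            if span_h >= min_span and span_w >= min_span:
                for y, x in comp:
                    out[y][x] = 0  # erase the frame component
    return out
```

    A known limitation of this heuristic is that text strokes touching the frame join its component and are erased with it, so a practical version would need to separate such strokes before the size test.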

    This paper discusses several major areas in which OCR web services are applied comprehensively: supporting document analysis and service-oriented architecture (SOA) computing for OCR. With the OCR web services approach, we deal with heterogeneous, large-scale document collections whose structure varies widely across categories. Furthermore, a single multipage document may contain several languages; accordingly, the language domain is identified by the language script specification module.
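    As a simplified illustration of per-page language routing in a multipage document, the sketch below guesses each page's dominant script from already-recognized text. It counts alphabetic characters by the first word of their Unicode name (for example `ARABIC` or `LATIN`). This is an assumption-laden stand-in, not the paper's language script specification module, which this section does not detail and which may operate on the page images themselves; `dominant_script` and `split_pages_by_script` are hypothetical names.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most frequent script among alphabetic characters, or None.

    Simplification: the script is taken as the first word of each character's
    Unicode name, e.g. 'ARABIC LETTER MEEM' -> 'ARABIC'.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def split_pages_by_script(pages):
    """Group the page indices of a multipage document by dominant script."""
    groups = {}
    for i, page_text in enumerate(pages):
        groups.setdefault(dominant_script(page_text), []).append(i)
    return groups
```

    For example, `split_pages_by_script(["كتاب عربي", "hello", "مرحبا"])` groups the pages as `{"ARABIC": [0, 2], "LATIN": [1]}`, so each group could then be sent to a script-specific OCR service.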

     


  • Keywords


    Arabic; document analysis; connected component; sparse; segmentation; OCR web services.



 


Article ID: 28084
 
DOI: 10.14419/ijet.v8i1.11.28084




Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.