A Comprehensive Framework for OCR Web Services System for Arabic Calligraphy Documents

Hassanin M. Al-Barhamtoshy; Abdullah S. Al-Ghamdi

doi:10.14419/ijet.v8i1.11.28084


2019-03-01

Keywords: Arabic, document analysis, connected component, sparse, segmentation, OCR web services.
This paper describes a web services approach to document layout analysis for OCR systems that integrate with web-based applications through SOAP and REST interfaces. The proposed solution provides a uniform way to access different OCR systems: the web services are implemented with SOAP and REST interfaces and are reached through HTTP or HTTPS requests. Consequently, developers can interoperate without spending time customizing code and without operating system or programming language barriers.
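As a minimal sketch of what calling such a service might look like from a client, the Python below builds the same recognition call twice: once as a REST-style JSON POST and once as a SOAP 1.1 envelope. The endpoint URL, the service namespace, and the operation and field names (`Recognize`, `Language`, `Image`) are assumptions made for illustration; the abstract does not specify the actual interface. Nothing is sent over the network.

```python
import base64
import json
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint and names; the paper's real interface is not given here.
BASE_URL = "https://ocr.example.org/api"
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:ocr"  # assumed service namespace


def build_rest_request(image: bytes, language: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a REST-style OCR call."""
    payload = json.dumps({
        "language": language,
        "image": base64.b64encode(image).decode("ascii"),
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/recognize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_soap_envelope(image: bytes, language: str) -> bytes:
    """Wrap the same call in a SOAP 1.1 envelope and return it as UTF-8 XML."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}Recognize")
    ET.SubElement(call, f"{{{SVC_NS}}}Language").text = language
    ET.SubElement(call, f"{{{SVC_NS}}}Image").text = base64.b64encode(image).decode("ascii")
    return ET.tostring(envelope, encoding="utf-8")
```

Either payload would travel over HTTP or HTTPS, which is what lets clients written in any language or running on any operating system share one OCR back end.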
The scientific scope of this paper focuses on three objectives:
(1) the document categories included in the dataset, (2) the algorithms used at the document analysis level, and (3) the Arabic document image segmentation algorithms used. The connected components method is used to remove the page frame in old and calligraphy documents. In addition, shadow noise in old and historical documents is removed using the adapted sparse algorithm.
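To illustrate the idea of connected-component page frame removal, the sketch below labels 4-connected ink regions in a small binary image and erases every region that touches the image border. The border-touching rule is an assumed heuristic chosen for illustration, and the function name `remove_border_components` is hypothetical; the abstract does not state which criteria the authors apply to decide that a component belongs to the frame.

```python
from collections import deque


def remove_border_components(image):
    """Return a copy of a binary image (list of lists, 1 = ink) in which every
    4-connected component that touches the image border has been erased."""
    rows, cols = len(image), len(image[0])
    result = [row[:] for row in image]
    seen = [[False] * cols for _ in range(rows)]

    for r0 in range(rows):
        for c0 in range(cols):
            if image[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Breadth-first search collects one connected component.
            component, touches_border = [], False
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                if r in (0, rows - 1) or c in (0, cols - 1):
                    touches_border = True
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and image[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Assumed heuristic: a component reaching the border is frame, not text.
            if touches_border:
                for r, c in component:
                    result[r][c] = 0
    return result


# A 5x5 page: a one-pixel frame around a single ink pixel in the centre.
page = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
cleaned = remove_border_components(page)
```

In this toy case the frame is erased and only the centre pixel survives. A real system would more likely filter components by size, aspect ratio, or position rather than by border contact alone.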
This paper discusses several major areas in which OCR web services have been applied comprehensively: supporting document analysis and OCR service-oriented architecture computing. With the OCR web services approach, we handle heterogeneous, large-scale documents whose structure varies widely across categories. Documents may also be multipage and contain different languages; accordingly, the language domain is identified by the language script specification module.
How to Cite
Al-Barhamtoshy, H. M., & Al-Ghamdi, A. S. (2019). A Comprehensive Framework for OCR Web Services System for Arabic Calligraphy Documents. International Journal of Engineering & Technology, 8(1.11), 16-24. https://doi.org/10.14419/ijet.v8i1.11.28084