Generating realistic Arabic handwriting dataset
https://doi.org/10.14419/ijet.v8i4.29786
Received date: August 25, 2019
Accepted date: October 5, 2019
Published date: October 19, 2019
Keywords: Arabic handwriting, normalization, ligatures, template learning, Gaussian regression.
Abstract
In recent years, holistic approaches have shown satisfactory results for Arabic handwritten word recognition, avoiding the need to segment words into letters. In this paper, we present an efficient system for generating a realistic Arabic handwriting dataset from ASCII input text. We carefully selected a list of sample words that covers most Arabic letters in both their normal and ligature connection forms. To improve the reproduction of new letters, we developed a normalization method that adapts its clustering process to the Arabic letter families we defined. We enhanced the Gaussian Mixture Model (GMM) process for learning letter templates by using the Ramer-Douglas-Peucker algorithm to determine the number and positions of the Gaussian components, which improves the reproduction of new letter shapes by Gaussian Mixture Regression (GMR). We also learn the translation distance between word parts so that generated words have a realistic handwritten shape. A recognizer combining LSTM and CTC layers is used to validate the approach; the experimental results show that the generated Arabic handwritten words are realistic and inherit the user's handwriting style.
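To illustrate the role the Ramer-Douglas-Peucker step might play, the following is a minimal sketch, not the authors' implementation. It simplifies a pen trajectory and treats the retained vertices as candidate positions for the Gaussian components, with their count as the number of components. The abstract does not spell out that mapping, so it is an assumption here, as are the names `rdp` and `perpendicular_distance`, the tolerance `epsilon`, and the sample stroke.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the line through start and end.

    Falls back to the distance to start when start and end coincide.
    """
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification of a polyline (list of (x, y))."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior point farthest from the chord start-end.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > dmax:
            index, dmax = i, d
    # If every interior point is within tolerance, keep only the endpoints.
    if dmax <= epsilon:
        return [start, end]
    # Otherwise split at the farthest point and simplify both halves.
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


# Hypothetical pen trajectory for one letter stroke (illustrative values only).
stroke = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]

# Assumption: retained vertices seed the Gaussian component means,
# and their count sets the number of mixture components.
key_points = rdp(stroke, epsilon=1.0)
n_components = len(key_points)
```

Under this assumption, `key_points` and `n_components` could be passed as the initial means and component count to whichever GMM fitting routine is used, instead of choosing the count by a model-selection criterion.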
How to Cite
Abdalla, M. I., Rashwan, M. A., & Elserafy, M. A. (2019). Generating realistic Arabic handwriting dataset. International Journal of Engineering and Technology, 8(4), 460-466. https://doi.org/10.14419/ijet.v8i4.29786
