Authorship Identification of Punjabi Poetry

  • Authors

    • A. Pandian
    • Stephen Wahid
    • Yash Tokas
    • V. V. Ramalingam
    2018-11-27
    https://doi.org/10.14419/ijet.v7i4.19.21987
  • Authorship Identification, Punjabi poetry corpus, Feature extraction, J48 Decision Tree, Bayes Net Classifier, Naive Bayes Classifier
  • Authorship identification is the problem of determining the author of an anonymous text. From a machine-learning point of view, it is a single-label text-categorization task. The underlying assumption is that the author of an unknown text can be identified by comparing a few lexical features extracted from that text with the same features extracted from texts by known authors. In this paper, authorship identification is performed on a Punjabi poetry dataset consisting of poems written by 5 different poets. Features broadly categorised as statistical (word count, character count, etc.), syntactic (i.e. lexical) and semantic (language-dependent) are first selected using the J48 Decision Tree algorithm. The selected features are then used as input to multiple classifiers (such as SVM, SMO, Bayes Net and Naive Bayes), and the proposed system is evaluated on the basis of Precision, Recall, F-score and Accuracy.
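    The two ends of the pipeline described in the abstract, statistical feature extraction and evaluation by Precision, Recall, F-score and Accuracy, can be illustrated with a minimal Python sketch. The abstract does not list the exact features or the tooling used, so the feature names (`line_count`, `avg_word_length`, `type_token_ratio`), the function names and the toy inputs below are assumptions made for illustration only. Feature selection with J48 and the classifiers themselves are not reproduced here.

```python
from collections import Counter


def statistical_features(poem: str) -> dict:
    """Compute a few simple statistical features for one poem.

    The feature set is hypothetical; the abstract only names
    word count and character count as examples.
    """
    lines = [ln for ln in poem.splitlines() if ln.strip()]
    words = poem.split()
    n_words = len(words)
    total_chars = sum(len(w) for w in words)  # code points, whitespace excluded
    return {
        "line_count": len(lines),
        "word_count": n_words,
        "char_count": total_chars,
        "avg_word_length": total_chars / n_words if n_words else 0.0,
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
    }


def evaluate(y_true, y_pred):
    """Return overall accuracy and per-author precision, recall and F-score."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = Counter(zip(y_true, y_pred))  # (true author, predicted author) counts
    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = pairs[(label, label)]
        fp = sum(c for (t, p), c in pairs.items() if p == label and t != label)
        fn = sum(c for (t, p), c in pairs.items() if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {
            "precision": precision,
            "recall": recall,
            "f_score": f_score,
        }
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, per_class


# Toy usage: placeholder text and made-up author labels, not data from the paper.
features = statistical_features("aa bb aa\ncc dd")
accuracy, report = evaluate(["A", "A", "B", "B", "C"], ["A", "B", "B", "B", "C"])
print(features)
print(accuracy, report)
```

    Note that `len()` counts Unicode code points, so for Gurmukhi text a vowel sign or other combining mark is counted as a separate character; a real feature extractor might prefer to count grapheme clusters instead.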

     

     

  • How to Cite

    Pandian, A., Wahid, S., Tokas, Y., & Ramalingam, V. V. (2018). Authorship Identification of Punjabi Poetry. International Journal of Engineering & Technology, 7(4.19), 13-16. https://doi.org/10.14419/ijet.v7i4.19.21987