Extreme Learning Machine Classification of File Clusters for Evaluating Content-based Feature Vectors

  • Authors

    • Rabei Raad Ali
    • Kamaruddin Malik Mohamad
    • Sapiee Jamel
    • Shamsul Kamal Ahmad Khalid
  • Keywords: Multimedia clusters, JPEG image, Extreme Learning Machine (ELM), feature selection.
  • In digital forensic investigation, and in the retrieval of missing data files in general, recovering files whose system information is missing is a challenge. The recovery process entails applying a number of methods to determine the type, the contents, and the structure of each data file cluster, for file types such as JPEG, DOC, ZIP, or TXT. This paper studies the effects of three content-based feature extraction methods on the classification of JPEG file clusters. The methods are Byte Frequency Distribution, Entropy, and Rate of Change. An Extreme Learning Machine (ELM) neural network algorithm is used to evaluate the performance of the three methods: it classifies the feature vectors as JPEG or Non-JPEG for files in different file formats. The files are allocated in a continuous series of clusters. The ELM algorithm is applied to the DFRWS (2006) dataset, and the results show that the combination of the three methods produces 93.46% classification accuracy.
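
The three content-based features named in the abstract can be illustrated with a minimal sketch. It uses the definitions common in the file-type identification literature, which may differ in detail from the formulas used in the paper. The function names and the idea of concatenating the features into one 258-element vector are assumptions made for illustration only; the ELM classifier itself is not shown.

```python
import math
from collections import Counter
from typing import List


def byte_frequency_distribution(cluster: bytes) -> List[float]:
    """Relative frequency of each of the 256 possible byte values."""
    if not cluster:
        return [0.0] * 256
    counts = Counter(cluster)
    n = len(cluster)
    return [counts.get(value, 0) / n for value in range(256)]


def shannon_entropy(bfd: List[float]) -> float:
    """Shannon entropy of a byte distribution, in bits per byte (0 to 8)."""
    return math.fsum(-p * math.log2(p) for p in bfd if p > 0)


def rate_of_change(cluster: bytes) -> float:
    """Mean absolute difference between consecutive byte values."""
    if len(cluster) < 2:
        return 0.0
    total = sum(abs(b - a) for a, b in zip(cluster, cluster[1:]))
    return total / (len(cluster) - 1)


def feature_vector(cluster: bytes) -> List[float]:
    """Hypothetical combined vector: 256 BFD values, entropy, rate of change."""
    bfd = byte_frequency_distribution(cluster)
    return bfd + [shannon_entropy(bfd), rate_of_change(cluster)]
```

Under these definitions, compressed data such as the entropy-coded body of a JPEG tends to show a flat byte distribution and entropy close to 8 bits per byte, whereas plain text concentrates on a small range of byte values. A vector like the one returned by `feature_vector` would be the input that a classifier labels as JPEG or Non-JPEG.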



  • How to Cite

    Raad Ali, R., Malik Mohamad, K., Jamel, S., & Kamal Ahmad Khalid, S. (2018). Extreme Learning Machine Classification of File Clusters for Evaluating Content-based Feature Vectors. International Journal of Engineering & Technology, 7(4.36), 167-171. https://doi.org/10.14419/ijet.v7i4.36.23738