Extreme Learning Machine Classification of File Clusters for Evaluating Content-based Feature Vectors
Keywords: Multimedia clusters, JPEG image, Extreme Learning Machine (ELM), feature selection.
In digital forensic investigation, and in the retrieval of missing data files in general, a recurring challenge is recovering files whose system information is missing. The recovery process applies a number of methods to determine the type, contents, and structure of each data file cluster, for example whether it belongs to a JPEG, DOC, ZIP, or TXT file. This paper studies the effect of three content-based feature extraction methods on the classification of JPEG file clusters: Byte Frequency Distribution, Entropy, and Rate of Change. An Extreme Learning Machine (ELM) neural network is used to evaluate the three methods by classifying the feature vectors as JPEG or non-JPEG for clusters taken from files of different formats. The files are allocated in a continuous series of clusters. The ELM algorithm is applied to the DFRWS (2006) dataset, and the results show that the combination of the three methods yields a classification accuracy of 93.46%.
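To make the three feature types and the classifier concrete, the following is a minimal sketch in Python with NumPy; it is not the authors' implementation. Several details are assumptions, since the abstract does not specify them: Rate of Change is taken here as the mean absolute difference between consecutive byte values, the three features are simply concatenated into one vector, and the ELM uses a sigmoid hidden layer with an illustrative number of hidden nodes. The names `feature_vector` and `SimpleELM` are hypothetical.

```python
import numpy as np


def byte_frequency_distribution(cluster: bytes) -> np.ndarray:
    """Normalized histogram over the 256 possible byte values."""
    data = np.frombuffer(cluster, dtype=np.uint8)
    counts = np.bincount(data, minlength=256).astype(float)
    return counts / max(len(data), 1)


def entropy(cluster: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    p = byte_frequency_distribution(cluster)
    p = p[p > 0]  # skip empty bins so log2 is defined
    return float(-(p * np.log2(p)).sum())


def rate_of_change(cluster: bytes) -> float:
    """Assumed definition: mean absolute difference of consecutive bytes."""
    data = np.frombuffer(cluster, dtype=np.uint8).astype(np.int16)
    if len(data) < 2:
        return 0.0
    return float(np.abs(np.diff(data)).mean())


def feature_vector(cluster: bytes) -> np.ndarray:
    """Concatenate the three features (256 + 1 + 1 values); an assumption."""
    return np.concatenate(
        [byte_frequency_distribution(cluster),
         [entropy(cluster), rate_of_change(cluster)]]
    )


class SimpleELM:
    """Single-hidden-layer ELM for binary labels (1 = JPEG, 0 = non-JPEG).

    Input weights and biases are random and never trained; only the output
    weights are solved in closed form with the Moore-Penrose pseudoinverse.
    """

    def __init__(self, n_hidden: int = 32, seed: int = 0):
        self.n_hidden = n_hidden  # illustrative value, not from the paper
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Sigmoid activation of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X: np.ndarray, y: np.ndarray) -> "SimpleELM":
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y  # least-squares output weights
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return (self._hidden(X) @ self.beta > 0.5).astype(int)
```

In use, each cluster would be passed through `feature_vector`, the resulting rows stacked into a matrix, and `SimpleELM().fit(X, y)` trained on labelled clusters. Because Rate of Change is on a much larger scale than the byte frequencies, standardizing the features before training is probably advisable so that the sigmoid units do not saturate.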