Extreme Learning Machine Classification of File Clusters for Evaluating Content-based Feature Vectors
https://doi.org/10.14419/ijet.v7i4.36.23738
Received date: December 12, 2018
Accepted date: December 12, 2018
Published date: December 9, 2018
Keywords: Multimedia Clusters, JPEG image, Extreme Learning Machine (ELM), Feature selection.
Abstract
In digital forensic investigation, and in the retrieval of missing data files in general, a key challenge is recovering files whose system information is missing. The recovery process applies a number of methods to determine the type, contents, and structure of each data file cluster, for example whether it belongs to a JPEG, DOC, ZIP, or TXT file. This paper studies the effect of three content-based feature extraction methods on the classification of JPEG file clusters: Byte Frequency Distribution, Entropy, and Rate of Change. An Extreme Learning Machine (ELM) neural network is used to evaluate the three methods by classifying the resulting feature vectors as JPEG or non-JPEG, for clusters taken from files in different formats. The files are allocated in a continuous series of clusters. The ELM algorithm is applied to the DFRWS (2006) dataset, and the results show that the combination of the three methods yields a classification accuracy of 93.46%.
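As an illustration only, the three content-based features named in the abstract could be computed for a single cluster roughly as follows. This is a minimal sketch under stated assumptions, not the procedure used in the paper: the abstract does not give the exact definitions, so Rate of Change is assumed here to be the mean absolute difference between consecutive byte values, Entropy is assumed to be Shannon entropy in bits per byte, and the function names and the 258-element vector layout are hypothetical.

```python
import math
from collections import Counter
from typing import List


def byte_frequency_distribution(cluster: bytes) -> List[float]:
    """Relative frequency of each byte value 0..255 in the cluster."""
    counts = Counter(cluster)
    n = len(cluster)
    return [counts.get(value, 0) / n for value in range(256)]


def shannon_entropy(cluster: bytes) -> float:
    """Shannon entropy in bits per byte (assumed definition), 0.0 to 8.0."""
    n = len(cluster)
    probabilities = [count / n for count in Counter(cluster).values()]
    return sum(-p * math.log2(p) for p in probabilities)


def rate_of_change(cluster: bytes) -> float:
    """Mean absolute difference between consecutive bytes (assumed definition)."""
    if len(cluster) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(cluster, cluster[1:])]
    return sum(diffs) / len(diffs)


def feature_vector(cluster: bytes) -> List[float]:
    """Concatenate the three features: 256 BFD values, entropy, rate of change."""
    if not cluster:
        raise ValueError("cluster must not be empty")
    return (
        byte_frequency_distribution(cluster)
        + [shannon_entropy(cluster), rate_of_change(cluster)]
    )
```

A vector produced this way for each cluster, paired with a JPEG or non-JPEG label, is the kind of input a classifier such as an ELM would be trained on. Compressed data such as JPEG scan segments would be expected to show entropy close to 8 bits per byte, which is why a single feature is unlikely to separate JPEG from other compressed formats on its own.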
How to Cite
Raad Ali, R., Malik Mohamad, K., Jamel, S., & Kamal Ahmad Khalid, S. (2018). Extreme Learning Machine Classification of File Clusters for Evaluating Content-based Feature Vectors. International Journal of Engineering and Technology, 7(4.36), 167-171. https://doi.org/10.14419/ijet.v7i4.36.23738
