A classification model on probabilistic semantic relation for big data: an integrated approach
https://doi.org/10.14419/ijet.v7i4.17144
Received date: August 8, 2018
Accepted date: September 3, 2018
Published date: November 15, 2018
Keywords: Big Data, Integration, Probabilistic Relation Prediction, Semantic Classification.
Abstract
Data mining is the process of analyzing information repositories. As data stores have grown into big data, it has become difficult to find relevant patterns with current techniques. Existing frameworks are not suited to the integration and analysis of complex scenarios, and this shortcoming motivates new solutions. The major problem in big data integration and analysis arises from the complex interdependence between changing data granularity, incompatible data models, and data contents. Hence, an integration and classification model based on the probabilistic semantic relation (PSR) of attribute patterns for big data sources is proposed. It learns the interrelationship and interdependence patterns among data classes and data sources. This knowledge is used to predict probabilistic relations between patterns and source data, which helps in data classification and future analysis. The model implements data integration and mapping, construction of a knowledge base, and a Naive Bayes (NB) based PSR approach. An experiment is carried out on a real crime dataset. Precision, recall, fall-out rate, and F-measure are calculated to evaluate the results. The experiment shows an average increase of 10% in precision and recall compared with NB classification, and an average improvement of 7% in F-measure. This improvement suggests that the proposed model can be applied to future data class prediction for various prediction tasks.
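As an illustration only, the four evaluation measures named in the abstract can be computed from per-class confusion counts as sketched below. The `Confusion` class, the function names, and the example counts are hypothetical and are not taken from the paper; the formulas are the standard definitions of precision, recall, fall-out, and the balanced F-measure (F1).

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical one-vs-rest confusion counts for a single class."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def _ratio(num: float, den: float) -> float:
    # Return 0.0 rather than raising when the denominator is zero.
    return num / den if den else 0.0


def precision(c: Confusion) -> float:
    # Share of predicted positives that are correct.
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: Confusion) -> float:
    # Share of actual positives that are retrieved.
    return _ratio(c.tp, c.tp + c.fn)


def fall_out(c: Confusion) -> float:
    # False positive rate: share of actual negatives wrongly flagged.
    return _ratio(c.fp, c.fp + c.tn)


def f_measure(c: Confusion) -> float:
    # Harmonic mean of precision and recall (F1).
    p, r = precision(c), recall(c)
    return _ratio(2 * p * r, p + r)


# Made-up counts, for illustration only (not results from the paper).
example = Confusion(tp=40, fp=10, fn=20, tn=30)
print(f"precision={precision(example):.3f}")  # precision=0.800
print(f"recall={recall(example):.3f}")        # recall=0.667
print(f"fall-out={fall_out(example):.3f}")    # fall-out=0.250
print(f"F-measure={f_measure(example):.3f}")  # F-measure=0.727
```

For a multi-class task such as crime category prediction, one common choice is to compute these measures per class in a one-vs-rest fashion and then average them; whether the paper averages in this way is not stated in the abstract.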
How to Cite
Nashipudimath, M. M., & Shinde, S. K. (2018). A classification model on probabilistic semantic relation for big data: an integrated approach. International Journal of Engineering and Technology, 7(4), 4429-4434. https://doi.org/10.14419/ijet.v7i4.17144
