A classification model on probabilistic semantic relation for big data: an integrated approach


  • Madhu M. Nashipudimath, Research Scholar, Faculty of Computer Engineering, Pacific Academy of Higher Education and Research University, Udaipur
  • Subhash K. Shinde, Vice-Principal & Professor in Computer Engineering, Lokmanya Tilak College of Engineering, Navi Mumbai






Big Data, Integration, Probabilistic Relation Prediction, Semantic Classification.


Data mining is the process of analyzing information repositories. As data stores have taken the shape of big data, it has become difficult to find relevant patterns with current techniques. Existing frameworks do not suit the integration and analysis of complex scenarios, and this insufficiency motivates new solutions. The major problem in big data integration and analysis is the complex interdependence among changing data granularity, incompatible data models, and data contents. Hence, an integration and classification model based on the probabilistic semantic relation (PSR) of attribute patterns for big data sources is proposed. It learns the interrelationship and interdependence patterns among data classes and data sources. This knowledge is used to predict probabilistic relations between patterns and source data, which helps in data classification and future analysis. The model implements data integration and mapping, construction of a knowledge base, and a Naive Bayes (NB) based PSR approach. An experiment is conducted on a real crime dataset. Precision, recall, fall-out rate, and F-measure are calculated to evaluate the results. The experiment shows an average 10% increase in precision and recall compared with NB classification, and an average 7% improvement in F-measure. These improvements suggest that the proposed model can be applied to future data class prediction for various prediction tasks.
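The four evaluation measures named above follow the standard confusion-matrix definitions, and a minimal sketch of the arithmetic may help when reading the reported gains. The class name `ConfusionCounts`, the function `evaluate`, and the counts used are hypothetical and chosen only for illustration; they are not taken from the paper's crime-dataset experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts for one target class (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(c: ConfusionCounts) -> dict:
    """Compute precision, recall, fall-out rate, and F-measure (F1)."""
    precision = _ratio(c.tp, c.tp + c.fp)   # TP / (TP + FP)
    recall = _ratio(c.tp, c.tp + c.fn)      # TP / (TP + FN)
    fall_out = _ratio(c.fp, c.fp + c.tn)    # FP / (FP + TN)
    # Harmonic mean of precision and recall.
    f_measure = _ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "fall_out": fall_out,
        "f_measure": f_measure,
    }


# Assumed counts, used only to illustrate the arithmetic.
scores = evaluate(ConfusionCounts(tp=40, fp=10, fn=20, tn=30))
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

A lower fall-out rate is better, while the other three measures improve as they increase, so a classifier comparison such as PSR versus plain NB would normally report all four side by side.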




