Multiple data linkage using SSCCT for direct and semantic matching pair

  • Abstract

    Data linking is a method of integrating multiple data items located in different sources and establishing links among entities of the same type or of semantically relevant types. Interconnecting multiple data sources requires data linkage techniques that handle items that are different but semantically related. In this paper, multiple data linkage is used to establish relations between matching entities of different types that are semantically related. The proposed method uses a Semantically Similar Class Clustering Tree (SSCCT) to implement multiple data linkage. The SSCCT is built so that it is easy to understand and can be transformed into association rules, which are verified using the WordNet ontology. The properties of the data sources are represented as a tree: the inner nodes consist of features from the first data set, and the leaves represent features from the second data set that match the entities of the first data set. The proposed method uses semantic similarity estimation in the pre-pruning process so that the SSCCT is built effectively. A threshold value is used to decide whether a record pair is a match or a non-match.
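
    The threshold decision described above can be illustrated with a minimal sketch. It is not the SSCCT implementation: the RELATED table is a hypothetical stand-in for a WordNet lookup, the Jaccard overlap is an assumed similarity measure, and the names semantic_similarity and classify_pair and the default threshold of 0.5 are chosen only for illustration.

```python
# Hypothetical stand-in for an ontology lookup: each term maps to a small,
# hand-written set of related concepts. A real system would query WordNet.
RELATED = {
    "car": {"car", "automobile", "vehicle"},
    "automobile": {"automobile", "car", "vehicle"},
    "truck": {"truck", "vehicle"},
    "banana": {"banana", "fruit"},
}


def semantic_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two terms' concept sets (assumed measure)."""
    concepts_a = RELATED.get(a, {a})  # unknown terms relate only to themselves
    concepts_b = RELATED.get(b, {b})
    return len(concepts_a & concepts_b) / len(concepts_a | concepts_b)


def classify_pair(a: str, b: str, threshold: float = 0.5) -> str:
    """Label a record pair by comparing its similarity to the threshold."""
    if semantic_similarity(a, b) >= threshold:
        return "match"
    return "non-match"


if __name__ == "__main__":
    for pair in [("car", "automobile"), ("car", "truck"), ("car", "banana")]:
        print(pair, semantic_similarity(*pair), classify_pair(*pair))
```

    In this sketch, lowering the threshold admits more loosely related pairs as matches, while raising it keeps only near-synonyms; the value itself would have to be tuned on the data being linked.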

  • Keywords

    Multiple data linkage, semantic data, clustering

Article ID: 10665
DOI: 10.14419/ijet.v7i1.3.10665

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.