Record linkage and deduplication using traditional blocking


  • G Somasekhar
  • SeshaSravani K
  • Keerthi P
  • Sai Sandeep G





Blocking, Blocking Key, Blocking Key Value, Deduplication, Record Linkage, Traditional Blocking.


Record linkage and deduplication are two processes used to match records. Records are matched in order to remove duplicates, which can strongly influence the results of data mining and data processing. When matching is performed within a single database, it is called deduplication: the database is checked for duplicate records. When matching is instead performed across several databases, it is called record linkage. This paper also discusses an indexing technique called traditional blocking, which is used to remove non-matching record pairs so that fewer record pairs need to be compared.
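The idea behind traditional blocking can be illustrated with a minimal sketch. It assumes the usual formulation in which each record is assigned a blocking key value (BKV), records sharing a BKV fall into the same block, and only pairs within a block are compared. The field names, the sample records, and the choice of blocking key (first letter of the surname plus the postcode) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key_value(record):
    # Hypothetical blocking key: first letter of surname + postcode.
    return record["surname"][:1].lower() + record["postcode"]


def candidate_pairs(records):
    # Group record identifiers by their blocking key value (BKV).
    blocks = defaultdict(list)
    for rec_id, record in records.items():
        blocks[blocking_key_value(record)].append(rec_id)

    # Only records inside the same block form candidate pairs.
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Illustrative records for deduplicating a single database.
records = {
    1: {"surname": "Smith", "postcode": "2000"},
    2: {"surname": "Smyth", "postcode": "2000"},
    3: {"surname": "Jones", "postcode": "2000"},
    4: {"surname": "Jonas", "postcode": "3000"},
}

pairs = candidate_pairs(records)
total_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pair(s) instead of {total_pairs}: {sorted(pairs)}")
```

In this sketch only records 1 and 2 share a BKV, so one pair is compared rather than all six. It also shows the trade-off of the approach: records 3 and 4 would never be compared, so a poorly chosen blocking key can miss true matches.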


