Record linkage and deduplication using traditional blocking
Keywords: Blocking, Blocking Key, Blocking Key Value, Deduplication, Record Linkage, Traditional Blocking.
Record linkage and deduplication are two processes used to match records. Records are matched in order to identify and remove duplicates, which can strongly distort the outputs of data mining and data processing. When matching is performed within a single database, it is called deduplication. When matching is performed across several databases, it is called record linkage. This paper also discusses an indexing technique called traditional blocking, which excludes non-matching record pairs and thereby reduces the number of record pairs that must be compared.
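As an illustration of the idea, the following is a minimal sketch of traditional blocking applied to deduplication of a single database. The record fields (`surname`, `postcode`), the choice of blocking key, and the function names are assumptions made for this example and are not taken from the paper. Each record is assigned a blocking key value, records that share a value fall into the same block, and only pairs inside a block become candidates for detailed comparison.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key_value(record):
    # Hypothetical blocking key: first three letters of the surname
    # (lowercased) combined with the postcode.
    return (record["surname"][:3].lower(), record["postcode"])


def candidate_pairs(records):
    # Group record identifiers by their blocking key value.
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        blocks[blocking_key_value(rec)].append(rec_id)

    # Only records within the same block are paired for comparison.
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Toy data, invented for this example.
records = {
    1: {"surname": "Smith", "postcode": "2600"},
    2: {"surname": "Smithe", "postcode": "2600"},
    3: {"surname": "Miller", "postcode": "2600"},
    4: {"surname": "Millar", "postcode": "2913"},
    5: {"surname": "smith", "postcode": "2600"},
}

pairs = candidate_pairs(records)
total_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {total_pairs}")
```

In this toy data set, records 1, 2 and 5 share a blocking key value and form one block, so three pairs are compared rather than all ten. The sketch also shows the usual trade-off: records 3 and 4 may well refer to the same person, but they land in different blocks because their postcodes differ and are therefore never compared.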