Classification Rule Generation for Cancer Prediction using Locality Sensitive Hashing Similarity Measure


  • Gautam Amiya
  • J Anuradha, Associate Professor, SCOPE, Vellore Institute of Technology
  • Venkatesh B, Research Scholar





CBR (Case Based Reasoning), Discretization, Euclidean Distance Metric, Gaussian distribution, LSH (Locality Sensitive Hashing).


This paper develops a decision support system for healthcare that predicts the stage of cancer (benign or malignant) using a novel classifier based on Locality Sensitive Hashing (LSH). We propose a new classification rule generation scheme: LSH-based instance selection algorithms yield a minimal set of class-representative patterns, to which we apply discretization and then generate classification rules manually. Working from this reduced set of representative patterns improves the chances of arriving at good predictions. Test results are compared using the confusion matrix. The technique is applied to two datasets, Iris and Breast Cancer Wisconsin, and obtains better accuracy, specificity, sensitivity and precision than traditional classifiers. Manual diagnosis is time-consuming, proceeds by trial and error, and requires knowledge from medical specialists; the proposed classification model improves on both the accuracy and the speed of this manual procedure.
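The instance selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `lsh_instance_selection`, the parameters `n_hashes` and `bucket_width`, and the choice to keep one instance per (bucket, class) pair are all assumptions made here for illustration. The hash family used is the common Gaussian-projection construction for Euclidean distance, h(x) = floor((a·x + b) / w), which seems consistent with the keywords but is not confirmed by the abstract.

```python
import math
import random


def lsh_instance_selection(X, y, n_hashes=4, bucket_width=1.0, seed=0):
    """Return indices of one representative per (LSH bucket, class) pair.

    Hypothetical sketch: each hash is floor((a . x + b) / w), with `a`
    drawn from a standard Gaussian and `b` uniform in [0, w].
    """
    rng = random.Random(seed)
    dim = len(X[0])
    # One (a, b) pair per hash function.
    projections = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)],
         rng.uniform(0.0, bucket_width))
        for _ in range(n_hashes)
    ]

    def signature(x):
        # Concatenate the hash values into a single bucket identifier.
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / bucket_width)
            for a, b in projections
        )

    seen = set()
    selected = []
    for idx, (x, label) in enumerate(zip(X, y)):
        key = (signature(x), label)
        if key not in seen:  # first instance of this class in this bucket
            seen.add(key)
            selected.append(idx)
    return selected


# Toy data (made up): two duplicated points, one per class.
X = [[0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0]]
y = [0, 0, 1, 1]
kept = lsh_instance_selection(X, y)  # kept == [0, 2]
```

Identical points always share a signature, so each duplicate is dropped while every class keeps at least one representative. On real data, `bucket_width` and `n_hashes` would control how aggressively nearby instances are merged; discretization and rule generation would then operate on the retained subset.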


