Classification Rule Generation for Cancer Prediction using Locality Sensitive Hashing Similarity Measure
Keywords: CBR (Case Based Reasoning), Discretization, Euclidean Distance Metric, Gaussian distribution, LSH (Locality Sensitive Hashing).
This paper develops a decision support system for healthcare that predicts the stage of cancer (benign or malignant) using a novel classifier based on Locality Sensitive Hashing (LSH). We propose a new classification rule generation scheme built on LSH. Applying LSH-based instance selection algorithms yields a minimal set of class-representative patterns, on which we then perform discretization and classification rule generation manually. Working from this reduced set gives a high chance of arriving at the best prediction. A confusion matrix is used to compare test results. The technique is applied to two datasets: Iris and Breast Cancer Wisconsin. On both, it achieves better accuracy, specificity, sensitivity, and precision than traditional classifiers. Manual diagnosis is time-consuming, proceeds by trial and error, and depends on knowledge from medical specialists; the proposed system improves on both the accuracy and the speed of this manual procedure. The system is built on a classification model.
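The instance selection step described above can be illustrated with a minimal sketch. It assumes a Gaussian random-projection (p-stable) hash for Euclidean distance and keeps the first instance of each class seen in each hash bucket. The function names `make_hash` and `select_instances`, the parameters `n_projections` and `width`, and the one-instance-per-class-per-bucket rule are all illustrative assumptions, not the exact algorithm used in this paper.

```python
import math
import random


def make_hash(dim, n_projections, width, rng):
    """Build a hash h(x) = floor((a . x + b) / width) per projection.

    Each direction `a` is drawn from a standard Gaussian, so points that
    are close in Euclidean distance tend to fall into the same bucket.
    """
    directions = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(n_projections)]
    offsets = [rng.uniform(0.0, width) for _ in range(n_projections)]

    def hash_point(x):
        return tuple(
            math.floor((sum(a_i * x_i for a_i, x_i in zip(a, x)) + b) / width)
            for a, b in zip(directions, offsets)
        )

    return hash_point


def select_instances(points, labels, n_projections=4, width=1.0, seed=0):
    """Return indices of retained instances.

    Assumed rule: keep the first instance of every class in every bucket,
    so each bucket contributes at most one representative per class.
    """
    rng = random.Random(seed)
    hash_point = make_hash(len(points[0]), n_projections, width, rng)
    seen = set()      # (bucket, class) pairs already represented
    selected = []
    for idx, (x, y) in enumerate(zip(points, labels)):
        key = (hash_point(x), y)
        if key not in seen:
            seen.add(key)
            selected.append(idx)
    return selected


if __name__ == "__main__":
    # Toy data (hypothetical): two identical benign points, one malignant.
    pts = [[0.0, 0.0], [0.0, 0.0], [10.0, 10.0]]
    lbl = ["benign", "benign", "malignant"]
    print(select_instances(pts, lbl))
```

A larger `width` or fewer projections merges more points into each bucket and therefore keeps fewer representatives; the retained subset is what discretization and rule generation would then operate on.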