Logistic Regression and Data Analysis on Privacy Methods on Data Streams


  • P Chandrakanth
  • Anbarasi M.S






Concept drift, logistic regression, data utility, data streams, data privacy, Privacy Preserving in Data Mining (PPDM).


Researchers have so far taken a narrow view of the problem of data privacy in streams. Research and experimentation have concentrated on static data, where privacy is comparatively easy to achieve with perturbation approaches based on random data values. Such approaches do not give adequate results on large and high-dimensional data sets. By exploiting the autocorrelation of multivariate streams and their underlying structure, the most suitable regions for adding noise can be identified, so that privacy is maximally preserved and the perturbation is irreversible. Drift checking and ensemble classifier building are the basic requirements for privacy-preserving data stream mining, as the experiments show with the support of sensitivity analysis. In this paper we present the experimental results for every stage.
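The role of autocorrelation in choosing the noise can be illustrated with a small simulation. The sketch below is not the method of the paper, which the abstract does not specify. It is a minimal illustration, under assumed parameters, of the general idea that noise sharing the stream's autocorrelation is harder to filter out than independent random noise. The AR(1) stream model, the coefficient `PHI`, the noise level `NOISE_STD`, the window length, and the moving-average "attack" are all assumptions chosen for illustration.

```python
import random

def ar1(n, phi, innovation_std, rng):
    """Generate n samples of a zero-mean AR(1) process (assumed stream model)."""
    values, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, innovation_std)
        values.append(prev)
    return values

def trailing_mean(series, window):
    """Causal moving average: a simple filter an attacker might use to strip noise."""
    out, running = [], 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]
        out.append(running / min(i + 1, window))
    return out

def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical parameters, chosen only for this illustration.
PHI, NOISE_STD, N, WINDOW = 0.95, 2.0, 5000, 5
rng = random.Random(0)

stream = ar1(N, PHI, 1.0, rng)

# Option 1: independent (white) noise with standard deviation NOISE_STD.
white = [rng.gauss(0.0, NOISE_STD) for _ in range(N)]

# Option 2: noise with the same stationary variance, but with the same
# autocorrelation as the stream (AR(1) with coefficient PHI).
correlated = ar1(N, PHI, NOISE_STD * (1 - PHI ** 2) ** 0.5, rng)

published_white = [s + e for s, e in zip(stream, white)]
published_corr = [s + e for s, e in zip(stream, correlated)]

# The attacker smooths the published stream and compares with the original.
err_white = mse(trailing_mean(published_white, WINDOW), stream)
err_corr = mse(trailing_mean(published_corr, WINDOW), stream)

print(f"reconstruction error after filtering, white noise:      {err_white:.2f}")
print(f"reconstruction error after filtering, correlated noise: {err_corr:.2f}")
```

Under these assumptions, smoothing removes most of the white noise, while the correlated noise largely survives it. The attacker's reconstruction error therefore stays higher in the second case, even though both kinds of noise have the same variance.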






How to Cite

Chandrakanth, P., & M.S, A. (2018). Logistic Regression and Data Analysis on Privacy Methods on Data Streams. International Journal of Engineering & Technology, 7(3.12), 411–414. https://doi.org/10.14419/ijet.v7i3.12.16117
Received 2018-07-23
Accepted 2018-07-23
Published 2018-07-20