Performance analysis on least absolute shrinkage selection operator, elastic net and correlation adjusted elastic net regression methods

Pascalis Kadaro Matthew; Abubakar Yahaya

doi:10.14419/ijasp.v3i1.4364

Authors

Pascalis Kadaro Matthew
Department of Mathematics, Faculty of Science, Ahmadu Bello University, Zaria, Nigeria.
Abubakar Yahaya
Department of Mathematics, Faculty of Science, Ahmadu Bello University, Zaria, Nigeria.

Received date: February 16, 2015

Accepted date: March 17, 2015

Published date: May 16, 2015

DOI:

https://doi.org/10.14419/ijasp.v3i1.4364

Keywords:

Convex Optimization, Cross Validation, Multicollinearity, Penalized Regression.

Abstract

Over the past few decades, penalized regression techniques for linear regression have been developed specifically to address the shortcomings in prediction accuracy of the classical ordinary least squares (OLS) regression technique. In this paper, we use a diabetes data set obtained from previous literature to compare three of these well-known techniques: Least Absolute Shrinkage Selection Operator (LASSO), Elastic Net, and Correlation Adjusted Elastic Net (CAEN). The analysis showed that CAEN generated a less complex model.
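
As a rough illustration of how two of the compared penalties act on coefficients, the following is a minimal sketch of coordinate descent for the elastic net objective (1/(2n))||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||_2^2, where setting lam2 = 0 gives the LASSO. It is not the authors' code: the function names, the toy data, and the penalty parameterization are assumptions made for this example, the data are not the diabetes data set, and CAEN is not implemented because the abstract does not state the form of its penalty.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; return 0.0 when |value| <= threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_cd(X, y, lam1, lam2=0.0, n_sweeps=200):
    """Cyclic coordinate descent for the elastic net (hypothetical helper).

    X is a list of rows, y a list of responses, no intercept is fitted.
    Assumes no column of X is all zeros when lam2 == 0.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X*beta while beta is all zeros
    # Mean squared value of each column, reused in every update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    for _ in range(n_sweeps):
        for j in range(p):
            # Covariance of column j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            # The L1 part soft-thresholds, the L2 part inflates the denominator.
            new_bj = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


if __name__ == "__main__":
    # Toy data (assumed for illustration): y depends only on the first column.
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    y = [2.0, 0.0, 2.0, 4.0]
    print("OLS-like  :", elastic_net_cd(X, y, lam1=0.0))
    print("LASSO     :", elastic_net_cd(X, y, lam1=1.0))
    print("Elasticnet:", elastic_net_cd(X, y, lam1=1.0, lam2=1.0))
```

In a sketch like this, model complexity can be read off as the number of nonzero coefficients returned. In practice the penalty parameters would be chosen by cross validation rather than fixed by hand.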
