Performance analysis on least absolute shrinkage selection operator, elastic net and correlation adjusted elastic net regression methods

 
 
 
  • Abstract


    In recent decades, penalized regression techniques for linear regression have been developed specifically to address weaknesses in the prediction accuracy of the classical ordinary least squares (OLS) regression technique. In this paper, we used a diabetes data set obtained from previous literature to compare three of these well-known techniques: Least Absolute Shrinkage Selection Operator (LASSO), Elastic Net, and Correlation Adjusted Elastic Net (CAEN). The analysis showed that CAEN generated a less complex model.
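
    As a rough illustration of how two of the compared penalties behave, the sketch below fits LASSO and Elastic Net by coordinate descent on a tiny toy design. It is not the paper's code or data: the function names, the glmnet-style parameterization (lam, alpha), and the four-row data set are all assumptions made for illustration. CAEN is not reproduced because its penalty is not stated in this abstract.

```python
# Minimal sketch (assumed, not the authors' implementation): coordinate descent for
#   (1 / 2n) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2)
# alpha = 1 gives the LASSO penalty; 0 < alpha < 1 gives an elastic net penalty.


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha, n_iter=200, tol=1e-10):
    """Cyclic coordinate descent; assumes centered data and no all-zero column."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            rho = 0.0  # average correlation of column j with the partial residual
            z = 0.0    # average squared value of column j
            for i in range(n):
                pred_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


# Hypothetical toy data with orthogonal columns: y = 3 * x1 + 0.5 * x2.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.5, -2.5, 2.5, -3.5]

lasso_beta = elastic_net_cd(X, y, lam=1.0, alpha=1.0)  # [2.0, 0.0] on this design
enet_beta = elastic_net_cd(X, y, lam=0.5, alpha=0.5)
print("LASSO:", lasso_beta)
print("Elastic Net:", enet_beta)
```

    On this toy design the LASSO fit drops the weak predictor entirely, which is the sense in which a penalized fit can yield a "less complex" model. In practice the tuning parameters would be chosen by cross-validation, as the keywords suggest, rather than fixed by hand.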


  • Keywords


    Convex Optimization; Cross Validation; Multicollinearity; Penalized Regression.



 


Article ID: 4364
 
DOI: 10.14419/ijasp.v3i1.4364




Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.