# Modelling House Price Using Ridge Regression and Lasso Regression

• Seng Jia Xin
• Kamil Khalid

2018-11-30

## Keywords:

Adjusted R-squared, Lasso Regression, Ridge Regression, Root mean square error (RMSE)

## Abstract

House price prediction is important for governments, finance companies, the real estate sector and house owners. House price data from Ames, Iowa, in the United States, covering the years 2006 to 2010, are used for multivariate analysis. However, multicollinearity commonly occurs in multivariate analysis and can seriously affect the model. This study therefore investigates the performance of the Ridge regression model and the Lasso regression model, as both can deal with multicollinearity. The two models are constructed and compared, with the root mean square error (RMSE) and adjusted R-squared used to evaluate their performance. The comparison found that the Lasso regression model performs better than the Ridge regression model. Based on this analysis, the selected variables cover house size, age of the house, condition of the house and location of the house.
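
The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of RMSE and adjusted R-squared. The function names `rmse` and `adjusted_r2` and the toy numbers are assumptions made for illustration; they are not the authors' code or data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(actual)
    squared_errors = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared_errors / n)


def adjusted_r2(actual, predicted, n_predictors):
    """Adjusted R-squared, penalising R-squared for the number of predictors."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical toy values, not taken from the Ames data set.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]

print(rmse(observed, fitted))
print(adjusted_r2(observed, fitted, n_predictors=1))
```

Under these definitions, a lower RMSE and a higher adjusted R-squared both indicate a better fit, which is the sense in which one model is judged to perform better than another.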
