Cross-Validation Method for Overfitting Research Data Using R Programming
Abstract
This analytical review paper presents a mathematical comparison of several types of regression analysis in R for high-dimensional data sets (the Boston housing data) that suffer from overfitting and multicollinearity. The machine learning outputs are examined with log-lambda and variable importance (varImp) plots, which are explained in enough detail to support interpretation of the data. The primary purpose is to explain the different regression models (ridge, lasso, and elastic net) as applied with cross-validation in R to data with an overfitted structure; the mixing percentage and log-lambda plots are explained alongside intermediate output and graphical interpretation that lead to the final conclusion. In addition, the paper walks through the steps for fitting ridge, lasso, and elastic net regression on different sample data sets, to clarify which tool a researcher should choose after data collection when the attributes in the data are related to one another. The paper therefore presents a straightforward approach to regression analysis for data sets with multicollinearity and highlights its strengths for data analysis in R.
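The workflow summarized above can be sketched in a few lines of R. This is a minimal illustration, not the paper's own code: it assumes the `glmnet` and `MASS` packages are installed (the abstract does not name the packages used), and the seed, the 10-fold split, and the elastic net mixing value of 0.5 are arbitrary choices made for the example.

```r
# Sketch only: glmnet and MASS are assumptions; the paper does not name its packages.
library(MASS)    # provides the Boston housing data frame
library(glmnet)  # penalized regression with built-in k-fold cross-validation

set.seed(123)  # fold assignment is random, so fix the seed for repeatability

# Predictor matrix (intercept column dropped) and response: median home value
x <- model.matrix(medv ~ ., data = Boston)[, -1]
y <- Boston$medv

# Assign every row to one of 10 folds; the same folds are reused for each
# model so that their cross-validated errors are directly comparable
foldid <- sample(rep(1:10, length.out = nrow(x)))

# alpha is the mixing parameter: 0 = ridge, 1 = lasso, in between = elastic net
alphas <- c(ridge = 0, elastic_net = 0.5, lasso = 1)

# Cross-validate over glmnet's default lambda path for each penalty
fits <- lapply(alphas, function(a) cv.glmnet(x, y, alpha = a, foldid = foldid))

# Cross-validated mean squared error at the best lambda for each model
cv_mse <- sapply(fits, function(f) min(f$cvm))
print(cv_mse)

# Coefficients at the selected lambda; the lasso may set some exactly to zero
print(coef(fits$lasso, s = "lambda.min"))

# Cross-validation error plotted against log(lambda) for the lasso fit
plot(fits$lasso)
```

Reusing one `foldid` vector across the three fits is a design choice for this sketch: it removes fold-to-fold noise from the comparison, so differences in `cv_mse` reflect the penalty rather than the random split.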