Cross-Validation Method for Overfitting Research Data using R Programming

  • Yagyanath Rimal, Faculty of Science and Technology, Pokhara University, Nepal
Keywords: Data Analytics, Big Data, Regression Equation, Multicollinearity, Overfitting, Lasso, Ridge, Elastic Net

Abstract

This analytical review paper presents a mathematical comparison of several types of regression analysis in R for high-dimensional data sets (here, the Boston housing data) that suffer from overfitting and multicollinearity. The machine learning outputs are analysed using log-lambda and variable importance (varImp) plots, which are explained in enough detail to support interpretation of the data. The primary purpose is to explain regularized regression models (ridge, lasso, and elastic net) fitted with cross-validation in R to data that are prone to overfitting; the mixing percentage and log-lambda plots are explained through intermediate outputs and graphical interpretation leading to the final conclusion. In addition, the paper shows the steps for fitting ridge, lasso, and elastic net regression on different sample data sets, to clarify which tool a researcher should choose after data collection when the attributes of a data set are related to one another. The paper thus presents a straightforward approach to regression analysis for data sets with multicollinearity and highlights the strengths of R for such analysis.
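
For orientation, the three penalized regressions named above are commonly written as one objective with a penalty strength and a mixing parameter. The block below is a sketch in the parametrization used by the R package glmnet; the notation (n, p, x_i, y_i, β, λ, α, K, F_k) is assumed here for illustration and is not taken from the paper itself.

```latex
% Assumed notation: n observations (x_i, y_i), p predictors,
% intercept \beta_0, coefficients \beta_1..\beta_p,
% penalty strength \lambda \ge 0, mixing parameter \alpha \in [0, 1].
\[
(\hat{\beta}_0, \hat{\beta})(\lambda, \alpha)
  = \arg\min_{\beta_0,\, \beta}\;
    \frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - \beta_0 - x_i^{\top}\beta \bigr)^2
    + \lambda \Bigl[ \frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^{2}
                     + \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert \Bigr]
\]

% K-fold cross-validation error: F_k is the k-th held-out fold of size n_k,
% and the superscript (-k) marks estimates fitted without fold k.
\[
\mathrm{CV}(\lambda, \alpha)
  = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i \in F_k}
    \bigl( y_i - \hat{\beta}_0^{(-k)} - x_i^{\top} \hat{\beta}^{(-k)}(\lambda, \alpha) \bigr)^2
\]
```

Under this parametrization, α = 0 gives ridge regression, α = 1 gives the lasso, and intermediate values give the elastic net. Cross-validation then selects the λ (and, for the elastic net, the α) with the smallest held-out error. The log-lambda plots mentioned above typically show coefficients or cross-validated error against log λ, and the mixing percentage presumably corresponds to α.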

Published
2020-09-04
How to Cite
Rimal, Y. (2020). Cross-Validation Method for Overfitting Research Data using R Programming. CENTRAL ASIAN JOURNAL OF MATHEMATICAL THEORY AND COMPUTER SCIENCES, 1(11), 27-34. Retrieved from https://cajmtcs.centralasianstudies.org/index.php/CAJMTCS/article/view/3