Bias and Variance Trade-off in Regression Models

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables [3]. Regression is a supervised form of machine learning that learns from training data. A good regression model, trained over a sample, helps us estimate the population relationship as closely as possible. However, no regression model is perfect, and none can model the entire population without error. Errors in regression models are classified as reducible and irreducible. The reducible error consists of bias and variance, which affect the predictive power of the model [8]. Bias is an error from erroneous assumptions in the learning algorithm; high bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting) [4]. Variance is an error from sensitivity to small fluctuations in the training set; high variance can cause an algorithm to model the random noise in the training data rather than the intended outputs (overfitting) [4].

In this article we show graphically how model complexity, sample size, and the number of features (independent variables) affect bias and variance. We use both simulated and actual data sets to demonstrate these dependency relationships.
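The bias-variance behavior described above can be sketched numerically. The following is a minimal illustrative simulation, not the article's own experiment: the sine target function, the polynomial fit, and all parameter values (sample size, noise level, degrees) are assumptions chosen for demonstration. It estimates squared bias and variance by refitting polynomials of different complexity on many resampled training sets, relying on the standard decomposition of expected squared error into bias squared, variance, and irreducible noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical "population" relationship used only for this sketch.
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_samples=50, n_repeats=200, noise=0.3):
    """Estimate squared bias and variance of a degree-`degree` polynomial
    fit, averaged over fixed test points, by repeated resampling."""
    x_test = np.linspace(0, 1, 25)
    preds = np.empty((n_repeats, x_test.size))
    for i in range(n_repeats):
        # Draw a fresh noisy training sample from the same population.
        x = rng.uniform(0, 1, n_samples)
        y = true_f(x) + rng.normal(0, noise, n_samples)
        coef = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coef, x_test)
    avg_pred = preds.mean(axis=0)
    bias_sq = np.mean((avg_pred - true_f(x_test)) ** 2)   # systematic error
    variance = np.mean(preds.var(axis=0))                 # sampling sensitivity
    return bias_sq, variance

for d in (1, 3, 12):
    b, v = bias_variance(d)
    print(f"degree {d:2d}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

A too-simple model (degree 1) shows high bias and low variance (underfitting), while a too-flexible model (degree 12) shows low bias but inflated variance (overfitting), matching the trade-off the article examines.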