Understanding Adjusted and Unadjusted Coefficient of Determination (R²)

In regression analysis, the coefficient of determination, often denoted as R², is a key metric used to assess the goodness-of-fit of a model. It indicates how well the independent variables in the model explain the variation in the dependent variable. While the unadjusted R² gives an overall measure of fit, the adjusted R² provides a more nuanced measure, particularly when dealing with multiple predictors. In this post, we will explain both adjusted and unadjusted R², their differences, and when each is appropriate to use.

What is the Unadjusted Coefficient of Determination (R²)?

The unadjusted R², or simply R², measures the proportion of variance in the dependent variable that is explained by the independent variables in a regression model. It is calculated as the ratio of the explained variance to the total variance.

Mathematically, the formula for R² is:

R² = 1 - (SS_res / SS_tot)

Where:

  • SS_res is the sum of squares of residuals (the variation not explained by the model).
  • SS_tot is the total sum of squares (the total variation in the dependent variable).

For an ordinary least squares model fitted with an intercept, the value of R² ranges from 0 to 1:

  • An R² value of 0 means that the independent variables do not explain any of the variation in the dependent variable.
  • An R² value of 1 means that the independent variables perfectly explain all the variation in the dependent variable.

While R² is a useful metric, it has limitations. Specifically, R² always increases as more variables are added to the model, even if those variables do not contribute significantly to explaining the variation in the dependent variable. This is where the adjusted R² comes into play.
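
To make the definition concrete, here is a minimal sketch in Python using NumPy (the data and variable names are purely illustrative) that fits a simple linear regression and computes R² from the residual and total sums of squares:

    import numpy as np

    # Illustrative data: one predictor x and a response y
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Fit a simple linear regression y = b0 + b1 * x by least squares
    b1, b0 = np.polyfit(x, y, 1)
    y_hat = b0 + b1 * x

    ss_res = np.sum((y - y_hat) ** 2)       # variation not explained by the model
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total variation in y

    r_squared = 1 - ss_res / ss_tot
    print(r_squared)  # close to 1, since y is nearly linear in x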

What is the Adjusted Coefficient of Determination (Adjusted R²)?

The adjusted R² adjusts the R² value for the number of predictors in the model. It penalizes the addition of unnecessary independent variables that do not significantly improve the model. This makes adjusted R² a more accurate measure of model performance, especially when comparing models with different numbers of predictors.

The formula for adjusted R² is:

Adjusted R² = 1 - [ (1 - R²) * (n - 1) / (n - k - 1) ]

Where:

  • n is the number of observations (data points).
  • k is the number of independent variables (predictors).

The adjusted R² can be lower than the unadjusted R² if the added predictors do not significantly improve the model. Unlike R², adjusted R² can decrease when irrelevant predictors are added, making it a better indicator of the true explanatory power of the model.
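
As a small illustration, the adjustment can be written as a helper function (the function name below is our own, not from any particular library):

    def adjusted_r_squared(r_squared, n, k):
        """Adjust R² for k predictors fitted on n observations."""
        return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

    # Example: R² = 0.90 from a model with 30 observations and 5 predictors
    print(adjusted_r_squared(0.90, n=30, k=5))  # ≈ 0.879, slightly below 0.90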

Key Differences Between R² and Adjusted R²

  • R² always increases or stays the same as more variables are added, regardless of their relevance (see the sketch after this list).
  • Adjusted R² increases only if the new variable improves the model; it decreases if the variable does not contribute significantly.
  • Adjusted R² accounts for the number of predictors and penalizes overfitting, while R² does not.
  • Adjusted R² is typically lower than R², especially in models with many predictors that do not improve the model's performance.
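
The sketch below (again with made-up data and our own helper function) illustrates the first two points: adding a purely random predictor nudges R² up, while adjusted R² typically moves down:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50

    # One genuinely informative predictor plus noise in the response
    x1 = rng.normal(size=n)
    y = 2.0 * x1 + rng.normal(scale=0.5, size=n)

    def fit_and_score(X, y):
        """Fit OLS by least squares and return (R², adjusted R²)."""
        X_design = np.column_stack([np.ones(len(y)), X])  # add an intercept column
        beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
        y_hat = X_design @ beta
        ss_res = np.sum((y - y_hat) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2 = 1 - ss_res / ss_tot
        k = X_design.shape[1] - 1  # number of predictors, excluding the intercept
        adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - k - 1)
        return r2, adj

    # Model 1: only the relevant predictor
    print(fit_and_score(x1.reshape(-1, 1), y))

    # Model 2: add a purely random, irrelevant predictor
    x2 = rng.normal(size=n)
    print(fit_and_score(np.column_stack([x1, x2]), y))
    # R² ticks up slightly; adjusted R² typically drops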

Conclusion

Both R² and adjusted R² are valuable tools for assessing the goodness-of-fit in regression models. However, while R² gives a general sense of how well the model explains the variance in the dependent variable, adjusted R² provides a more reliable measure by accounting for the number of predictors. When building complex models or comparing models with different sets of predictors, adjusted R² is the preferred metric for evaluating the true explanatory power of the model.
