OLS & Multiple Regression
OLS Β· Coefficients Β· Model Fit Β· Multiple Regression
Β© Dr. Rainer DΓΌsing Β· Interactive Tools by Claude
N = 30 Students X₁ = Study time (h/week) Xβ‚‚ = Sleep hours (h/night) Y = Exam score (0–100 pts)
The Least-Squares Principle
Why does OLS minimize squared deviations β€” and what does this mean?
Study time (h/week) β†’ Exam score
bβ‚€
β€”
b₁
β€”
RΒ²
β€”
RSS
β€”
RSS comparison (smaller = better)
Your line
β€”
OLS (min)
β€”
Model:
Y = β€” + β€” Β· X₁
Tutorial β€” The Least-Squares Principle

What you see
30 simulated students: Study time X₁ (h/week) β†’ Exam score Y (points). The orange line is your estimation β€” draggable via two round anchors at the left and right edges. Enable ● Residuals: green bars = positive residuals (point lies above the line), red bars = negative residuals (point lies below). A residual eα΅’ = yα΅’ βˆ’ Ε·α΅’ is the deviation of the observed from the predicted value.

What to do
Drag the anchors until your RSS (orange bar, absolute value) is as small as possible. Then click β—Ž Show OLS β€” the blue OLS line appears. The second bar shows its RSS. Comparison: How close did you get to the minimum?

What is RSS? (Residual Sum of Squares)
RSS = Ξ£eα΅’Β² = Ξ£(yα΅’ βˆ’ Ε·α΅’)Β². OLS finds analytically the exact bβ‚€ and b₁ for which RSS is globally minimal β€” uniquely and provably. No trial and error: b₁ = Cov(X,Y)/Var(X), bβ‚€ = Θ³ βˆ’ b₁·xΜ„.

Why squares, not absolute values?
(1) For every OLS line Ξ£eα΅’ = 0 β€” positive and negative residuals trivially cancel out. (2) Squaring penalizes large residuals disproportionately: a residual of 10 costs 100, not 10. (3) The square provides a differentiable function β†’ closed, unique solution.

Gauss-Markov: Under the 5 OLS assumptions (β†’ Module β‘’) OLS is the BLUE β€” Best Linear Unbiased Estimator. No other linear unbiased estimator has smaller variance.

What is a Residual? eα΅’ = yα΅’ βˆ’ yα΅’ is the deviation of the observed value from the predicted. Why square? First, positive and negative residuals are treated equally (|+3| = |βˆ’3|). Second, large residuals are penalized disproportionately: a residual of 10 counts 100, not 10. OLS finds the unique bβ‚€, b₁ for which: Ξ£eα΅’ = 0 and Ξ£eα΅’Β² is minimal β€” this solution is unique.
I β€” linear
II β€” quadratic
III β€” outlier
IV β€” leverage point
All four datasets have (nearly) identical statistics: b₁ β‰ˆ 0.50, bβ‚€ β‰ˆ 3.0, r β‰ˆ .816, RΒ² β‰ˆ .667. Yet scatterplot and residual plot show fundamentally different patterns. Conclusion: Coefficients and RΒ² alone are not enough β€” residual diagnostics are mandatory.
Understanding bβ‚€, b₁ and Ξ²
Slope, intercept and standardized coefficient β€” visually and formally
Regression line with coefficient visualization
bβ‚€
β€”
b₁
β€”
Ξ²
β€”
b₁: Per +1 h study time the expected score increases by β€” points.
Ξ² = r = β€” β€” bivariate: Ξ² = r always.
bβ‚€: Expected score at 0 h study time (extrapolated, often not meaningful in context).
Tutorial β€” Understanding bβ‚€, b₁ and Ξ²

What you see
The OLS line through all 30 data points. Enable β–³ Slope for the slope triangle and βœ› Mean for the centroid (xΜ„, Θ³) in the plot.

b₁ = β€” β€” Slope (unstandardized)
Per +1 h study time, the expected score increases by β€” points. Unit: points/hour. Interpretation: descriptive, not a causal effect. b₁ is scale-dependent.
The slope triangle in the plot shows concretely: +3 h study time β†’ +β€” points (= 3 Β· b₁).

Ξ² = β€” β€” standardized coefficient
Per +1 SD in X₁, Y increases by Ξ² SD. Bivariate: Ξ² = r (Pearson correlation) always. Ξ² allows comparison of predictors on different scales within one study.
Important: Ξ² does not directly indicate importance β€” with correlated predictors Ξ² can deviate strongly from the partial effect (β†’ Module β‘£).

bβ‚€ = β€” β€” Intercept
Expected score at X₁ = 0 h study time. Since 0 h lies outside the observed range, this is an extrapolation β€” usually not meaningfully interpretable in context.

Centroid (xΜ„ = β€” h | Θ³ = β€” pts)
The OLS line always passes through (xΜ„, Θ³) β€” mathematically necessary, since bβ‚€ = Θ³ βˆ’ b₁·xΜ„.

Calculation formulas:
b₁ = Ξ£(xα΅’ βˆ’ x)(yα΅’ βˆ’ y) / Ξ£(xα΅’ βˆ’ x)Β² = Cov(X,Y) / Var(X)
bβ‚€ = y βˆ’ b₁ Β· x  Β·  Ξ² = b₁ Β· (SDX / SDY) = r  (bivariate)
b₁ (unstandardized) is reported when the unit is substantively meaningful: "+1 h study time β†’ +β€” points".

Ξ² (standardized) allows comparison of predictors on different scales within one study. Caution: Ξ² varies with SD(X) and SD(Y) β€” comparisons between studies are not valid.

In bivariate OLS: Ξ² = r = β€” always. In the multiple model (Module β‘£) Ξ² β‰  r.
Model Fit & Variance Decomposition
SST = SSR + RSS, RΒ², adjusted RΒ² and the assumptions of the OLS model
Total variance (TSS) β€” Deviations from mean Θ³
TSS = SSR + RSS
SSR
RSS
SSR = β€” SSE = β€”
RΒ²
β€”
adj. RΒ²
β€”
fΒ²
β€”
r (Pearson)
β€”
RΒ² = SSR/TSS = β€”% of total variance in Y is explained by X₁.
fΒ² = RΒ²/(1βˆ’RΒ²) = SSR/RSS:  small β‰₯ .02 Β· medium β‰₯ .15 Β· large β‰₯ .35
Tutorial β€” Variance Decomposition & Model Fit

What you see
TSS mode: Θ³ as a blue dashed line; blue bars show each point's deviation from Θ³. SSR + RSS mode: Shows the OLS line's residuals β€” green bars = positive residuals (point above line), red bars = negative residuals (point below line). The bar shows the SSR : RSS ratio of the variance decomposition.

TSS = β€” β€” Total Sum of Squares
TSS = Ξ£(yα΅’ βˆ’ Θ³)Β². Total variance in Y β€” does not depend on the model.

SSR = β€” (β€”% of TSS) β€” Sum of Squares Regression (Model)
SSR = Ξ£(Ε·α΅’ βˆ’ Θ³)Β². How far are the predicted values Ε·α΅’ from the overall mean Θ³? This measures how much variance the model explains. OLS maximizes SSR.

RSS = β€” (β€”% of TSS) β€” Residual Sum of Squares
RSS = Ξ£eα΅’Β² β€” unexplained variance. TSS = SSR + RSS is mathematically exact.

RΒ² and fΒ² (Cohen's effect size)
RΒ² = SSR/TSS: proportion of explained variance. Note: RΒ² always increases with each additional predictor! Adjusted RΒ² corrects with a penalty term per predictor.
fΒ² = RΒ²/(1βˆ’RΒ²) = β€” β†’ β€” effect. Cohen (1988): .02 small Β· .15 medium Β· .35 large.

β‘ 
Linearity: The relationship between X and Y is linear. Violation: residual-vs-fitted plot shows a systematic pattern. Remedy: transformation or polynomial terms.
β‘‘
Independence: Observations are independent (no autocorrelation, no clustering). Violated in longitudinal or nested data β†’ mixed models required.
β‘’
Homoscedasticity: Variance of residuals is constant across all X values β€” no fan shape. Violation: heteroscedasticity-robust standard errors (HC3) or WLS.
β‘£
Normality of residuals: Only required for inference β€” not for OLS estimation itself (Gauss-Markov theorem!). At n β‰₯ 30 the central limit theorem applies.
β‘€
No perfect multicollinearity (multiple regression): If two predictors are perfectly correlated, (Xα΅€X) is not invertible. High collinearity inflates standard errors β†’ VIF > 10 critical. See Module β‘£.
Multiple Regression & Added Variable Plot
What does "controlling for Xβ‚‚" mean? Understanding partial slopes visually.
Step β‘  β€” Bivariate Model: Y ~ X₁
Step
Correlation X₁↔Xβ‚‚
Coefficient comparison
Model b₁ β₁ RΒ²
Bivariate β€” β€” β€”
Partial β€” β€” β€”
AVP-Slope
β€”
Ξ”b₁
β€”
Step β‘  shows the bivariate regression of Y on X₁ β€” as in Module β‘ . The bivariate b₁ still contains the influence of Xβ‚‚ when X₁ and Xβ‚‚ are correlated.
Tutorial β€” Multiple Regression & AVP

From line to plane
Bivariate (Y ~ X₁) the solution is a line in 2D space. With two predictors (Y ~ X₁ + Xβ‚‚) it becomes a regression surface (plane) in 3D space. b₁ is the slope in the X₁ direction, bβ‚‚ in the Xβ‚‚ direction β€” each holding the other predictor fixed.

Why multiple regression?
X₁ (study time) and Xβ‚‚ (sleep) correlate with r₁₂ = β€”. The bivariate model does not measure the pure X₁ effect β€” b₁ also contains the influence of Xβ‚‚. The multiple model holds Xβ‚‚ statistically constant.

Reading the coefficients (table left)
Bivariate b₁: slope from Y ~ X₁ alone β€” contains confounding by Xβ‚‚.
Partial b₁: slope from Y ~ X₁ + Xβ‚‚ β€” adjusted for Xβ‚‚.
Ξ”b₁: the difference quantifies the confounding. At r₁₂ = 0, Ξ”b₁ = 0.

Step β‘  β€” Bivariate starting point
What you see: The bivariate OLS line Y ~ X₁ (study time β†’ exam score) β€” identical to Module β‘ .
Tip: Note the bivariate b₁ (table left), then change r₁₂ and compare.

Confounding comparison
bivariate b₁ = β€” Β· partial b₁ = β€” Β· Ξ”b₁ = β€”
The larger r₁₂, the more the bivariate deviates from the partial b₁.

AVP Principle: An Added Variable Plot (partial regression plot) makes the partial effect of X₁ on Y visible β€” adjusted for Xβ‚‚. To do this, Xβ‚‚ is partialled out of both Y (e(Y|Xβ‚‚)) and X₁ (e(X₁|Xβ‚‚)). The slope of the regression line through the residual vectors equals exactly the partial coefficient b₁ from the multiple model.
Learning Cards β€” OLS & Multiple Regression
β‘  OLS Criterion
OLS minimizes the sum of squared residuals: min Ξ£(yα΅’ βˆ’ Ε·α΅’)Β². Squaring serves two purposes: signs are neutralized, and large residuals are penalized disproportionately more than small ones. The solution is unique and always yields Ξ£eα΅’ = 0.
β‘‘ Coefficient b₁
Interpretation: "Per +1 unit X, the expected value of Y increases by b₁ units β€” all other predictors held constant (ceteris paribus)." In the bivariate case: b₁ = Ξ£(xα΅’βˆ’xΜ„)(yα΅’βˆ’Θ³) / Ξ£(xα΅’βˆ’xΜ„)Β². The numerator is the covariance, the denominator the variance of X.
β‘’ Intercept bβ‚€
bβ‚€ is the expected Y value when all predictors equal zero. This is often not meaningful in context (e.g., 0 study hours, 0 sleep). bβ‚€ is needed for prediction but should usually not be interpreted substantively. Always: Θ³ = bβ‚€ + b₁·xΜ„.
β‘£ RΒ² and adj. RΒ²
RΒ² = SSR/SST ∈ [0, 1] β€” proportion of explained variance. RΒ² increases with every added predictor, even useless ones. Adjusted RΒ² corrects for this: adj.RΒ² = 1 βˆ’ (1βˆ’RΒ²)Β·(nβˆ’1)/(nβˆ’kβˆ’1). adj.RΒ² decreases when a new predictor explains less than expected by chance.
β‘€ Partial Slope
b₁ in the multiple model is not the same as in the bivariate model β€” it is the slope in the Added Variable Plot (AVP): regression of e(Y|Xβ‚‚) on e(X₁|Xβ‚‚). This value equals the partial slope: effect of X₁ on Y, after the shared portion of Xβ‚‚ has been removed from both.
β‘₯ Standardization Ξ²
Ξ² = b Β· (SD_X / SD_Y) β€” the standardized regression coefficient. Ξ² indicates how many standard deviations Y increases when X increases by one SD. Allows comparison of predictors with different scales, but only within one sample. Between studies, Ξ² values are not directly comparable due to different SDs.
? Help β€” OLS & Multiple Regression

What is OLS?

Ordinary Least Squares (OLS) is the standard method for estimating linear regression models. It finds those coefficients bβ‚€ and b₁ that minimize the sum of squared residuals: min Ξ£(yα΅’ βˆ’ Ε·α΅’)Β². The solution follows from the normal equations:

  • b₁ = Ξ£(xα΅’βˆ’xΜ„)(yα΅’βˆ’Θ³) / Ξ£(xα΅’βˆ’xΜ„)Β²
  • bβ‚€ = Θ³ βˆ’ b₁·xΜ„
β–Έ Matrix notation (for the curious)
In the multiple case with design matrix X (nΓ—(k+1), first column ones) and vector y:
b = (Xα΅€X)⁻¹ Xα΅€y
Prerequisite: (Xα΅€X) is invertible β†’ no perfect multicollinearity.

Coefficient interpretation

b₁ (unstandardized): "Per +1 unit X, ΕΆ increases by b₁ units, when all other predictors are held constant." The "ceteris paribus" is crucial β€” in the multiple model b₁ is a partial effect, not a marginal raw effect.

Ξ² (standardized): Comparison of predictors on different scales. Caution: Ξ² is sample-specific and must not be compared between studies.

RΒ² and model fit

RΒ² = SSR/SST = 1 βˆ’ RSS/SST. In the bivariate case RΒ² = rΒ². RΒ² increases with every predictor, even random variables (Freedman's Paradox). Adjusted RΒ² corrects for the number of predictors k:

adj.RΒ² = 1 βˆ’ (1βˆ’RΒ²)Β·(nβˆ’1)/(nβˆ’kβˆ’1)

Effect size fΒ²

fΒ² = RΒ²/(1βˆ’RΒ²) β€” effect size for multiple regression. Conventions (Cohen 1988): small β‰₯ .02, medium β‰₯ .15, large β‰₯ .35. Better than conventions: SESOI β€” define the smallest effect of substantive interest before the study.

When does OLS fail?

  • Non-linearity β€” residual-vs-fitted plot shows a pattern
  • Heteroscedasticity β€” fan shape in residual plot
  • Outliers/leverage β€” individual points determine the line (β†’ Anscombe)
  • Multicollinearity β€” high r(X₁,Xβ‚‚) inflates standard errors; VIF > 10 critical
  • Measurement error β€” attenuation of b₁ (coefficients biased toward zero)

Related Tools in Bayes Thinking Lab