From LM to GLM
Wrong Likelihood · Compare fit · Link function · Distribution choice
© Dr. Rainer Düsing · Interactive Tools by Claude
Three Scenarios
OLS regression assumes normally distributed errors (ε). Three scenarios from body image research show what happens when you force an LM anyway — and how a GLM with an appropriate distribution resolves the situation.
• Therapy outcome after CBT (0/1) → Bernoulli
• Body checking (count data) → Poisson
• Reaction time in the emotional Stroop task (positively skewed) → Gamma
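A minimal simulation sketch of these three outcome types (Python/NumPy; the predictor, coefficients and seed are assumed for illustration and are not the tool's actual data):

```python
# A minimal simulation sketch (assumed predictor, coefficients and seed; not the
# tool's actual data) of the three outcome types the scenarios are built on.
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)                     # e.g. z-scored predictor (therapeutic alliance)

p = 1 / (1 + np.exp(-(0.3 + 1.2 * x)))     # Bernoulli: inverse logit keeps p in (0, 1)
y_therapy = rng.binomial(1, p)             # 0/1 therapy outcome

lam = np.exp(1.0 + 0.5 * x)                # Poisson: exp keeps the rate positive
y_checking = rng.poisson(lam)              # body-checking counts per day

mu = np.exp(6.5 + 0.2 * x)                 # Gamma: positive mean, right-skewed outcome
y_rt = rng.gamma(shape=5.0, scale=mu / 5)  # reaction times in ms
```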
Problem
LM — Normal distribution (wrong)
GLM — appropriate distribution (right)
Fit comparison: AIC & residuals
• LM with Normal distribution (⚠ problematic): log-likelihood (LM) and AIC (LM)
• GLM with appropriate distribution (✓ better): log-likelihood (GLM) and AIC (GLM)
• Plot: residual distribution and AIC comparison (LM-AIC vs GLM-AIC), with ΔAIC = AIC(LM) − AIC(GLM)
• RMSE (always comparable, in y-units; smaller = better): LM-RMSE vs GLM-RMSE
• Slider: sample size n (default 50)
Link function: what the link function does, visually
Linear predictor η = a + b·x
Value range: −∞ to +∞
Concepts: what you learn here
What happens when you choose the wrong likelihood?
The LM minimises the residual sum of squares (RSS), which is equivalent to maximum-likelihood estimation under a normality assumption. When this assumption is violated, the estimates can still be computed, but (see the code sketch after this list):

• Predictions fall outside the valid range (p < 0 or p > 1)
• The log-likelihood is suboptimal — AIC/BIC worse
• Residuals are systematically non-normal
• Inference (confidence intervals, tests) is biased
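A hedged sketch of the first failure mode for Scenario 1 (statsmodels; simulated 0/1 data with assumed coefficients): an OLS fit returns fitted "probabilities" outside (0, 1), while the Bernoulli GLM by construction cannot, and the AIC reflects the difference.

```python
# Hedged sketch of the failure mode (simulated 0/1 data, assumed coefficients):
# an OLS fit returns "probabilities" outside (0, 1); the Bernoulli GLM cannot.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(0.2 + 1.5 * x)))
y = rng.binomial(1, p)
X = sm.add_constant(x)

lm = sm.OLS(y, X).fit()                                   # Normal likelihood (wrong)
glm = sm.GLM(y, X, family=sm.families.Binomial()).fit()   # Bernoulli + logit (right)

n_below, n_above = (lm.fittedvalues < 0).sum(), (lm.fittedvalues > 1).sum()
print(f"LM predictions outside (0, 1): {n_below + n_above}")  # impossible probabilities
print(f"AIC: LM = {lm.aic:.1f}, GLM = {glm.aic:.1f}")         # AIC typically favours the GLM
```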
What is a link function — and why do you need one?
The problem: The linear predictor η = a + b·x can take any value (−∞ to +∞). But many outcome variables have a restricted range: probabilities lie in (0,1), count rates must be positive.

The solution: A link function transforms the expected value into a range where linear modelling makes sense:

Logit link (Bernoulli): log(p/(1-p)) = η
The logit maps p ∈ (0,1) onto (−∞,+∞); its inverse maps every η to p = 1/(1+e^(−η)) ∈ (0,1), the S-curve in the plot.

Log link (Poisson, Gamma): log(λ) = η
The logarithm maps λ > 0 onto (−∞,+∞); its inverse λ = e^η is always positive, the exponential curve in the plot.

Identity link (Normal): E[y] = η
No transformation — the LM is a GLM special case with normal distribution and identity link.
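A small numerical sketch of the three links (NumPy; the grid of η-values is arbitrary): each inverse link maps the unrestricted linear predictor back into the valid range of the outcome's mean.

```python
# A small numerical sketch (arbitrary grid of η-values): each inverse link maps the
# unrestricted linear predictor back into the valid range of the outcome's mean.
import numpy as np

eta = np.linspace(-5, 5, 11)        # linear predictor: any value in (−∞, +∞)

p = 1 / (1 + np.exp(-eta))          # inverse logit: always in (0, 1)  -> the S-curve
lam = np.exp(eta)                   # inverse log:   always > 0        -> the exponential curve
mu = eta                            # identity:      unrestricted      -> the ordinary LM

assert np.all((p > 0) & (p < 1)) and np.all(lam > 0)
```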
How to choose the right family — with examples
Look at the nature of the outcome variable:

Bernoulli (Scenario 1): Did someone achieve a clinically significant improvement in body image disorder after CBT? (0=no, 1=yes). Predictor: therapeutic alliance (WAI-S, z-transformed). Generally: binary outcomes, diagnoses, decisions.

Poisson (Scenario 2): How many times per day does someone inspect negatively rated body parts in the mirror? (Body checking, 0, 1, 2, …). Generally: count data, event frequencies. Note: with overdispersion → Negative Binomial.

Gamma (Scenario 3): How long (ms) does someone take in an emotional Stroop test with body-related words? Generally: reaction times, waiting times, costs — always positive, right-skewed.

Normal/LM: z-standardised questionnaire scores, IQ, continuous symmetric measures with no hard natural zero boundary.

Check the residuals via a posterior predictive check; use AIC/BIC to compare alternative families on the same data.
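A sketch of how the family choice looks in code (statsmodels; the simulated outcomes merely mimic the three scenario data types and are not the tool's data):

```python
# A sketch of how the family choice looks in code (statsmodels; the simulated
# outcomes only mimic the three scenario data types, they are not the tool's data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=100)
X = sm.add_constant(x)

y_bin = rng.binomial(1, 1 / (1 + np.exp(-x)))          # binary: improved yes/no
y_count = rng.poisson(np.exp(0.5 + 0.4 * x))           # counts: body checking per day
y_pos = rng.gamma(5.0, np.exp(6.0 + 0.2 * x) / 5.0)    # positive, right-skewed: RTs in ms
y_cont = 0.5 * x + rng.normal(size=100)                # continuous, symmetric: z-scores

m_bernoulli = sm.GLM(y_bin, X, family=sm.families.Binomial()).fit()    # logit link (default)
m_poisson = sm.GLM(y_count, X, family=sm.families.Poisson()).fit()     # log link (default)
m_gamma = sm.GLM(y_pos, X,
                 family=sm.families.Gamma(link=sm.families.links.Log())).fit()
m_normal = sm.GLM(y_cont, X, family=sm.families.Gaussian()).fit()      # identity link = LM
```

With overdispersed counts, the Poisson family would be swapped for a Negative Binomial one, as noted above.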
ΔAIC as a decision guide
AIC trades off fit against model complexity: AIC = −2ℓ̂ + 2k, where ℓ̂ is the maximised log-likelihood and k the number of parameters.

Rule of thumb for ΔAIC = AIC(LM) − AIC(GLM):
• ΔAIC < 2: barely any difference
• ΔAIC 2–10: moderate advantage for the GLM
• ΔAIC > 10: strong advantage, LM clearly worse

Here you see the ΔAIC live. With binary and count data the advantage of the GLM is typically large; with continuous, symmetric data the LM may be sufficient.
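A minimal sketch of the ΔAIC computation on simulated count data (statsmodels; coefficients and sample size are assumed):

```python
# Minimal sketch of the ΔAIC rule on simulated count data (assumed coefficients).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=150)
y = rng.poisson(np.exp(0.8 + 0.6 * x))     # body-checking-style counts
X = sm.add_constant(x)

lm = sm.OLS(y, X).fit()                                   # Normal likelihood
glm = sm.GLM(y, X, family=sm.families.Poisson()).fit()    # Poisson likelihood

delta_aic = lm.aic - glm.aic               # ΔAIC = AIC(LM) − AIC(GLM)
print(f"ΔAIC = {delta_aic:.1f}")           # > 10 → strong advantage for the GLM
```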
ℹ From LM to GLM — Help
What will I learn here?
This tool shows why a linear model (LM) is structurally wrong for certain data types — and how a GLM with an appropriate distribution solves the problem.
The three scenarios: therapy outcome after CBT (Bernoulli), body-checking counts (Poisson), reaction times in the emotional Stroop task (Gamma).
Link function — briefly explained
The linear predictor η = a + b·x can take any value (−∞ to +∞). But many outcomes have a restricted range:

Logit link (Bernoulli): log(p/(1-p)) = η → every η gives a p ∈ (0,1)
Log link (Poisson, Gamma): log(λ) = η → λ = e^η is always positive
Identity (Normal): E[y] = η — the classical LM
Reading the AIC comparison
ΔAIC = AIC(LM) − AIC(GLM) — the larger, the worse the LM.
Rules of thumb: < 2 = barely any difference · 2–10 = moderate advantage · > 10 = LM clearly unsuitable.

Important: both models are fitted by maximum likelihood to the same, untransformed outcome, so the AIC comparison is fair.
Why this matters for Bayes
The choice of likelihood is just as central in Bayes as in MLE. A Bayesian model consists of Likelihood × Prior — if you choose the wrong likelihood (e.g. Normal for 0/1 data), the model is structurally wrong, regardless of how good the prior is.
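A hedged illustration of this point (SciPy; toy 0/1 data and a Beta(2, 2) prior, both assumed): the same prior combined with the right and the wrong likelihood gives structurally different log-posteriors for the success probability.

```python
# Hedged illustration (toy 0/1 data and a Beta(2, 2) prior, both assumed): the same
# prior combined with two different likelihoods gives very different log-posteriors.
import numpy as np
from scipy import stats

y = np.array([1, 0, 1, 1, 0, 1, 1, 1])          # toy binary outcomes
p_grid = np.linspace(0.01, 0.99, 99)            # grid over the success probability
log_prior = stats.beta.logpdf(p_grid, 2, 2)     # identical prior in both models

log_lik_bern = np.array([stats.bernoulli.logpmf(y, p).sum() for p in p_grid])  # right likelihood
log_lik_norm = np.array([stats.norm.logpdf(y, p, 0.5).sum() for p in p_grid])  # wrong likelihood

log_post_bern = log_prior + log_lik_bern        # sensible, respects the 0/1 structure
log_post_norm = log_prior + log_lik_norm        # structurally wrong, whatever the prior
```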
Next → GLM Conditional Distributions: how GLMs model a separate distribution for each x-value