From LM to GLM
Wrong Likelihood · Compare fit · Link function · Distribution choice
© Dr. Rainer Düsing · Interactive Tools by Claude
Three Scenarios
OLS regression assumes normally distributed errors (ε). Three scenarios from body image research show what happens when you fit an LM anyway, and how a GLM with an appropriate distribution resolves the situation.
① Therapy outcome after CBT: 0/1 → Bernoulli
② Body checking: count data → Poisson
③ Stroop reaction time: positively skewed → Gamma
Problem: ① LM with Normal distribution (wrong) → ② GLM with appropriate distribution (right) → ③ Fit comparison: AIC and residuals
[Interactive panels: LM with Normal distribution (⚠ problematic) and GLM with appropriate distribution (✓ better), each showing log-likelihood and AIC; residual distribution and AIC comparison with ΔAIC (LM − GLM); RMSE for both models, which is always comparable because it is in y-units (smaller = better); sample size n, default 50]
[Interactive plot: Link function, what it does visually. Linear predictor η = a + b·x, value range −∞ to +∞]
Concepts: What you learn here
What happens when you choose the wrong likelihood?
The LM minimises the residual sum of squares (RSS), which is equivalent to maximum-likelihood estimation under a normal distribution assumption. When this assumption is violated, the estimates can still be computed, but:

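• Predictions can fall outside the valid range (e.g. p < 0 or p > 1 for binary outcomes, negative values for counts)
• The observed data receive a lower likelihood than under a suitable distribution, so AIC/BIC are worse
• Residuals are systematically non-normal, and their variance depends on the mean
• Inference (standard errors, confidence intervals, tests) can be invalid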
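A minimal sketch of the first bullet, assuming a small made-up 0/1 dataset (the `x` and `y` values are hypothetical, not taken from the tool): closed-form OLS, i.e. the RSS-minimising line, produces fitted "probabilities" below 0 and above 1 at the extremes of x.

```python
# Hypothetical toy data (not from the tool): z-scored predictor x and a 0/1 outcome y
x = [-2.5, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.5]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1]


def ols_fit(x, y):
    """Closed-form simple linear regression: the (a, b) that minimise the RSS."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = ols_fit(x, y)
fitted = [a + b * xi for xi in x]
# Fitted "probabilities" that are impossible for a 0/1 outcome
out_of_range = [round(f, 3) for f in fitted if f < 0 or f > 1]
print(f"a = {a:.3f}, b = {b:.3f}")
print("fitted values outside [0, 1]:", out_of_range)
```

Nothing in the least-squares criterion knows that y is binary, so the straight line simply keeps going past 0 and 1.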
What is a link function β€” and why do you need one?
The problem: The linear predictor η = a + b·x can take any value (−∞ to +∞). But many outcome variables have a restricted range: probabilities lie in (0,1), count rates must be positive.

The solution: A link function transforms the expected value into a range where linear modelling makes sense:

Logit link (Bernoulli): log(p/(1−p)) = η
The logit maps p ∈ (0,1) to (−∞,+∞). Conversely, every η gives p = 1/(1+e^(−η)) ∈ (0,1): the S-curve in the plot.
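A minimal sketch of the logit link and its inverse in plain Python (the function names `logit` and `inv_logit` are illustrative, not from the tool): however large or small η becomes, the inverse logit returns a value strictly inside (0, 1).

```python
import math


def logit(p):
    """Logit link: maps a probability p in (0, 1) to the real line."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Inverse logit (logistic function): maps any real eta back into (0, 1)."""
    return 1 / (1 + math.exp(-eta))


# Any value of the linear predictor lands strictly inside (0, 1)
for eta in [-6, -2, 0, 2, 6]:
    print(f"eta = {eta:+d} -> p = {inv_logit(eta):.4f}")

# The two functions undo each other
assert abs(logit(inv_logit(1.3)) - 1.3) < 1e-12
```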

Log link (Poisson, Gamma): log(λ) = η
The logarithm maps λ > 0 to (−∞,+∞). Conversely, λ = e^η is always positive: the exponential curve in the plot.
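A minimal sketch of the log link, assuming hypothetical coefficients a = 0.5 and b = 0.4 for a Poisson model of daily body-checking counts: the linear predictor can be negative, but the expected count e^η never is. On this scale, a one-unit step in x multiplies λ by e^b.

```python
import math


def log_link(mu):
    """Log link: maps a positive mean (Poisson rate or Gamma mean) to the real line."""
    return math.log(mu)


def inv_log_link(eta):
    """Inverse log link: exp(eta) is positive for every real eta."""
    return math.exp(eta)


# Hypothetical coefficients for a Poisson model of daily body-checking counts
a, b = 0.5, 0.4
for x in [-3, -1, 0, 1, 3]:
    eta = a + b * x          # linear predictor: unrestricted
    lam = inv_log_link(eta)  # expected count: always > 0
    print(f"x = {x:+d}: eta = {eta:+.2f}, lambda = {lam:.3f}")

# A one-unit step in x multiplies lambda by exp(b)
ratio = inv_log_link(a + b * 1) / inv_log_link(a + b * 0)
print(f"rate ratio per unit of x: {ratio:.3f} (exp(b) = {math.exp(b):.3f})")
```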

Identity link (Normal): E[y] = η
No transformation: the LM is the special case of a GLM with normal distribution and identity link.
How to choose the right family, with examples
Look at the nature of the outcome variable:

Bernoulli (Scenario 1): Did someone achieve a clinically significant improvement in body image disorder after CBT? (0=no, 1=yes). Predictor: therapeutic alliance (WAI-S, z-transformed). Generally: binary outcomes, diagnoses, decisions.

Poisson (Scenario 2): How many times per day does someone inspect negatively rated body parts in the mirror? (Body checking: 0, 1, 2, …). Generally: count data, event frequencies. Note: if the counts are overdispersed (variance clearly larger than the mean), use a Negative Binomial instead.

Gamma (Scenario 3): How long (ms) does someone take in an emotional Stroop test with body-related words? Generally: reaction times, waiting times, costs; always positive and right-skewed.

Normal/LM: z-standardised questionnaire scores, IQ, continuous symmetric measures with no hard natural zero boundary.
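The checklist above can be sketched as a small rule-of-thumb function. `suggest_family` is a hypothetical helper, not part of the tool or any library, and its overdispersion cut-off (variance/mean > 1.5) is an assumption chosen for illustration. It only inspects the value range and shape of y, so it is a starting point for the choice, not a replacement for checking the fitted model.

```python
import statistics


def suggest_family(y, overdispersion_ratio=1.5):
    """Hypothetical rule-of-thumb helper that mirrors the guidance above.

    It only looks at the value range and shape of the outcome. The cut-off
    overdispersion_ratio (variance / mean) is an assumption, not a standard value.
    """
    values = [float(v) for v in y]
    # Binary outcome: only 0 and 1 occur
    if all(v in (0.0, 1.0) for v in values):
        return "Bernoulli"
    # Counts: non-negative integers; compare variance with mean for overdispersion
    if all(v >= 0 and v.is_integer() for v in values):
        mean = statistics.fmean(values)
        var = statistics.variance(values)
        if mean > 0 and var / mean > overdispersion_ratio:
            return "Negative Binomial"
        return "Poisson"
    # Strictly positive continuous values with a right skew
    if all(v > 0 for v in values):
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        if sd > 0:
            skew = sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)
            if skew > 0:
                return "Gamma"
    # Otherwise: continuous, roughly symmetric, no hard lower bound
    return "Normal"


# Hypothetical example outcomes, one per scenario
print(suggest_family([0, 1, 1, 0, 1, 0]))                           # therapy success
print(suggest_family([0, 2, 1, 3, 0, 1, 2]))                        # body checking
print(suggest_family([512.3, 587.9, 634.1, 701.4, 980.2, 1450.7]))  # Stroop RT (ms)
print(suggest_family([-1.2, 0.3, 0.8, -0.4, 0.5]))                  # z-scores
```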

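Check the fit of the chosen family (e.g. its residuals) with a posterior predictive check; use AIC/BIC to compare alternative families fitted to the same data.
ΔAIC as a decision guide
AIC penalises poor fit and number of parameters: AIC = −2ℓ̂ + 2k, where ℓ̂ is the maximised log-likelihood and k the number of estimated parameters.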

Rule of thumb for ΔAIC = AIC(LM) − AIC(GLM):
• ΔAIC < 2: barely any difference
• ΔAIC 2–10: moderate advantage for GLM
• ΔAIC > 10: strong advantage, LM clearly worse
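A minimal sketch of the AIC formula and the ΔAIC bands (< 2, 2–10, > 10). The log-likelihood values below are hypothetical and serve only to show the arithmetic; they are not output of the tool. Note that k for the Normal LM includes the residual standard deviation, while a Bernoulli GLM has no separate dispersion parameter.

```python
def aic(loglik, k):
    """AIC = -2 * (maximised log-likelihood) + 2 * (number of estimated parameters)."""
    return -2 * loglik + 2 * k


def interpret_delta_aic(delta):
    """Map delta = AIC(LM) - AIC(GLM) to the bands < 2, 2-10 and > 10.

    Assumes delta >= 0 (GLM at least as good), the direction the rule of thumb covers.
    """
    if delta < 2:
        return "barely any difference"
    if delta <= 10:
        return "moderate advantage for GLM"
    return "strong advantage, LM clearly worse"


# Hypothetical maximised log-likelihoods, for illustration only (not tool output)
aic_lm = aic(loglik=-30.0, k=3)   # Normal LM: intercept, slope, residual SD
aic_glm = aic(loglik=-24.0, k=2)  # Bernoulli GLM: intercept, slope
delta = aic_lm - aic_glm
print(aic_lm, aic_glm, delta, interpret_delta_aic(delta))
```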

Here you see the ΔAIC live. With binary data and count data the advantage of the GLM is typically large; with continuous, symmetric data the LM may be sufficient.
ℹ From LM to GLM: Help
What will I learn here?
This tool shows why a linear model (LM) is structurally wrong for certain data types, and how a GLM with an appropriate distribution solves the problem.
The three scenarios
Link function β€” briefly explained
The linear predictor η = a + b·x can take any value (−∞ to +∞). But many outcomes have a restricted range:

Logit link (Bernoulli): log(p/(1−p)) = η → every η gives a p ∈ (0,1)
Log link (Poisson, Gamma): log(λ) = η → λ = e^η is always positive
Identity (Normal): E[y] = η, the classical LM
Reading the AIC comparison
ΔAIC = AIC(LM) − AIC(GLM): the larger, the worse the LM.
Rules of thumb: < 2 = barely any difference · 2–10 = moderate advantage · > 10 = LM clearly unsuitable.

Important: Both models are evaluated with the same likelihood, which makes the comparison fair.
Why this matters for Bayes
The choice of likelihood is just as central in Bayesian inference as in MLE. A Bayesian model combines likelihood and prior (posterior ∝ likelihood × prior): if you choose the wrong likelihood (e.g. Normal for 0/1 data), the model is structurally wrong, regardless of how good the prior is.
Next → GLM Conditional Distributions: how GLMs model a separate distribution for each x-value
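A minimal grid-approximation sketch of posterior ∝ likelihood × prior for a Bernoulli probability p, assuming hypothetical data (14 of 20 patients improved after CBT) and a flat prior. The likelihood line is where the family choice enters. With a flat prior the exact posterior is Beta(15, 7) with mean 15/22 ≈ 0.682, which the grid should reproduce closely.

```python
import math


def grid_posterior(successes, n, grid_size=1001):
    """Grid approximation of posterior ∝ likelihood × prior for a Bernoulli p.

    Assumes a flat prior on (0, 1); grid points are cell midpoints, so p = 0 and 1 are excluded.
    """
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    prior = [1.0] * grid_size  # flat prior
    # Bernoulli log-likelihood of `successes` ones in n trials, evaluated at each p
    log_lik = [successes * math.log(p) + (n - successes) * math.log(1 - p) for p in grid]
    max_ll = max(log_lik)  # subtract the maximum before exp() for numerical stability
    unnorm = [math.exp(ll - max_ll) * pr for ll, pr in zip(log_lik, prior)]
    total = sum(unnorm)
    return grid, [u / total for u in unnorm]


grid, post = grid_posterior(successes=14, n=20)
post_mean = sum(p * w for p, w in zip(grid, post))
print(f"posterior mean of p: {post_mean:.3f}")
```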