From LM to GLM — Bayes Thinking Lab

Three Scenarios

Ordinary least squares (OLS) regression, the classical linear model (LM), assumes normally distributed errors (ε). Three scenarios from body image research show what happens when you force an LM onto data that violate this assumption, and how a GLM with an appropriate distribution resolves the problem.

① Therapy outcome (CBT): binary 0/1 → Bernoulli
② Body checking: count data → Poisson
③ Reaction time (Stroop): positively skewed → Gamma

Steps: ① LM with Normal distribution (wrong) → ② GLM with appropriate distribution (right) → ③ Fit comparison: AIC & residuals

[Interactive panels: "LM with Normal distribution" (⚠ problematic) and "GLM with appropriate distribution" (✓ better), each reporting log-likelihood and AIC; "Residual distribution & AIC comparison" with ΔAIC (LM − GLM), LM-AIC and GLM-AIC; RMSE for LM and GLM (always comparable, in y-units; smaller = better); sample size control, n = 50.]

Link Function: what the link function does, visually

Linear predictor η = a + b·x (value range: −∞ to +∞) → E[y] = g⁻¹(η) (value range: restricted)

What happens when you choose the wrong likelihood?

The LM minimises RSS — which is equivalent to maximum-likelihood estimation under a normal distribution assumption. When this assumption is violated, the estimates can still be computed, but:

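• Predictions can fall outside the valid range (e.g. p < 0 or p > 1)
• The log-likelihood is typically lower than under a well-matched distribution, so AIC/BIC are worse
• Residuals are systematically non-normal
• Inference (confidence intervals, tests) is unreliable, because the standard errors rest on the violated normality assumption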
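The first bullet can be checked with a few lines of code. The following is a minimal sketch, not the tool's own implementation: it simulates hypothetical 0/1 data from a logistic curve (the coefficients, sample size and seed are assumptions chosen for illustration), fits the LM by closed-form least squares, and evaluates the fitted line at extreme predictor values.

```python
import math
import random

random.seed(1)

# Hypothetical data: x is a z-scored predictor, y is a 0/1 outcome
# drawn from a logistic curve with assumed coefficients a = 0, b = 2.
n = 500
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if random.random() < 1.0 / (1.0 + math.exp(-2.0 * xi)) else 0 for xi in x]

# Closed-form OLS for one predictor (the RSS minimiser).
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def lm_predict(x_new):
    """Prediction of the straight-line LM; nothing constrains it to [0, 1]."""
    return intercept + slope * x_new


for x_new in (-3.0, 0.0, 3.0):
    print(f"x = {x_new:+.1f}  LM prediction = {lm_predict(x_new):.3f}")
```

Because the fitted line has a non-zero slope, it must leave the interval [0, 1] for sufficiently extreme x, so the LM reports "probabilities" that cannot exist.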

What is a link function — and why do you need one?

The problem: The linear predictor η = a + b·x can take any value (−∞ to +∞). But many outcome variables have a restricted range: probabilities lie in (0,1), count rates must be positive.

The solution: A link function transforms the expected value into a range where linear modelling makes sense:

Logit link (Bernoulli): log(p/(1-p)) = η
The logit maps p ∈ (0,1) to (−∞,+∞). Conversely, every η yields p = 1/(1+e^(−η)) ∈ (0,1). This is the S-curve in the plot.

Log link (Poisson, Gamma): log(λ) = η
The logarithm maps λ > 0 to (−∞,+∞). Conversely, λ = e^η is always positive. This is the exponential curve in the plot.

Identity link (Normal): E[y] = η
No transformation — the LM is a GLM special case with normal distribution and identity link.
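The three links can be written as pairs of small functions. This is an illustrative sketch; the function names are made up for this example and do not come from any particular library.

```python
import math


def logit(p):
    """Logit link: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse logit: maps any real eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def log_link(lam):
    """Log link: maps a positive mean to the real line."""
    return math.log(lam)


def inv_log(eta):
    """Inverse log link: maps any real eta to a positive mean."""
    return math.exp(eta)


def identity(eta):
    """Identity link: the LM case, no transformation."""
    return eta


for eta in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"eta = {eta:+.1f}  p = {inv_logit(eta):.3f}  "
          f"lambda = {inv_log(eta):.3f}  identity = {identity(eta):+.1f}")
```

Whatever value η takes, the inverse logit stays inside (0, 1) and the inverse log stays positive, while the identity passes η through unchanged.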

How to choose the right family — with examples

Look at the nature of the outcome variable:

Bernoulli (Scenario 1): Did someone achieve a clinically significant improvement in body image disorder after CBT? (0=no, 1=yes). Predictor: therapeutic alliance (WAI-S, z-transformed). Generally: binary outcomes, diagnoses, decisions.

Poisson (Scenario 2): How many times per day does someone inspect negatively rated body parts in the mirror? (Body checking, 0, 1, 2, …). Generally: count data, event frequencies. Note: if the counts are overdispersed (variance larger than the mean), use a Negative Binomial instead.

Gamma (Scenario 3): How long (ms) does someone take in an emotional Stroop test with body-related words? Generally: reaction times, waiting times, costs — always positive, right-skewed.

Normal/LM: z-standardised questionnaire scores, IQ, continuous symmetric measures with no hard natural zero boundary.

Check model fit with a posterior predictive check, and use AIC/BIC to compare alternative families fitted to the same data.
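To get a feel for what each family produces, it can help to simulate one observation per scenario from the same predictor value. The sketch below is illustrative only: all coefficients, the Gamma shape parameter and the helper names (`poisson_draw`, `simulate`) are assumptions, not values used by the tool.

```python
import math
import random

random.seed(7)


def poisson_draw(lam):
    """Draw a Poisson count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, random.random()
    while product > threshold:
        count += 1
        product *= random.random()
    return count


def simulate(x):
    """Return one (binary, count, reaction time) triple for predictor value x."""
    # Scenario 1, Bernoulli with logit link (assumed a = 0, b = 1).
    p = 1.0 / (1.0 + math.exp(-(0.0 + 1.0 * x)))
    improved = 1 if random.random() < p else 0

    # Scenario 2, Poisson with log link (assumed a = 1, b = 0.5).
    checks = poisson_draw(math.exp(1.0 + 0.5 * x))

    # Scenario 3, Gamma with log link (assumed a = 6.5, b = 0.2, shape = 5).
    # random.gammavariate(alpha, beta) has mean alpha * beta, so beta = mu / shape.
    mu = math.exp(6.5 + 0.2 * x)
    shape = 5.0
    rt_ms = random.gammavariate(shape, mu / shape)

    return improved, checks, rt_ms


for x in (-1.0, 0.0, 1.0):
    improved, checks, rt_ms = simulate(x)
    print(f"x = {x:+.1f}  improved = {improved}  checks = {checks}  RT = {rt_ms:.0f} ms")
```

The three outputs differ in kind: a 0/1 value, a non-negative integer, and a positive continuous number. The family is chosen to match that kind.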

ΔAIC as a decision guide

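AIC penalises poor fit and number of parameters: AIC = −2ℓ̂ + 2k, where ℓ̂ is the maximised log-likelihood and k is the number of estimated parameters. Lower AIC is better.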

Rule of thumb for ΔAIC = AIC(LM) − AIC(GLM):
• ΔAIC < 2: barely any difference
• ΔAIC 2–10: moderate advantage for GLM
• ΔAIC > 10: strong advantage, LM clearly worse

Here you see the ΔAIC live. With binary data and count data the advantage of the GLM is typically large — with continuous, symmetric data the LM may be sufficient.
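The AIC formula and the ΔAIC reading can be sketched in a few lines. The log-likelihood values and parameter counts below are made-up numbers for illustration, not output of the tool, and the helper names are hypothetical. The code uses only the cut-offs 2 and 10 from the rule of thumb.

```python
def aic(log_lik, n_params):
    """AIC = -2 * maximised log-likelihood + 2 * number of estimated parameters."""
    return -2.0 * log_lik + 2.0 * n_params


def interpret_delta_aic(delta):
    """Verbal reading of delta = AIC(LM) - AIC(GLM), using the cut-offs 2 and 10."""
    if delta < 2.0:
        return "barely any difference"
    if delta <= 10.0:
        return "moderate advantage for GLM"
    return "strong advantage for GLM, LM clearly worse"


# Hypothetical values: the LM estimates intercept, slope and residual SD (k = 3),
# the Bernoulli GLM estimates intercept and slope only (k = 2).
ll_lm, k_lm = -72.4, 3
ll_glm, k_glm = -61.0, 2

delta = aic(ll_lm, k_lm) - aic(ll_glm, k_glm)
print(f"AIC(LM) = {aic(ll_lm, k_lm):.1f}, AIC(GLM) = {aic(ll_glm, k_glm):.1f}")
print(f"Delta AIC = {delta:.1f}: {interpret_delta_aic(delta)}")
```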

ℹ From LM to GLM — Help

What will I learn here?

This tool shows why a linear model (LM) is structurally wrong for certain data types — and how a GLM with an appropriate distribution solves the problem.

What happens when you choose the wrong likelihood?
What is a link function — and what is it needed for?
How do you choose the right distribution family?

The three scenarios

Binary data (0/1): The LM can predict probabilities < 0 or > 1. GLM with Bernoulli + logit link keeps predictions in (0,1).
Count data (0, 1, 2, …): The LM can produce negative predictions. GLM with Poisson + log link guarantees positive expected values.
Positive, skewed data: The normal distribution assumption of the LM does not fit. Gamma-GLM with log link correctly models the right skew.

Link function — briefly explained

The linear predictor η = a + b·x can take any value (−∞ to +∞). But many outcomes have a restricted range:

Logit link (Bernoulli): log(p/(1-p)) = η → every η gives a p ∈ (0,1)
Log link (Poisson, Gamma): log(λ) = η → λ = e^η is always positive
Identity (Normal): E[y] = η — the classical LM

Reading the AIC comparison

ΔAIC = AIC(LM) − AIC(GLM) — the larger, the worse the LM.
Rules of thumb: < 2 = barely any difference · 2–10 = moderate advantage · > 10 = LM clearly unsuitable.

Important: Both models are evaluated with the same likelihood — a fair comparison.

Why this matters for Bayes

The choice of likelihood is just as central in Bayes as in MLE. A Bayesian model consists of Likelihood × Prior — if you choose the wrong likelihood (e.g. Normal for 0/1 data), the model is structurally wrong, regardless of how good the prior is.
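To make "Likelihood × Prior" concrete, here is a minimal grid-approximation sketch for the binary scenario. Everything in it is an assumption for illustration: the toy data, the Normal(0, 2) priors on a and b, and the grid range. On the log scale the product becomes a sum, log posterior = log likelihood + log prior (up to a constant).

```python
import math

# Hypothetical toy data: x = z-scored predictor, y = 0/1 outcome.
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]


def log_normal_pdf(value, mean, sd):
    """Log density of a Normal(mean, sd) distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - 0.5 * ((value - mean) / sd) ** 2


def log_prior(a, b):
    """Assumed priors: a ~ Normal(0, 2) and b ~ Normal(0, 2), independent."""
    return log_normal_pdf(a, 0.0, 2.0) + log_normal_pdf(b, 0.0, 2.0)


def log_likelihood(a, b):
    """Bernoulli likelihood with logit link.

    Uses log p = -log(1 + e^(-eta)) and log(1 - p) = -log(1 + e^(eta)).
    """
    total = 0.0
    for xi, yi in zip(x, y):
        eta = a + b * xi
        total -= math.log1p(math.exp(-eta if yi == 1 else eta))
    return total


# Evaluate the unnormalised log posterior on a grid from -5 to 5 in steps of 0.1.
grid = [i / 10.0 for i in range(-50, 51)]
log_post = {(a, b): log_likelihood(a, b) + log_prior(a, b) for a in grid for b in grid}

# Normalise; subtracting the maximum avoids underflow in exp().
max_lp = max(log_post.values())
weights = {ab: math.exp(lp - max_lp) for ab, lp in log_post.items()}
total_weight = sum(weights.values())
posterior = {ab: w / total_weight for ab, w in weights.items()}

post_mean_a = sum(a * p for (a, b), p in posterior.items())
post_mean_b = sum(b * p for (a, b), p in posterior.items())
print(f"posterior mean of a = {post_mean_a:.2f}, posterior mean of b = {post_mean_b:.2f}")
```

Swapping `log_likelihood` for a Normal log density would leave the prior untouched but change the model's structure, which is the point of the paragraph above.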

Next → GLM Conditional Distributions: how GLMs model a separate distribution for each x-value