Bayesian Regression — Interactive

True Parameters

α · Intercept: 0.00

β · Slope: 1.00

σ · Residual SD: 1.00

n · Observations: 30

Priors & Hyperparameters

α ~ N(μ_α, σ_α)
β ~ N(μ_β, σ_β)
σ ~ HalfNormal(0, s_σ)

μ_α · Prior Mean α: 0.00

σ_α · Prior SD α: 1.00

μ_β · Prior Mean β: 0.00

σ_β · Prior SD β: 1.00

s_σ · HalfNormal Scale: 1.00

Likelihood Function

Explanations & Background

What are Priors? Priors encode our knowledge before observing the data. A prior N(0, 1) for β means: we expect a slope near 0, and values beyond ±2 are unlikely (about 5% prior probability). A wide prior (large prior SD, e.g. σ_β) is weakly informative.
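As a rough illustration of what "unlikely" means here, the following sketch computes how much prior probability lies beyond ±2 for the N(0, 1) prior and for a wider prior. The wide prior N(0, 5) and the helper names are hypothetical choices for this example, not settings taken from the app.

```python
import math

def normal_cdf(x, mu=0.0, sd=1.0):
    """CDF of N(mu, sd), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def mass_outside(limit, mu=0.0, sd=1.0):
    """Prior probability that the parameter lies outside [-limit, +limit]."""
    return 1.0 - (normal_cdf(limit, mu, sd) - normal_cdf(-limit, mu, sd))

tight = mass_outside(2.0, sd=1.0)  # the N(0, 1) prior from the text
wide = mass_outside(2.0, sd=5.0)   # hypothetical wide prior N(0, 5)

print(f"P(|beta| > 2) under N(0, 1): {tight:.3f}")
print(f"P(|beta| > 2) under N(0, 5): {wide:.3f}")
```

Under the wide prior most of the probability sits beyond ±2, which is why it constrains the slope so little.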

Kruschke Diagram: Visualises the generative model structure (Kruschke 2014). Arrows show stochastic dependencies: Hyperpriors → Priors → Likelihood → Data.

Prior Predictive Check (McElreath 2020, Ch. 4): Parameter values are sampled from the prior and shown as regression lines. A good prior produces plausible, not arbitrary, predictions.

Prior → Posterior Update: P(θ|y) ∝ P(y|θ) · P(θ). The posterior combines prior knowledge with data information. With small n the prior dominates; with large n the likelihood does.
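The shift from prior-dominated to data-dominated can be sketched with a simplified special case that has a closed form: intercept fixed at 0 and σ treated as known, so only β is updated. This is an assumption made to keep the example short; it is not how the app computes its posterior. The function names, the prior SD of 0.5, and the seed are likewise hypothetical.

```python
import random

def posterior_beta(x, y, mu0=0.0, s0=1.0, sigma=1.0):
    """Conjugate update for beta in y = beta * x + noise, with known sigma."""
    prior_prec = 1.0 / s0 ** 2
    data_prec = sum(xi * xi for xi in x) / sigma ** 2
    post_prec = prior_prec + data_prec
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    post_mean = (mu0 * prior_prec + sxy / sigma ** 2) / post_prec
    return post_mean, post_prec ** -0.5

def simulate(n, beta_true=-0.8, sigma=1.0, seed=1):
    """Simulate n observations with a standard-normal predictor."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta_true * xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y

post = {}
for n in (5, 30, 300):
    x, y = simulate(n)
    post[n] = posterior_beta(x, y, mu0=0.0, s0=0.5)
    mean, sd = post[n]
    print(f"n = {n:3d}: posterior mean = {mean:+.2f}, posterior SD = {sd:.3f}")
```

The posterior precision is the sum of the prior precision and the data precision, so the posterior SD can only shrink as observations are added, and the prior mean carries less and less weight.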

Marginal Distributions: Show how each parameter (α, β, σ) shifts from prior to posterior. A narrow posterior means the data have pinned the parameter down with high certainty.

MCMC Metropolis-Hastings: Samples (β, σ) jointly. σ is proposed on the log scale, which guarantees positivity. The heatmap shows log P(β,σ|y). Trace plots show convergence; histograms show the marginal posterior after burn-in.

95% CI Mean vs. 95% PPI New Observation: The posterior predictive lines (50 samples from the posterior) fade into the background when a band is active; the band is their formal summary.

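95% CI Mean (green, narrow): Shows uncertainty about where the regression line lies. Contains only parameter uncertainty (here σ_α and σ_β denote the posterior SDs of α and β, not the prior SDs). McElreath: link().
Width ∝ √(σ_α² + x²·σ_β²), assuming centred x (x̄ = 0) so that α and β are approximately uncorrelated: narrowest at x̄, wider at the ends.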

95% PPI New Obs. (blue, dashed, wider): Shows where a new data point y_new would fall. Also includes residual scatter σ. McElreath: sim().
Width ∝ √(σ_α² + x²·σ_β² + σ²)

Key difference: With small σ (little noise) CI and PPI are close together. With large σ the PPI is much wider than the CI — because even if we know the line perfectly, new observations scatter by σ.


MODEL STRUCTURE

Kruschke Diagram

Legend: arrows (↓) = stochastic dependence · ■ Hyperpriors · ■ Priors · ■ Likelihood · ■ Data

PRIOR PREDICTIVE CHECK

What does the model say before the data? (McElreath approach)

60 Prior Lines

Each line = α~N(μ_α,σ_α), β~N(μ_β,σ_β).
Large σ_α or σ_β → the lines fan out widely, including worlds that are not plausible.
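A minimal sketch of how such prior lines could be generated, assuming independent normal priors as in the sidebar. The function names, the seed, and the two example prior SDs are hypothetical.

```python
import random

def prior_lines(n_lines=60, mu_a=0.0, sd_a=1.0, mu_b=0.0, sd_b=1.0, seed=42):
    """Draw (alpha, beta) pairs from the priors; each pair is one regression line."""
    rng = random.Random(seed)
    return [(rng.gauss(mu_a, sd_a), rng.gauss(mu_b, sd_b)) for _ in range(n_lines)]

def spread_at(lines, x):
    """Range of predicted means across all prior lines at predictor value x."""
    preds = [a + b * x for a, b in lines]
    return max(preds) - min(preds)

tight = prior_lines(sd_a=0.5, sd_b=0.5)  # hypothetical tight priors
wide = prior_lines(sd_a=3.0, sd_b=3.0)   # hypothetical wide priors

for name, lines in (("tight", tight), ("wide", wide)):
    print(f"{name}: spread of predictions at x = 2 is {spread_at(lines, 2.0):.1f}")
```

For a z-scored outcome, predictions many standard deviations from zero are hard to defend, which is the kind of implausibility a prior predictive check is meant to expose.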

PRIOR → POSTERIOR UPDATE

How the data update the prior

Legend: Prior Pred. · Post. Pred. · Post. Median · True Line · 95% CI Mean · 95% PPI New Obs.

Marginal panels (Prior vs. Post.): Intercept α · Slope β · Residual SD σ

P(θ|y) ∝ P(y|θ)·P(θ)
Narrow posterior curve = more certainty.
With a tight prior, the posterior curve is pulled towards the prior.

MCMC — METROPOLIS-HASTINGS

How the sampler explores the posterior space

Heatmap = log P(β,σ|y) · ● accepted · ● rejected · ✕ true · ● current


Panels: Trace Plot β (with true value) · Histogram β (post burn-in) · Trace Plot σ (with true value) · Histogram σ (post burn-in)

Metropolis Step:
1. Propose β*~N(β,0.15²), log σ*~N(log σ,0.12²)
2. r = P(β*,σ*|y) / P(β,σ|y)
3. Draw u ~ U(0,1) and accept the proposal if u < min(1,r); otherwise keep the current (β, σ)
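The three steps above can be sketched as a small sampler. This is not the app's implementation: the intercept is fixed at 0 for brevity, the data are simulated with a hypothetical seed, and all function names are illustrative. One detail goes beyond the step list: because σ is proposed on the log scale, the sketch adds log(σ*/σ) to the log ratio, the usual correction for a multiplicative proposal when the target is written as a density over σ. Whether the app includes this term is not stated.

```python
import math
import random

def log_post(beta, sigma, x, y, mu_b=0.0, sd_b=1.0, s_sigma=1.0):
    """Unnormalised log P(beta, sigma | y); alpha is fixed at 0 (assumption)."""
    sse = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
    log_lik = -len(x) * math.log(sigma) - 0.5 * sse / sigma ** 2
    log_prior_beta = -0.5 * ((beta - mu_b) / sd_b) ** 2
    log_prior_sigma = -0.5 * (sigma / s_sigma) ** 2  # HalfNormal, valid for sigma > 0
    return log_lik + log_prior_beta + log_prior_sigma

def metropolis(x, y, n_iter=6000, burn_in=1000, seed=0):
    rng = random.Random(seed)
    beta, sigma = 0.0, 1.0
    current = log_post(beta, sigma, x, y)
    kept, accepted = [], 0
    for i in range(n_iter):
        # Step 1: propose beta on the raw scale, sigma on the log scale.
        beta_new = rng.gauss(beta, 0.15)
        sigma_new = math.exp(rng.gauss(math.log(sigma), 0.12))
        # Step 2: log acceptance ratio, plus the log-scale proposal correction.
        proposed = log_post(beta_new, sigma_new, x, y)
        log_r = proposed - current + math.log(sigma_new / sigma)
        # Step 3: accept with probability min(1, r), else stay put.
        if rng.random() < math.exp(min(0.0, log_r)):
            beta, sigma, current = beta_new, sigma_new, proposed
            accepted += 1
        if i >= burn_in:
            kept.append((beta, sigma))
    return kept, accepted / n_iter

# Simulated data resembling the tutorial example (n = 30, true beta = -0.8).
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(30)]
y = [-0.8 * xi + data_rng.gauss(0.0, 1.0) for xi in x]

samples, acc_rate = metropolis(x, y)
beta_mean = sum(b for b, _ in samples) / len(samples)
sigma_mean = sum(s for _, s in samples) / len(samples)
print(f"acceptance = {acc_rate:.2f}, mean beta = {beta_mean:.2f}, mean sigma = {sigma_mean:.2f}")
```

Working with log densities avoids numerical underflow, and a rejected proposal still contributes a sample: the current state is recorded again.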

🎓 Tutorial — Bayesian Regression

The Example

You are analysing data from n = 30 psychology students. Predictor x: average sleep duration (z-scored). Outcome y: perceived stress (PSS — Perceived Stress Scale, Cohen et al. 1983, z-scored).

Hypothesis: more sleep → less stress (β < 0). The true effect is β = −0.8.

What you will learn

In 7 guided steps you experience the complete Bayes cycle:

① Setup — configure parameters, explore the dataset
② Prior Predictive Check — plausible slopes a priori
③ Prior → Posterior Update — Bayesian learning
④ Effect of sample size n
⑤ Outlier influence under Normal likelihood
⑥ Robustness via Student-t likelihood
⑦ MCMC sampler — joint posterior β × σ

How it works

Each step tells you what to do and what to observe. The active panel is outlined in colour and the relevant step card is shown bottom-right.

Use ⚙ Apply Values to automatically set all sliders to the recommended values. You can also explore freely — the tutorial only provides guidance.

The Steps panel appears as a green box in the bottom-right corner — always visible, no scrolling needed.

ℹ Bayesian Regression — Help

What will I learn here?

The complete Bayes cycle for linear regression: Prior → Likelihood → Posterior. You control the true parameters and the prior assumptions — and immediately see how both shape the posterior.

The Model

α ~ N(μ_α, σ_α) · β ~ N(μ_β, σ_β) · σ ~ HalfNormal(0, s_σ)
y ~ N(α + β·x, σ) — the likelihood

Left sidebar: true parameters (simulate the data) · priors (your assumptions about α, β, σ)

The Five Panels

Kruschke Diagram — generative model structure: Hyperpriors → Priors → Likelihood → Data
Prior Predictive — what do regression lines look like before we see data? (McElreath Ch. 4)
Prior vs. Posterior — how do the data shift our beliefs about the regression line?
Marginal Distributions — α, β, σ: Prior → Posterior in direct comparison
MCMC Sampler — joint sampling of (β, σ) with Metropolis-Hastings; heatmap + trace plots

CI vs. Prediction Interval

95% CI Mean (green, narrow): Uncertainty about the location of the regression line — contains only parameter uncertainty.

95% PPI New Observation (dashed, wider): where will a new data point fall? Also includes residual scatter σ.

With small σ the CI and PPI are close — with large σ the PPI is much wider.

Gaussian vs. Outlier-robust

Gaussian: y ~ N(α + β·x, σ) — standard
Outliers: y ~ t(ν, α + β·x, σ) — heavier tails, more robust against individual extreme values
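Why heavier tails give robustness can be seen by comparing the log-density each likelihood assigns to a residual. The sketch below uses ν = 4 as a hypothetical choice (the degrees of freedom used by the app are not stated) and illustrative function names.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def student_t_logpdf(y, nu, mu, sigma):
    """Log density of a location-scale Student-t with nu degrees of freedom."""
    z = (y - mu) / sigma
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))

nu = 4.0  # hypothetical degrees of freedom
for resid in (0.0, 2.0, 5.0):
    ln = normal_logpdf(resid, 0.0, 1.0)
    lt = student_t_logpdf(resid, nu, 0.0, 1.0)
    print(f"residual = {resid:.0f}: log N = {ln:.2f}, log t = {lt:.2f}")
```

Under the Normal likelihood the penalty grows quadratically with the residual, so a single extreme point can drag the line towards itself. Under the Student-t the penalty grows only logarithmically, so that point has far less leverage.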
