· Too vague (e.g. Uniform): makes no assumptions, but the sampler must explore huge parameter spaces → slow, often divergent.
· Too tight: impedes learning from the data; the posterior ≈ the prior, regardless of the data.
· Weakly informative → the golden middle ground: roughly excludes impossible values, but leaves ample room for learning.
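The "too tight" failure mode can be seen in closed form with a conjugate Normal-mean update. A minimal Python sketch; the data summary (20 observations, mean 5, known sd 1) is made up for illustration:

```python
# Conjugate update for a mean mu with known sd: how prior width controls learning.
n, ybar, sigma = 20, 5.0, 1.0  # assumed data summary: 20 obs, mean 5, sd 1

def posterior_mean(prior_sd, prior_mean=0.0):
    """Posterior mean of mu under a Normal(prior_mean, prior_sd) prior."""
    prec_data = n / sigma**2
    prec_prior = 1 / prior_sd**2
    return (prec_data * ybar + prec_prior * prior_mean) / (prec_data + prec_prior)

print(round(posterior_mean(0.01), 3))  # 0.01  -> too tight: posterior ≈ prior
print(round(posterior_mean(10.0), 3))  # 4.998 -> weakly informative: data dominate
```

Even with clear data at 5, the Normal(0, 0.01) prior pins the posterior to 0; the Normal(0, 10) prior barely shifts the answer away from the data.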
Positive parameters (σ, τ, λ): Exponential, Half-Normal, Half-t, Gamma, Log-Normal
Proportions / probabilities: Beta
Correlation matrices: LKJ
Rule of thumb: Student-t(3, 0, σ) as a robust alternative to Normal; the heavier tails make the prior more tolerant of outlying values.
· Intercept α: Normal(0, 2.5)
· Slope β: Normal(0, 1)
· Dispersion σ: Exponential(1) or Half-Normal(0,1)
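A quick prior predictive simulation shows why these defaults work on standardized data. A Python sketch (the unit-scale predictor and the draw count are assumptions for illustration):

```python
import random
import statistics

random.seed(1)

# Prior predictive draws for standardized y ~ alpha + beta * x with the
# priors above: alpha ~ Normal(0, 2.5), beta ~ Normal(0, 1),
# sigma ~ Exponential(1). Assumed: x is standardized (SD = 1).
draws = []
for _ in range(2000):
    alpha = random.gauss(0, 2.5)
    beta = random.gauss(0, 1)
    sigma = random.expovariate(1)
    x = random.gauss(0, 1)
    draws.append(random.gauss(alpha + beta * x, sigma))

q = statistics.quantiles(draws, n=100)
print(q[0], q[-1])  # 1st/99th percentile: roughly within +/- 10 on the unit scale
```

Simulated outcomes stay in a plausible range for standardized data, which is exactly what "weakly informative" should mean.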
For raw (unstandardized) scales you need to rescale:
σ_prior ≈ 2–3 × SD(y) / SD(x)
Example: reaction times (M ≈ 400 ms, SD ≈ 80 ms):
α: Normal(400, 100), β per SD(x): Normal(0, 160)
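The rescaling rule is plain arithmetic; a small Python sketch using the reaction-time numbers above (`slope_prior_sd` is a hypothetical helper, not a brms function):

```python
# Rule-of-thumb slope prior on raw scales: sd_prior = k * SD(y) / SD(x),
# with k between 2 and 3. Numbers from the reaction-time example above.
def slope_prior_sd(sd_y, sd_x, k=2):
    return k * sd_y / sd_x

print(slope_prior_sd(80, 1))    # 160.0 -> beta ~ Normal(0, 160)
print(slope_prior_sd(80, 0.5))  # 320.0 -> narrower predictor, wider prior
```

Note that the prior widens as SD(x) shrinks: a one-unit change in a narrow predictor can plausibly move y by more.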
· σ as Normal(0, x) in brms: this is correct; brms automatically applies lb = 0 (truncation). No special handling and no half_normal() is needed.
· ν for Student-t too small: ν = 1 equals the Cauchy (no E[x]); ν = 2 has no finite variance. Recommended: ν ≥ 3, typically 3–7 for robustness.
· Forgetting a Beta prior for probabilities: the zi, zoi, and coi parameters in brms are on [0,1]; Beta(1,1) is Uniform, Beta(2,8) says: mostly little zero-inflation.
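What Beta(2, 8) actually encodes can be read off its moments. A short Python sketch using the standard Beta mean and standard deviation formulas:

```python
import math

# Moments of Beta priors for a [0,1] parameter such as zi. Beta(1,1) is
# flat; Beta(2,8) puts most mass on small values ("mostly little
# zero-inflation").
def beta_mean(a, b):
    return a / (a + b)

def beta_sd(a, b):
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

print(beta_mean(2, 8))          # 0.2
print(round(beta_sd(2, 8), 3))  # 0.121
```

So Beta(2, 8) centers the zero-inflation probability near 0.2 with most mass below ~0.45, while still allowing the data to push it higher.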
| Distribution | brms syntax | E[σ] | Heavy tails | When? |
|---|---|---|---|---|
| Exponential(1) | exponential(1) | 1.0 | ✗ | Simple, weakly informative choice |
| Half-Normal(1) | normal(0,1) | 0.80 | ✗ | When σ < 2 is expected |
| Half-Student-t(3,0,1) | student_t(3,0,1) | ~1.10 | ✓ | Compromise; good default choice |
| Half-Cauchy(1) | cauchy(0,1) | — | ✓✓ | Very vague; large σ values possible |
| Gamma(2, 0.1) | gamma(2,0.1) | 20 | ✗ | brms default for ν in Student-t |
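The expected values of these scale-1 half-distributions can be computed directly. A Python sketch using only the stdlib (`half_t_mean` is a hypothetical helper implementing the mean of |t_ν| for ν > 1):

```python
import math

# Means of the scale-1 half-distributions.
# E|t_nu| = 2*sqrt(nu)*Gamma((nu+1)/2) / ((nu-1)*sqrt(pi)*Gamma(nu/2)) for nu > 1;
# the Half-Cauchy (nu = 1) has no finite mean.
def half_t_mean(nu):
    return (2 * math.sqrt(nu) * math.gamma((nu + 1) / 2)
            / ((nu - 1) * math.sqrt(math.pi) * math.gamma(nu / 2)))

print(round(math.sqrt(2 / math.pi), 3))  # Half-Normal(1): 0.798
print(round(half_t_mean(3), 3))          # Half-Student-t(3,0,1): 1.103
print(2 / 0.1)                           # Gamma(2, 0.1) mean: 20.0
```

As ν grows, half_t_mean(ν) approaches the Half-Normal value sqrt(2/π) ≈ 0.80, which is why Half-t priors with moderate ν behave like a Half-Normal with slightly heavier tails.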