· Too vague (e.g. Uniform): makes no assumptions, but the sampler must explore huge parameter spaces → slow, often divergent.
· Too tight: impedes learning from the data; the posterior ≈ the prior, regardless of the data.
· Weakly informative → the golden middle ground: roughly excludes impossible values, but leaves ample room for learning.
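The "too tight" failure mode can be seen in closed form with a conjugate Normal-mean update. A minimal Python sketch; the data summary (20 observations, mean 5, known sd 1) is made up for illustration:

```python
# Conjugate update for a mean mu with known sd: how prior width controls learning.
n, ybar, sigma = 20, 5.0, 1.0  # assumed data summary: 20 obs, mean 5, sd 1

def posterior_mean(prior_sd, prior_mean=0.0):
    """Posterior mean of mu under a Normal(prior_mean, prior_sd) prior."""
    prec_data = n / sigma**2
    prec_prior = 1 / prior_sd**2
    return (prec_data * ybar + prec_prior * prior_mean) / (prec_data + prec_prior)

print(round(posterior_mean(0.01), 3))  # 0.01  -> too tight: posterior ≈ prior
print(round(posterior_mean(10.0), 3))  # 4.998 -> weakly informative: data dominate
```

Even with clear data at 5, the Normal(0, 0.01) prior pins the posterior to 0; the Normal(0, 10) prior barely shifts the answer away from the data.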
Positive parameters (σ, τ, λ): Exponential, Half-Normal, Half-t, Gamma, Log-Normal
Proportions / probabilities: Beta
Correlation matrices: LKJ
Rule of thumb: Student-t(3, 0, σ) as a robust alternative to Normal; the heavier tails make the prior more tolerant of outlying values.
· Intercept α: Normal(0, 2.5)
· Slope β: Normal(0, 1)
· Dispersion σ: Exponential(1) or Half-Normal(0,1)
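A quick prior predictive simulation shows why these defaults work on standardized data. A Python sketch (the unit-scale predictor and the draw count are assumptions for illustration):

```python
import random
import statistics

random.seed(1)

# Prior predictive draws for standardized y ~ alpha + beta * x with the
# priors above: alpha ~ Normal(0, 2.5), beta ~ Normal(0, 1),
# sigma ~ Exponential(1). Assumed: x is standardized (SD = 1).
draws = []
for _ in range(2000):
    alpha = random.gauss(0, 2.5)
    beta = random.gauss(0, 1)
    sigma = random.expovariate(1)
    x = random.gauss(0, 1)
    draws.append(random.gauss(alpha + beta * x, sigma))

q = statistics.quantiles(draws, n=100)
print(q[0], q[-1])  # 1st/99th percentile: roughly within +/- 10 on the unit scale
```

Simulated outcomes stay in a plausible range for standardized data, which is exactly what "weakly informative" should mean.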
For raw (unstandardized) scales you need to rescale:
σ_prior ≈ 2–3 × SD(y) / SD(x)
Example: reaction times (M ≈ 400 ms, SD ≈ 80 ms):
α: Normal(400, 100), β per SD(x): Normal(0, 160)
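The rescaling rule is plain arithmetic; a small Python sketch using the reaction-time numbers above (`slope_prior_sd` is a hypothetical helper, not a brms function):

```python
# Rule-of-thumb slope prior on raw scales: sd_prior = k * SD(y) / SD(x),
# with k between 2 and 3. Numbers from the reaction-time example above.
def slope_prior_sd(sd_y, sd_x, k=2):
    return k * sd_y / sd_x

print(slope_prior_sd(80, 1))    # 160.0 -> beta ~ Normal(0, 160)
print(slope_prior_sd(80, 0.5))  # 320.0 -> narrower predictor, wider prior
```

Note that the prior widens as SD(x) shrinks: a one-unit change in a narrow predictor can plausibly move y by more.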
· σ as Normal(0, x) in brms: this is correct; brms automatically applies lb = 0 (truncation). No special handling and no half_normal() is needed.
· ν for Student-t too small: ν = 1 equals the Cauchy (no E[x]); ν = 2 has no finite variance. Recommended: ν ≥ 3, typically 3–7 for robustness.
· Forgetting a Beta prior for probabilities: the zi, zoi, and coi parameters in brms are on [0,1]; Beta(1,1) is Uniform, Beta(2,8) says: mostly little zero-inflation.
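What Beta(2, 8) actually encodes can be read off its moments. A short Python sketch using the standard Beta mean and standard deviation formulas:

```python
import math

# Moments of Beta priors for a [0,1] parameter such as zi. Beta(1,1) is
# flat; Beta(2,8) puts most mass on small values ("mostly little
# zero-inflation").
def beta_mean(a, b):
    return a / (a + b)

def beta_sd(a, b):
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

print(beta_mean(2, 8))          # 0.2
print(round(beta_sd(2, 8), 3))  # 0.121
```

So Beta(2, 8) centers the zero-inflation probability near 0.2 with most mass below ~0.45, while still allowing the data to push it higher.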
| Distribution | brms syntax | E[σ] | Heavy tails | When? |
|---|---|---|---|---|
| Exponential(1) | exponential(1) | 1.0 | ✗ | Simple, weakly informative choice |
| Half-Normal(1) | normal(0,1) | 0.80 | ✗ | When σ < 2 is expected |
| Half-Student-t(3,0,1) | student_t(3,0,1) | ~1.10 | ✓ | Compromise; good default choice |
| Half-Cauchy(1) | cauchy(0,1) | — | ✓✓ | Very vague; large σ values possible |
| Gamma(2, 0.1) | gamma(2,0.1) | 20 | ✗ | brms default for ν in Student-t |
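The expected values of these scale-1 half-distributions can be computed directly. A Python sketch using only the stdlib (`half_t_mean` is a hypothetical helper implementing the mean of |t_ν| for ν > 1):

```python
import math

# Means of the scale-1 half-distributions.
# E|t_nu| = 2*sqrt(nu)*Gamma((nu+1)/2) / ((nu-1)*sqrt(pi)*Gamma(nu/2)) for nu > 1;
# the Half-Cauchy (nu = 1) has no finite mean.
def half_t_mean(nu):
    return (2 * math.sqrt(nu) * math.gamma((nu + 1) / 2)
            / ((nu - 1) * math.sqrt(math.pi) * math.gamma(nu / 2)))

print(round(math.sqrt(2 / math.pi), 3))  # Half-Normal(1): 0.798
print(round(half_t_mean(3), 3))          # Half-Student-t(3,0,1): 1.103
print(2 / 0.1)                           # Gamma(2, 0.1) mean: 20.0
```

As ν grows, half_t_mean(ν) approaches the Half-Normal value sqrt(2/π) ≈ 0.80, which is why Half-t priors with moderate ν behave like a Half-Normal with slightly heavier tails.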