Causal Calculator
G-Computation · ATE · ATT · ATU · Counterfactuals
© Dr. Rainer Düsing · Interactive Tools by Claude
Interactive · Causal Inference · G-Computation · marginaleffects
The Problem — and the Solution
Scenario: L = students’ prior knowledge  ·  A = voluntary tutorial attendance  ·  Y = exam grade
Students with more prior knowledge (high L) attend the tutorial more often (L→A) and achieve better grades anyway (L→Y). This creates a spurious correlation: tutorial participants perform better — but not only because of the tutorial, but because they already knew more.

Why does the naive group comparison mislead? Students who attend the tutorial (A=1) already have more prior knowledge on average — part of their grade advantage would occur even without the tutorial. The naive mean comparison E[Y|A=1] − E[Y|A=0] therefore mixes the true tutorial effect with the prior-knowledge advantage: Naive estimate = Causal effect + Confounding bias. Only in a randomised experiment (RCT) would the two groups be comparable — in observational studies we need G-Computation.

G-Computation asks for each person the counterfactual question: “What would the grade be with the same prior knowledge — once with, once without tutorial?” The averaged difference is the causal effect — adjusted for prior knowledge.
[Live readout — current values in tool: Naive · G-Comp · True · Bias]
Estimand — For which population?
Average Treatment Effect — Average effect in the total population. As if every person had been randomly assigned to treatment.
① Why does the simple group comparison mislead?
② What does G-Computation estimate for the causal effect?
[Posterior plot: G-Computation posterior (dark = 50% HDI, light = 95% HDI), with reference lines for the True and Naive values]
ⓘ Browser approximation via Normal-Normal conjugacy — not MCMC-based; serves as a conceptual illustration of G-Computation.
Bias Anatomy — Why does the naive estimator mislead?
Arrow width ∝ |path strength| · Orange = backdoor path · Green = causal path
Decomposition of the naive estimator:

    β̂_naive = θ + γ_Y · δ_A∼L

[Number line (ATE): θ (ATE) and the naive estimate, with OVB = γ_Y × δ_A∼L shown as the gap between them — values update live]
③ How does it work? — Counterfactual Predictions
[Plot legend: Ŷ(1) — under treatment · Ŷ(0) — without treatment · Raw data]
Each person receives a model-based prediction under A=1 (●) and A=0 (●). The vertical distance = individual treatment effect. G-Computation averages these gaps across the target population (depending on the estimand).
④ Does the treatment have the same effect for everyone?
[Plot legend: Treated (A=1) · Control (A=0) · Mean effect (ATE)]
Each point = individual causal effect Ŷ(1)−Ŷ(0) by confounder L. Without effect heterogeneity (ξ=0): all points on a line. With ξ≠0: slope visible — persons with higher L benefit more. Coloured points show the target population of the chosen estimand.
[Results panel (live): G-Computation estimate with 95% HDI · True value (from the data-generating process) · Naive (uncorrected) estimate · Bias]
R Code (brms + marginaleffects)
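A minimal sketch of the intended workflow (assuming a data frame dat with columns Y, A, L, simulated as in the Methodological Background below):

    library(brms)
    library(marginaleffects)

    # Bayesian outcome model with interaction (conservative default, see Methodology)
    fit_gc <- brm(Y ~ A * L, data = dat, family = gaussian(), seed = 1)

    # G-Computation ATE: average counterfactual contrast over all persons
    avg_comparisons(fit_gc, variables = "A")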
Methodological Background

DGP of this tool.
All data are simulated according to the following data-generating process:
L ~ Normal(0,1) — confounder (e.g. health status)
A | L ~ Bernoulli(σ(γ_A·L)) — treatment depends on L (via γ_A)
Y | A, L ~ Normal((θ + ξ·L)·A + γ_Y·L, 1) — outcome
θ is the true causal effect of A on Y. γ_A controls how strongly L drives treatment selection; γ_Y controls how strongly L affects the outcome. In real data θ, γ_A and γ_Y are unknown; here we can set all three.
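In R, this DGP can be simulated directly (a sketch; n and the parameter values correspond to the tool's settings and are free to choose):

    set.seed(42)
    n     <- 500
    theta <- 1.0   # true causal effect of A on Y
    g_A   <- 1.0   # strength of L -> A (treatment selection)
    g_Y   <- 1.0   # strength of L -> Y (outcome confounding)
    xi    <- 0.0   # effect heterogeneity

    L   <- rnorm(n)                                    # confounder
    A   <- rbinom(n, 1, plogis(g_A * L))               # treatment depends on L via the logistic σ(·)
    Y   <- (theta + xi * L) * A + g_Y * L + rnorm(n)   # outcome
    dat <- data.frame(Y, A, L)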

G-Computation: fully Bayesian.
The outcome model Y ~ A * L is fit in a Bayesian framework: flat priors, then K samples from the posterior. For each sample we compute for each person "What would Y be under A=1 vs. A=0?" and average the difference across the target population. The result is a proper posterior distribution of the estimand.
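With brms this procedure can be written out by hand (a sketch, reusing fit_gc and dat from the R code above; posterior_epred() returns a draws × persons matrix of expected outcomes):

    # Counterfactual datasets: everyone treated vs. everyone untreated
    d1 <- transform(dat, A = 1)
    d0 <- transform(dat, A = 0)

    # Expected outcome for each posterior draw and each person
    Y1 <- posterior_epred(fit_gc, newdata = d1)
    Y0 <- posterior_epred(fit_gc, newdata = d0)

    # Average the individual contrasts within each draw:
    # the result is the posterior distribution of the ATE
    ate_post <- rowMeans(Y1 - Y0)
    quantile(ate_post, c(0.025, 0.5, 0.975))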

Why different populations for ATE, ATT, ATU?
The outcome model applies to everyone, but the question of for whom we average defines the estimand:
ATE: Ŷ(1)−Ŷ(0) averaged over all n persons → "What would the effect be in the total population?" In code: avg_comparisons(fit_gc, variables = "A")
ATT: only the n₁ actually treated (A=1) → "Did the treatment help those who received it?" Their L-values are systematically higher (since treatment correlates with L). In code: newdata = subset(dat, A==1)
ATU: only the n₀ untreated (A=0) → "What would happen if we extended treatment to them?" In code: newdata = subset(dat, A==0)
Without effect heterogeneity (ξ=0) all three values are equal. With ξ≠0 they diverge — because the subpopulations have different L-distributions and the effect depends on L.
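The three calls from the list above, combined into one runnable sketch:

    avg_comparisons(fit_gc, variables = "A")                                 # ATE
    avg_comparisons(fit_gc, variables = "A", newdata = subset(dat, A == 1))  # ATT
    avg_comparisons(fit_gc, variables = "A", newdata = subset(dat, A == 0))  # ATU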

Browser approximation.
For illustration this tool uses Normal-Normal conjugacy: with a Normal likelihood and Normal prior the posterior is analytically tractable, no MCMC needed. The R code shows brms with full MCMC — which is the recommended approach for your own analyses.
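Concretely (a sketch of the idea — the tool's exact implementation may differ): with a flat prior and the residual variance σ² treated as known, the posterior of the regression coefficients β in the linear model is itself Normal,

    β | data ~ Normal( (XᵀX)⁻¹XᵀY,  σ²(XᵀX)⁻¹ ),

i.e. centred on the OLS estimate — so the K posterior samples can be drawn directly from a multivariate Normal, no MCMC required.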

Effect heterogeneity (ξ).
Without heterogeneity (ξ=0): Y = θ·A + γ_Y·L + ε — the causal effect θ is the same for everyone, regardless of L. ATE = ATT = ATU = θ, and in the effect plot (4th panel) all points lie on a horizontal line.
With ξ=0.5: Y = (θ + 0.5·L)·A + γ_Y·L + ε — persons with higher L benefit more, and a positive slope appears in the effect plot. Since with positive confounding (γ_A > 0) the treated have higher L-values on average, ATT > ATE > ATU. Same data, same method, three different correct answers — because three different population questions are asked.

Overlap (positivity assumption).
Overlap means: for every observed L-value there are both treated and untreated persons — 0 < P(A=1|L=l) < 1. This is visible in the first scatter plot: with good overlap, orange (A=1) and blue (A=0) points mix across the entire L-range. In the "No overlap" scenario the groups are almost completely separated — persons with high L are almost always treated.
This is a problem because G-Computation then extrapolates the outcome model into L-regions where no (or very few) controls were observed. The estimate becomes highly model-dependent — visible as a wider posterior and greater uncertainty.
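Overlap can also be inspected empirically via the fitted propensity score (a sketch, using dat from the DGP above):

    # Estimated propensity score P(A = 1 | L)
    ps <- fitted(glm(A ~ L, data = dat, family = binomial()))

    # Fitted probabilities piling up near 0 or 1 signal positivity problems
    summary(ps[dat$A == 1])   # treated
    summary(ps[dat$A == 0])   # controls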

When is G-Computation essential — and when does the regression coefficient suffice?
In the linear model without interaction (Y ~ A + L) the G-Computation ATE equals exactly the regression coefficient β̂_A: linearity ensures that conditional and marginal effects coincide (without confounding, G-Computation is even identical to a t-test).

In the linear model with interaction (Y ~ A + L + A:L) this no longer holds in general: β̂_A is only the effect at L=0, while G-Computation correctly averages over the actual L-distribution of the target population (ATE = β̂_A + β̂_AL · L̄, where L̄ is the mean of L in that population). Moreover, G-Computation computes ATT and ATU by restricting to the respective subpopulation — this is not possible with the coefficient alone.
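Both facts can be checked with plain lm() fits (a sketch, reusing dat and the marginaleffects package from above; the _lm suffixes are chosen here to avoid clashing with the brms fits):

    fit_add_lm <- lm(Y ~ A + L, data = dat)
    fit_int_lm <- lm(Y ~ A * L, data = dat)

    # Without interaction: the coefficient IS the G-Computation ATE
    coef(fit_add_lm)["A"]
    avg_comparisons(fit_add_lm, variables = "A")

    # With interaction: the coefficient is the effect at L = 0 only;
    # the ATE is recovered as beta_A + beta_AL * mean(L)
    coef(fit_int_lm)["A"] + coef(fit_int_lm)["A:L"] * mean(dat$L)
    avg_comparisons(fit_int_lm, variables = "A")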

Formula choice: Y ~ A + L versus Y ~ A * L
Standard DAGs encode causal structure (which variable influences which) — but not the functional form of those relationships. Whether the effect of A on Y differs across levels of L (effect heterogeneity, A:L interaction) is a substantive claim that goes beyond the graph: “Is it plausible that L moderates the effect of A?”

The tool therefore always specifies the outcome model as Y ~ A * L — a conservative default that allows heterogeneity without assuming it. Whether A:L is actually needed can be tested empirically with loo_compare(loo(fit_gc), loo(fit_add)) (see R code). If evidence for the interaction is lacking, Y ~ A + L is the more parsimonious model. The Golem Builder helps decide whether a moderation relationship is theoretically justified in the DAG.
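Spelled out (a sketch; update() with formula. refits the brms model with the additive formula):

    # Additive comparison model without the interaction
    fit_add <- update(fit_gc, formula. = Y ~ A + L)

    # PSIS-LOO model comparison: an elpd difference near zero
    # (relative to its SE) speaks against needing A:L
    loo_compare(loo(fit_gc), loo(fit_add))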

G-Computation is essential for GLMs with non-linear link functions (e.g. logit for logistic regression, log for Poisson): there the coefficient is a conditional effect on the link scale (e.g. log-odds ratio). Due to the non-collapsibility of the logit link, conditional and marginal effects differ fundamentally — even without confounding. G-Computation delivers the marginal effect on the response scale (e.g. risk difference in probability points), which the coefficient alone cannot provide. avg_comparisons() in marginaleffects does exactly that, model-class-agnostically.
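As an illustration with a hypothetical binary outcome Yb (name and coefficients made up here), the glm coefficient and the marginal contrast answer different questions:

    # Hypothetical binary outcome, for illustration only
    dat$Yb <- rbinom(nrow(dat), 1, plogis(-0.5 + dat$A + dat$L))

    fit_log <- glm(Yb ~ A * L, data = dat, family = binomial())

    coef(fit_log)["A"]                         # conditional log-odds ratio at L = 0
    avg_comparisons(fit_log, variables = "A")  # marginal risk difference (probability scale)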

Extensive examples of G-Computation with brms for various GLMs (logistic, Poisson, multinomial, etc.) can be found on the blog of Solomon Kurz: Boost your power with baseline covariates.

Bias Deep Dive

OVB formula — why the naive estimator is biased.
The naive regression coefficient β̂_naive from Y ~ A equals (in the population):

    β̂_naive = Cov(Y, A) / Var(A)
             = θ + γ_Y · Cov(L, A) / Var(A)
             = θ  (true effect)  +  γ_Y · δ_A∼L  (confounding bias)

δ_A∼L = Cov(L,A)/Var(A) is the slope coefficient from the auxiliary regression L ~ A — for a binary treatment it is simply the mean difference in L between treated and untreated, i.e. how strongly treatment assignment is tied to L. The product γ_Y · δ_A∼L is the Omitted Variable Bias: it vanishes exactly when either L does not affect the outcome (γ_Y = 0) or L is unrelated to A (δ_A∼L = 0, which in this tool is governed by γ_A).
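The identity can be verified on the simulated data (a sketch; it holds exactly in-sample by OLS algebra):

    b_naive  <- coef(lm(Y ~ A, data = dat))["A"]   # short regression Y ~ A
    fit_long <- lm(Y ~ A + L, data = dat)          # long regression Y ~ A + L
    delta    <- coef(lm(L ~ A, data = dat))["A"]   # auxiliary regression L ~ A

    b_naive                                        # equals, exactly:
    coef(fit_long)["A"] + coef(fit_long)["L"] * delta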

Sign of the bias. In this tool γ_A (L→A) and γ_Y (L→Y) can be set independently, yielding the following sign structure:

L→A | L→Y | Bias = (L→Y)×(L→A) | Naive vs. True
 +  |  +  | + (overestimation)  | Naive larger
 −  |  −  | + (−×− = +)         | Naive larger (often surprising!)
 +  |  −  | − (underestimation) | Naive smaller
 −  |  +  | − (underestimation) | Naive smaller (wrong sign possible!)

Rule of thumb: Bias > 0 when both paths have the same sign (both + or both −); Bias < 0 when the signs differ.

Heckman decomposition: Naive comparison = ATE + Selection Bias + HTEB.
The naive comparison E[Y(1)|A=1] − E[Y(0)|A=0] generally does not equal the ATE (Cunningham, 2021; Morgan & Winship, 2007). The ATE is first a weighted sum of ATT and ATU (π = P(A=1)):

ATE = π · ATT + (1−π) · ATU

After algebraic rearrangement (add and subtract E[Y(0)|A=1]) the following holds directly:

Naive comparison:

    E[Y(1)|A=1] − E[Y(0)|A=0] = ATE
                              + {E[Y(0)|A=1] − E[Y(0)|A=0]}   ← Selection Bias / Baseline Bias
                              + (1−π) · (ATT − ATU)           ← Heterogeneous Treatment Effect Bias (HTEB)

Interpretation of the terms:
Selection Bias / Baseline Bias = E[Y(0)|A=1] − E[Y(0)|A=0]: How would the two groups differ if there were no treatment from the start? It is simply a description of baseline differences between the two groups under the control condition.
HTEB (Heterogeneous Treatment Effect Bias) = (1−π)·(ATT−ATU): the expected difference in treatment effect between the treated and the untreated, weighted by the population share of the untreated. In this tool it arises when ξ≠0.
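A worked example with made-up numbers: take π = 0.5, ATT = 2, ATU = 1 and a baseline difference E[Y(0)|A=1] − E[Y(0)|A=0] = 0.5. Then ATE = 0.5·2 + 0.5·1 = 1.5, while the naive comparison is 1.5 + 0.5 + 0.5·(2−1) = 2.5 — two modest-looking bias terms together inflate the estimate by a full point.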

When does Naive = ATE? Exactly when both terms vanish:
Selection Bias = 0 ⟺ E[Y(0)|A=1] = E[Y(0)|A=0] ⟺ Y(0) ⊥ A  (no baseline confounding)
HTEB = 0 ⟺ ATT = ATU  (no heterogeneity problem) or π = 1

In an RCT treatment is randomly assigned → Y(0) ⊥ A (Selection Bias = 0) and ATT ≈ ATU (HTEB ≈ 0) → Naive = ATE ✓

Sources: Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press — freely available online, with many further techniques for causal analysis (IPW, Difference-in-Differences, Regression Discontinuity, Instrumental Variables and more) and an excellent introduction to the full breadth of causal inference methods. Morgan, S. L., & Winship, C. (2007). Counterfactuals and Causal Inference: Methods and Principles for Social Research. Cambridge University Press.

ℹ Causal Calculator — Help
What will I learn here?
The fundamental problem of causal inference: we never observe the same person simultaneously under treatment and control. G-Computation solves this via an outcome model: estimate Y for each person under both treatment arms — and compare. Prerequisite: no unmeasured confounding (which variables are needed is shown by the Golem Builder).
What does "confounder" mean here?
L influences both who gets treated (L → A) and the outcome (L → Y). This creates a spurious correlation. The naive estimate (simple difference treated − untreated) mixes this confounder influence with the true effect. G-Computation corrects this by explicitly conditioning on L in the outcome model.
Recommended steps
ATE · ATT · ATU
Three different questions from the same data:

ATE: "What would the effect be if the entire population were randomly treated?" — population-level decisions.

ATT: "How much did the treatment help those who received it?" — programme evaluation.

ATU: "What would happen if we extended treatment to the untreated?" — scale-up decisions.

Without effect heterogeneity (ξ=0) all three are equal. With ξ≠0 they diverge — because the subpopulations have different L-distributions (visible in the effect plot: orange = treated, blue = controls).
G-Computation (Standardisation)
Step 1: Fit outcome model — Y ~ A * L (conservative default; see Methodology).
Step 2: For each posterior sample compute all Ŷ(1) and Ŷ(0) and average over the target population.
Step 3: Difference Ŷ(1) − Ŷ(0) = causal effect for that sample.
Result: A posterior distribution of the ATE/ATT/ATU — visible in the posterior panel and as individual points in the effect plot (fourth panel).
Related tools
Golem Builder — DAG-based causal analysis: which variables must be controlled in the model? (backdoor criterion, collider warning, data simulation)

brms Model Builder — transfer brms formula and priors from the Golem Builder into R code