Causal Calculator
G-Computation · Counterfactuals · Individual treatment effects
© Dr. Rainer Düsing · Interactive Tools by Claude
Interactive · Causal Inference · G-Computation · marginaleffects
The Problem and the Solution
Scenario: L = students’ prior knowledge · A = voluntary tutorial attendance · Y = exam grade
Students with more prior knowledge (high L) attend the tutorial more often (L→A) and achieve better grades anyway (L→Y). This creates a spurious correlation: tutorial participants perform better — but not only because of the tutorial, but because they already knew more.

Why does the naive group comparison mislead? Students who attend the tutorial (A=1) already have more prior knowledge on average, so part of their better grades would come “on their own”. The naive mean comparison E[Y|A=1] − E[Y|A=0] mixes the true tutorial effect with the prior-knowledge advantage: Naive estimate = Causal effect + Confounding bias. Only in a randomised experiment (RCT) would both groups be comparable; in observational studies we need an adjustment method such as G-Computation.

G-Computation asks the counterfactual question for each person: “What would the grade be with the same prior knowledge, once with and once without the tutorial?” The averaged difference is the causal effect, adjusted for prior knowledge.
Estimand: For which population?
Average Treatment Effect: Average effect in the total population. As if every person had been randomly assigned to treatment.
① Why does the simple group comparison mislead?
② What does G-Computation estimate for the causal effect?
[Plot legend: G-Computation posterior (dark = 50% HDI, light = 95% HDI) · True · Naive]
ⓘ Browser approximation via Normal-Normal conjugacy, not MCMC-based; serves as a conceptual illustration of G-Computation.
Bias Anatomy: Why does the naive estimator mislead?
Arrow width ∝ |path strength| · Orange = backdoor path · Green = causal path
Decomposition of the naive estimator
β̂_naive = θ + γ_Y · δ_{A∼L}
[Number line (ATE): markers for θ (ATE) and Naive; OVB = γ_Y × δ_{A∼L}, values filled in by the tool]
③ How does it work? Counterfactual Predictions
[Plot legend: Ŷ(1) = under treatment · Ŷ(0) = without treatment · Raw data]
Each person receives a model-based prediction under A=1 (●) and A=0 (●). The vertical distance between the two is the individual treatment effect. G-Computation averages these gaps across the target population (depending on the estimand).
④ Does the treatment have the same effect for everyone?
[Plot legend: Treated (A=1) · Control (A=0) · Mean effect (ATE)]
Each point = individual causal effect Ŷ(1)−Ŷ(0) plotted against the confounder L. Without effect heterogeneity (ξ=0): all points lie on a horizontal line. With ξ≠0: a slope is visible; with ξ>0, persons with higher L benefit more. Coloured points show the target population of the chosen estimand.
R Code (brms + marginaleffects)

      
    
Methodological Background

DGP of this tool.
All data are simulated according to the following data-generating process:
L ~ Normal(0,1): confounder (e.g. health status)
A | L ~ Bernoulli(σ(γ_A·L)): treatment depends on L (via γ_A; σ is the logistic function)
Y | A, L ~ Normal((θ + ξ·L)·A + γ_Y·L, 1): outcome
θ is the true causal effect of A on Y at L=0 and, because E[L]=0, also the true ATE. ξ controls how strongly the effect varies with L, γ_A controls how strongly L drives treatment selection, and γ_Y controls how strongly L affects the outcome. In real data these parameters are unknown; here we can set them.
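The data-generating process above can be sketched in a few lines. This is a minimal NumPy illustration, not the tool's own code; the function name `simulate_dgp` and the parameter values are assumptions chosen for the example.

```python
import numpy as np

def simulate_dgp(n, theta, gamma_a, gamma_y, xi=0.0, seed=0):
    """Draw (L, A, Y) from the data-generating process described above."""
    rng = np.random.default_rng(seed)
    L = rng.normal(0.0, 1.0, size=n)                 # confounder
    p_treat = 1.0 / (1.0 + np.exp(-gamma_a * L))     # sigma(gamma_A * L)
    A = rng.binomial(1, p_treat)                     # treatment depends on L
    mu = (theta + xi * L) * A + gamma_y * L          # conditional mean of Y
    Y = rng.normal(mu, 1.0)                          # outcome with unit noise
    return L, A, Y

# Illustrative values: positive selection (gamma_a) and positive L -> Y path.
L, A, Y = simulate_dgp(n=5000, theta=0.5, gamma_a=1.0, gamma_y=0.8)
naive = Y[A == 1].mean() - Y[A == 0].mean()
print(f"share treated: {A.mean():.2f}, naive difference: {naive:.2f}")
```

With both paths positive, the naive difference should come out clearly above the true effect θ, which is the confounding bias the tool visualises.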

G-Computation: fully Bayesian.
The outcome model Y ~ A * L is fit in a Bayesian framework: flat priors, then K samples from the posterior. For each sample we compute for each person "What would Y be under A=1 vs. A=0?" and average the difference across the target population. The result is a proper posterior distribution of the estimand.

Why different populations for ATE, ATT, ATU?
The outcome model applies to everyone, but the question of for whom we average defines the estimand:
- ATE: Ŷ(1)−Ŷ(0) averaged over all n persons → "What would the effect be in the total population?" In code: avg_comparisons(fit_gc, variables = "A")
- ATT: only the n₁ actually treated (A=1) → "Did the treatment help those who received it?" With γ_A>0 their L-values are systematically higher (since treatment correlates with L). In code: newdata = subset(dat, A==1)
- ATU: only the n₀ untreated (A=0) → "What would happen if we extended treatment to them?" In code: newdata = subset(dat, A==0)
Without effect heterogeneity (ξ=0) all three values are equal. With ξ≠0 they diverge, because the subpopulations have different L-distributions and the effect depends on L.
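How the averaging population changes the answer can be illustrated with point estimates. The sketch below is a NumPy stand-in that fits Y ~ A * L by ordinary least squares instead of the Bayesian brms model, so it yields single numbers rather than posteriors; all parameter values are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta, xi, gamma_a, gamma_y = 20_000, 0.5, 0.5, 1.0, 0.8

# Simulate from the tool's DGP (parameter values are arbitrary illustrations).
L = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-gamma_a * L)))
Y = rng.normal((theta + xi * L) * A + gamma_y * L, 1.0)

# Step 1: fit the outcome model Y ~ A * L by least squares.
X = np.column_stack([np.ones(n), A, L, A * L])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Step 2: predict every person's outcome under A=1 and under A=0.
def predict(a):
    a = np.full(n, a)
    return np.column_stack([np.ones(n), a, L, a * L]) @ beta

effect = predict(1) - predict(0)      # individual effect Yhat(1) - Yhat(0)

# Step 3: average over the population that defines the estimand.
ate = effect.mean()                   # everyone
att = effect[A == 1].mean()           # treated only
atu = effect[A == 0].mean()           # untreated only
print(f"ATE={ate:.2f}  ATT={att:.2f}  ATU={atu:.2f}")
```

The same fitted model feeds all three estimands; only the rows entering the final average differ, which mirrors the `newdata = subset(...)` argument in the R calls.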

Browser approximation.
For illustration this tool uses Normal-Normal conjugacy: with a Normal likelihood and Normal prior the posterior is analytically tractable, so no MCMC is needed. The R code shows brms with full MCMC, which is the recommended approach for your own analyses.
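A closed-form posterior of this kind can be sketched as follows. The tool's exact prior is not documented here, so this example assumes the limiting case of a flat prior on the coefficients and a known residual variance of 1, under which the posterior is Normal(β̂, (XᵀX)⁻¹). The interval shown is an equal-tailed quantile interval, not an HDI, and the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, theta, xi, gamma_a, gamma_y = 2_000, 0.5, 0.5, 1.0, 0.8
L = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-gamma_a * L)))
Y = rng.normal((theta + xi * L) * A + gamma_y * L, 1.0)

X = np.column_stack([np.ones(n), A, L, A * L])     # design for Y ~ A * L

# Assumptions: flat prior on the coefficients, residual variance known (= 1).
# Then the posterior is Normal(beta_hat, (X'X)^-1), available in closed form.
post_cov = np.linalg.inv(X.T @ X)
post_mean = post_cov @ X.T @ Y

K = 4_000
draws = rng.multivariate_normal(post_mean, post_cov, size=K)   # shape (K, 4)

# Individual effect under draw k is beta_A + beta_AL * L_i, so the ATE for
# that draw is beta_A + beta_AL * mean(L).
ate_draws = draws[:, 1] + draws[:, 3] * L.mean()
lo, hi = np.quantile(ate_draws, [0.025, 0.975])
print(f"posterior mean ATE={ate_draws.mean():.2f}, 95% interval=({lo:.2f}, {hi:.2f})")
```

Each row of `draws` plays the role of one posterior sample; computing the estimand per row is what turns coefficient uncertainty into a posterior distribution of the ATE.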

Effect heterogeneity (ξ).
Without heterogeneity (ξ=0): Y = θ·A + γ_Y·L + ε. The causal effect θ is the same for everyone, regardless of L. ATE = ATT = ATU = θ, and in the effect plot (4th panel) all points lie on a horizontal line.
With ξ=0.5: Y = (θ + 0.5·L)·A + γ_Y·L + ε. Persons with higher L benefit more, and a positive slope appears in the effect plot. With positive selection on L (γ_A>0) the treated have higher L-values on average, so ATT > ATE > ATU. Same data, same method, three different correct answers, because three different population questions are asked.

Overlap (positivity assumption).
Overlap means: for every observed L-value there are both treated and untreated persons, i.e. 0 < P(A=1|L=l) < 1. This is visible in the first scatter plot: with good overlap, orange (A=1) and blue (A=0) points mix across the entire L-range. In the "No overlap" scenario the groups are almost completely separated: persons with high L are almost always treated.
This is a problem because G-Computation then extrapolates the outcome model into L-regions where no (or very few) controls were observed. The estimate becomes highly model-dependent, visible as a wider posterior and greater uncertainty.

When is G-Computation essential, and when does the regression coefficient suffice?
In the linear model without interaction (Y ~ A + L) the G-Computation ATE equals exactly the regression coefficient β̂_A: linearity ensures that conditional and marginal effects coincide. (With no covariates at all, Y ~ A, the G-Computation estimate reduces to the difference in group means, the quantity a two-sample t-test examines.)

In the linear model with interaction (Y ~ A + L + A:L) this no longer holds in general: β̂_A is only the effect at L=0, while G-Computation correctly averages over the actual L-distribution of the target population (ATE = β̂_A + β̂_{AL}·Ē[L], where Ē[L] is the sample mean of L). Moreover, G-Computation computes ATT and ATU by restricting the average to the respective subpopulation, which is not possible with the coefficient alone.
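The link between the interaction coefficients and the three estimands can be written out explicitly. A short derivation, using the coefficient names of the model Y ~ A + L + A:L (the bar notation for overall and subgroup means of L is introduced here for illustration):

```latex
\begin{align*}
\hat{Y}_i(a) &= \hat\beta_0 + \hat\beta_A\, a + \hat\beta_L\, L_i + \hat\beta_{AL}\, a\, L_i \\
\hat{Y}_i(1) - \hat{Y}_i(0) &= \hat\beta_A + \hat\beta_{AL}\, L_i \\
\widehat{\mathrm{ATE}} &= \hat\beta_A + \hat\beta_{AL}\, \bar{L} \\
\widehat{\mathrm{ATT}} &= \hat\beta_A + \hat\beta_{AL}\, \bar{L}_{A=1} \\
\widehat{\mathrm{ATU}} &= \hat\beta_A + \hat\beta_{AL}\, \bar{L}_{A=0}
\end{align*}
```

The three estimands therefore differ only through the mean of L in the respective population, and they collapse to the coefficient of A when the interaction coefficient is zero.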

Formula choice: Y ~ A + L versus Y ~ A * L
Standard DAGs encode causal structure (which variable influences which), but not the functional form of those relationships. Whether the effect of A on Y differs across levels of L (effect heterogeneity, A:L interaction) is a substantive claim that goes beyond the graph: “Is it plausible that L moderates the effect of A?”

The tool therefore always specifies the outcome model as Y ~ A * L, a conservative default that allows heterogeneity without assuming it. Whether A:L is actually needed can be checked empirically with loo_compare(loo(fit_gc), loo(fit_add)) (see R code). If evidence for the interaction is lacking, Y ~ A + L is the more parsimonious model. The Golem Builder helps decide whether a moderation relationship is theoretically justified in the DAG.

G-Computation is essential for GLMs with non-linear link functions (e.g. logit for logistic regression, log for Poisson): there the coefficient is a conditional effect on the link scale (e.g. a log-odds ratio). Because the odds ratio is non-collapsible, conditional and marginal effects differ under the logit link even without confounding. G-Computation delivers the marginal effect on the response scale (e.g. a risk difference in probability points), which the coefficient alone cannot provide. avg_comparisons() in marginaleffects does exactly that, independently of the model class.
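Non-collapsibility can be made concrete with a small numerical check. The sketch below assumes a hypothetical logistic outcome model with known coefficients (b0, bA, bL are invented for the example) and a covariate L that is independent of treatment, so there is no confounding at all. It then standardises the predicted risks over L, as G-Computation would.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

rng = np.random.default_rng(3)
L = rng.normal(size=200_000)          # covariate, independent of treatment

# Hypothetical logistic outcome model: logit P(Y=1 | A, L) = b0 + bA*A + bL*L
b0, bA, bL = -0.5, 1.0, 2.0

# G-Computation on the response scale: average predicted risks over L.
risk_treated = sigmoid(b0 + bA + bL * L).mean()
risk_control = sigmoid(b0 + bL * L).mean()

risk_difference = risk_treated - risk_control            # marginal effect
marginal_log_or = logit(risk_treated) - logit(risk_control)

print(f"conditional log-odds ratio: {bA:.2f}")
print(f"marginal log-odds ratio:    {marginal_log_or:.2f}")
print(f"marginal risk difference:   {risk_difference:.2f}")
```

Although treatment is unrelated to L here, the marginal log-odds ratio is smaller than the conditional coefficient bA, and the risk difference is a quantity the coefficient cannot express on its own.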

Extensive examples of G-Computation with brms for various GLMs (logistic, Poisson, multinomial, etc.) can be found on the blog of Solomon Kurz: Boost your power with baseline covariates.

Bias Deep Dive

OVB formula: why the naive estimator is biased.
The naive regression coefficient β̂_naive from Y ~ A equals (in the population, for ξ=0):

β̂_naive = Cov(Y, A) / Var(A)
        = θ + γ_Y · Cov(L, A) / Var(A)
        = θ  (true effect)  +  γ_Y · δ_{A∼L}  (confounding bias)

δ_{A∼L} = Cov(L,A)/Var(A) is the slope coefficient from the auxiliary regression L ~ A (for binary A, the difference in mean L between treated and untreated). It measures how strongly L and treatment assignment are associated and is governed by γ_A. The product γ_Y · δ_{A∼L} is the Omitted Variable Bias: it vanishes exactly when either L does not affect the outcome (γ_Y=0) or L is uncorrelated with A (δ_{A∼L}=0, which holds for γ_A=0).
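The decomposition can be checked numerically. In the NumPy sketch below (illustrative parameter values, ξ=0, helper name `ols` invented for the example), δ is computed as Cov(L, A)/Var(A), which is the least-squares slope of L on A; the sample analogue of the OVB identity then holds exactly for least-squares fits.

```python
import numpy as np

rng = np.random.default_rng(4)
n, theta, gamma_a, gamma_y = 50_000, 0.5, 1.0, 0.8
L = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-gamma_a * L)))
Y = rng.normal(theta * A + gamma_y * L, 1.0)       # xi = 0: homogeneous effect

def ols(y, *columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols(Y, A)[1]                # slope from Y ~ A
_, b_a, b_l = ols(Y, A, L)          # adjusted fit Y ~ A + L
delta = ols(L, A)[1]                # slope of L on A = Cov(L, A) / Var(A)

bias = b_l * delta                  # estimated omitted variable bias
print(f"naive={naive:.3f}  adjusted={b_a:.3f}  bias={bias:.3f}")
```

The naive slope equals the adjusted coefficient plus the estimated bias term, so the gap between the two regressions is fully accounted for by the L → Y and L ↔ A associations.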

Sign of the bias. In this tool γ_A (L→A) and γ_Y (L→Y) can be set independently, yielding the following sign structure:

L→A | L→Y | Bias = (L→Y)×(L→A)  | Naive vs. True
+   | +   | + (overestimation)  | Naive larger
−   | −   | + (−×− = +)         | Naive larger (often surprising!)
+   | −   | − (underestimation) | Naive smaller
−   | +   | − (underestimation) | Naive smaller (wrong sign possible!)

Rule of thumb: Bias > 0 when both paths have the same sign (both + or both −). Bias < 0 when the signs differ.

Heckman decomposition: Naive comparison = ATE + Selection Bias + HTEB.
The naive comparison E[Y(1)|A=1] − E[Y(0)|A=0] generally does not equal the ATE (Cunningham, 2021; Morgan & Winship, 2007). The ATE is first a weighted sum of ATT and ATU (π = P(A=1)):

ATE = π · ATT + (1−π) · ATU

After algebraic rearrangement (add and subtract E[Y(0)|A=1]) the following holds directly:

Naive comparison
E[Y(1)|A=1] − E[Y(0)|A=0] = ATE
                             + {E[Y(0)|A=1] − E[Y(0)|A=0]}   ← Selection Bias / Baseline Bias
                             + (1−π) · (ATT − ATU)           ← Heterogeneous Treatment Effect Bias (HTEB)

Interpretation of the terms:
Selection Bias / Baseline Bias = E[Y(0)|A=1] − E[Y(0)|A=0]: How would the two groups differ if there were no treatment from the start? It is simply a description of baseline differences between the two groups under the control condition.
HTEB (Heterogeneous Treatment Effect Bias) = (1−π)·(ATT−ATU): The difference in average treatment effect between the treated and the untreated, weighted by the share of untreated (1−π). In this tool it arises when ξ≠0 and treatment depends on L (γ_A≠0).

When does Naive = ATE? Exactly when both terms vanish:
Selection Bias = 0 ⟺ E[Y(0)|A=1] = E[Y(0)|A=0] ⟺ Y(0) ⊥ A  (no baseline confounding)
HTEB = 0 ⟺ ATT = ATU  (no heterogeneity problem) or π = 1

In an RCT treatment is randomly assigned → Y(0) ⊥ A (Selection Bias = 0) and ATT ≈ ATU (HTEB ≈ 0) → Naive = ATE ✓
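Because a simulation knows both potential outcomes for every person, the decomposition can be verified term by term. The following NumPy sketch uses the tool's DGP with illustrative parameter values; the variable names are chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
n, theta, xi, gamma_a, gamma_y = 100_000, 0.5, 0.5, 1.0, 0.8
L = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-gamma_a * L)))

# In a simulation both potential outcomes are known for every person.
noise = rng.normal(size=n)
Y0 = gamma_y * L + noise                    # outcome without treatment
Y1 = theta + xi * L + gamma_y * L + noise   # outcome with treatment
Y = np.where(A == 1, Y1, Y0)                # only one of them is observed

pi = A.mean()                               # share of treated
ate = (Y1 - Y0).mean()
att = (Y1 - Y0)[A == 1].mean()
atu = (Y1 - Y0)[A == 0].mean()

naive = Y[A == 1].mean() - Y[A == 0].mean()
selection_bias = Y0[A == 1].mean() - Y0[A == 0].mean()
hteb = (1 - pi) * (att - atu)

print(f"naive={naive:.3f}  ATE+SB+HTEB={ate + selection_bias + hteb:.3f}")
```

With sample means in place of expectations the identity is exact, so the two printed numbers agree up to floating-point error; setting ξ=0 removes the HTEB term and γ_Y=0 removes the selection bias.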

Source: Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press. Freely available online, with many further techniques for causal analysis: IPW, Difference-in-Differences, Regression Discontinuity, Instrumental Variables and more. An excellent introduction to the full breadth of causal inference methods.

ℹ Causal Calculator: Help
What will I learn here?
The fundamental problem of causal inference: we never observe the same person simultaneously under treatment and control. G-Computation addresses this via an outcome model: estimate Y for each person under both treatment arms and compare. Prerequisite: no unmeasured confounding (which variables are needed is shown by the Golem Builder).
What does "confounder" mean here?
L influences both who gets treated (L → A) and the outcome (L → Y). This creates a spurious correlation. The naive estimate (simple difference treated − untreated) mixes this confounder influence with the true effect. G-Computation corrects this by explicitly conditioning on L in the outcome model.
Recommended steps
ATE · ATT · ATU
Three different questions from the same data:

ATE: "What would the effect be if the entire population were randomly treated?" Relevant for population-level decisions.

ATT: "How much did the treatment help those who received it?" Relevant for programme evaluation.

ATU: "What would happen if we extended treatment to the untreated?" Relevant for scale-up decisions.

Without effect heterogeneity (ξ=0) all three are equal. With ξ≠0 they diverge, because the subpopulations have different L-distributions (visible in the effect plot: orange = treated, blue = controls).
G-Computation (Standardisation)
Step 1: Fit outcome model Y ~ A * L (conservative default; see Methodology).
Step 2: For each posterior sample compute all Ŷ(1) and Ŷ(0) and average over the target population.
Step 3: Difference Ŷ(1) − Ŷ(0) = causal effect for that sample.
Result: A posterior distribution of the ATE/ATT/ATU, visible in the posterior panel and as individual points in the effect plot (fourth panel).
Related tools
Golem Builder: DAG-based causal analysis. Which variables must be controlled in the model? (backdoor criterion, collider warning, data simulation)

brms Model Builder: transfer brms formula and priors from the Golem Builder into R code