Causal Calculator — G-Computation

The Problem — and the Solution

        Scenario: L = students’ prior knowledge  ·  A = voluntary tutorial attendance  ·  Y = exam grade

        Students with more prior knowledge (high L) attend the tutorial more often (L→A) and would achieve better grades anyway (L→Y). This creates a partly spurious association: tutorial participants perform better, partly because of the tutorial and partly because they already knew more.

        Why does the naive group comparison mislead? Students who attend the tutorial (A=1) already have more prior knowledge on average, so part of their better grades would have come “on their own”. The naive mean comparison E[Y|A=1] − E[Y|A=0] mixes the true tutorial effect with the prior-knowledge advantage: Naive estimate = Causal effect + Confounding bias. In a randomised experiment (RCT) both groups would be comparable by design; in observational studies we need an adjustment method such as G-Computation.

        G-Computation asks for each person the counterfactual question: “What would the grade be with the same prior knowledge — once with, once without tutorial?” The averaged difference is the causal effect — adjusted for prior knowledge.


Estimand — For which population?

Average Treatment Effect — Average effect in the total population. As if every person had been randomly assigned to treatment.

① Why does the simple group comparison mislead?

② What does G-Computation estimate for the causal effect?

[Figure legend: G-Computation estimate (dark = 50% HDI, light = 95% HDI), true value, naive estimate.]

      ⓘ Browser approximation via Normal-Normal conjugacy — not MCMC-based; serves as a conceptual illustration of G-Computation.
    

Bias Anatomy — Why does the naive estimator mislead?

          Arrow width ∝ |path strength|  ·  Orange = backdoor path  ·  Green = causal path
        

Decomposition of the naive estimator

          β̂_naive = θ  +  γ_Y · δ_A∼L

[Figure: number line (ATE) marking θ and the naive estimate; the gap between them is the omitted-variable bias OVB = γ_Y × δ_A∼L.]

③ How does it work? — Counterfactual Predictions

[Figure legend: Ŷ(1) = prediction under treatment; Ŷ(0) = prediction without treatment; raw data.]

      Each person receives a model-based prediction under A=1 (●) and A=0 (●). The vertical distance = individual treatment effect. G-Computation averages these gaps across the target population (depending on the estimand).
    

④ Does the treatment have the same effect for everyone?

[Figure legend: treated (A=1); control (A=0); mean effect (ATE).]

      Each point is a model-based individual causal effect Ŷ(1)−Ŷ(0), plotted against the confounder L. Without effect heterogeneity (ξ=0) all points lie on a horizontal line. With ξ≠0 the line is sloped: for ξ>0 persons with higher L benefit more, for ξ<0 they benefit less. Coloured points show the target population of the chosen estimand.
    


R Code (brms + marginaleffects)
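
The code from this collapsed panel is not reproduced in the text. A minimal sketch that is consistent with the calls named elsewhere on this page (fit_gc, fit_add, avg_comparisons(), loo_compare()) might look as follows. The wrapper gcomp_brms() and the brm() settings are assumptions for illustration, not the tool's own code; running it requires the packages brms and marginaleffects and a working Stan toolchain.

```r
# Hypothetical wrapper (not part of the tool): fits both outcome models with
# brms and returns G-Computation estimates plus a LOO model comparison.
# `dat` must contain the columns Y (outcome), A (0/1 treatment), L (confounder).
gcomp_brms <- function(dat, seed = 1) {
  stopifnot(requireNamespace("brms", quietly = TRUE),
            requireNamespace("marginaleffects", quietly = TRUE))

  # Outcome model with interaction (allows heterogeneity) and additive model
  fit_gc  <- brms::brm(Y ~ A * L, data = dat, family = gaussian(),
                       seed = seed, refresh = 0)
  fit_add <- brms::brm(Y ~ A + L, data = dat, family = gaussian(),
                       seed = seed, refresh = 0)

  list(
    # Average over everyone, over the treated, and over the untreated
    ATE = marginaleffects::avg_comparisons(fit_gc, variables = "A"),
    ATT = marginaleffects::avg_comparisons(fit_gc, variables = "A",
                                           newdata = subset(dat, A == 1)),
    ATU = marginaleffects::avg_comparisons(fit_gc, variables = "A",
                                           newdata = subset(dat, A == 0)),
    # Is the A:L interaction supported by the data?
    loo = brms::loo_compare(brms::loo(fit_gc), brms::loo(fit_add))
  )
}
```

Calling gcomp_brms(dat) on a data frame with columns Y, A and L returns a list; for example, the ATE element holds the posterior summary of the average treatment effect.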

Methodological Background

DGP of this tool.
All data are simulated according to the following data-generating process:
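L ~ Normal(0, 1): confounder (prior knowledge in the tutorial scenario above)
A | L ~ Bernoulli(σ(γ_A·L)): treatment depends on L via γ_A; σ is the logistic function σ(x) = 1/(1 + e^(−x))
Y | A, L ~ Normal((θ + ξ·L)·A + γ_Y·L, 1): outcome with unit residual variance
θ is the causal effect of A on Y at L = 0 and, because E[L] = 0, also the true ATE. γ_A controls how strongly L drives treatment selection; γ_Y controls how strongly L affects the outcome; ξ controls how strongly the effect varies with L. In real data these parameters are unknown; here we can set them.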
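
The data-generating process above can be simulated directly in base R. The following is a minimal sketch; the parameter values, the sample size and the variable names are illustrative assumptions, not the tool's defaults. It contrasts the naive difference in means with the coefficient from the correctly specified outcome model.

```r
# Simulate the DGP described above (all parameter values are assumptions)
set.seed(1)
n       <- 20000
theta   <- 1.0   # true causal effect of A on Y
gamma_A <- 1.0   # strength of L -> A
gamma_Y <- 1.0   # strength of L -> Y
xi      <- 0.0   # effect heterogeneity (0 = none)

L <- rnorm(n)                                          # confounder
A <- rbinom(n, size = 1, prob = plogis(gamma_A * L))   # treatment depends on L
Y <- rnorm(n, mean = (theta + xi * L) * A + gamma_Y * L, sd = 1)
dat <- data.frame(L, A, Y)

# Naive estimate: unadjusted difference in group means
naive <- mean(dat$Y[dat$A == 1]) - mean(dat$Y[dat$A == 0])

# Adjusted estimate: coefficient of A when L is included in the model
adjusted <- unname(coef(lm(Y ~ A + L, data = dat))["A"])

round(c(true = theta, naive = naive, adjusted = adjusted), 2)
```

With positive γ_A and γ_Y the naive estimate lies clearly above θ, while the adjusted estimate lies close to it.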

G-Computation: fully Bayesian.
The outcome model Y ~ A * L is fit in a Bayesian framework: flat priors, then K samples from the posterior. For each sample we compute for each person "What would Y be under A=1 vs. A=0?" and average the difference across the target population. The result is a proper posterior distribution of the estimand.
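
The posterior-sampling step can be sketched without MCMC in base R. This sketch assumes the standard noninformative prior p(β, σ²) ∝ 1/σ², for which the posterior of the linear model is available in closed form. That prior, the parameter values and all variable names are assumptions for illustration and need not match the tool's browser implementation or a brms fit. The interval shown is an equal-tailed quantile interval, not an HDI.

```r
set.seed(2)
n <- 2000; theta <- 1; gamma_A <- 1; gamma_Y <- 1; xi <- 0.5  # illustrative values
L <- rnorm(n)
A <- rbinom(n, 1, plogis(gamma_A * L))
Y <- rnorm(n, (theta + xi * L) * A + gamma_Y * L, 1)

# Least-squares fit of the outcome model Y ~ A * L
fit      <- lm(Y ~ A * L)
X        <- model.matrix(fit)        # columns: (Intercept), A, L, A:L
beta_hat <- coef(fit)
V        <- solve(crossprod(X))      # (X'X)^{-1}
p        <- ncol(X)

# K joint posterior draws: sigma^2 first, then beta given sigma^2
K      <- 4000
sigma2 <- sum(resid(fit)^2) / rchisq(K, df = n - p)
Z      <- matrix(rnorm(K * p), nrow = K)
beta   <- sweep((Z %*% chol(V)) * sqrt(sigma2), 2, beta_hat, "+")

# Per draw: individual contrast Yhat(1) - Yhat(0) = beta_A + beta_AL * L_i,
# averaged over the target population (here everyone, i.e. the ATE)
contrast  <- c(0, 1, 0, mean(L))     # column means of X(A = 1) - X(A = 0)
ate_draws <- as.numeric(beta %*% contrast)

c(mean = mean(ate_draws), quantile(ate_draws, c(0.025, 0.975)))
```

For the ATT or ATU, mean(L) in the contrast vector would be replaced by the mean of L among the treated or the untreated.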

Why different populations for ATE, ATT, ATU?
The outcome model applies to everyone, but the question of for whom we average defines the estimand:
— ATE: Ŷ(1)−Ŷ(0) averaged over all n persons → "What would the effect be in the total population?" In code: avg_comparisons(fit_gc, variables = "A")
— ATT: only the n₁ actually treated (A=1) → "Did the treatment help those who received it?" For γ_A > 0 their L-values are systematically higher, because the treatment probability increases with L. In code: avg_comparisons(fit_gc, variables = "A", newdata = subset(dat, A==1))
— ATU: only the n₀ untreated (A=0) → "What would happen if we extended treatment to them?" In code: avg_comparisons(fit_gc, variables = "A", newdata = subset(dat, A==0))
Without effect heterogeneity (ξ=0) all three values are equal. With ξ≠0 they diverge — because the subpopulations have different L-distributions and the effect depends on L.
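
The three estimands differ only in the rows over which the predicted contrasts are averaged. A minimal base-R sketch with a least-squares outcome model follows; the helper g_comp() and the parameter values are hypothetical and chosen for illustration.

```r
set.seed(3)
n <- 20000; theta <- 1; gamma_A <- 1; gamma_Y <- 1; xi <- 0.5  # illustrative values
L <- rnorm(n)
A <- rbinom(n, 1, plogis(gamma_A * L))
Y <- rnorm(n, (theta + xi * L) * A + gamma_Y * L, 1)
dat <- data.frame(L, A, Y)

fit <- lm(Y ~ A * L, data = dat)   # one outcome model for everyone

# Hypothetical helper: predict each person in `target` under A = 1 and A = 0,
# then average the individual differences
g_comp <- function(fit, target) {
  y1 <- predict(fit, newdata = transform(target, A = 1))
  y0 <- predict(fit, newdata = transform(target, A = 0))
  mean(y1 - y0)
}

est <- c(ATE = g_comp(fit, dat),
         ATT = g_comp(fit, subset(dat, A == 1)),
         ATU = g_comp(fit, subset(dat, A == 0)))
round(est, 2)
```

With ξ > 0 and γ_A > 0 as chosen here, the estimates are ordered ATT > ATE > ATU; setting xi to 0 makes all three coincide up to sampling error.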

Browser approximation.
For illustration this tool uses Normal-Normal conjugacy: with a Normal likelihood and Normal prior the posterior is analytically tractable, no MCMC needed. The R code shows brms with full MCMC — which is the recommended approach for your own analyses.

Effect heterogeneity (ξ).
Without heterogeneity (ξ=0): Y = θ·A + γ_Y·L + ε — the causal effect θ is the same for everyone, regardless of L. ATE = ATT = ATU = θ, and in the effect plot (4th panel) all points lie on a horizontal line.
With ξ=0.5: Y = (θ + 0.5·L)·A + γ_Y·L + ε, so persons with higher L benefit more and a positive slope appears in the effect plot. If treatment selection is also positive (γ_A > 0), the treated have higher L-values on average than the untreated, and therefore ATT > ATE > ATU. Same data, same method, three different correct answers, because three different population questions are asked.

Overlap (positivity assumption).
Overlap means: for every observed L-value there are both treated and untreated persons — 0 < P(A=1|L=l) < 1. This is visible in the first scatter plot: with good overlap, orange (A=1) and blue (A=0) points mix across the entire L-range. In the "No overlap" scenario the groups are almost completely separated — persons with high L are almost always treated.
This is a problem because G-Computation then extrapolates the outcome model into L-regions where no (or very few) controls were observed. The estimate becomes highly model-dependent — visible as a wider posterior and greater uncertainty.
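
Overlap can also be checked numerically. The sketch below estimates P(A=1|L) with a logistic regression and counts how many untreated persons exist at high L. The helper overlap_summary(), the thresholds 0.05/0.95 and L > 1, and the two γ_A values are assumptions for illustration, not settings of the tool's scenarios.

```r
set.seed(4)
n <- 5000
L <- rnorm(n)

# Hypothetical helper: simulate treatment for a given gamma_A and summarise overlap
overlap_summary <- function(gamma_A) {
  A  <- rbinom(n, 1, plogis(gamma_A * L))
  ps <- fitted(glm(A ~ L, family = binomial()))    # estimated P(A = 1 | L)
  c(gamma_A         = gamma_A,
    share_extreme   = mean(ps < 0.05 | ps > 0.95), # near-deterministic treatment
    controls_high_L = sum(A == 0 & L > 1))         # untreated persons with L > 1
}

res <- rbind(good = overlap_summary(0.5), poor = overlap_summary(5))
res
```

With strong selection, most persons have an estimated treatment probability close to 0 or 1 and almost no controls remain at high L, so predictions of Ŷ(0) in that region rest on the model's functional form rather than on data.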

When is G-Computation essential — and when does the regression coefficient suffice?
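In the linear model without interaction (Y ~ A + L) the G-Computation ATE equals the regression coefficient β̂_A exactly: because the model is linear and additive, the predicted difference Ŷ(1)−Ŷ(0) is β̂_A for every person, so conditional and marginal effects coincide. (Without any covariates in the model, G-Computation reduces to the difference in group means, the quantity tested by a two-sample t-test.)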

In the linear model with interaction (Y ~ A + L + A:L) this no longer holds in general: β̂_A is only the effect at L=0, while G-Computation correctly averages over the actual L-distribution of the target population (ATE = β̂_A + β̂_AL · L̄, where L̄ is the mean of L in the target population). Moreover, G-Computation yields ATT and ATU by averaging over the respective subpopulation; the coefficient β̂_A alone does not provide them.
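
Both statements can be verified numerically. The sketch below uses least-squares fits in base R; the helper g_comp() and the parameter values are hypothetical. It compares the coefficient, the closed-form expression β̂_A + β̂_AL · L̄ and the averaged predictions.

```r
set.seed(5)
n <- 5000; theta <- 1; gamma_A <- 1; gamma_Y <- 1; xi <- 0.5  # illustrative values
L <- rnorm(n)
A <- rbinom(n, 1, plogis(gamma_A * L))
Y <- rnorm(n, (theta + xi * L) * A + gamma_Y * L, 1)
dat <- data.frame(L, A, Y)

# Hypothetical helper: average predicted contrast over the rows of `target`
g_comp <- function(fit, target) {
  mean(predict(fit, newdata = transform(target, A = 1)) -
       predict(fit, newdata = transform(target, A = 0)))
}

fit_add <- lm(Y ~ A + L, data = dat)
fit_int <- lm(Y ~ A * L, data = dat)
b       <- coef(fit_int)
treated <- subset(dat, A == 1)

res <- c(
  add_coef    = unname(coef(fit_add)["A"]),
  add_gcomp   = g_comp(fit_add, dat),                 # identical to add_coef
  int_coef    = unname(b["A"]),                       # effect at L = 0 only
  ate_formula = unname(b["A"] + b["A:L"] * mean(dat$L)),
  ate_gcomp   = g_comp(fit_int, dat),                 # identical to ate_formula
  att_formula = unname(b["A"] + b["A:L"] * mean(treated$L)),
  att_gcomp   = g_comp(fit_int, treated)              # identical to att_formula
)
round(res, 3)
```

Because L is centred at 0 in this DGP, int_coef and the ATE are close; the gap between coefficient and estimand becomes clearly visible for the ATT, where the mean of L among the treated is well above 0.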

Formula choice: Y ~ A + L versus Y ~ A * L
Standard DAGs encode causal structure (which variable influences which) — but not the functional form of those relationships. Whether the effect of A on Y differs across levels of L (effect heterogeneity, A:L interaction) is a substantive claim that goes beyond the graph: “Is it plausible that L moderates the effect of A?”

The tool therefore always specifies the outcome model as Y ~ A * L — a conservative default that allows heterogeneity without assuming it. Whether A:L is actually needed can be tested empirically with loo_compare(loo(fit_gc), loo(fit_add)) (see R code). If evidence for the interaction is lacking, Y ~ A + L is the more parsimonious model. The Golem Builder helps decide whether a moderation relationship is theoretically justified in the DAG.

G-Computation is essential for GLMs with non-linear link functions (e.g. logit for logistic regression, log for Poisson): there the coefficient is a conditional effect on the link scale (e.g. log-odds ratio). Due to the non-collapsibility of the logit link, conditional and marginal effects differ fundamentally — even without confounding. G-Computation delivers the marginal effect on the response scale (e.g. risk difference in probability points), which the coefficient alone cannot provide. avg_comparisons() in marginaleffects does exactly that, model-class-agnostically.
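
Non-collapsibility can be illustrated with a simulated binary outcome. The sketch below is an assumption-laden example, separate from the tool's Normal DGP: treatment is randomised (no confounding), the coefficients −0.5, 1 and 2 are arbitrary, and glm() stands in for a Bayesian fit.

```r
set.seed(6)
n <- 50000
L <- rnorm(n)
A <- rbinom(n, 1, 0.5)                            # randomised: no confounding
Y <- rbinom(n, 1, plogis(-0.5 + 1 * A + 2 * L))   # true conditional log-odds ratio = 1
dat <- data.frame(L, A, Y)

fit_cond <- glm(Y ~ A + L, family = binomial(), data = dat)
fit_marg <- glm(Y ~ A,     family = binomial(), data = dat)

# Conditional vs. marginal log-odds ratio: they differ although A is randomised
log_or <- c(conditional = unname(coef(fit_cond)["A"]),
            marginal    = unname(coef(fit_marg)["A"]))

# G-Computation on the response scale: average risk difference
p1 <- predict(fit_cond, newdata = transform(dat, A = 1), type = "response")
p0 <- predict(fit_cond, newdata = transform(dat, A = 0), type = "response")
risk_diff <- mean(p1 - p0)

# Raw difference in proportions (valid here only because A is randomised)
raw_diff <- mean(dat$Y[dat$A == 1]) - mean(dat$Y[dat$A == 0])

round(c(log_or, risk_diff = risk_diff, raw_diff = raw_diff), 3)
```

The conditional log-odds ratio is noticeably larger than the marginal one, while the G-Computation risk difference agrees closely with the raw difference in proportions.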

Extensive examples of G-Computation with brms for various GLMs (logistic, Poisson, multinomial, etc.) can be found on the blog of Solomon Kurz: Boost your power with baseline covariates.

Bias Deep Dive

OVB formula — why the naive estimator is biased.
The naive regression coefficient β̂_naive from Y ~ A equals (in the population, for ξ = 0):

        β̂_naive = Cov(Y, A) / Var(A)

                 = θ + γ_Y · Cov(L, A) / Var(A)

                 = θ  (true effect)
              +  γ_Y · δ_A∼L  (confounding bias)

δ_A∼L = Cov(L,A)/Var(A) is the slope coefficient from the auxiliary regression of L on A (L ~ A); it measures how strongly L and treatment assignment are associated. The product γ_Y · δ_A∼L is the Omitted Variable Bias: it vanishes exactly when L does not affect the outcome (γ_Y=0) or L is uncorrelated with A (δ_A∼L=0, which holds for γ_A=0).
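
For least-squares estimates the decomposition holds exactly in any sample, which can be checked in base R. In this sketch δ is computed as Cov(L, A)/Var(A), i.e. as the slope from regressing L on A; the parameter values are illustrative assumptions, and ξ = 0.

```r
set.seed(7)
n <- 5000; theta <- 1; gamma_A <- 1; gamma_Y <- 1   # xi = 0: no heterogeneity
L <- rnorm(n)
A <- rbinom(n, 1, plogis(gamma_A * L))
Y <- rnorm(n, theta * A + gamma_Y * L, 1)

b_naive <- unname(coef(lm(Y ~ A))["A"])    # naive (short) regression
b_long  <- coef(lm(Y ~ A + L))             # adjusted (long) regression
delta   <- unname(coef(lm(L ~ A))["A"])    # slope of L on A = cov(L, A) / var(A)
ovb     <- unname(b_long["L"]) * delta     # estimated gamma_Y times delta

c(naive          = b_naive,
  theta_hat      = unname(b_long["A"]),
  ovb            = ovb,
  theta_plus_ovb = unname(b_long["A"]) + ovb)
```

The entries naive and theta_plus_ovb agree up to floating-point error.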

Sign of the bias. In this tool γ_A (L→A) and γ_Y (L→Y) can be set independently, yielding the following sign structure:

L→A	L→Y	Bias = (L→Y)×(L→A)	Naive vs. True
+	+	+ (overestimation)	Naive larger
−	−	+ (−×− = +)	Naive larger (often surprising!)
+	−	− (underestimation)	Naive smaller
−	+	− (underestimation)	Naive smaller (wrong sign possible!)

Rule of thumb: Bias > 0 when both paths have the same sign (both + or both −); Bias < 0 when the signs differ.

Heckman decomposition: Naive comparison = ATE + Selection Bias + HTEB.
The naive comparison E[Y(1)|A=1] − E[Y(0)|A=0] generally does not equal the ATE (Cunningham, 2021; Morgan & Winship, 2007). The ATE is first a weighted sum of ATT and ATU (π = P(A=1)):

        ATE = π · ATT + (1−π) · ATU
      

After algebraic rearrangement (add and subtract E[Y(0)|A=1]) the following holds directly:

Naive comparison

        E[Y(1)|A=1] − E[Y(0)|A=0] = ATE

                                         + {E[Y(0)|A=1] − E[Y(0)|A=0]} ← Selection Bias / Baseline Bias

                                         + (1−π) · (ATT − ATU) ← Heterogeneous Treatment Effect Bias (HTEB)

Interpretation of the terms:
— Selection Bias / Baseline Bias = E[Y(0)|A=1] − E[Y(0)|A=0]: How would the two groups differ if neither had been treated? It simply describes the baseline difference between the two groups under the control condition.
— HTEB (Heterogeneous Treatment Effect Bias) = (1−π)·(ATT−ATU): The difference in average treatment effect between the treated and the untreated, weighted by the share of untreated persons (1−π). In this tool it arises when ξ≠0 (and γ_A≠0, so that the two groups differ in L).

When does Naive = ATE? Exactly when both terms vanish:
— Selection Bias = 0 ⟺ E[Y(0)|A=1] = E[Y(0)|A=0] ⟺ Y(0) ⊥ A (no baseline confounding)
— HTEB = 0 ⟺ ATT = ATU (no heterogeneity problem) or π = 1

In an RCT treatment is randomly assigned → Y(0) ⊥ A (Selection Bias = 0) and ATT = ATU in expectation (HTEB = 0) → Naive = ATE in expectation ✓
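
In a simulation both potential outcomes are known, so every term of the decomposition can be computed and the identity checked. The sketch below assumes that Y(0) and Y(1) share the same noise term, which the DGP above does not specify; parameter values and variable names are illustrative.

```r
set.seed(8)
n <- 10000; theta <- 1; gamma_A <- 1; gamma_Y <- 1; xi <- 0.5  # illustrative values
L   <- rnorm(n)
A   <- rbinom(n, 1, plogis(gamma_A * L))
eps <- rnorm(n)

# Potential outcomes (assumption: same noise term under both conditions)
Y0 <- gamma_Y * L + eps
Y1 <- Y0 + theta + xi * L

pi_hat <- mean(A)                            # share of treated persons
ATE <- mean(Y1 - Y0)
ATT <- mean((Y1 - Y0)[A == 1])
ATU <- mean((Y1 - Y0)[A == 0])

naive    <- mean(Y1[A == 1]) - mean(Y0[A == 0])   # what the raw data show
sel_bias <- mean(Y0[A == 1]) - mean(Y0[A == 0])   # baseline difference
hteb     <- (1 - pi_hat) * (ATT - ATU)            # heterogeneity term

round(c(naive = naive, ATE = ATE, sel_bias = sel_bias, hteb = hteb,
        sum = ATE + sel_bias + hteb), 3)
```

The entries naive and sum agree up to floating-point error, and ATE equals pi_hat · ATT + (1 − pi_hat) · ATU in the sample.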

Source: Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press. Freely available online — with many further techniques for causal analysis: IPW, Difference-in-Differences, Regression Discontinuity, Instrumental Variables and more. An excellent introduction to the full breadth of causal inference methods.
