What will I learn here?
This tool explains Maximum Likelihood Estimation (MLE), one of the most important estimation methods in statistics, and makes a crucial distinction explicit:
likelihood is not probability.
- What is likelihood — and why is it a function of the parameter, not the data?
- How does a likelihood landscape arise and where is its peak?
- Why do we compute with log-likelihood instead of likelihood?
- How does MLE work for Poisson and Bernoulli — not just the Normal distribution?
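The log-likelihood question above has a very practical answer that a short sketch can show (Python, standard library only; the sample is made up): multiplying thousands of small density values underflows to 0.0 in floating point, while the equivalent sum of log-densities stays finite and easy to maximize.

```python
import math
import random

random.seed(0)
# Hypothetical sample: 2000 draws from a standard Normal.
data = [random.gauss(0.0, 1.0) for _ in range(2000)]

def normal_pdf(y, mu=0.0, sigma=1.0):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# The product of 2000 densities (each well below 1) underflows to 0.0 ...
product = 1.0
for y in data:
    product *= normal_pdf(y)
print(product)  # 0.0: underflow

# ... but the sum of log-densities is numerically stable.
loglik = sum(math.log(normal_pdf(y)) for y in data)
print(loglik)   # a finite negative number, around -2800 here
```

Because log is strictly increasing, the log-likelihood peaks at exactly the same parameter value as the likelihood, so nothing is lost by working on the log scale.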
The three stages
- Stage 1 — One data point: Slide the distribution over the fixed data point.
Observe how the density value at the data point (= likelihood) changes.
The peak shows the MLE: μ̂ = y.
- Stage 2 — Many data points: Total likelihood = product of individual densities
(= sum of log-densities). Right: the log-likelihood landscape over μ.
Tip: activate "Vary σ too" for the 2D heatmap.
- Stage 3 — Other families: MLE with Poisson (count data) and Bernoulli (0/1).
Same principle, different formula. AIC/BIC enable comparison between families.
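Stage 3's closed-form answers can be checked numerically (a Python sketch with made-up count and 0/1 data): for both Poisson and Bernoulli, a grid search over the log-likelihood lands on the sample mean, which is the known MLE for both families.

```python
import math

counts = [2, 0, 3, 1, 2, 4, 1]     # hypothetical Poisson-style count data
flips = [1, 0, 1, 1, 0, 1, 1, 0]   # hypothetical Bernoulli 0/1 data

def poisson_loglik(lam, ys):
    # sum over y of: y*log(lam) - lam - log(y!)
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in ys)

def bernoulli_loglik(p, ys):
    # sum over y of: y*log(p) + (1-y)*log(1-p)
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p) for y in ys)

# Grid search over the parameter; the peak sits at the sample mean.
lam_grid = [i / 1000 for i in range(1, 8000)]
lam_hat = max(lam_grid, key=lambda lam: poisson_loglik(lam, counts))

p_grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(p_grid, key=lambda p: bernoulli_loglik(p, flips))

print(lam_hat, sum(counts) / len(counts))  # both ≈ 1.857
print(p_hat, sum(flips) / len(flips))      # both 0.625
```

Same principle as the Normal case: only the density formula inside the log-likelihood changes.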
Likelihood ≠ Probability
Probability: Parameters fixed → how probable are these data?
Likelihood: Data fixed (observed) → how plausible is this parameter?
Likelihood is not a probability distribution over the parameter: it does not integrate to 1.
Only a prior turns it into a posterior (Bayes' theorem).
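This point can be checked numerically with the simplest possible case (a Python sketch; the single observation is made up): for one Bernoulli observation y = 1, the likelihood is L(p) = p, and its integral over p ∈ [0, 1] is 1/2, not 1.

```python
# Likelihood of a single Bernoulli observation y = 1 is L(p) = p.
# Integrate it over the parameter p in [0, 1] with a midpoint Riemann sum.
n = 100_000
area = sum((i + 0.5) / n for i in range(n)) / n
print(area)  # ≈ 0.5, not 1: the likelihood is not a density over p
```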
Why this matters for Bayes
MLE delivers the most plausible parameter without prior information.
Bayesian estimation weights this likelihood with a prior:
Posterior ∝ Likelihood × Prior.
With a flat prior → posterior mode ≈ MLE.
This makes MLE the conceptual foundation for everything that follows.
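The flat-prior claim above can be verified on a grid (a Python sketch with made-up 0/1 data; all names are illustrative): multiplying the likelihood by a uniform prior and normalizing yields a posterior whose mode coincides with the MLE.

```python
import math

flips = [1, 1, 0, 1, 0, 1]  # hypothetical 0/1 data, sample mean 4/6
grid = [(i + 0.5) / 1000 for i in range(1000)]  # grid over p in (0, 1)

def loglik(p):
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p) for y in flips)

prior = [1.0 for _ in grid]                       # flat prior over p
unnorm = [math.exp(loglik(p)) * pr for p, pr in zip(grid, prior)]
Z = sum(unnorm) / len(grid)                       # normalizing constant
posterior = [u / Z for u in unnorm]               # integrates to ~1 on the grid

mle = max(grid, key=loglik)
post_mode = grid[max(range(len(grid)), key=lambda i: posterior[i])]
print(mle, post_mode)  # identical with a flat prior
```

With a non-flat prior the posterior mode shifts away from the MLE toward the prior, which is exactly the weighting Posterior ∝ Likelihood × Prior describes.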
Next → From LM to GLM: link functions and
why GLMs apply the same MLE logic to other distributions