Introduction: The Two Fundamental Problems of Inference
When trying to estimate the effect of a program on an outcome, we face two very important and difficult problems: the Fundamental Problem of Causal Inference (FPCI) and the Fundamental Problem of Statistical Inference (FPSI).
In its most basic form, the FPCI states that our causal parameter of interest (\(TT\), short for Treatment on the Treated, which we will define shortly) is fundamentally unobservable, even when the sample size is infinite. The main reason is that one component of \(TT\), the outcome of the treated had they not received the program, cannot be observed. We call this outcome a counterfactual outcome. The FPCI is a dispiriting result, but it is also the starting point for all of the statistical methods of causal inference: each of them tries to estimate the counterfactual using observable quantities that hopefully approximate it as well as possible. Most people, ourselves included but also policymakers, rely on intuitive quantities to stand in for the counterfactual (the outcomes of individuals without the program, or of individuals observed before the program was implemented). Unfortunately, these approximations are generally crude, and the resulting estimators of \(TT\) are often biased, sometimes severely.
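As a preview of where this leads, one common way to write the problem uses the Rubin potential-outcomes notation (a sketch only; the formal definitions come later): with \(Y_i^1\) the outcome of unit \(i\) with the program, \(Y_i^0\) its outcome without the program, and \(D_i=1\) indicating participation,

\[
TT = \mathbb{E}[Y_i^1 - Y_i^0 \mid D_i = 1] = \underbrace{\mathbb{E}[Y_i^1 \mid D_i = 1]}_{\text{observed among participants}} - \underbrace{\mathbb{E}[Y_i^0 \mid D_i = 1]}_{\text{counterfactual, never observed}}.
\]

The second term asks what the outcomes of participants would have been without the program, and no sample, however large, contains those outcomes.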
The Fundamental Problem of Statistical Inference (FPSI) states that, even if we have a population quantity \(E\) that identifies \(TT\), we cannot observe \(E\), because we only ever have access to a finite sample from the population. All we can compute from the sample is a sample analog \(\hat{E}\) of the population quantity \(E\), and \(\hat{E}\neq E\). Why is \(\hat{E}\neq E\)? Because a finite sample is never perfectly representative of the population. What can we do to deal with the FPSI? I am going to argue that there are mainly two things we might want to do: estimate the extent of sampling noise and decrease sampling noise.
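To make the FPSI concrete, here is a small simulation sketch in Python (the population value and standard deviation are made-up numbers chosen only for illustration): it draws repeated finite samples from a hypothetical population, shows that the sample analog \(\hat{E}\) fluctuates around the population value \(E\), and shows that the fluctuation shrinks as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(1234)

# Hypothetical population: E is the population quantity we would like to know.
E = 0.18      # illustrative "true" population value (made up for this sketch)
sigma = 1.0   # illustrative standard deviation of individual outcomes

def sample_estimate(n):
    """Draw one finite sample of size n and return the sample analog E_hat."""
    sample = rng.normal(loc=E, scale=sigma, size=n)
    return sample.mean()

n_replications = 1000
for n in (100, 1000, 10000):
    # Replicate the sampling process many times to see how E_hat varies around E.
    estimates = np.array([sample_estimate(n) for _ in range(n_replications)])
    print(f"n={n:>6}: mean of E_hat = {estimates.mean():.3f}, "
          f"sampling noise (std of E_hat) = {estimates.std():.3f}")
```

In this sketch, any single \(\hat{E}\) differs from \(E\) (the FPSI at work); the spread of the \(\hat{E}\)'s across replications measures the extent of sampling noise, and increasing the sample size decreases it.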