Chapter 6 Threats to the validity of Causal Inference

In this final section, I want to discuss more generally about the possible threats to the validity of methods of causaliInference. Most of these threast stem from the fact that, much as particles do when physicists try to measure their position and velocity, human beings react to our experimental devices in sometimes unexpected ways. It is classical to make a disctinction between two threat and one specific set of problems:

  1. Threats to internal validity: these are the threats that vitiate the result of the experiment in the sample at hand, even when only looking at the Intention to Treat Effect. They also include threats to the exclusion restriction assumption.
  2. Threats to the measurement of precision
  3. Threats to external validity: these are the threats that make the extension of the results from one experiment to the same population at another period of to another population dubious.
  4. Ethical and political issues

6.1 Threats to internal validity

Threats to internal validity are the problems that might make the results of the experiment not measure the effect of the treatment of interest in the ongoing sample. Let’s examine the most important ones in turn.

6.1.1 Survey bias

There is survey bias if the mere fact of having to answer to a survey alters the outcomes of the surveyed individuals. For example, administering a health survey might make you pay more attention to your health and as a consequence improve it. Survey bias alters the measured impact of the treatment since it alters the behavior of the control group. As a result, the estimated effect of the treatment might be biased.

Remark. Note that survey bias might affect all the estimators presented in this book, including natural experimental and observational estimators. All estimators rely on measuring something and thus might be affected by survey bias.

Actually, RCTs might be able to avoid survey bias whereas other methods generally cannot. Indeed, survey bias generally occurs with repeated sampling: surveying at baseline might trigger a response by individuals, and thus bias the measurement at the end of the experiment. RCTs can avoid this issue by bypassing the baseline survey, or at least collecting baseline information on a subsample of the experimental sample. Other estimators that might be able to avoid this problem are DID, where the repeated cross section estimator eschews survey bias, and RDD and IV, which generally use only cross sectional information. Matching in general requires observations for the same individual over time, so that avoiding possible survey bias is impossible.

Remark. Do we have evidence of the existence and extent of survey bias? A paper by Zwane et al (2010) shows that there is extensive survey bias in 2 out of 5 experiments.

In the first example, the authors show that being surveyed more frequently (every two weeks vs every six months) for the extent of diarrhea prevalence and use of chlorine decontamination increases the use of chlorine, decreases diarrhea prevalence and decreases the effect of a spring protection program, to the extent that it becomes null in the frequent survey sample, whereas it is large and positive in the infrequent survey sample. The authors speculate that the frequent surveys act as a reminder for chlorinating water, which is a substitute for well protection.

In a second experiment, the authors randomly run a baseline survey on 80% of the households tha would be later included in a RCT where health insurance would be offered at randomly selected prices. The baseline survey includes questions about health and health insurance, but does not mention the particular product that will be offered later. The authors find a small imprecise increase in insurance take up in the group having undergone the baseline survey (5% \(\pm\) 6.8), non significant at usual levels of confidence. They also find no evidence of impact of the baseline survey on the response of households to the price incentive.

In a third experiment, the authors report on the effect of being surveyed with a survey continaing questions on health status and health insurance on the subsequent adoption of health insurance. They find evidence that the baseline survey has increased the adoption of health insurance by 6.7% \(\pm\) 6.6, from a mean of 26.4% in the control group. The effect dissipates over time though and becomes much smaller 15 to 18 months after the treatment.

In a fourth experiment, the authors randomly selected 60% of targeted households to be included in a baseline survey including wuestions on household finances and borrowing opportunities.
The sample was then prospected by a micro-credit firm. The aurhors do not find higher micro-credit take-up among households surveyed at baseline (-0.009 \(\pm\) 0.048, with a baseline take up rate of 0.166). Note that the estimates are imprecise though.

In a fifth experiment, the authors randomly assigned the order in which households had to be contacted for a baseline survey, along with a fixed number of households to be sruveyed by village. The survey contained questions about finance and microfinancne use. Also, the survey explicitely mentioned that househods were interviewed because they were patrons of a micro-finance provider. Sunbsequently, households had to decide whether or not to renew their loans from the micro-finance provider. The authors do not evidence of an effect of being surveyed on the subsequent decision to renew the loan (-0.005 \(\pm\) 0.026 from a baseline rate of 0.769).

6.1.2 Experimenter bias

6.1.3 Substitution bias

6.1.4 Diffusion bias

6.2 Threats to the measurement of precision

6.2.1 Insufficient precision

6.2.2 Clustering

6.3 Threats to external validity

6.3.1 Randomization bias

6.3.2 Equilibrium effects

6.3.3 Context effects

6.3.4 Site selection bias

6.3.5 Publication bias

6.3.6 Ethical and political issues