Missing Data
It is important to consider the issues raised by missing data at the research design stage. As unplanned missing data inevitably introduce ambiguity into the inferences that can be drawn from a study, the design should be carefully scrutinised to minimise the scope for missing data to arise. Considerable care over this aspect of design will pay a substantial dividend when the study is analysed.
Inevitably, however, missing data will arise. Ambiguity in the analysis can be reduced if the chance of the data being missing depends only on observed data: the so-called ‘missing at random’ scenario (see the ‘Getting Started’ section of www.missingdata.org.uk). In other words, investigators should consider which variables are likely to prove difficult to collect. They should then see whether there are variables they could reliably collect which are likely to predict the chance of observing the difficult-to-collect variables.
To illustrate, people may be reluctant to divulge their income, but it may be easy to obtain their property band. If property band is a good predictor of the chance of people divulging their income (technically, if within each property band we observe a random sample of incomes) then collecting property band, and making appropriate adjustments in the analysis, will allow valid inferences to be drawn.
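The income example can be sketched as a small simulation. Everything in it is an illustrative assumption rather than real data: the three bands, their mean incomes, and the response probabilities are invented, and averaging the observed band means weighted by band size is only one simple form of "appropriate adjustment". Within each band the chance of divulging income is constant, so the observed incomes in a band are a random sample of that band.

```python
import random
from statistics import mean

# Hypothetical bands: (mean income, probability that income is divulged).
BANDS = {"A": (20000, 0.9), "B": (35000, 0.7), "C": (60000, 0.4)}

def simulate(n=20000, seed=0):
    """Return (band, income, observed) rows; missingness depends on band only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        band = rng.choice(sorted(BANDS))
        mu, p_obs = BANDS[band]
        income = rng.gauss(mu, 5000)
        observed = rng.random() < p_obs
        rows.append((band, income, observed))
    return rows

def naive_mean(rows):
    """Mean of the divulged incomes, ignoring property band."""
    return mean(inc for _, inc, obs in rows if obs)

def band_adjusted_mean(rows):
    """Average the observed mean in each band, weighted by the band's share
    of the whole sample. Assumes every band has at least one observed income."""
    n = len(rows)
    total = 0.0
    for band in sorted({b for b, _, _ in rows}):
        in_band = [r for r in rows if r[0] == band]
        observed = [inc for _, inc, obs in in_band if obs]
        total += (len(in_band) / n) * mean(observed)
    return total

rows = simulate()
true_mean = mean(inc for _, inc, _ in rows)  # known only because this is simulated
naive = naive_mean(rows)
adjusted = band_adjusted_mean(rows)
print(f"true {true_mean:.0f}  naive {naive:.0f}  band-adjusted {adjusted:.0f}")
```

In this set-up the naive mean understates income because the high-income band responds least often, while the band-adjusted mean uses the fully observed property band to correct for that.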
Longitudinal studies should consider which subgroups of individuals are likely to be lost to follow-up, and consider strategies for keeping in touch with representative samples of these groups.
Ensuring there is sufficient funding, and a careful strategy, for following up initial non-responders greatly increases the credibility of the conclusions.
Finally, if you suspect missing data are likely to be a substantial issue in the analysis, budget for statistical advice on handling them.
Strategy for analysis of partially observed data set
Make sure you are familiar with the issues raised by missing data; see for instance the documents in the ‘Getting Started’ section of www.missingdata.org.uk.
The next stage is to familiarise yourself with the data. A natural starting point is an analysis of the fully observed data; note that with missing data this is only the starting point! At this stage you should clearly identify (if you have not done so already) (i) the hypotheses of interest, (ii) the models that you are going to use to explore them, and (iii) the variables that you are going to use, including any that are partially observed. Note that variables that are apparently unrelated in the subset of observed data may become important later on!
Source:
ESRC National Centre for Research Methods,
www.ncrm.ac.uk,
http://www.lshtm.ac.uk/msu/missingdata/guidelines.pdf