What's up?
September 18th 2003: Symposium on Causal Effects in Observational Studies: Reports from Academia and Practice, in Nuremberg/Germany
September 19th 2003: Workshop on Multiple Imputation and Split Questionnaire Survey Designs, in Nuremberg/Germany

Related Topics
August 13th-20th 2003: ISI World Congress of the International Statistical Institute, in Berlin/Germany
October 9th-12th 2003: Workshop on Item Non-response and Data Quality in Large Social Surveys, in Basel/Switzerland
Useful Links
More about Multiple Imputation
raessler automation & consulting
Chair of Statistics and Econometrics of the University of Erlangen-Nuremberg, Germany
Missing Data

Everybody has them, nobody wants them! 

Empirical researchers are often confronted with missing values in their data sets. Because the phenomenon is usually not seen as a threat to the validity of the research, the most common approach to the problem is simply to ignore it. However, a closer look at the data often reveals 5% to 20% missing values in a few variables, which considerably reduces the data available for any multivariate analysis.
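A rough calculation shows how quickly a few incomplete variables shrink the usable data. It assumes, purely for illustration, that values are missing independently across variables. The next paragraphs explain why this assumption is often unrealistic. Under independence, the expected share of units with no missing value is the product of (1 − p) over all variables, where p is each variable's missing rate. The function name `complete_case_share` is hypothetical.

```python
# Hypothetical helper; ASSUMES values are missing independently across variables.
def complete_case_share(missing_rates):
    """Expected share of units that have no missing value in any variable."""
    share = 1.0
    for rate in missing_rates:
        share *= 1.0 - rate  # probability that this variable is observed
    return share

# Three variables with 5%, 10% and 20% missing values (an assumed scenario):
print(round(complete_case_share([0.05, 0.10, 0.20]), 3))  # 0.684
```

In this scenario roughly a third of the units would be lost to any analysis that needs all three variables.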

Unit no. | Gender | Age   | Education | Health state | Personal net income | ... | Cereals: Brand A | Cereals: Brand B | ...
1        | female | 40-45 | high      | good         | ?                   | ... | regularly        | regularly        | ...
2        | male   | 30-35 | middle    | poor         | 4500-5000           | ... | never            | regularly        | ...
3        | female | >60   | ?         | poor         | 4000-4500           | ... | regularly        | seldom           | ...
4        | male   | 20-25 | high      | ?            | ?                   | ... | seldom           | ?                | ...
5        | male   | 20-25 | low       | ?            | 1500-2000           | ... | never            | seldom           | ...
6        | female | 30-35 | low       | good         | 1500-2000           | ... | never            | regularly        | ...
...      | ...    | ...   | ...       | ...          | ...                 | ... | ...              | ...              | ...

Case deletion (only units without any missing value are kept):

Unit no. | Gender | Age   | Education | Health state | Personal net income | ... | Cereals: Brand A | Cereals: Brand B | ...
2        | male   | 30-35 | middle    | poor         | 4500-5000           | ... | never            | regularly        | ...
6        | female | 30-35 | low       | good         | 1500-2000           | ... | never            | regularly        | ...
...      | ...    | ...   | ...       | ...          | ...                 | ... | ...              | ...              | ...
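Case deletion (also called listwise deletion or complete-case analysis) can be sketched on the six example units from the table. Only the visible columns are included; the elided "..." columns are left out. `None` stands for a missing value ("?"). The dictionary keys and the function name `case_deletion` are hypothetical.

```python
# The six example units from the table; None marks a missing value ("?").
units = [
    {"unit": 1, "gender": "female", "age": "40-45", "education": "high",
     "health": "good", "income": None, "brand_a": "regularly", "brand_b": "regularly"},
    {"unit": 2, "gender": "male", "age": "30-35", "education": "middle",
     "health": "poor", "income": "4500-5000", "brand_a": "never", "brand_b": "regularly"},
    {"unit": 3, "gender": "female", "age": ">60", "education": None,
     "health": "poor", "income": "4000-4500", "brand_a": "regularly", "brand_b": "seldom"},
    {"unit": 4, "gender": "male", "age": "20-25", "education": "high",
     "health": None, "income": None, "brand_a": "seldom", "brand_b": None},
    {"unit": 5, "gender": "male", "age": "20-25", "education": "low",
     "health": None, "income": "1500-2000", "brand_a": "never", "brand_b": "seldom"},
    {"unit": 6, "gender": "female", "age": "30-35", "education": "low",
     "health": "good", "income": "1500-2000", "brand_a": "never", "brand_b": "regularly"},
]

def case_deletion(rows):
    """Keep only units that have no missing value in any variable."""
    return [row for row in rows if all(value is not None for value in row.values())]

complete = case_deletion(units)
print([row["unit"] for row in complete])  # [2, 6]
```

Four of the six units are discarded, even though each of them lacks at most three of eight recorded values.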

Moreover, these blind spots are often not scattered randomly across the responses. Particular socio-economic groups or minorities are disproportionately affected by missing values. It is even worse when the missingness depends on the variable of interest itself; for example, it is common for the highest incomes to be the ones that remain unknown. The same happens when, e.g., populations in the worst health or at high risk refuse to be sampled. Finally, the quality of responses deteriorates with long and boring questionnaires of the kind that is common practice in media research.
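A small simulation shows why missingness that depends on the variable itself is so harmful. It contrasts values missing completely at random (MCAR) with values missing more often for high incomes. The second case is commonly called missing not at random (MNAR). The log-normal income distribution, the missing rates and the 80th-percentile cutoff are all assumptions chosen for illustration. They are not taken from any real survey.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of net incomes (log-normal is an assumption)
incomes = [random.lognormvariate(8.0, 0.5) for _ in range(10_000)]
true_mean = statistics.mean(incomes)

# MCAR: every income has the same 15% chance of being missing
mcar_observed = [y for y in incomes if random.random() >= 0.15]

# MNAR: the top 20% of incomes are missing with probability 0.6,
# all others with probability 0.05 (assumed rates)
cutoff = sorted(incomes)[int(0.8 * len(incomes))]


def is_missing(y):
    return random.random() < (0.6 if y >= cutoff else 0.05)


mnar_observed = [y for y in incomes if not is_missing(y)]

print(f"true mean:             {true_mean:8.1f}")
print(f"complete cases (MCAR): {statistics.mean(mcar_observed):8.1f}")
print(f"complete cases (MNAR): {statistics.mean(mnar_observed):8.1f}")
```

Under MCAR the complete-case mean stays close to the true mean and only precision is lost. Under the income-dependent mechanism it is clearly too low, and collecting more data of the same kind would not remove this bias.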

In all these cases, missing data can be a threat to the research, and the remaining data are anything but representative of the population of interest. For such situations, we have generally found multiple imputation to be a very helpful and powerful tool for getting the right answers even in the presence of nonresponse.
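The sketch below shows the basic idea of multiple imputation. Each missing value is filled in m times with plausible draws. Each completed data set is analysed, and the m results are pooled with Rubin's combining rules. It illustrates one common variant and makes no claim to be the method or software used here. That variant is linear-regression imputation on a fully observed covariate x, with parameter uncertainty approximated by bootstrapping the observed cases. The function names, the simulated data and the missingness rule are hypothetical. In the simulation, y is missing more often for large x, which makes the data missing at random given x.

```python
import random
import statistics


def fit_line(pairs):
    """Ordinary least squares fit y = a + b*x; returns (a, b, residual sd)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in pairs]
    sd = (sum(r * r for r in resid) / (len(pairs) - 2)) ** 0.5
    return a, b, sd


def multiply_impute_mean(data, m=20, rng=random):
    """Estimate the mean of y by multiple imputation; data holds (x, y), y may be None."""
    observed = [(x, y) for x, y in data if y is not None]
    estimates, variances = [], []
    for _ in range(m):
        # Bootstrap the observed cases so the imputations reflect parameter uncertainty
        boot = [rng.choice(observed) for _ in observed]
        a, b, sd = fit_line(boot)
        # Fill each missing y with a regression prediction plus random residual noise
        completed = [y if y is not None else a + b * x + rng.gauss(0.0, sd) for x, y in data]
        estimates.append(statistics.mean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    # Rubin's combining rules
    q_bar = statistics.mean(estimates)           # pooled point estimate
    within = statistics.mean(variances)          # within-imputation variance
    between = statistics.variance(estimates)     # between-imputation variance
    total = within + (1 + 1 / m) * between       # total variance
    return q_bar, total ** 0.5


rng = random.Random(42)
data = []
for _ in range(2000):
    x = rng.gauss(0.0, 1.0)
    y = 2.0 + 1.5 * x + rng.gauss(0.0, 1.0)  # true mean of y is 2.0 by construction
    missing = rng.random() < (0.7 if x > 0.5 else 0.1)  # assumed mechanism
    data.append((x, None if missing else y))

cc_mean = statistics.mean(y for _, y in data if y is not None)
mi_mean, mi_se = multiply_impute_mean(data, m=20, rng=rng)
print(f"complete-case mean: {cc_mean:.2f}")
print(f"MI mean: {mi_mean:.2f} (s.e. {mi_se:.2f})")
```

The complete-case mean is pulled downward because units with large x, and hence large y, are dropped more often. The pooled estimate should land near 2.0, and its standard error includes the extra uncertainty caused by the missing values.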


Copyright © 2003 by Susanne Raessler
last modified Feb 27 2003