Data Collection for Data Quality


Project Overview

In recent years, response rates in major surveys have been falling, and survey researchers have started to give greater emphasis to nonresponse bias. At the same time, the costs of data collection have increased significantly. Consequently, developing data collection and analysis methods that ensure high data quality while also controlling survey costs is an important concern in survey research. Work Package 1 (WP1) assesses existing methods, and develops new ones, for assessing the quality of data collection in sample surveys, including face-to-face, telephone and web surveys and surveys with mixed modes of data collection.

The use of paradata (survey process data) from different modes is a key focus, as their potential for reducing survey error and improving fieldwork efficiency has become increasingly apparent. However, little is currently known about how to model these types of data in ways that provide substantive insight while also adequately accounting for their complex structures. The project develops and applies techniques for analysing the resulting complex linked datasets, which have time-dependent and hierarchical properties, in the substantive context of using survey paradata to model unit nonresponse and measurement error. The key methodological challenges are to properly incorporate information about the non-independence of observations, while also accounting for linkage errors between datasets. We will employ a range of modelling approaches, including, for example, sequence analysis and multilevel modelling techniques.

Findings from the project will have wide-ranging implications for survey practice. For example, by focussing on the mechanisms that generate nonresponse bias in large-scale surveys, the outcomes of WP1 will contribute to efforts by survey practitioners to reduce nonresponse bias in key social and economic surveys. While the focus of WP1 will be on survey nonresponse, our work on developing analytical methods will be of interest to researchers using these types of complex data structures in different substantive contexts.

The project will make use of a unique dataset, the 2011 ONS Census nonresponse link study, which links census records to survey outcomes and to unit-level paradata for responding and nonresponding households, as well as data from Understanding Society, a large-scale longitudinal household survey in the UK. Other datasets will also be explored. The work is broadly organised into the following three subprojects:


  1. Assessing data quality during data collection

This subproject explores the relationship between nonresponse rate and nonresponse bias and develops the use of representativeness indicators to assess the risk of nonresponse bias during data collection. Key research questions include how many calls are necessary to achieve a specified level of data quality and which sample members should be followed up when the aim is to increase data quality rather than simply to increase response rates, whilst also considering costs. The work will inform adaptive and responsive survey designs. Specifically, we analyse call record and other field process data linked to 2011 census records and to survey outcomes for the same households. These data are characterised by having measurements made across several time points and datasets, as well as possessing hierarchical structures through the nesting of individuals within households, interviewers and areas.
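
One widely used representativeness indicator is the R-indicator, R = 1 - 2 S(ρ), where S(ρ) is the standard deviation of the estimated response propensities ρ across sample members. The sketch below computes it from propensities that are assumed to have been estimated already, for example from a response model fitted to call-record and census covariates. The function name, the optional design weights and the use of the population form of the standard deviation are illustrative assumptions, not the project's own implementation.

```python
import math


def r_indicator(propensities, weights=None):
    """Return R = 1 - 2 * S(rho) for estimated response propensities.

    S(rho) is the (optionally weighted) population standard deviation of
    the propensities. R = 1 means every sample member is equally likely
    to respond; lower values indicate a less representative response.
    """
    if not propensities:
        raise ValueError("propensities must not be empty")
    if weights is None:
        weights = [1.0] * len(propensities)  # assumption: equal design weights
    if len(weights) != len(propensities):
        raise ValueError("weights and propensities must have equal length")

    total = sum(weights)
    mean = sum(w * p for w, p in zip(weights, propensities)) / total
    variance = sum(w * (p - mean) ** 2 for w, p in zip(weights, propensities)) / total
    return 1.0 - 2.0 * math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical propensities after an early and a later call attempt.
    print(r_indicator([0.2, 0.8, 0.3, 0.7]))
    print(r_indicator([0.5, 0.6, 0.5, 0.6]))
```

Recomputing such an indicator after each call attempt is one way to monitor whether further follow-up is making the responding sample more balanced, which is a different question from whether the response rate is rising.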


  2. Interviewer effects on survey data quality

In face-to-face surveys, the interviewer plays a key role in gaining cooperation and in achieving high-quality survey responses. This subproject analyses interviewer effects on nonresponse bias, measurement error and other indicators of data quality. One commonly used data quality indicator is response latency: the time a sample member takes to answer the survey as a whole or an individual survey question. The subproject makes use of multilevel modelling techniques. It applies a novel specification of a random slope multilevel model to the analysis of interviewer effects on nonresponse bias, and it assesses the use of a recently developed cross-classified mixed-effects location scale model for analysing interviewer effects on response latencies.
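
For orientation, a generic textbook random slope multilevel logistic model for response is written out below. It is not the novel specification referred to above, and all symbols are assumptions introduced here: π_ij is the probability that sample member i assigned to interviewer j responds, x_ij is a sample member characteristic, and u_0j and u_1j are the interviewer-level random intercept and random slope.

```latex
\operatorname{logit}(\pi_{ij}) = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_{u0}^2 & \sigma_{u01} \\ \sigma_{u01} & \sigma_{u1}^2 \end{pmatrix}
\right)
```

In a model of this form, a non-zero slope variance means that the association between x_ij and response differs across interviewers, which is how interviewers can contribute to nonresponse bias and not only to the overall response rate.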


  3. Using linked data and dealing with linkage errors

Survey data collection and analysis can sometimes be enhanced by linkage to different data sources. These may include census, register and administrative data and paradata. Such linked datasets offer great potential for analysis and may also enhance current and future data collection methods. The project also explores the impact of linkage errors and proposes ways of correcting for them.
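
The impact of linkage errors can be illustrated with a toy simulation. The sketch below is not the project's method: it assumes a simple setting in which a survey covariate x is linked to an outcome y held in another source, and a fraction of records are linked to the wrong unit at random. All names and parameter values are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_linkage_error(n=5000, true_slope=2.0, correct_link_rate=0.8, seed=1):
    """Return (slope with perfect linkage, slope with random false links)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_slope * xi + rng.gauss(0, 1) for xi in x]

    # Each record is mislinked with probability 1 - correct_link_rate.
    wrong = [i for i in range(n) if rng.random() > correct_link_rate]
    # Mislinked records receive the outcome of another mislinked record.
    sources = wrong[:]
    rng.shuffle(sources)
    y_linked = y[:]
    for target, source in zip(wrong, sources):
        y_linked[target] = y[source]

    return ols_slope(x, y), ols_slope(x, y_linked)


if __name__ == "__main__":
    clean, naive = simulate_linkage_error()
    print(f"slope with perfect linkage: {clean:.2f}")
    print(f"slope with linkage errors:  {naive:.2f}")
```

Under these assumptions the naive slope shrinks towards zero, roughly in proportion to the share of correct links, which is why analyses of linked data that ignore linkage errors can understate associations.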

