DATA CAPTURE AND PROCESSING
The first activity was checking and editing the waypoint file. Three main problems were found: duplicate waypoints, waypoint numbers in which letters had been entered instead of digits, and invalid waypoint numbers. In all cases the errors were corrected manually. Duplicate waypoints were removed, numbers containing "o" instead of "0" or "B" instead of "8" were rectified, and invalid waypoints were compared with nearby waypoints to infer the likely correct number.
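The corrections described above were made by hand. Purely as an illustration, the same three checks could be sketched in Python as follows. The function name, the sample numbers, and the rule that a valid waypoint number consists only of digits are assumptions made for this sketch, not details taken from the census operation.

```python
# Look-alike characters mentioned in the text: "o" for "0" and "B" for "8".
# Treating upper-case "O" the same way is an assumption of this sketch.
LOOKALIKES = str.maketrans({"o": "0", "O": "0", "B": "8"})


def clean_waypoints(raw_numbers):
    """Return (valid, invalid) waypoint numbers with duplicates removed.

    A number is considered valid here if, after look-alike letters are
    replaced, it consists only of ASCII digits (an assumed rule).
    """
    seen = set()
    valid, invalid = [], []
    for raw in raw_numbers:
        stripped = raw.strip()
        fixed = stripped.translate(LOOKALIKES)
        if fixed in seen:
            continue  # duplicate waypoint: keep the first occurrence only
        seen.add(fixed)
        if fixed.isascii() and fixed.isdigit():
            valid.append(fixed)
        else:
            # Left for manual review, e.g. comparison with nearby waypoints.
            invalid.append(stripped)
    return valid, invalid


valid, invalid = clean_waypoints(["101", "1o2", "101", "1B3", "x17", " 104 "])
```

In this sketch, entries that cannot be repaired automatically are only flagged; deciding their correct number would still require the manual comparison with neighbouring waypoints described above.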
After the waypoints were linked with the questionnaire data, the vast majority of waypoint numbers matched their corresponding dwelling sticker numbers. However, discrepancies between the two databases remained: some questionnaires could not be linked to waypoints, and some waypoints could not be linked to questionnaires. These problems were resolved with the help of maps, comparisons with nearby waypoints and, in some cases, new visits to suspect dwellings or entire areas.
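The linking step amounts to comparing two sets of identifiers and listing what is left unmatched on each side. A minimal sketch in Python is shown below; the function name, the result keys, and the sample numbers are hypothetical, and the sketch assumes that a waypoint number and its dwelling sticker number are meant to be identical.

```python
def link_records(waypoint_numbers, sticker_numbers):
    """Compare waypoint numbers with questionnaire sticker numbers.

    Returns the numbers present in both sources and the numbers
    found in only one of them, each as a sorted list.
    """
    waypoints = set(waypoint_numbers)
    stickers = set(sticker_numbers)
    return {
        "linked": sorted(waypoints & stickers),
        "waypoints_without_questionnaire": sorted(waypoints - stickers),
        "questionnaires_without_waypoint": sorted(stickers - waypoints),
    }


result = link_records(["101", "102", "103"], ["102", "103", "104"])
```

The two unmatched lists correspond to the two kinds of discrepancy noted above and would be the starting point for checking maps, neighbouring waypoints, or revisiting dwellings.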
CSPro, a software package developed by the US Bureau of the Census, was used to capture, process, and edit data.
The data capture team was formed as soon as the interviewing was completed. Staff received training in subjects such as understanding the census process, the design of the census forms, and how to operate the data-entry program. Approximately 38 data-entry operators undertook the work in several shifts. The data was captured by keyboard.
After the data was entered, a process of editing and imputation of data was conducted.
It is important to mention that data capture and processing was a weak part of the census operation. Data entry had to be repeated because of the many errors made the first time. In addition, after the data was finally entered, it took an unnecessarily long time to process and edit it. Data capture ended in February 2005, and the edited master file was not ready until December 2005. Even then some inconsistencies remained, especially in fertility and school enrolment data. Such inconsistencies are minor, but they may reduce the credibility of some results.
The underlying cause of these problems was poor use of technical assistance, largely due to a lack of management continuity. The NSD is aware of this weakness and of the need to address it for the next census.