Aboriginal Peoples Technical Report, Census of Population, 2016
3. Data processing
The processing phase of the 2016 Census began with the translation of responses into meaningful data. This part of the survey cycle was divided into six main activities:
- receipt and registration
- imaging and data capture
- edit and imputation
For general information regarding data processing, please refer to the Guide to the Census of Population, 2016, Catalogue no. 98-304-X.
3.1 Coding of the Membership in a First Nation or Indian band write-in question
Write-in responses to the Membership in a First Nation or Indian band question were coded to a list of over 600 First Nations and Indian bands. Automated coding handled 73.2% of responses; the remaining responses were coded with an interactive application. This application drew on several reference files, including a file of alternative spellings of First Nation and Indian band names with their corresponding codes, and a file of geographic codes for Indian reserves, the names of those reserves and the names of the First Nations or Indian bands affiliated with them.
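The two-stage coding described above (automated matching first, interactive coding for the remainder) can be illustrated with a minimal sketch. It assumes the automated step is a lookup of a normalized write-in against a spelling-variant reference file; the actual matching rules are not described in this report. The names `VARIANT_TO_CODE`, `normalize` and `code_responses`, as well as the band names and codes, are hypothetical placeholders.

```python
import unicodedata

# Hypothetical reference file: spelling variants mapped to band codes.
# Names and codes are placeholders, not real census codes.
VARIANT_TO_CODE = {
    "example first nation": 101,
    "example 1st nation": 101,
    "sample river band": 202,
}


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def code_responses(responses):
    """Split responses into automatically coded ones and a manual-review queue."""
    coded, manual = {}, []
    for response in responses:
        code = VARIANT_TO_CODE.get(normalize(response))
        if code is None:
            manual.append(response)  # left for the interactive application
        else:
            coded[response] = code
    return coded, manual


coded, manual = code_responses(
    ["Example First-Nation", "  SAMPLE river band ", "Unknown Band"]
)
```

Responses that do not match any known variant are not guessed at; they are set aside for a human coder, which mirrors the division of work between automated and interactive coding.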
3.2 Edit and imputation of Aboriginal variables
After the data capture and initial editing and coding operations were completed, the data were processed up to the final edit and imputation stage. This final edit stage detected invalid responses and inconsistencies, and identified unanswered questions. Imputation replaced missing, invalid or inconsistent responses with plausible values. When carried out properly, imputation can improve data quality by replacing non-responses with plausible responses that are similar to the responses the respondents would have given if they had answered the questions. Imputation also has the advantage of producing a complete dataset.
The nearest-neighbour donor method, which is widely used to treat item non-response, was used to impute census data. It replaces missing, invalid or inconsistent information about one respondent with values from a similar respondent (the donor). The rules for identifying the respondent most similar to the non-respondent may vary depending on the variables to be imputed. Unlike many other imputation techniques, donor imputation generally does not alter the distribution of the data, and nearest-neighbour imputation ensures data consistency (see Chapter 8 of the Guide to the Census of Population, 2016, Catalogue no. 98-304-X).
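As an illustration only, nearest-neighbour donor imputation can be sketched as follows. The sketch assumes a simple distance (the number of matching variables on which two records disagree) and takes every missing value from the single closest complete donor. The census system's actual distance rules are not given in this report, and the variable names and values are hypothetical.

```python
def distance(recipient, donor, match_vars):
    """Count the matching variables on which the two records disagree."""
    return sum(recipient[v] != donor[v] for v in match_vars)


def impute_from_nearest_donor(recipient, donors, match_vars, target_vars):
    """Fill all missing target variables from the single closest donor."""
    # Only records with complete answers on the target variables can donate.
    complete = [d for d in donors if all(d[v] is not None for v in target_vars)]
    if not complete:
        raise ValueError("no complete donor available")
    nearest = min(complete, key=lambda d: distance(recipient, d, match_vars))
    imputed = dict(recipient)  # leave the original record untouched
    for v in target_vars:
        if imputed[v] is None:
            imputed[v] = nearest[v]
    return imputed


# Hypothetical records: 'age_group' and 'region' are matching variables,
# 'group' and 'status' are the variables to impute.
donors = [
    {"age_group": "25-34", "region": "A", "group": "First Nations", "status": "Registered"},
    {"age_group": "25-34", "region": "B", "group": "Métis", "status": "Not registered"},
    {"age_group": "65+", "region": "B", "group": None, "status": "Not registered"},
]
recipient = {"age_group": "25-34", "region": "B", "group": None, "status": None}

result = impute_from_nearest_donor(
    recipient, donors, ["age_group", "region"], ["group", "status"]
)
```

Because both missing values come from the same donor, the imputed record carries a combination of answers that a real respondent actually gave.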
In 2016, the following variables were processed together, and interrelations between them were clearly defined in advance: immigration, citizenship, place of birth, ethnic origin or Aboriginal ancestry, population group or visible minority, Aboriginal group, Registered or Treaty Indian status, and Membership in a First Nation or Indian band. As much as possible, donor imputation for missing information within these variables was done with one donor for all variables.
All people who required imputation and who were not census family (CF) children had their missing data imputed using a single donor who was also not a CF child. The imputed information for CF children could be from a person within the same CF, from a person outside the CF, or from deterministic rules, in that order of preference. As a result, the imputed records were internally consistent and based on actual full responses, rather than multiple-donor responses that might have resulted in inconsistent information.
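The order of preference for CF children can be sketched as a simple fallback chain. This is a simplification under stated assumptions: it takes the first available value at each level rather than selecting a nearest donor, and `impute_child`, its parameters and the example values are all hypothetical.

```python
def impute_child(child, family_members, outside_donors, deterministic_rule, var):
    """Return (value, source), trying each source in order of preference."""
    # 1. A person within the same census family.
    for member in family_members:
        if member.get(var) is not None:
            return member[var], "same census family"
    # 2. A donor from outside the census family.
    for donor in outside_donors:
        if donor.get(var) is not None:
            return donor[var], "outside donor"
    # 3. A deterministic rule as the last resort.
    return deterministic_rule(child), "deterministic rule"


child = {"id": 1, "status": None}
family = [{"status": None}, {"status": "X"}]
outside = [{"status": "Y"}]


def rule(record):
    """Placeholder deterministic rule that always returns the same value."""
    return "Z"


from_family = impute_child(child, family, outside, rule, "status")
from_outside = impute_child(child, [{"status": None}], outside, rule, "status")
from_rule = impute_child(child, [], [], rule, "status")
```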
Item non-response and invalid response rates for the Aboriginal variables (Aboriginal group, Registered or Treaty Indian status, and Membership in a First Nation or Indian band) were low. The corresponding imputation rates were therefore also low (see Table 2), and imputation had little overall impact on data quality.
|Province||Aboriginal group||Registered or Treaty Indian status||Membership in a First Nation or Indian band|
|Newfoundland and Labrador||2.2||2.3||3.3|
|Prince Edward Island||1.7||1.9||2.6|
|Source: Statistics Canada, Census of Population, 2016.|
Weighting was done to ensure that results were representative of the entire population. Each sampled household was given a sample weight equal to the inverse of its probability of selection in the sample. In canvasser-enumerated collection units (CUs), where enumerators conducted personal interviews and 100% of the households were asked to complete a long-form questionnaire, this weight was 1. In non-canvasser CUs, this weight was generally 4.
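The relationship between selection probability and sample weight can be shown with a short sketch. The selection probability of 0.25 for non-canvasser CUs is an assumption inferred from the stated weight of 4, and the function names and household records are hypothetical.

```python
def sample_weight(selection_probability):
    """Sample weight is the inverse of the probability of selection."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1 / selection_probability


def weighted_total(households):
    """Estimate a population total by summing weight times count."""
    return sum(h["weight"] * h["persons"] for h in households)


canvasser_weight = sample_weight(1.0)       # every household selected
non_canvasser_weight = sample_weight(0.25)  # assumed one-in-four sample

# Hypothetical responding households.
households = [
    {"weight": canvasser_weight, "persons": 3},
    {"weight": non_canvasser_weight, "persons": 2},
]
estimate = weighted_total(households)
```

A household with weight 4 stands in for itself and three unsampled households, which is why its two persons contribute eight to the estimate. Final census weights in non-canvasser CUs also reflect further adjustments that this sketch does not cover.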
In the canvasser-enumerated CUs, non-response to the long-form questionnaire was accounted for by imputation. Data for households that did not respond to any questions were imputed using data from a respondent household in the same type of CU. All private households in these CUs that were not part of incompletely enumerated Indian reserves and establishments kept their sample weight of 1 for estimation purposes. Other private households and collective households were assigned a final weight of zero, and therefore did not contribute to the estimates.
In the non-canvasser CUs, several adjustments were made to the weight, and a different imputation method was used.
Further information regarding weighting can be found in Chapter 9 of the Guide to the Census of Population, 2016.