Guide to the Census of Population, 2021
Chapter 12 – Sampling and weighting for the long form

For the 2021 Census Program, Canadian households are enumerated using two main types of questionnaires: the short-form questionnaire and the long-form questionnaire. The long-form questionnaire includes the same questions as the short form, as well as a set of additional questions aimed at providing a more comprehensive portrait of the Canadian population and Canadian households. The long-form questionnaire is distributed to a sample of the population.

The estimates produced from the responses to questions found on both questionnaires are obtained from a census of the population. As a result, all households contribute to a given figure, such as for the population count of a given age group.

The estimates produced from the responses to one or more questions on the long-form questionnaire are obtained from a sample survey. The respondent households from the long-form sample and the imputed non-respondent households in collection units in First Nations communities, Métis settlements, Inuit regions and other remote areas contribute to the estimate (e.g., the unemployment rate estimate or the estimate of the population by highest education level).

Selecting the sample for the census long-form questionnaire

The long-form questionnaire sample is selected from small geographic areas that cover the entire country, called collection units (CUs). CUs define the strata for the sample design. There are four types of CUs: list/leave, mail-out, mail-out with drop-off, and First Nations communities, Métis settlements, Inuit regions and other remote areas. In each CU (or stratum), a list of dwellings is drawn up, and a systematic sample of private dwellings is chosen. Collective dwellings are excluded from this draw. Households in the private dwellings selected for the sample receive the census long‑form questionnaire. Other households, i.e., those in the private dwellings that are not in the long-form sample and those in collective dwellings, which are excluded from the sample, receive the short-form questionnaire.

The sample for the long-form questionnaire is distributed uniformly among geographic areas to ensure that reliable estimates are produced for all regions across the country and to give the same relative importance to all geographic units of a given population size. As in 2016, one in four dwellings was selected to form the sample of the 2021 Census long-form questionnaire. There is one exception for the sampling fraction: All dwellings in CUs of First Nations communities, Métis settlements, Inuit regions and other remote areas were selected in the long-form questionnaire sample.
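The selection described above can be illustrated with a minimal sketch of systematic sampling within a single CU. It assumes a random start followed by every fourth dwelling on the list. The function name, the dwelling labels and the CU size are hypothetical and are used only for illustration; the actual selection procedure may differ in its details.

```python
import random

def systematic_sample(dwellings, interval=4, rng=None):
    """Select every `interval`-th dwelling after a random start."""
    rng = rng or random.Random()
    start = rng.randrange(interval)  # random start in 0..interval-1
    return dwellings[start::interval]

# Hypothetical CU with 20 private dwellings (collective dwellings already excluded).
cu_dwellings = [f"dwelling_{i:02d}" for i in range(20)]

# One-in-four sample: these dwellings receive the long form.
long_form = systematic_sample(cu_dwellings, interval=4, rng=random.Random(2021))
selected = set(long_form)
short_form = [d for d in cu_dwellings if d not in selected]

# In CUs where all dwellings are selected, the interval is 1.
take_all = systematic_sample(cu_dwellings, interval=1)
```

With an interval of 1, the random start is always the first dwelling, so the whole list is selected, which corresponds to the exception for First Nations communities, Métis settlements, Inuit regions and other remote areas.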

Weighting the sample of the census long-form questionnaire

The estimates produced from the final responses to the long-form questionnaire are weighted to represent the Canadian population living in private dwellings. Weighting consists of calculating the sample weight and then applying a series of adjustments that lead to the final weight. These adjustments include a correction for the coverage of occupied dwellings based on the results of the Dwelling Classification Survey (DCS), an adjustment to correct for total non-response of sampled households, and a calibration of the weights of respondent households to the totals derived from the census.

First, each household in the sample is given a sample weight. A household’s sample weight is the inverse of its probability of being selected for the sample. In CUs in First Nations communities, Métis settlements, Inuit regions and other remote areas, this weight is equal to 1, and in other types of CUs, it is equal to 4.
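As a small illustration of the inverse-probability rule, the sketch below computes the sample weight from the selection probability. The dictionary keys and the function name are hypothetical labels, not terms used in census processing.

```python
from fractions import Fraction

# Selection probabilities stated in this chapter; the keys are hypothetical labels.
SELECTION_PROBABILITY = {
    "remote": Fraction(1, 1),  # all dwellings selected
    "other": Fraction(1, 4),   # one in four dwellings selected
}

def sample_weight(cu_type):
    """Sample weight = inverse of the probability of selection."""
    return 1 / SELECTION_PROBABILITY[cu_type]

# Five sampled households in a one-in-four CU represent 20 households.
represented = 5 * sample_weight("other")
```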

The first two weight adjustments are related to the concept of total non-response. Households that responded to at least one question specific to the long-form questionnaire are “respondent households” for the long-form questionnaire. Selected households that responded only to questions common to both types of questionnaires and households that did not respond to any questions are defined as “non-respondent households” for the long-form questionnaire.
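The definition above can be expressed as a simple rule. The sketch below is illustrative only: the question names and the representation of a missing answer as None are assumptions, not the actual census data layout.

```python
# Hypothetical question identifiers; "age" stands for a question common to both forms.
LONG_FORM_ONLY = ("education", "labour_status")

def is_long_form_respondent(answers, long_form_only=LONG_FORM_ONLY):
    """True if at least one question specific to the long form was answered."""
    return any(answers.get(q) is not None for q in long_form_only)

answered_one_long_form_item = {"age": 42, "education": "college", "labour_status": None}
answered_common_items_only = {"age": 42, "education": None, "labour_status": None}
answered_nothing = {}
```

In this sketch, only the first household counts as a respondent household for the long-form questionnaire; the other two are non-respondent households.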

In the CUs of First Nations communities, Métis settlements, Inuit regions and other remote areas, total non-response to the long-form questionnaire is compensated for through imputation. Data for households that did not respond to any questions are imputed using data from a respondent household. All private households in these CUs keep their sampling weight of 1 for estimation purposes.

In some CUs of First Nations communities, Métis settlements, Inuit regions and other remote areas, it is sometimes impossible to finish listing dwellings. These CUs are considered to be incompletely enumerated. The extent of non‑response is therefore unknown, and the population cannot be adequately represented statistically. Both the counts obtained from the short-form questionnaire and the estimates derived from the long-form questionnaire exclude populations living on incompletely enumerated reserves and settlements.

In other types of CUs, reweighting is used to handle total non-response to the long-form questionnaire. To do this, several adjustments are made to the sampling weights. Only households that responded to the long-form questionnaire have a non-zero weight at the end of the weighting stages, meaning that they are the only ones contributing to the long-form questionnaire estimates.

Before carrying out imputation for total non-response in the census, undercoverage of dwellings that were occupied at the time of the census is estimated using the DCS and corrected by changing the occupancy status of certain dwellings. In fact, one source of coverage error in the census is certain dwellings being incorrectly classified on Census Day. This error can occur when an occupied dwelling is classified as unoccupied or when an unoccupied dwelling is classified as occupied. The purpose of the DCS is to estimate the number of these classification errors. To this end, a sample of private dwellings for which no census questionnaire was returned is contacted, and information is collected on their occupancy status on Census Day and—if the dwelling was occupied—on the number of usual residents. The DCS results guide the imputation for total non-response and census undercoverage.

Sampling weights are then adjusted in three steps. All these weight adjustments are done by calibration. Calibration consists of applying the smallest adjustment possible to the weight so that the weighted estimates correspond to the known counts. These steps are performed independently in each super aggregate dissemination area (SADA). A SADA is a group of aggregate dissemination areas (ADAs) created in order to reach a population of between 50,000 and 150,000.
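To make the idea of calibration concrete, the sketch below uses raking (iterative proportional fitting), one common calibration technique. This chapter does not state which calibration method or distance function is used for the census, so the choice of raking is an assumption, as are the variable names, categories and known totals.

```python
def rake(households, margins, max_iter=100, tol=1e-9):
    """Adjust weights until weighted counts match the known totals of each margin.

    households: list of dicts with a 'weight' and one category per variable.
    margins: {variable: {category: known_total}}.
    """
    weights = [h["weight"] for h in households]
    for _ in range(max_iter):
        max_change = 0.0
        for var, targets in margins.items():
            # Current weighted total for each category of this variable.
            current = {category: 0.0 for category in targets}
            for h, w in zip(households, weights):
                current[h[var]] += w
            # Scale each weight by the ratio of known total to current total.
            for i, h in enumerate(households):
                factor = targets[h[var]] / current[h[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return weights

# Hypothetical respondent households, each starting with a weight of 4.
households = [
    {"weight": 4.0, "size": "1", "tenure": "own"},
    {"weight": 4.0, "size": "1", "tenure": "rent"},
    {"weight": 4.0, "size": "2+", "tenure": "own"},
    {"weight": 4.0, "size": "2+", "tenure": "rent"},
]
# Hypothetical known counts for the area.
known = {"size": {"1": 6.0, "2+": 12.0}, "tenure": {"own": 10.0, "rent": 8.0}}
calibrated = rake(households, known)

def weighted_count(var, category):
    return sum(w for h, w in zip(households, calibrated) if h[var] == category)
```

After calibration, the weighted count for each category matches the corresponding known total, while each weight has been changed only by the ratio factors needed to reach that agreement.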

An initial calibration is done to ensure that coverage of the long-form sample is the same as coverage of private dwellings in the census. A second calibration is done, which takes into account a logistic regression model for the propensity to respond, so that the weights of the sampled non-respondent dwellings are redistributed to the respondent dwellings. For these two steps, all the potential calibration constraints are identical. The constraints are selected so that the model in the second step explains the non‑response. This is designed to reduce potential bias due to non-response. Finally, a third calibration is performed. This one takes into account a much more detailed set of potential constraints. The purpose of this step is to improve consistency between the estimates from the sample and the known census counts and to reduce the variability of the estimates derived from the long-form questionnaire.
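The redistribution of non-respondent weights in the second step can be illustrated with a deliberately simplified sketch. Instead of the logistic regression model and calibration described above, it uses a basic weighting-class adjustment: within each class, the weight of non-respondents is transferred proportionally to respondents. The class labels and field names are hypothetical, and the sketch assumes every class contains at least one respondent.

```python
from collections import defaultdict

def redistribute_nonresponse(households):
    """Move the weight of non-respondents to respondents in the same class."""
    total = defaultdict(float)      # weight of all sampled households, by class
    responded = defaultdict(float)  # weight of respondent households, by class
    for h in households:
        total[h["cls"]] += h["weight"]
        if h["respondent"]:
            responded[h["cls"]] += h["weight"]
    adjusted = []
    for h in households:
        if h["respondent"]:
            # Assumes responded[cls] > 0, i.e., at least one respondent per class.
            adjusted.append(h["weight"] * total[h["cls"]] / responded[h["cls"]])
        else:
            adjusted.append(0.0)  # non-respondents no longer contribute
    return adjusted

# Hypothetical sample: class "a" has one non-respondent, class "b" has none.
sampled = [
    {"cls": "a", "weight": 4.0, "respondent": True},
    {"cls": "a", "weight": 4.0, "respondent": True},
    {"cls": "a", "weight": 4.0, "respondent": True},
    {"cls": "a", "weight": 4.0, "respondent": False},
    {"cls": "b", "weight": 4.0, "respondent": True},
    {"cls": "b", "weight": 4.0, "respondent": True},
]
adjusted = redistribute_nonresponse(sampled)
```

The total weight is preserved, and only respondent households end up with a non-zero weight, which is the property the reweighting approach relies on.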

The weighted estimates from the long-form questionnaire may differ from census counts for common characteristics. In particular, this occurs when looking at a geographic area with boundaries that do not correspond to ADAs and SADAs. The smaller the geographic area, the more likely that estimates from the long-form questionnaire will differ from the census counts. When there are differences, users should consider the 2021 Census counts to be of better quality and prioritize them, as they are not affected by the sampling variance or the slightly higher non-response error of the long-form questionnaire. Estimates from the long-form questionnaire for characteristics found in both forms should be used as contextual information when analyzing data specific to this questionnaire.

As was the case in 2016, the variability of the long-form questionnaire estimates is estimated using a replication method. All adjustments described in this chapter are also applied to the replicate weights used for variance estimation.
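The general shape of a replication-based variance estimate can be sketched as follows. This chapter does not specify which replication method is used, so the scaling factor below (one over the number of replicates) is an assumption; the appropriate factor depends on the method. The data values and replicate weights are hypothetical.

```python
def weighted_total(values, weights):
    """Weighted estimate of a total."""
    return sum(v * w for v, w in zip(values, weights))

def replicate_variance(values, final_weights, replicate_weights, scale=None):
    """Variance from the spread of replicate estimates around the full estimate."""
    theta = weighted_total(values, final_weights)
    replicate_estimates = [weighted_total(values, rw) for rw in replicate_weights]
    if scale is None:
        # Assumed scaling factor; it depends on the replication method used.
        scale = 1.0 / len(replicate_estimates)
    return scale * sum((r - theta) ** 2 for r in replicate_estimates)

# Hypothetical indicator (1 = has the characteristic) for three households.
values = [1, 0, 1]
final_weights = [4.0, 4.0, 4.0]
replicate_weights = [
    [5.0, 3.0, 4.0],
    [3.0, 5.0, 4.0],
    [4.0, 4.0, 4.0],
]
variance = replicate_variance(values, final_weights, replicate_weights)
```

Because each set of replicate weights goes through the same adjustments as the final weights, the spread of the replicate estimates reflects the variability introduced by those adjustments as well as by sampling.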

More information on the weighting and estimation process will be provided in the Sampling and Weighting Technical Report, Census of Population, 2021, Catalogue no. 98-306-X.

