Recently, a group of medical students and health professionals from the Penn State College of Medicine agreed to work with a community leader in San Pablo, Ecuador to improve population health. The first step in the collaboration was an assessment of the current health of the population. Since public sanitation was minimal in this rural area, there was particular concern about intestinal diseases. The objective of the assessment was "to establish a population-based estimate of the prevalence of selected health conditions, including diarrhea and respiratory illness, and assess water sources and sanitation for households in San Pablo."
How would this be accomplished? There was little information available from public health (surveillance) records for this area. The team decided to conduct door-to-door, in-person interviews in the local language (Spanish). Their target respondent was an adult who was knowledgeable about the health of all the residents in the house and about selected household characteristics. What questions should be asked? How many households should be surveyed? How would households be selected for the survey?
We'll learn more about this particular experience and how such questions may be addressed as we study this week's lesson. When you have completed this lesson, you will be able to do the following:
An epidemiologic survey consists of simultaneous assessment of the health outcome and exposures as well as potential confounders and effect modifiers. A survey is considered a cross-sectional study. Some epidemiologists may call it a prevalence study. The survey results provide a 'snapshot' of a population. Surveys are a useful tool for gauging the health of a population or for monitoring the effectiveness of a preventive intervention or the provision of emergency relief.
While a survey may provide a relatively quick and inexpensive method for assessing the health of a population, there are drawbacks as noted in Table 1 below:
Table 1: Advantages and Disadvantages of Surveys
|Advantages||Disadvantages|
|Inexpensive||Exposure may not have preceded disease or outcome. This limits the assessment of causality. For example, a survey may ask about the current behavior of smoking and a diagnosis of asthma. While the results may show an association between smoking and asthma, we may not be able to accurately determine which came first.|
|Relatively quick||Diseases and health outcomes with long duration can be over-represented.|
|Can help establish or clarify a hypothesis||Less severe outcomes may be over-represented because they may not have been diagnosed at the time of the survey.|
| ||Surveys are subject to information bias (e.g., from inaccurate recall or misdiagnosis) and selection bias (e.g., those without a telephone cannot be selected for a random-digit-dial survey).|
Some considerations in the design of survey sampling
Even though this is not a course on surveys, you should be aware of some approaches to drawing a sample for an epidemiologic survey. First, if the population can be enumerated (listed), a simple random sampling approach can be used to draw a representative sample of potential participants. For example, you might generate a list of all children attending a public school and then, from this list, randomly select students for the survey. Simple random sampling can be carried out in many software packages, including Excel. The use of simple random sampling allows us to generalize the results of the survey back to the population from which the sample was drawn.
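As a minimal sketch of the school example, simple random sampling from an enumerated list might look like the following in Python. The list of student IDs, the sample size, and the function name `simple_random_sample` are all hypothetical and chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; each unit is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical sampling frame: an enumerated list of 500 student IDs.
students = [f"student_{i:03d}" for i in range(1, 501)]

# Select 50 students for the survey.
selected = simple_random_sample(students, n=50, seed=2024)
print(len(selected), "students selected; first few:", selected[:3])
```

Recording the seed alongside the sampling frame lets someone else reproduce exactly which units were selected.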
Sometimes, we want to make sure that there is an adequate number of responses from a group that is relatively small. To do that, we might use stratified random sampling, which divides the population into homogeneous groups (strata). Then we can draw simple random samples from each of the groups. Stratified sampling assures that selected subgroups of the population will be represented in the sample. If the strata are homogeneous, statistical precision from stratified sampling is greater than that achieved with simple random sampling. Stratified samples can be proportionate (or disproportionate) to the size of the stratum. If sampling is disproportionate, overall population estimates are constructed by weighting the within-group estimates to account for each stratum's sampling fraction. Cluster sampling is a specific type of stratified sampling, and often refers to sampling from geographic areas. A cluster might be a zip code area in the US or streets within a city.
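The sketch below illustrates disproportionate stratified sampling and why weighting matters when the strata are then combined. The two strata, their sizes, the sample allocation, and the within-stratum prevalence values are hypothetical numbers, not data from any survey discussed here.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of the requested size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata.items()}

def weighted_estimate(stratum_estimates, stratum_sizes):
    """Combine within-stratum estimates, weighting each by its population share."""
    total = sum(stratum_sizes.values())
    return sum(stratum_estimates[name] * stratum_sizes[name] / total
               for name in stratum_estimates)

# Hypothetical frame: 900 units in a large stratum, 100 in a small one.
strata = {
    "large": [f"L{i}" for i in range(900)],
    "small": [f"S{i}" for i in range(100)],
}
# Disproportionate allocation: 50 from each, so the small stratum is oversampled.
sample = stratified_sample(strata, {"large": 50, "small": 50}, seed=7)

# Hypothetical within-stratum prevalence estimates obtained from that sample.
estimates = {"large": 0.10, "small": 0.30}
sizes = {name: len(units) for name, units in strata.items()}

unweighted = sum(estimates.values()) / len(estimates)  # treats strata as equal-sized
weighted = weighted_estimate(estimates, sizes)         # reflects the population mix
print(round(unweighted, 2), round(weighted, 2))
```

With these made-up inputs the unweighted figure overstates the population prevalence, because the oversampled small stratum has the higher prevalence.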
Systematic sampling occurs when we select our sample in a systematic manner. For example, you might select every 10th house on a street to participate in a household survey. Systematic sampling can be easier to implement than simple random sampling and may represent the population as well as a simple random sample. However, if every rth unit corresponds to an existing sequence in the population, with the result that each member of the sample is selected from the same part of the recurring pattern, the sample will be biased. For example, if an observation is made every seventh day, beginning on a Monday, the entire sample will represent only Monday experiences.
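A short sketch can illustrate both the every-10th-house example and the Monday problem. The street of 100 houses, the 10-week calendar, and the helper name `systematic_sample` are assumptions made only for this illustration.

```python
import random

def systematic_sample(units, k, start=None, seed=None):
    """Select every k-th unit, beginning at `start` (random in [0, k) if not given)."""
    if start is None:
        start = random.Random(seed).randrange(k)
    return units[start::k]

# Hypothetical street of 100 houses, numbered in walking order.
houses = list(range(1, 101))
house_sample = systematic_sample(houses, k=10, seed=1)

# Periodicity problem: 10 weeks of daily observations, sampled every 7th day
# starting on the first Monday.
weekdays = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
calendar = [weekdays[i % 7] for i in range(70)]
monday_only = systematic_sample(calendar, k=7, start=0)

print(house_sample)
print(set(monday_only))
```

Choosing the starting point at random gives every house the same chance of selection, but it does not protect against a sampling interval that lines up with a recurring pattern in the list.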
Multi-stage sampling occurs when a combination of sampling methods is used.
Finally, there are several types of samples that may be used but may produce biased population estimates. First, we may choose a convenience sample, such as asking people on a street corner or in a store to participate in a survey. The convenience sample may be useful in gathering preliminary or pilot data for a future survey that would be larger and have more rigorous sampling methods. Second, you may choose purposive sampling because you are particularly interested in the responses of a specific group. Each of these approaches is useful, but to what population can the results be generalized?
In the example above, to what population can the results be generalized?
It is not clear what population the respondents will represent. Perhaps the sample will represent those individuals in the study area who are healthy enough to travel and motivated to report on health conditions in their household or village. Unknown biases are a problem with convenience samples. Suppose a researcher invites community midwives to a training session where he will also assess maternal and infant health in their villages from their responses to a survey. This would be a purposive sample. A purposive sample can produce results representing the targeted group, but will over-represent those in the population who are readily available.
How did the researchers decide to sample the village of San Pablo?
We were concerned that we might not have enough time in Ecuador to adequately survey all neighborhoods of San Pablo. So, we used Google Earth and took a preliminary walking tour in order to divide the community into four sectors (strata) of approximately equal size (number of households). We then rotated our days of surveying among these sectors. This assured that we had approximately an equal amount of time for surveying each sector, which would be important if the sectors were substantially different (e.g., different type of water supply). As it turned out, the surveys went very well and we were able to complete the survey process in each of the four sectors in San Pablo. Households in each sector were systematically sampled: every 15th house on both vertical and horizontal streets was entered into the sample. This produced a population-based estimate of the health and exposure, both self-reported, of San Pablo.
We were also interested in a neighboring community, Rio Guayas. Rio Guayas had fewer households and was a planned community, substantially different from San Pablo. The houses were newer and built of cinder block. The water was centralized. The population was younger. We sampled every 5th household in Rio Guayas on both horizontal and vertical streets. This is an example of a survey where a choice was made to sample different proportions in different strata (Rio Guayas vs. the 4 sectors of San Pablo).
Survey questions and administration:
Survey questions are carefully structured in order to reduce bias. Care should be given to the wording and order of questions. Using a standard questionnaire increases the reliability and validity of the results. A reliable survey has internal consistency and produces results that are replicable: the subject would answer the question in the same way if asked again. Valid questions are those which accurately assess the specific concept that is being measured.
The process of administering a survey should be standardized to reduce the potential for bias. The respondent should be informed of the purpose of the research and freely consent to participate. A survey with a low response rate is likely to have some bias.
Here are examples of research assessing the validity and reliability of a survey instrument, the Behavioral Risk Factor Surveillance System: https://www.cdc.gov/brfss/publications/data_qvr.htm 
(Statistics 507 is a survey course in epidemiologic research methods, so we will not delve into the strengths and weaknesses of various methods for evaluating the reliability and validity of a survey instrument as might be presented in a psychometric course. You should, however, recognize the need to consider this type of analysis when selecting a survey instrument.)
In San Pablo, verbal informed consent was obtained from the potential respondent before administering the survey. The respondent was frequently the head of the household. The survey consisted of two components, a household component and an individual component. Questions were both closed- and open-ended. The household component was a census of all persons residing in the household as well as questions about the water supply and sanitation for the household and utilization of medical care by household members. A water sample was also collected from selected households. For the individual component, questions were directed toward the education, employment (adults) and health of each person in the household. Both components were adapted from UNICEF surveys to increase reliability and validity. The survey instrument used in San Pablo (English version) is here.
Data from a survey that includes indicators of the presence or absence of disease and a risk factor can be summarized as shown below:
Table 2: 2 × 2 Table for an Epidemiologic Cross-Sectional Study
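| ||Cases||Non-cases||Total|
|Exposed||A||B||Total Exposed (A+B)|
|Not Exposed||C||D||Total Not Exposed (C+D)|
|Total||Total Cases (A+C)||Total Non-cases (B+D)||Total (A+B+C+D)|
Measures of Disease Frequency
|Prevalence of disease||(A+C) / (A+B+C+D)|
|Prevalence of exposure||(A+B) / (A+B+C+D)|
Measures of Association
|Prevalence ratio of exposure||[A/(A+C)] / [B/(B+D)]|
|Prevalence ratio of disease||[A/(A+B)] / [C/(C+D)]|
|Prevalence difference of exposure||[A/(A+C)] - [B/(B+D)]|
|Prevalence difference of disease||[A/(A+B)] - [C/(C+D)]|
|Prevalence odds ratio||[A/C] / [B/D] = [A*D] / [B*C]|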
Logistic regression would be an appropriate statistical method because the outcome is binary (case, non-case). The important point is that since this table shows data from a survey, it provides only a 'snapshot' of the situation. Time-to-event methods of analysis would not be applicable to such data.
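To make the measures of association concrete, here is a minimal Python sketch that computes them from the four cell counts. It assumes the conventional layout in which A = exposed cases, B = exposed non-cases, C = unexposed cases, and D = unexposed non-cases. The function name and the example counts are hypothetical.

```python
def cross_sectional_measures(a, b, c, d):
    """Measures of association from a 2 x 2 table.

    Assumed layout: a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    prev_disease_exposed = a / (a + b)     # A/(A+B)
    prev_disease_unexposed = c / (c + d)   # C/(C+D)
    prev_exposure_cases = a / (a + c)      # A/(A+C)
    prev_exposure_noncases = b / (b + d)   # B/(B+D)
    return {
        "prevalence_ratio_disease": prev_disease_exposed / prev_disease_unexposed,
        "prevalence_difference_disease": prev_disease_exposed - prev_disease_unexposed,
        "prevalence_ratio_exposure": prev_exposure_cases / prev_exposure_noncases,
        "prevalence_difference_exposure": prev_exposure_cases - prev_exposure_noncases,
        "prevalence_odds_ratio": (a * d) / (b * c),  # [A/C] / [B/D]
    }

# Hypothetical counts, for illustration only.
measures = cross_sectional_measures(a=30, b=70, c=10, d=90)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

The odds ratio is the same whether it is computed as the odds of exposure among cases versus non-cases or as the odds of disease among the exposed versus the unexposed, which is why a single formula covers both readings.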
Summarizing the data from San Pablo:
In San Pablo, 75 households were entered into the sample. Of these, 62 (82.7%) completed a household survey; for 11, no one was at home or the house was abandoned. Only 2 households refused to participate in the survey. Thirty-seven (59.7%) of responding households used piped water and 25 (40.3%) used bottled water; 51 (82.3%) of households subsequently boiled water that was to be used for drinking purposes. Water used for cooking and washing purposes was typically not boiled or treated additionally. Limited money was frequently reported as a condition which prevented residents from seeking medical care. Perhaps because of the large percentage of households that boiled drinking water, there was a generally low prevalence of acute and chronic diarrheal disease. Respiratory conditions, however, were more prevalent than anticipated.
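The percentages reported above can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the helper name `percent` is an illustrative choice. Note that the response rate uses all 75 sampled households as its denominator, while the water-supply percentages use the 62 responding households.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

sampled, completed, not_home_or_abandoned, refused = 75, 62, 11, 2

# Every sampled household falls into exactly one disposition category.
assert completed + not_home_or_abandoned + refused == sampled

response_rate = percent(completed, sampled)   # 82.7
piped = percent(37, completed)                # 59.7
bottled = percent(25, completed)              # 40.3
boiled = percent(51, completed)               # 82.3

print(response_rate, piped, bottled, boiled)
```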
Might there be other reasons the survey results indicate a lower than expected prevalence of diarrheal disease?
In the US, governmental agencies conduct surveys for various purposes at regular intervals. Investigate these surveys by following the links below, then complete Table 3.
1. Behavioral Risk Factor Surveillance System (BRFSS) https://www.cdc.gov/BRFSS/ 
2. Youth Risk Behavior Surveillance System (YRBSS) https://www.cdc.gov/HealthyYouth/yrbs/index.htm 
3. National Health Interview Survey (NHIS) https://www.cdc.gov/nchs/nhis.htm 
4. National Health and Nutrition Examination Survey (NHANES) https://www.cdc.gov/nchs/nhanes.htm 
5. California Health Interview Survey (CHIS) https://healthpolicy.ucla.edu/chis/about/Pages/about.aspx 
Table 3: Comparison of US Health Surveys
|Target Population||Mode/Sampling Strategy/Size||Health Issues; Example of a Disease/Outcome and an Exposure||Notes|
You have finished the reading for Week 5. Table 3 above, completed, is part of your homework. (A blank Word table is part of the Homework 3 file, or you may create your own table.)
Check for the remaining homework problems in the Week 5 Homework 3 folder.