A basic tenet of scientific experimentation is that phenomena worthy of scientific investigation are replicable. However, not everything that is replicable is worthy of investigation.  Unfortunately, in our pursuit of replicable biological phenomena, we can readily introduce replicable biases.  Thus we always have to balance good protocols, which enhance replicability by reducing variability, with enough realism so that the results are due to the biology, rather than the protocol.  

We are inevitably going to find things that are not biology. Until recently, many scientists, including me, assumed that science was self-correcting: because subsequent experiments depend on earlier results, we assumed that errors (i.e., findings due to chance or bias) would not persist. However, in recent years it has become increasingly evident that this is not the case. Despite the widespread dissemination of results via journals and the internet, careful perusal of the biological and medical literature has shown that even results published in highly regarded venues may not be biologically sound (e.g., [1]). While there are many reasons for this, one important finding is that the most reliable results come from carefully designed experiments that follow statistical principles.

If we could observe the entire population with no measurement error, we would not need carefully designed experiments. We would just measure everything, and we would then know everything about the population, at least with respect to what we measured. For example, we would know the exact mean difference in gene expression between two tissue types, although we would still need to understand the biological implications of this difference. Alternatively, if all the members of the population were identical and there were no measurement error, we could measure a single individual from the population and make inferences about the whole population.

Of course, we actually have both biological variability and measurement error and so the objective of an experiment is to use a sample to produce an accurate inference about the population.  We need to be able to deal with experimental error and biological variation without having to measure every single element of the population.
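As a rough illustration of this idea, the following Python sketch simulates a population with both biological variability and measurement error, and then estimates the population mean from a small random sample. All of the numbers (population size, mean expression, the two standard deviations, and the sample size) are assumptions chosen only for the example; they do not come from any real data set.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: true log-expression of one gene in 10,000 individuals.
BIOLOGICAL_SD = 1.0   # assumed spread between individuals
MEASUREMENT_SD = 0.5  # assumed noise added by the assay
population = [rng.gauss(8.0, BIOLOGICAL_SD) for _ in range(10_000)]
population_mean = statistics.mean(population)  # unknowable in a real study

# Draw 20 individuals at random and measure each one once, with error.
sample = rng.sample(population, 20)
measured = [value + rng.gauss(0.0, MEASUREMENT_SD) for value in sample]

sample_mean = statistics.mean(measured)
sample_sd = statistics.stdev(measured)  # reflects both sources of variation
standard_error = sample_sd / len(measured) ** 0.5

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f} (standard error {standard_error:.2f})")
```

The sample mean will not equal the population mean exactly, but the standard error gives a sense of how far off it is likely to be without measuring all 10,000 individuals.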

When we design an experiment, we need to keep in mind which population we want to make inferences about. For example, two observations from the same individual are usually technical replicates. However, if we are actually interested in spatial diversity and gene expression in a tumor, then they are no longer technical replicates, because the populations of interest are the different parts of the tumor rather than the tumor as a whole. For a discussion of the level of replication, see [2].

We also need to ensure that the samples we take are representative of the population of interest, not just of one member of the population.

This has three implications:

1) We need replication, i.e., several individuals from our population. For instance, if we are interested in a specific type of tumor and we do not have other biological specimens with the same type of tumor, then we might look at different parts of that one tumor to determine how variable they are. However, if we are comparing colon cancer tissue with normal colon tissue, then we really want samples from different individuals, because the populations we have in mind are colon tumors from all humans and normal colon tissue from all humans.

2) We will always need to estimate biological variability of the population using the samples we take.

3) Sampling should be done at random. Randomness sounds like a simple idea, but it is really difficult to explain. For example, suppose you are doing a rat study with two treatments. Suppose the rats have been housed together in a big cage, and you pull out five of them and assign them to treatment one, and then assign the next five to treatment two. Even if you think you are pulling them out "at random", you cannot be sure. The first five that you catch might be less active than the next five. Formally speaking, we should label all the rats and then take a random sample of the labels, using a random number generator on the computer or pulling the labels out of a hat (after mixing well). In cases where it is not practical to do a formal randomization by labeling, we hope that our sampling design mimics such a purely random process. When the treatment is controllable, we can at least assign the treatments at random to our sample.
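The formal randomization described above (label every rat, then draw the labels at random) can be sketched in a few lines of Python. This is only a minimal illustration: the function name `randomize_treatments`, the rat labels, and the seed are hypothetical choices made for the example, and it assumes equally sized treatment groups.

```python
import random

def randomize_treatments(labels, treatments, seed=None):
    """Randomly split labeled animals into equally sized treatment groups."""
    if len(labels) % len(treatments) != 0:
        raise ValueError("number of animals must be a multiple of the number of treatments")
    rng = random.Random(seed)   # recording the seed makes the allocation reproducible
    shuffled = list(labels)     # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)       # the computer's version of "mixing the hat well"
    group_size = len(shuffled) // len(treatments)
    assignment = {}
    for i, treatment in enumerate(treatments):
        for label in shuffled[i * group_size:(i + 1) * group_size]:
            assignment[label] = treatment
    return assignment

# Ten hypothetical rats, labeled before anyone reaches into the cage.
rat_labels = [f"rat{i:02d}" for i in range(1, 11)]
assignment = randomize_treatments(rat_labels, ["treatment 1", "treatment 2"], seed=2014)
for label in rat_labels:
    print(label, assignment[label])
```

Because the allocation depends only on the labels and the random number generator, it cannot be influenced by which rats happen to be easiest to catch.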

Of course, there are studies in which we cannot randomize. If you are studying tissue samples from mammoths, you probably take whatever you can get. If you are looking at brain tissue in humans, you will have whatever samples you can obtain from cadavers, and you will have to assume that they arrived at random from the population.

[1] Ioannidis JPA (2005) Contradicted and initially stronger effects in highly cited clinical research. JAMA 294: 218–228.

[2] Blainey P, Krzywinski M, Altman N (2014) Points of significance: replication. Nature Methods 11(9): 879–880.