# 3.1 - Statistical Jargon

Let's start with some statistical terminology.

A sample is a set of observed items from the population. Sometimes the individual items are also called samples. For statistical inference we generally need information about how variable these items are. We may also need to know about measurement error, because no measuring instrument is exact. This is especially true for measurements such as protein and nucleic acid quantification, or identification of molecular components, where two measurements on the same item will not yield exactly the same value.

Measurement error can be estimated by taking technical replicates, i.e., repeated measurements on the same sample. If these measurements vary, then there is measurement error. However, in most cases knowing a lot about measurement error, or putting a lot of effort into improving the precision of the measurement on each item, does not help much, because we are not interested in comparing a single item to another single item. We are interested in making statements about populations, such as genotypes or groups before and after exposure. The variation among items within a population remains no matter how precisely each item is measured. For inferences about biological populations, technical replicates are typically averaged to reduce the measurement error.
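To see why averaging technical replicates helps only up to a point, here is a minimal simulation sketch. It assumes (for illustration only) that each item's true value is drawn from a normal distribution with biological standard deviation `bio_sd`, and that each technical replicate adds independent normal measurement error with standard deviation `tech_sd`. Under that assumed model, the variance of an item's averaged value is `bio_sd**2 + tech_sd**2 / n_tech`: more replicates shrink the measurement-error term, but the biological term does not go away. The function name and parameter values are hypothetical.

```python
import random
import statistics


def simulate_item_means(n_items, n_tech, bio_sd, tech_sd, mu=10.0, seed=1):
    """Return one averaged value per biological item (hypothetical helper).

    Assumed model: each item's true value ~ Normal(mu, bio_sd), and each
    technical replicate adds independent Normal(0, tech_sd) measurement error.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_items):
        true_value = rng.gauss(mu, bio_sd)
        # Repeated measurements on the same item differ only by measurement error.
        replicates = [true_value + rng.gauss(0.0, tech_sd) for _ in range(n_tech)]
        # Average the technical replicates to get one value per item.
        means.append(statistics.fmean(replicates))
    return means


bio_sd, tech_sd = 2.0, 1.0
for n_tech in (1, 4, 16):
    means = simulate_item_means(5000, n_tech, bio_sd, tech_sd)
    observed = statistics.variance(means)
    expected = bio_sd**2 + tech_sd**2 / n_tech
    print(f"{n_tech:>2} technical replicates: variance {observed:.2f} (theory {expected:.2f})")
```

With these assumed values, the theoretical variance falls from 5 to about 4.06 as replicates go from 1 to 16, and it can never drop below `bio_sd**2 = 4`, the variation among items that matters for population-level inference.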

A comparative experiment has several populations that are compared. For example, if we have two genotypes at two levels of exposure to a pathogen, then we have four populations, defined by the four combinations. Each population is considered a "treatment group", even if one group is a control. The defining variables, for example "genotype" and "pathogen level", are called the factors. The values of each factor are called the levels: for example, "wild type" and "mutant" might be the levels of genotype, and "low" and "high" might be the levels of pathogen. The combinations of levels are called the treatments, e.g., wild type at low pathogen level. Factors may be categorical, like genotype, or quantitative, like pathogen level. They can also be controllable, like pathogen level, which means that individuals can be assigned at random to its levels, or intrinsic, like genotype, in which case individuals cannot be assigned to levels.
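Because the treatments are exactly the combinations of factor levels, they can be enumerated as a Cartesian product. The sketch below uses the genotype and pathogen-level example from the text; the `treatments` helper is a hypothetical name chosen for illustration.

```python
from itertools import product

# Factors and their levels, from the two-genotype, two-pathogen-level example.
factors = {
    "genotype": ["wild type", "mutant"],
    "pathogen level": ["low", "high"],
}


def treatments(factors):
    """Return every combination of factor levels, one dict per treatment."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


for t in treatments(factors):
    print(t)
```

This prints the four treatments (wild type/low, wild type/high, mutant/low, mutant/high). Adding a third pathogen level would give 2 × 3 = 6 treatments, since the number of treatments is the product of the numbers of levels.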

We randomize the experiment by assigning individuals at random to the levels of controllable factors, or by selecting individuals at random from the levels of intrinsic factors.
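The two kinds of randomization can be sketched for the running example: individuals are *selected* at random from each genotype (an intrinsic factor), and the selected individuals are then *assigned* at random to pathogen levels (a controllable factor). This is a minimal sketch assuming a balanced design with equal group sizes; the helper names, the pools of individual IDs, and the group sizes are all hypothetical.

```python
import random


def assign_to_levels(individuals, levels, rng):
    """Randomly assign individuals to levels of a controllable factor (balanced).

    Assumes the number of individuals is a multiple of the number of levels.
    """
    if len(individuals) % len(levels):
        raise ValueError("number of individuals must be a multiple of the number of levels")
    shuffled = list(individuals)
    rng.shuffle(shuffled)  # random order, so each split is a random group
    per_level = len(shuffled) // len(levels)
    return {
        level: shuffled[i * per_level:(i + 1) * per_level]
        for i, level in enumerate(levels)
    }


def select_from_levels(pools, n_per_level, rng):
    """Randomly select individuals from each level of an intrinsic factor."""
    return {level: rng.sample(pool, n_per_level) for level, pool in pools.items()}


rng = random.Random(2024)
# Hypothetical pools of available individuals for each genotype.
pools = {
    "wild type": [f"WT-{i}" for i in range(1, 21)],
    "mutant": [f"MUT-{i}" for i in range(1, 21)],
}
selected = select_from_levels(pools, 6, rng)
design = {
    genotype: assign_to_levels(chosen, ["low", "high"], rng)
    for genotype, chosen in selected.items()
}
for genotype, groups in design.items():
    for level, ids in groups.items():
        print(genotype, level, ids)
```

Genotype cannot be assigned, so randomness enters only through which individuals are drawn from each genotype; pathogen level can be assigned, so each selected individual has the same chance of landing in either level.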