# 1.1 - The Working Hypothesis

Under the scientific method, before any statistical analysis can be conducted, a researcher must generate a guess, or hypothesis, about what is going on. The process begins with a *Working Hypothesis*, which is a direct statement of the research idea. For example, a plant biologist may think that plant height is affected by applying different fertilizers, and might say: "Plants with different fertilizers will grow to different heights". But according to the Popperian Principle of Falsification, we can't conclusively affirm a hypothesis; we can only conclusively negate one. So we need to translate the working hypothesis into a framework in which we state a null hypothesis: the average height (or mean height) is the same for plants under all of the fertilizers. The alternative hypothesis (which the biologist hopes to show) is that the means are not all equal, that is, at least one of the fertilizer treatments has produced plants with a different mean height. The strength of the data will determine whether the null hypothesis can be rejected with a specified level of confidence.
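In symbols, the translation described above can be written as follows. The notation is an assumption introduced here for illustration: $\mu_i$ stands for the mean height under the $i$-th fertilizer treatment and $T$ for the number of treatments being compared.

$$
H_0: \mu_1 = \mu_2 = \cdots = \mu_T
\qquad \text{versus} \qquad
H_A: \text{not all } \mu_i \text{ are equal}
$$

Note that $H_A$ does not claim that every mean differs from every other; a single treatment with a different mean is enough to make $H_0$ false.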

Pictured in the graph below, we can imagine testing three kinds of fertilizer along with one group of plants that are untreated (the control). The plant biologist kept all the plants under controlled conditions in the greenhouse so that the fertilizer was the only thing known to differ among the plants. At the end of the experiment, the biologist measured the height of each plant. This is the dependent or response variable and is plotted on the vertical (y) axis. The biologist used a simple bar chart to compare the mean heights.

This bar chart is a customary way to show treatment (or factor) level means. In this case there was only one treatment: fertilizer. The fertilizer treatment had four levels, including the control, which received no fertilizer. This language convention is important because later on we will use ANOVA to handle multi-factor studies (for example, if the biologist manipulated the amount of water AND the type of fertilizer), and we will need to be able to refer to different treatments, each with its own set of levels.

The height of each bar in this bar graph shows the mean, and there are I-shaped error bars around each mean. Here the error bars show ±1 standard error (s.e.). Bar charts can be effective in graphically showing ANOVA results, but they are often misused or misleading. Truncating the vertical axis exaggerates the differences among bar heights, and unless the response variable is a ratio scale variable, the relative heights of the bars can be misleading. An alternative is a 'means plot' (a scatter or interval plot):

This second way of plotting the treatment means provides essentially the same information. However, this plot uses the option of drawing the 'error bars' as 95% confidence interval limits around the means. Choosing between bar charts and means plots, and between standard errors and confidence intervals, is a matter of preference and is usually determined by the conventions of a particular discipline or journal. For this course you can use your choice of these methods. Note, however, that SAS, which you will need to use later in the course, generates the means plot itself. I find that bar charts are best produced using Excel.
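To make the two kinds of error bars concrete, here is a minimal Python sketch that computes both for each treatment level. Everything in it is an assumption made for illustration: the heights are invented, the level names `F1` to `F3` are hypothetical, and each interval uses that group's own standard deviation, whereas ANOVA software may base its intervals on a pooled error estimate instead. The critical value 2.571 is the two-sided 95% value of the t distribution with 5 degrees of freedom, taken from a t table.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical plant heights (cm), six plants per level; invented for illustration.
heights = {
    "Control": [20.0, 21.0, 19.0, 22.0, 18.0, 20.0],
    "F1": [22.0, 24.0, 21.0, 23.0, 25.0, 23.0],
    "F2": [25.0, 27.0, 24.0, 26.0, 28.0, 26.0],
    "F3": [30.0, 32.0, 29.0, 31.0, 33.0, 31.0],
}

# Two-sided 95% critical value of Student's t with n - 1 = 5 degrees of
# freedom, read from a t table (valid only for groups of six plants).
T_CRIT_DF5 = 2.571


def error_bars(values, t_crit):
    """Return the mean plus the +/- 1 s.e. and 95% CI limits for one level."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)  # standard error of the mean
    return {
        "mean": m,
        "se": (m - se, m + se),
        "ci95": (m - t_crit * se, m + t_crit * se),
    }


summary = {level: error_bars(values, T_CRIT_DF5) for level, values in heights.items()}

for level, bars in summary.items():
    print(
        f"{level}: mean = {bars['mean']:.2f}, "
        f"+/- 1 s.e. = {bars['se'][0]:.2f} to {bars['se'][1]:.2f}, "
        f"95% CI = {bars['ci95'][0]:.2f} to {bars['ci95'][1]:.2f}"
    )
```

With only six plants per level, the 95% confidence limits are more than twice as wide as the ±1 s.e. bars, so a figure caption should always say which one is plotted.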

Check out this post that further explains the difference between the use of these charts.

But what about the lower case letters **a**, **ab**, **b**, and **c** in each of the graphs above? The letters indicate which means differ and which ones don’t: any two means that do not share a letter are significantly different, while means that share at least one letter are not. The letters are obtained from mean comparison tests based on the ANOVA output.
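The rule for reading the letters can be sketched in a few lines of Python. The letters **a**, **ab**, **b**, and **c** come from the graphs, but which treatment level carries which letter is assumed here purely for illustration, as are the level names.

```python
from itertools import combinations

# Hypothetical assignment of letters to treatment levels (assumed, not taken
# from the graphs).
letters = {"Control": "a", "F1": "ab", "F2": "b", "F3": "c"}


def significantly_different(level_1, level_2, letters):
    """Two means are declared different only if they share no letter."""
    shared = set(letters[level_1]) & set(letters[level_2])
    return not shared


for level_1, level_2 in combinations(letters, 2):
    verdict = (
        "different"
        if significantly_different(level_1, level_2, letters)
        else "not significantly different"
    )
    print(f"{level_1} ({letters[level_1]}) vs {level_2} ({letters[level_2]}): {verdict}")
```

Under this assumed assignment, the level labelled **ab** is not significantly different from either the **a** level or the **b** level, even though the **a** and **b** levels differ from each other.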

Between the statement of a Working Hypothesis and the endpoint of the completed graph lies a 7-step process of statistical hypothesis testing.