Lesson 10: One-Way ANOVA
Objectives
- Explain why it is not appropriate to conduct multiple independent t tests to compare the means of more than two independent groups
- Use Minitab to construct a probability plot for an F distribution
- Use Minitab to perform a one-way ANOVA with Tukey's pairwise comparisons
- Interpret the results of a one-way ANOVA
- Interpret the results of Tukey's pairwise comparisons
In previous lessons you learned how to compare the means of two independent groups. In this lesson, we will learn how to compare the means of more than two independent groups. This procedure is known as a one-way analysis of variance, or more often as a "one-way ANOVA."
A frequently asked question is, "Why not just perform multiple two independent samples \(t\) tests?" If you were to perform multiple independent \(t\) tests instead of a one-way ANOVA, you would need to perform more tests. For \(k\) independent groups there are \(\frac{k(k-1)}{2}\) possible pairs. With 5 independent groups, that is \(\frac{5(5-1)}{2}=10\) independent \(t\) tests. Those 10 tests also would not give you information about the independent variable overall. Most importantly, each additional test is another opportunity to make a Type I error, so the chance of at least one false rejection across all of the tests is larger than the \(\alpha\) used for any single test. By using an ANOVA, you avoid this inflation of the Type I error rate.
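The pair count and the inflation of the Type I error rate can be illustrated with a short calculation. This is only a rough sketch: the function names are made up for this illustration, and the error-rate formula assumes the tests are independent of one another, which pairwise \(t\) tests that share groups are not.

```python
from math import comb


def pairwise_test_count(k: int) -> int:
    """Number of distinct pairs among k groups, i.e. k(k-1)/2."""
    return comb(k, 2)


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one Type I error across num_tests tests.

    Assumes the tests are independent (an idealization), so treat the
    result as an approximation rather than an exact error rate.
    """
    return 1 - (1 - alpha) ** num_tests


k = 5
tests = pairwise_test_count(k)
print(f"{k} groups -> {tests} pairwise t tests")
print(f"Approximate chance of at least one Type I error: "
      f"{familywise_error_rate(tests):.3f}")
```

Under that independence assumption, 10 tests at \(\alpha=0.05\) give roughly a 40% chance of at least one false rejection, compared with 5% for a single test.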
10.1 - Introduction to the F Distribution
One-way ANOVAs, along with a number of other statistical tests, use the F distribution. Earlier in this course you learned about the \(z\) and \(t\) distributions. You computed \(z\) and \(t\) test statistics and used those values to look up p-values using statistical software. Similarly, in this lesson you are going to compute F test statistics. The F test statistic can be used to determine the p-value for a one-way ANOVA.
The video below gives a brief introduction to the F distribution and walks you through two examples of using Minitab Express to find the p-values for given F test statistics. The steps for creating a distribution plot to find the area under the F distribution are the same as the steps for finding the area under the \(z\) or \(t\) distribution. For the F distribution we will always be looking for a right-tailed probability. Later in this lesson we will see that this area is the p-value.
The F distribution has two different degrees of freedom: between groups and within groups. Minitab Express will call these the numerator and denominator degrees of freedom, respectively. Within groups is also referred to as error.
 Between Groups (Numerator) Degrees of Freedom

\(df_{between}=k-1\)

\(k\) = number of groups
 Within Groups (Denominator, Error) Degrees of Freedom

\(df_{within}=n-k\)

\(n\) = total sample size with all groups combined
\(k\) = number of groups
Minitab Express – Creating an F Distribution
Scenario: An F test statistic of 2.57 is computed with 3 and 246 degrees of freedom. What is the p-value for this test?
We can create a distribution plot. Our distribution is the F distribution. The numerator df (\(df_1\)) is 3 and the denominator df (\(df_2\)) is 246. We want to shade the area in the right tail. Our “X Value” is 2.57.
- On a PC: STATISTICS > Distribution Plot > Display Probability
  On a Mac: Statistics > Probability Distributions > Distribution Plot > Display Probability
- Change the Distribution to F
- Fill in the Numerator degrees of freedom with 3 and the Denominator degrees of freedom with 246
- Select A specified x value
- Use the default Right tail
- For the X value enter 2.57
The area beyond an F-value of 2.57 with 3 and 246 degrees of freedom is 0.05487. The p-value for this F test is 0.05487.
Note: When you conduct an ANOVA in Minitab Express, the software will compute this p-value for you.
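For readers working outside Minitab Express, the same right-tail area can be looked up in other software. The sketch below assumes Python with the SciPy library installed, which this lesson does not otherwise use; `scipy.stats.f.sf` is the survival function (one minus the cumulative probability), which is the area to the right of the given F value.

```python
from scipy import stats

f_stat = 2.57
df_between = 3    # numerator degrees of freedom, k - 1
df_within = 246   # denominator degrees of freedom, n - k

# Area under the F(3, 246) curve to the right of 2.57.
p_value = stats.f.sf(f_stat, df_between, df_within)
print(f"p-value = {p_value:.5f}")
```

The printed value should agree closely with the 0.05487 obtained from the distribution plot.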
Below is an interactive video designed to help you review the F distribution and practice using Minitab Express and StatKey to look up p-values.
10.2 - Hypothesis Testing
A one-way ANOVA is used to compare the means of more than two independent groups. A one-way ANOVA comparing just two groups will give you the same results as the independent \(t\) test that you conducted in Lesson 8. We will use the five-step hypothesis testing procedure again in this lesson.
The assumptions for a one-way ANOVA are:
- Samples are independent
- The response variable is approximately normally distributed or all sample sizes are at least 30
- The population variances of the response are equal across the group levels (if the largest sample standard deviation divided by the smallest sample standard deviation is not greater than two, then assume that the population variances are equal)
Given that you are comparing \(k\) independent groups, the null and alternative hypotheses are:
\(H_{0}: \mu_1 = \mu_2 = \cdots = \mu_k\)
\(H_{a}:\) Not all \(\mu_\cdot\) are equal
In other words, the null hypothesis is that all of the groups' population means are equal. The alternative is that they are not all equal; there are at least two population means that are not equal to one another.
The one-way ANOVA uses an F test statistic. Hand calculations for ANOVAs require many steps. In this class, you will be working primarily with Minitab Express outputs.
Conceptually, the F statistic is a ratio: \(F=\frac{Between\;groups\;variability}{Within\;groups\;variability}\). Numerically this translates to \(F=\frac{MS_{Between}}{MS_{Within}}\). In other words, it is how much individuals in different groups vary from one another divided by how much individuals within the same group vary from one another.
Statistical software will compute the F ratio for you and produce what is known as an ANOVA source table. The ANOVA source table will give you information about the variability between groups and within groups. The table below gives you all of the formulas, but you will not be responsible for performing these calculations by hand. Minitab Express will do all of these calculations for you and provide you with the full ANOVA source table.
Source | SS | df | MS | F | p
--- | --- | --- | --- | --- | ---
Between Groups (Factor) | \(\sum_{k}n_k(\overline{x}_k-\overline{x}_\cdot)^2\) | \(k-1\) | \(\frac{SS_{Between}}{df_{Between}}\) | \(\frac{MS_{Between}}{MS_{Within}}\) | Area to the right of \(F_{k-1, n-k}\)
Within Groups (Error) | \(\sum_k \sum_i(x_{ik}-\overline{x}_k)^2\) | \(n-k\) | \(\frac{SS_{Within}}{df_{Within}}\) | |
Total | \(\sum_k \sum_i(x_{ik}-\overline{x}_\cdot)^2\) | \(n-1\) | | |

\(k\) = number of groups
\(n\) = total sample size (all groups combined)
\(n_k\) = sample size of group \(k\)
\(\overline{x}_k\) = sample mean of group \(k\)
\(\overline{x}_\cdot\) = grand mean (i.e., mean for all groups combined)
SS = sum of squares
MS = mean square
df = degrees of freedom
F = F-ratio (the test statistic)
Some of the terms in the table above should look familiar, while others will be new to you. The sum of squares that appears in the ANOVA source table is similar to the sum of squares that you computed in Lesson 2 when computing variance and standard deviation. Recall, the sum of squares is the sum of the squared differences between each score and the mean. Here, there are three different sums of squares, each measuring a different type of variability.
The ANOVA source table also has three different degrees of freedom: \(df_{between}\), \(df_{within}\), and \(df_{total}\). If you were to look up an F value using statistical software you would need to know two of these degrees of freedom: \(df_1 = df_{between}\) and \(df_2=df_{within}\).
When performing a one-way ANOVA using statistical software, you will be given the p-value in the ANOVA source table. If performing a one-way ANOVA by hand, you would use the F distribution. Similar to the \(t\) distribution, the F distribution varies depending on degrees of freedom.
If \(p \leq \alpha\) reject the null hypothesis. If \(p>\alpha\) fail to reject the null hypothesis.
Based on your decision in Step 4, write a conclusion in terms of the original research question.
10.3 - Pairwise Comparisons
While the results of a one-way ANOVA will tell you if there is what is known as a main effect of the explanatory variable, the initial results will not tell you which groups are different from one another. In order to determine which groups are different from one another, a post-hoc test is needed. Post-hoc tests are conducted after a one-way ANOVA to determine which groups differ from one another. There are many different post-hoc analyses that could be performed following a one-way ANOVA. Here, we will learn about one of the most common tests, known as Tukey's Honestly Significant Differences (HSD) Test.
Most statistical software, including Minitab Express, will compute Tukey's pairwise comparisons for you. This specific post-hoc test makes all possible pairwise comparisons. In this class we will rely on statistical software to perform these analyses. If you are interested in seeing how the calculations are performed, this information is contained in the notes for STAT 502: Analysis of Variance and Design of Experiments. This analysis takes into account the fact that multiple tests are being performed and makes the necessary adjustments to ensure that Type I error is not inflated.
In the following examples you will see a number of Tukey post-hoc tests. You will also learn how to obtain these results using Minitab Express.
For each pairwise comparison, \(H_0: \mu_i - \mu_j=0\) and \(H_a: \mu_i - \mu_j \ne 0\).
10.4 - Minitab Express: One-Way ANOVA
In one research study, 20 young pigs are assigned at random among 4 experimental groups. Each group is fed a different diet. (This design is a completely randomized design.) The data are the pigs' weights in kg after being raised on these diets for 10 months. We wish to ask whether mean pig weights are the same for all 4 diets.
- \(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4\)
- \(H_a:\) Not all \(\mu\) are equal

Feed_1 | Feed_2 | Feed_3 | Feed_4
--- | --- | --- | ---
60.8 | 68.3 | 102.6 | 87.9
57.1 | 67.7 | 102.2 | 84.7
65.0 | 74.0 | 100.5 | 83.2
58.7 | 66.3 | 97.5 | 85.8
61.8 | 69.9 | 98.9 | 90.3
The data are contained in the Minitab Express file ANOVA_ex.MTW.
Note that in this file the data were entered so that each group is in its own column. In other words, the responses are in a separate column for each factor level. In later examples you will see that Minitab Express will also conduct a one-way ANOVA if the responses are all in one column with the factor codes in another column.
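To see how the source-table formulas from section 10.2 apply to these data, the sums of squares, mean squares, and F ratio can be computed directly. The sketch below uses plain Python rather than Minitab Express; the function name and the dictionary layout are choices made for this illustration only, and the p-value is left to statistical software.

```python
from statistics import mean

# Pig weights (kg) copied from the data table above, one list per diet.
feeds = {
    "Feed_1": [60.8, 57.1, 65.0, 58.7, 61.8],
    "Feed_2": [68.3, 67.7, 74.0, 66.3, 69.9],
    "Feed_3": [102.6, 102.2, 100.5, 97.5, 98.9],
    "Feed_4": [87.9, 84.7, 83.2, 85.8, 90.3],
}


def one_way_anova_table(groups):
    """Return SS, df, MS, and F for a one-way ANOVA (p-value not included)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Between groups: each group mean versus the grand mean, weighted by n_k.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: each observation versus its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


table = one_way_anova_table(list(feeds.values()))
for name, value in table.items():
    print(f"{name:>11}: {value:.3f}")
```

These values can be compared with the Factor and Error rows of the Minitab Express source table shown in the output that follows.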
Minitab Express – One-Way ANOVA
To perform an Analysis of Variance (ANOVA) test in Minitab Express:
- Open the ANOVA_ex.MTW data set.
- From the menu bar, select Statistics > ANOVA > One-Way ANOVA.
- Click the drop-down menu and select "Responses are in a separate column for each factor level".
- Double-click on the variables Feed_1, Feed_2, Feed_3, and Feed_4 to insert them into the "Responses" box.
- Click the Comparisons tab and check the box next to "Tukey (family error rate)".
- Click OK.
The result should be the following output:

Null hypothesis | All means are equal
--- | ---
Alternative hypothesis | At least one mean is different
Significance level | \(\alpha=0.05\)

Equal variances were assumed for the analysis

Factor | Levels | Values
--- | --- | ---
Factor | 4 | Feed_1, Feed_2, Feed_3, Feed_4

Source | DF | Adj SS | Adj MS | F-Value | P-Value
--- | --- | --- | --- | --- | ---
Factor | 3 | 4703.188 | 1567.72933 | 206.72 | <0.0001
Error | 16 | 121.340 | 7.58375 | |
Total | 19 | 4824.528 | | |

S | R-sq | R-sq(adj) | R-sq(pred)
--- | --- | --- | ---
2.75386093 | 97.48% | 97.01% | 96.07%

Factor | N | Mean | StDev | 95% CI
--- | --- | --- | --- | ---
Feed_1 | 5 | 60.680 | 3.028 | (58.069, 63.291)
Feed_2 | 5 | 69.240 | 2.958 | (66.629, 71.851)
Feed_3 | 5 | 100.3400 | 2.1640 | (97.7292, 102.9508)
Feed_4 | 5 | 86.380 | 2.782 | (83.769, 88.991)

Pooled StDev = 2.75386093

Factor | N | Mean | Grouping
--- | --- | --- | ---
Feed_3 | 5 | 100.34 | A
Feed_4 | 5 | 86.38 | B
Feed_2 | 5 | 69.24 | C
Feed_1 | 5 | 60.68 | D

Means that do not share a letter are significantly different.

Difference of Levels | Difference of Means | SE of Difference | 95% CI | T-Value | Adjusted P-Value
--- | --- | --- | --- | --- | ---
Feed_2 - Feed_1 | 8.560 | 1.742 | (3.572, 13.548) | 4.91 | 0.0008
Feed_3 - Feed_1 | 39.660 | 1.742 | (34.672, 44.648) | 22.77 | <0.0001
Feed_4 - Feed_1 | 25.700 | 1.742 | (20.712, 30.688) | 14.76 | <0.0001
Feed_3 - Feed_2 | 31.100 | 1.742 | (26.112, 36.088) | 17.86 | <0.0001
Feed_4 - Feed_2 | 17.140 | 1.742 | (12.152, 22.128) | 9.84 | <0.0001
Feed_4 - Feed_3 | -13.960 | 1.742 | (-18.948, -8.972) | -8.02 | <0.0001

Individual confidence level = 98.87%
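The Tukey comparisons in this output can also be approximated outside Minitab Express. The sketch below assumes Python with SciPy version 1.8 or later, which provides `scipy.stats.tukey_hsd`; it is offered as a cross-check under that assumption, not as the procedure Minitab Express itself uses, and small differences in the last decimal place are possible.

```python
from scipy import stats

# Pig weights (kg) from the data table in this section.
feed_1 = [60.8, 57.1, 65.0, 58.7, 61.8]
feed_2 = [68.3, 67.7, 74.0, 66.3, 69.9]
feed_3 = [102.6, 102.2, 100.5, 97.5, 98.9]
feed_4 = [87.9, 84.7, 83.2, 85.8, 90.3]
labels = ["Feed_1", "Feed_2", "Feed_3", "Feed_4"]

res = stats.tukey_hsd(feed_1, feed_2, feed_3, feed_4)
ci = res.confidence_interval(confidence_level=0.95)

# res.statistic[i, j] is mean(group i) - mean(group j), so looping over
# j < i gives the same "later level minus earlier level" order as above.
for i in range(len(labels)):
    for j in range(i):
        print(f"{labels[i]} - {labels[j]}: "
              f"diff = {res.statistic[i, j]:.3f}, "
              f"95% CI = ({ci.low[i, j]:.3f}, {ci.high[i, j]:.3f}), "
              f"adjusted p = {res.pvalue[i, j]:.4f}")
```

Every interval that excludes 0 corresponds to a pair of feeds with significantly different means, which matches the grouping table in which no two feeds share a letter.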
10.5 - Example: SAT-Math Scores by Award Preference
The video below walks through an example of obtaining and interpreting all of the output provided by Minitab Express when a one-way ANOVA with Tukey pairwise comparisons is performed.
The example in this video uses the StudentSurvey.MTW dataset provided by the Lock5 textbook. In this example we are comparing the SAT scores of students who said that they would prefer to win an Academy Award, a Nobel Prize, or an Olympic gold medal.
10.6 - Example: Exam Grade by Professor
Three professors were each teaching one section of a course. They all gave the same final exam and they want to know if there are any differences between their sections’ scores.
\(H_0:\mu_1=\mu_2=\mu_3\)
\(H_a:\) Not all \(\mu\) are equal

Instructor | N | Mean | StDev | 95% CI
--- | --- | --- | --- | ---
Dr. Al | 60 | 68.367 | 17.719 | (63.977, 72.756)
Dr. Oh | 87 | 71.448 | 16.702 | (67.803, 75.094)
Dr. Pa | 98 | 67.939 | 17.465 | (64.504, 71.373)

Pooled StDev = 17.2609
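The sample standard deviations for the three sections are similar: the largest (17.719) divided by the smallest (16.702) is about 1.06, which is well below 2, so we can assume that the population variances are equal.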
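The rule of thumb for the equal-variance assumption from section 10.2 can be checked with a few lines of code. This is a minimal sketch; the function name and the `max_ratio` parameter are made up for this illustration.

```python
def equal_variance_rule_of_thumb(sample_sds, max_ratio=2.0):
    """Return (ratio, ok), where ratio is the largest SD over the smallest SD.

    ok is True when the ratio is not greater than max_ratio, in which case
    the lesson's rule of thumb says to assume equal population variances.
    """
    ratio = max(sample_sds) / min(sample_sds)
    return ratio, ratio <= max_ratio


# Sample standard deviations from the summary table above.
sds = {"Dr. Al": 17.719, "Dr. Oh": 16.702, "Dr. Pa": 17.465}
ratio, ok = equal_variance_rule_of_thumb(list(sds.values()))
print(f"largest SD / smallest SD = {ratio:.3f}; assume equal variances: {ok}")
```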
Using Minitab Express for Mac or PC: Statistics > ANOVA > One-Way ANOVA
The result is the following ANOVA source table:

Source | DF | Adj SS | Adj MS | F-Value | P-Value
--- | --- | --- | --- | --- | ---
Instructor | 2 | 635.3 | 317.671 | 1.07 | 0.3459
Error | 242 | 72101.1 | 297.938 | |
Total | 244 | 72736.4 | | |

F(2, 242) = 1.07
From our ANOVA source table, p = 0.3459
Because \(p > \alpha\), we fail to reject the null hypothesis.
There is not enough evidence to conclude that the mean scores of the three professors’ sections are different.
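Because the summary table for this example reports each section's sample size, mean, and standard deviation, the source table can be roughly reconstructed without the raw scores. The sketch below is an illustration in plain Python, not part of the Minitab Express workflow. The summary statistics are rounded, so the sums of squares differ slightly from the software output, and the p-value line uses a closed form for the F distribution's right-tail area that is valid only when the numerator degrees of freedom equal 2.

```python
# (n, mean, standard deviation) for each section, from the summary table above.
sections = {
    "Dr. Al": (60, 68.367, 17.719),
    "Dr. Oh": (87, 71.448, 16.702),
    "Dr. Pa": (98, 67.939, 17.465),
}

n_total = sum(n for n, _, _ in sections.values())
k = len(sections)
grand_mean = sum(n * m for n, m, _ in sections.values()) / n_total

# Between groups: group means versus the grand mean, weighted by group size.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in sections.values())
# Within groups: (n - 1) * s^2 recovers each group's sum of squared deviations.
ss_within = sum((n - 1) * s ** 2 for n, _, s in sections.values())

df_between, df_within = k - 1, n_total - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Right-tail area of F(2, df_within). This shortcut holds ONLY for a
# numerator df of 2; in general, use statistical software instead.
assert df_between == 2
p_value = (1 + df_between * f_stat / df_within) ** (-df_within / 2)

print(f"F({df_between}, {df_within}) = {f_stat:.2f}, p = {p_value:.4f}")
```

The result should land close to the F-Value of 1.07 and P-Value of 0.3459 in the source table, leading to the same decision to fail to reject the null hypothesis.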
There is some debate as to whether pairwise comparisons are appropriate when the overall one-way ANOVA is not statistically significant. Some argue that if the overall ANOVA is not significant then pairwise comparisons are not necessary. Others argue that if the pairwise comparisons were planned before the ANOVA was conducted (i.e., "a priori") then they are appropriate.
The results of our Tukey pairwise comparisons were as follows:

Instructor | N | Mean | Grouping
--- | --- | --- | ---
Dr. Oh | 87 | 71.448 | A
Dr. Al | 60 | 68.367 | A
Dr. Pa | 98 | 67.939 | A

Means that do not share a letter are significantly different.

Difference of Levels | Difference of Means | SE of Difference | 95% CI | T-Value | Adjusted P-Value
--- | --- | --- | --- | --- | ---
Dr. Oh - Dr. Al | 3.082 | 2.897 | (-3.698, 9.861) | 1.06 | 0.5366
Dr. Pa - Dr. Al | -0.428 | 2.829 | (-7.050, 6.195) | -0.15 | 0.9875
Dr. Pa - Dr. Oh | -3.510 | 2.543 | (-9.460, 2.441) | -1.38 | 0.3512

Individual confidence level = 97.99%
Looking at the first table, all three instructors are in group A. Means that share a letter are not significantly different from one another (i.e., they are in the same group). Because all three instructors share the letter A, there are no significantly different pairs of instructors.
We could also look at the second table, which gives us the \(t\) test statistic and adjusted p-value for each possible pairwise comparison. This p-value is adjusted to take into account that multiple tests are being conducted. You can compare these p-values to the standard alpha level of .05. All p-values are greater than .05; therefore, no pairs are significantly different from one another.
10.7 - Lesson 10 Summary
Objectives
- Explain why it is not appropriate to conduct multiple independent t tests to compare the means of more than two independent groups
- Use Minitab to construct a probability plot for an F distribution
- Use Minitab to perform a one-way ANOVA with Tukey's pairwise comparisons
- Interpret the results of a one-way ANOVA
- Interpret the results of Tukey's pairwise comparisons
In this lesson you learned how to compare the means of three or more groups using a one-way ANOVA. A one-way ANOVA is used instead of multiple independent \(t\) tests in order to avoid increasing the likelihood of committing a Type I error.
A one-way ANOVA provides information about the explanatory variable overall, but not about differences between the different levels of that variable. In order to compare the different pairs we need to conduct a post-hoc analysis such as Tukey's HSD test.
This lesson gave you a brief overview of the one-way ANOVA. If you would like to learn more about analysis of variance techniques, ask your instructor about some of the more advanced statistics courses available on the topic.