11.2 - Goodness of Fit Test

A chi-square goodness-of-fit test can be conducted when there is one categorical variable with more than two levels. If there are exactly two categories, then a one proportion z test may be conducted. The levels of that categorical variable must be mutually exclusive. In other words, each case must fit into one and only one category.

We can test that the proportions are all equal to one another or we can test any specific set of proportions.

If the expected counts, which we'll learn how to compute shortly, are all at least five, then the chi-square distribution may be used to approximate the sampling distribution. If any expected count is less than five, then a randomization test should be conducted. 

Possible Research Questions
  • When randomly selecting a card from a deck with replacement, are we equally likely to select a heart, diamond, spade, and club?
  • According to one research study, in the United States 2% of adults identify as homosexual, 2% as bisexual, and 96% as heterosexual. Are these proportions different in the population of Penn State students?
  • A concessions stand sells blue, red, purple, and green freezer pops. They survey a sample of children and ask which of the four colors is their favorite. They want to know if the colors differ in popularity.

Test Statistic

In conducting a goodness-of-fit test, we compare observed counts to expected counts. Observed counts are the number of cases in the sample in each group. Expected counts are computed given that the null hypothesis is true; this is the number of cases we would expect to see in each cell if the null hypothesis were true.

Expected Cell Value
\(E=np_i\)

\(n\) is the total sample size
\(p_i\) is the hypothesized proportion of the \(i\)th group

The observed and expected values are then used to compute the chi-square (\(\chi^2\)) test statistic.

Chi-Square (\(\chi^2\)) Test Statistic

\(\chi^2=\Sigma \frac{(Observed-Expected)^2}{Expected}\)
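The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and the counts and null proportions are hypothetical values chosen to keep the arithmetic easy to follow.

```python
def expected_counts(n, proportions):
    """Expected count for each group under the null hypothesis: E_i = n * p_i."""
    return [n * p for p in proportions]


def chi_square_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data (not from the course): 60 cases in three groups,
# with null proportions of 0.5, 0.3, and 0.2.
observed = [33, 15, 12]
expected = expected_counts(sum(observed), [0.5, 0.3, 0.2])
statistic = chi_square_statistic(observed, expected)

print("Expected counts:", expected)
print("Chi-square statistic:", round(statistic, 3))
```

Each group contributes its own squared difference divided by its expected count, so a group with a small expected count can dominate the statistic even when its raw difference is small.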

Approximating the Sampling Distribution

If all expected counts are at least five, then the sampling distribution can be approximated using a chi-square distribution. Otherwise, a randomization test can be conducted. StatKey can conduct a randomization test for goodness of fit, and there is an example of this in Section 7.1 of the Lock5 textbook.
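For the case where an expected count falls below five, the idea behind a randomization test can be sketched as follows. This is a minimal simulation written for illustration, not StatKey's implementation: the function names, the number of repetitions, the seed, and the sample counts are all assumptions.

```python
import random


def chi_square_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def randomization_p_value(observed, proportions, reps=5000, seed=1):
    """Estimate the p-value by simulating samples under the null hypothesis."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = sum(observed)
    groups = list(range(len(proportions)))
    expected = [n * p for p in proportions]
    actual = chi_square_statistic(observed, expected)
    at_least_as_extreme = 0
    for _ in range(reps):
        # Draw n cases, each falling in group i with probability p_i.
        draws = rng.choices(groups, weights=proportions, k=n)
        simulated = [draws.count(g) for g in groups]
        # Small tolerance so floating-point rounding does not break ties.
        if chi_square_statistic(simulated, expected) >= actual - 1e-9:
            at_least_as_extreme += 1
    return at_least_as_extreme / reps


# Hypothetical sample: 12 cases in four groups, so each expected count is 3.
p_value = randomization_p_value([6, 3, 2, 1], [0.25, 0.25, 0.25, 0.25])
print("Estimated p-value:", p_value)
```

The estimated p-value is the proportion of simulated samples whose chi-square statistic is at least as large as the one observed, which mirrors the right-tailed nature of the test.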

Like the t distribution, the chi-square distribution varies depending on the degrees of freedom. Degrees of freedom for a chi-square goodness-of-fit test are equal to the number of groups minus 1. The distribution plot below compares the chi-square distributions with 2, 4, and 6 degrees of freedom.

[Figure: probability distribution plot made using Minitab Express; three chi-square distributions are overlaid, with 2, 4, and 6 degrees of freedom]

To find the p-value we find the area under the chi-square distribution to the right of our test statistic. A chi-square test is always right-tailed. 
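Outside of Minitab Express, the right-tail area can also be computed directly. The sketch below uses only Python's standard library and a standard recurrence for the chi-square tail with whole-number degrees of freedom; the function name is an assumption, and the test statistic of 3.0 is a hypothetical value used to show how the area changes with the degrees of freedom.

```python
import math


def chi_square_right_tail(x, df):
    """Area to the right of x under a chi-square distribution (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-half)              # exact tail for df = 2
    else:
        a, tail = 0.5, math.erfc(math.sqrt(half))   # exact tail for df = 1
    # Each pass raises the degrees of freedom by 2 until a equals df / 2.
    while a < df / 2.0:
        tail += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return tail


# The same hypothetical statistic gives a larger right-tail area as df grows.
for df in (2, 4, 6):
    print(df, round(chi_square_right_tail(3.0, df), 4))
```

If SciPy happens to be available, `scipy.stats.chi2.sf(x, df)` returns the same right-tail area.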


11.2.1 - Five Step Hypothesis Testing Procedure

The examples on the following pages use the five step hypothesis testing procedure outlined below. This is the same procedure that we used to conduct a hypothesis test for a single mean, single proportion, difference in two means, and difference in two proportions.

Step 1: Check assumptions and write hypotheses

When conducting a chi-square goodness-of-fit test, it makes the most sense to write the hypotheses first. The hypotheses will depend on the research question. The null hypothesis will always contain the equalities and the alternative hypothesis will be that at least one population proportion is not as specified in the null.

In order to use the chi-square distribution to approximate the sampling distribution, all expected counts must be at least five.

Expected Count
\(E=np_i\)

Where \(n\) is the total sample size and \(p_i\) is the hypothesized population proportion in the \(i\)th group.

To check this assumption, compute all expected counts and confirm that each is at least five.
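As a rough illustration of this check, the sketch below computes the expected counts and reports whether every one is at least five. The null proportions of 2%, 2%, and 96% come from the research question listed earlier; the function name and the two sample sizes are hypothetical.

```python
def check_expected_counts(n, proportions, minimum=5):
    """Return the expected counts and whether every one is at least `minimum`."""
    expected = [n * p for p in proportions]
    return expected, all(e >= minimum for e in expected)


proportions = [0.02, 0.02, 0.96]  # null proportions from the research question
for n in (100, 300):              # hypothetical sample sizes
    expected, ok = check_expected_counts(n, proportions)
    method = "chi-square approximation" if ok else "randomization test"
    print(n, [round(e, 1) for e in expected], method)
```

With small hypothesized proportions, the smallest group drives the check: a sample of 100 gives expected counts of only about 2 in the 2% groups, while a sample of 300 gives about 6.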

Step 2: Compute the test statistic

In Step 1 you already computed the expected counts. Use this formula to compute the chi-square test statistic:

Chi-Square Test Statistic
\(\chi^2=\Sigma \frac{(O-E)^2}{E}\)

Where \(O\) is the observed count for each cell and \(E\) is the expected count for each cell.

Step 3: Determine the p-value

Construct a chi-square distribution with degrees of freedom equal to the number of groups minus one. The p-value is the area under that distribution to the right of the test statistic that was computed in Step 2. You can find this area by constructing a probability distribution plot in Minitab Express. 

Step 4: Make a decision

Unless otherwise stated, use the standard 0.05 alpha level.

If \(p \leq \alpha\), reject the null hypothesis.

If \(p > \alpha\), fail to reject the null hypothesis.

Step 5: State a real-world conclusion

Go back to the original research question and address it directly. If you rejected the null hypothesis, then there is evidence that at least one of the population proportions is not as stated in the null hypothesis. If you failed to reject the null hypothesis, then there is not evidence that any of the population proportions are different from what is stated in the null hypothesis. 


11.2.1.1 - Video: Cupcakes (Equal Proportions)

11.2.1.2 - Cards (Equal Proportions)

Research question

When randomly selecting a card from a deck with replacement, are we equally likely to select a heart, diamond, spade, and club?

I randomly selected a card from a standard deck 40 times with replacement. I pulled 13 hearts, 8 diamonds, 8 spades, and 11 clubs.

Let's use the five-step hypothesis testing procedure:

Step 1: Check assumptions and write hypotheses

\(H_0: p_h=p_d=p_s=p_c=0.25\)
\(H_a:\) at least one \(p_i\) is not as specified in the null

We can use the null hypothesis to check the assumption that all expected counts are at least 5.

\(Expected\;count=n (p_i)\)

All \(p_i\) are 0.25. \(40(0.25)=10\), thus this assumption is met and we can approximate the sampling distribution using the chi-square distribution.

Step 2: Compute the test statistic

 \(\chi^2=\Sigma \frac{(Observed-Expected)^2}{Expected} \)

All expected values are 10. Our observed values were 13, 8, 8, and 11.

\(\chi^2=\frac{(13-10)^2}{10}+\frac{(8-10)^2}{10}+\frac{(8-10)^2}{10}+\frac{(11-10)^2}{10}\)
\(\chi^2=\frac{9}{10}+\frac{4}{10}+\frac{4}{10}+\frac{1}{10}\)
\(\chi^2=1.8\)

Step 3: Determine the p-value

Our sampling distribution will be a chi-square distribution.

\(df=k-1=4-1=3\)

We can find the p-value by constructing a chi-square distribution with 3 degrees of freedom to find the area to the right of \(\chi^2=1.8\)

[Figure: chi-square distribution plot made using Minitab Express; degrees of freedom equal 3; the area to the right of the chi-square value of 1.8 is 0.614935]

The p-value is 0.614935.

Step 4: Make a decision

\(p>0.05\) therefore we fail to reject the null hypothesis.

Step 5: State a "real world" conclusion

There is not evidence that the proportions of hearts, diamonds, spades, and clubs randomly drawn from this deck differ from one another.
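The five steps for the card example can be checked with a short script. This is a sketch using only Python's standard library; the variable names are assumptions, and the p-value line uses the closed-form right-tail area for a chi-square distribution with 3 degrees of freedom rather than a Minitab Express plot.

```python
import math

observed = [13, 8, 8, 11]               # hearts, diamonds, spades, clubs
null_proportions = [0.25, 0.25, 0.25, 0.25]
alpha = 0.05

# Step 1: expected counts under the null, and the "at least 5" check.
n = sum(observed)
expected = [n * p for p in null_proportions]
assumption_met = all(e >= 5 for e in expected)

# Step 2: chi-square test statistic.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Step 3: right-tail area; this closed form is valid only for df = 3.
df = len(observed) - 1
half = statistic / 2
p_value = math.erfc(math.sqrt(half)) + math.sqrt(half) * math.exp(-half) / math.gamma(1.5)

# Step 4: decision at the chosen alpha level.
decision = "reject H0" if p_value <= alpha else "fail to reject H0"

print("Expected counts:", expected, "| assumption met:", assumption_met)
print("Chi-square:", round(statistic, 2), "| df:", df)
print("p-value:", round(p_value, 4), "|", decision)
```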


11.2.1.3 - Roulette Wheel (Different Proportions)

Research Question

An American roulette wheel contains 38 slots: 18 red, 18 black, and 2 green.  A casino has purchased a new wheel and they want to know if there is any evidence that the wheel is unfair. They spin the wheel 100 times and it lands on red 44 times, black 49 times, and green 7 times.

Step 1: Check assumptions and write hypotheses

If the wheel is fair then \(p_{red}=\frac{18}{38}\), \(p_{black}=\frac{18}{38}\), and \(p_{green}=\frac{2}{38}\).

Together, these proportions sum to 1.

\(H_0: p_{red}=\frac{18}{38},\;p_{black}=\frac{18}{38}\;and\;p_{green}=\frac{2}{38}\)

\(H_a: At\;least\;one\;p_i\;is \;not\;as\;specified\;in\;the\;null\)

In order to conduct a chi-square goodness-of-fit test, all expected counts must be at least 5.

For both red and black: \(Expected \;count=100(\frac{18}{38})=47.368\)

For green: \(Expected\;count=100(\frac{2}{38})=5.263\)

All expected counts are at least 5, so we can conduct a chi-square goodness-of-fit test.

Step 2: Compute the test statistic

 \(\chi^2=\Sigma \frac{(Observed-Expected)^2}{Expected} \)

In the first step we computed the expected values for red and black to be 47.368 and for green to be 5.263.

 \(\chi^2= \frac{(44-47.368)^2}{47.368}+\frac{(49-47.368)^2}{47.368}+\frac{(7-5.263)^2}{5.263} \)

 \(\chi^2=0.239+0.056+0.573=0.868\)

Step 3: Determine the p-value

Our sampling distribution will be a chi-square distribution.

\(df=k-1=3-1=2\)

We can find the p-value by constructing a chi-square distribution with 2 degrees of freedom to find the area to the right of \(\chi^2=0.868\)

[Figure: chi-square distribution plot made using Minitab Express; degrees of freedom equal 2; the shaded area to the right of the chi-square value of 0.868 is 0.647912]

The p-value is 0.647912.

Step 4: Make a decision

\(p>0.05\) therefore we fail to reject the null hypothesis.

Step 5: State a "real world" conclusion

There is not evidence that this roulette wheel is unfair.
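The roulette arithmetic can be double-checked with exact fractions. The sketch below is an illustration with assumed variable names; it relies on the fact that with 2 degrees of freedom the right-tail area of the chi-square distribution equals \(e^{-\chi^2/2}\). Because it avoids rounding the expected counts and the individual terms, the statistic comes out near 0.869 rather than 0.868 and the p-value near 0.648, which does not change the decision.

```python
import math
from fractions import Fraction

slots = {"red": 18, "black": 18, "green": 2}      # slots on the wheel
observed = {"red": 44, "black": 49, "green": 7}   # results of the 100 spins

n = sum(observed.values())
total_slots = sum(slots.values())

# Expected counts under a fair wheel, kept as exact fractions.
expected = {color: n * Fraction(k, total_slots) for color, k in slots.items()}
assumption_met = all(e >= 5 for e in expected.values())

# Chi-square statistic with no intermediate rounding.
statistic = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in slots)

# With df = 2, the right-tail area is exp(-statistic / 2).
df = len(slots) - 1
p_value = math.exp(-float(statistic) / 2)

print({c: round(float(e), 3) for c, e in expected.items()})
print("Chi-square:", round(float(statistic), 3), "| df:", df)
print("p-value:", round(p_value, 4))
```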


11.2.2 - Minitab Express: Goodness-of-Fit Test

Research Question:

When randomly selecting a card from a deck with replacement, are we equally likely to select a heart, diamond, spade, and club?

I randomly selected a card from a standard deck 40 times with replacement. I pulled 13 hearts, 8 diamonds, 8 spades, and 11 clubs.

Minitab Express – Conducting a Chi-Square Goodness-of-Fit Test

Summarized Data, Equal Proportions

To perform a chi-square goodness-of-fit test in Minitab Express using summarized data we first need to enter the data into the worksheet. Below you can see that we have one column with the names of each group and one column with the observed counts for each group.

     Suit       Count
1    Hearts     13
2    Diamonds   8
3    Spades     8
4    Clubs      11

  1. On a PC: Select STATISTICS > Chi-Square Goodness-of-Fit
     On a Mac: Select Statistics > Tables > Chi-Square Goodness-of-Fit
  2. From the drop-down box select Summarized data in a column
  3. Double-click Count to enter it into the Observed Counts box
  4. Double-click Suit to enter it into the Category names box
  5. Click OK

This should result in the following output:

Chi-Square Goodness-of-Fit Test: Count

Observed and Expected Counts
Category   Observed   Test Proportion   Expected   Contribution to Chi-Sq
Hearts     13         0.250000          10         0.90
Diamonds   8          0.250000          10         0.40
Spades     8          0.250000          10         0.40
Clubs      11         0.250000          10         0.10

Chi-Square Test
N    DF   Chi-Sq   P-Value
40   3    1.80     0.6149

All expected values are at least 5 so we can use the chi-square distribution to approximate the sampling distribution. Our results are \(\chi^2 (3) = 1.80\). \(p = 0.6149\). Because our p-value is greater than the standard alpha level of 0.05, we fail to reject the null hypothesis. There is not evidence that the proportions are different in the population.
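The "Contribution to Chi-Sq" column in the output is each category's \(\frac{(O-E)^2}{E}\) term. As a rough cross-check outside Minitab Express, the sketch below rebuilds those rows from the worksheet data; the variable names and the column layout are assumptions of this sketch, not Minitab's.

```python
categories = ["Hearts", "Diamonds", "Spades", "Clubs"]
observed = [13, 8, 8, 11]
test_proportions = [0.25, 0.25, 0.25, 0.25]

n = sum(observed)
rows = []
for name, o, p in zip(categories, observed, test_proportions):
    e = n * p                      # expected count under the null
    contribution = (o - e) ** 2 / e
    rows.append((name, o, p, e, contribution))

# Print one line per category, then the summary line.
for name, o, p, e, contribution in rows:
    print(f"{name:<10}{o:>9}{p:>12.6f}{e:>10.0f}{contribution:>14.2f}")

chi_sq = sum(row[4] for row in rows)
print(f"N = {n}, DF = {len(rows) - 1}, Chi-Sq = {chi_sq:.2f}")
```

If SciPy is installed, `scipy.stats.chisquare(observed)` tests equal proportions by default and reports the same statistic along with the p-value.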

The example above tested equal population proportions. Minitab Express also has the ability to conduct a chi-square goodness-of-fit test when the hypothesized population proportions are not all equal. To do this, you can choose to test specified proportions or to use proportions based on historical counts:

[Figure: screenshot from Minitab Express showing where to change the null proportions]


11.2.2.1 - Video Example: Tulips (Summarized Data, Equal Proportions)

The following example uses summarized data and tests a null hypothesis of equal proportions.


11.2.2.2 - Video Example: Roulette (Summarized Data, Different Proportions)
