Lesson 7: Normal Distributions

Objectives

Upon successful completion of this lesson, you should be able to:

  • Describe the standard normal distribution
  • Determine the area under a normal distribution using Minitab Express
  • Determine the points that offset a given proportion of a normal distribution using Minitab Express
  • Summarize the Central Limit Theorem
  • Conduct a hypothesis test using a standardized test statistic
  • Construct a confidence interval using the standard form

Over the last three lessons you have approximated sampling distributions using bootstrapping and randomization methods. You may have noticed that many of the distributions that you constructed had similar shapes, such as those below:

Randomization Dotplot of Proportion

Bootstrap Dotplot of Mean

Bootstrap Dotplot of Correlation

Randomization Dotplot of \(\bar{x}_1-\bar{x}_2\)

These are all approximately normally distributed. You were first introduced to the normal distribution in Lesson 2 as a special type of symmetrical distribution. In this lesson, we'll review normal distributions, learn how to use Minitab Express to construct plots of normal distributions, and learn how the Central Limit Theorem allows us to apply what we know about the normal distribution to construct confidence intervals and conduct hypothesis tests without using simulations. 


7.1 - Standard Normal Distribution

A normal distribution is a bell-shaped distribution. Theoretically, a normal distribution is continuous and may be depicted as a density curve, such as the one below. The distribution plot below is a standard normal distribution. A standard normal distribution has a mean of 0 and standard deviation of 1. This is also known as the z distribution. You may see the notation \(N(\mu, \sigma)\) where N signifies that the distribution is normal, \(\mu\) is the mean of the distribution, and \(\sigma\) is the standard deviation of the distribution. A z distribution may be described as \(N(0,1)\).

Distribution Plot - Normal, Mean=0, StDev=1

Because the distribution is continuous, the probability of any single exact value is 0, so we instead determine the probability for an interval of values. The probability for an interval is equal to the area under the density curve over that interval. The total area under the curve is 1.00, or 100%. In other words, 100% of observations fall under the curve.

For example, in Lesson 2 we learned about the Empirical Rule which stated that approximately 68% of observations on a normal distribution will fall within one standard deviation of the mean, approximately 95% will fall within two standard deviations of the mean, and approximately 99.7% will fall within three standard deviations of the mean. 

Figure: The normal curve showing the Empirical Rule (68% within mean ± 1s, 95% within mean ± 2s, 99.7% within mean ± 3s).

Example: SAT-Math Scores

The distribution of SAT-Math scores can be described as \(N(500, 100)\). Let's apply the Empirical Rule to determine the SAT-Math scores that separate the middle 68% of scores, the middle 95% of scores, and the middle 99.7% of scores. 

Answer

Middle 68%: \(500\pm1(100)=[400, 600]\)

Middle 95%: \(500\pm2(100)=[300, 700]\)

Middle 99.7%: \(500\pm 3(100)= [200, 800]\)
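
As a rough check on the Empirical Rule, the sketch below computes the exact areas within one, two, and three standard deviations for the SAT-Math distribution. It uses Python's standard-library `statistics.NormalDist` instead of Minitab Express, and the helper name `area_within` is only an illustrative choice.

```python
from statistics import NormalDist

# SAT-Math scores described as N(500, 100) in the example above
sat_math = NormalDist(mu=500, sigma=100)

def area_within(dist, k):
    """Return the interval mean +/- k SDs and the proportion of the distribution inside it."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return lower, upper, dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    lower, upper, area = area_within(sat_math, k)
    print(f"within {k} SD: [{lower:.0f}, {upper:.0f}] area = {area:.4f}")
```

The exact areas are close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule is stated with "approximately."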

z scores

In Lesson 2 we wanted to describe one observation in relation to the distribution of all observations. We did this using a z score.

z score

Distance between an individual score and the mean in standard deviation units; also known as a standardized score.

z score
\(z=\dfrac{x - \overline{x}}{s}\)

\(x\) = original data value
\(\overline{x}\) = mean of the original distribution
\(s\) = standard deviation of the original distribution

This equation could also be rewritten in terms of population values: \(z=\dfrac{x-\mu}{\sigma}\)

Example: IQ Scores

IQ scores are normally distributed with a mean of 100 and standard deviation of 15. Compute the z score for an individual with an IQ score of 120.

Answer
We'll use the formula for a z score:

\(z=\dfrac{x- \mu}{\sigma}\)

Here, \(x=120\), \(\mu=100\), and \(\sigma=15\).

\(z=\dfrac{120-100}{15}=\dfrac{20}{15}=1.333\)

This individual's z score is 1.333. Their IQ is 1.333 standard deviations above the mean.
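
The same calculation can be written as a small function. This is a minimal Python sketch; the function name `z_score` and its parameter names are illustrative and do not come from the lesson.

```python
def z_score(x, mean, sd):
    """Distance between x and the mean, in standard deviation units."""
    return (x - mean) / sd

# IQ example: x = 120, mean = 100, standard deviation = 15
iq_z = z_score(120, mean=100, sd=15)
print(round(iq_z, 3))  # 1.333
```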


7.2 - Minitab Express: Finding Proportions

Minitab Express can be used to find the proportion of a normal distribution in a given range. The default is to construct a standard normal distribution, but the mean and standard deviation of the distribution can be edited. The following pages will walk through how to construct normal distributions to find the proportion greater than a given value, the proportion less than a given value, or the proportion between two given values.


7.2.1 - Proportion 'Less Than'

The cumulative probability for a value is the probability less than or equal to that value. In notation, this is \(P(X\leq x)\). The proportion at or below a given value is also known as a percentile.

MinitabExpress  – Proportion Less Than a z Value

Question: What proportion of the standard normal distribution is less than a z score of -2?

Recall that the standard normal distribution (i.e., z distribution) has a mean of 0 and standard deviation of 1. This is the default normal distribution in Minitab Express.

Steps
  1. On a PC: from the menu select STATISTICS > Distribution Plot
    On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
  2. Select Display Probability (Note: The default is the standard normal distribution)
  3. Select A specified X value
  4. Select Left tail
  5. For X value enter -2

    This should result in the following output:

    Minitab Express output: z distribution showing the proportion less than -2

    The proportion of the z distribution that is less than -2 is 0.0227501.
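
For readers without Minitab Express, the same left-tail area can be approximated in Python. This sketch assumes the standard-library `statistics.NormalDist` class (Python 3.8 or later); its `cdf` method returns the cumulative probability \(P(X\leq x)\).

```python
from statistics import NormalDist

z_dist = NormalDist()          # defaults to mean 0, standard deviation 1
p_less_than = z_dist.cdf(-2)   # cumulative probability P(Z <= -2)
print(f"{p_less_than:.7f}")    # 0.0227501
```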

    MinitabExpress  – Proportion Less Than a Value on a Normal Distribution

    Scenario: Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph. What is the probability that a randomly selected vehicle will be going 73 mph or slower?

    Let's construct a normal distribution with a mean of 65 and standard deviation of 5 to find the area less than 73.

    Steps
    1. On a PC: from the menu select STATISTICS > Distribution Plot
      On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
    2. Select Display Probability 
    3. For Distribution select Normal (Note: This is the default)
    4. For Mean enter 65
    5. For Standard deviation enter 5
    6. Select A specified X value
    7. Select Left tail
    8. For X value enter 73

      This should result in the following output:

      Distribution Plot - Normal, Mean=65, StDev=5; Less Than

      On a normal distribution with a mean of 65 and standard deviation of 5, the proportion less than 73 is 0.945201 

      In other words, 94.5201% of vehicles will be going less than 73 mph.
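
The sketch below reproduces this area in Python and also shows that standardizing 73 mph to a z score gives the same answer on the z distribution. It assumes the standard-library `statistics.NormalDist` class; the variable names are illustrative.

```python
from statistics import NormalDist

# Vehicle speeds: normal with mean 65 mph and standard deviation 5 mph
speeds = NormalDist(mu=65, sigma=5)
p_direct = speeds.cdf(73)            # area below 73 on N(65, 5)

# Equivalent route: convert 73 mph to a z score, then use N(0, 1)
z = (73 - 65) / 5                    # 1.6
p_via_z = NormalDist().cdf(z)

print(f"{p_direct:.6f} {p_via_z:.6f}")
```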


      7.2.1.1 - Video Example: P(Z<-1)

      Question: What proportion of the z distribution falls below a z score of -1?

      Steps
      1. On a PC: from the menu select STATISTICS > Distribution Plot

        On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot

      2. Select Display Probability (Note: The default is the standard normal distribution)
      3. Select A specified X value
      4. Select Left tail
      5. For X value enter -1


      7.2.1.2 - Video Example: P(SATM<540)

      Question: SAT-Math scores are normally distributed with a mean of 500 and standard deviation of 100. What proportion of scores are less than 540?

      Steps
      1. On a PC: from the menu select STATISTICS > Distribution Plot
        On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
      2. Select Display Probability 
      3. For Distribution select Normal (Note: This is the default)
      4. For Mean enter 500
      5. For Standard deviation enter 100
      6. Select A specified X value
      7. Select Left tail
      8. For X value enter 540


      7.2.2 - Proportion 'Greater Than'

      MinitabExpress  – Proportion Greater Than a z Value

      Question: What proportion of the standard normal distribution is greater than a z score of 2?

      Recall that the standard normal distribution (i.e., z distribution) has a mean of 0 and standard deviation of 1. This is the default normal distribution in Minitab Express.

      Steps
      1. On a PC: from the menu select STATISTICS > Distribution Plot
        On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
      2. Select Display Probability (Note: The default is the standard normal distribution)
      3. Select A specified X value
      4. Select Right tail
      5. For X value enter 2

        This should result in the following output:

        Minitab Express output: z distribution showing the area above 2

        The area of the z distribution that is greater than 2 is 0.0227501

        Video Walkthrough

        Select your operating system below to see a step-by-step guide for this example.

        MinitabExpress  – Proportion Greater Than a Value on a Normal Distribution

        Question: Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph. What is the probability that a randomly selected vehicle will be going more than 73 mph? 

        Let's construct a normal distribution with a mean of 65 and standard deviation of 5 to find the area greater than 73.

        To calculate a probability for values greater than a given value in Minitab Express:

        Steps
        1. On a PC: from the menu select STATISTICS > Distribution Plot
          On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
        2. Select Display Probability 
        3. For Distribution select Normal (Note: This is the default)
        4. For Mean enter 65
        5. For Standard deviation enter 5
        6. Select A specified X value
        7. Select Right tail
        8. For X value enter 73

          This should result in the following output:

          Distribution Plot - Normal, Mean=65, StDev=5; Greater Than

          On a normal distribution with a mean of 65 and standard deviation of 5, the proportion greater than 73 is 0.0547993

          In other words, 5.47993% of vehicles will be going more than 73 mph.
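
A right-tail area is the complement of the cumulative (left-tail) probability. The following Python sketch, which assumes the standard-library `statistics.NormalDist` class, illustrates that relationship for the vehicle speed example.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)

p_less = speeds.cdf(73)      # left tail: P(X <= 73)
p_greater = 1 - p_less       # right tail: P(X > 73)

print(f"{p_greater:.7f}")    # 0.0547993
```

Because the total area under the curve is 1, the left-tail and right-tail areas at the same value always sum to 1.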


          7.2.2.1 - Video Example: P(Z>0.5)

          Question: What proportion of the z distribution is greater than z = 0.5?

          Steps
          1. On a PC: from the menu select STATISTICS > Distribution Plot
            On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
          2. Select Display Probability (Note: The default is the standard normal distribution)
          3. Select A specified X value
          4. Select Right tail
          5. For X value enter 0.5


          7.2.3 - Proportion 'In between'

          MinitabExpress  – Proportion Between Two z Values

          Question: What proportion of the standard normal distribution is between a z score of 0 and a z score of 1.75?

          Recall that the standard normal distribution (i.e., z distribution) has a mean of 0 and standard deviation of 1. This is the default normal distribution in Minitab Express.

          Steps
          1. On a PC: from the menu select STATISTICS > Distribution Plot
            On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
          2. Select Display Probability (Note: The default is the standard normal distribution)
          3. Select A specified X value
          4. Select Middle
          5. For X value 1 enter 0 and for X value 2 enter 1.75

            This should result in the following output:

            Standard normal distribution from Minitab Express showing that the area between 0 and 1.75 is 0.459941

            The proportion of the z distribution that is between 0 and 1.75 is 0.459941

            MinitabExpress  – Proportion Between Values on a Normal Distribution

            Question: Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph. What is the probability that a randomly selected vehicle will be going between 60 mph and 73 mph?

            Let's construct a normal distribution with a mean of 65 and standard deviation of 5 to find the area between 60 and 73.

            Steps
            1. On a PC: from the menu select STATISTICS > Distribution Plot
              On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
            2. Select Display Probability 
            3. For Distribution select Normal
            4. For Mean enter 65
            5. For Standard deviation enter 5
            6. Select A specified X value
            7. Select Middle
            8. For X value 1 enter 60 and for X value 2 enter 73

              This should result in the following output:

              Distribution Plot - Normal, Mean=65, StDev=5; In Between

              On a normal distribution with a mean of 65 and standard deviation of 5, the proportion between 60 and 73 is 0.786545

              In other words, 78.6545% of vehicles will be going between 60 mph and 73 mph. 
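
The area between two values is the difference of two cumulative probabilities. This Python sketch assumes the standard-library `statistics.NormalDist` class and uses illustrative variable names.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)

# P(60 < X < 73) = P(X <= 73) - P(X <= 60)
p_between = speeds.cdf(73) - speeds.cdf(60)
print(f"{p_between:.6f}")
```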


              7.2.3.1 - Video Example: Proportion Between z -2 and +2

              Question: What proportion of the z distribution is between -2 and 2?

              Steps
              1. On a PC: from the menu select STATISTICS > Distribution Plot
                On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
              2. Select Display Probability (Note: The default is the standard normal distribution)
              3. Select A specified X value
              4. Select Middle
              5. For X value 1 enter -2 and for X value 2 enter 2

              7.2.4 - Proportion "More Extreme Than"

              MinitabExpress

              Question: What proportion of the standard normal distribution is more extreme than a z value of ±2?

              Steps
              1. On a PC: from the menu select STATISTICS > Distribution Plot
                On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
              2. Select Display Probability (Note: The default is the standard normal distribution)
              3. Select A specified X value
              4. Select Equal tails
              5. For X value enter 2

              This should result in the following output:

              Minitab Express output showing a z distribution showing the area less than -2 and greater than +2
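
The "more extreme than" area adds the two tails together. The Python sketch below assumes the standard-library `statistics.NormalDist` class; the result should be about 0.0455, which is twice the single-tail area of 0.0227501 found for z = -2 earlier in this lesson.

```python
from statistics import NormalDist

z_dist = NormalDist()   # standard normal: mean 0, standard deviation 1
cutoff = 2

left_tail = z_dist.cdf(-cutoff)        # area below -2
right_tail = 1 - z_dist.cdf(cutoff)    # area above +2
p_extreme = left_tail + right_tail

print(f"{p_extreme:.6f}")
```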


              7.3 - Minitab Express: Finding Values Given Proportions

              MinitabExpress  – Finding the Middle 90% of a z Distribution

              Scenario: What z scores separate the middle 90% of the standard normal distribution from the outer 10%?

              We will construct a standard normal distribution with a mean of 0 and standard deviation of 1. We will find the points that separate the middle 0.90 from the outer 0.10. Note that the outer 0.10 will be equally split between the two tails:

              Steps
              1. On a PC: from the menu select STATISTICS > Distribution Plot
                On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
              2. Select Display Probability (Note: The default is the standard normal distribution)
              3. Select A specified probability
              4. Select Equal tails
              5. For Probability enter 0.10

              This should result in the following output:

              Minitab Express output: z distribution showing the z scores that separate the middle 90% from the outer 10% of the z distribution

              The z scores that separate the middle 90% of the distribution from the outer 10% are ±1.64485
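
Finding a value from a proportion is the inverse of finding a proportion from a value. As a sketch, Python's standard-library `statistics.NormalDist` provides an `inv_cdf` method that takes a left-tail area and returns the matching value; the variable names here are illustrative.

```python
from statistics import NormalDist

z_dist = NormalDist()   # mean 0, standard deviation 1
outer = 0.10            # total area outside the middle 90%

# The outer area is split equally: 0.05 in each tail
lower_z = z_dist.inv_cdf(outer / 2)        # 0.05 of the area lies below this point
upper_z = z_dist.inv_cdf(1 - outer / 2)    # 0.95 of the area lies below this point

print(f"{lower_z:.5f} {upper_z:.5f}")
```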

              MinitabExpress  – Finding the Top 10% of Vehicle Speeds

              Scenario: Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph. What speed separates the top 10% of vehicles?

              We will construct a normal distribution with a mean of 65 and standard deviation of 5. We will find the point that separates the top 0.10 from the bottom 0.90:

              Steps
              1. On a PC: from the menu select STATISTICS > Distribution Plot
                On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
              2. Select Display Probability 
              3. For Distribution select Normal
              4. For Mean enter 65
              5. For Standard deviation enter 5
              6. Select A specified probability
              7. Select Right tail
              8. For Probability enter 0.10

              Note: You could also select Left tail and enter 0.90 as the probability. 

              This should result in the following output:

              Distribution Plot - Normal, Mean=65, StDev=5; Given Probability

              The speed that separates the top 10% is 71.4078 mph.
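
The same cutoff can be sketched in Python. Assuming the standard-library `statistics.NormalDist` class, `inv_cdf` works with left-tail areas, so the top 10% corresponds to a left-tail area of 0.90, mirroring the note above about selecting Left tail.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)

# A right-tail area of 0.10 is the same point as a left-tail area of 0.90
cutoff_speed = speeds.inv_cdf(1 - 0.10)
print(f"{cutoff_speed:.4f} mph")
```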


              7.3.1 - Video Example: Middle 80% of the z Distribution

              Question: What z scores separate the middle 80% of the z distribution from the outer 20%?

              Steps
              1. On a PC: from the menu select STATISTICS > Distribution Plot
                On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
              2. Select Display Probability (Note: The default is the standard normal distribution)
              3. Select A specified probability
              4. Select Equal tails
              5. For Probability enter 0.20

              7.3.2 - Video Example: Middle 50% SATM

              Question: SAT-Math scores are normally distributed with a mean of 500 and standard deviation of 100. Find the SAT-Math scores that separate the middle 50% of the distribution from the outer 50% of the distribution.

              Steps
              1. On a PC: from the menu select STATISTICS > Distribution Plot
                On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
              2. Select Display Probability 
              3. For Distribution select Normal
              4. For Mean enter 500
              5. For Standard deviation enter 100
              6. Select A specified probability
              7. Select Equal tails
              8. For Probability enter 0.50

              7.3.3 - Video Example: Top 10% SATM

              Question: SAT-Math scores are normally distributed with a mean of 500 and standard deviation of 100. What score separates the top 10% from the bottom 90%?

              Steps
               
              1. On a PC: from the menu select STATISTICS > Distribution Plot
                On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
              2. Select Display Probability 
              3. For Distribution select Normal
              4. For Mean enter 500
              5. For Standard deviation enter 100
              6. Select A specified probability
              7. Select Right tail
              8. For Probability enter 0.10

              7.4 - Central Limit Theorem

              As we saw at the beginning of this lesson, many of the sampling distributions that you have constructed and worked with this semester are approximately normally distributed. The Central Limit Theorem states that if the sample size is sufficiently large then the sampling distribution will be approximately normally distributed for many frequently tested statistics, such as those that we have been working with in this course: one sample mean, one sample proportion, difference in two means, difference in two proportions, the slope of a simple linear regression model, and Pearson's r correlation. Over the next few lessons we will examine what constitutes a "sufficiently large" sample size. Essentially, it is determined by the point at which the sampling distribution becomes approximately normal.

              In practice, when the sampling distribution is approximately normal, we often construct confidence intervals and conduct hypothesis tests using the normal distribution (or t distributions, which you'll see next week) rather than bootstrapping or randomization procedures. This method is preferred by many because z scores are on a standard scale (i.e., mean of 0 and standard deviation of 1), which makes interpreting results more straightforward.
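
The Central Limit Theorem can be illustrated with a small simulation. The Python sketch below is not from the lesson: it assumes a right-skewed (exponential) population, a sample size of 50, and 2,000 repeated samples, all chosen only for illustration. If the sample means are approximately normal, roughly 68% should fall within one standard deviation of their center and roughly 95% within two.

```python
import random
from statistics import mean, stdev

random.seed(200)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed (exponential) population."""
    return mean(random.expovariate(1.0) for _ in range(n))

# Hypothetical settings: samples of size 50, repeated 2,000 times
means = [sample_mean(50) for _ in range(2000)]

center = mean(means)    # center of the simulated sampling distribution
spread = stdev(means)   # its standard deviation, i.e., the standard error

within_one = sum(abs(m - center) <= spread for m in means) / len(means)
within_two = sum(abs(m - center) <= 2 * spread for m in means) / len(means)
print(f"within 1 SE: {within_one:.3f}, within 2 SE: {within_two:.3f}")
```

Even though individual draws from this population are strongly skewed, the proportions for the sample means should land near the Empirical Rule values.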


              7.4.1 - Hypothesis Testing

              Five Step Hypothesis Testing Procedure

              In the remaining lessons, we will use the following five step hypothesis testing procedure. This is slightly different from the five step procedure that we used when conducting randomization tests. 

              1. Check assumptions and write hypotheses. The assumptions will vary depending on the test. In this lesson we'll be confirming that the sampling distribution is approximately normal by visually examining the sampling distribution. In later lessons you'll learn more objective assumptions. The null and alternative hypotheses will always be written in terms of population parameters; the null hypothesis will always contain the equality (i.e., \(=\)).
              2. Calculate the test statistic. Here, we'll be using the formula below for the general form of the test statistic.
              3. Determine the p-value. The p-value is the area under the standard normal distribution that is more extreme than the test statistic in the direction of the alternative hypothesis.
              4. Make a decision. If \(p \leq \alpha\) reject the null hypothesis. If \(p>\alpha\) fail to reject the null hypothesis.
              5. State a "real world" conclusion. Based on your decision in step 4, write a conclusion in terms of the original research question.

              General Form of a Test Statistic

              When using a standard normal distribution (i.e., z distribution), the test statistic is the standardized value that is the boundary of the p-value. Recall the formula for a z score: \(z=\frac{x-\overline x}{s}\). The formula for a test statistic will be similar. When conducting a hypothesis test the sampling distribution will be centered on the null parameter and the standard deviation is known as the standard error.

              General Form of a Test Statistic
              \(test\;statistic=\dfrac{sample\;statistic-null\;parameter}{standard\;error}\)

              This formula puts our observed sample statistic on a standard scale (e.g., z distribution). A z score tells us where a score lies on a normal distribution in standard deviation units. The test statistic tells us where our sample statistic falls on the sampling distribution in standard error units.
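
The general form translates directly into a one-line function. This is a minimal Python sketch; the function name `standardized_statistic` is an illustrative choice, and the numbers come from the mean quiz score example later in this lesson (sample mean 13.746, null parameter 14, standard error 0.142).

```python
def standardized_statistic(sample_statistic, null_parameter, standard_error):
    """(sample statistic - null parameter) / standard error."""
    return (sample_statistic - null_parameter) / standard_error

# Mean quiz score example: how many standard errors is 13.746 from 14?
quiz_z = standardized_statistic(13.746, 14, 0.142)
print(round(quiz_z, 3))  # -1.789
```

A negative result means the sample statistic falls below the null parameter.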


              7.4.1.1 - Video Example: Mean Body Temperature

              Research question: Is the mean body temperature in the population different from 98.6° Fahrenheit?


              Video Walkthrough

              7.4.1.2 - Video Example: Correlation Between Printer Price and PPM

              Research question: Is there a positive correlation in the population between the price of an ink jet printer and how many pages per minute (ppm) it prints?


              Video Walkthrough

              7.4.1.3 - Example: Proportion NFL Coin Toss Wins

              Research question: Is the proportion of NFL overtime coin tosses that are won different from 0.50?


              StatKey was used to construct a randomization distribution:

              Screenshot of StatKey randomization distribution

               

              Step 1: Check assumptions and write hypotheses

              From the given StatKey output, the sampling distribution is approximately normal.

              \(H_0\colon p=0.50\)

              \(H_a\colon p \ne 0.50\)

              Step 2: Calculate the test statistic

              \(test\;statistic=\dfrac{sample\;statistic-null\;parameter}{standard\;error}\)

              The sample statistic is the proportion in the original sample, 0.561. The null parameter is 0.50. And, the standard error is 0.024.

              \(test\;statistic=\dfrac{0.561-0.50}{0.024}=\dfrac{0.061}{0.024}=2.542\)

              Step 3: Determine the p value

              The p value will be the area on the z distribution that is more extreme than the test statistic of 2.542, in the direction of the alternative hypothesis. This is a two-tailed test:

              Minitab Express output of a z distribution, the area more extreme than z= 2.542 is highlighted

              The p value is the area in the left and right tails combined: \(p=0.0055110+0.0055110=0.011022\)

              Step 4: Make a decision

              The p value (0.011022) is less than the standard 0.05 alpha level, therefore we reject the null hypothesis.

              Step 5: State a "real world" conclusion

              There is evidence that the proportion of all NFL overtime coin tosses that are won is different from 0.50
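
Steps 3 and 4 of this example can be sketched in Python. The code assumes the standard-library `statistics.NormalDist` class in place of Minitab Express and starts from the rounded test statistic of 2.542 computed in Step 2; the variable names are illustrative.

```python
from statistics import NormalDist

z = 2.542       # test statistic from Step 2
alpha = 0.05    # standard alpha level

one_tail = NormalDist().cdf(-abs(z))   # area in one tail of the z distribution
p_value = 2 * one_tail                 # two-tailed test: both tails combined

decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"p = {p_value:.6f}: {decision}")
```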

               


              7.4.1.4 - Example: Proportion of Students Female

              Research question: Are more than 50% of all World Campus STAT 200 students female?

              Data were collected from a representative sample of 501 World Campus STAT 200 students. In that sample, 284 students were female and 217 were male. 


              StatKey was used to construct a sampling distribution using randomization methods:

              Randomization Dotplot of Proportion; Null hypothesis p=0.5

              Because this sampling distribution is approximately normal, we can find the p value by computing a standardized test statistic and using the z distribution.

              Step 1: Check assumptions and write hypotheses

              The assumption here is that the sampling distribution is approximately normal. From the given StatKey output, the sampling distribution is approximately normal. 

              \(H_0\colon p=0.50\)
              \(H_a\colon p>0.50\)

              Step 2: Calculate the test statistic

              \(test\;statistic=\dfrac{sample\;statistic-hypothesized\;parameter}{standard\;error}\)

              The sample statistic is \(\widehat p = 284/501 = 0.567\).

              The hypothesized parameter is the value from the hypotheses: \(p_0=0.50\).

              The standard error on the randomization distribution above is 0.022.

              \(test\;statistic=\dfrac{0.567-0.50}{0.022}=3.045\)

              Step 3: Determine the p value

              We can find the p value by constructing a standard normal distribution and finding the area under the curve that is more extreme than our observed test statistic of 3.045, in the direction of the alternative hypothesis. In other words, \(P(z>3.045)\):

              Distribution Plot - Normal, Mean=0, StDev=1

              Our p value is 0.0011634

              Step 4: Make a decision

              Our p value (0.0011634) is less than the standard 0.05 alpha level, therefore we reject the null hypothesis.

              Step 5: State a "real world" conclusion

              There is evidence that the proportion of all World Campus STAT 200 students who are female is greater than 0.50.
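
For a right-tailed test, only the upper tail contributes to the p value. The Python sketch below follows the example from the raw counts through the decision, rounding at the same points as the worked solution. It assumes the standard-library `statistics.NormalDist` class, and the standard error of 0.022 is taken from the StatKey output described above.

```python
from statistics import NormalDist

females, n = 284, 501
p_hat = round(females / n, 3)    # sample proportion, 0.567
p_null = 0.50                    # value from the null hypothesis
standard_error = 0.022           # read from the randomization distribution

z = round((p_hat - p_null) / standard_error, 3)   # 3.045
p_value = 1 - NormalDist().cdf(z)                 # right tail only: P(Z > z)

print(f"z = {z}, p = {p_value:.7f}")
```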


              7.4.1.5 - Example: Mean Quiz Score

              Research question: Is the mean quiz score different from 14 in the population?


              StatKey was used to construct a randomization distribution:

              Randomization distribution constructed in StatKey

              Step 1: Check assumptions and write hypotheses

              From the given StatKey output, the sampling distribution is approximately normal.

              \(H_0\colon \mu = 14\)

              \(H_a\colon \mu \ne 14\)

              Step 2: Calculate the test statistic

              \(test\;statistic=\dfrac{sample\;statistic-null\;parameter}{standard\;error}\)

              The sample statistic is the mean in the original sample, 13.746 points. The null parameter is 14 points. And, the standard error, 0.142, can be found on the StatKey output.

              \(test\;statistic=\dfrac{13.746-14}{0.142}=\dfrac{-0.254}{0.142}=-1.789\)

              Step 3: Determine the p value

              The p value will be the area on the z distribution that is more extreme than the test statistic of -1.789, in the direction of the alternative hypothesis:

              Minitab Express output showing the area more extreme than z = -1.789

              This was a two-tailed test. The p value is the area in the left and right tails combined: \(p=0.0368074+0.0368074=0.0736148\)

              Step 4: Make a decision

              The p value (0.0736148) is greater than the standard 0.05 alpha level, therefore we fail to reject the null hypothesis.

              Step 5: State a "real world" conclusion

              There is not evidence that the mean quiz score in the population is different from 14 points. 


              7.4.1.6 - Example: Difference in Mean Commute Times

              Research question: Do the mean commute times in Atlanta and St. Louis differ in the population? 


              StatKey was used to construct a randomization distribution:

              Screenshot of the randomization distribution constructed in StatKey

              Step 1: Check assumptions and write hypotheses

               From the given StatKey output, the sampling distribution is approximately normal.

              \(H_0: \mu_1-\mu_2=0\)

              \(H_a: \mu_1 - \mu_2 \ne 0\)

              Step 2: Compute the test statistic

              \(test\;statistic=\dfrac{sample\;statistic - null \; parameter}{standard \;error}\)

              The observed sample statistic is \(\overline x _1 - \overline x _2 = 7.14\). The null parameter is 0. The standard error, from the StatKey output, is 1.136.

              \(test\;statistic=\dfrac{7.14-0}{1.136}=6.285\)

              Step 3: Determine the p value

              The p value will be the area on the z distribution that is more extreme than the test statistic of 6.285, in the direction of the alternative hypothesis:

              Minitab Express output: Normal distribution showing the area more extreme than 6.285

              This was a two-tailed test. The area in the two tails combined is displayed as 0.000000. Theoretically, the p value cannot be exactly 0 because there is always some chance that a Type I error was committed. This p value would be written as p < 0.001.
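To see that the area is tiny rather than exactly zero, the two-tailed p value can be printed in scientific notation. This is a minimal sketch, assuming Python 3.8 or later with the standard-library `statistics.NormalDist` class; it is an illustration and not the Minitab Express procedure used in this lesson.

```python
from statistics import NormalDist

z = 6.285  # test statistic from Step 2

# Two-tailed p value: the area below -|z|, doubled to include the right tail
p = 2 * NormalDist().cdf(-abs(z))

print(f"{p:.3e}")   # scientific notation shows a small nonzero value
print(p < 0.001)    # supports reporting the result as p < 0.001
```

The printed value should be on the order of \(10^{-10}\), which displays as 0.000000 when only six decimal places are shown.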

              Step 4: Make a decision

              The p value is smaller than the standard 0.05 alpha level, therefore we reject the null hypothesis. 

              Step 5: State a "real world" conclusion

              There is evidence that the mean commute times in Atlanta and St. Louis are different in the population. 


              7.4.2 - Confidence Intervals

              Standard Normal Distribution Method

              The normal distribution can also be used to construct confidence intervals. You used this method when you first learned to construct confidence intervals using the standard error method. Recall the formula you used:

              95% Confidence Interval
              \(sample\;statistic \pm 2 (standard\;error)\)

              The 2 in this formula comes from the normal distribution. According to the 95% Rule, approximately 95% of a normal distribution falls within 2 standard deviations of the mean.

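              [Figure: The normal curve showing the empirical rule. Approximately 68% of the distribution falls within 1 standard deviation of the mean (\(\mu \pm 1\sigma\)), 95% within 2 standard deviations (\(\mu \pm 2\sigma\)), and 99.7% within 3 standard deviations (\(\mu \pm 3\sigma\)).]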
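The 95% Rule is an approximation, and a short calculation shows how close it is. This is a minimal sketch, assuming Python 3.8 or later with the standard-library `statistics.NormalDist` class; the variable names are illustrative only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Proportion of the distribution within 2 standard deviations of the mean
within_2 = z.cdf(2) - z.cdf(-2)

# Multiplier that captures exactly the middle 95% (2.5% in each tail)
exact_95 = z.inv_cdf(0.975)

print(round(within_2, 4), round(exact_95, 5))
```

The area within 2 standard deviations is about 0.9545, slightly more than 95%, and the multiplier that captures exactly 95% is about 1.96. This is why a multiplier of 2 works as a convenient rounded value.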

              Using the normal distribution, we can construct a confidence interval for any confidence level using the following general formula:

              General Form of a Confidence Interval
              sample statistic \(\pm\) \(z^*\) (standard error)
              \(z^*\) is the multiplier

              The \(z^*\) multiplier can be found by constructing a z distribution in Minitab Express.

               

              z* Multiplier for a 90% Confidence Interval

              What z* multiplier should be used to construct a 90% confidence interval?

              For a 90% confidence interval, we would find the z scores that separate the middle 90% of the z distribution from the outer 10% of the z distribution:

              Minitab Express output: Standard normal distribution plot (mean 0, standard deviation 1; density on the vertical axis) with an area of 0.05 shaded in each tail, beyond x = -1.64485 and x = 1.64485.

              For a 90% confidence interval, the \(z^*\) multiplier will be 1.64485.
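The same multiplier can be found with an inverse cumulative probability calculation. This is a minimal sketch, assuming Python 3.8 or later with the standard-library `statistics.NormalDist` class; the helper name `z_star` is hypothetical and simply mirrors the notation used in this lesson.

```python
from statistics import NormalDist


def z_star(confidence_level):
    """Return the z* multiplier for a confidence level such as 0.90."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be between 0 and 1")
    # Area in each tail outside the middle region
    tail = (1 - confidence_level) / 2
    # z score that has `tail` area to its right
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_star(level), 5))
```

For 90%, 95%, and 99% confidence, this should reproduce the multipliers 1.64485, 1.95996, and 2.57583 that appear in this lesson's Minitab Express output.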


              7.4.2.1 - Video Example: 98% CI for Mean Atlanta Commute Time

              Construct a 98% confidence interval to estimate the mean commute time in the population of all Atlanta residents.


              This example uses a dataset that is built in to StatKey: Confidence Interval for a Mean, Median, Std. The dataset is titled 'Atlanta Commute.'

              Video Walkthrough


              7.4.2.2 - Video Example: 90% CI for the Correlation between Height and Weight

              Construct a 90% confidence interval to estimate the correlation between height and weight in the population of all adult men.


              Video Walkthrough


              7.4.2.3 - Example: 99% CI for Proportion of Students Female

              Scenario: Data were collected from a representative sample of 501 World Campus STAT 200 students. In that sample, 284 students were female and 217 were male. Construct a 99% confidence interval to estimate the proportion of all World Campus students who are female. 


              StatKey was used to construct a sampling distribution using bootstrapping methods:

              StatKey Bootstrap Distribution Plot

              Because this distribution is approximately normal, we can approximate the sampling distribution using the z distribution. We will use the standard error, 0.022, from this distribution.

              The original sample statistic was \(\widehat p =\frac{284}{501}=0.567\). 

              We can find the \(z^*\) multiplier by constructing a z distribution to find the values that separate the middle 99% from the outer 1%:

              Minitab Express output: z distribution showing the middle 99% versus the outer 1%

              The \(z^*\) multiplier is 2.57583.

              Recall the general form of a confidence interval: sample statistic \(\pm\) \(z^*\) (standard error) where \(z^*\) is the multiplier. So in this case we have...

              \(0.567 \pm 2.57583 (0.022)\)

              \(0.567 \pm 0.057\)

              \([0.510, 0.624]\)

              I am 99% confident that the proportion of all World Campus students who are female is between 0.510 and 0.624.
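The arithmetic above follows the general form, sample statistic \(\pm\) \(z^*\) (standard error). This is a minimal sketch of that calculation, assuming Python 3.8 or later with the standard-library `statistics.NormalDist` class. The function name `z_interval` is hypothetical, and the standard error of 0.022 is taken from the StatKey bootstrap distribution rather than computed in the code.

```python
from statistics import NormalDist


def z_interval(sample_statistic, standard_error, confidence_level):
    """Return (lower, upper) for sample statistic +/- z* (standard error)."""
    # z* separates the middle `confidence_level` from the two outer tails
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z_star * standard_error
    return sample_statistic - margin, sample_statistic + margin


# Sample proportion, rounded to 3 decimals as in the lesson (0.567)
p_hat = round(284 / 501, 3)

# Standard error 0.022 comes from the StatKey bootstrap distribution
lower, upper = z_interval(p_hat, 0.022, 0.99)
print(f"[{lower:.3f}, {upper:.3f}]")
```

With these inputs the sketch should print [0.510, 0.624], matching the interval above.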


              7.4.2.4 - Example: 95% CI for Difference in Proportion of Smokers by Sex

              Construct a 95% confidence interval to estimate the difference between the proportion of all females who smoke and the proportion of all males who smoke.

              This dataset is built in to StatKey: Confidence Interval for Difference in Proportions. It is the Student Survey: Smoke by Gender dataset.

              Original Sample

              Group         Count   Sample Size   Proportion
              Female        16      169           0.095
              Male          27      193           0.140
              Female-Male   -11     n/a           -0.045

              StatKey was used to construct a bootstrap sampling distribution:

              StatKey: Bootstrap sampling distribution for the difference in the proportion of female and male smokers

              Because this distribution is approximately normal, we can approximate the sampling distribution using the z distribution. We will use the standard error, 0.033, from this distribution.

              The original sample statistic was \(\widehat p_f - \widehat p_m = \frac{16}{169} - \frac{27}{193} = -0.045\)

              We can find the \(z^*\) multiplier for a 95% confidence interval using Minitab Express. The multipliers are the values on a z distribution that separate the middle 95% from the outer 5%. (Note: You could apply the Empirical Rule and use a multiplier of 2, but the value found using Minitab Express will be more precise.)

              Minitab Express output: z distribution with the multipliers for a 95% confidence interval

              The \(z^*\) multiplier is 1.95996.

              Recall the general form of a confidence interval: sample statistic \(\pm\) \(z^*\) (standard error) where \(z^*\) is the multiplier. So in this case we have...

              \(-0.045 \pm 1.95996(0.033)\)

              \(-0.045 \pm 0.065\)

              \([-0.110,0.020]\) 

              I am 95% confident that the difference in the population between the proportion of females who smoke and the proportion of males who smoke (i.e., \(p_f-p_m\)) is between -0.110 and 0.020.
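The note about the Empirical Rule can be checked by computing the interval both ways. This is a minimal sketch, assuming Python 3.8 or later with the standard-library `statistics.NormalDist` class. The variable names are illustrative, the sample difference is rounded to three decimal places as in the lesson, and the standard error of 0.033 is taken from the StatKey bootstrap distribution rather than computed here.

```python
from statistics import NormalDist

# Sample proportions from the Original Sample table
p_f = 16 / 169   # females who smoke
p_m = 27 / 193   # males who smoke
diff = round(p_f - p_m, 3)  # rounded as in the lesson: -0.045

se = 0.033  # standard error read from the StatKey bootstrap distribution

# Multiplier of 2 from the 95% Rule versus the more precise z* value
multipliers = {"95% Rule": 2, "z*": NormalDist().inv_cdf(0.975)}

intervals = {}
for label, multiplier in multipliers.items():
    margin = multiplier * se
    intervals[label] = (diff - margin, diff + margin)
    print(f"{label}: [{diff - margin:.3f}, {diff + margin:.3f}]")
```

The multiplier of 2 should give approximately [-0.111, 0.021], while the more precise multiplier gives [-0.110, 0.020]. The two intervals differ only in the third decimal place.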


              7.5 - Lesson 7 Summary

              Objectives

              Upon successful completion of this lesson, you should be able to:

              • Describe the standard normal distribution
              • Determine the area under a normal distribution using Minitab Express
              • Determine the points that offset a given proportion of a normal distribution using Minitab Express
              • Summarize the Central Limit Theorem
              • Conduct a hypothesis test using a standardized test statistic
              • Construct a confidence interval using the standard form

              In this lesson we learned how to find the proportion under a normal distribution. We used the standard normal distribution to approximate the sampling distribution in order to find p values and to construct confidence intervals. In the next few lessons we will learn about the t distribution, which is similar to the standard normal distribution, and we'll focus more on how Minitab Express can be used to construct confidence intervals and conduct hypothesis tests using these common distributions.

