2.3.1 - Poisson Sampling

Poisson sampling assumes that the random mechanism generating the data can be described by a Poisson distribution. It is useful for modeling counts of events that occur randomly over a fixed period of time or in a fixed region of space. It is often appropriate when the probability of any particular event happening is very small while the number of opportunities for it to happen is very large. (This is much like a binomial distribution in which the success probability π of a trial is very small but the number of trials n is very large, with the product nπ held fixed. This is known as the limiting condition.)
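The limiting condition can be checked numerically. The following is a minimal Python sketch (not one of the course's SAS or R programs); the rate `lam = 2.0` and the values of `n` are illustrative assumptions, chosen only to show the binomial probabilities approaching the Poisson ones as n grows while nπ stays fixed.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return lam**x * math.exp(-lam) / math.factorial(x)

lam = 2.0  # illustrative rate (an assumption, not taken from the text)
gaps = {}
for n in (10, 100, 10_000):
    p = lam / n  # success probability shrinks so that n * p stays equal to lam
    # Largest pointwise difference between the two pmfs over x = 0..10
    gaps[n] = max(
        abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11)
    )
    print(f"n = {n:>6}, p = {p:.4f}, largest pmf gap = {gaps[n]:.6f}")
```

The printed gap should shrink as n increases, which is the sense in which the Poisson distribution approximates a binomial with many trials and a small success probability.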

For example, consider the World Cup soccer data, where we collect data on the frequency of the number of goals scored by teams during the first-round matches of the 2002 World Cup. Another example is rolling a die during a fixed two-minute time period. Similarly, you could count the number of emails you receive between 4pm and 5pm on a Friday, or the number of students accessing the STAT 504 course website on a Saturday.

Let X be the number of goals scored in the matches of the first round of the World Cup. Then X ∼ Poisson(λ):

$P(X=x)=\dfrac{\lambda^x e^{-\lambda}}{x!}\qquad x=0,1,2,\ldots$

Here λ is the parameter describing the rate, that is, the mean of the distribution, e.g., the average number of goals scored during the first-round matches. Once you know λ, you know everything there is to know about this distribution. x! stands for x factorial, i.e., x! = 1*2*3*...*x, with 0! = 1. P(X=x), or P(x), is the probability that a randomly chosen team scores x goals in a game, e.g.:

$P(X=0)=\dfrac{\lambda^0 e^{-\lambda}}{0!}=\dfrac{1\cdot e^{-\lambda} }{1}=e^{-\lambda}$

How the average rate λ = 1.38 is obtained is shown below. With this value, $P(X=0)=e^{-1.38}=\dfrac{1}{e^{1.38}}=0.252$ is the probability that a randomly chosen team will score 0 goals in a first-round match of the World Cup. For the remaining probabilities, see the table at the end of this page.
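The probability formula can be evaluated directly. Below is a small Python sketch using only the standard library; the helper name `poisson_pmf` is hypothetical and is not part of the course's soccer.R or soccer.sas files.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * exp(-lam) / x! for X ~ Poisson(lam)."""
    return lam**x * math.exp(-lam) / math.factorial(x)

lam_hat = 1.38  # rounded average rate quoted in the text
p0 = poisson_pmf(0, lam_hat)
print(f"P(X = 0) = {p0:.3f}")  # 0.252

# The probabilities over all x = 0, 1, 2, ... add up to 1
total = sum(poisson_pmf(x, lam_hat) for x in range(50))
print(f"sum over x = 0..49: {total:.6f}")
```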

The Poisson Model (distribution) Assumptions

1. Independence: Events must be independent (e.g. the number of goals scored by a team should not make the number of goals scored by another team more or less likely.)
2. Homogeneity: The mean number of goals scored is assumed to be the same for all teams.
3. Time period (or space) must be fixed

Recall that the mean and variance of a Poisson distribution are the same, i.e., E(X) = Var(X) = λ. However, in practice, the observed variance is usually larger than the theoretical variance and, in the case of the Poisson, larger than the mean. This is known as overdispersion, an important concept that occurs with discrete data. We assumed that each team has the same probability of scoring goals in each match of the first round, but it is more realistic to assume that these probabilities vary with the teams' skills, the day the matches were played (because of the weather), maybe even the order of the matches, etc. Then we may observe more variation in the scoring than the Poisson model predicts. Analyses assuming binomial, Poisson, or multinomial distributions are sometimes invalid because of overdispersion. We will see more on this later when we study logistic regression and Poisson regression models.
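A rough first check for overdispersion is to compare the sample mean with the sample variance. The Python sketch below does this for the observed first-round counts tabulated at the end of this page (goals mapped to number of teams). It is only an informal comparison, assuming the usual n - 1 divisor for the sample variance, and it does not test whether the difference is more than sampling noise.

```python
# Observed counts from the summary table on this page: goals -> number of teams
observed = {0: 23, 1: 37, 2: 20, 3: 11, 4: 2, 5: 1, 6: 0, 7: 0, 8: 1}

n = sum(observed.values())
mean = sum(x * f for x, f in observed.items()) / n
# Sample variance with the n - 1 divisor
var = sum(f * (x - mean) ** 2 for x, f in observed.items()) / (n - 1)

print(f"n = {n}, mean = {mean:.2f}, variance = {var:.2f}")
print(f"variance / mean = {var / mean:.2f}")  # 1 would match E(X) = Var(X)
```

For these data the sample variance (about 1.66) is somewhat larger than the sample mean (about 1.38), which is the direction overdispersion would produce.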

Let us see how we can do some basic calculations with the World Cup Soccer example under the Poisson model.

QUESTION: What is the most likely mean number of goals scored; that is, what is the most likely value of the unknown parameter λ given the data x?

We can answer this question by relying on the basic principles of statistical inference, e.g., point estimation, confidence intervals, and/or hypothesis testing.

Recall from Lesson 1 on Likelihood and MLE: The most common point estimate is the "maximum likelihood estimate" (MLE) which has nice statistical properties, and it is the most likely value of the parameter given the data; it is the value that maximizes the likelihood function.
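For the Poisson model the maximization can be done in closed form. The short derivation below is a sketch that assumes n independent observations $x_1,\ldots,x_n$ from the same Poisson(λ) distribution, with $L(\lambda)$ denoting the likelihood and $l(\lambda)$ the log-likelihood:

\begin{align}
L(\lambda) &= \prod_{i=1}^{n}\dfrac{\lambda^{x_i}e^{-\lambda}}{x_i!}\\
l(\lambda) &= \log L(\lambda)=\left(\sum_{i=1}^{n}x_i\right)\log\lambda-n\lambda-\sum_{i=1}^{n}\log(x_i!)\\
\dfrac{dl}{d\lambda} &= \dfrac{1}{\lambda}\sum_{i=1}^{n}x_i-n=0\quad\Longrightarrow\quad\hat{\lambda}=\dfrac{1}{n}\sum_{i=1}^{n}x_i=\bar{x}
\end{align}

The second derivative, $-\frac{1}{\lambda^2}\sum_{i=1}^{n}x_i$, is negative whenever at least one $x_i$ is positive, so this stationary point is a maximum.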

The MLE of λ for the Poisson distribution is the sample mean, which is the sample counterpart of the expectation of the distribution. From the computation below, for our example this is approximately:

\begin{align}
\bar{x} &= \dfrac{1}{95}\sum\limits_{i=1}^{95} x_i\\
&= \dfrac{1}{95} (0\times 23+1\times 37+2\times 20+3\times 11+4\times 2+5\times 1+6\times 0+ 7\times 0+ 8 \times 1)\\
&= \dfrac{131}{95}\\
&\approx 1.38\\
\end{align}

Thus, $\hat{\lambda}=1.38$ goals per first-round match.
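As a sanity check on this estimate, the log-likelihood can be maximized numerically and compared with the sample mean. The Python sketch below uses a simple grid search; the grid range (0.5 to 3.0) and step (0.001) are arbitrary choices made for illustration, and the observed counts are the ones used in the computation above.

```python
import math

# Observed counts from this page: goals -> number of teams
observed = {0: 23, 1: 37, 2: 20, 3: 11, 4: 2, 5: 1, 6: 0, 7: 0, 8: 1}
n = sum(observed.values())
total_goals = sum(x * f for x, f in observed.items())

def log_likelihood(lam):
    """Poisson log-likelihood of the grouped data at rate lam."""
    return sum(
        f * (x * math.log(lam) - lam - math.lgamma(x + 1))  # lgamma(x + 1) = log(x!)
        for x, f in observed.items()
    )

grid = [k / 1000 for k in range(500, 3001)]  # candidate rates 0.500, 0.501, ..., 3.000
lam_grid = max(grid, key=log_likelihood)     # grid point with the highest log-likelihood
lam_mle = total_goals / n                    # closed-form MLE: the sample mean

print(f"grid maximizer = {lam_grid:.3f}, sample mean = {lam_mle:.4f}")
```

The grid maximizer should agree with the sample mean 131/95 up to the grid resolution.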

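An approximate 95% confidence interval for λ, based on the normal approximation $\hat{\lambda}\pm 1.96\sqrt{\hat{\lambda}/n}$, is

$[1.38-1.96\sqrt{1.38/95},1.38+1.96\sqrt{1.38/95}]=[1.14,1.62]$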

and we are 95% confident that the mean number of goals scored by a team during the first-round match-ups is somewhere between 1.14 and 1.62. Now that we have an estimate of the mean number of goals, we can calculate the expected probabilities of a randomly chosen team scoring 0, 1, 2, 3, etc. goals, as well as the expected frequencies (or counts). For example, under this Poisson model with $\hat{\lambda}=1.38$, the expected probability of scoring 2 goals is $\hat{\pi}_2=p_2=P(X=2)=\frac{\hat{\lambda}^{2}e^{-\hat{\lambda}}}{2!}\approx 0.239$ and the expected frequency is $np_2=95\,p_2\approx 22.75$ (see the 3rd row of the table below). These two figures are computed from the unrounded estimate $\hat{\lambda}=131/95$; plugging in the rounded values 1.38 and 0.239 changes the last digit slightly.
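The interval and the expected count for 2 goals can be reproduced in a few lines. This Python sketch assumes the normal-approximation interval with standard error $\sqrt{\hat{\lambda}/n}$ and uses the unrounded estimate 131/95; the variable names are illustrative.

```python
import math

n = 95
lam_hat = 131 / 95  # unrounded MLE (about 1.38)
z = 1.96            # standard normal quantile for a 95% interval

se = math.sqrt(lam_hat / n)  # estimated standard error of the sample mean
lower, upper = lam_hat - z * se, lam_hat + z * se
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")  # [1.14, 1.62]

p2 = lam_hat**2 * math.exp(-lam_hat) / math.factorial(2)
print(f"P(X = 2) = {p2:.3f}, expected count = {n * p2:.2f}")  # 0.239, 22.75
```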

Example - World Cup Soccer

Here is a link to the World Cup Soccer data (text file).

You can easily do these calculations by hand, in Excel, or in any other software package you are using. Here they are in SAS and R. Here is the SAS program: soccer.sas.

For the complete output, see the course SAS page.

Please note: Most PROC FREQ options in SAS do NOT work for one-way tables, so some coding is needed. Here is a link to this code in R: soccer.R.

The R file itself contains line comments explaining the code.

Here is a summary of these probabilities:

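| Number of goals | Observed counts | Expected probabilities under assumed Poisson model | Expected counts |
|---|---|---|---|
| 0 | 23 | 0.252 | 23.93 |
| 1 | 37 | 0.347 | 32.99 |
| 2 | 20 | 0.239 | 22.75 |
| 3 | 11 | 0.110 | 10.46 |
| 4 | 2 | 0.038 | 3.61 |
| 5 | 1 | 0.010 | 0.99 |
| 6 | 0 | 0.002 | 0.23 |
| 7 | 0 | 0.0005 | 0.05 |
| 8 | 1 | 0.00008 | 0.01 |
| Total | 95 | | |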
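The expected probabilities and counts in this summary can be regenerated from the observed counts alone. The following Python sketch is not the course's soccer.R or soccer.sas; it assumes the unrounded estimate 131/95 is plugged into the Poisson formula, so individual entries may differ from the table in the last printed digit.

```python
import math

# Observed counts from the summary: goals -> number of teams
observed = {0: 23, 1: 37, 2: 20, 3: 11, 4: 2, 5: 1, 6: 0, 7: 0, 8: 1}

n = sum(observed.values())
lam_hat = sum(x * f for x, f in observed.items()) / n  # sample mean, 131/95

rows = []
print(f"{'goals':>5} {'observed':>8} {'prob':>8} {'expected':>8}")
for x, obs in observed.items():
    prob = lam_hat**x * math.exp(-lam_hat) / math.factorial(x)  # P(X = x)
    rows.append((x, obs, prob, n * prob))                       # expected count = n * P(X = x)
    print(f"{x:>5} {obs:>8} {prob:>8.5f} {n * prob:>8.2f}")
```

The expected counts do not add up to exactly 95 because the Poisson model also puts a small probability on more than 8 goals.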

[Figure: the summary above in graphical form.]

For additional general calculations with the Poisson distribution, see the following programs in SAS and R. Here is the SAS program for calculating Poisson probabilities: PoissonCal.sas. Here is the R code for calculating Poisson probabilities: PoissonCal.R.

Binomial and multinomial sampling data can be analyzed similarly.