3.5.2 - Bubble Plots

A bubble plot can display three quantitative variables at once, and it can also show a categorical grouping variable. In the first example below, three variables are displayed: one on the \(x\)-axis, one on the \(y\)-axis, and one as the size of the bubbles. In Figures 2.75 and 2.76 in your textbook, four variables are displayed: quantitative variables are represented on the \(x\)-axis, on the \(y\)-axis, and as the size of the bubbles, and a categorical variable is represented by the color of the bubbles.

Minitab Express will not construct bubble plots; however, Minitab 18, Excel, R, and many other statistical programs will.
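To make the mapping from variables to plot features concrete, here is a minimal sketch in Python. It is an illustration only, not the method used to make the figures on this page. The records, the group labels, the `bubble_areas` and `make_bubble_plot` names, and the area range are all hypothetical. The plotting function assumes the matplotlib library is installed; in matplotlib, the `s` argument of `scatter` is the marker area, so the third variable is rescaled to an area rather than to a radius.

```python
# Hypothetical records: (x value, y value, size value, group label).
records = [
    (62, 120, 5, "A"),
    (65, 150, 2, "B"),
    (68, 160, 3, "A"),
    (70, 185, 0, "B"),
    (73, 190, 6, "A"),
]

def bubble_areas(values, min_area=20.0, max_area=400.0):
    """Linearly rescale values to marker areas between min_area and max_area."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so every bubble gets the same mid-sized area.
        return [(min_area + max_area) / 2] * len(values)
    return [min_area + (v - lo) / (hi - lo) * (max_area - min_area) for v in values]

x = [r[0] for r in records]                 # first quantitative variable
y = [r[1] for r in records]                 # second quantitative variable
areas = bubble_areas([r[2] for r in records])  # third quantitative variable
palette = {"A": "tab:blue", "B": "tab:orange"}  # one color per category
colors = [palette[r[3]] for r in records]

def make_bubble_plot(path):
    """Draw the bubble plot and save it to path (assumes matplotlib is installed)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without opening a window
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y, s=areas, c=colors, alpha=0.6)
    ax.set_xlabel("x variable")
    ax.set_ylabel("y variable")
    fig.savefig(path)
```

Calling `make_bubble_plot("bubbles.png")` would write the figure to a file. Partly transparent bubbles (`alpha=0.6`) are one way to keep overlapping points visible.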

Example: Height, Weight, & Days Exercising

The plot below was made using the statistical software R. Data were collected from World Campus students, who were asked for their heights, their weights, and how many days per week they exercised. Researchers believed that there would be a linear relationship between height and weight overall, but that the number of days exercised would also be a factor. In this plot, height (in inches) is on the \(x\)-axis, weight (in pounds) is on the \(y\)-axis, and the size of each bubble is determined by the number of days per week that the individual exercised.

Bubble Plot of Weight vs Height Regression

Larger bubbles signify more days per week exercising. From this plot, we can see that there is a positive linear relationship between height and weight. We can also see that many of the larger bubbles (i.e., people who exercise more) tend to fall below the line of best fit, while more of the smaller bubbles are above the line. In other words, people who exercise on more days per week tend to have negative residuals: for their height, they weigh less than predicted by a model that uses only height to predict weight.
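The residual reasoning above can be sketched numerically. The Python example below fits a least-squares line of weight on height and then compares the average residual for people who exercise often with the average for people who exercise rarely. The data values and the cutoff of four days per week are hypothetical, chosen only to illustrate the pattern; they are not the World Campus data.

```python
# Hypothetical data: height (inches), weight (pounds), exercise days per week.
heights = [62, 64, 66, 68, 70, 72]
weights = [125, 128, 155, 150, 185, 175]
days = [1, 5, 0, 6, 1, 5]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# Least-squares slope and intercept for predicting weight from height.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))
sxx = sum((x - mean_x) ** 2 for x in heights)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed weight minus predicted weight.
# A negative residual means the point lies below the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(heights, weights)]

# Hypothetical cutoff: 4 or more days per week counts as "frequent".
frequent = [r for r, d in zip(residuals, days) if d >= 4]
infrequent = [r for r, d in zip(residuals, days) if d < 4]
mean_frequent = sum(frequent) / len(frequent)
mean_infrequent = sum(infrequent) / len(infrequent)

print("mean residual, frequent exercisers:", round(mean_frequent, 2))
print("mean residual, infrequent exercisers:", round(mean_infrequent, 2))
```

In this made-up data set, the frequent exercisers have a negative mean residual and the infrequent exercisers have a positive one, which is the same pattern that the bubble sizes show visually in the plot.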

Example: Air Quality in New York

Bubble plot with groups of New York air quality

Source: http://t-redactyl.io/blog/2016/02/creating-plots-in-r-using-ggplot2-part-6-weighted-scatterplots.html

The bubble plot above displays data for three quantitative variables plus a categorical variable. The \(x\)-axis represents the day of the month and the \(y\)-axis represents a measure of air quality. The size of each bubble represents the wind speed on that day, and the color of each bubble represents the month.

It looks like the pink bubbles, particularly starting on the sixth of the month, tend to be lower than the blue and green bubbles. This suggests that air pollution tends to be lower in September than in July and August. The larger bubbles also tend to be lower on the plot, which suggests lower air pollution on windier days.
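The two visual impressions above (a lower pollution measure in one month, and a lower pollution measure on windier days) could be checked with simple summaries. The Python sketch below computes the mean pollution measure for each month and the Pearson correlation between wind speed and the pollution measure. All of the values are hypothetical and are not taken from the plotted data.

```python
from math import sqrt

# Hypothetical rows: (month, pollution measure, wind speed).
rows = [
    ("July", 80, 6.0), ("July", 60, 9.0), ("July", 40, 12.0),
    ("August", 85, 5.0), ("August", 55, 10.0), ("August", 45, 13.0),
    ("September", 50, 7.0), ("September", 30, 11.0), ("September", 20, 15.0),
]

# Mean pollution measure for each month (the color grouping in the plot).
by_month = {}
for month, pollution, _wind in rows:
    by_month.setdefault(month, []).append(pollution)
mean_by_month = {m: sum(v) / len(v) for m, v in by_month.items()}

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Correlation between wind speed (bubble size) and pollution (vertical position).
r = pearson_r([row[2] for row in rows], [row[1] for row in rows])

print(mean_by_month)
print("correlation between wind speed and pollution:", round(r, 2))
```

A negative correlation here corresponds to larger bubbles sitting lower on the plot. With real data, these summaries would supplement the plot rather than replace it.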