What is a mosaic plot?
A mosaic plot is a special type of stacked bar chart that shows percentages of data in groups. The plot is a graphical representation of a contingency table.
How are mosaic plots used?
Mosaic plots are used to show relationships and to provide a visual comparison of groups.
Mosaic plots show relationships
For two variables, the width of each column is proportional to the number of observations in the corresponding level of the variable plotted on the horizontal axis. Within each column, the height of each segment is proportional to the number of observations in each level of the second variable, within that level of the first variable.
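The geometry described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular plotting tool: the table, its level names ("A", "B", "yes", "no"), and the function name mosaic_geometry are all hypothetical.

```python
# Hypothetical two-way table of counts (made up for illustration).
# Outer keys: levels of the variable on the horizontal axis.
# Inner keys: levels of the second variable.
table = {
    "A": {"yes": 30, "no": 10},
    "B": {"yes": 30, "no": 30},
}


def mosaic_geometry(table):
    """Return column widths and segment heights as fractions of 1."""
    grand_total = sum(sum(counts.values()) for counts in table.values())
    widths = {}
    heights = {}
    for level, counts in table.items():
        column_total = sum(counts.values())
        # Column width: this level's share of all observations.
        widths[level] = column_total / grand_total
        # Segment height: each category's share within this column.
        heights[level] = {
            category: count / column_total for category, count in counts.items()
        }
    return widths, heights


widths, heights = mosaic_geometry(table)
for level in table:
    print(f"{level}: width = {widths[level]:.2f}")
    for category, height in heights[level].items():
        print(f"  {category}: height = {height:.2f}")
```

Because each tile's area is its width times its height, the area works out to that cell's share of all observations.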
Figure 1 shows a mosaic plot for data from a clinical trial. The goal is to compare the percentage of patients over 65 years old in the placebo and study drug treatment groups. Ideally, the clinical trial should have about the same percentage of elderly patients in each treatment group.
The mosaic plot in Figure 1 shows that the placebo group has a higher percentage of elderly patients than the study drug group. The team running the trial will need to decide whether the percentages are “close enough” to meet the goal. This mosaic plot also shows that, overall, elderly patients make up a smaller percentage of the trial than younger patients.
Mosaic plot examples
Example 1: Adding labels
Figure 2 expands the basic example by adding labels to the mosaic plot.
We can now see that the percentage of elderly patients differs by about 10 percentage points between the placebo and study drug groups. We can use a chi-square test of independence to evaluate whether this difference is statistically significant.
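As a rough sketch of what such a test computes, the Python below calculates the Pearson chi-square statistic for a 2x2 table using only the standard library. The counts are hypothetical, chosen to give a 10 percentage point difference; they are not the data behind Figure 2. The function name chi_square_2x2 is also just illustrative, and no continuity correction is applied.

```python
import math

# Hypothetical counts, NOT the data behind Figure 2.
# Rows: placebo, study drug. Columns: over 65, 65 or under.
observed = [
    [40, 60],  # placebo: 40% over 65
    [30, 70],  # study drug: 30% over 65
]


def chi_square_2x2(observed):
    """Pearson chi-square test of independence for a 2x2 table (no correction)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (count - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = chi_square_2x2(observed)
print(f"chi-square = {stat:.3f}, p-value = {p_value:.3f}")
```

The p-value shortcut used here is valid only for one degree of freedom, which is what a 2x2 table has. Larger tables need the general chi-square distribution, which a statistics package provides.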
Adding labels to mosaic plots makes sense when there are only a few cells. With a lot of cells, the smallest cells may not be labeled. When your data set is small, count labels can be used rather than percentages to highlight that your visualization is based on limited information.
Example 2: Two variables for categories
Mosaic plots can be extended to more than two variables. The graph in Figure 3 shows survival percentages for passengers on the Titanic. The categories are formed by the combinations of levels for ticket class and for the passenger’s sex.
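To illustrate how categories formed from two variables feed a mosaic plot, here is a minimal Python sketch. The records are a handful of made-up rows, not the actual Titanic data, and the function name survival_by_combination is hypothetical.

```python
from collections import defaultdict

# Made-up records for illustration only (NOT the real Titanic data).
# Each record: (ticket class, sex, survived).
records = [
    ("1st", "female", "yes"),
    ("1st", "female", "yes"),
    ("1st", "male", "yes"),
    ("1st", "male", "no"),
    ("3rd", "female", "yes"),
    ("3rd", "female", "no"),
    ("3rd", "male", "no"),
    ("3rd", "male", "no"),
    ("3rd", "male", "no"),
    ("3rd", "male", "yes"),
]


def survival_by_combination(records):
    """Summarize survival for each (ticket class, sex) combination."""
    counts = defaultdict(lambda: {"yes": 0, "no": 0})
    for ticket_class, sex, survived in records:
        counts[(ticket_class, sex)][survived] += 1
    total = len(records)
    summary = {}
    for key, cell in counts.items():
        n = cell["yes"] + cell["no"]
        summary[key] = {
            "width": n / total,  # share of all passengers in this combination
            "pct_survived": 100 * cell["yes"] / n,  # height of the "yes" segment
        }
    return summary


summary = survival_by_combination(records)
for (ticket_class, sex), stats in sorted(summary.items()):
    print(
        f"{ticket_class} / {sex}: width = {stats['width']:.2f}, "
        f"survived = {stats['pct_survived']:.0f}%"
    )
```

Each combination of levels becomes one column of the plot, so the number of columns is the product of the number of levels of the two category variables.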
Example 3: Percentages on the x-axis
Mosaic plots can show percentages on the x-axis. Figure 4 shows data for students in urban, suburban, and rural schools. Students were asked whether their primary goal was to have good grades, to be good at sports, or to be popular.
The mosaic plot shows that urban and suburban students have very similar goals, with nearly equal percentages for each goal. Rural students differ: they are nearly evenly split across the three goals. Alternatively, a mosaic plot can be labeled with counts instead of percentages.
Example 4: Using sorted order
Sometimes the categories in a mosaic plot have a natural sort order. The Titanic survival data in Example 2 is one example.
When the mosaic plot does not have a natural order, it can be hard to use for visual comparisons. The graph in Figure 5 shows the distribution of manufacturing locations within the categories of vehicles.
We see that all large cars are manufactured in the US, but it is hard to compare the percentages of compact and midsize cars made in the US.
We can improve the mosaic plot by sorting the vehicle categories in increasing order of percentage manufactured in the US. Figure 6 shows the revised mosaic plot. We can more easily compare the groups in this plot.
Now we can see that a higher percentage of compact cars than midsize cars is made in the US. The sorted chart also highlights that all large cars are made in the US.
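The sorting step can be sketched in Python as follows. The counts are hypothetical, chosen only to mirror the pattern described above; they are not the data behind Figures 5 and 6. The category "Small" and the helper pct_us are also assumptions made for the example.

```python
# Hypothetical counts by vehicle category (not the data behind Figures 5 and 6).
counts = {
    "Large": {"US": 12, "Other": 0},
    "Compact": {"US": 9, "Other": 21},
    "Midsize": {"US": 6, "Other": 24},
    "Small": {"US": 2, "Other": 18},
}


def pct_us(cell):
    """Percentage of vehicles in one category that are made in the US."""
    return 100 * cell["US"] / (cell["US"] + cell["Other"])


# Order the categories by increasing percentage made in the US,
# which is the order the columns would take in the sorted plot.
ordered = sorted(counts, key=lambda category: pct_us(counts[category]))

for category in ordered:
    print(f"{category}: {pct_us(counts[category]):.0f}% made in the US")
```

Sorting on the percentage rather than the raw count matters because columns differ in width: a category with few vehicles can still have the highest percentage made in the US.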