Heatmap

What is a heatmap?

A heatmap uses color to show the magnitude of a third variable, and changes in that variable, on a two-dimensional plot.

How are heatmaps used?

Heatmaps are used to help show patterns and changes. While they can be used to show changes over time, they are not designed for detailed analysis.

Excerpt from Statistical Thinking for Industrial Problem Solving, a free online statistics course

Heatmaps show relationships and changes

A heatmap is an arrangement of rectangles. The x-axis is often some measure of time but can be any variable with groupings. The y-axis is a variable that defines the categories in the data. Each rectangle is the same size, unlike a treemap. The rectangles are colored to show the magnitude of a third variable. Although initially used for temperatures, heatmaps are now used for many types of data.
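The arrangement described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular package builds a heatmap: the records, the airport names, and the `build_grid` helper are all hypothetical. It pivots (category, time, value) records into a rectangular grid in which every cell has the same size and holds the value that would drive its color.

```python
# Hypothetical long-format records: (category, time period, value).
records = [
    ("Airport A", 1, 41.0),
    ("Airport A", 2, 45.0),
    ("Airport B", 1, 68.0),
    ("Airport B", 2, 70.0),
]


def build_grid(records):
    """Pivot (row, column, value) records into a rectangular grid.

    Returns (row_labels, col_labels, grid), where grid[i][j] holds the value
    for row_labels[i] and col_labels[j], or None if no record exists.
    """
    row_labels = sorted({r for r, _, _ in records})
    col_labels = sorted({c for _, c, _ in records})
    row_index = {r: i for i, r in enumerate(row_labels)}
    col_index = {c: j for j, c in enumerate(col_labels)}
    grid = [[None] * len(col_labels) for _ in row_labels]
    for r, c, v in records:
        grid[row_index[r]][col_index[c]] = v
    return row_labels, col_labels, grid


rows, cols, grid = build_grid(records)
```

Each row of `grid` corresponds to one y-axis category and each column to one x-axis grouping. A plotting tool would then color each cell according to its value.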

Heatmaps are helpful for large data sets. A heatmap with a time axis can be used to view patterns and changes over time. Heatmap rectangles can be labeled with values of the color variable, which is useful only in cases where there are very few categories on the y-axis.

Figure 1 shows a heatmap of maximum temperatures at three US airports by week of the year. The legend at the right explains the colors of the rectangles. JMP software scales and colors the heatmap based on the data.

The graph in Figure 1 shows the basic idea of a heatmap. Not surprisingly, the coolest weeks are in the winter, and the warmest weeks are in the summer.

Figure 1: Heatmap of maximum temperatures at three US airports by week of the year

Figure 2 shows the same heatmap with numeric labels added for temperature.

Figure 2: Heatmap with labels

The software automatically colors the label text to be readable for the different colors of rectangles. You can imagine that labels might not be practical for a heatmap with many more rectangles. 
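One common way to keep labels readable is to choose black or white text based on the perceived brightness of the cell color. The sketch below illustrates that general heuristic. It is an assumption for illustration and not necessarily the rule JMP uses. The `label_color` function and the threshold of 128 are hypothetical choices.

```python
def label_color(rgb):
    """Pick a readable label color for a cell filled with `rgb`.

    `rgb` is a tuple of red, green, blue values in the range 0-255.
    Brightness uses the widely used luma weights 0.299, 0.587, 0.114.
    The cutoff of 128 is an illustrative assumption.
    """
    r, g, b = rgb
    brightness = 0.299 * r + 0.587 * g + 0.114 * b
    return "black" if brightness >= 128 else "white"


light_cell = label_color((255, 255, 200))  # pale yellow cell
dark_cell = label_color((128, 0, 0))       # dark red cell
```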

Heatmap examples

Example 1: Temperatures and airports

Figure 3 expands the basic heatmap by showing all airports in the data set. 

Figure 3: Heatmap with all airports

In Figure 3, we again see that the maximum temperature is cooler in winter and warmer in summer. Because the airports are ordered top-to-bottom by latitude, we see that the northern airports are generally cooler than southern airports across the entire year. We also see missing data represented by white cells. 

Compare this heatmap to Figure 1, which shows just three airports. JMP automatically scales and colors the heatmap based on the range of the variable used for coloring. Because of this, the three airports from Figure 1 appear in different colors in Figure 3, which includes all of the data.
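To see why the same temperature can get a different color in two heatmaps, consider a simple linear scaling of values onto a color ramp. The sketch below assumes such a linear min-max scaling. The actual scaling used by JMP may differ, and the temperature values are made up for illustration.

```python
def normalize(value, values):
    """Map `value` to a position in [0, 1] along the range of `values`.

    0 corresponds to the coolest color on the ramp and 1 to the warmest.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.5  # all values equal: place everything mid-scale
    return (value - lo) / (hi - lo)


# Hypothetical weekly maximum temperatures (degrees Fahrenheit).
three_airports = [50.0, 60.0, 70.0, 80.0, 90.0]
all_airports = three_airports + [10.0, 110.0]  # more airports, wider range

pos_small = normalize(90.0, three_airports)  # top of the ramp
pos_large = normalize(90.0, all_airports)    # lower on the ramp
```

In the smaller data set, 90 degrees sits at the very top of the color ramp. Once the wider range is included, the same value falls lower on the ramp and is drawn in a different color.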

Example 2: Population change over time

Heatmaps can be used for many types of data. The heatmap in Figure 4 shows the population change over a century for different US regions. 

Figure 4: Heatmap showing population change over time by region

This heatmap shows that some regions had little population change over the past century (relative to the scale in this particular heatmap). Alaska and Hawaii show almost no color change in the heatmap. The South Atlantic states had the largest population change over time.

Example 3: Airline delays with a large data set

Heatmaps are most helpful for seeing patterns in very large data sets. The graph in Figure 5 summarizes data from more than 29,000 flights. The heatmap shows the average arrival delay for six airlines. The rectangles are defined by the month on the y-axis and the day of the month on the x-axis.

Figure 5: Heatmap of airline arrival delays by month and day

From the heatmap colors, we see that the summer months and December have the highest average delays. We also see a few white cells, which indicate missing data. These cells fall in months with fewer than 31 days: the dates do not exist, so there are no flights on those days.
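The month-by-day layout and its empty cells can be sketched as follows. This is an illustration under stated assumptions: the flight records are made up, the helper names are hypothetical, and the year 2023 is chosen only because the calendar needs some year (the source does not state one).

```python
import calendar
from collections import defaultdict

# Hypothetical flight records: (month, day of month, arrival delay in minutes).
flights = [
    (2, 28, 12.0),
    (2, 28, 4.0),
    (12, 31, 30.0),
]


def mean_delay_grid(flights):
    """Return a 12 x 31 grid of mean delays; grid[m - 1][d - 1] is month m, day d.

    Cells with no flights stay None, which a heatmap would draw as empty.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (month, day) -> [sum, count]
    for month, day, delay in flights:
        totals[(month, day)][0] += delay
        totals[(month, day)][1] += 1
    grid = [[None] * 31 for _ in range(12)]
    for (month, day), (total, count) in totals.items():
        grid[month - 1][day - 1] = total / count
    return grid


def nonexistent_dates(year=2023):
    """List the (month, day) cells of a 12 x 31 grid that are not real dates."""
    return [
        (month, day)
        for month in range(1, 13)
        for day in range(calendar.monthrange(year, month)[1] + 1, 32)
    ]


grid = mean_delay_grid(flights)
gaps = nonexistent_dates()
```

Cells such as February 30 can never receive data, so they are always empty no matter how many flights are in the data set.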

Use caution when combining very large data sets. In some cases, there is another variable that can have a big impact on the heatmap. For the airline delay data, the heatmap differs by airline. The graphs in Figure 6 show heatmaps for Southwest and American airlines.

Figure 6: Heatmaps of delays for American and Southwest airlines

While the heatmaps for the two airlines still show more delays in the summer and in December, the two airlines have different overall patterns. Southwest has overall fewer delays than American. When building a heatmap for a large data set, think about whether another variable could have an impact on the heatmap.
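A quick way to check whether another variable matters is to compare the overall summary with the summary for each group. The sketch below uses made-up delay values and hypothetical airline names. It shows how a single combined average can hide a large difference between groups.

```python
from collections import defaultdict

# Hypothetical records: (airline, arrival delay in minutes).
delays = [
    ("Airline X", 5.0),
    ("Airline X", 7.0),
    ("Airline Y", 20.0),
    ("Airline Y", 24.0),
]


def mean_by_group(records):
    """Return the mean value for each group key in (key, value) records."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


overall = sum(delay for _, delay in delays) / len(delays)
per_airline = mean_by_group(delays)
```

Here the combined mean lies between the two group means and describes neither airline well. This is the kind of difference that separate heatmaps per group make visible.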

Example 4: Correlation matrix

Heatmaps are also useful when trying to understand relationships between many variables. JMP adds heatmaps for the pairwise correlations between variables to a scatter plot matrix. Figure 7 shows the two-way scatter plots between many variables for Australian tourism. The upper triangle of the matrix shows a heatmap of the correlations between pairs of variables. 

Figure 7: Correlation matrix

From this heatmap, we see that bed spaces has a negative correlation with persons employed by the hotel and with the average length of customer stay, and a positive correlation with all other variables.
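The values behind a correlation heatmap are pairwise correlation coefficients. The sketch below computes Pearson correlations for every pair of columns. It assumes the Pearson coefficient is the measure being colored, and the column names and values are hypothetical, not the tourism data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return (names, matrix) with matrix[i][j] = correlation of columns i and j."""
    names = list(columns)
    matrix = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical columns, chosen so the signs are easy to check by eye.
data = {
    "u": [1.0, 2.0, 3.0, 4.0],
    "v": [2.0, 4.0, 6.0, 8.0],  # rises as u rises
    "w": [8.0, 6.0, 4.0, 2.0],  # falls as u rises
}
names, matrix = correlation_matrix(data)
```

The matrix is symmetric with ones on the diagonal, which is why a display can show the correlations in one triangle only. Positive and negative values are typically mapped to opposite ends of the color scale.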