Treemap

What is a treemap?

A treemap shows the hierarchical structure of data using rectangles of different colors and sizes.

How are treemaps used?

Treemaps help you see the hierarchy of data and relationships between variables.


Excerpt from Statistical Thinking for Industrial Problem Solving, a free online statistics course

Treemaps show hierarchical data

A treemap is an arrangement of rectangles that shows the hierarchical structure of your data. Treemaps originated as a way to show the structure and size of files on a computer hard drive. They are now used in many situations, including some where the data have no hierarchy. Treemaps can show a large amount of data in a small space.

Treemaps are almost always computer-generated. Software tools use an algorithm to size each rectangle in proportion to a value: either the number of observations that fall within the rectangle or the value of a sizing variable, such as average sales.
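The idea of sizing rectangles in proportion to values can be sketched with the simple "slice-and-dice" layout, which splits a rectangle into side-by-side (or stacked) strips whose areas are proportional to the values. This is a minimal Python sketch under that assumption, not the algorithm any particular tool uses; the `Rect` class and `slice_and_dice` function are hypothetical names, and values are assumed to be nonnegative with a positive total. Many tools use other layouts, such as squarified layouts, that keep rectangles closer to square.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    label: str
    x: float
    y: float
    w: float
    h: float


def slice_and_dice(items, x, y, w, h, vertical=True):
    """Split the rectangle (x, y, w, h) into strips whose areas are
    proportional to the values in items, a list of (label, value) pairs.

    vertical=True places strips side by side along the x-axis;
    vertical=False stacks them along the y-axis.
    """
    total = sum(value for _, value in items)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    rects = []
    offset = 0.0
    for label, value in items:
        share = value / total  # fraction of the parent area for this item
        if vertical:
            rects.append(Rect(label, x + offset, y, w * share, h))
            offset += w * share
        else:
            rects.append(Rect(label, x, y + offset, w, h * share))
            offset += h * share
    return rects


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for r in slice_and_dice([("A", 1), ("B", 3)], 0, 0, 100, 50):
        print(r.label, r.x, r.w, r.w * r.h)
    # A 0.0 25.0 1250.0
    # B 25.0 75.0 3750.0
```

Because each strip's width is the item's share of the total, each rectangle's area is that same share of the parent area, which is the property a treemap relies on.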

Most tools have an option to color the rectangles and can add labels to them. Some tools let you drill down to see more detail, which is useful when a treemap has small rectangles whose labels are hard to read.

The simple treemap in Figure 1 shows the structure of sales for small, medium and large companies. The rectangles are sized by average sales (in US dollars) for each size of company and colored by profit per employee.
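Coloring rectangles by a continuous variable such as profit per employee amounts to mapping each value onto a color scale. A minimal sketch, assuming a linear blend between two RGB colors; the function name `color_for` and the blue-to-red choice are hypothetical and not the scale any particular tool uses:

```python
def color_for(value, vmin, vmax, low=(0, 0, 255), high=(255, 0, 0)):
    """Map value linearly onto an RGB color between low and high.

    vmin maps to low, vmax maps to high, and values outside
    [vmin, vmax] are clamped to the nearest end of the scale.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)  # position on the scale, 0..1
    t = min(1.0, max(0.0, t))           # clamp out-of-range values
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))


if __name__ == "__main__":
    print(color_for(5, 0, 10))  # (128, 0, 128), halfway between blue and red
```

A tool can compute `vmin` and `vmax` from the data so that the lowest and highest values get the two end colors; a diverging scale centered on zero is another common choice when values can be negative.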

Figure 1: Treemap of company size and profits per employee

The graph in Figure 1 shows the basic idea of a treemap. From the colors of the rectangles, you can conclude that medium-sized companies have the highest profit per employee. From the sizes of the rectangles, you can conclude that large companies have the highest average sales. However, treemaps can also visualize more complex data, and variables with complex hierarchies are good candidates for a treemap.

Treemap examples

Example 1: Category and hierarchy treemap

Expanding on the basic example, the treemap in Figure 2 shows two categories of companies and the structure of sales for small, medium and large companies in each category. The rectangles are sized by average sales in US dollars for each category-size combination and colored by profit/employee.
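Sizing rectangles by average sales for each category-size combination first requires aggregating the raw rows into one mean per combination. A minimal sketch; the function name `mean_by_group`, the field names (`type`, `size`, `sales`) and the numbers are hypothetical placeholders, not the data behind Figure 2:

```python
from collections import defaultdict


def mean_by_group(rows, keys, value):
    """Return {group: mean of row[value]}, where group is the tuple of
    the row's values for the given keys (one entry per combination)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = tuple(row[k] for k in keys)
        sums[group] += row[value]
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sums}


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    rows = [
        {"type": "Computer", "size": "small", "sales": 100.0},
        {"type": "Computer", "size": "small", "sales": 300.0},
        {"type": "Pharmaceutical", "size": "large", "sales": 900.0},
    ]
    print(mean_by_group(rows, ["type", "size"], "sales"))
    # {('Computer', 'small'): 200.0, ('Pharmaceutical', 'large'): 900.0}
```

Each key of the result corresponds to one rectangle, and its mean becomes the value used to size that rectangle.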

Figure 2: Treemap with two categories of companies

From the treemap in Figure 2, we see that profit/employee is highest for small pharmaceutical companies. From the rectangle sizes, we see that large companies in both categories have the highest sales. We also see that medium-sized computer companies have a negative profit/employee. The treemap cannot label the smallest rectangle, which represents small pharmaceutical companies. This situation is common with larger data sets that produce many small rectangles, and tools that provide “hover help” or interactive drill-down are useful here.

The original conclusion that large companies have the highest average sales still holds. However, once the category variable is included, the original conclusion that medium-sized companies have the highest profit/employee no longer holds: small pharmaceutical companies have the highest.

Example 2: Categories and hierarchy for larger sets of variables with many levels

Treemaps are more useful for larger sets of variables with many levels. Figure 3 shows financial data similar to the data in Figure 2, but with more categories of companies: the variable for type of company now has six levels, and a second variable gives the size of the company. The rectangles are sized by average sales in US dollars, colored by type of company, and grouped by company size.
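Grouping rectangles by company size can be sketched as a two-level slice-and-dice layout: one vertical strip per group, with width proportional to the group's total, and the group's members stacked inside that strip. A minimal sketch, assuming positive values; the function names `split` and `nested_treemap` and the sample rows are hypothetical, and real tools may use different (for example, squarified) layouts at each level:

```python
from collections import defaultdict


def split(items, x, y, w, h, vertical):
    """Slice (x, y, w, h) into strips with areas proportional to the
    values in items, a list of (label, value) pairs."""
    total = sum(v for _, v in items)
    out, offset = [], 0.0
    for label, v in items:
        share = v / total
        if vertical:  # strips side by side
            out.append((label, x + offset, y, w * share, h))
            offset += w * share
        else:         # strips stacked top to bottom
            out.append((label, x, y + offset, w, h * share))
            offset += h * share
    return out


def nested_treemap(rows, w, h):
    """rows: (group, label, value) triples. Returns one
    (group, label, x, y, width, height) tuple per leaf rectangle."""
    groups = defaultdict(list)
    for group, label, value in rows:
        groups[group].append((label, value))
    # Level 1: one vertical strip per group, sized by the group's total.
    outer = split([(g, sum(v for _, v in kids)) for g, kids in groups.items()],
                  0.0, 0.0, w, h, vertical=True)
    leaves = []
    for g, gx, gy, gw, gh in outer:
        # Level 2: stack the group's members inside its strip.
        for label, x, y, cw, ch in split(groups[g], gx, gy, gw, gh, vertical=False):
            leaves.append((g, label, x, y, cw, ch))
    return leaves


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    rows = [("Large", "Oil", 3.0), ("Large", "Soap", 1.0), ("Small", "Oil", 4.0)]
    for leaf in nested_treemap(rows, 8.0, 1.0):
        print(leaf)
```

Alternating the slicing direction at each level is what makes the groups visible as blocks, so a reader can compare types of companies within one size of company.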

Figure 3: Treemap with many variables and categories

This treemap shows that oil companies have the highest average sales at every level of company size. Beverage companies have the lowest average sales among large companies, but not among medium or small companies. Among small companies, soap companies have the lowest average sales; among medium companies, aerospace companies do.

Example 3: Treemap without a hierarchy

Treemaps can also be helpful for data without a hierarchy. The treemap in Figure 4 shows total sleep time in hours for many species of animals. The rectangles are sized by lifespan and colored by hours of sleep.

Figure 4: Treemap of data without a hierarchy

The colors in Figure 4 show that bats have the longest total sleep time. The rectangle sizes show that the little brown bat has a longer lifespan than the big brown bat, and that humans (man) have the longest lifespan of all the species in the treemap.

Example 4: Y-axis category

Previous examples show the categories, or hierarchies, on the x-axis. The example in Figure 5 shows the category hierarchy on the y-axis. The data describe car models from the mid-1990s, and the rectangles are sized by highway miles per gallon (MPG) for each model. The y-axis category variable shows whether the car was manufactured in the US.

Figure 5: Treemap with categories on the y-axis

The treemap is helpful for seeing general patterns. For example, are the orange rectangles generally larger than the blue rectangles? The Geo Metro has the highest MPG of all the cars. Hover help would make this easier to see, because it reveals the MPG for each rectangle. Note that the software has automatically ordered the cars alphabetically.

Example 5: Two categories 

Treemaps are helpful when multiple categories define the structure. The treemap in Figure 6 shows average arrival delays, with airline (six airlines) and day of the week as the category variables. The rectangles are both sized and colored by average arrival delay.

Figure 6: Treemap with multiple categories

For all airlines, the average arrival delay varies across days of the week. If you wanted to choose the airline with the lowest overall delays, the treemap shows that you should choose either Southwest or Delta. For these two airlines, the average delay is less than eight minutes in the early part of the week and higher on Thursdays and Fridays, and the highest average delay is less than 11 minutes. In contrast, the lowest average delay for American is 11 minutes. Across the entire treemap, the lowest average delay is for Southwest on Tuesday, and the highest is for American on Friday.