Types of Graphs

Below are several types of graphs that can be used in exploratory data analysis (EDA). Each entry gives the number of variables the graph uses, a description of its purpose, and an example figure.

Histograms

  • Number of variables: 1.
  • Displays the shape or distribution of data; may help identify outliers.
Figure 1: Histogram
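
As a rough illustration of what a histogram displays, the sketch below counts how many values fall into equal-width bins. The function name `histogram_counts`, the bin width, and the data are hypothetical and chosen only for this example; real plotting software usually picks bins automatically.

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0):
    """Return (lower bin edge, count) pairs for bins [edge, edge + bin_width)."""
    counts = Counter(int((v - start) // bin_width) for v in values)
    first, last = min(counts), max(counts)
    # Keep empty bins between the first and last so gaps stay visible
    return [(start + k * bin_width, counts.get(k, 0)) for k in range(first, last + 1)]

# Hypothetical measurements, made up for illustration
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]
bins = histogram_counts(data, bin_width=2)

# Print a text version of the histogram, one row per bin
for edge, count in bins:
    print(f"{edge:>2} to {edge + 2:<2} {'#' * count}")
```

The run of empty bins before the last one is how a possible outlier (here, the value 14) tends to show up in a histogram.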

Side-by-Side Histograms

  • Number of variables: 2.
  • Displays the shapes or distributions for groups of data; may help identify outliers.
Figure 2: Side-by-side histograms with two variables

Bar Charts

  • Number of variables: 1.
  • Displays the frequency count of values for a categorical variable; may be vertical (as shown below in Figure 3) or horizontal.
Figure 3: Bar chart displaying count

Grouped Bar Charts

  • Number of variables: 2 or more, depending on how many variables are used to define groups.
  • Displays bar charts for groups defined by another variable. Grouped bar charts have a separate chart within each level of the grouping variable.
Figure 4: Grouped bar charts

Stacked Bar Charts

  • Number of variables: 2 or more, depending on how many variables are used to define groups.
  • Displays bar charts for groups defined by another variable. Stacked bar charts have a single bar for each level of the grouping variable. Colors or patterns for counts of another variable are stacked in each bar.
Figure 5: Stacked bar chart displaying a single bar for each level of the grouping variable
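
Grouped and stacked bar charts both start from the same table of counts: one count per combination of the grouping variable and the second variable. A minimal sketch of that table, assuming hypothetical survey data invented for this example:

```python
from collections import Counter

# Hypothetical survey rows: (grouping variable, second categorical variable)
rows = [("North", "Yes"), ("North", "No"), ("North", "Yes"),
        ("South", "No"), ("South", "No"), ("South", "Yes")]

counts = Counter(rows)
groups = sorted({group for group, _ in rows})
levels = sorted({level for _, level in rows})

# table[group][level] is the height of one bar (grouped chart)
# or of one colored segment (stacked chart)
table = {g: {lv: counts[(g, lv)] for lv in levels} for g in groups}

# The full height of each stacked bar is the total count for that group
totals = {g: sum(table[g].values()) for g in groups}
```

A grouped chart draws the entries of `table` side by side within each group, while a stacked chart piles them into one bar whose height is the matching entry of `totals`.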

Pareto Charts

  • Number of variables: 1.
  • Displays ordered frequency counts for a variable. Useful for highlighting the “vital few.” A type of bar chart, Pareto charts often include a cumulative percent curve.
Figure 6: Pareto chart showing ordered frequency counts for a variable
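
The two ingredients of a Pareto chart, ordered counts and a cumulative percent curve, can be sketched as follows. The defect categories are hypothetical, and breaking ties alphabetically is an assumption made here only to keep the order deterministic.

```python
from collections import Counter

# Hypothetical defect log, made up for illustration
defects = ["scratch", "dent", "scratch", "chip", "scratch",
           "dent", "scratch", "stain", "scratch", "dent"]

# Order categories by descending count (ties broken alphabetically)
ordered = sorted(Counter(defects).items(), key=lambda kv: (-kv[1], kv[0]))
total = sum(count for _, count in ordered)

pareto = []
running = 0
for category, count in ordered:
    running += count
    # Each entry: category, bar height, cumulative percent at that bar
    pareto.append((category, count, 100 * running / total))
```

In this made-up data the first two categories account for 80 percent of all defects, which is the kind of "vital few" pattern the chart is meant to expose.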

Packed Bar Charts

  • Number of variables: 1.
  • Displays ordered frequency counts for a variable. Used instead of a Pareto chart, especially when there are many categories. Useful for highlighting the “vital few.”
Figure 7: Packed bar chart showing ordered frequency counts for a variable across many categories

Mosaic Plots

  • Number of variables: 2 or more.
  • Shows possible relationships between categorical variables. Useful for finding data errors, such as mistyped categories. A special type of stacked bar chart that shows more than one variable on the x-axis.
Figure 8: Mosaic plot showing possible relationships between categorical variables
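
One way to think about a two-variable mosaic plot is as tiles whose widths and heights are proportions. The sketch below assumes the usual layout, with column widths from the x variable and tile heights from the y variable within each column; the data and names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical inspection results: (production line, outcome)
rows = ([("Line 1", "pass")] * 3 + [("Line 1", "fail")] * 3 +
        [("Line 2", "pass")] * 3 + [("Line 2", "fail")] * 1)

x_counts = Counter(x for x, _ in rows)
cell_counts = Counter(rows)
n = len(rows)

# Column width: share of all rows in each x category
widths = {x: Fraction(c, n) for x, c in x_counts.items()}
# Tile height: share of each y category within its x category
heights = {(x, y): Fraction(c, x_counts[x]) for (x, y), c in cell_counts.items()}
# Tile area = width * height = share of all rows in that (x, y) cell
areas = {(x, y): widths[x] * h for (x, y), h in heights.items()}
```

Because tile areas are shares of the whole data set, a mistyped category tends to appear as an unexpectedly thin column or sliver.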

Treemaps

  • Number of variables: 2 or more.
  • Shows possible relationships between variables. A special type of stacked bar chart that colors, orders, and sizes by different variables.
Figure 9: Treemap showing relationships between variables

Box Plots

  • Number of variables: 1.
  • Shows the distribution of data. The box marks the 25th percentile, median (50th percentile), and 75th percentile. When there are no outliers, the whiskers extend to the minimum and maximum; otherwise, under a common convention, they stop at the most extreme points within 1.5 times the interquartile range of the box, and outliers are plotted beyond them. Used for finding data errors and exploring one variable.
Figure 10: Box plot
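
The parts of a box plot can be computed directly from the data. The sketch below is one possible version: it assumes the "inclusive" quartile method of Python's `statistics.quantiles` and the common 1.5 × IQR rule for whiskers, and software packages differ on both choices. The function name `box_plot_summary` and the data are hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR convention."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme points that are still inside the fences
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Hypothetical measurements, made up for illustration
summary = box_plot_summary([2, 4, 5, 6, 7, 8, 9, 11, 25])
```

Here the value 25 lies above the upper fence, so the upper whisker stops at 11 and 25 would be drawn as a separate outlier point.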

Side-by-Side Box Plots

  • Number of variables: 2 or more, depending on how many variables are used to define groups.
  • Displays box plots for groups defined by another variable. Used for finding data errors and exploring two or more variables.
Figure 11: Side-by-side box plots used for exploring two or more variables

Normal Quantile Plots

  • Number of variables: 1.
  • Helps assess whether it is reasonable to assume that a variable has a normal distribution. Points that fall close to a straight line support the assumption.
Figure 12: Normal quantile plot used to assess whether a variable has a normal distribution
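
The points on a normal quantile plot pair each sorted data value with the quantile expected from a standard normal distribution. A minimal sketch, assuming the plotting position (i - 0.5) / n, which is one of several conventions in use; the function name and data are hypothetical.

```python
from statistics import NormalDist

def normal_quantile_points(values):
    """Return (theoretical normal quantile, observed value) pairs."""
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting position (i - 0.5) / n is one common convention, assumed here
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, data))

# Hypothetical measurements, made up for illustration
points = normal_quantile_points([4.9, 5.6, 5.1, 4.4, 5.0])
```

Plotting the observed values against the theoretical quantiles gives the graph; strong curvature or isolated points far from the line suggest departures from normality.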

Line Graphs

  • Number of variables: 2.
  • Shows changes over time. The x-axis must have values ordered by time. Line graphs, also called line charts or run charts, are useful for finding outliers.
Figure 13: Line graph showing changes over time
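
Because the x-axis must be ordered by time, data recorded out of order needs sorting before the line is drawn. A small sketch with hypothetical readings:

```python
from datetime import date

# Hypothetical readings recorded out of order, made up for illustration
readings = [
    (date(2024, 3, 1), 12.1),
    (date(2024, 1, 1), 11.8),
    (date(2024, 4, 1), 19.5),
    (date(2024, 2, 1), 12.0),
]

# Order by time before plotting so the line connects consecutive observations
ordered = sorted(readings, key=lambda reading: reading[0])
x_values = [d.isoformat() for d, _ in ordered]
y_values = [value for _, value in ordered]
```

Once ordered, the jump in the last reading stands out against the otherwise flat series, which is how a line graph helps flag possible outliers.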

Line Graphs with Categories

  • Number of variables: 2 or more, depending on how many variables are used to define groups.
  • Displays multiple line graphs for groups defined by another variable. Used for understanding changes over time for multiple variables and for finding outliers.
Figure 14: Line graph with categories used to understand how multiple variables change over time

Scatter Plots

  • Number of variables: 2 or more, depending on how many variables are used to define groups for colors and markers.
  • Shows a possible relationship between two variables and helps identify outliers. Adding colors or markers for other variables can help with EDA. Adding reference lines or specification limits can make outliers easier to spot.
Figure 15: Scatter plot showing a possible relationship between two variables

Scatter Plot Matrix

  • Number of variables: Many.
  • Shows possible relationships between multiple variables by looking at all two-way combinations. Additional graphs can be added: histograms for each variable to identify outliers, density ellipses for each scatter plot to identify multidimensional outliers, and heatmaps of correlations to clarify possible relationships.
Figure 16: Scatter plot matrix showing possible relationships between multiple variables
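
The "all two-way combinations" idea, and the correlations that a companion heatmap would color, can be sketched with the standard library. The column names and values are hypothetical, and Pearson correlation is assumed here as the measure of relationship.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical columns, made up for illustration
columns = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [1, 3, 2, 5, 4],
}

# One entry per pair of variables, matching one panel of the matrix
correlations = {
    (a, b): pearson(columns[a], columns[b])
    for a, b in combinations(columns, 2)
}
```

With k variables there are k(k - 1)/2 distinct pairs, so the matrix grows quickly as variables are added.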

Pie Charts

  • Number of variables: 1 or more.
  • Displays part-to-whole relationships for a variable. Multiple pie charts, one for each category of another variable, are more useful than a single pie chart. For a single variable, a bar chart makes it easier to distinguish small differences in values.
Figure 17: Pie chart showing part-to-whole relationships for a variable

Heatmaps

  • Number of variables: 2 or more.
  • Shows possible relationships between variables. Most often used for data that changes over time. Uses color to explore relationships between variables.
Figure 18: Heatmap showing possible relationships between variables

Stem-and-Leaf Plots

  • Number of variables: 1.
  • Shows the shape of data and identifies outliers. More widely used before computers were available; histograms are used more often now.
Figure 19: Stem-and-leaf plot showing the shape of data and identifying outliers
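
A stem-and-leaf plot is simple enough to build as text. The sketch below assumes non-negative integers, with tens as stems and ones digits as leaves; the function name `stem_and_leaf` and the data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot (tens as stems, ones as leaves)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the data stay visible
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {digits}".rstrip())
    return lines

# Hypothetical measurements, made up for illustration
lines = stem_and_leaf([12, 15, 21, 23, 23, 27, 31, 34, 58])
for line in lines:
    print(line)
```

The empty stem before the last row plays the same role as an empty bin in a histogram: it separates the value 58 from the rest of the data.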