Beta-Binomial Distribution

What is the beta-binomial distribution?

The beta-binomial distribution models the number of successes in n Bernoulli trials that are independent given the probability of success. It extends the binomial distribution to cases where the probability of success is not a fixed constant, as it is in the binomial, but a random variable that follows a beta distribution. It is useful for modeling binomial-type counts whose variance is larger than a binomial distribution would predict.
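The two-stage structure described above can be sketched as a small simulation. This is only an illustration: the function name `sample_beta_binomial`, the parameter values (n = 10, alpha = 2, beta = 3), and the seed are assumptions chosen for the example, and the sketch assumes that one value of p is drawn for each set of n trials.

```python
import random

def sample_beta_binomial(n, alpha, beta, rng):
    """Draw one beta-binomial count by two-stage sampling (illustrative helper)."""
    # Stage 1: draw the success probability from Beta(alpha, beta).
    p = rng.betavariate(alpha, beta)
    # Stage 2: count successes in n Bernoulli(p) trials that all share this p.
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(0)  # fixed seed so the run is repeatable
n, alpha, beta = 10, 2.0, 3.0  # hypothetical values, not taken from the article
draws = [sample_beta_binomial(n, alpha, beta, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
theory_mean = n * alpha / (alpha + beta)  # mean of a beta-binomial
print(f"sample mean: {sample_mean:.2f}  (theoretical mean: {theory_mean:.2f})")
```

With enough draws, the sample mean should settle close to the theoretical mean, n times alpha / (alpha + beta).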

What are some examples of the beta-binomial distribution?

Some examples of the beta-binomial distribution are:

  • The number of heads in a sequence of coin tosses using multiple weighted (i.e., not fair) coins, where each coin’s probability of coming up heads might differ due to different weights.
  • The number of defective items in lots of fixed size, where the defect rate from lot to lot is random.
  • The number of successful basketball free throws made by twenty people with differing skill levels in a free throw competition.

When do you use a beta-binomial distribution?

Suppose you collect counts of successes from sets of n independent binary trials and notice that the counts have more variance than the binomial model predicts; that is, the data are overdispersed. Overdispersion can occur when the probability of success is not constant but instead varies randomly, for example from one set of trials to the next. In such cases, the beta-binomial is often a good model. For example, to model these data as a function of predictor variables, you could fit a generalized regression model and specify the beta-binomial as the response distribution. (If the data are underdispersed, meaning that the observed variance is smaller than the variance expected under the binomial model, consider using the hypergeometric distribution to model the data.)
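To make the idea of overdispersion concrete, the sketch below compares the variance of a binomial distribution with that of a beta-binomial distribution that has the same mean. It uses the standard closed-form variance formulas; the function names and the values n = 20, alpha = 8, beta = 2 are illustrative assumptions.

```python
def binomial_variance(n, p):
    """Variance of a Binomial(n, p) count."""
    return n * p * (1 - p)

def beta_binomial_variance(n, alpha, beta):
    """Variance of a Beta-Binomial(n, alpha, beta) count."""
    s = alpha + beta
    return n * alpha * beta * (s + n) / (s ** 2 * (s + 1))

n, alpha, beta = 20, 8.0, 2.0          # illustrative values
p_mean = alpha / (alpha + beta)        # mean success probability under the beta

var_binom = binomial_variance(n, p_mean)
var_bb = beta_binomial_variance(n, alpha, beta)

print(f"binomial variance:      {var_binom:.3f}")
print(f"beta-binomial variance: {var_bb:.3f}")
print(f"variance ratio:         {var_bb / var_binom:.3f}")
```

For any n greater than 1, the ratio is above 1. It equals 1 + (n - 1) / (alpha + beta + 1), so the extra variance shrinks as alpha + beta grows and the beta distribution concentrates around its mean.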

Characteristics of a beta-binomial distribution

The image below shows beta-binomial distributions for different values of the shape parameters a and b (written $\alpha$ and $\beta$ elsewhere in this article) when n = 10.
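For reference, the standard formulas behind these shapes are summarized here, with the shape parameters a and b written as $\alpha$ and $\beta$ and B denoting the beta function. For X ~ Beta-Binomial(n, $\alpha$, $\beta$) and k = 0, 1, ..., n:

$P(X = k) = \binom{n}{k} \frac{B(k+\alpha,\; n-k+\beta)}{B(\alpha,\beta)}$

$E[X] = \frac{n\alpha}{\alpha+\beta}$

$\mathrm{Var}(X) = \frac{n\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^2(\alpha+\beta+1)}$

As a quick check on these formulas, setting $\alpha$ = $\beta$ = 1 makes p uniform on (0, 1), and the mass function reduces to 1/(n + 1) for every k, a discrete uniform distribution on 0 through n.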

Example of a beta-binomial random variable

Suppose you conduct an informal poll of 20 people randomly selected from different geographic regions. The poll consists of a single statement that respondents either agree or disagree with (Yes or No), and a Yes response counts as a success. If the probability of answering Yes varies by region, then the total number of successes follows a beta-binomial distribution. In other words, the number of successes follows a binomial distribution with n = 20, but p is not fixed. Assume p ~ Beta(8, 2), where the notation ~ is read as “follows” or “is distributed as”; here, p follows a beta distribution with $\alpha$ = 8 and $\beta$ = 2. How would you find the probability of getting exactly 15 successes?

Let X be the number of Yes responses (successes) out of the 20 people polled. Then X ~ Beta-Binomial(n = 20, $\alpha$ = 8, $\beta$ = 2); equivalently, X given p is Binomial(20, p) with p ~ Beta(8, 2). We can use the beta-binomial mass function, in which B denotes the beta function, to find the probability that exactly 15 poll responses are Yes.

$P(X = 15) = \binom{20}{15} \frac{B(15+8,\;20-15+2)}{B(8,2)} = \binom{20}{15} \frac{B(23,\,7)}{B(8,\,2)} \approx 0.1022$

The probability of exactly 15 out of 20 poll respondents answering Yes, given $\alpha$ = 8 and $\beta$ = 2, is slightly over 10%.
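The calculation above can be reproduced with a few lines of Python. This is a minimal sketch that uses only the standard library; the helper names `log_beta` and `beta_binomial_pmf` are assumptions made for the example. The beta function is evaluated on the log scale through `math.lgamma` so that large arguments do not overflow.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, alpha, beta):
    """P(X = k) for X ~ Beta-Binomial(n, alpha, beta)."""
    log_ratio = log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    return comb(n, k) * exp(log_ratio)

# Poll example: n = 20 respondents, p ~ Beta(8, 2), exactly 15 Yes responses.
prob = beta_binomial_pmf(15, 20, 8, 2)
print(f"P(X = 15) = {prob:.4f}")  # prints P(X = 15) = 0.1022

# Sanity check: the probabilities over k = 0..20 should sum to 1.
total = sum(beta_binomial_pmf(k, 20, 8, 2) for k in range(21))
print(f"sum over k = {total:.6f}")
```

If SciPy is installed, `scipy.stats.betabinom.pmf(15, 20, 8, 2)` should give the same value and can serve as a cross-check.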