Statistics has many terms with subtle distinctions between them. One example is the difference between frequency and relative frequency. Relative frequencies have many uses, and one in particular is the relative frequency histogram. This is a type of graph that has connections to other topics in statistics and mathematical statistics.

## Definition

Histograms are statistical graphs that look like bar graphs. Typically, however, the term histogram is reserved for quantitative variables. The horizontal axis of a histogram is a number line containing classes or bins of uniform length. These bins are intervals of a number line where data can fall and can consist of a single number (typically for discrete data sets that are relatively small) or a range of values (for larger discrete data sets and continuous data).

For example, we may be interested in the distribution of scores on a 50-point quiz for a class of students. One possible way to construct the bins would be to have a different bin for every 10 points.

The vertical axis of a histogram represents the count, or frequency, of data values that fall in each of the bins. The higher the bar is, the more data values fall into that bin's range. To return to our example, if there are five students who scored more than 40 points on the quiz, then the bar corresponding to the 40 to 50 bin will be five units high.
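
The counting step can be sketched in a few lines of Python. The scores below are hypothetical, invented only so that five of the 25 students score more than 40 points. The helper `bin_index` and the choice of bins (0,10], (10,20], ..., (40,50] are assumptions made for this illustration, and the scores are assumed to be whole numbers.

```python
# Hypothetical scores for 25 students on the 50-point quiz (made up for illustration).
scores = [3, 8, 12, 15, 17, 19, 21, 22, 24, 25, 26, 27, 28, 29,
          31, 33, 34, 36, 38, 39, 41, 43, 45, 47, 50]

BIN_WIDTH = 10
NUM_BINS = 5


def bin_index(score):
    """Return the bin for an integer score, using bins (0,10], (10,20], ..., (40,50].

    A score of 0 is placed in the first bin (an assumption about the edge case).
    """
    if score <= 0:
        return 0
    return min((score - 1) // BIN_WIDTH, NUM_BINS - 1)


# Tally how many scores land in each bin; these are the bar heights.
counts = [0] * NUM_BINS
for s in scores:
    counts[bin_index(s)] += 1

for i, c in enumerate(counts):
    low, high = i * BIN_WIDTH, (i + 1) * BIN_WIDTH
    print(f"{low:>2} to {high}: {c}")
```

With this made-up data, the last entry of `counts` is 5, matching the bar of height five for the 40 to 50 bin.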

## Frequency Histogram Comparison

A relative frequency histogram is a minor modification of a typical frequency histogram. Rather than using a vertical axis for the count of data values that fall into a given bin, we use this axis to represent the overall proportion of data values that fall into this bin. Since 100% = 1, all bars must have a height from 0 to 1. Furthermore, the heights of all of the bars in our relative frequency histogram must sum to 1.

Thus, in the running example that we have been looking at, suppose that there are 25 students in our class and five have scored more than 40 points. Rather than constructing a bar of height five for this bin, we would have a bar of height 5/25 = 0.2.
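
As a rough sketch of this conversion, each count is divided by the total number of data values. The counts below are hypothetical, chosen so that the class has 25 students and the last bin holds five of them.

```python
import math

# Hypothetical bin counts for a class of 25 (bins of 10 points each).
counts = [2, 4, 8, 6, 5]

total = sum(counts)

# Each relative frequency is the bin's share of all data values.
relative = [c / total for c in counts]

print(relative[-1])  # height of the bar for the 40 to 50 bin

# Every height lies between 0 and 1, and the heights sum to 1
# (up to floating-point rounding).
assert all(0 <= r <= 1 for r in relative)
assert math.isclose(sum(relative), 1.0)
```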

Comparing a histogram to a relative frequency histogram, each with the same bins, we will notice something. The overall shape of the histograms will be identical, because every bar height is divided by the same number, the total count of data values. A relative frequency histogram does not emphasize the overall counts in each bin. Instead, this type of graph focuses on how the number of data values in the bin relates to the other bins. The way that it shows this relationship is by percentages of the total number of data values.
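
One way to see that the shape does not change is to draw both sets of bars as text, scaled so that the tallest bar has the same length. The function `text_bars` and the counts are hypothetical, introduced only for this sketch.

```python
def text_bars(heights, max_width=40):
    """Render each height as a row of '#' scaled so the tallest bar is max_width long."""
    tallest = max(heights)
    return ["#" * round(h / tallest * max_width) for h in heights]


# Hypothetical bin counts for 25 students, and the matching relative frequencies.
counts = [2, 4, 8, 6, 5]
relative = [c / sum(counts) for c in counts]

frequency_picture = text_bars(counts)
relative_picture = text_bars(relative)

# Only the scale on the vertical axis differs, so the drawings coincide.
print("\n".join(frequency_picture))
print(frequency_picture == relative_picture)
```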

## Probability Mass Functions

We may wonder what the point is in defining a relative frequency histogram. One key application pertains to discrete random variables where our bins are of width one and are centered about each nonnegative integer. In this case, we can define a piecewise function with values corresponding to the vertical heights of the bars in our relative frequency histogram.

This type of function is called a probability mass function. The reason for constructing the function in this way is that it has a direct connection to probability. Because each bar has width one, the area of a bar equals its height, so the total area of the bars for the values from *a* to *b* is the probability that the random variable has a value from *a* to *b*.
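
A minimal sketch of this idea, assuming a small made-up sample of a discrete variable: the relative frequencies act as an empirical stand-in for the probability mass function, and adding the bar heights from *a* to *b* gives the corresponding probability. The names `pmf` and `prob_between` are hypothetical helpers, and `Fraction` is used only to keep the arithmetic exact.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical observations of a discrete variable taking nonnegative integer values.
data = [0, 1, 1, 2, 2, 2, 3, 3, 4, 1, 0, 2, 1, 2, 3, 1, 2, 0, 1, 2]

tally = Counter(data)
n = len(data)


def pmf(k):
    """Relative frequency of the value k; values never observed get 0."""
    return Fraction(tally[k], n)


def prob_between(a, b):
    """Sum of bar heights for the integers a through b, inclusive.

    With bins of width one, this sum equals the total area of those bars.
    """
    return sum(pmf(k) for k in range(a, b + 1))


print(pmf(2))
print(prob_between(1, 3))
```

Because this function is built from observed data, it only estimates the probabilities of the underlying random variable.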

The connection between probability and area under the curve is one that shows up repeatedly in mathematical statistics. Using a probability mass function to model a relative frequency histogram is another such connection.