Frequency Distributions

A frequency distribution is one of the most common tools used to describe a single population. It is a tabulation of the frequency of each value (or range of values), and it is usually presented graphically.

There is a wide variety of ways to illustrate frequency distributions, including histograms, relative frequency histograms, density histograms, and cumulative frequency distributions. Histograms show the frequency of elements that occur within a certain range of values, while cumulative distributions show the frequency of elements that occur at or below a certain value.

Histograms

Frequency Histogram

A graphical representation of a single dataset, tallied into classes.
Frequency defined as the number of values that fall into each class.
Histogram consists of a series of rectangles whose widths are defined by the limits of the classes, and whose heights are determined by the frequency in each interval.
Histogram depicts many attributes of the data, including location, spread, and symmetry.

No rigid set of rules determines the number of classes or the class interval.
Between 5 and 20 classes suitable for most datasets.
Equal-sized class widths are found by dividing the range by the number of classes.
Formal guide by which the number of classes can be derived is the formula: K = 1 + 3.3 * log10(n)
where K is the number of classes, n is the number of observations, and log10 is the base-10 logarithm.
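
The guideline above can be sketched in a few lines of Python. This is a minimal illustration, not part of the Data Library workflow: the function names sturges_classes and class_width are hypothetical, the logarithm is assumed to be base 10 (which reproduces the K = 9.7840 of the worked example later in this section), and rounding K to the nearest integer is one reasonable choice among several.

```python
import math

def sturges_classes(n):
    """Suggested number of classes, K = 1 + 3.3 * log10(n), rounded to the nearest integer."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return round(1 + 3.3 * math.log10(n))

def class_width(minimum, maximum, k):
    """Equal class width: the range divided by the number of classes."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return (maximum - minimum) / k

# Values taken from the worked example later in this section.
k = sturges_classes(459)
print("classes:", k)
print("unrounded class width:", class_width(-401.584, 154.3319, k))
```

In practice the unrounded width is then rounded up to a convenient whole number so that the classes cover the full range of the data.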

Relative Frequency Histogram

Relative frequency defined as the fraction of times the value occurs, or the frequency of value(s) ÷ number of observations in the set.
Relative frequencies usually of more interest than the absolute frequencies.
Relative frequency histogram constructed by assigning the relative frequencies as heights of the rectangles.
Sum of all relative frequencies in a dataset is 1.
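
As a small illustration of the definition above, the following Python sketch converts class frequencies into relative frequencies and confirms that they sum to 1. The function name relative_frequencies and the counts are hypothetical and chosen only for the example.

```python
def relative_frequencies(counts):
    """Divide each class frequency by the total number of observations."""
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one observation is required")
    return [c / total for c in counts]

# Hypothetical class frequencies, for illustration only.
counts = [2, 5, 9, 4]
rel = relative_frequencies(counts)
print(rel)
print("sum of relative frequencies:", sum(rel))
```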

Density Histogram

Similar to frequency histogram except heights of rectangles are calculated by dividing relative frequency by class width.
Resulting rectangle heights called densities, vertical scale called density scale.
Noteworthy property: (class width * density) = relative frequency.
Total area of all rectangles equals 1.
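
The relationship between density, class width, and relative frequency can be checked numerically. The sketch below assumes equal class widths, as in the rest of this section; the function name densities, the counts, and the width are hypothetical.

```python
def densities(counts, width):
    """Density of each class: relative frequency divided by the class width."""
    total = sum(counts)
    if total == 0 or width <= 0:
        raise ValueError("need at least one observation and a positive class width")
    return [c / total / width for c in counts]

# Hypothetical class frequencies and class width, for illustration only.
counts = [3, 7, 6, 4]
width = 5.0
dens = densities(counts, width)

# Each rectangle's area (class width * density) is its relative frequency,
# so the areas add up to 1.
area = sum(d * width for d in dens)
print(dens)
print("total area:", area)
```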

Histogram Shapes

Unimodal: Rises to single peak, then declines.
Bimodal: Has two distinct peaks.
Multimodal: More than two peaks.
Descriptions of skew may also be applied to histograms (see the Measures of Central Tendency section).

Example: Construct a frequency, relative frequency, and density histogram of net heat flux data at 130° E, 20° N for January 1960 to March 1998.

Locate Dataset and Variable	Select the "Datasets by Category" link in the blue banner on the Data Library page. Click on the "Cloud Characteristics and Radiation Budget" link. Select the CAYAN dataset. Click on the "net heat flux" link under the Datasets and Variables subheading.
Select Temporal and Spatial Domains	Click on the "Data Selection" link in the function bar. Enter the text 130E, 20N and Jan 1960 to Mar 1998 in the appropriate text boxes. Press the Restrict Ranges button and then the Stop Selecting button.
Calculate Number of Classes and Class Interval	Select the "Filters" link in the function bar. Choose the Maximum over "T" command. This operation computes the maximum heat flux over the time grid T. The value is located under the Expert Mode text box in bold: 154.3319 W/m². Remember this value. Click on the rightmost link prior to the "T 0.0 maxover" box in the source bar. This operation undoes the maxover command. Again, select the "Filters" link in the function bar. Choose the Minimum over "T" command. This operation computes the minimum heat flux over the time grid T. Again, the value is located under the Expert Mode text box in bold: -401.584 W/m². Remember this value. Click on the rightmost link prior to the "T 0.0 minover" box in the source bar. This operation undoes the minover command. Calculate the range by subtracting the minimum value from the maximum value: 154.3319 - (-401.584) = 555.92, which may be rounded to 556. Scroll down the page and find the month variable under the Grids subheading. Note how many data points are contained in the grid by finding N=___. You should see the number 459. Use the following formula to estimate the number of classes: K = 1 + 3.3 * log10(n) = 1 + 3.3 * log10(459) = 9.7840, which may be rounded to 10. Calculate the class width by dividing the range (556) by the number of classes (10): 556 ÷ 10 = 55.6. The class interval should be a whole number, so round it up to 56. With a class interval of 56 and the first class beginning at -405, the ten classes include all values in the dataset: -405 + 56 * 10 = 155.
Generate Histogram	Click on the "Expert Mode" text box in the function bar. Type the following command under the text already there: DATA -405 155 56 RANGESTEP distrib1D Press the OK button. The RANGESTEP command defines the range of heat flux values to consider, as well as the spacing to use. The range -405 to 155 includes the entire set of values. The spacing is 56. The distrib1D command calculates the frequencies within each class that are used to create the histogram.
View Histogram	To view the histogram, choose the viewer marked with colors. Figure: Heat Flux Frequencies at 130E, 20N for January 1960 to March 1998. The histogram is unimodal and negatively skewed.
Generate Relative Frequency Histogram	Click on the rightmost link in the blue source bar to exit the viewer. Click in the Expert Mode text box in the function bar. Type the following command under the text already there: 459 div /long_name (Relative Frequency) def Press the OK button. The div command divides each frequency by 459, the total number of observations. This transforms the set of frequencies into relative frequencies. The second command changes the variable name from Frequency to Relative Frequency.
View Relative Frequency Histogram	To view the relative frequency histogram, choose the viewer marked with colors. Figure: Heat Flux Relative Frequencies at 130E, 20N for January 1960 to March 1998. The relative frequency histogram looks similar to the frequency histogram, except that the rectangle heights are represented by different values.
Generate Density Histogram	Click on the rightmost link in the blue source bar to exit the viewer. Click in the Expert Mode text box in the function bar. Type the following command under the text already there: 56 div /long_name (Density) def Press the OK button. The div command divides each relative frequency by 56, the class width. This transforms the set of relative frequencies into densities. The second command changes the variable name from Relative Frequency to Density.
View Density Histogram	To view the density histogram, choose the viewer marked with colors. Figure: Heat Flux Densities at 130E, 20N for January 1960 to March 1998. The density histogram looks similar to the frequency and relative frequency histograms, except that the rectangle heights are represented by different values. Recall that in a density histogram, the total area of all rectangles is 1.
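
For readers who want to check the arithmetic outside the Data Library, the three histograms can be sketched in plain Python using the same class limits as the example (first class beginning at -405, class width 56, ten classes). This is only an illustrative stand-in, not the implementation of RANGESTEP or distrib1D: the tally function is hypothetical, the treatment of values that fall exactly on a class limit is an assumption, and the sample values are placeholders rather than the CAYAN heat flux series.

```python
import math

def tally(values, start, width, k):
    """Count how many values fall into each of k equal-width classes.

    Class i covers [start + i*width, start + (i+1)*width); the upper limit
    of the last class is included. This edge convention is an assumption.
    """
    end = start + width * k
    counts = [0] * k
    for v in values:
        if not start <= v <= end:
            raise ValueError(f"{v} lies outside the class limits")
        i = min(math.floor((v - start) / width), k - 1)
        counts[i] += 1
    return counts

# Placeholder values for illustration only; NOT the CAYAN heat flux series.
# Only the first and last entries (the minimum and maximum) come from the text.
sample = [-401.584, -120.0, -30.5, 10.0, 42.0, 80.0, 154.3319]

start, width, k = -405, 56, 10
frequency = tally(sample, start, width, k)   # frequency histogram heights
n = sum(frequency)
relative = [f / n for f in frequency]        # relative frequency heights
density = [r / width for r in relative]      # density heights

for i, (f, r, d) in enumerate(zip(frequency, relative, density)):
    lo = start + i * width
    print(f"{lo:5d} to {lo + width:5d}: {f:2d}  {r:.3f}  {d:.5f}")
```

Dividing by n and then by the class width mirrors the two div steps in the example, which divide by 459 and then by 56.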

Cumulative Frequency Distributions

Cumulative frequency distributions contain all information present in histograms, plus the following:

Allows user to easily estimate frequencies over several class intervals.
Provides better estimates of probabilities since there is no arbitrary division of data into classes.

Cumulative distribution function is plotted with cumulative probabilities on the vertical axis and data values on the horizontal axis.
Cumulative frequencies are obtained by the formula F = m / (n + 1), where m is the rank of the value when the series is sorted in increasing order of magnitude (m = 1 for the smallest value) and n is the number of terms in the series.
F gives the probability that a randomly chosen value will not exceed the data value from which F was calculated.
Probability of any exact value occurring is zero for a continuous variable.
Concave downward (upward) cumulative frequency distributions indicative of positively (negatively) skewed data.
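
The formula F = m / (n + 1) can be sketched as follows, taking m as the rank of each value after sorting in increasing order. The function name cumulative_frequencies and the sample series are hypothetical and serve only to illustrate the calculation.

```python
def cumulative_frequencies(values):
    """Pair each value with F = m / (n + 1), where m is its rank in increasing order."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, m / (n + 1)) for m, x in enumerate(ordered, start=1)]

# Hypothetical series, for illustration only.
series = [3.0, 1.0, 2.0]
for value, f in cumulative_frequencies(series):
    print(value, f)
```

Because the denominator is n + 1 rather than n, F stays strictly between 0 and 1 for every observed value.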

Example: Construct a cumulative frequency distribution of net heat flux data at 130° E, 20° N for January 1960 to March 1998.

Locate Dataset and Variable	*NOTE: This example uses the same dataset and variable as the previous example. Select the "Datasets by Category" link in the blue banner on the Data Library page. Click on the "Cloud Characteristics and Radiation Budget" link. Select the CAYAN dataset. Click on the "net heat flux" link under the Datasets and Variables subheading.
Select Temporal and Spatial Domains	Click on the "Data Selection" link in the function bar. Enter the text 130E, 20N and Jan 1960 to Mar 1998 in the appropriate text boxes. Press the Restrict Ranges button and then the Stop Selecting button.
Calculate Maximum and Minimum Values	Select the "Filters" link in the function bar. Choose the Maximum over "T" command. This operation computes the maximum heat flux over the time grid T. The value is located under the Expert Mode text box in bold: 154.3319 W/m². Remember this value. Click on the rightmost link prior to the "T 0.0 maxover" box in the source bar. This operation undoes the maxover command. Again, select the "Filters" link in the function bar. Choose the Minimum over "T" command. This operation computes the minimum heat flux over the time grid T. Again, the value is located under the Expert Mode text box in bold: -401.584 W/m². Remember this value. Click on the rightmost link prior to the "T 0.0 minover" box in the source bar. This operation undoes the minover command.
Generate Cumulative Frequency Distribution	Click on the "Expert Mode" text box in the function bar. Type the following command under the text already there: DATA -405 155 1 RANGESTEP integrateddistrib1D Press the OK button. The RANGESTEP command defines the range of heat flux values to consider, as well as the spacing to use. The range -405 to 155 includes the entire set of values. The spacing is set to 1 because a smaller interval yields a more precise distribution. The integrateddistrib1D command creates the cumulative frequency distribution by integrating the relative frequencies.
View Cumulative Frequency Distribution	To view the cumulative frequency distribution, choose the viewer marked with colors. Figure: Cumulative Frequency Distribution of Heat Flux at 130E, 20N for January 1960 to March 1998. The cumulative distribution always begins at a relative frequency of 0 and ends at a relative frequency of 1. Notice how the distribution rises slowly at first, then rises quickly towards the end. This concave upward shape is indicative of negatively skewed data. Recall from the previous example that the histogram had a tail to the left, signifying negative skewness.
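
A rough Python analogue of this step evaluates, at every threshold from -405 to 155 in steps of 1, the fraction of values that do not exceed the threshold. This is a sketch under stated assumptions rather than a description of how integrateddistrib1D works internally: the function cumulative_distribution is hypothetical, and the sample values are placeholders (only the minimum and maximum come from the text).

```python
from bisect import bisect_right

def cumulative_distribution(values, start, stop, step):
    """Fraction of values <= x for x = start, start + step, ..., up to stop."""
    if not values or step <= 0:
        raise ValueError("need at least one value and a positive step")
    ordered = sorted(values)
    n = len(ordered)
    points = []
    x = start
    while x <= stop:
        # bisect_right counts how many sorted values are <= x.
        points.append((x, bisect_right(ordered, x) / n))
        x += step
    return points

# Placeholder values for illustration only; NOT the CAYAN heat flux series.
sample = [-401.584, -30.5, 10.0, 42.0, 154.3319]

curve = cumulative_distribution(sample, -405, 155, 1)
print(curve[0])    # threshold below the minimum
print(curve[-1])   # threshold above the maximum
```

Because the range brackets the minimum and maximum, the curve starts at 0 and ends at 1, matching the behaviour described for the Data Library plot.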