Frequency Distributions
- Introduction
- Histograms
- Cumulative Frequency Distributions
A frequency distribution is one of the most common graphical tools used to describe a single population. It is a tabulation of the frequencies of each value (or range of values). There is a wide variety of ways to illustrate frequency distributions, including histograms, relative frequency histograms, density histograms, and cumulative frequency distributions. Histograms show the frequency of elements that occur within a certain range of values, while cumulative distributions show the frequency of elements that occur at or below a certain value.
Frequency Histogram
- A graphical representation of a single dataset, tallied into classes.
- Frequency defined as the number of values that fall into each class.
- Histogram consists of a series of rectangles whose widths are defined by the limits of the classes, and whose heights are determined by the frequency in each interval.
- Histogram depicts many attributes of the data, including location, spread, and symmetry.
- No rigid set of rules that determine the number of classes or class interval.
- Between 5 and 20 classes suitable for most datasets.
- Equal-sized class widths are found by dividing the range by the number of classes.
- A formal guide for choosing the number of classes (from which the class interval follows) is the formula K = 1 + 3.3 * log n,
where K is the number of classes, n is the number of observations, and log is the base-10 logarithm.
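The formula is commonly known as Sturges' rule. A minimal Python sketch makes it easy to check. It assumes log is the base-10 logarithm, which is the reading that reproduces the value K = 9.7840 in the heat flux example. The function name `sturges_classes` is only illustrative.

```python
import math


def sturges_classes(n: int) -> float:
    """Suggested number of classes, K = 1 + 3.3 * log10(n), for n observations."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    return 1 + 3.3 * math.log10(n)


if __name__ == "__main__":
    k = sturges_classes(459)
    # The result is usually rounded to a whole number of classes.
    print(f"K = {k:.4f}, rounded to {round(k)} classes")
```

For n = 459 this prints K = 9.7840, which rounds to 10 classes.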
Relative Frequency Histogram
- Relative frequency defined as the fraction of times the value occurs, or the frequency of the value(s) ÷ the number of observations in the set.
- Relative frequencies usually of more interest than the absolute frequencies.
- Relative frequency histogram constructed by assigning the relative frequencies as heights of the rectangles.
- Sum of all relative frequencies in a dataset is 1.
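A minimal Python sketch of the conversion from frequencies to relative frequencies follows. The class counts are made up, and `relative_frequencies` is an illustrative helper, not part of any library.

```python
def relative_frequencies(counts):
    """Divide each class frequency by the total number of observations."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations to normalize")
    return [c / total for c in counts]


if __name__ == "__main__":
    rel = relative_frequencies([1, 3, 4])  # made-up counts, 8 observations
    print(rel)       # [0.125, 0.375, 0.5]
    print(sum(rel))  # 1.0
```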
Density Histogram
- Similar to frequency histogram except heights of rectangles are calculated by dividing relative frequency by class width.
- Resulting rectangle heights called densities, vertical scale called density scale.
- Noteworthy property: (class width * density) = relative frequency.
- Total area of all rectangles equals 1.
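The Python sketch below uses hypothetical relative frequencies and deliberately unequal class widths to show why the density scale is useful. The rectangle heights change with class width, but width * density still recovers each relative frequency, so the total area is 1. The helper names are illustrative.

```python
def class_widths(edges):
    """Widths of the classes defined by consecutive class limits."""
    return [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]


def densities(rel_freqs, edges):
    """Density of each class = relative frequency / class width."""
    if len(edges) != len(rel_freqs) + 1:
        raise ValueError("need exactly one more class limit than classes")
    return [f / w for f, w in zip(rel_freqs, class_widths(edges))]


def total_area(dens, edges):
    """Sum of width * density over all rectangles."""
    return sum(d * w for d, w in zip(dens, class_widths(edges)))


if __name__ == "__main__":
    rel = [0.125, 0.375, 0.5]  # hypothetical relative frequencies
    edges = [0, 1, 3, 7]       # hypothetical class limits, widths 1, 2, 4
    dens = densities(rel, edges)
    print(dens)                     # [0.125, 0.1875, 0.125]
    print(total_area(dens, edges))  # 1.0
```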
Histogram Shapes
- Unimodal: Rises to single peak, then declines.
- Bimodal: Has two distinct peaks.
- Multimodal: More than two peaks.
- Descriptions of skew may also be applied to histograms (see Measures of Central Tendency section).
Example: Construct a frequency, relative frequency, and density histogram of net heat flux data at 130° E, 20° N for January 1960 to March 1998.
| Locate Dataset and Variable |
- Select the "Datasets by Category" link in the blue banner on the Data Library page.
- Click on the "Cloud Characteristics and Radiation Budget" link.
- Select the CAYAN dataset.
- Click on the "net heat flux" link under the Datasets and Variables subheading.
|
| Select Temporal and Spatial Domains |
- Click on the "Data Selection" link in the function bar.
- Enter the text 130E, 20N and Jan 1960 to Mar 1998 in the appropriate text boxes.
- Press the Restrict Ranges button and then the Stop Selecting button.
|
| Calculate Number of Classes and Class Interval |
- Select the "Filters" link in the function bar.
- Choose the Maximum over "T" command.
This operation computes the maximum heat flux over the time grid T. The value is located under the Expert Mode text box in bold: 154.3319 W/m².
Remember this value.
- Click on the rightmost link prior to the "T 0.0 maxover" box in the source bar.
This operation undoes the maxover command.
- Again, select the "Filters" link in the function bar.
- Choose the Minimum over "T" command.
This operation computes the minimum heat flux value over the time grid T. Again, the value is located under the Expert Mode text box in bold: -401.584 W/m².
Remember this value.
- Click on the rightmost link prior to the "T 0.0 minover" box in the source bar.
This operation undoes the minover command.
- Calculate the range by subtracting the minimum value from the maximum value.
154.3319 - (-401.584) = 555.9159, which may be rounded to 556.
- Scroll down the page and find the month variable under the Grids subheading.
- Note how many data points are contained in the grid by finding N=___.
You should see N = 459, one value per month from January 1960 to March 1998 (38 * 12 + 3 = 459).
- Use the following formula to estimate the number of classes: K = 1 + 3.3 * log n
K = 1 + 3.3 * log 459 = 9.7840, which may be rounded to 10.
- Calculate the class width by dividing the range (556) by the number of classes (10).
556 ÷ 10 = 55.6; however, the class interval should be a whole number, so round it up to 56.
With a class interval of 56 and the first class beginning at -405, just below the minimum of -401.584, the ten classes include all values in the dataset: -405 + 56 * 10 = 155, which lies above the maximum of 154.3319.
|
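The hand calculation in the steps above can be reproduced with a short Python sketch. The maximum, minimum, and N are the values read from the Data Library pages. The variable names are illustrative, and the script only repeats the arithmetic; it does not query the Data Library.

```python
import math

# Values read from the Data Library pages in the steps above.
max_flux = 154.3319   # W/m^2
min_flux = -401.584   # W/m^2
n_obs = 459           # number of monthly values

data_range = max_flux - min_flux             # about 555.9159
k = round(1 + 3.3 * math.log10(n_obs))       # 9.784 rounds to 10 classes
width = round(round(data_range) / k)         # 556 / 10 = 55.6 rounds to 56
start = -405                                 # first class starts just below the minimum

edges = [start + i * width for i in range(k + 1)]
print(f"range = {data_range:.4f}, K = {k}, width = {width}")
print("class limits:", edges)

# The classes must span every value in the dataset.
assert edges[0] <= min_flux and edges[-1] >= max_flux
```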
| Generate Histogram |
- Click on the "Expert Mode" text box in the function bar.
- Type the following command under the text already there:
DATA -405 155 56 RANGESTEP
distrib1D
- Press the OK button.
The RANGESTEP command defines the range of heat flux values to consider, as well as the spacing to use.
The range -405 to 155 includes the entire set of values. The spacing is 56.
The distrib1D command calculates the frequencies within each class that are used to create the histogram.
|
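The Data Library's own implementation of distrib1D is not described here. The Python sketch below is only a rough analogue. It assumes that each class includes its lower limit and excludes its upper limit, except that the last class also includes its upper limit. It tallies values into the classes that RANGESTEP defines. `class_frequencies` is a hypothetical helper, and the sample values are made up.

```python
from bisect import bisect_right


def class_frequencies(values, start, stop, step):
    """Count values in each class [limit_i, limit_i + step).

    Hypothetical stand-in for a RANGESTEP + distrib1D pair. A value equal to
    `stop` is placed in the last class; values outside [start, stop] are skipped.
    """
    n_classes = round((stop - start) / step)
    edges = [start + i * step for i in range(n_classes + 1)]
    counts = [0] * n_classes
    for v in values:
        if v < start or v > stop:
            continue
        i = bisect_right(edges, v) - 1  # index of the class whose lower limit <= v
        counts[min(i, n_classes - 1)] += 1
    return edges, counts


if __name__ == "__main__":
    sample = [-400.0, -20.0, 0.0, 50.0, 150.0]  # made-up heat flux values
    edges, counts = class_frequencies(sample, -405, 155, 56)
    print(counts)  # [1, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```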
| View Histogram |
- To view the histogram, choose the viewer marked with colors.
Heat Flux Frequencies at 130E, 20N for January 1960 to March 1998
The histogram is unimodal and is negatively skewed.
|
| Generate Relative Frequency Histogram |
- Click on the rightmost link in the blue source bar to exit the viewer.
- Click in the Expert Mode text box in the function bar.
- Type the following command under the text already there:
459 div
/long_name (Relative Frequency) def
- Press the OK button.
The div command divides each frequency by 459, the total number of observations.
This transforms the set of frequencies into relative frequencies. The second command changes the variable name from Frequency to Relative Frequency.
|
| View Relative Frequency Histogram |
- To view the relative frequency histogram, choose the viewer marked with colors.
Heat Flux Relative Frequencies at 130E, 20N for January 1960 to March 1998
The relative frequency histogram looks similar to the frequency histogram, except that the rectangle heights are represented by different values.
|
| Generate Density Histogram |
- Click on the rightmost link in the blue source bar to exit the viewer.
- Click in the Expert Mode text box in the function bar.
- Type the following command under the text already there:
56 div
/long_name (Density) def
- Press the OK button.
The div command divides each relative frequency by 56, the class width.
This transforms the set of relative frequencies into densities. The second command changes the variable name from Relative Frequency to Density.
|
| View Density Histogram |
- To view the density histogram, choose the viewer marked with colors.
Heat Flux Densities at 130E, 20N for January 1960 to March 1998
The density histogram looks similar to the frequency and relative frequency histograms, except that the rectangle heights are represented by different values. Recall that in a density histogram, the total area of all rectangles is 1.
|
* * * * * * * * * *
Cumulative Frequency Distributions
- Cumulative frequency distributions contain all information present in histograms, plus the following:
- Allow the user to easily estimate frequencies over several class intervals.
- Provide better estimates of probabilities, since there is no arbitrary division of data into classes.
- The cumulative distribution function is plotted with cumulative probabilities on the vertical axis and data values on the horizontal axis.
- Cumulative frequencies are obtained by the formula F = m / (n + 1), where m is the rank of a value when the series is sorted in increasing order of magnitude (m = 1 for the smallest) and n is the number of values in the series.
- F estimates the probability that a randomly chosen value will not exceed the data value for which F was calculated.
- For a continuous variable, the probability of any exact value occurring is zero.
- Concave downward (upward) cumulative frequency distributions are indicative of positively (negatively) skewed data.
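A minimal Python sketch of the formula F = m / (n + 1) follows, using a made-up three-value series. `plotting_positions` is an illustrative name.

```python
def plotting_positions(values):
    """Pair each sorted value with F = m / (n + 1), where m is its rank (1 = smallest)."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, m / (n + 1)) for m, x in enumerate(ordered, start=1)]


if __name__ == "__main__":
    print(plotting_positions([3.0, 1.0, 2.0]))
    # [(1.0, 0.25), (2.0, 0.5), (3.0, 0.75)]
```

The denominator is n + 1 rather than n, so F stays strictly between 0 and 1. As a result, the largest observed value is not assigned a probability of 1 of not being exceeded.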
Example: Construct a cumulative frequency distribution of net heat flux data at 130° E, 20° N for January 1960 to March 1998.
| Locate Dataset and Variable |
*NOTE: This example uses the same dataset and variable as the previous example.
- Select the "Datasets by Category" link in the blue banner on the Data Library page.
- Click on the "Cloud Characteristics and Radiation Budget" link.
- Select the CAYAN dataset.
- Click on the "net heat flux" link under the Datasets and Variables subheading.
|
| Select Temporal and Spatial Domains |
- Click on the "Data Selection" link in the function bar.
- Enter the text 130E, 20N and Jan 1960 to Mar 1998 in the appropriate text boxes.
- Press the Restrict Ranges button and then the Stop Selecting button.
|
| Calculate Maximum and Minimum Values |
- Select the "Filters" link in the function bar.
- Choose the Maximum over "T" command.
This operation computes the maximum heat flux over the time grid T. The value is located under the Expert Mode text box in bold: 154.3319 W/m².
Remember this value.
- Click on the rightmost link prior to the "T 0.0 maxover" box in the source bar.
This operation undoes the maxover command.
- Again, select the "Filters" link in the function bar.
- Choose the Minimum over "T" command.
This operation computes the minimum heat flux value over the time grid T. Again, the value is located under the Expert Mode text box in bold: -401.584 W/m².
Remember this value.
- Click on the rightmost link prior to the "T 0.0 minover" box in the source bar.
This operation undoes the minover command.
|
| Generate Cumulative Frequency Distribution |
- Click on the "Expert Mode" text box in the function bar.
- Type the following command under the text already there:
DATA -405 155 1 RANGESTEP
integrateddistrib1D
- Press the OK button.
The RANGESTEP command defines the range of heat flux values to consider, as well as the spacing to use.
The range -405 to 155 includes the entire set of values. The spacing is set to 1 because a narrow spacing yields a finely resolved distribution.
The integrateddistrib1D command creates the cumulative frequency distribution by integrating along the relative frequencies.
|
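The following is a rough Python analogue of the idea, not the Data Library's integrateddistrib1D implementation, whose exact conventions are not described here. The cumulative relative frequency at each grid value x is computed as the fraction of observations less than or equal to x. The grid mirrors the RANGESTEP settings above (-405 to 155, spacing 1). The sample values and the helper name are hypothetical.

```python
from bisect import bisect_right


def cumulative_relative_frequency(values, start, stop, step):
    """Fraction of values <= x for each grid point x = start, start + step, ..., stop."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no observations")
    n_points = round((stop - start) / step) + 1
    grid = [start + i * step for i in range(n_points)]
    # bisect_right counts how many sorted values are <= x.
    return grid, [bisect_right(ordered, x) / n for x in grid]


if __name__ == "__main__":
    sample = [-400.0, -20.0, 0.0, 50.0, 150.0]  # made-up heat flux values
    grid, cum = cumulative_relative_frequency(sample, -405, 155, 1)
    print(cum[0], cum[-1])  # 0.0 1.0
```

As in the Data Library plot, the curve starts at 0 below the smallest value and ends at 1 once every value has been counted.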
| View Cumulative Frequency Distribution |
- To view the cumulative frequency distribution, choose the viewer marked with colors.
Cumulative Frequency Distribution of Heat Flux at 130E, 20N for January 1960 to March 1998
The cumulative distribution always begins at a relative frequency of 0 and ends at a relative frequency of 1.
Notice how the distribution rises slowly at first, then rises quickly towards the end. This concave upward look is indicative of negatively skewed data.
Recall from the previous example that the histogram had a tail to the left, signifying negative skewness.
|
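The direction of skew can also be confirmed numerically rather than by eye. One option, which is not part of the Data Library steps above, is the moment coefficient of skewness, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments. A negative value corresponds to a longer left tail. Below is a minimal Python sketch using made-up samples; `moment_skewness` is an illustrative name.

```python
def moment_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5, using population moments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    left_tailed = [-10.0, -2.0, -1.0, 0.0]  # made-up sample with a long left tail
    print(moment_skewness(left_tailed) < 0)  # True
```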