## Measures of Central Tendency

One of the most common quantities used to summarize a set of data is its center. The center is a single value, chosen so that it gives a reasonable approximation of what is normal for the data.

There are many ways to approximate the center of a set of data. One of the most familiar and useful measures of center is the mean; however, using only the mean to approximate normality can be misleading. To obtain a better understanding of what is considered normal, other measures of central tendency, such as the median, the trimmed mean, and the trimean, may be used in addition to the mean.

## Mean

• Defined as the arithmetic average of the set.
• Calculated by summing all values, then dividing by the number of values.
• One of the simplest measures of center to calculate.
• May provide an incomplete description of the central tendency if not accompanied by other measures.
• Greatly affected by extreme values.
Example: Calculate temporal, spatial, and zonal means of outgoing longwave radiation data over eastern North America for the time period 1980-1999.
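
The three kinds of mean named in this example differ only in which dimensions are averaged away. The following is a minimal Python sketch, not the Data Library's implementation. The array `field`, its values, and the function names are hypothetical, and every grid cell is given equal weight, which may differ from how the Data Library weights grid cells.

```python
from statistics import fmean

# Hypothetical field indexed as field[t][y][x]:
# 2 time steps, 2 latitudes, 3 longitudes (values in W/m2).
field = [
    [[200.0, 210.0, 220.0], [230.0, 240.0, 250.0]],
    [[210.0, 220.0, 230.0], [240.0, 250.0, 260.0]],
]

def temporal_mean(f):
    """Average over time, leaving one value per (latitude, longitude)."""
    nt, ny, nx = len(f), len(f[0]), len(f[0][0])
    return [[fmean([f[t][y][x] for t in range(nt)]) for x in range(nx)]
            for y in range(ny)]

def spatial_mean(f):
    """Average over latitude and longitude, leaving one value per time step."""
    return [fmean([v for row in grid for v in row]) for grid in f]

def zonal_mean(f):
    """Average over longitude only, leaving one value per (time, latitude)."""
    return [[fmean(row) for row in grid] for grid in f]

print(temporal_mean(field))  # [[205.0, 215.0, 225.0], [235.0, 245.0, 255.0]]
print(spatial_mean(field))   # [225.0, 235.0]
print(zonal_mean(field))     # [[210.0, 240.0], [220.0, 250.0]]
```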

## Median

• A commonly used robust and resistant measure of central tendency.
• Defined as the middle value when observations are ordered from smallest to largest.
• Divides the dataset into two parts of equal size, with 50% of the values below the median and 50% of the values above the median.
• Also known as the 50th percentile.
• Insensitive to extreme values.
Example: Calculate the spatial median of outgoing longwave radiation data.
Locate Dataset and Variable
*NOTE: This example uses the same dataset and variable as the previous example.
• Select the "Datasets by Catagory" link in the blue banner on the Data Library page.
• Click on the "Cloud Characteristics and Radiation Budget" link.
• Select the NOAA NCEP CPC GLOBAL dataset.
• Select the "monthly" link under the Datasets and Variables subheading.
• Choose the "outgoing longwave radiation" link, again located under the Datasets and Variables subheading.

Select Temporal and Spatial Domains
• Click on the "Data Selection" link in the function bar.
• Enter the text 70W to 90W, 20N to 60N, and Jan 1980 to Dec 1999 in the appropriate text boxes.
• Press the Restrict Ranges button and then the Stop Selecting button.

Calculate Spatial Median
• Click on the "Expert Mode" link in the function bar.
• Enter the following text below the text already there: `[X Y]medianover`
• Press the OK button.
The above command computes the spatial median of the data. For each time step, the median of the data values over all longitudes and latitudes is found.

View Spatial Median
• To see the results of this operation, choose the time series viewer.

Figure: Spatial Median of Outgoing Longwave Radiation at 70W-90W, 20N-60N for Jan 1980 - Dec 1999

The spatial median time series looks similar to the spatial average time series from the previous example. Even so, the two are not identical. In a non-symmetrical dataset such as this, the median is different from the mean.

*NOTE: A temporal or zonal median can be generated the same way as temporal averages in Expert Mode. For a temporal median, enter the command `[T] medianover`.
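
To illustrate what a spatial median does at each time step, here is a minimal Python sketch. It is an analogy to the `[X Y]medianover` step, not the Data Library's code. The dictionary `olr`, its values, and the function `spatial_median` are hypothetical.

```python
from statistics import fmean, median

# Hypothetical outgoing longwave radiation (W/m2) on a 2 x 3 grid
# for two time steps. One cell in the first month is unusually high.
olr = {
    "Jan 1980": [[210.0, 215.0, 220.0], [225.0, 230.0, 290.0]],
    "Feb 1980": [[205.0, 212.0, 218.0], [221.0, 226.0, 231.0]],
}

def spatial_median(grid):
    """Median of all grid-cell values at a single time step."""
    return median([v for row in grid for v in row])

for label, grid in olr.items():
    cells = [v for row in grid for v in row]
    # The median barely reacts to the 290 W/m2 cell; the mean is pulled upward.
    print(label, "median:", spatial_median(grid), "mean:", round(fmean(cells), 2))
```

In the first month the single high cell raises the mean well above the median, while in the second month the two measures nearly agree.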

## Trimmed Mean

• Discards a percentage of the outlying values before calculating the arithmetic average.
• A measure that incorporates characteristics of the mean and the median.
• Less affected by outliers than the untrimmed average.
• An x% trimmed mean eliminates the largest x% and the smallest x% of the sample before calculating the mean.
• Typical range for x% is 5% to 25%.
Wilks, Daniel S. Statistical Methods in the Atmospheric Sciences. p 26.
Example: Calculate the 20% trimmed mean of spatially averaged outgoing longwave radiation data.
Locate Dataset and Variable
*NOTE: This example uses the same dataset and variable as the previous example.
• Select the "Datasets by Catagory" link in the blue banner on the Data Library page.
• Click on the "Cloud Characteristics and Radiation Budget" link.
• Select the NOAA NCEP CPC GLOBAL dataset.
• Select the "monthly" link under the Datasets and Variables subheading.
• Choose the "outgoing longwave radiation" link, again located under the Datasets and Variables subheading.

Select Temporal and Spatial Domains
• Click on the "Data Selection" link in the function bar.
• Enter the text 70W to 90W, 20N to 60N, and Jan 1980 to Dec 1999 in the appropriate text boxes.
• Press the Restrict Ranges button and then the Stop Selecting button.

Calculate Spatial Average
• Click on the "Filters" link in the function bar.
• Select the Average over "XY" command.
This command takes the spatial average of the data (refer to the section on calculating the mean).

Calculate Trimmed Mean
• Click on the "Expert Mode" link in the function bar.
• Enter the following lines below the text already there:
```
dup
[T]0.2 0.8 0.0 replacebypercentile
a:
percentile 0.2 VALUE
:a:
percentile 0.8 VALUE
:a
masknotrange
```
• Press the OK button.
The commands above discard the lowest 20% of the values and the highest 20% of the values from the dataset. First, the dup command duplicates the current variable and adds it to the stack. The replacebypercentile command identifies the 20th and 80th percentiles and the remaining code masks out the values below and above these thresholds, respectively.
• Click on the time series viewer in the function bar.

Figure: Outgoing Longwave Radiation at 70W-90W, 20N-60N for Jan 1980 - Dec 1999 With Extreme Values Masked Out

• Select the rightmost link in the blue source bar to return to the dataset page.
• Click on the "Filters" link in the function bar.
• Select the Average over "T" command.
The operation computes the temporal average of the data, after trimming. Since time was the only remaining independent variable, the result of this function is a single value. The value should be located under the Expert Mode text box in bold: 225.9467 W/m2
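
The trimming idea can also be sketched outside the Data Library. The Python below is a minimal sketch that sorts the sample and drops a fixed count from each end, a swapped-in technique that may treat ties and boundary values differently from the percentile masking used in Expert Mode. The function `trimmed_mean` and the values in `sample` are hypothetical.

```python
from statistics import fmean

def trimmed_mean(values, fraction):
    """Mean after dropping the lowest and highest `fraction` of the values."""
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # number of values dropped from each end
    return fmean(ordered[k:len(ordered) - k])

# Hypothetical spatially averaged values (W/m2) with one low and one high outlier.
sample = [180.0, 221.0, 223.0, 224.0, 225.0, 226.0, 227.0, 228.0, 230.0, 300.0]

print(fmean(sample))              # 228.4, pulled upward by the 300 W/m2 value
print(trimmed_mean(sample, 0.2))  # 225.5, from the six central values only
```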

## Trimean

• Weighted average of the median and the quartiles.
• Median receives twice as much weight as each of the quartiles.
• More representative of the magnitude of the data values than the median.
Example: Calculate the trimean of spatially averaged outgoing longwave radiation data.
Locate Dataset and Variable
*NOTE: This example uses the same dataset and variable as the previous example.
• Select the "Datasets by Catagory" link in the blue banner on the Data Library page.
• Click on the "Cloud Characteristics and Radiation Budget" link.
• Select the NOAA NCEP CPC GLOBAL dataset.
• Select the "monthly" link under the Datasets and Variables subheading.
• Choose the "outgoing longwave radiation" link, again located under the Datasets and Variables subheading.

Select Temporal and Spatial Domains
• Click on the "Data Selection" link in the function bar.
• Enter the text 70W to 90W, 20N to 60N, and Jan 1980 to Dec 1999 in the appropriate text boxes.
• Press the Restrict Ranges button and then the Stop Selecting button.

Calculate Spatial Average
• Click on the "Filters" link in the function bar.
• Select the Average over "XY" command.
This command calculates the spatial average of the data (refer to the section on calculating the mean).

Calculate Trimean
• Click on the "Expert Mode" link in the function bar.
• Enter the following text below the text already there:
```
[T]0.25 0.5 0.75 0 replacebypercentile
a:
percentile 0.5 VALUE
2 mul
:a:
percentile 0.25 VALUE
:a:
percentile 0.75 VALUE
:a
add
add
4 div
```
• Press the OK button.
The [T] 0.25 0.5 0.75 0 replacebypercentile command computes the 25th percentile, the median, and the 75th percentile. The percentile 0.5 VALUE and 2 mul commands take the median value and multiply it by 2. The following seven lines add the 25th and 75th percentiles to the resulting product. The last command, 4 div, divides the total sum by four. The resulting value is the trimean. The value should be located below the Expert Mode text box in bold: 226.1863 W/m2.
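
The arithmetic performed by the Expert Mode commands is trimean = (Q1 + 2 × median + Q3) / 4. A minimal Python sketch of the same arithmetic follows. It assumes the "inclusive" quartile definition from Python's `statistics.quantiles`, which may not match the percentile definition used by the Data Library. The function `trimean` and the sample values are hypothetical.

```python
from statistics import fmean, quantiles

def trimean(values):
    """Weighted average of the quartiles and the median: (Q1 + 2*Q2 + Q3) / 4."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4

# Hypothetical sample with a single extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]

print(trimean(sample))  # 5.0, unaffected by the extreme value
print(fmean(sample))    # roughly 15.1, dominated by the extreme value
```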

## Interpreting Measures of Central Tendency

• If the population is non-symmetrical, it is said to be skewed.
• Negatively skewed distributions:
1. Tail to the left.
2. Median typically greater than mean.
• Positively skewed distributions:
1. Tail to the right.
2. Mean typically greater than median.
• Before making inferences from the data, decide whether the mean or median is of greater interest and then proceed accordingly.
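
The relationship between skewness and the ordering of the mean and median can be checked on small samples. The Python below is a minimal sketch with hypothetical data and a hypothetical helper, `mean_minus_median`. The sign of the difference is only a rough indicator of skew direction, not a formal test.

```python
from statistics import fmean, median

def mean_minus_median(values):
    """Positive when the mean exceeds the median, negative when it falls below."""
    return fmean(values) - median(values)

# Hypothetical samples: one with a long tail to the right, one mirrored to the left.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
left_skewed = [-v for v in right_skewed]

print(mean_minus_median(right_skewed))  # 1.75: tail to the right, mean above median
print(mean_minus_median(left_skewed))   # -1.75: tail to the left, mean below median
```
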
Interpreting Measures of Central Tendency: Caveats in the Atmospheric Sciences
1. The computed mean is not a completely reliable estimate of the climate system's true long-term mean state.
• Observations only taken over a limited observing period, at discrete times and varying locations.
• Mean may be affected by instrumental or human errors.
• Subjective and objective interpolation of station data both introduce error not present in the raw station data.
2. The mean state is not necessarily a typical state.
• Long-term mean masks a great deal of intraperiod variability.
• Spatial variability within each period is larger than that of the long-term mean.
• Often the long-term mean field is unlikely to be observed, due to variability within the period.
• Sample mean becomes an increasingly better estimator of the long-term mean as the number of elements in the sample increases.
3. The climatological mean is a moving target.
• Today's climate is different from climates of the past.
• Climatology generally based on a 10- to 30-year period.
• Shorter climatologies more representative of today's climate.
No particular estimate of the center is considered "wrong" or "right". It is an approximation based on a given set of data and parameters. The probability that the exact value aligns perfectly with the estimated value is extremely low. Nevertheless, some estimates will be closer to the actual value than others.
Example: Calculate spatially averaged July rainfall from 10E to 25W and 0N to 30N and show that the mean value is atypical for the region.

Locate Dataset and Variable
• Select the "Datasets by Catagory" link in the blue banner on the Data Library page.
• Click on the "Atmosphere" link.
• Select the NOAA NCEP CPC Merged_Analysis dataset.
• Select the "monthly" link under the Datasets and Variables subheading.
• Select the "December 2003 Release" link under the Datasets and Variables subheading.
• Choose the "Version 2" link.
• Click on the "CMAP estimated precipitation" link.

Select Spatial and Temporal Domains
• Click on the "Data Selection" link in the function bar.
• Enter the text 10E to 25W, 0N to 30N, and Jan 1980 to Dec 2002 in the appropriate text boxes.
• Press the Restrict Ranges button and then the Stop Selecting button.

Compute Average July Precipitation
• Click on the "Expert Mode" link in the function bar.
• Enter the following lines below the text already there:
```
T 12 splitstreamgrid
T (Jul) VALUES
```
• Press the OK button.
The splitstreamgrid function splits the time grid into two new parts, T and T2. The T grid has a period of 12 months and a step of 1. This grid represents data from January, February, March, etc. The T2 grid has a step of 12 and represents the years from the beginning of the dataset to the end of the dataset. The subsequent command retains only the July values from the T grid.
• Click on the "Filters" link in the function bar.
• Select the Average over "T2" command.
This operation averages over the second time grid, T2, thereby calculating the temporal average of July rainfall over the years from 1980 to 2002.

View Average July Precipitation
• To see the results of this operation, choose the viewer window with coasts outlined.

Figure: Estimated Average July Precipitation for 10E to 25W, 0N to 30N

Average July rainfall is not constant over the spatial grid. There seems to be a distinct rainy area to the south and a distinct dry area to the north.

Calculate Spatial Average
• Click on the rightmost link in the blue source bar to exit the viewer.
• Click on the "Filters" link in the function bar.
• Select the Average over "XY" command.
The spatially averaged precipitation value is located under the Expert Mode text box in bold: 3.169257 mm/day. This value represents the mean July precipitation per day in the region from 10E to 25W and from the equator to 30N. Look again at the image of average July rainfall over the spatial grid (see above). There are a few locations on the border of wet and dry with daily rainfall near 3.17 mm/day, yet it is not a typical value for the region. In most places, the location is either wet (above 4 mm/day), or it is dry (below 2 mm/day). This example illustrates that the mean state is not necessarily the typical state of the system.
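
The point that a regional mean can be atypical is easy to reproduce with a toy field. The Python below is a minimal sketch using hypothetical rainfall values, not the CMAP data. The grid `rain`, the 1.0 mm/day tolerance, and the variable names are assumptions chosen for illustration.

```python
from statistics import fmean

# Hypothetical average July rainfall (mm/day). Rows run north to south:
# a dry north and a wet south, loosely mimicking the pattern described above.
rain = [
    [0.2, 0.4, 0.3, 0.5],
    [0.8, 1.0, 0.9, 1.2],
    [5.0, 5.5, 6.0, 5.2],
    [6.5, 7.0, 6.8, 6.2],
]

cells = [v for row in rain for v in row]
regional_mean = fmean(cells)  # about 3.34 mm/day for these values

# Count the grid cells whose rainfall lies within 1.0 mm/day of the regional mean.
near_mean = [v for v in cells if abs(v - regional_mean) <= 1.0]

print(round(regional_mean, 2))
print(len(near_mean), "of", len(cells), "cells are close to the regional mean")
```

With these values no cell falls near the regional mean: every location is either clearly dry or clearly wet, so the mean describes none of them.
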
Some additional functions associated with central tendency, along with corresponding examples, are located in the basic Tutorial (e.g., box average, running average).