Data Homogeneity
It is often important to determine if a set of data is homogeneous before any statistical
technique is applied to it.
Homogeneous data are drawn from a single population.
In other words, all outside processes that could potentially affect the data
must remain constant for the complete time period of the sample.
Inhomogeneities are caused when artificial changes affect the statistical
properties of the observations through time. These changes may be abrupt
or gradual, depending on the nature of the disturbance.
Realistically, obtaining perfectly homogeneous data is almost impossible,
as unavoidable changes in the area surrounding the observing station will
often affect the data.
Interpreting Climate Data with Unknown Homogeneity
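The following method, a runs test, may be used to determine whether a set of data can be
considered homogeneous at a chosen significance level. First subtract the median from every value, so that each
element lies either above (positive) or below (negative) the median. A run is an unbroken sequence of consecutive
elements with the same sign. A homogeneous series should have neither too few nor too many runs; too few runs
means that values cluster on one side of the median, as happens with a trend or an abrupt shift.
Compare the number of runs with the limits in the table below.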
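Limits on the number of runs at the .10 significance level (N = number of elements in the sample):

N/2   Lower limit   Upper limit
10    8             13
11    9             14
12    9             16
13    10            17
14    11            18
15    12            19
16    13            20
17    14            21
18    15            22
19    16            23
20    16            25
25    22            30
30    26            36
35    31            41
40    35            47
45    40            52
50    45            57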
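The tabulated limits can be cross-checked with the standard normal approximation to the distribution of runs. For n1 values above and n2 values below the median (n = n1 + n2), the expected number of runs is 2·n1·n2/n + 1 and the variance is 2·n1·n2·(2·n1·n2 - n)/(n²(n - 1)). The Python sketch below is an assumption, not necessarily how the table was built: it places each limit at the 10% point of that normal distribution and rounds to the nearest whole run. The function name `runs_limits` is hypothetical. With n1 = n2 = N/2 it gives the same limits as the table for N/2 = 25 through 50 (22 and 30 for N/2 = 25, up to 45 and 57 for N/2 = 50); for shorter series it can differ from the table by one run, so use the table there.

```python
from math import sqrt
from statistics import NormalDist


def runs_limits(n1: int, n2: int, alpha: float = 0.10) -> tuple[int, int]:
    """Approximate lower and upper limits on the number of runs.

    n1, n2: counts of values above and below the median.
    alpha:  probability in each tail (0.10 is the level used in the text).
    Uses the normal approximation to the runs distribution, rounded to the
    nearest whole run (an assumed rounding convention).
    """
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.2816 for alpha = 0.10
    half_width = z * sqrt(var)
    return round(mean - half_width), round(mean + half_width)


if __name__ == "__main__":
    # Compare with the N/2 = 25 ... 50 rows of the table.
    for half in (25, 30, 35, 40, 45, 50):
        print(half, runs_limits(half, half))
```

For the 50-element Sherbrooke sample used below, `runs_limits(25, 25)` returns `(22, 30)`.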
Example: Analyze the homogeneity of the annual mean of the daily minimum temperature time series for Sherbrooke, Quebec, from 1920 to 1970, and compare it with the same series for Shawinigan, Quebec.
Locate Dataset, Variable, and Station
Select Temporal Domain
Compute Yearly Mean Minimum Temperature
T 365 boxAverage
This command computes the mean minimum temperature for each year by averaging the daily minimum temperature over consecutive 365-day blocks.
This is not an exact calendar-year average, because every fourth year is a leap year with one extra day.
As a result, the start of each 365-day block moves one day earlier every four years. Ignoring this drift, the result is still a good approximation of the mean minimum temperature for each year.
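The block averaging and the leap-year drift can be illustrated with a short Python sketch. It assumes, from the description above, that `T 365 boxAverage` averages consecutive, non-overlapping blocks of 365 daily values; the function names `box_average` and `block_starts` are hypothetical, and dropping an incomplete final block is an assumed convention rather than documented Data Library behavior.

```python
from datetime import date, timedelta


def box_average(values, width=365):
    """Mean of consecutive, non-overlapping blocks of `width` values.

    Assumption: an incomplete final block is dropped.
    """
    return [
        sum(values[i:i + width]) / width
        for i in range(0, len(values) - width + 1, width)
    ]


def block_starts(first_day, n_days, width=365):
    """Calendar date on which each complete `width`-day block begins."""
    return [first_day + timedelta(days=i)
            for i in range(0, n_days - width + 1, width)]


if __name__ == "__main__":
    first = date(1920, 1, 1)
    n_days = (date(1971, 1, 1) - first).days  # assumed daily record, 1920 through 1970
    for start in block_starts(first, n_days)[:6]:
        print(start)
```

The first six block start dates are 1920-12-31 preceded by 1920-01-01, then 1921-12-31, 1922-12-31, 1923-12-31 and 1924-12-30: each leap year (1920, 1924, ...) pulls the following block start back by one day.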
View Yearly Mean Minimum Temperature Time Series
Subtract Median From Dataset
[T] 1 medianover
The median, 0.8683567 degrees Celsius, should appear in bold below the expert mode text box. Take note of this value.
The medianover function is further explained in the Measures of Central Tendency section.
0.8683567 sub
The above command subtracts the median (0.8683567° Celsius) from each value in the dataset.
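Outside the Data Library, the same step (find the median, then subtract it from every value) can be sketched in Python with the standard `statistics.median`. The function name `subtract_median` is hypothetical. With an even number of elements, `statistics.median` returns the mean of the two middle values, so no element equals the median exactly unless those two values tie.

```python
from statistics import median


def subtract_median(values):
    """Return each value minus the sample median (its anomaly)."""
    m = median(values)
    return [v - m for v in values]


if __name__ == "__main__":
    # Toy values for illustration only, not station data.
    print(subtract_median([3.0, 1.0, 2.0, 5.0]))  # median 2.5 -> [0.5, -1.5, -0.5, 2.5]
```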
Analyze Homogeneity of Data
A table will appear with Time in one column and (Min Temp - 0.8683567) in the other column. The day shown in the Time column shifts one day earlier every four years
because of the leap-year issue mentioned earlier.
Counting each unbroken sequence of positive or negative values as one run, there are 18 runs in the Sherbrooke data from 1920 to 1970. The sample contains 50 elements (each yearly mean minimum temperature is one element), so N/2 = 25.
According to the table, at the .10 significance level a homogeneous series of this length should have at least 22 runs.
Since 18 is fewer than 22, we conclude, with 90% confidence, that these data are not homogeneous. Is this inhomogeneity caused by a large-scale climatic change or by an inconsistency in the area surrounding the observing station?
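Counting runs by hand in a 50-row table is error-prone. The Python sketch below counts runs and applies the comparison; it assumes the input is already median-subtracted, as in the (Min Temp - median) column, and skips values exactly equal to zero, an assumed convention for ties with the median. The names `count_runs` and `passes_runs_test` are hypothetical, and the optional upper limit is only checked when one is supplied.

```python
def count_runs(anomalies):
    """Count runs of consecutive same-sign values in a median-subtracted series.

    Values exactly equal to zero (ties with the median) are skipped,
    which is an assumed convention.
    """
    signs = [a > 0 for a in anomalies if a != 0]
    if not signs:
        return 0
    # One run to start with, plus one more at every change of sign.
    return 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)


def passes_runs_test(n_runs, lower, upper=None):
    """True if n_runs is at least `lower` and, if given, at most `upper`."""
    if n_runs < lower:
        return False
    return upper is None or n_runs <= upper


if __name__ == "__main__":
    # Toy series (not station data): a steady upward trend gives only two runs.
    toy = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
    print(count_runs(toy))  # 2
    # Sherbrooke figures from the text: 18 runs against a lower limit of 22.
    print(passes_runs_test(18, 22))  # False
```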
To answer this question, we analyze the mean minimum temperature at another station in the same region of southern Quebec.
Repeat the same process for Shawinigan:
Locate Dataset and Variable
Select Temporal Domain and Station
The station ID 7018000 is for Shawinigan. For more information on finding station IDs, see the tutorial How to Find A Station ID.
Compute Yearly Mean Minimum Temperature
T 365 boxAverage
View Yearly Mean Minimum Temperature Time Series
Based on visual inspection, these data appear to be more homogeneous than the
data taken at Sherbrooke: there is no distinct upward trend in the
minimum temperatures, as there was in the Sherbrooke data.
Subtract Median From Dataset
[T] 1 medianover
The median should be -0.6845208 degrees Celsius. Take note of this value.
-0.6845208 sub
The above command subtracts the median (-0.6845208° Celsius) from each value in the dataset.
Analyze Homogeneity of Data
A table will appear with Time in one column and (Min Temp - -0.6845208) in the other column.