Singular value decomposition (SVD) is quite possibly the most widely used multivariate statistical technique in the atmospheric sciences.
The technique was first introduced to meteorology in a 1956 paper by Edward Lorenz, in which he referred to the process as empirical orthogonal function (EOF) analysis.
Today, it is also commonly known as principal component analysis (PCA). All three names are still in use, and in the Data Library they refer to the same set of procedures.
The purpose of singular value decomposition is to reduce a dataset containing a large number of values to one containing significantly fewer values that still capture a large fraction of the variability present in the original data.
Atmospheric and geophysical data often exhibit large spatial correlations. SVD analysis yields a more compact representation of these correlations, especially for multivariate datasets, and can provide insight into the spatial and temporal variations exhibited in the fields being analyzed.
There are a few caveats to be aware of before computing the SVD of a dataset. First, the data must consist of anomalies. Second, the data should be de-trended.
When the data contain trends over time, the first structure often captures them. If the purpose of the analysis is to find spatial correlations independent of trends, the data should be de-trended before applying SVD analysis.
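A minimal sketch of these two preprocessing steps, written in Python with NumPy rather than in the Data Library's own language. The function names are hypothetical, the data are assumed to be a (time × space) array, and the sketch first removes each calendar month's mean (the mean annual cycle) and then a least-squares linear trend at every point.

```python
import numpy as np

def monthly_anomalies(x, months):
    """Subtract each calendar month's mean (the mean annual cycle) at every point.

    x      : (ntime, npoints) array
    months : calendar month (1-12) of each time step
    """
    anom = np.empty(x.shape, dtype=float)
    for m in np.unique(months):
        sel = months == m
        anom[sel] = x[sel] - x[sel].mean(axis=0)
    return anom

def linear_detrend(x):
    """Remove a least-squares straight line in time from every column of x."""
    t = np.arange(x.shape[0], dtype=float)
    design = np.column_stack([t, np.ones_like(t)])        # columns: time, constant
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)      # slope and intercept per column
    return x - design @ coef

# Ten years of synthetic monthly data: a trend plus an annual cycle, and a constant column
months = np.tile(np.arange(1, 13), 10)
t = np.arange(120.0)
x = np.column_stack([0.05 * t + np.cos(2 * np.pi * months / 12), np.full(120, 2.0)])
anom = monthly_anomalies(x, months)
clean = linear_detrend(anom)
```

After the first step every calendar month has zero mean; after the second the residual has no linear relationship with time.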
* * * * * * * * * *
The first structure is the single pattern that represents the most variance in the data.
The structures are the elements of the eigenvectors of the variance-covariance matrix of the data.
In the Data Library, the eigenvectors are also known as EOFs. The first eigenvector (EOF) points in the direction in which the data vectors jointly exhibit the most variability.
Essentially, a new coordinate system is created: the first axis is aligned with the direction of maximum joint variability, and each subsequent axis with the direction of maximum remaining variability.
The second structure is the pattern that describes the second largest amount of variance, calculated the same way as the first structure. A very important property of the second structure is that its time series is completely uncorrelated with that of the first structure, as well as with those of all following structures.
The second eigenvector is perpendicular to the first eigenvector, the third is perpendicular to both, and so on. This property is what led Lorenz to call the technique empirical orthogonal function analysis.
All structures are mutually orthogonal, and their time series are mutually uncorrelated.
The variance of the nth principal component is the nth eigenvalue.
Therefore, the total variation exhibited by the data is equal to the sum of all eigenvalues.
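These properties can be checked numerically. The sketch below uses Python/NumPy on synthetic random data, not a Data Library dataset. It builds the variance-covariance matrix of an anomaly matrix, finds its eigenvectors, and projects the data onto them. The eigenvectors come out orthonormal, the resulting time series are uncorrelated, each time series has a variance equal to its eigenvalue, and the eigenvalues sum to the total variance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic anomalies: 200 time steps x 5 points, with correlated columns
x = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))
x -= x.mean(axis=0)                        # zero time mean at every point

n = x.shape[0]
cov = x.T @ x / (n - 1)                    # variance-covariance matrix (space x space)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]          # sort so the largest variance comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pcs = x @ eigvecs                          # column k is the time series of structure k
pc_cov = pcs.T @ pcs / (n - 1)             # covariances between the time series

print(np.allclose(eigvecs.T @ eigvecs, np.eye(5)))   # eigenvectors mutually orthogonal
print(np.allclose(pc_cov, np.diag(eigvals)))         # uncorrelated, variance = eigenvalue
print(np.isclose(eigvals.sum(), np.trace(cov)))      # eigenvalues sum to total variance
```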
In the Data Library, eigenvalues are normalized such that the sum of all eigenvalues equals 1.
A normalized eigenvalue therefore indicates the fraction of the total variance explained by its corresponding structure.
Structures have also been normalized so that the root mean square equals 1. This way, the structures can be expressed in terms of standard deviation.
Singular values are equal to the square root of the (unnormalized) eigenvalues. Because the eigenvalues are automatically normalized in the Data Library, they do not directly indicate the total amount of variance each structure explains.
However, you can calculate the total variance explained by each EOF by squaring the singular values.
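The relationship between singular values, unnormalized eigenvalues, and normalized eigenvalues can be sketched in NumPy as follows. This uses synthetic data and does not reproduce the Data Library's internal scaling: in this convention the squared singular values of the anomaly matrix equal (n - 1) times the covariance eigenvalues, where n is the number of time steps, and the Data Library may use a different constant factor.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic anomalies: 120 time steps x 4 points with unequal variances
x = rng.standard_normal((120, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
x -= x.mean(axis=0)
n = x.shape[0]

# SVD of the anomaly matrix: x = u @ diag(s) @ vt (rows of vt are the EOFs)
u, s, vt = np.linalg.svd(x, full_matrices=False)

eigvals = s**2 / (n - 1)                       # unnormalized covariance eigenvalues
normalized = eigvals / eigvals.sum()           # fractions of total variance; sum to 1
total_variance = x.var(axis=0, ddof=1).sum()   # sum of the variances at every point
```

The squared singular values keep the absolute variance, while the normalized eigenvalues only give each structure's share of it.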
In the Data Library there is a time series associated with each structure. These time series are also known as principal components.
The first time series is calculated by projecting the data matrix onto the first eigenvector of the variance-covariance matrix of the data, the
second time series by projecting onto the second eigenvector, and so on.
The time series values indicate how much of the given structure is needed to reconstruct the data field.
It follows that the structure (dimensionless) multiplied by the time series value at a single point in time (units of the data),
summed over all structures, yields the original data at that point in time.
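The projection and reconstruction described above can be sketched in NumPy. NumPy returns unit-length eigenvectors; the last few lines rescale them to a root mean square of 1, as the Data Library structures are described, and compensate in the time series. Whether the Data Library uses exactly this scaling factor is an assumption here. Either way, the product of structure and time series, summed over structures, is unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((50, 6))           # synthetic data: 50 times x 6 points
x -= x.mean(axis=0)                        # anomalies

_, _, vt = np.linalg.svd(x, full_matrices=False)
eofs = vt                                  # row k is structure k (unit length in NumPy)
pcs = x @ eofs.T                           # project the data onto each eigenvector

# Rebuild one time step: sum over structures of (time series value) * (structure)
t = 10
x_t = sum(pcs[t, k] * eofs[k] for k in range(eofs.shape[0]))

# Rebuild all time steps at once
x_rebuilt = pcs @ eofs

# Rescale structures to RMS 1 and move the compensating factor into the time series
scale = np.sqrt(eofs.shape[1])
eofs_rms1 = eofs * scale
pcs_rms1 = pcs / scale
```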
Mathematically, there are as many eigenvectors as there are elements in each data vector (for example, grid points in a spatial field); in practice, at most the smaller of the number of elements and the number of time steps carry nonzero variance.
The first few eigenvectors point in directions along which the data jointly exhibit large variation.
The remaining eigenvectors point in directions along which the data jointly exhibit less variation.
For this reason, it is often possible to capture most of the variation by considering only the first few eigenvectors.
The remaining eigenvectors, along with their corresponding principal components, are truncated.
The ability of SVD to represent the data with a much smaller number of values is a primary reason for its use.
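Truncation can be sketched in NumPy as follows. The function name and the synthetic data (two strong patterns plus weak noise) are illustrative assumptions. Keeping only the first k structures and time series gives a reconstruction whose squared error is exactly the variance in the discarded modes.

```python
import numpy as np

def truncate(x, k):
    """Rebuild an anomaly matrix from its first k SVD modes.

    Returns the truncated reconstruction and the fraction of variance it keeps.
    """
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    frac = s**2 / np.sum(s**2)              # fraction of variance carried by each mode
    x_k = (u[:, :k] * s[:k]) @ vt[:k]       # time series times structures, first k modes only
    return x_k, frac[:k].sum()

rng = np.random.default_rng(3)
t = np.arange(100)
p1, p2 = rng.standard_normal(30), rng.standard_normal(30)
# Two strong spatial patterns with their own time behavior, plus weak noise
x = (np.outer(np.sin(t / 5.0), 5.0 * p1)
     + np.outer(np.cos(t / 11.0), 3.0 * p2)
     + 0.1 * rng.standard_normal((100, 30)))
x -= x.mean(axis=0)

x2, kept = truncate(x, 2)
```

Here 30 structures exist, but two of them retain nearly all of the variance.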
* * * * * * * * * *
Example: Perform a singular value decomposition of reconstructed sea surface temperature anomaly data in the North Atlantic for the months of December, January, and February from 1870 to 2004.
Locate Dataset and Variable
Compute Monthly Anomalies
Select Temporal and Spatial Domains
The time range entered will select only December, January, and February values for each year.
Compute Singular Value Decomposition
{Y cosd} [X Y] [T] svd
The svd function computes the singular value decomposition of the SST dataset weighted by the cosine of latitude. Spatial data are often weighted by the cosine of latitude to account for the convergence of the meridians toward the poles, which makes grid boxes of equal angular size smaller in area at higher latitudes. A weight term, however, is not necessary to complete the SVD analysis.
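One common way to apply such a weight outside the Data Library is sketched below in NumPy. It multiplies each grid point's anomalies by the square root of cos(latitude), so that each point's contribution to the variance scales with its grid-box area. This is an assumption for illustration, not a description of how the Data Library's svd applies the {Y cosd} weight, and the function names are hypothetical.

```python
import numpy as np

def area_weights(lat_deg):
    """sqrt(cos(latitude)) weights; squared, they scale variance by grid-box area."""
    return np.sqrt(np.cos(np.deg2rad(lat_deg)))

def weighted_svd(anom, lat_deg):
    """SVD of an (ntime, npoints) anomaly matrix after area-weighting each column."""
    w = area_weights(lat_deg)              # broadcasts across the time dimension
    return np.linalg.svd(anom * w, full_matrices=False)

lat = np.array([0.0, 30.0, 60.0])          # latitude of each column
rng = np.random.default_rng(4)
anom = rng.standard_normal((40, 3))
anom -= anom.mean(axis=0)
u, s, vt = weighted_svd(anom, lat)
```

A point at 60° N gets half the variance weight of an equatorial point, because cos(60°) = 0.5.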
Five new variables appear under the Datasets and Variables subheading: normalized eigenvalues, structures, singular values, time series, and weights.
While all of these variables are associated with the same new coordinate system generated by the SVD, each contains a different piece of information about the system.
View Normalized Eigenvalues
The first normalized eigenvalue is 0.233, the second is 0.151, and the third is 0.139. Recall that a normalized eigenvalue represents the fraction of variance explained by the structure associated with that eigenvalue. Therefore, the first structure explains 23% of the variance, the second 15%, and the third 14%. The table lists 402 structures, yet the first three account for over 50% of the variance.
Return to Dataset Page
This will remove the normalized eigenvalues variable selection and return you to the SVD page.
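The claim that the first three structures of the SST example account for over half of the variance can be checked by accumulating the normalized eigenvalues read from the table. This is plain arithmetic in Python on the three values quoted in this example.

```python
from itertools import accumulate

# First three normalized eigenvalues from the SST example
normalized = [0.233, 0.151, 0.139]
cumulative = [round(c, 3) for c in accumulate(normalized)]
print(cumulative)   # [0.233, 0.384, 0.523]
```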
View Structures
This is an image of the first structure, which explains about 23% of the total variance present in the original data.
Recall that the structures have been normalized, and as a result, are unitless quantities.
Note the large negative values off the coast of West Africa. This variability is caused by an ocean-atmosphere coupling system described in the third example.
This is an image of the second structure, which explains 15% of the total variance present in the original data. Notice the large negative values off the east coast of the United States that extend into the Central Atlantic.
These large values may be produced, in part, by the Gulf Stream, which causes annual variability of SSTs in the region. An image of the Gulf Stream is provided below. The large values in the second EOF structure above and the vectors representing the Gulf Stream in the image below appear to overlap.
This region is also aligned with the jet stream, a narrow band of strong winds that steers weather systems off the coast and causes additional variability in SSTs.
The large values in the 2nd structure may also be caused by an atmospheric circulation
pattern known as the North Atlantic Oscillation.
Return to Dataset Page
This will remove the structures variable selection and return you to the SVD page.
View Time Series
*NOTE: The singular values variable can be accessed the same way as the other three variables shown above.
* * * * * * * * * *
Example: Perform a singular value decomposition analysis of mean sea level pressure anomaly data in the North Atlantic for the months of December, January, and February from 1950 to 2004.
Locate Dataset and Variable
Compute Monthly Anomalies
Select Temporal and Spatial Domains
The time range entered will select only December, January, and February values for each year.
Compute Singular Value Decomposition
{Y cosd} [X Y] [T] svd
The svd function computes the singular value decomposition of the mean sea level pressure dataset weighted by the cosine of latitude.
Find Eigenvalue of 1st Structure
The first normalized eigenvalue is 0.402, the second is 0.278, and the third is 0.100. Normalized eigenvalues represent the fraction of variance explained by the structure associated with each eigenvalue. In this example, we are concerned only with the first eigenvalue, whose structure explains 40.2% of the total variance.
Return to Dataset Page
This will remove the normalized eigenvalues variable selection and return you to the SVD page.
View 1st Structure
This is an image of the first structure, which explains 40.2% of the total variance present in the original data. The large positive values centered around 45° N and the large negative values centered around 65° N indicate two regions whose mean sea level pressures (MSLP) are generally inversely related. This system is a well-known low-frequency atmospheric circulation pattern called the North Atlantic Oscillation (NAO). The NAO is characterized by large-scale MSLP variability associated with a subtropical high / polar low system over the North Atlantic. During a positive NAO, the subtropical high is stronger than usual and the polar low is deeper than usual. The increased pressure gradient causes stronger winter storms to cross the Atlantic. During a negative NAO, the subtropical high and polar low are both weaker than usual, resulting in fewer and less severe storms crossing the Atlantic.
* * * * * * * * * *
Example: Correlate an SVD time series of mean sea level pressure anomalies with an SVD time series of SST anomalies in the North Atlantic for the months of December, January, and February.
Select Dataset, Variable, and Domains
*NOTE: Datasets used in the example are similar to those used in the previous two examples.
SOURCES .NOAA .NCEP-NCAR .CDAS-1 .MONTHLY .Intrinsic .MSL .pressure
yearly-anomalies
Y (10N) (70N) RANGEEDGES
T (Dec-Feb 1950-2004) VALUES
X (5W) (80W) RANGEEDGES
Compute Singular Value Decomposition
{Y cosd} [X Y] [T] svd
The svd function computes the singular value decomposition of the mean sea level pressure dataset weighted by the cosine of latitude.
Select Time Series Variable and 1st Eigenvector
You have selected the first eigenvector and its associated time series.
Add the First Structure SVD Time Series of Reconstructed SST Anomaly Data
SOURCES .NOAA .NCDC .ERSST .version2 .SST yearly-anomalies
X (5W) (80W) RANGEEDGES
Y (10N) (70N) RANGEEDGES
T (Dec-Feb 1870-2004) VALUES
{Y cosd}[X Y][T]svd
.Ts
ev (1) VALUE
The commands above add the SST anomaly data to the interface, perform its singular value decomposition, and select the time series of the first eigenvector.
Correlate Datasets
[T] correlate
The above command correlates the two sets of data. The correlation coefficient is located under the Expert Mode text box in bold: 0.249616.
We can conclude that there is a weak correlation between MSLP anomalies and SST anomalies in the North Atlantic.
The correlation coefficient is not very high because the variability associated with the first SST anomaly structure, for example, may be correlated with several MSLP anomaly structures. The SVD analyses of the MSLP and SST datasets were performed independently of each other.
There is no guarantee that the maximum association between the two variables will be captured by any single pair of principal component time series.
Nevertheless, it is well established that these two fields, and specifically these two structures, are related: atmospheric anomalies cause SST anomalies, and vice versa. In this example, changes in MSLP sometimes produce an anomalous cyclonic atmospheric circulation centered around 30° N, 40° W. The cyclone weakens the normal northerly winds off the west coast of Africa. As a result, coastal upwelling is reduced and positive SST anomalies occur. Recall the first EOF structure in the first example, and notice the extremely low values off the coast of West Africa. This SST variability is associated with variations in MSLP that produce the anomalous low.
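The [T] correlate step in this example returns a correlation coefficient between two time series that cover different periods (1950-2004 for MSLP, 1870-2004 for SST). Outside the Data Library, the same idea can be sketched in NumPy as a Pearson correlation over the time steps the two series share. The function name is hypothetical, and labeling each season by an integer year is a simplifying assumption.

```python
import numpy as np

def correlate_on_common_times(t_a, a, t_b, b):
    """Pearson correlation of two time series over their shared time steps.

    Returns (correlation coefficient, number of shared time steps).
    """
    common, ia, ib = np.intersect1d(t_a, t_b, return_indices=True)
    return np.corrcoef(a[ia], b[ib])[0, 1], common.size

# Synthetic stand-ins for the two principal component time series
t_b = np.arange(1870, 2005)                # longer SST-like record
b = np.sin(t_b / 3.0)
t_a = np.arange(1950, 2005)                # shorter MSLP-like record
a = 2.0 * np.sin(t_a / 3.0) + 1.0
r, n_common = correlate_on_common_times(t_a, a, t_b, b)
```

Because the sign of an EOF is arbitrary, flipping the sign of either structure (and its time series) flips the sign of the correlation without changing its magnitude.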
* * * * * * * * * *
Unrotated empirical orthogonal functions (EOFs) are often very useful for describing natural modes of variability in a data field, owing to their spatial and temporal orthogonality, their ability to extract the maximum variance from a field, and their relative simplicity.
Yet, unrotated EOFs generally do a poor job of isolating individual modes of variation.
This weakness is largely due to four inherent characteristics of unrotated EOFs: domain shape dependence, subdomain instability, sensitivity to sampling, and an inaccurate portrayal of the physical relationships embedded within the input data (Richman 1986).
Unrotated EOFs can be determined primarily by the shape of the domain rather than by the covariation of the data.
In these cases, structures of the unrotated EOF analysis do not resemble any of the single input patterns, but rather, they represent combinations of the input patterns.
Unrotated EOFs usually exhibit poor subdomain stability, meaning that the modal patterns change substantially when the analysis is repeated over sub-portions of the domain.
Richman and Lamb (1985) performed a study in which unrotated EOF analyses were applied to the same data set, once over the entire domain and once over the northern and southern halves of the domain separately.
The results for each half of the domain did not correspond with the results of the entire domain, which leads to the question: How robust are the results from an unrotated EOF?
When eigenvalues are close together, the corresponding EOFs are sensitive to sampling and may not be well defined.
Unrotated EOFs sometimes produce results that are not physically meaningful.
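One widely used way to judge whether eigenvalues are "close together" is the rule of thumb of North et al. (1982). The sampling error of an eigenvalue is roughly λ·sqrt(2/N), where N is the number of independent samples, and an EOF is likely to be poorly defined if that error is comparable to or larger than the gap to a neighboring eigenvalue. The sketch below applies this rule in Python. N must be supplied by the user (estimating the effective number of independent samples is itself a judgment call), the function name is hypothetical, and the eigenvalues in the usage line are made up for illustration.

```python
import math

def north_well_separated(eigvals, n_independent):
    """North et al. (1982) rule of thumb for each eigenvalue in a descending list.

    Returns True where the estimated sampling error, lambda * sqrt(2 / N),
    is smaller than the gap to each neighboring eigenvalue in the list.
    """
    factor = math.sqrt(2.0 / n_independent)
    flags = []
    for k, lam in enumerate(eigvals):
        err = lam * factor
        gaps = []
        if k > 0:
            gaps.append(eigvals[k - 1] - lam)   # gap to the next larger eigenvalue
        if k < len(eigvals) - 1:
            gaps.append(lam - eigvals[k + 1])   # gap to the next smaller eigenvalue
        flags.append(all(gap > err for gap in gaps))
    return flags

# Hypothetical eigenvalues and sample size, for illustration only
print(north_well_separated([5.0, 2.0, 1.9], n_independent=100))   # [True, False, False]
```

The last eigenvalue in the list is compared only with the one above it, so any truncated list should include at least one eigenvalue beyond those being judged.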
* * * * * * * * * *
In a rotated EOF analysis, the eigenvectors are weighted by the square root of their corresponding eigenvalues, so that the weights (i.e., loadings) represent the correlations between each variable and each principal component (when the data are standardized). Most rotation methods seek an approximate "simple structure": they apply an algorithm that redistributes the PC loadings so that the dispersion of the loadings is maximized.
Varimax rotation is the most widely accepted method for analytical rotation. The varimax method rotates the basis so as to maximize the variance of the squared loadings of each rotated pattern, which pushes individual loadings toward either zero or large magnitudes.
This tends to improve the alignment of the basis with the actual data, and the relationship between the resulting spatial and temporal patterns and known physical mechanisms. Varimax is an orthogonal rotation: the rotated axes remain mutually orthogonal.
These rotations are used in principal component analysis so that the axes are rotated to a
position in which the sum of the variances of the loadings is the maximum possible (Oilfield Glossary).
In the Data Library, the varimax function requires the user to specify the number of eigenvectors to use in the rotation.
The matrix of loadings is built from this truncated set of eigenvectors.
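A minimal sketch of a varimax rotation in Python/NumPy, assuming loadings formed as described above (the leading k eigenvectors scaled by the square roots of their eigenvalues). It uses a common iterative SVD-based algorithm for Kaiser's varimax criterion, without Kaiser row normalization. The Data Library's varimax function may differ in normalization and convergence details, and all function names here are hypothetical.

```python
import numpy as np

def loadings_from_eofs(eofs, eigvals, k):
    """Scale the first k eigenvectors (columns of eofs) by sqrt(eigenvalue)."""
    return eofs[:, :k] * np.sqrt(eigvals[:k])

def varimax(loadings, gamma=1.0, max_iter=500, tol=1e-10):
    """Orthogonal varimax rotation of a (npoints, k) loading matrix.

    Returns (rotated loadings, k x k orthogonal rotation matrix).
    """
    p, k = loadings.shape
    rot = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        lam = loadings @ rot
        # Gradient-like matrix of the varimax criterion with respect to the rotation
        grad = loadings.T @ (lam**3 - (gamma / p) * lam * np.sum(lam**2, axis=0))
        u, s, vt = np.linalg.svd(grad)
        rot = u @ vt                       # closest orthogonal matrix: axes stay orthogonal
        crit = s.sum()
        if crit_old != 0.0 and crit < crit_old * (1.0 + tol):
            break                          # criterion has stopped increasing
        crit_old = crit
    return loadings @ rot, rot

def varimax_criterion(lam):
    """Sum over rotated patterns of the variance of the squared loadings."""
    return np.sum(np.var(lam**2, axis=0))
```

Because the rotation matrix is orthogonal, each point's communality (the row sum of squared loadings) is unchanged by the rotation; only the way the variance is shared among the rotated patterns changes.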
Many atmospheric scientists argue that rotated EOF analysis is a more effective tool than unrotated EOF analysis for the study of atmospheric circulation patterns.
While EOF rotation is often very useful, it is not meant to be a default operation after every EOF analysis.
Whether to rotate, and how many EOFs to rotate, should be guided by the specific analysis.
* * * * * * * * * *
Example: Perform a varimax rotation of an SVD analysis of East Pacific sea surface temperatures.
Locate Dataset and Variable
Select Temporal and Spatial Domains
Compute Singular Value Decomposition
{Y cosd} [X Y] [T] svd
The svd function computes the singular value decomposition of the SST dataset weighted by the cosine of latitude.
Five new variables appear under the Datasets and Variables subheading: normalized eigenvalues, structures, singular values, time series, and weights.
View Structures
The first structure is representative of the El Niño Southern Oscillation (ENSO).
Recall that the first structure is the pattern that explains the most variability in the original data.
The relatively large positive values immediately off the west coast of South America correspond to the variability in SSTs caused by enhanced upwelling during La Niña years and suppressed upwelling during El Niño years.
Notice that these values extend westward in a narrow line, and as a result, do not cover much surface area in the Pacific.
However, ENSO generally affects a greater area than this first structure depicts. One explanation is that part of the ENSO pattern may be contained in another structure, or in multiple structures.
Return to Dataset Page
This will remove the structures variable selection and return you to the SVD page.
Perform Varimax Rotation
3 varimax
The varimax function above performs a varimax rotation using the first three eigenvectors. Changing the number before the varimax command changes the number of eigenvectors used in the rotation. Seven new variables appear under the Datasets and Variables subheading: varimax rotation, communalities, energy, rotated structures, singular values, time series, and weights.
Select Rotated Structures Variable
View Structures
Notice that the colorscale is not centered on 0°. To make the image easier to interpret, the colormap can be adjusted so that the scale is centered on 0°.
Return to Dataset Page
Generate Colormap
startcolormap
-1.5 1.5 RANGE
white DarkViolet DarkViolet
-1.5 VALUE
cyan
0 VALUE
white 0 bandmax
yellow orange
0.5 VALUE
red
1.5 VALUE
firebrick endcolormap
The colorscale is depicted at the bottom of the dataset page. Values less than -1.5° are assigned the color DarkViolet and values greater than 1.5° are assigned the color firebrick.
Values of 0° are white. Missing values are also white. For more information on colorscales, see the Data Library Tutorial.
View Structures
Rotating the first three eigenvectors with the varimax method produces a structure that is more representative of the physical pattern (ENSO) than the unrotated EOF structure shown earlier in the example. Pieces of the ENSO pattern contained in multiple unrotated principal components have been combined into one rotated component. The negative values now extend farther north and south, as well as to the west. Rotating the EOFs / PCs often yields a solution that better explains the underlying physical patterns in the input data.