Canonical Correlation Analysis
Canonical Correlation Analysis (CCA) is a statistical technique for identifying correlations between two sets of variables. Given two vector variables X and Y, it finds a pair of linear projections, one for each variable, that map them into a common space in which the corresponding projected components are maximally correlated.

The package defines a CCA type to represent a CCA model, and provides a set of methods to access its properties.
Properties
Let M be an instance of CCA, dx the dimension of X, dy the dimension of Y, and p the output dimension (i.e. the dimension of the common space).
- xindim(M): Get the dimension of X, the first set of variables.
- yindim(M): Get the dimension of Y, the second set of variables.
- outdim(M): Get the output dimension, i.e. that of the common space.
- xmean(M): Get the mean vector of X (of length dx).
- ymean(M): Get the mean vector of Y (of length dy).
- xprojection(M): Get the projection matrix for X (of size (dx, p)).
- yprojection(M): Get the projection matrix for Y (of size (dy, p)).
- correlations(M): Get the correlations of the projected components (a vector of length p).
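To make the shapes listed above concrete, here is a minimal, self-contained Julia sketch of a model type with accessors under the documented names. The struct SketchCCA and its field names are hypothetical and chosen only for illustration; they are not the package's internal layout. The numbers in the usage line are placeholders, not a fitted model.

```julia
# Hypothetical model type: the struct name and its fields are illustrative
# and not the package's internal layout.
struct SketchCCA
    xmean::Vector{Float64}   # mean of X, length dx (empty means zero mean)
    ymean::Vector{Float64}   # mean of Y, length dy (empty means zero mean)
    xproj::Matrix{Float64}   # projection for X, size (dx, p)
    yproj::Matrix{Float64}   # projection for Y, size (dy, p)
    corrs::Vector{Float64}   # correlations of projected components, length p
end

# Accessors named after the documented properties. Dimensions are read from
# the projection matrices, so they stay valid when a mean vector is empty.
xindim(M::SketchCCA) = size(M.xproj, 1)
yindim(M::SketchCCA) = size(M.yproj, 1)
outdim(M::SketchCCA) = size(M.xproj, 2)
xmean(M::SketchCCA) = M.xmean
ymean(M::SketchCCA) = M.ymean
xprojection(M::SketchCCA) = M.xproj
yprojection(M::SketchCCA) = M.yproj
correlations(M::SketchCCA) = M.corrs

# Usage with placeholder numbers (dx = 3, dy = 2, p = 2).
M = SketchCCA(zeros(3), zeros(2), ones(3, 2), ones(2, 2), [0.9, 0.4])
(xindim(M), yindim(M), outdim(M))   # (3, 2, 2)
```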
Transformation
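Given a CCA model, one can transform observations from both spaces into the common space, as

$$\mathbf{z}_x = P_x^T (\mathbf{x} - \boldsymbol{\mu}_x), \qquad \mathbf{z}_y = P_y^T (\mathbf{y} - \boldsymbol{\mu}_y).$$

Here, $P_x$ and $P_y$ are the projection matrices for X and Y, and $\boldsymbol{\mu}_x$ and $\boldsymbol{\mu}_y$ are the corresponding mean vectors.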
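As a small worked illustration of the X-space transformation $\mathbf{z}_x = P_x^T (\mathbf{x} - \boldsymbol{\mu}_x)$, take made-up values with dx = 3 and p = 2 (placeholders, not the output of a fitted model):

$$
P_x = \begin{pmatrix} 1 & 0 \\ 0 & 2 \\ 1 & 1 \end{pmatrix},\quad
\boldsymbol{\mu}_x = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix},\quad
\mathbf{x} = \begin{pmatrix} 2 \\ 4 \\ 3 \end{pmatrix}
\;\Rightarrow\;
\mathbf{z}_x = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}
= \begin{pmatrix} 1 \\ 4 \end{pmatrix}
$$

The result has length p, one coordinate per direction of the common space.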
This package provides methods to do so:
- xtransform(M, x): Transform observations in the X-space to the common space. Here, x can be either a vector of length dx or a matrix where each column is an observation.
- ytransform(M, y): Transform observations in the Y-space to the common space. Here, y can be either a vector of length dy or a matrix where each column is an observation.
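The following Julia sketch shows how a single function can apply z = P' * (x - μ) to either one observation (a vector) or a matrix whose columns are observations, since broadcasting subtracts μ from every column. The helper to_common_space is hypothetical and is not the package's xtransform/ytransform; treating an empty mean vector as a zero mean is an assumption that mirrors the Float64[] convention documented for ccacov and ccasvd.

```julia
# Hypothetical helper (not the package's xtransform/ytransform): applies
# z = P' * (x - μ) to a vector or to every column of a matrix.
# An empty μ is treated as a zero mean.
function to_common_space(P::AbstractMatrix, μ::AbstractVector, x::AbstractVecOrMat)
    size(x, 1) == size(P, 1) || throw(DimensionMismatch("x must have size(P, 1) rows"))
    xc = isempty(μ) ? x : x .- μ    # broadcasting subtracts μ from each column
    return P' * xc                   # vector -> length p; matrix -> p × n
end

P = [1.0 0.0; 0.0 1.0; 2.0 -1.0]    # placeholder projection, dx = 3, p = 2
μ = [0.5, 0.5, 0.5]                  # placeholder mean

to_common_space(P, μ, [1.5, 2.5, 1.5])               # [3.0, 1.0]
to_common_space(P, μ, [1.5 0.5; 2.5 0.5; 1.5 0.5])   # [3.0 0.0; 1.0 0.0]
to_common_space(P, Float64[], [1.0, 2.0, 1.0])       # [3.0, 1.0]
```

The matrix call returns one column per observation, matching the vector result column by column.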
Data Analysis
One can use the fit method to perform CCA over given datasets.
- fit(CCA, X, Y; ...): Perform CCA over the data given in matrices X and Y. Each column of X and Y is an observation, and X and Y should have the same number of columns (denoted by n below). This method returns an instance of CCA.

  Keyword arguments:

  | name   | description | default |
  |--------|-------------|---------|
  | method | The choice of method: :cov (based on covariance matrices) or :svd (based on SVD of the input data) | :svd |
  | outdim | The output dimension, i.e. the dimension of the common space | min(dx, dy, n) |
  | mean   | The mean vector, which can be one of: 0 (the input data has already been centered), nothing (this function computes the mean), or a pre-computed mean vector | nothing |
  Notes: This function calls ccacov or ccasvd internally, depending on the choice of method.
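The three accepted forms of the mean keyword can be illustrated with a small Julia sketch of the centering step that would precede a call to ccacov or ccasvd. The helper center_columns is hypothetical and is not the package's internal code; returning an empty Float64[] for already-centered data is an assumption chosen to match the zero-mean convention documented for those two functions.

```julia
using Statistics

# Hypothetical helper illustrating the documented forms of the `mean` keyword:
#   0        -> data already centered: return it unchanged with an empty mean
#   nothing  -> compute the mean of each row (variable) across the columns
#   a vector -> subtract the given pre-computed mean
function center_columns(X::AbstractMatrix{<:Real}, m)
    if m === 0
        return X, Float64[]
    elseif m === nothing
        μ = vec(mean(X, dims=2))          # one mean per variable (row)
        return X .- μ, μ
    elseif m isa AbstractVector
        length(m) == size(X, 1) || throw(DimensionMismatch("mean has the wrong length"))
        return X .- m, collect(Float64, m)
    else
        throw(ArgumentError("mean must be 0, nothing, or a vector"))
    end
end

X = [1.0 3.0; 2.0 6.0]                # dx = 2, n = 2; columns are observations
Z, μ = center_columns(X, nothing)     # μ == [2.0, 4.0], Z == [-1.0 1.0; -2.0 2.0]
```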
Core Algorithms
Two algorithms are implemented in this package: ccacov and ccasvd.
- ccacov(Cxx, Cyy, Cxy, xmean, ymean, p): Compute CCA from the given covariance matrices, using generalized eigenvalue decomposition.

  Parameters:
  - Cxx: The covariance matrix of X.
  - Cyy: The covariance matrix of Y.
  - Cxy: The cross-covariance matrix between X and Y.
  - xmean: The mean vector of the original samples of X, either a vector of length dx or an empty vector Float64[] indicating a zero mean.
  - ymean: The mean vector of the original samples of Y, either a vector of length dy or an empty vector Float64[] indicating a zero mean.
  - p: The output dimension, i.e. the dimension of the common space.

  Returns: The resulting CCA model.
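A minimal Julia sketch of the covariance-based approach, under stated assumptions: the canonical correlations are taken as the square roots of the largest generalized eigenvalues of the pair (Cxy * inv(Cyy) * Cxy', Cxx), and this symmetric-definite problem is solved by the standard Cholesky reduction to an ordinary symmetric eigenproblem. The name ccacov_sketch is hypothetical; unlike ccacov, it omits the mean arguments and returns raw matrices rather than a CCA model.

```julia
using LinearAlgebra, Statistics, Random

# Hypothetical sketch (not the package's ccacov): finds the p largest solutions of
#   (Cxy * inv(Cyy) * Cxy') a = ρ^2 * Cxx * a
# by reducing this symmetric-definite generalized eigenproblem to a standard
# symmetric one with the Cholesky factor Cxx = L * L'.
function ccacov_sketch(Cxx::AbstractMatrix, Cyy::AbstractMatrix, Cxy::AbstractMatrix, p::Integer)
    p <= min(size(Cxx, 1), size(Cyy, 1)) || throw(ArgumentError("p is too large"))
    L = cholesky(Symmetric(Cxx)).L           # lower-triangular factor of Cxx
    A = Cxy * (Cyy \ Matrix(Cxy'))           # dx × dx, symmetric up to rounding
    C = Symmetric(L \ A / L')                # reduced problem: C * w = ρ^2 * w
    E = eigen(C)                             # eigenvalues in ascending order
    idx = sortperm(E.values, rev=true)[1:p]  # indices of the p largest
    ρ = sqrt.(max.(E.values[idx], 0.0))      # canonical correlations, descending
    Px = L' \ E.vectors[:, idx]              # back-transform: Px' * Cxx * Px = I
    Py = (Cyy \ (Cxy' * Px)) ./ ρ'           # paired directions: Py' * Cyy * Py = I
    return Px, Py, ρ
end

# Usage on synthetic data: Y's first variable is a noisy copy of X's first variable.
Random.seed!(0)
n = 500
X = randn(3, n)
Y = [X[1:1, :] .+ 0.5 .* randn(1, n); randn(1, n)]
Zx = X .- mean(X, dims=2)
Zy = Y .- mean(Y, dims=2)
Cxx = Zx * Zx' / (n - 1)
Cyy = Zy * Zy' / (n - 1)
Cxy = Zx * Zy' / (n - 1)
Px, Py, ρ = ccacov_sketch(Cxx, Cyy, Cxy, 2)
```

Deriving Py from Px, rather than solving a second eigenproblem, keeps each pair of directions matched and makes every correlation come out non-negative.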
- ccasvd(Zx, Zy, xmean, ymean, p): Compute CCA based on the singular value decomposition of the centered sample matrices Zx and Zy.

  Parameters:
  - Zx: The centered sample matrix for X.
  - Zy: The centered sample matrix for Y.
  - xmean: The mean vector of the original samples of X, either a vector of length dx or an empty vector Float64[] indicating a zero mean.
  - ymean: The mean vector of the original samples of Y, either a vector of length dy or an empty vector Float64[] indicating a zero mean.
  - p: The output dimension, i.e. the dimension of the common space.

  Returns: The resulting CCA model.
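For the SVD-based route, here is a minimal Julia sketch under stated assumptions: each centered block Z = U * Diagonal(S) * V' is whitened through its thin SVD, and the singular values of Vx' * Vy are then the canonical correlations (the cosines of the principal angles between the two row spaces). The name ccasvd_sketch is hypothetical; it assumes full row rank and dx, dy <= n, ignores the mean arguments, and returns raw matrices rather than a CCA model.

```julia
using LinearAlgebra, Statistics, Random

# Hypothetical sketch (not the package's ccasvd). Each centered block
# Z = U * Diagonal(S) * V' is whitened through its thin SVD; the SVD of Vx' * Vy
# then gives the canonical correlations (its singular values) and directions.
# Assumes Zx and Zy have full row rank and at least as many columns as rows.
function ccasvd_sketch(Zx::AbstractMatrix, Zy::AbstractMatrix, p::Integer)
    size(Zx, 2) == size(Zy, 2) || throw(DimensionMismatch("Zx and Zy need the same number of columns"))
    Fx = svd(Zx)                     # thin SVD: Fx.U is dx × dx, Fx.V is n × dx
    Fy = svd(Zy)
    F = svd(Fx.V' * Fy.V)            # dx × dy cross-product of the whitened blocks
    ρ = F.S[1:p]                     # canonical correlations, descending
    Px = Fx.U * Diagonal(1 ./ Fx.S) * F.U[:, 1:p]
    Py = Fy.U * Diagonal(1 ./ Fy.S) * F.V[:, 1:p]
    # Scaled so that Px' * Zx * Zx' * Px = I (unit sum of squares per component);
    # multiply by sqrt(n - 1) to get unit sample variance instead.
    return Px, Py, ρ
end

# Usage on synthetic data: Y's first variable is a noisy copy of X's second variable.
Random.seed!(1)
n = 400
X = randn(3, n)
Y = [X[2:2, :] .+ randn(1, n); randn(1, n)]
Zx = X .- mean(X, dims=2)
Zy = Y .- mean(Y, dims=2)
Px, Py, ρ = ccasvd_sketch(Zx, Zy, 2)
```

Working on the data matrices directly avoids forming Zx * Zx', which is the usual numerical argument for an SVD-based variant over a covariance-based one.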