Canonical Correlation Analysis
Canonical Correlation Analysis (CCA) is a statistical technique for identifying correlations between two sets of variables. Given two vector variables X and Y, it finds two projections, one for each, that map them into a common space in which their correlation is maximized.
The package defines a CCA type to represent a CCA model and provides a set of methods to access its properties.
Properties
Let M be an instance of CCA, dx the dimension of X, dy the dimension of Y, and p the output dimension (i.e. the dimension of the common space).

xindim(M)
    Get the dimension of X, the first set of variables.

yindim(M)
    Get the dimension of Y, the second set of variables.

outdim(M)
    Get the output dimension, i.e. that of the common space.

xmean(M)
    Get the mean vector of X (of length dx).

ymean(M)
    Get the mean vector of Y (of length dy).

xprojection(M)
    Get the projection matrix for X (of size (dx, p)).

yprojection(M)
    Get the projection matrix for Y (of size (dy, p)).

correlations(M)
    Get the correlations of the projected components (a vector of length p).
Transformation
Given a CCA model, one can transform observations from both spaces into a common space, as

\[
z_x = P_x^T (x - \mu_x), \qquad z_y = P_y^T (y - \mu_y)
\]

Here, \(P_x\) and \(P_y\) are the projection matrices for X and Y; \(\mu_x\) and \(\mu_y\) are the mean vectors.
This package provides methods to do so:

xtransform(M, x)
    Transform observations in the X space to the common space.
    Here, x can be either a vector of length dx or a matrix where each column is an observation.

ytransform(M, y)
    Transform observations in the Y space to the common space.
    Here, y can be either a vector of length dy or a matrix where each column is an observation.
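
The transformation described in this section amounts to centering an observation and multiplying by the transposed projection matrix. The following is a minimal sketch of that arithmetic using only Julia's standard library. The helper name `project` and the numeric values of `Px` and `mx` are hypothetical stand-ins for the quantities a fitted model would hold; this is not the package's own code.

```julia
using LinearAlgebra

# Hypothetical stand-ins for a fitted model's quantities (dx = 3, p = 2).
Px = [1.0 0.0; 0.0 1.0; 0.0 0.0]   # projection matrix of size (dx, p)
mx = [1.0, 2.0, 3.0]               # mean vector of length dx

# Center, then project. Broadcasting with `.-` subtracts the mean from a
# single vector or from every column of a matrix.
project(P, mu, x) = P' * (x .- mu)

x = [2.0, 4.0, 6.0]                # one observation
zx = project(Px, mx, x)            # vector of length p

X = [2.0 1.0; 4.0 2.0; 6.0 3.0]    # two observations, one per column
Zx = project(Px, mx, X)            # matrix of size (p, 2)
```

The same helper handles both input shapes, which mirrors the vector-or-matrix convention described for xtransform and ytransform.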
Data Analysis
One can use the fit method to perform CCA over given datasets.

fit(CCA, X, Y; ...)
    Perform CCA over the data given in matrices X and Y. Each column of X and Y is an observation. X and Y should have the same number of columns (denoted by n below).

    This method returns an instance of CCA.

    Keyword arguments:
      method – The method to use (default: :svd):
          :cov : based on covariance matrices
          :svd : based on SVD of the input data
      outdim – The output dimension, i.e. the dimension of the common space (default: min(dx, dy, n)).
      mean – The mean vector (default: nothing), which can be any of:
          0 : the input data has already been centralized
          nothing : this function will compute the mean
          a precomputed mean vector

    Notes: This function calls ccacov or ccasvd internally, depending on the choice of method.
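
A short usage sketch may help tie the pieces together. It assumes the MultivariateStats package is installed and exposes the functions exactly as documented on this page (function names may differ in other releases). The synthetic data and the variable names `S`, `Zx`, and `Zy` are illustrative assumptions.

```julia
using MultivariateStats

# Synthetic data: each column is an observation, and X and Y share a
# two-dimensional latent signal S, so they are correlated by construction.
n = 200
S = randn(2, n)
X = randn(5, 2) * S + 0.1 * randn(5, n)   # dx = 5
Y = randn(4, 2) * S + 0.1 * randn(4, n)   # dy = 4

# Fit a CCA model with a two-dimensional common space.
M = fit(CCA, X, Y; outdim=2)

# Inspect the model through the accessors described under Properties.
dims = (xindim(M), yindim(M), outdim(M))
rho = correlations(M)                     # vector of length 2

# Map both datasets into the common space.
Zx = xtransform(M, X)                     # size (2, n)
Zy = ytransform(M, Y)                     # size (2, n)
```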
Core Algorithms
Two algorithms are implemented in this package: ccacov and ccasvd.

ccacov(Cxx, Cyy, Cxy, xmean, ymean, p)
    Compute CCA based on analysis of the given covariance matrices, using generalized eigenvalue decomposition.

    Parameters:
      Cxx – The covariance matrix of X.
      Cyy – The covariance matrix of Y.
      Cxy – The covariance matrix between X and Y.
      xmean – The mean vector of the original samples of X, which can be a vector of length dx, or an empty vector Float64[] indicating a zero mean.
      ymean – The mean vector of the original samples of Y, which can be a vector of length dy, or an empty vector Float64[] indicating a zero mean.
      p – The output dimension, i.e. the dimension of the common space.
    Returns: The resultant CCA model.
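
To illustrate what a covariance-based computation can look like, the sketch below solves the generalized symmetric eigenproblem (Cxy Cyy⁻¹ Cxy') a = ρ² Cxx a using only Julia's standard library. It is one textbook formulation under stated assumptions, not necessarily how ccacov is implemented; the synthetic data and the names `G`, `E`, `rho`, `Px`, and `Py` are assumptions made for illustration.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(1)
n = 500
S = randn(2, n)                               # shared latent signal
X = randn(4, 2) * S + 0.5 * randn(4, n)       # dx = 4
Y = randn(3, 2) * S + 0.5 * randn(3, n)       # dy = 3

# Center the data and form the sample covariance matrices.
Zx = X .- mean(X, dims=2)
Zy = Y .- mean(Y, dims=2)
Cxx = Zx * Zx' / (n - 1)
Cyy = Zy * Zy' / (n - 1)
Cxy = Zx * Zy' / (n - 1)

# Generalized symmetric eigenproblem: G a = rho^2 * Cxx a.
G = Cxy * (Cyy \ Cxy')
E = eigen(Symmetric(G), Symmetric(Cxx))

# Keep the p directions with the largest eigenvalues.
p = 2
idx = sortperm(E.values, rev=true)[1:p]
rho = sqrt.(E.values[idx])                    # canonical correlations
Px = E.vectors[:, idx]                        # size (dx, p)

# Matching Y directions, scaled so each projected component has unit variance.
Py = (Cyy \ (Cxy' * Px)) ./ rho'              # size (dy, p)

# Sample correlation of the first pair of projected components.
r1 = cor((Px' * Zx)[1, :], (Py' * Zy)[1, :])
```

In this formulation `r1` should agree with `rho[1]` up to rounding, which gives a quick sanity check on the projections.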

ccasvd(Zx, Zy, xmean, ymean, p)
    Compute CCA based on singular value decomposition of centralized sample matrices Zx and Zy.

    Parameters:
      Zx – The centralized sample matrix for X.
      Zy – The centralized sample matrix for Y.
      xmean – The mean vector of the original samples of X, which can be a vector of length dx, or an empty vector Float64[] indicating a zero mean.
      ymean – The mean vector of the original samples of Y, which can be a vector of length dy, or an empty vector Float64[] indicating a zero mean.
      p – The output dimension, i.e. the dimension of the common space.
    Returns: The resultant CCA model.
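
An SVD-based computation can be sketched in a similar way. The version below takes the SVD of each centralized sample matrix and then the SVD of the product of their right singular vectors, whose singular values are the canonical correlations. This is one standard formulation written with Julia's standard library only, and it is not claimed to match the internals of ccasvd; the synthetic data and the names `Fx`, `Fy`, `F`, `rho`, `Px`, and `Py` are assumptions.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(2)
n = 500
S = randn(2, n)                               # shared latent signal
X = randn(4, 2) * S + 0.5 * randn(4, n)       # dx = 4
Y = randn(3, 2) * S + 0.5 * randn(3, n)       # dy = 3

# Centralized sample matrices (columns are observations).
Zx = X .- mean(X, dims=2)
Zy = Y .- mean(Y, dims=2)

# Thin SVDs: Zx = Fx.U * Diagonal(Fx.S) * Fx.Vt, and likewise for Zy.
Fx = svd(Zx)
Fy = svd(Zy)

# SVD of the (dx, dy) matrix of inner products between the right singular
# vectors; its singular values are the canonical correlations.
F = svd(Fx.Vt * Fy.Vt')

p = 2
rho = F.S[1:p]

# Undo the whitening to get projections for the original variables, scaled
# so each projected component has unit sample variance.
Px = Fx.U * (Diagonal(1 ./ Fx.S) * F.U[:, 1:p]) * sqrt(n - 1)   # size (dx, p)
Py = Fy.U * (Diagonal(1 ./ Fy.S) * F.V[:, 1:p]) * sqrt(n - 1)   # size (dy, p)

# Sample correlation of the first pair of projected components.
r1 = cor((Px' * Zx)[1, :], (Py' * Zy)[1, :])
```

Working from the data matrices directly avoids forming and inverting covariance matrices, which is the usual motivation for an SVD-based route.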