It is often convenient/necessary to think of time series as driven by several underlying latent series. For example, for high-dimensional multivariate series, these latent, lower-dimensional series could be the “factors” that are loaded onto the high-dimensional multivariate series and that explain much of the variability in the original series. Another example concerns the so-called blind source separation.
Multiple macroeconomic quarterly indicators
A multivariate (high-dimensional) time series derived from multiple macroeconomic quarterly indicators (e.g. GDP, GNP, industrial product, etc.). 144 component series, spanning 190 quarters. Source: Stock and Watson (2008).
load("usquarter.Rdata")
Ye <- Y
q <- dim(Ye)[1]
q
## [1] 144
T <- dim(Ye)[2]
T
## [1] 190
nf <- 12 ## Number of component series to plot
par(mfrow=c(nf/3,3))
par(mar=c(2,2,2,2))
for (l in (1:nf)) {
plot.ts(Ye[l,])
}
Note that several of the component series look visually similar, and could potentially be driven by one underlying “factor” series.
Audio data
A time series of audio data from two sensors (courtesy of P. Hoff, Duke):
load("audio_data")
Ya <- Y
dim(Ya)
## [1] 80000 2
par(mfrow=c(1,2))
plot(Ya[1000:1500,1],type="l")
plot(Ya[1000:1500,2],type="l")
library(audio)
#play(.1*Ya[,1]/sd(Ya[,1]) ,rate=8000)
#play(.1*Ya[,2]/sd(Ya[,2]) ,rate=8000)
How does one filter out the two underlying audio signals? (The problem is known as blind source separation or “cocktail party” in signal processing.)
The goal of the lecture is to discuss and illustrate some basic techniques related to the questions raised above: dynamic factor models (DFMs) and independent component analysis (ICA).
Suppose a \(q\)-vector time series \(\mathbf{Y}_t\), \(t=1,\ldots, T\), is stationary-like. Suppose the component series have been standardized to have mean 0 and variance 1. We wish to define a factor model for it.
Definition 5.1: A \(q\)-vector time series \(\{\mathbf{Y}_t\}\) follows a dynamic factor model (DFM, for short) if \[ \mathbf{Y}_t = \Lambda \mathbf{F}_t + \varepsilon_t, \] where \(\Lambda\) is a \(q\times r\) loading matrix with \(r<<q\), \(\{\mathbf{F}_t\}\) is a \(r\)-vector time series whose components are \(r\) factors, and \(\{\varepsilon_t\}\) is \(\mbox{WN}(0,\Sigma_\varepsilon)\) series, uncorrelated with \(\{\mathbf{F}_t\}\). Moreover, it is often assumed that the factors \(\{\mathbf{F}_t\}\) follow a VAR\((p)\) model \[ \Phi(B) \mathbf{F}_t = \mathbf{Z}_t, \] where \(\Phi(B) = I - \Phi_1 B - \ldots - \Phi_p B^p\) with \(r\times r\) matrices \(\Phi_i\) and \(\{\mathbf{Z}_t\}\sim \mbox{WN}(0,\Sigma_{\mathbf Z})\).
Notes
Estimation for a given number of factors
For a fixed number of factors \(r\), the DFM parameters can be estimated through the following steps:
If needed, the covariance matrices of the error terms can naturally be estimated through sample covariance matrices of the residuals.
Why is this expected to work? See Bai and Ng (2008), Doz et al. (2011), Stock and Watson (2011) for assumptions and proofs. Note that the assumptions are such that \(q,T\to\infty\).
Notes
Estimation of the number of factors
One popular approach is based on the information criteria (e.g. Bai and Ng (2007)(2008)), with the number of factors estimated by \[ \widehat r = \mathop{\mbox{argmin}}_r \Big\{ \log\Big(\mbox{SSE}(r)\Big) + r g(q,T)\Big\}, \] where \(g(q,T)\) is one of the following three functions, \[ g_1(q,T) = \frac{q+T}{qT} \log\Big( \frac{q+T}{qT} \Big),\quad g_2(q,T) = \frac{q+T}{qT} \log( q\wedge T),\quad g_2(q,T) = \frac{\log( q\wedge T)}{q\wedge T} \] with \(q\wedge T = \min(q,T)\).
Other approaches include ad-hoc eigenvalue scree plots, statistical tests based on random matrix theory (e.g. Onatski (2009)), etc.
Forecasting with DFM
DFM are particularly liked by practitioners because they can be used easily and seem to do well in forecasting. Given a time series \(\mathbf{Y}_t\), \(t=1,\ldots,T\), and its DFM with estimated factors \(\widehat{\mathbf F}_t\) and loading matrix \(\widehat \Lambda\), the \(h\)-step-ahead forecast \(\widehat{\mathbf{Y}}_T(h)\) is defined as \[ \widehat{\mathbf{Y}}_T(h) = \widehat \Lambda \widehat{\mathbf F}_T(h), \] where \(\widehat{\mathbf F}_T(h)\) is the \(h\)-step-ahead forecast of \(\widehat{\mathbf F}_{T+h}\) based on the VAR\((p)\) model fitted to the factors \(\widehat{\mathbf F}_t\).
Multiple macroeconomic quarterly indicators
library(vars)
source("DFM-VAR-library.R")
ICselect(Ye, maxr=5)
## $err
## [1] 0.7576303 0.6797813 0.6333382 0.5912211 0.5569279
##
## $psum1
## [1] 0.8114133 0.7873473 0.7946872 0.8063531 0.8258429
##
## $psum2
## [1] 0.8182998 0.8011203 0.8153467 0.8338991 0.8602755
##
## $psum3
## [1] 0.7921429 0.7488065 0.7368760 0.7292715 0.7294909
##
## $order
## [1] 2 2 4
out <- DFM.VAR(Ye, r=2)
## Factors
#out$Fh
plot.ts(t(out$Fh))
## Loadings are
#out$Lam
plot.ts(out$Lam)
State flu data set
stateflu <- read.csv("states_abbrev.csv", sep=",", header=TRUE)
stateflu <- as.matrix(stateflu[,c(-1)])
stateflu <- log(na.omit(stateflu))
statefluR <- stateflu[,c("CT","ME","MA","NH","RI","VT","NJ","NY","DE","DC","MD","PA","VA","WV","AL","FL","GA","KY","MS","NC","SC","TN", "IL","IN","MI","MN","OH","WI","AR","LA","NM","OK","TX","IA","KS","MO","NE","CO","MT","ND","SD","UT","WY","AZ","CA","HI","NV", "AK","ID","OR","WA")]
statefluR <- statefluR - matrix(1,dim(statefluR)[1],1) %*% colMeans(statefluR)
statefluR <- statefluR %*% diag(diag(var(statefluR))^(-1/2))
ICselect(t(statefluR), maxr=5)
## $err
## [1] 0.09769093 0.06579661 0.05026991 0.03969935 0.03348386
##
## $psum1
## [1] 0.1849141 0.2402429 0.3119394 0.3885920 0.4695996
##
## $psum2
## [1] 0.1887778 0.2479704 0.3235306 0.4040469 0.4889183
##
## $psum3
## [1] 0.1747856 0.2199859 0.2815538 0.3480778 0.4189570
##
## $order
## [1] 1 1 1
vvflu <- svd(cov(statefluR))
qflu <- dim(t(statefluR))[1]
Lamflu <- vvflu$u[,1]*sqrt(qflu)
Fhflu <- 1/qflu*t(Lamflu) %*% t(statefluR)
## Factors
#out$Fh
plot.ts(t(Fhflu))
## Loadings are
#out$Lam
plot(Lamflu,type="l")
abline(h = 0, v = c(6,8,14,22,28,33,37,43,47,51), col = "black")
Note: Out-of-sample forecasts for DFM are worse than those based on sparse VAR model.
Audio data
ICselect(t(Ya), maxr=2)
## $err
## [1] 1.188254e-01 2.506199e-32
##
## $psum1
## [1] 0.4653951 0.6931395
##
## $psum2
## [1] 0.4654076 0.6931645
##
## $psum3
## [1] 0.4653990 0.6931472
##
## $order
## [1] 1 1 1
r <- 1
vv <- svd(cov(Ya))
qa <- dim(t(Ya))[1]
Lam <- vv$u[,1:r]*sqrt(qa)
Fh <- 1/qa*t(Lam)%*%t(Ya)
#play(.1*Fh[1,]/sd(Fh[1,]) ,rate=8000)
#play(.1*Fh[2,]/sd(Fh[2,]) ,rate=8000)
The goal with the audio data set is, in fact, quite different. There are two audio sources, the piano and the radio, and there are two microphones recording the data, one closer to the piano and the other closer to the radio. The data can be thought as \[ \mathbf{Y}_t = A \mathbf{Z}_t + \varepsilon_t, \] where \(\mathbf{Z}_t\) is a \(2\)-vector time series representing the two audio sources (say standardized), \(A\) is a \(2\times 2\) matrix reflecting the distance of the sources to the microphones, and \(\varepsilon_t\) is \(\mbox{WN}(0,\Sigma_\varepsilon)\). The component series \(Z_{1t}\) and \(Z_{2t}\) can be assumed to be uncorrelated/independent. The goal is to estimate the underlying uncorrelated/independent component series \(Z_{1t}\) and \(Z_{2t}\).
One possibility might seem to decorrelate \(\mathbf{Y}_t\). As in DFM estimation, Step 1, consider the sample covariance matrix and its decomposition \[ \widehat \Sigma_{\mathbf Y} = \frac{1}{T} \sum_{t=1}^T \mathbf{Y}_t \mathbf{Y}_t' = \widehat U \widehat D \widehat U', \] where \(\widehat D = \mbox{diag}(\widehat d_1,d_2)\) with \(\widehat d_1\geq \widehat d_2\), and \(\widehat U = (\widehat U_1,\widehat U_2)\) is the orthogonal eigenvector matrix such that \(\widehat U \widehat U' = \widehat U' \widehat U = I_2\). Estimate the underlying sources as \[ \widehat{\mathbf{Z}}_t = \widehat \Sigma_{\mathbf Y}^{-1/2} \mathbf{Y}_t \] (which is implemented below through SVD of the data). Would this work and when?
sYa <- svd(Ya)
UV <- sYa$u %*% t(sYa$v)
# play(.1*UV[,1]/sd(UV[,1]) ,rate=8000)
# play(.1*UV[,2]/sd(UV[,2]) ,rate=8000)
par(mfrow=c(1,2))
plot(UV[1000:1500,1],type="l")
plot(UV[1000:1500,2],type="l")
But …
theta <- pi/4
R <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
Yatheta <- Ya %*% R
#play(.1*Yatheta[,1]/sd(Yatheta[,1]) ,rate=8000)
#play(.1*Yatheta[,2]/sd(Yatheta[,2]) ,rate=8000)
sYatheta <- svd(Yatheta)
UVtheta <- sYatheta$u %*% t(sYatheta$v)
# play(.1*UVtheta[,1]/sd(UVtheta[,1]) ,rate=8000)
# play(.1*UVtheta[,2]/sd(UVtheta[,2]) ,rate=8000)
par(mfrow=c(1,2))
plot(UVtheta[1000:1500,1],type="l")
plot(UVtheta[1000:1500,2],type="l")
Independent component analysis, on the other hand, seems to do a nice job …
library(fastICA)
YaICA <- fastICA(Ya, 2, alg.typ = "deflation", fun = "logcosh", alpha = 1,
method = "R", row.norm = FALSE, maxit = 200,
tol = 0.0001, verbose = TRUE)
#play(.1*YaICA$S[,1]/sd(YaICA$S[,1]) ,rate=8000)
#play(.1*YaICA$S[,2]/sd(YaICA$S[,2]) ,rate=8000)
par(mfrow=c(1,2))
plot(YaICA$S[1000:1500,1],type="l")
plot(YaICA$S[1000:1500,2],type="l")
YaICAtheta <- fastICA(Yatheta, 2, alg.typ = "deflation", fun = "logcosh", alpha = 1,
method = "R", row.norm = FALSE, maxit = 200,
tol = 0.0001, verbose = TRUE)
#play(.1*YaICAtheta$S[,1]/sd(YaICAtheta$S[,1]) ,rate=8000)
#play(.1*YaICAtheta$S[,2]/sd(YaICAtheta$S[,2]) ,rate=8000)
par(mfrow=c(1,2))
plot(YaICAtheta$S[1000:1500,1],type="l")
plot(YaICAtheta$S[1000:1500,2],type="l")
Basics of fast independent component analysis (fast ICA)
A simplified setting is \[ \mathbf{Y} = A \mathbf{Z}, \] where \(\mathbf{Y}\) is a random \(q\)-vector, \(A\) is a \(q\times r\) mixing matrix (with \(q\geq r\)) and the \(r\) components of \(\mathbf{Z}\) are independent, non-Gaussian.
In the case of “first” factor, one looks for a “direction” (\(q\)-vector) \(\mathbf{w}\) such that \(\mathbf{w}'\mathbf{Y}\) is most non-Gaussian, which in practice is implemented by maximizing the following non-Gaussianity measure \[ J(\mathbf{w}) = | E G(\mathbf{w}'\mathbf{Y}) - E G({\cal N}(0,1)) |^2, \] where \(G(u) = \frac{1}{\alpha}\log \mbox{cosh} (\alpha u)\) or \(G(u) = - \exp(u^2/2)\). (Why does this make sense?)
The numerical optimization algorithm is described in Hyvarinen and Oja (2000). It is an iterative algorithm that uses an initial random vector \(\mathbf{w}_0\) of unit norm. To get other factors, the algorithm is repeated \(r\) times and the Gramm-Schmitt-like decorrelation is applied to have the estimated factors to be (numerically) uncorrelated.
Q0: Supplement your analysis of multivariate time series with DFM or ICA if seems appropriate.
Q1: State and discuss the conditions used in the literature needed for the above DFM estimation method to work.
Q2: What is the rotation that the above DFM estimation method estimates the loadings and the factors up to?
Q3: How is estimation carried out for DFM in the dynamic form?
Q4: Explain why the above methods to estimate the number of factors in DFM are expected to work?
Q5: In connection to ICA, discuss other measures of non-Gaussianity such as kurtosis, and negentropy and its approximations.
Q6: Describe connections of ICA to mutual information, MLE and projection pursuit.
Baek, C., R. A. Davis, and V. Pipiras. 2017. “Periodic Dynamic Factor Models.” Preprint.
Bai, J., and S. Ng. 2007. “Determining the Number of Primitive Shocks in Factor Models.” Journal of Business & Economic Statistics 25 (1).
———. 2008. Large Dimensional Factor Analysis. Now Publishers Inc.
Doz, C., D. Giannone, and L. Reichlin. 2011. “A Two-Step Estimator for Large Approximate Dynamic Factor Models Based on Kalman Filtering.” Journal of Econometrics 164: 188–205.
Hyvärinen, A., and E. Oja. 2000. “Independent Component Analysis: Algorithms and Applications.” Neural Networks 13 (4): 411–30.
Onatski, A. 2009. “Testing Hypotheses About the Number of Factors in Large Factor Models.” Econometrica 77 (5): 1447–79.
Stock, J. H., and M. W. Watson. 2008. “Forecasting in Dynamic Factor Models Subject to Structural Instability.” Preprint.
———. 2011. “Dynamic Factor Models.” Oxford Handbook of Economic Forecasting. Oxford University Press, USA, 35–59.