This is about building (sparse) VAR-like models that are “consistent” with (sparse) partial correlations among the components of multivariate time series data. Source: I will follow Chapter 5 in the book by Wilson et al. (2015). The focus is on lower-dimensional VARs.
Definition 4.1: Given a set of (zero mean) random variables \(X_1,X_2,\ldots,X_n\), the partial correlation \(\pi(X_i,X_j)\) between two random variables \(X_i\) and \(X_j\) included in \(\mathbf{X} = (X_1,X_2,\ldots,X_n)'\) is defined as the correlation between \(X_i\) and \(X_j\) conditioned on all the other variables in \(\mathbf{X}\), that is, \[ \pi(X_i,X_j) = \rho(X_i,X_j\ |\ \mathbf{X}\setminus \{X_i,X_j\}). \]
It is known that (\(i\neq j\)) \[ \pi(X_i,X_j) = - \frac{W_{ij}}{\sqrt{W_{ii}W_{jj}}}, \] where \(W = (W_{kl})\) is the inverse, that is, \(W = V^{-1}\), of the covariance matrix \(V = E \mathbf{X}\mathbf{X}'\) with \(\mathbf{X} = (X_1,X_2,\ldots,X_n)'\). The matrix \(W\) is known as the precision matrix.
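As a sanity check on this identity, the following sketch (in Python/NumPy, although the code in these notes is otherwise R; the covariance matrix is made up for illustration) compares the precision-matrix formula with the defining regression-residual notion of conditional correlation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n = 4 zero-mean Gaussian variables with a known covariance.
A = rng.normal(size=(4, 4))
V = A @ A.T + 4 * np.eye(4)          # a positive-definite covariance matrix
X = rng.multivariate_normal(np.zeros(4), V, size=200_000)

# Partial correlation of X1 and X2 from the precision matrix W = V^{-1}.
W = np.linalg.inv(V)
pi_12 = -W[0, 1] / np.sqrt(W[0, 0] * W[1, 1])

# Cross-check: correlate the residuals of X1 and X2 after regressing
# each on the remaining variables (X3, X4); no intercept since means are zero.
Z = X[:, 2:]
coef1, *_ = np.linalg.lstsq(Z, X[:, 0], rcond=None)
coef2, *_ = np.linalg.lstsq(Z, X[:, 1], rcond=None)
rho_12 = np.corrcoef(X[:, 0] - Z @ coef1, X[:, 1] - Z @ coef2)[0, 1]

print(pi_12, rho_12)   # the two agree up to sampling error
```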
Partial correlations are often represented in a graph, the so-called partial correlation graph (PCG), where an edge corresponds to a non-zero partial correlation. In the Gaussian case, the PCG is also the CIG (Conditional Independence Graph). For example, for \(n=5\) variables, the PCG could be as follows.
library("igraph")
g <- graph_from_literal( X1-X2-X3-X4, X1-X4, X2-X4, X4-X5 )
plot.igraph(g, vertex.color="white", edge.curved=0, edge.width = 2, vertex.size=30)
For small dimension \(n\), given a sample of iid observations \(\mathbf{X}_1,\ldots,\mathbf{X}_T\), the partial correlations and PCG are estimated as:
The sample covariance matrix \(\widehat V\) is computed and the sample inverse covariance matrix \(\widehat W = (\widehat W_{ij})\) is computed as \(\widehat W = \widehat V^{-1}\). The sample partial correlations (\(i\neq j\)) are \[ \widehat \pi(X_i,X_j) = - \frac{\widehat W_{ij}}{\sqrt{\widehat W_{ii}\widehat W_{jj}}}. \]
The null hypothesis that \(\pi(X_i,X_j)=0\) is rejected at level \(\alpha\) if \[ |\widehat \pi(X_i,X_j)| > \frac{t_{\alpha/2,T-n+1}}{\sqrt{t^2_{\alpha/2,T-n+1}+(T-n+1)}}, \] where \(t_{\alpha/2,T-n+1}\) is the corresponding critical value of the \(t_{T-n+1}\) distribution.
The rationale for the last step comes from linear regression; see, e.g., p. 136 in Wilson et al. (2015).
Since multiple testing is performed, a practical recommendation is to perform this for several values of \(\alpha\).
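The whole estimation-and-testing procedure fits in a few lines; here is a sketch in Python/NumPy/SciPy (the notes otherwise use R, and the helper names `partial_correlations` and `threshold` are mine):

```python
import numpy as np
from scipy import stats

def partial_correlations(X):
    """Sample partial correlations from the inverse sample covariance."""
    W = np.linalg.inv(np.cov(X, rowvar=False))
    d = 1.0 / np.sqrt(np.diag(W))
    P = -np.outer(d, d) * W          # P_ij = -W_ij / sqrt(W_ii W_jj)
    np.fill_diagonal(P, 1.0)
    return P

def threshold(alpha, T, n):
    """Critical value for testing pi(X_i, X_j) = 0 at level alpha."""
    df = T - n + 1
    t = stats.t.ppf(1 - alpha / 2, df)
    return t / np.sqrt(t**2 + df)

rng = np.random.default_rng(1)
X = rng.normal(size=(98, 3))         # T = 98 iid observations of n = 3 variables
P = partial_correlations(X)
thr = threshold(0.05, T=98, n=3)
print(np.round(P, 3))
print(round(thr, 3))                 # about 0.199; the normal approximation gives 0.196
edges = np.abs(P) > thr              # off-diagonal True entries are the PCG edges
```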
Note: For larger dimension \(n\), the so-called graphical lasso is used: one estimates the precision matrix \(W=V^{-1}\) through \[ \widehat W = \mathop{\rm argmax}_{W} \Big( \log \mbox{det}(W) - \mbox{tr}(\widehat V W) - \lambda \|W\|_1 \Big), \] where \(\lambda\) is a penalty parameter. See Friedman et al. (2008).
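A minimal graphical-lasso sketch using scikit-learn's `GraphicalLasso` (its `alpha` argument plays the role of the penalty \(\lambda\) above; the simulated chain structure and the penalty value are made up for illustration):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(2)

# A sparse true precision matrix: a chain X1 - X2 - X3 - X4 in the PCG.
W_true = (np.eye(4)
          + np.diag([-0.4, -0.4, -0.4], k=1)
          + np.diag([-0.4, -0.4, -0.4], k=-1))
V_true = np.linalg.inv(W_true)
X = rng.multivariate_normal(np.zeros(4), V_true, size=2000)

gl = GraphicalLasso(alpha=0.05).fit(X)   # alpha = l1 penalty on the precision
W_hat = gl.precision_
print(np.round(W_hat, 2))                # entries off the chain are shrunk toward 0
```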
As an illustration, consider the monthly flour price indices in Buffalo, Minneapolis and Kansas City from August 1972 to November 1980.
prices <- read.table("Flourprice.txt")
prices <- ts(prices, start=c(1972,8), end=c(1980,11), frequency=12, names = c("Buff","Minn","Kans"))
matplot(prices, type = "l")
legend('top', c('Buff','Minn','Kans'),lty=c(1,2,3))
library(vars)
VARselect(prices, lag.max=8, type="const")$selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 2 2 2 2
var <- VAR(prices, p=2, type="const")
summary(var)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: Buff, Minn, Kans
## Deterministic variables: const
## Sample size: 98
## Log Likelihood: -781.229
## Roots of the characteristic polynomial:
## 0.9858 0.9316 0.9316 0.4811 0.1671 0.1671
## Call:
## VAR(y = prices, p = 2, type = "const")
##
##
## Estimation results for equation Buff:
## =====================================
## Buff = Buff.l1 + Minn.l1 + Kans.l1 + Buff.l2 + Minn.l2 + Kans.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Buff.l1 -0.29952 0.35686 -0.839 0.403484
## Minn.l1 1.37412 0.40465 3.396 0.001016 **
## Kans.l1 -0.01335 0.19401 -0.069 0.945287
## Buff.l2 1.39134 0.36558 3.806 0.000256 ***
## Minn.l2 -1.75052 0.41235 -4.245 5.25e-05 ***
## Kans.l2 0.24441 0.19837 1.232 0.221079
## const 7.53267 4.57862 1.645 0.103382
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 6.828 on 91 degrees of freedom
## Multiple R-Squared: 0.938, Adjusted R-squared: 0.934
## F-statistic: 229.6 on 6 and 91 DF, p-value: < 2.2e-16
##
##
## Estimation results for equation Minn:
## =====================================
## Minn = Buff.l1 + Minn.l1 + Kans.l1 + Buff.l2 + Minn.l2 + Kans.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Buff.l1 -0.84455 0.38154 -2.214 0.02936 *
## Minn.l1 1.93768 0.43264 4.479 2.18e-05 ***
## Kans.l1 -0.01731 0.20743 -0.083 0.93369
## Buff.l2 0.97897 0.39086 2.505 0.01404 *
## Minn.l2 -1.40460 0.44088 -3.186 0.00198 **
## Kans.l2 0.28931 0.21209 1.364 0.17591
## const 8.08874 4.89532 1.652 0.10191
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 7.301 on 91 degrees of freedom
## Multiple R-Squared: 0.937, Adjusted R-squared: 0.9328
## F-statistic: 225.4 on 6 and 91 DF, p-value: < 2.2e-16
##
##
## Estimation results for equation Kans:
## =====================================
## Kans = Buff.l1 + Minn.l1 + Kans.l1 + Buff.l2 + Minn.l2 + Kans.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Buff.l1 -0.5512 0.4243 -1.299 0.197215
## Minn.l1 0.8799 0.4812 1.829 0.070724 .
## Kans.l1 0.8295 0.2307 3.596 0.000526 ***
## Buff.l2 0.7152 0.4347 1.645 0.103381
## Minn.l2 -1.2951 0.4903 -2.641 0.009724 **
## Kans.l2 0.3461 0.2359 1.467 0.145799
## const 10.7319 5.4445 1.971 0.051747 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 8.12 on 91 degrees of freedom
## Multiple R-Squared: 0.9354, Adjusted R-squared: 0.9312
## F-statistic: 219.8 on 6 and 91 DF, p-value: < 2.2e-16
##
##
##
## Covariance matrix of residuals:
## Buff Minn Kans
## Buff 46.63 48.17 48.24
## Minn 48.17 53.30 53.21
## Kans 48.24 53.21 65.93
##
## Correlation matrix of residuals:
## Buff Minn Kans
## Buff 1.0000 0.9664 0.8700
## Minn 0.9664 1.0000 0.8976
## Kans 0.8700 0.8976 1.0000
V <- matrix(c(1.0000, 0.9664, 0.8700, 0.9664, 1.0000, 0.8976, 0.8700, 0.8976, 1.0000), ncol = 3)
W <- solve(V)
Pi <- -diag(diag(W)^(-1/2)) %*% W %*% diag(diag(W)^(-1/2))
diag(Pi) <- rep(1,3)
Pi
## [,1] [,2] [,3]
## [1,] 1.00000000 0.8534361 0.02258778
## [2,] 0.85343613 1.0000000 0.44843024
## [3,] 0.02258778 0.4484302 1.00000000
# critical value based on normal
message("critical value")
1.96/sqrt(1.96^2 + (98-3+1))
## [1] 0.1961554
The corresponding PCG (CIG):
library("igraph")
g <- graph_from_literal( e1t-e2t, e2t-e3t )
l <- matrix(c(0, 0, 0, 3, 2, 1), ncol=2)
plot.igraph(g, layout = l, edge.curved=c(0,0,0), vertex.color="white", edge.width = 2, vertex.size=30)
The PCG above is consistent, for example, with the recursive regressions \[\begin{eqnarray} e_{1t} & = & a_{1t} \\ e_{2t} & = & \eta_{2,1} e_{1t} + a_{2t} \\ e_{3t} & = & \eta_{3,2} e_{2t} + a_{3t}, \end{eqnarray}\] with uncorrelated error terms. This is graphically represented as:
g <- graph_from_literal( e1t-+e2t, e2t-+e3t )
l <- matrix(c(0, 0, 0, 3, 2, 1), ncol=2)
plot.igraph(g, layout = l, edge.curved=c(0,0,0), vertex.color="white", edge.width = 2, vertex.size=30)
Definition 4.2: Such a diagram, with directed edges and no directed cycles, is referred to as a directed acyclic graph (DAG).
(Why focus on DAGs? See below in connection to structural VARs.)
Note that the PCG above can also be associated with the following regressions and DAG (not the same as above, even up to permutation): \[\begin{eqnarray} e_{2t} & = & b_{2t} \\ e_{1t} & = & \eta_{1,2} e_{2t} + b_{1t} \\ e_{3t} & = & \eta_{3,2} e_{2t} + b_{3t}. \end{eqnarray}\] But it cannot be associated with \[\begin{eqnarray} e_{1t} & = & c_{1t} \\ e_{2t} & = & c_{2t} \\ e_{3t} & = & \upsilon_{3,1} e_{1t} +\upsilon_{3,2} e_{2t} + c_{3t}. \end{eqnarray}\] (Why not? In the last system, \(e_{3t}\) depends directly on \(e_{1t}\), so \(\pi(e_{1t},e_{3t})\neq 0\), whereas the PCG above has no edge between \(e_{1t}\) and \(e_{3t}\). This can be checked through the precision matrix implied by the regression equations.)
The corresponding DAGs are:
g1 <- graph_from_literal( e1t+-e2t, e2t-+e3t )
g2 <- graph_from_literal( e1t-+e2t, e2t+-e3t )
l <- matrix(c(0, 0, 0, 3, 2, 1), ncol=2)
par(mfrow = c(1, 2))
plot.igraph(g1, layout = l, edge.curved=c(0,0,0), vertex.color="white", edge.width = 2, vertex.size=30)
plot.igraph(g2, layout = l, edge.curved=c(0,0,0), vertex.color="white", edge.width = 2, vertex.size=30)
From a matrix perspective, a directed link in the DAG translates into a link in the PCG. In this sense, the DAG is a “subset” of the PCG.
Another possibility in building a DAG is to start with a “saturated” DAG and have some rules by which DAG edges can be removed based on the corresponding PCG. The Markovian properties below are helpful in this regard.
Which DAG(s) are consistent with the following PCG?
library("igraph")
g <- graph_from_literal( e1t-e2t, e2t-e3t, e3t-e4t, e4t-e5t, e5t-e6t, e1t-e4t, e2t-e4t, e3t-e6t, e4t-e6t )
l <- matrix(c(0, 0, 0, 0, 0, 0, 6, 5, 4, 3, 2, 1), ncol=2)
plot.igraph(g, layout = l, edge.curved=c(0,2,0,-2,0,2,0,-2,0), vertex.color="white", edge.width = 2, vertex.size=20, margin = 0.1)
Local Markov property: Each variable, given its neighbors, is conditionally independent of all the remaining variables.
Example: In the PCG above, \(e_5\) is conditionally independent of \(e_1,e_2,e_3\), given its neighbors \(e_4,e_6\).
Global Markov property: If there are no links between any pair of variables, one taken from each of two blocks, then the two blocks are conditionally independent of each other given all the remaining variables.
Example: In the PCG above, the blocks \(\{e_1,e_2\}\) and \(\{e_5,e_6\}\) (without links) are conditionally independent given the rest of the variables \(\{e_3,e_4\}\).
Applying the local/global Markov property to the “saturated” DAG leads to the following DAGs consistent with the above PCG.
library("igraph")
g <- graph_from_literal( e1t+-e2t, e2t+-e3t, e3t+-e4t, e4t+-e5t, e5t+-e6t, e1t+-e4t, e2t+-e4t, e3t+-e6t, e4t+-e6t )
l <- matrix(c(0, 0, 0, 0, 0, 0, 6, 5, 4, 3, 2, 1), ncol=2)
plot.igraph(g, layout = l, edge.curved=c(0,0,-2,2,0,0,-2,2,0), vertex.color="white", edge.width = 2, vertex.size=20, edge.arrow.size=.5, margin =0.1)
Moralization rule: Given a DAG, to obtain a consistent PCG (CIG):
For each node of the DAG, insert an undirected edge between pairs of its parent nodes, unless they are already linked by an edge.
Replace each directed edge in the DAG by an undirected edge.
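The two moralization steps are easy to implement; a small sketch (Python; `moralize` is my own helper), checking that a chain DAG moralizes to the chain PCG, while a node with two parents gets them “married”:

```python
import numpy as np
from itertools import combinations

def moralize(n, directed_edges):
    """PCG (undirected) implied by a DAG via the moralization rule.

    directed_edges: list of (parent, child) pairs over nodes 0..n-1.
    Returns a symmetric boolean adjacency matrix.
    """
    pcg = np.zeros((n, n), dtype=bool)
    parents = {v: set() for v in range(n)}
    for p, c in directed_edges:
        parents[c].add(p)
        pcg[p, c] = pcg[c, p] = True          # replace directed edges by undirected ones
    for c in range(n):
        for p1, p2 in combinations(parents[c], 2):
            pcg[p1, p2] = pcg[p2, p1] = True  # insert edges between pairs of parents
    return pcg

# Chain e1 -> e2 -> e3: no common parents, so the PCG is the chain itself.
print(moralize(3, [(0, 1), (1, 2)]).astype(int))
# Common child e1 -> e3 <- e2: moralization adds the edge e1 - e2.
print(moralize(3, [(0, 2), (1, 2)]).astype(int))
```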
For example, the following DAG is also consistent with the PCG above:
library("igraph")
g <- graph_from_literal( e1t+-e2t, e2t+-e3t, e3t+-e4t, e4t+-e5t, e1t+-e4t, e2t+-e4t, e3t+-e6t, e4t+-e6t )
l <- matrix(c(0, 0, 0, 0, 0, 0, 6, 5, 4, 3, 2, 1), ncol=2)
plot.igraph(g, layout = l, edge.curved=c(0,0,-2,2,0,0,-2,2), vertex.color="white", edge.width = 2, vertex.size=20, edge.arrow.size=.5, margin =0.1)
In practice, the selection between different possible DAG explanations of an estimated PCG (CIG) is assisted by comparing the goodness of fit of the estimated models and testing for significance of the coefficients that correspond to links in the competing models. (To be illustrated below.) The so-called product rules for moralized links are also used: e.g. in the DAG and its associated PCG given below, one has \[ \pi(X,Y) = - \pi(X,Z) \times \pi(Y,Z). \]
g1 <- graph_from_literal( X-+Z, Z+-Y )
g2 <- graph_from_literal( X-Z, Z-Y, X-Y )
l <- matrix(c(0, 0, 0, 3, 2, 1), ncol=2)
par(mfrow = c(1, 2))
plot.igraph(g1, layout = l, edge.curved=c(0,0,0), vertex.color="white", edge.width = 2, vertex.size=30)
plot.igraph(g2, layout = l, edge.curved=c(0,2,0), vertex.color="white", edge.width = 2, vertex.size=30)
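The product rule can be checked exactly on a concrete example of this DAG (Python/NumPy, with made-up coefficients): take \(Z = 0.5X + 0.5Y + \mbox{noise}\) with independent \(X\) and \(Y\), whose moralized PCG is the triangle shown above.

```python
import numpy as np

# DAG X -> Z <- Y with Z = 0.5 X + 0.5 Y + noise, X and Y independent N(0,1).
a, b, s2 = 0.5, 0.5, 1.0
V = np.array([[1.0, 0.0, a],
              [0.0, 1.0, b],
              [a,   b,   a**2 + b**2 + s2]])   # covariance of (X, Y, Z)

W = np.linalg.inv(V)                  # precision matrix
d = 1.0 / np.sqrt(np.diag(W))
P = -np.outer(d, d) * W               # partial correlations (off-diagonal entries)

pi_xy, pi_xz, pi_yz = P[0, 1], P[0, 2], P[1, 2]
print(pi_xy, -pi_xz * pi_yz)          # product rule: both equal -0.2
```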
This product rule can be extended to more sophisticated DAGs that have moralized links in their PCGs. See Section 5.8 in Wilson et al. (2015). But for more sophisticated graphs, the product rules also become more complex and may not necessarily be easy to keep track of (instead, the various models may simply be fitted and examined).
Based on one of the DAGs suggested for the flour prices innovations, the following DAG model is obtained through OLS regression (with \(t\) values in parentheses):
g <- graph_from_literal( e1t-+e2t, e2t-+e3t )
l <- matrix(c(0, 0, 0, 3, 2, 1), ncol=2)
plot.igraph(g, layout = l, edge.curved=c(0,0,0), vertex.color="white", edge.width = 2, vertex.size=30,edge.label= c("1.033 (37)", "0.998 (20.1)"))
The (approximate) likelihood ratio test statistic (deviance; based on minus twice the log-likelihood, or the sample size times the log of the sum of squared residuals) of the DAG regression model against the saturated regression model (including \(e_{1,t}\) in the regression of \(e_{3,t}\)) turns out to be \(0.05\), which, compared to a critical value of the \(\chi^2_1\) distribution, suggests that the reduced DAG regression model is preferred. The same conclusion is reached through model selection criteria (AIC, SIC, etc.); see Wilson et al. (2015), p. 145.
The other DAG regression model consistent with the PCG gives exactly the same deviance and information criteria, and is called likelihood equivalent. The considered DAG regression model NOT consistent with the PCG gives a deviance difference of \(139.6\), suggesting that it is extremely inadequate.
The reason that the VAR error terms (innovations) are correlated, as e.g. in the flour price example above, is that they account for correlation of the components of the series \(\mathbf{X}_t\) for the same \(t\). To have uncorrelated error terms, one could fit instead a (zero mean) structural VAR model (SVAR model, for short) of the form: \[ \Phi_0 \mathbf{X}_t = \Phi_1 \mathbf{X}_{t-1} + \Phi_2 \mathbf{X}_{t-2} + \ldots + \Phi_p \mathbf{X}_{t-p} + \mathbf{a}_t, \] where the components of \(\mathbf{a}_t\) are uncorrelated, with a diagonal matrix \(D = E\mathbf{a}_t \mathbf{a}_t'\).
Note that the above structural VAR model corresponds to the canonical VAR model: \[\begin{eqnarray} \mathbf{X}_t & = & \Phi_0^{-1} \Phi_1 \mathbf{X}_{t-1} + \Phi_0^{-1}\Phi_2 \mathbf{X}_{t-2} + \ldots + \Phi_0^{-1}\Phi_p \mathbf{X}_{t-p} + \Phi_0^{-1}\mathbf{a}_t\\ & = & \Phi_1^* \mathbf{X}_{t-1} + \Phi_2^* \mathbf{X}_{t-2} + \ldots + \Phi_p^* \mathbf{X}_{t-p} + \mathbf{\epsilon}_t, \end{eqnarray}\] where the components of \(\mathbf{\epsilon}_t\) are possibly correlated, with the covariance matrix \(V = E\mathbf{\epsilon}_t \mathbf{\epsilon}_t'\). The connection between \(V\) and \(D\) is expressed as: \[ \Phi_0 V \Phi_0' = D, \] that is, \(\Phi_0\) diagonalizes \(V\) by congruence (equivalently, \(V^{-1} = \Phi_0' D^{-1} \Phi_0\)). In particular, a sparse \(V^{-1}\) is associated with a sparse \(\Phi_0\).
While a structural VAR corresponds to one canonical VAR, the converse is not true: the same canonical VAR can be associated with different structural VARs. This is because the matrix \(\Phi_0\) diagonalizing \(V\) is not unique. In fact, restrictions need to be imposed for identifiability of the structural VAR, and this is often done by taking a lower-triangular (or upper-triangular) \(\Phi_0\), with \(1\)’s on the diagonal.
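For a lower-triangular \(\Phi_0\) with \(1\)'s on the diagonal, the identification amounts to an \(LDL'\) decomposition of \(V\). A sketch (Python/NumPy, though the notes otherwise use R) using the residual covariance matrix of the flour-price VAR(2) above:

```python
import numpy as np

# Residual covariance matrix V of the flour-price VAR(2) fitted above.
V = np.array([[46.63, 48.17, 48.24],
              [48.17, 53.30, 53.21],
              [48.24, 53.21, 65.93]])

# LDL' decomposition via the Cholesky factor: V = C C' = L D L',
# with L unit lower triangular and D diagonal.
C = np.linalg.cholesky(V)       # C lower triangular
L = C / np.diag(C)              # rescale columns so that diag(L) = 1
D = np.diag(np.diag(C) ** 2)

Phi0 = np.linalg.inv(L)         # then Phi0 V Phi0' = D, with 1's on the diagonal
print(np.round(Phi0, 3))
print(np.round(Phi0 @ V @ Phi0.T, 2))   # diagonal: the structural shocks are uncorrelated
```

The (2,1) entry of \(\Phi_0\) comes out to about \(-1.03\), in line with the coefficient \(1.033\) of the innovation regression of \(e_{2t}\) on \(e_{1t}\) shown earlier.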
Note: Another name for structural VAR is simultaneous equations modeling.
We shall fit an SVAR(2) model consistent with the partial correlations between the series and its values at lags 1 and 2. The SVAR model above will be thought of as the underlying regression equations for a DAG. Note that this situation is slightly different from the general context considered above: we have only 3 equations, and the direction of time restricts the plausible directed links, since lagged variables may have links into current variables but not conversely.
X <- cbind(prices[3:100,],prices[2:99,],prices[1:98,])
VX <- cov(X)
WX <- solve(VX)
PiX <- -diag(diag(WX)^(-1/2)) %*% WX %*% diag(diag(WX)^(-1/2))
diag(PiX) <- rep(1,9)
PiX
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.00000000 0.85321333 0.02312921 0.45154411 -0.2884547
## [2,] 0.85321333 1.00000000 0.44829127 -0.49678010 0.5219660
## [3,] 0.02312921 0.44829127 1.00000000 0.12991046 -0.4017109
## [4,] 0.45154411 -0.49678010 0.12991046 1.00000000 0.8308836
## [5,] -0.28845473 0.52196598 -0.40171085 0.83088355 1.0000000
## [6,] -0.01158511 -0.29944807 0.65761071 -0.11511950 0.5717615
## [7,] 0.47807639 -0.30062063 -0.13167848 0.47155380 -0.4902154
## [8,] -0.41159896 0.26388327 0.05436952 -0.41577161 0.5961812
## [9,] -0.03597379 0.03647912 0.05781995 0.04755999 -0.3546225
## [,6] [,7] [,8] [,9]
## [1,] -0.01158511 0.47807639 -0.41159896 -0.03597379
## [2,] -0.29944807 -0.30062063 0.26388327 0.03647912
## [3,] 0.65761071 -0.13167848 0.05436952 0.05781995
## [4,] -0.11511950 0.47155380 -0.41577161 0.04755999
## [5,] 0.57176150 -0.49021541 0.59618115 -0.35462254
## [6,] 1.00000000 0.13041190 -0.35595222 0.62381694
## [7,] 0.13041190 1.00000000 0.89001905 -0.05385604
## [8,] -0.35595222 0.89001905 1.00000000 0.45925781
## [9,] 0.62381694 -0.05385604 0.45925781 1.00000000
PiX[,1:3]
## [,1] [,2] [,3]
## [1,] 1.00000000 0.85321333 0.02312921
## [2,] 0.85321333 1.00000000 0.44829127
## [3,] 0.02312921 0.44829127 1.00000000
## [4,] 0.45154411 -0.49678010 0.12991046
## [5,] -0.28845473 0.52196598 -0.40171085
## [6,] -0.01158511 -0.29944807 0.65761071
## [7,] 0.47807639 -0.30062063 -0.13167848
## [8,] -0.41159896 0.26388327 0.05436952
## [9,] -0.03597379 0.03647912 0.05781995
Critical values for \(\alpha=0.1\), \(0.05\) and \(0.01\) are, respectively, \(0.171\), \(0.202\) and \(0.262\). This leads to the following PCG:
gflour <- graph_from_literal( X1t-X2t, X2t-X3t, X1tm1-X1t, X2tm1-X2t, X3tm1-X3t, X1tm1-X2t, X2tm1-X1t, X3tm1-X2t, X2tm1-X3t, X1tm2, X1tm2-X2t, X1tm2-X1t, X2tm2-X1t, X2tm2-X2t, X2tm2, X3tm2)
lflour <- matrix(c(0, 0, 0, -1, -1, -1, -2, -2, -2, 3, 2, 1, 3, 2, 1, 3, 2, 1), ncol=2)
plot.igraph(gflour, layout = lflour, edge.curved=c(0,0,0,-.75,0,0,0,0,0,0,0.75,0,0,0), edge.width = 2, vertex.color="white", vertex.size=50, margin = 0.1, edge.label= c("0.85", "0.45", "-0.29", "0.48","-0.41","0.45","-0.5","0.52","-0.3","-0.3","0.26","-0.4",".66"))
The suggested DAG consistent with this PCG is given below, along with the estimated coefficients in the OLS regression and their \(t\) values. It is motivated by the following considerations:
gflour <- graph_from_literal( X1t-+X2t, X2t-+X3t, X1tm1-+X2t, X2tm1-+X2t, X3tm1-+X3t, X2tm1-+X1t, X2tm1-+X3t, X1tm2-+X2t, X1tm2-+X1t, X2tm2-+X1t, X2tm2-+X2t, X3tm2)
lflour <- matrix(c(0, 0, 0, -1, -1, -1, -2, -2, -2, 3, 2, 1, 3, 2, 1, 3, 2, 1), ncol=2)
plot.igraph(gflour, layout = lflour, edge.curved=c(0,0,0,0,0,0,0,0.75,0,0,-.75), edge.width = 2, vertex.color="white", vertex.size=50, margin = 0.1, edge.label= c("1.04 (37.5)", "1.00 (20.6)", "-0.54 (-5.8)","1.15 (12.8)","0.52 (5.0)","-0.92 (-15.5)","0.91 (18.9)","1.09 (5.9)","-0.47 (4.4)","-1.28 (-6.5)","0.45 (4.2)"))
The conclusion is that \(X_{3t}\) depends in a simple manner on only \(X_{2t}\) and \(X_{3,t-1}\). The series \(X_{1t}\) and \(X_{2t}\) are NOT dependent on \(X_{3t}\) at all, and their dependence is qualitatively symmetric.
The DAG corresponds to the following SVAR model: \[ \Phi_0 = \left( \begin{matrix} 1 & 0 & 0 \\ -1.04 & 1 & 0\\ 0 & -1.01 & 1 \end{matrix} \right),\quad \Phi_1 = \left( \begin{matrix} 0 & 1.16 & 0 \\ -0.54 & 0.52 & 0\\ 0 & -0.92 & 0.89 \end{matrix}\right),\quad \Phi_2 = \left( \begin{matrix} 1.14 & -1.13 & 0 \\ -0.45 & 0.43 & 0\\ 0 & 0 & 0 \end{matrix} \right) \] Against the saturated DAG model, the deviance is \(15.56\), which should be compared to a critical value of \(\chi^2(10)\) (e.g. \(18.307\) at 5%). Thus, the sparse DAG model is preferred. The same conclusion is reached through model selection criteria.
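To see the correspondence with the canonical VAR concretely, one can invert \(\Phi_0\) and form \(\Phi_i^* = \Phi_0^{-1}\Phi_i\); a quick check (Python/NumPy) using the matrices above:

```python
import numpy as np

# SVAR(2) coefficient matrices from the fitted DAG model above.
Phi0 = np.array([[ 1.00,  0.00, 0.00],
                 [-1.04,  1.00, 0.00],
                 [ 0.00, -1.01, 1.00]])
Phi1 = np.array([[ 0.00,  1.16, 0.00],
                 [-0.54,  0.52, 0.00],
                 [ 0.00, -0.92, 0.89]])
Phi2 = np.array([[ 1.14, -1.13, 0.00],
                 [-0.45,  0.43, 0.00],
                 [ 0.00,  0.00, 0.00]])

# Implied canonical VAR(2) coefficients: Phi_i* = Phi0^{-1} Phi_i.
Phi0_inv = np.linalg.inv(Phi0)
Phi1_star = Phi0_inv @ Phi1
Phi2_star = Phi0_inv @ Phi2
print(np.round(Phi1_star, 2))
print(np.round(Phi2_star, 2))
```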
The obtained model also lends itself to easier interpretation. Essentially (approximately), we have: \[ X_{3t} = X_{2t} + N_t,\quad N_t = 0.91 N_{t-1} + A_t, \] \[ X_{1t} = X_{1,t-2} + (X_{2,t-1} - X_{2,t-2}) + A_{1t}, \] \[ X_{2t} = X_{1t} + 0.5 (X_{2,t-1} + X_{2,t-2}) - 0.5 (X_{1,t-1} + X_{1,t-2}) + A_{2t}. \] Further model diagnostics is carried out in Wilson et al. (2015), pp. 151-152.
The following steps are suggested to build an SVAR (Wilson et al. (2015), pp. 152-153):
Q0: Supplement your analysis of multivariate time series with graphical modeling if it seems appropriate.
Q1: Why does it make sense to fit SVARs according to non-zero partial correlations?
Q2: Compare the model fitted to the data in this lecture to the models fitted through (A-)LASSO or other variable selection methods.
Q3: Report on generalizations of the product rule for moralized links mentioned above.
Q4: Prove the spectral domain connection of zero partial correlations mentioned above.
Barigozzi, M., and C. T. Brownlees. 2016. “NETS: Network Estimation for Time Series.” Preprint.
Friedman, J., T. Hastie, and R. Tibshirani. 2008. “Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics 9 (3): 432–41.
Lütkepohl, H. 2006. “Structural Vector Autoregressive Analysis for Cointegrated Variables.” In Modern Econometric Analysis: Surveys on Recent Developments, edited by O. Hübler and J. Frohn, 73–86. Berlin, Heidelberg: Springer Berlin Heidelberg.
Wilson, G.T., M. Reale, and J. Haywood. 2015. Models for Dependent Time Series. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. CRC Press.