Many time series models, including models with trends and seasonal components and even nonstationary models, can be written as components of a vector series \(\mathbf{Y} = \{\mathbf{Y}_t\}\) satisfying \[ \begin{array}{l} \mathbf{Y}_t = G_t \mathbf{X}_t + \mathbf{W}_t,\\ \mathbf{X}_{t+1} = F_t \mathbf{X}_t + \mathbf{V}_t, \end{array} \] where \(\{\mathbf{W}_t\}\sim {\rm WN}(0,R_t)\), \(\{\mathbf{V}_t\}\sim {\rm WN}(0,Q_t)\), and \(G_t\), \(F_t\), \(R_t\), \(Q_t\) are the model “parameters”. Such systems are known as state-space models; they can be estimated, predicted, and so on, through computationally efficient Kalman recursions. Here the \(\mathbf{Y}_t\) are observable but the \(\mathbf{X}_t\) are latent (not observable).
In the above sense, the model trend and/or seasonal component can be estimated simultaneously with the other model parameters. Such a modeling approach is referred to as structural. This is one of the more powerful techniques, with extensions and connections to hidden Markov models (HMMs), Bayesian analysis, factor models, etc., and also, as will be seen below, to the EM algorithm and the handling of missing data.
On the theory side, the goal is to gain a basic understanding of how a state-space model is fitted, how its latent variables are estimated, and what the underlying Kalman recursions are. From the data (modeling) perspective, the goal is to get a sense of the advantages of state-space modeling and of what it gives one in practice.
Recall the state-space model of a vector series \(\mathbf{Y} = \{\mathbf{Y}_t\}\) satisfying \[ \begin{array}{l} \mathbf{Y}_t = G_t \mathbf{X}_t + \mathbf{W}_t,\\ \mathbf{X}_{t+1} = F_t \mathbf{X}_t + \mathbf{V}_t, \end{array} \] where \(\{\mathbf{W}_t\}\sim {\rm WN}(0,R_t)\), \(\{\mathbf{V}_t\}\sim {\rm WN}(0,Q_t)\), and \(G_t\), \(F_t\), \(R_t\), \(Q_t\) are the model “parameters”.
The Kalman recursions involve three different problems: prediction, filtering and smoothing, all concerning the estimation of \(\mathbf{X}_t\) in terms of the observations, as made precise below.
We would also like to be able to measure the variability of these estimates. By convention, \(\mathbf{Y}_0\) is a vector of ones.
Definition 3.1: For a random vector \(\mathbf{X} = (X_1,\ldots,X_v)'\), set \[ P_t(\mathbf{X}) = (P_t(X_1),\ldots,P_t(X_v))', \] where \(P_t(X_i) := P(X_i|\mathbf{Y}_0,\ldots,\mathbf{Y}_t)\) is the best linear predictor of \(X_i\) in terms of all the components of \(\mathbf{Y}_0,\ldots,\mathbf{Y}_t\).
Prediction: Want \(P_{t-1}(\mathbf{X}_t) = : \widehat{\mathbf{X}}_t\)
Filtering: Want \(P_{t}(\mathbf{X}_t) = : \mathbf{X}_{t|t}\)
Smoothing: Want \(P_{T}(\mathbf{X}_t) = : \mathbf{X}_{t|T}\) (with \(T\) replaced by \(n\) below, \(P_{n}(\mathbf{X}_t) = : \mathbf{X}_{t|n}\))
Prediction: compute \(\widehat{\mathbf{X}}_t\) and its error covariance matrix recursively, forwards in \(t\).
Filtering: compute \(\mathbf{X}_{t|t}\) and its error covariance matrix recursively, forwards in \(t\).
Smoothing: compute \(\mathbf{X}_{t|n}\) and its error covariance matrix by a backward recursion in \(t\), after a forward filtering pass. (A sketch of the standard forms of all three recursions is given below.)
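For reference, here is a compact sketch of the standard forms of these recursions, assuming \(\{\mathbf{W}_t\}\) and \(\{\mathbf{V}_t\}\) are uncorrelated; the symbols \(\Omega_t\) and \(\Omega_{t|t}\) (error covariance matrices of \(\widehat{\mathbf{X}}_t\) and \(\mathbf{X}_{t|t}\)) and \(J_t\) (smoothing gain) are introduced here only for this sketch, and precise statements can be found in Brockwell and Davis (2016), Chapter 8, or Shumway and Stoffer (2011), Chapter 6: \[ \begin{array}{ll} \mbox{Innovations:} & \mathbf{I}_t = \mathbf{Y}_t - G_t \widehat{\mathbf{X}}_t, \qquad \Delta_t = G_t \Omega_t G_t' + R_t,\\ \mbox{Filtering:} & \mathbf{X}_{t|t} = \widehat{\mathbf{X}}_t + \Omega_t G_t' \Delta_t^{-1} \mathbf{I}_t, \qquad \Omega_{t|t} = \Omega_t - \Omega_t G_t' \Delta_t^{-1} G_t \Omega_t,\\ \mbox{Prediction:} & \widehat{\mathbf{X}}_{t+1} = F_t \mathbf{X}_{t|t}, \qquad \Omega_{t+1} = F_t \Omega_{t|t} F_t' + Q_t,\\ \mbox{Smoothing:} & \mathbf{X}_{t|n} = \mathbf{X}_{t|t} + J_t (\mathbf{X}_{t+1|n} - \widehat{\mathbf{X}}_{t+1}), \qquad J_t = \Omega_{t|t} F_t' \Omega_{t+1}^{-1}. \end{array} \] The prediction and filtering recursions are run forwards in \(t\) from initial values \(\widehat{\mathbf{X}}_1\), \(\Omega_1\); the smoothing recursion is run backwards in \(t\) starting from \(\mathbf{X}_{n|n}\), with error covariances \(\Omega_{t|n} = \Omega_{t|t} + J_t(\Omega_{t+1|n} - \Omega_{t+1})J_t'\).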
Suppose the state-space model is parametrized by a parameter vector \(\mathbf{\theta}\). The likelihood can be written as: \[ L(\mathbf{\theta}; \mathbf{Y}_1,\ldots,\mathbf{Y}_T) = \prod_{t=1}^T f_t(\mathbf{Y}_t|\mathbf{Y}_1,\ldots,\mathbf{Y}_{t-1}) \] where \(f_t(\cdot|\mathbf{y}_1,\ldots,\mathbf{y}_{t-1})\) is the conditional density of \(\mathbf{Y}_t\) given \(\mathbf{Y}_1=\mathbf{y}_1,\ldots,\mathbf{Y}_{t-1}=\mathbf{y}_{t-1}\).
Assuming joint Gaussianity of the state-space model variables, we have \[ f_t(\mathbf{Y}_t|\mathbf{Y}_1,\ldots,\mathbf{Y}_{t-1}) = \frac{1}{(2\pi)^{w/2} \mbox{det}(\Delta_t)^{1/2}} \exp\Big\{ - \frac{1}{2} \mathbf{I}_t'\Delta_t^{-1} \mathbf{I}_t \Big\}, \] where \(w\) is the dimension of \(\mathbf{Y}_t\), \(\mathbf{I}_t = \mathbf{Y}_t - P_{t-1}\mathbf{Y}_t = \mathbf{Y}_t - G_t\widehat{\mathbf{X}}_t\), and \(P_{t-1}\mathbf{Y}_t\) and \(\Delta_t\) are the one-step predictors and their error covariance matrices found from the Kalman prediction recursions. The likelihood above can therefore be expressed as \[ L(\mathbf{\theta}; \mathbf{Y}_1,\ldots,\mathbf{Y}_T) = \frac{1}{(2\pi)^{wT/2} \prod_{t=1}^T \mbox{det}(\Delta_t)^{1/2}} \exp\Big\{ - \frac{1}{2} \sum_{t=1}^T \mathbf{I}_t'\Delta_t^{-1} \mathbf{I}_t \Big\} \] and computed efficiently using the Kalman prediction recursions.
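In practice one minimizes the negative log-likelihood, which by the expression above is \[ -\log L(\mathbf{\theta}; \mathbf{Y}_1,\ldots,\mathbf{Y}_T) = \frac{wT}{2}\log(2\pi) + \frac{1}{2}\sum_{t=1}^T \Big( \log \mbox{det}(\Delta_t) + \mathbf{I}_t'\Delta_t^{-1}\mathbf{I}_t \Big). \] The \(\mathbf{\theta}\)-dependent part is exactly the quantity accumulated in the like output of the function Kfilter0 shown below; it is then minimized numerically over \(\mathbf{\theta}\), e.g. with optim in R.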
A look back at the Johnson and Johnson data set and the code: e.g., what is the function Kfilter0?
Kfilter0
## function (num, y, A, mu0, Sigma0, Phi, cQ, cR)
## {
## Q = t(cQ) %*% cQ
## R = t(cR) %*% cR
## Phi = as.matrix(Phi)
## pdim = nrow(Phi)
## y = as.matrix(y)
## qdim = ncol(y)
## xp = array(NA, dim = c(pdim, 1, num))
## Pp = array(NA, dim = c(pdim, pdim, num))
## xf = array(NA, dim = c(pdim, 1, num))
## Pf = array(NA, dim = c(pdim, pdim, num))
## innov = array(NA, dim = c(qdim, 1, num))
## sig = array(NA, dim = c(qdim, qdim, num))
## x00 = as.matrix(mu0, nrow = pdim, ncol = 1)
## P00 = as.matrix(Sigma0, nrow = pdim, ncol = pdim)
## xp[, , 1] = Phi %*% x00
## Pp[, , 1] = Phi %*% P00 %*% t(Phi) + Q
## sigtemp = A %*% Pp[, , 1] %*% t(A) + R
## sig[, , 1] = (t(sigtemp) + sigtemp)/2
## siginv = solve(sig[, , 1])
## K = Pp[, , 1] %*% t(A) %*% siginv
## innov[, , 1] = y[1, ] - A %*% xp[, , 1]
## xf[, , 1] = xp[, , 1] + K %*% innov[, , 1]
## Pf[, , 1] = Pp[, , 1] - K %*% A %*% Pp[, , 1]
## sigmat = as.matrix(sig[, , 1], nrow = qdim, ncol = qdim)
## like = log(det(sigmat)) + t(innov[, , 1]) %*% siginv %*%
## innov[, , 1]
## for (i in 2:num) {
## if (num < 2)
## break
## xp[, , i] = Phi %*% xf[, , i - 1]
## Pp[, , i] = Phi %*% Pf[, , i - 1] %*% t(Phi) + Q
## sigtemp = A %*% Pp[, , i] %*% t(A) + R
## sig[, , i] = (t(sigtemp) + sigtemp)/2
## siginv = solve(sig[, , i])
## K = Pp[, , i] %*% t(A) %*% siginv
## innov[, , i] = y[i, ] - A %*% xp[, , i]
## xf[, , i] = xp[, , i] + K %*% innov[, , i]
## Pf[, , i] = Pp[, , i] - K %*% A %*% Pp[, , i]
## sigmat = as.matrix(sig[, , i], nrow = qdim, ncol = qdim)
## like = like + log(det(sigmat)) + t(innov[, , i]) %*%
## siginv %*% innov[, , i]
## }
## like = 0.5 * like
## list(xp = xp, Pp = Pp, xf = xf, Pf = Pf, like = like, innov = innov,
## sig = sig, Kn = K)
## }
## <environment: namespace:astsa>
Note: The Gaussian likelihood is used even for non-Gaussian models.
Local level model: \[ Y_t = \mu_t + V_t,\quad \mu_t = \mu_{t-1} + W_t, \] where \(\{V_t\}\sim \mbox{WN}(0,\sigma^2_V)\) and \(\{W_t\}\sim \mbox{WN}(0,\sigma^2_W)\). Or the local linear trend model: \[ Y_t = \mu_t + V_t,\quad \mu_t = \mu_{t-1} + b_t + W_t,\quad b_t = b_{t-1} + Z_t, \] where, in addition, \(\{Z_t\}\sim \mbox{WN}(0,\sigma^2_Z)\).
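As an illustration (a minimal sketch, not from the notes), the local level model can be fitted by Gaussian maximum likelihood by passing the like output of Kfilter0 to optim. The data below are simulated; the function name Linn, the true values \(\sigma_W = 1\), \(\sigma_V = 2\), and the optimizer's starting values are arbitrary choices made only for this illustration.
library(astsa)
set.seed(1)
num = 100
w = rnorm(num, 0, 1)                # state noise, true sigma_W = 1
v = rnorm(num, 0, 2)                # observation noise, true sigma_V = 2
mu = cumsum(w)                      # local level: mu_t = mu_{t-1} + W_t
y = mu + v                          # observations: Y_t = mu_t + V_t
# negative log-likelihood (up to a constant), as returned by Kfilter0
Linn = function(para) {
  kf = Kfilter0(num, y, A=1, mu0=0, Sigma0=1, Phi=1, cQ=para[1], cR=para[2])
  return(kf$like)
}
init = c(.5, 1)                     # starting values for (sigma_W, sigma_V)
est = optim(init, Linn, NULL, method="BFGS", hessian=TRUE)
abs(est$par)                        # ML estimates of (sigma_W, sigma_V); the sign is not identified
sqrt(diag(solve(est$hessian)))      # approximate standard errors from the numerical Hessian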
Extension to one “factor” local level model: \[ Y_{1t} = \mu_t + V_{1t},\quad Y_{2t} = \mu_t + V_{2t},\quad \mu_t = \mu_{t-1} + W_t. \] Shumway and Stoffer (2011) use this for two series of global temperatures, one obtained as averages from land-based stations and the other from marine-based stations.
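For concreteness (this identification is not spelled out in the notes), in the general state-space notation above this one-factor model has the scalar state \(\mu_t\), with the roles of the letters \(V\) and \(W\) swapped relative to the general formulation: \[ \mathbf{Y}_t = \left(\begin{array}{c} Y_{1t}\\ Y_{2t}\end{array}\right) = \left(\begin{array}{c} 1\\ 1\end{array}\right)\mu_t + \left(\begin{array}{c} V_{1t}\\ V_{2t}\end{array}\right), \qquad \mu_{t+1} = \mu_t + W_{t+1}, \] that is, \(G_t = (1\ \ 1)'\), \(F_t = 1\), \(R_t = \mbox{Cov}\big((V_{1t},V_{2t})'\big)\) and \(Q_t = \sigma^2_W\).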
The usual (vector) ARIMA models can be written in state-space form.
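For example (one of several possible embeddings, written out here as an illustration), a causal AR(2) series \(X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + Z_t\) with \(\{Z_t\}\sim{\rm WN}(0,\sigma^2)\), observed without noise, fits the framework with state vector \((X_t, X_{t-1})'\): \[ Y_t = (1\ \ 0)\left(\begin{array}{c} X_t\\ X_{t-1}\end{array}\right), \qquad \left(\begin{array}{c} X_{t+1}\\ X_{t}\end{array}\right) = \left(\begin{array}{cc} \phi_1 & \phi_2\\ 1 & 0\end{array}\right) \left(\begin{array}{c} X_t\\ X_{t-1}\end{array}\right) + \left(\begin{array}{c} Z_{t+1}\\ 0\end{array}\right), \] that is, \(G_t = (1\ \ 0)\), \(R_t = 0\), \(F_t = \left(\begin{array}{cc}\phi_1 & \phi_2\\ 1 & 0\end{array}\right)\) and \(Q_t = \left(\begin{array}{cc}\sigma^2 & 0\\ 0 & 0\end{array}\right)\); MA and differencing parts are accommodated by enlarging the state vector.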
Many more interesting models can be considered when the state-space models are nonlinear (the so-called hidden Markov models).
Thus, if we had the complete data (both the states \(\mathbf{X}_t\) and the observations \(\mathbf{Y}_t\)), we could use results from multivariate normal theory to obtain the MLEs of \(\theta\) easily. We do not have the complete data; however, the EM algorithm provides an iterative method for finding the MLEs of \(\theta\) based on the incomplete data \(\mathbf{Y}_t\), by successively maximizing the conditional expectation of the complete-data likelihood.
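For concreteness, with time-invariant parameters \(F\), \(G\), \(Q\), \(R\), Gaussian errors, and initial state \(\mathbf{X}_1 \sim N(\mathbf{\mu}_1, \Sigma_1)\) (the symbols \(\mathbf{\mu}_1\), \(\Sigma_1\) are introduced here for the initial state), the complete-data likelihood appearing below satisfies, up to an additive constant, \[ \begin{array}{ll} -2\log L_{\mathbf{X},\mathbf{Y}}(\theta) = & \log \mbox{det}(\Sigma_1) + (\mathbf{X}_1-\mathbf{\mu}_1)'\Sigma_1^{-1}(\mathbf{X}_1-\mathbf{\mu}_1)\\ & +\ (n-1)\log \mbox{det}(Q) + \sum_{t=2}^{n} (\mathbf{X}_t - F\mathbf{X}_{t-1})'Q^{-1}(\mathbf{X}_t - F\mathbf{X}_{t-1})\\ & +\ n\log \mbox{det}(R) + \sum_{t=1}^{n} (\mathbf{Y}_t - G\mathbf{X}_t)'R^{-1}(\mathbf{Y}_t - G\mathbf{X}_t) + \mbox{const}, \end{array} \] a sum of Gaussian quadratic forms, so that its minimizers in \(\theta\) have closed, regression-type expressions (cf. Shumway and Stoffer (2011), Section 6.3).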
To describe the EM method for our state-space model, let \[ Q(\theta|\theta^{(j-1)}) = E_{\mathbf{X}|\mathbf{Y},\theta^{(j-1)}}(-2 \log L_{\mathbf{X},\mathbf{Y}}(\theta) ). \] The EM algorithm is then the iterative procedure alternating between the following two steps:
E step (“E” for Expectation): Calculate \(Q(\theta|\theta^{(j-1)})\). It turns out that this can be done explicitly using the Kalman (smoothing) recursions (e.g. Shumway and Stoffer (2011), pp. 343-344).
M step (“M” for Maximization): Minimize \(Q(\theta|\theta^{(j-1)})\) over \(\theta\); since \(Q\) is defined through \(-2\log L\), this corresponds to maximizing the expected log-likelihood. The minimization also turns out to be explicit (e.g. Shumway and Stoffer (2011), pp. 343-344). Set \(\theta^{(j)}\) to be the minimizer.
Iterate until convergence.
Some advantages over direct numerical optimization of the likelihood: the EM algorithm does not require derivatives of the likelihood (in contrast to other iterative optimization procedures), and it generally works better when the dimension of the state vector \(\mathbf{X}_t\) is large. But the real interest here is that it can be adapted to deal with missing data.
Longitudinal series of blood parameter levels monitored after a bone marrow transplant (\(T = 91\) days): log(white blood count) (WBC) [top], log(platelet) (PLT) [middle], and hematocrit (HCT) [bottom]. Approximately 40% of the values are missing, with missing values occurring primarily after the 35th day. (Shumway and Stoffer (2011), Chapter 6.)
library(astsa)   # Kalman filter/smoother/EM functions and the data used below
plot(blood, type="o", pch=19, xlab="day", main="")
y = cbind(WBC, PLT, HCT)
num = nrow(y)
A = array(0, dim=c(3,3,num)) # creates num 3x3 zero matrices
for(k in 1:num) if (y[k,1] > 0) A[,,k]= diag(1,3)  # A_t = I3 on days with observations; A_t = 0 on missing days
# Initial values
mu0 = matrix(0,3,1)
Sigma0 = diag(c(.1,.1,1) ,3)
Phi = diag(1,3)
cQ = diag(c(.1,.1,1), 3)   # Cholesky-type factor of Q
cR = diag(c(.1,.1,1), 3)   # Cholesky-type factor of R
(em = EM1(num, y, A, mu0, Sigma0, Phi, cQ, cR, 100, .001))  # EM: at most 100 iterations, tolerance .001
## iteration -loglikelihood
## 1 68.28328
## 2 -183.9361
## 3 -194.2051
## 4 -197.5444
## 5 -199.7442
## 6 -201.6431
## 7 -203.4226
## 8 -205.1253
## 9 -206.7595
## 10 -208.3251
## 11 -209.8209
## 12 -211.2464
## 13 -212.602
## 14 -213.8891
## 15 -215.1094
## 16 -216.2651
## 17 -217.3589
## 18 -218.3931
## 19 -219.3705
## 20 -220.2935
## 21 -221.1649
## 22 -221.9869
## 23 -222.762
## 24 -223.4924
## 25 -224.1805
## 26 -224.8282
## 27 -225.4377
## 28 -226.0109
## 29 -226.5495
## 30 -227.0555
## 31 -227.5305
## 32 -227.9762
## 33 -228.3941
## 34 -228.7857
## 35 -229.1524
## 36 -229.4956
## 37 -229.8166
## 38 -230.1166
## 39 -230.3967
## 40 -230.6582
## 41 -230.9019
## 42 -231.1289
## $Phi
## [,1] [,2] [,3]
## [1,] 0.98052698 -0.03494377 0.008287009
## [2,] 0.05279121 0.93299479 0.005464917
## [3,] -1.46571679 2.25780951 0.795200344
##
## $Q
## [,1] [,2] [,3]
## [1,] 0.013786772 -0.001724166 0.01882951
## [2,] -0.001724166 0.003032109 0.03528162
## [3,] 0.018829510 0.035281625 3.61897901
##
## $R
## [,1] [,2] [,3]
## [1,] 0.007124671 0.0000000 0.0000000
## [2,] 0.000000000 0.0168669 0.0000000
## [3,] 0.000000000 0.0000000 0.9724247
##
## $mu0
## [,1]
## [1,] 2.119269
## [2,] 4.407390
## [3,] 23.905038
##
## $Sigma0
## [,1] [,2] [,3]
## [1,] 4.553949e-04 -5.249215e-05 0.0005877626
## [2,] -5.249215e-05 3.136928e-04 -0.0001199788
## [3,] 5.877626e-04 -1.199788e-04 0.1677365489
##
## $like
## [1] 68.28328 -183.93608 -194.20509 -197.54440 -199.74425 -201.64313
## [7] -203.42258 -205.12530 -206.75951 -208.32511 -209.82091 -211.24639
## [13] -212.60202 -213.88906 -215.10935 -216.26514 -217.35887 -218.39311
## [19] -219.37048 -220.29354 -221.16485 -221.98686 -222.76196 -223.49243
## [25] -224.18049 -224.82824 -225.43771 -226.01085 -226.54953 -227.05552
## [31] -227.53054 -227.97621 -228.39410 -228.78569 -229.15242 -229.49563
## [37] -229.81661 -230.11659 -230.39674 -230.65816 -230.90189 -231.12893
##
## $niter
## [1] 42
##
## $cvg
## [1] 0.0009832656
# Graph smoother
ks = Ksmooth1(num, y, A, em$mu0, em$Sigma0, em$Phi, 0, 0, chol(em$Q), chol(em$R), 0)  # smoother at the EM estimates
y1s = ks$xs[1,,]           # smoothed state: WBC
y2s = ks$xs[2,,]           # smoothed state: PLT
y3s = ks$xs[3,,]           # smoothed state: HCT
p1 = 2*sqrt(ks$Ps[1,1,])   # +/- 2 root MSE bands
p2 = 2*sqrt(ks$Ps[2,2,])
p3 = 2*sqrt(ks$Ps[3,3,])
par(mfrow=c(3,1))
plot(WBC, type='p', pch=19, ylim=c(1,5), xlab='day')
lines(y1s)
lines(y1s+p1, lty=2, col=4)
lines(y1s-p1, lty=2, col=4)
plot(PLT, type='p', ylim=c(3,6), pch=19, xlab='day')
lines(y2s)
lines(y2s+p2, lty=2, col=4)
lines(y2s-p2, lty=2, col=4)
plot(HCT, type='p', pch=19, ylim=c(20,40), xlab='day')
lines(y3s)
lines(y3s+p3, lty=2, col=4)
lines(y3s-p3, lty=2, col=4)
Q0: Supplement your analysis of the time series data from Q1, Lecture I, by state-space modeling, if it seems appropriate. You can also omit some of the data so that some values are missing.
Q1: Why are the confidence intervals associated with trend and seasonality decreasing in time for the jj data? Is this feature general or specific to the jj data?
Q2: Derive Kalman recursions for prediction, filtering and smoothing.
Q3: Write down the details behind the use of the EM algorithm for state-space models when it is applied with missing data.
Q4: Outline the Bayesian treatment of state-space models.
Q5: Write down and examine Kalman recursions explicitly for the local level model.
Q6: Write down how an ARIMA\((p,d,q)\) model is embedded into a state-space model, and explain how this can be used to handle missing data.
Q7: The so-called AR\((1)\) model with observational noise, written in state-space form, turns out to be equivalent to an ARMA\((1,1)\) model (Shumway and Stoffer (2011), Example 6.3). Is this a more general phenomenon?
Brockwell, P. J., and R. A. Davis. 2016. Introduction to Time Series and Forecasting. Third. Springer Texts in Statistics. Springer.
Durbin, J., and S. J. Koopman. 2012. Time Series Analysis by State Space Methods. Second. Vol. 38. Oxford Statistical Science Series. Oxford University Press, Oxford.
Gómez, V. 2016. Multivariate Time Series with Linear State Space Structure. Springer.
Petris, G., S. Petrone, and P. Campagnoli. 2009. Dynamic Linear Models with R. Use R! Springer, New York.
Prado, R., and M. West. 2010. Time Series. Texts in Statistical Science Series. CRC Press, Boca Raton, FL.
Shumway, R. H., and D. S. Stoffer. 2011. Time Series Analysis and Its Applications. Third. Springer Texts in Statistics. Springer, New York.