This is about detecting changes in time series (models), for example, their mean, variance or other characteristics/parameters. We assume implicitly that these structural changes are present for considerable period of time (in particular, allowing for reasonable estimation). In this sense, the changes are not “outliers” but rather are associated with an intrinsic structure of the time series in question.
We shall also focus on retroactive change point detection, as opposed to sequential (monitoring) change point detection. Moreover, we are thinking about abrupt changes, rather than trying to model slowly evolving changes. We shall not model occurrences of change points either, as in e.g. Markov switching models.
A typical (preprocessed) fMRI data consists of BOLD signal measurements for brain regions of interest (ROIs) across various brain networks (e.g. auditory, visual, postsalience, and so on). For example, the plots below are the measurements for 3 ROIs in the auditory network for an individual in rest state.
fmri.ts <- read.csv("fmri.csv")
par(mfrow=c(3,1))
plot.ts(fmri.ts[1:150,"auditory1"],ylab="aud 1")
plot.ts(fmri.ts[1:150,"auditory2"],ylab="aud 2")
plot.ts(fmri.ts[1:150,"auditory3"],ylab="aud 3")
Changes in the dependence structure is expected across ROIs (and networks) over time, for example,
What are ways to detect such changes? (For basic background information on fMRI signals and their statistical analysis, see e.g. Lindquist (2008).)
The goal of the lecture is to discuss several change point detection techniques applicable in the time series context, starting with detection of changes in time series mean, and to illustrate them on simulated and real data. The focus will be on univariate series throughout, but with multivariate series discussed in the notes.
From R perspective, the main sources are the packages \(\verb!strucchange!\) (see Zeileis et al. (2001)), \(\verb!changepoint!\) (see Killick and Eckley (2014)), as well as bits and pieces of added code. These are not “perfect” but in most cases sufficient for illustration purposes needed here.
Available methods for change points are easiest and most instructive to introduce in the case of testing for changes in mean. Moreover, consider first the case of IID residuals/observations. That is, the model \[ X_t = \mu_t + Z_t,\quad t=1,\ldots,T, \] where \(\{Z_t\}\sim \mbox{IID}(0,\sigma^2_Z)\). The case \[ \mu_t \equiv \bar \mu_0 \] corresponds to no change in mean, and the case \[ \mu_t \equiv \bar \mu_i,\quad t = T_{i-1}+1,\ldots,T_i,\quad i=1,\ldots,m+1, \] to \(m\) changes in mean at the times \(T_1,T_2,\ldots,T_m\) (with \(T_0=0\) and \(T_{m+1}=T\)). One is interested to test the hypothesis \[ \mbox{H}_0 : \mu_t \equiv \bar \mu_0\quad (\mbox{no change point in mean}). \] There are several well-known tests and approaches for this task (some of which can then be generalized to more complex problems), including:
These are explained and illustrated next.
“Fluctuation” tests (CUSUM, etc.)
With the data \(x_1,\ldots,x_T\), let \(\bar x_{(1:t)} = \frac{1}{t} \sum_{i=1}^t x_i\) be the average of the observations up to time \(t\), and \(\bar x = \bar x_{(1:T)}\) be the average of all \(T\) observations. Consider the OLS and recursive residuals defined as: \[ \widehat z_t = x_t - \bar x,\quad \widetilde z_t = x_t - \bar x_{(1:t)} \] and the corresponding variance estimators \[ \widehat \sigma^2 = \frac{1}{T-1} \sum_{t=1}^T \widehat z_t^2,\quad \widetilde \sigma^2 = \frac{1}{T-1} \sum_{t=1}^T (\widetilde z_t - \bar{\widetilde z})^2. \] The corresponding CUSUM processes are then defined as: \[ W_T^0(u) = \frac{1}{\widehat \sigma \sqrt{T}} \sum_{t=1}^{[Tu]} \widehat z_t,\quad W_T(u) = \frac{1}{\widetilde \sigma \sqrt{T}} \sum_{t=1}^{[Tu]} \widetilde z_t,\quad u\in [0,1]. \] One can show that \[ W_T^0(u) \approx W^0(u) = W(u) - uW(1),\quad W_T(u) \approx W(u) \] where \(\{W(u)\}_{u\in [0,1]}\) is the standard Brownian motion (Wiener process) and \(\{W^0(u)\}_{u\in [0,1]}\) is its Brownian bridge. (Why?)
Under AMOC (at most one change point) at time \(T_1\) (or after rescaling, at \(u_1=T_1/T\)), the recursive residuals will only have zero mean up to \(u_1 = T_1/T\). Hence the path of the process \(W_T(u)\) should be close to \(0\) up to \(u_1\) and leave its mean afterwards. Similarly, the path of \(W_T^0(u)\) should have a peak around \(u_1\). In fact, with AMOC at \(u_1\) as the alternative hypothesis \(\mbox{H}_1\) and under Gaussianity, the likelihood ratio test statistic is related to \(W_T^0(u_1)\) and it is natural to estimate the change point \(u_1\) as \[ \widehat u_1 = \mathop{\rm argmax}_{u} |W_T^0(u)| \] (see e.g. Csorgo and Horvath (1997)).
The idea that is common to all generalized fluctuation tests is that the null hypothesis of “no structural change” should be rejected when the fluctuation of the empirical process \(efp(u)\), here either \(W_T^0(u)\) or \(W_T(u)\), gets improbably large compared to the fluctuation of the limiting process. This comparison is performed by some appropriate boundary \(b(u)\), that the limiting process just crosses with a given probability \(\alpha\). Thus, if \(efp(u)\) crosses either \(b(u)\) or \(-b(u)\) for any \(u\), then it has to be concluded that the fluctuation is improbably large and the null hypothesis can be rejected at confidence level \(\alpha\).
For the limiting Brownian motion and Brownian bridge, it would seem natural to use boundaries that are proportional to the standard deviation function of the corresponding theoretic process, i.e., \[ b(u) = \lambda \sqrt{u},\quad b(u) = \lambda \sqrt{u(1-u)}, \] for the OLS-based CUSUM and the Recursive CUSUM path respectively, where \(\lambda\) determines the confidence level. Other boundaries that are commonly used are linear, because a closed form solution for the crossing probability is known. So the standard boundaries for the two process are of type \[ b(u) = \lambda (1+2u), \quad b(u) = \lambda. \]
set.seed(12)
m.data <- c(rnorm(100, 0, 1), rnorm(100, 1, 1), rnorm(100, 0.2, 1))
ts.plot(m.data, xlab = "Index")
library("strucchange")
temp.cus1 <- efp(m.data~1, type="OLS-CUSUM")
plot(temp.cus1, alpha = 0.01, alt.boundary = TRUE)
plot(temp.cus1, alpha = 0.01, alt.boundary = FALSE)
temp.cus2 <- efp(m.data~1, type="Rec-CUSUM")
plot(temp.cus2, alpha = 0.01, alt.boundary = TRUE)
plot(temp.cus2, alpha = 0.01, alt.boundary = FALSE)
These (non)crossings can also be put into a formal test, based on, for example, the statistic \[ \sup_u \frac{|efp(u)|}{f(u)} \] where \(f(u)\) appears in \(b(u) = \lambda f(u)\).
sctest(m.data~1, type="OLS-CUSUM",alt.boundary = TRUE)
##
## OLS-based CUSUM test with alternative boundaries
##
## data: m.data ~ 1
## S0 = 4.7939, p-value = 1e-04
sctest(m.data~1, type="OLS-CUSUM",alt.boundary = FALSE)
##
## OLS-based CUSUM test
##
## data: m.data ~ 1
## S0 = 2.2599, p-value = 7.33e-05
Multiple change points could be dealt with by using e.g. the binary segmentation - to be discussed below.
\(F\)-tests (Chow, etc.)
Supposing AMOC, one could also consider a test based on \(F\)-statistic (also known as Chow test, after Chow (1960)): for a known change point at \(t_0\), the \(F\)-statistic is defined as \[ F_{t_0} = \frac{(RSS_r - RSS_f)/1}{RSS_f/(T-2)}, \] where \(RSS_r\) is the sum of the residuals squared under the “restricted” model where there the mean is the same, and \(RSS_f\) is the sum of the residuals squared under the “full” model having two means. Under the null hypotheses of the same mean, the \(F\)-statistic has (approximately) an \(F\) distribution, with \((1, T-2)\) degrees of freedom.
Not knowing \(t_0\) and as with empirical fluctuation processes, one could look at the statistics \(F_t\) across \(t\), devise a crossing boundary and build formal tests.
set.seed(12)
m2.data <- c(rnorm(100, 0, 1), rnorm(100, .5, 1))
ts.plot(m2.data, xlab = "Index")
fs <- Fstats(m2.data~1)
plot(fs)
sctest(fs, type="supF")
##
## supF test
##
## data: fs
## sup.F = 16.749, p-value = 0.00108
Model selection
From the model selection perspective, another possibility is to choose a model that minimizes the objective function (with respect to the number of change points \(m\), and change points \(T_i\), \(i=1,\ldots,m\)) \[ \sum_{i=1}^{m+1} {\cal C}(x_{(T_{i-1}+1:T_i)}) + \beta g(m), \] where \({\cal C}(x_{(T_{i-1}+1:T_i)})\) is a cost associated with the segment \(x_{T_{i-1}+1},\ldots,x_{T_i}\) of the data and \(\beta f(m)\) is a penalty. A typical cost is twice the negative (Gaussian) log-likelihood, in this case \[ C(x_{(T_{i-1}+1:T_i)}) = \sum_{t=T_{i-1}+1}^{T_i} (x_t - \bar x_{(T_{i-1}+1:T_i)})^2. \] A penalty that is linear in \(m\) is often considered, as e.g.
\[ \beta g(m) = \beta m, \] where \(\beta = p \log T\) corresponds to SIC (BIC), with \(p=1\) being the number of additional parameters introduced by adding a change point.
The optimization problem can be solved by several methods, including
(Discuss.)
library("changepoint")
set.seed(10)
m2.data <- c(rnorm(100, 0, 1), rnorm(100, 1, 1), rnorm(100, 0, 1),rnorm(100, 0.2, 1))
ts.plot(m2.data, xlab = "Index")
m2.pelt <- cpt.mean(m2.data, method = "PELT")
plot(m2.pelt, type = "l", cpt.col = "blue", xlab = "Index", cpt.width = 4)
cpts(m2.pelt)
## [1] 97 192
m2.binseg <- cpt.mean(m2.data, method = "BinSeg")
plot(m2.binseg, type = "l", xlab = "Index", cpt.width = 4)
cpts(m2.binseg)
## [1] 79 192
Notes
Question: What does one do if the residuals/observations are temporally correlated? What difference does it make?
set.seed(10)
ar.data <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 400)
ar.pelt <- cpt.mean(ar.data, method = "PELT")
plot(ar.pelt, type = "l", cpt.col = "blue", xlab = "Index", cpt.width = 4)
#ar.binseg <- cpt.mean(ar.data, method = "BinSeg")
#plot(ar.binseg, type = "l", xlab = "Index", cpt.width = 4)
Important point: Whether a time series should be modeled as changes in mean superimposed by IID series, or as changes in mean superimposed by temporally dependent series is eventually motivated by physical considerations (and if these are not available, by the usefulness of the model).
“Fluctuation” tests (CUSUM, etc.) with time dependent residuals
“Fluctuation” tests still work for dependent residuals but with one important modification. For example, in \[ W_T^0(u) = \frac{1}{\widehat \sigma \sqrt{T}} \sum_{t=1}^{[Tu]} \widehat z_t,\quad W_T^0(u) \approx W^0(u),\quad u\in [0,1]. \] the variance estimator \(\widehat \sigma^2\) now needs to estimate not \(\mbox{Var}(Z_t)\) but rather the so-called long-run variance \[ \sigma^2 = \sum_{h=-\infty}^\infty \gamma_Z(h) = \lim_{T\to\infty} \frac{1}{T} \mbox{Var}\Big( \sum_{t=1}^T Z_t \Big). \] Long-run variance is a tricky quantity to estimate and there is a lot of work on how to do this reasonably well (e.g. Andrews (1991)). For example, \[ \widehat \sigma^2 = \sum_{h=-(T-1)}^{T-1} \widehat \gamma_Z(h) \] will NOT work (since it is always equal to \(0\)). A common choice is to take \[ \widehat \sigma^2 = \sum_{h=-(T-1)}^{T-1} K(\frac{h}{S_T}) \widehat \gamma_Z(h), \] where \(K\) is a kernel function, e.g. the Bartlett kernel \(K(x)=(1-|x|)1_{|x|\leq 1}\), and \(S_T\) is a “bandwidth.”
In the illustrations below: CUSUM = as above with \(S_T\) corresponding to the optimal bandwidth supposing AR\((1)\) model, CUSUM-MAC = as above but estimating the long-run variance through the spectral domain, CUSUM-RO = working with the residuals obtained by fitting a parametric ARMA\((p, q)\) model to the given data.
source(file="Mult-breaks-library.R")
set.seed(10)
ar.data <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 400)
data = ar.data
#out1 = supF_multi(data)
out2 = cusum_multi(data)
#out3 = cusum_multiJX(data)
out4 = cusum_multiRO(data)
out5 = cusum_multiMAC(data)
c(out2$iter, out4$iter, out5$iter)-1
## [1] 0 0 1
# Plotting
plot(data, type="l", ylab="Yearly minimum", xlab="")
points(out2$fitted, type="l", col="red", lwd=2)
points(out4$fitted, type="l", col="blue", lwd=2)
points(out5$fitted, type="l", col="green", lwd=2)
legend(20, 3, c("CUSUM","RO","MAC"), cex=1.0, col=c("red", "blue", "green"), lty=c(1, 1, 1), lwd=2, horiz=TRUE)
set.seed(10)
ar.data <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 400)
ar.data <- ar.data + c(rep(0,200),rep(1,200))
data = ar.data
#out1 = supF_multi(data)
out2 = cusum_multi(data)
#out3 = cusum_multiJX(data)
out4 = cusum_multiRO(data)
out5 = cusum_multiMAC(data)
c(out2$iter, out4$iter, out5$iter)-1
## [1] 1 1 1
# Plotting
plot(data, type="l", ylab="Yearly minimum", xlab="")
points(out2$fitted, type="l", col="red", lwd=2)
points(out4$fitted, type="l", col="blue", lwd=2)
points(out5$fitted, type="l", col="green", lwd=2)
legend(20, 3, c("CUSUM","RO","MAC"), cex=1.0, col=c("red", "blue", "green"), lty=c(1, 1, 1), lwd=2, horiz=TRUE)
Approaches based on decorrelated residuals
Regress on the past values and work with the residuals.
set.seed(10)
ar.data <- arima.sim(list(order = c(1,0,0), ar = 0.7), n = 400)
temp.cus.ar <- efp(ar.data~1, type="OLS-CUSUM")
plot(temp.cus.ar, alpha = 0.01, alt.boundary = TRUE)
plot(temp.cus.ar, alpha = 0.01, alt.boundary = FALSE)
temp.cus.ar2 <- efp(ar.data~1, type="OLS-CUSUM",dynamic = TRUE)
plot(temp.cus.ar2, alpha = 0.01, alt.boundary = TRUE)
plot(temp.cus.ar2, alpha = 0.01, alt.boundary = FALSE)
The methods discussed above extend straightforwardly to time series models, in particular, AR models \[ X_t = \phi_0 + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + Z_t \] or other time series regression models (e.g. with covariates), where the analysis is carried out on the residuals. Moreover, now a change is defined as that for model parameter vector (not just the mean). Here are key modifiations:
See Zeileis et al. (2001) for more details.
ECM example from the package \(\verb!strucchange!\). The data set contains the aggregated monthly personal income and personal consumption expenditures (in billion US dollars) between January 1959 and February 2001, which are seasonally adjusted at annual rates.
data(USIncExp)
USIncExp2 <- window(USIncExp, start = c(1985,12))
coint.res <- residuals(lm(expenditure ~ income, data = USIncExp2))
coint.res <- lag(ts(coint.res, start = c(1985,12), freq = 12), k = -1)
USIncExp2 <- cbind(USIncExp2, diff(USIncExp2), coint.res)
USIncExp2 <- window(USIncExp2, start = c(1986,1), end = c(2001,2))
colnames(USIncExp2) <- c("income", "expenditure", "diff.income","diff.expenditure", "coint.res")
plot.ts(USIncExp2[,c("diff.income","diff.expenditure", "coint.res")],main="Time series")
ecm.model <- diff.expenditure ~ coint.res + diff.income
ocus <- efp(ecm.model, type="OLS-CUSUM", data=USIncExp2)
me <- efp(ecm.model, type="fluctuation", data=USIncExp2, h=0.2)
bound.ocus <- boundary(ocus, alpha=0.05)
plot(ocus, boundary = FALSE)
lines(bound.ocus, col = 4)
lines(-bound.ocus, col = 4)
plot(me, functional = NULL)
sctest(ocus)
##
## OLS-based CUSUM test
##
## data: ocus
## S0 = 1.5511, p-value = 0.01626
sctest(ecm.model, type="OLS-CUSUM", data=USIncExp2)
##
## OLS-based CUSUM test
##
## data: ecm.model
## S0 = 1.5511, p-value = 0.01626
Q0: Try some of the considered change point detection methods on real time series.
Q1: Relate the test statistic \(W_T^0(u)\) to the likelihood ratio test statistics when the alternative consists of a single change point. Also, argue that the cumulative sum process of recursive residuals can be approximated by Brownian motion.
Q2: Explain in more detail the dynamic programming approach and the PELT algorithm to estimating change points.
Q3: Explain how exactly the fluctuation statistic is defined for recursive regression model coefficient estimates and what its asymptotic limit is expected.
Andrews, D. W. K. 1991. “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation.” Econometrica 59 (3): 817–58.
Bai, J. 1997. “Estimating Multiple Breaks One at a Time.” Econometric Theory 13 (3): 315–52.
Bai, J., and P. Perron. 1998. “Estimating and Testing Linear Models with Multiple Structural Changes.” Econometrica 66 (1): 47–78.
———. 2003. “Computation and Analysis of Multiple Structural Change Models.” Journal of Applied Econometrics 18 (1): 1–22.
Candelon, B., and H. Lütkepohl. 2001. “On the Reliability of Chow-Type Tests for Parameter Constancy in Multivariate Dynamic Models.” Economics Letters 73 (2): 155–60.
Chan, N. H., C. Y. Yau, and R.-M. Zhang. 2014. “Group LASSO for Structural Break Time Series.” Journal of the American Statistical Association 109 (506): 590–99.
Choi, H., H. Ombao, and B. Ray. 2008. “Sequential Change-Point Detection Methods for Nonstationary Time Series.” Technometrics 50 (1): 40–52.
Chow, G. C. 1960. “Tests of Equality Between Sets of Coefficients in Two Linear Regressions.” Econometrica 28: 591–605.
Csörgo, M., and L. Horváth. 1997. Limit Theorems in Change-Point Analysis. Wiley Series in Probability and Statistics. John Wiley & Sons, Ltd., Chichester.
Davis, R. A., T. C. M. Lee, and G. A. Rodriguez-Yam. 2006. “Structural Break Estimation for Nonstationary Time Series Models.” Journal of the American Statistical Association 101 (473): 223–39.
Jackson, B., J. D. Scargle, D. Barnes, S. Arabhi, A. Alt, P. Gioumousis, E. Gwin, P. Sangtrakulcharoen, L. Tan, and T. T. Tsai. 2005. “An Algorithm for Optimal Partitioning of Data on an Interval.” IEEE Signal Processing Letters 12 (2): 105–8.
Killick, R., and I. Eckley. 2014. “Changepoint: An R Package for Changepoint Analysis.” Journal of Statistical Software 58 (3): 1–19.
Killick, R., P. Fearnhead, and I. A. Eckley. 2012. “Optimal Detection of Changepoints with a Linear Computational Cost.” Journal of the American Statistical Association 107 (500): 1590–8.
Lavielle, M., and E. Moulines. 2000. “Least-Squares Estimation of an Unknown Number of Shifts in a Time Series.” Journal of Time Series Analysis 21 (1): 33–59.
Lindquist, M. A. 2008. “The Statistical Analysis of FMRI Data.” Statistical Science 23 (4): 439–64.
Polunchenko, A. S., and A. G. Tartakovsky. 2012. “State-of-the-Art in Sequential Change-Point Detection.” Methodology and Computing in Applied Probability 14 (3): 649–84.
Tibshirani, R., M. Saunders, S. Rosset, J. Zhu, and K. Knight. 2005. “Sparsity and Smoothness via the Fused Lasso.” Journal of the Royal Statistical Society. Series B. Statistical Methodology 67 (1): 91–108.
Zeileis, A., F. Leisch, K. Hornik, and C. Kleiber. 2001. “Strucchange. an R Package for Testing for Structural Change in Linear Regression Models.” Preprint.