This is about some time series models for count data.
Example 1: The monthly number of cases of poliomyelitis in the United States between 1970 and 1983.
library(acp)
data(polio)
plot.ts(polio)
ts1 <- polio$polio
acf(ts1,lag.max = 20)
#pacf(ts1,lag.max = 20)
mean(ts1)
## [1] 1.333333
var(ts1)
## [1] 3.50499
plot(table(ts1)/length(ts1),ylab="probs",xlab="values")
points(0:15, dpois(0:15,lambda=mean(ts1)))
Example 2: The number of cases of campylobacter infections in the north of the province Quebec (Canada) in four week intervals from January 1990 to the end of October 2000.
library(tscount)
data(campy)
plot.ts(campy)
ts2 <- campy
acf(ts2,lag.max = 20)
#pacf(ts1,lag.max = 20)
mean(ts2)
## [1] 11.54286
var(ts2)
## [1] 53.24275
plot(table(ts2)/length(ts2),ylab="probs",xlab="values")
points(0:55, dpois(0:55,lambda=mean(ts2)))
Example 3: The monthly totals of car drivers in Great Britain killed or seriously injured Jan 1969 to Dec 1984. Compulsory wearing of seat belts was introduced on 31 Jan 1983.
data(Seatbelts)
ts3 <- Seatbelts[, "VanKilled"]
plot.ts(ts3)
acf(ts3,lag.max = 20)
#pacf(ts1,lag.max = 20)
mean(ts3)
## [1] 9.057292
var(ts3)
## [1] 13.22707
plot(table(ts3)/length(ts3),ylab="probs",xlab="values")
points(0:55, dpois(0:55,lambda=mean(ts3)))
The goals is to discuss count time series models based on GLM and illustrate them on data above. The main R package used is \(\verb!tscount!\), though there are a number of others. See Liboschik et al. (2016) and references therein.
Let \(\{Y_t\}\) be a count time series and \(\{\mathbf{X}_t\}\) be an \(r\)-vector series of covariates. Let also \({\cal F}_t\) be the history of the joint process \(\{Y_t,\mathbf{X}_{t+1}\}\) up to time \(t\). Set \(\lambda_t = \mathbb{E} (Y_t|{\cal F}_{t-1})\). The general model is of the form \[ g(\lambda_t) = \beta_0 + \sum_{k=1}^p \beta_k \widetilde g(Y_{t-i_k}) + \sum_{\ell=1}^q \alpha_j g(\lambda_{t-j_\ell}) + \mathbf{u}'\mathbf{X}_t, \] where \(g\) is a link function, \(\widetilde g\) is a transformation function and \(P = \{i_1,\ldots,i_p\}\), \(Q = \{j_1,\ldots,j_q\}\) are sets of integers with \(0<i_1<\ldots<i_p\), \(0<j_1<\ldots<j_q\). For example, with \(g(x) = \widetilde g(x) = x\) and \(P=\{1,\ldots,p\}\), \(Q = \{1,\ldots,q\}\) and \(\mathbf{u}=0\), the model is \[ \lambda_t = \beta_0 + \sum_{k=1}^p \beta_k Y_{t-k} + \sum_{\ell=1}^q \alpha_j \lambda_{t-\ell}, \] which is known as integer-valued GARCH model of order \(p\) and \(q\), abbreviated as INGARCH\((p,q)\), and also as autoregressive conditional Poisson (ACP) model. In addition, the distributional assumption on \(Y_t\) given \({\cal F}_{t-1}\) is made, for example, Poisson, negative binomial or other.
The inference is (quasi) maximum likelihood-based, with its advantages for e.g. testing. E.g. 1-step-ahead prediction is \(\lambda_{T+1}\). In model diagnostics, e.g. the following residuals are considered \(y_t - \widehat \lambda_t\). See Liboschik et al. (2016).
Example 1: The monthly number of cases of poliomyelitis in the United States between 1970 and 1983.
# trend=(1:168/168)
# cos12=1+cos((2*pi*(1:168))/12)
# sin12=1+sin((2*pi*(1:168))/12)
# cos6=1+cos((2*pi*(1:168))/6)
# sin6=1+sin((2*pi*(1:168))/6)
#polio_cov <- data.frame(trend, cos12, sin12, cos6, sin6)
#polio_pois <- tsglm(polio$polio, model = list(past_obs = 1, past_mean = c(1,2)), xreg = polio_cov, distr = "poisson")
polio_pois <- tsglm(polio$polio, model = list(past_obs = 1), distr = "poisson")
summary(polio_pois)
##
## Call:
## tsglm(ts = polio$polio, model = list(past_obs = 1), distr = "poisson")
##
## Coefficients:
## Estimate Std.Error CI(lower) CI(upper)
## (Intercept) 0.861 0.0993 0.667 1.06
## beta_1 0.360 0.0665 0.230 0.49
## Standard errors and confidence intervals (level = 95 %) obtained
## by normal approximation.
##
## Link function: identity
## Distribution family: poisson
## Number of coefficients: 2
## Log-likelihood: -280.4975
## AIC: 564.995
## BIC: 571.2429
## QIC: 564.9943
polio_nbin <- tsglm(polio$polio, model = list(past_obs = 1), distr = "nbinom")
summary(polio_nbin)
##
## Call:
## tsglm(ts = polio$polio, model = list(past_obs = 1), distr = "nbinom")
##
## Coefficients:
## Estimate Std.Error CI(lower) CI(upper)
## (Intercept) 0.861 0.127 0.612 1.111
## beta_1 0.360 0.103 0.157 0.563
## sigmasq 0.533 NA NA NA
## Standard errors and confidence intervals (level = 95 %) obtained
## by normal approximation.
##
## Link function: identity
## Distribution family: nbinom (with overdispersion coefficient 'sigmasq')
## Number of coefficients: 3
## Log-likelihood: -258.1501
## AIC: 522.3002
## BIC: 531.6721
## QIC: 569.0743
par(mfrow=c(2,2))
# Marginal calibration reports (1/T) sum_{t=1}^T P(Y_t < y|F_{t-1}) - (1/T) sum_{t=1}^T 1_{y_t < y}
# Liboschik et al: "If the predictions from a model are appropriate the marginal distribution of the predictions resembles the marginal distribution of the observations"
acf(residuals(polio_pois), main = "ACF of response residuals")
marcal(polio_pois, ylim = c(-0.13, 0.13), main = "Marginal calibration")
lines(marcal(polio_nbin,plot=FALSE),lty="dashed")
# PIT stands for Probability Integral Transform. If the model is adequate, the PIT should be close to uniform
legend("bottomright",legend = c("Pois","NegBin"), lwd=1,lty=c("solid","dashed"))
pit(polio_pois, ylim = c(0, 1.5), main = "PIT Poisson")
pit(polio_nbin, ylim = c(0, 1.5), main = "PIT Negative Binomial")
# These are mean scores, namely, mean "differences" for the predictive distribution and the observations
rbind(Poisson = scoring(polio_pois), NegBin = scoring(polio_nbin))
## logarithmic quadratic spherical rankprob dawseb normsq
## Poisson 1.669628 -0.2509310 -0.4942208 0.8352967 2.027905 1.8222479
## NegBin 1.536608 -0.2682583 -0.5152435 0.8012837 1.714268 0.9880942
## sqerror
## Poisson 3.179298
## NegBin 3.179298
polio_pois_pred <- predict(polio_pois, n.ahead = 10, level = 0.9, global = TRUE)
polio_pois_pred$pred
## [1] 3.020990 1.948746 1.562809 1.423898 1.373899 1.355903 1.349425
## [8] 1.347094 1.346255 1.345953
polio_pois_pred$interval
## lower upper
## [1,] 0 8
## [2,] 0 6
## [3,] 0 6
## [4,] 0 6
## [5,] 0 5
## [6,] 0 6
## [7,] 0 6
## [8,] 0 6
## [9,] 0 5
## [10,] 0 6
The fitted model is: \(Y_t|{\cal F}_{t-1} \sim \mbox{Pois}(\lambda_t)\) and \[ \lambda_t = 0.861 + 0.36 Y_{t-1}. \]
Example 2: The number of cases of campylobacter infections in the north of the province Quebec (Canada) in four week intervals from January 1990 to the end of October 2000.
interventions <- interv_covariate(n = length(campy), tau = c(84, 100), delta = c(1, 0))
campyfit_pois <- tsglm(campy, model = list(past_obs = 1, past_mean = 13), xreg = interventions, distr = "poisson")
campyfit_nbin <- tsglm(campy, model = list(past_obs = 1, past_mean = 13), xreg = interventions, distr = "nbinom")
par(mfrow=c(2,2))
acf(residuals(campyfit_pois), main = "ACF of response residuals")
marcal(campyfit_pois, ylim = c(-0.03, 0.03), main = "Marginal calibration")
lines(marcal(campyfit_nbin,plot=FALSE),lty="dashed")
legend("bottomright",legend = c("Pois","NegBin"), lwd=1,lty=c("solid","dashed"))
pit(campyfit_pois, ylim = c(0, 1.5), main = "PIT Poisson")
pit(campyfit_nbin, ylim = c(0, 1.5), main = "PIT Negative Binomial")
rbind(Poisson = scoring(campyfit_pois), NegBin = scoring(campyfit_nbin))
## logarithmic quadratic spherical rankprob dawseb normsq
## Poisson 2.750006 -0.07669318 -0.2751017 2.200186 3.662209 1.3080625
## NegBin 2.722028 -0.07799704 -0.2765909 2.185105 3.605935 0.9642855
## sqerror
## Poisson 16.51282
## NegBin 16.51282
summary(campyfit_nbin)
##
## Call:
## tsglm(ts = campy, model = list(past_obs = 1, past_mean = 13),
## xreg = interventions, distr = "nbinom")
##
## Coefficients:
## Estimate Std.Error CI(lower) CI(upper)
## (Intercept) 3.3184 0.7851 1.7797 4.857
## beta_1 0.3690 0.0696 0.2326 0.505
## alpha_13 0.2198 0.0942 0.0352 0.404
## interv_1 3.0810 0.8560 1.4032 4.759
## interv_2 41.9541 12.0914 18.2554 65.653
## sigmasq 0.0297 NA NA NA
## Standard errors and confidence intervals (level = 95 %) obtained
## by normal approximation.
##
## Link function: identity
## Distribution family: nbinom (with overdispersion coefficient 'sigmasq')
## Number of coefficients: 6
## Log-likelihood: -381.0839
## AIC: 774.1678
## BIC: 791.8176
## QIC: 787.6442
The fitted model is: \(Y_t|{\cal F}_{t-1} \sim \mbox{NegBin}(\lambda_t,33.61)\) and \[ \lambda_t = 3.32 + 0.37 Y_{t-1} + 0.22 \lambda_{t-13} + 3.08 \mathbf{1}_{\{t\geq 84\}} + 41.95 \mathbf{1}_{\{t= 100\}} \]
Example 3: The monthly totals of car drivers in Great Britain killed or seriously injured Jan 1969 to Dec 1984. Compulsory wearing of seat belts was introduced on 31 Jan 1983.
timeseries <- Seatbelts[, "VanKilled"]
regressors <- cbind(PetrolPrice = Seatbelts[, c("PetrolPrice")],linearTrend = seq(along = timeseries)/12)
timeseries_until1981 <- window(timeseries, end = 1981 + 11/12)
regressors_until1981 <- window(regressors, end = 1981 + 11/12)
seatbeltsfit <- tsglm(timeseries_until1981,model = list(past_obs = c(1, 12)), link = "log", distr = "poisson", xreg = regressors_until1981)
#summary(seatbeltsfit, B = 500)
summary(seatbeltsfit)
##
## Call:
## tsglm(ts = timeseries_until1981, model = list(past_obs = c(1,
## 12)), xreg = regressors_until1981, link = "log", distr = "poisson")
##
## Coefficients:
## Estimate Std.Error CI(lower) CI(upper)
## (Intercept) 1.9037 0.36911 1.1803 2.6272
## beta_1 0.0856 0.08095 -0.0730 0.2443
## beta_12 0.1509 0.08499 -0.0156 0.3175
## PetrolPrice 0.0826 2.30408 -4.4333 4.5985
## linearTrend -0.0291 0.00793 -0.0446 -0.0135
## Standard errors and confidence intervals (level = 95 %) obtained
## by normal approximation.
##
## Link function: log
## Distribution family: poisson
## Number of coefficients: 5
## Log-likelihood: -396.3849
## AIC: 802.7697
## BIC: 818.019
## QIC: 802.7697
timeseries_1982 <- window(timeseries, start = 1982, end = 1982 + 11/12)
regressors_1982 <- window(regressors, start = 1982, end = 1982 + 11/12)
predict(seatbeltsfit, n.ahead = 12, level = 0.9, global = TRUE, B = 2000, newxreg = regressors_1982)$pred
## Jan Feb Mar Apr May Jun Jul
## 1982 7.707988 7.454226 7.568846 7.409922 7.210433 6.985772 7.145880
## Aug Sep Oct Nov Dec
## 1982 7.826338 7.493486 7.816908 8.022388 7.451928
seatbeltsfit_alldata <- tsglm(timeseries, link = "log",model = list(past_obs = c(1, 12)), xreg = regressors, distr = "poisson")
interv_test(seatbeltsfit_alldata, tau = 170, delta = 1, est_interv = TRUE)
##
## Score test on intervention(s) of given type at given time
##
## Chisq-Statistic: 1.152953 on 1 degree(s) of freedom, p-value: 0.2829319
##
## Fitted model with the specified intervention:
##
## Call:
## tsglm(ts = fit$ts, model = model_extended, xreg = xreg_extended,
## link = fit$link, distr = fit$distr)
##
## Coefficients:
## (Intercept) beta_1 beta_12 PetrolPrice linearTrend
## 0.19508 0.08819 0.80446 3.17408 -0.04788
## interv_1
## 0.24570
Special cases of the introduced count models can be viewed and treated as Hidden Markov Models (to be discussed next class).
Other types of count time series models have also been considered, for example, based on binomial thinning, aggregation of suitable binomial random variables, …
Analyze a real count time series data using the methods discussed.
Liboschik, T., K. Fokianos, and R. Fried. 2016. “Tscount: An R Package for Analysis of Count Time Series Following Generalized Linear Models.”