For some multivariate time series data \(\mathbf{X}_t = (X_{1t},\ldots,X_{dt})'\), a component series \(X_{j,t}\) can be associated with a spatial location \(j\) and it seems natural to incorporate the potential spatial dependence of the data more explicitely into modeling. Such spatio-temporal data can be of different types, with the focus here on spatio-temporal data that is sparse in space but rich in time, and where spatial locations, in particular, can vary continuously. Other types of spatio-temporal data will be discussed later in the course.
Modeling spatio-temporal data makes an area of Statistics and other fields in itself. (In fact, even spatial statistics is a huge area on its own; e.g. Bivand et al. (2013).) See, for example, the textbooks by Sherman (2011), Cressie and Wikle (2015) on the Statistics side, and the webpage CRAN Task View: Handling and Analyzing Spatio-Temporal Data. The goal here is to provide a flavor of spatio-temporal modeling through an example borrowed from and using several R packages, most notably \(\verb!spacetime!\), \(\verb!sp!\), \(\verb!gstat!\).
Source: \(\verb!gstat!\), \(\verb!spacetime!\). The original paper is by Haslett and Raftery (1989). The discussion on using the data with R in Pebesma (2012). See also Graler et al. (2016).
options(width=120)
library(gstat)
data("wind")
# The Irish wind data with two tables: The data frame "wind" with daily wind speed averages for years
# 1961-1978, for 12 stations, in knots (1 knot = 0.5418 m/s). Locations included in the frame wind.loc
head(wind)
## year month day RPT VAL ROS KIL SHA BIR DUB CLA MUL CLO BEL MAL
## 1 61 1 1 15.04 14.96 13.17 9.29 13.96 9.87 13.67 10.25 10.83 12.58 18.50 15.04
## 2 61 1 2 14.71 16.88 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67 17.54 13.83
## 3 61 1 3 18.50 16.88 12.33 10.13 11.17 6.17 11.25 8.04 8.50 7.67 12.75 12.71
## 4 61 1 4 10.58 6.63 11.75 4.58 4.54 2.88 8.63 1.79 5.83 5.88 5.46 10.88
## 5 61 1 5 13.33 13.25 11.42 6.17 10.71 8.21 11.92 6.54 10.92 10.34 12.92 11.83
## 6 61 1 6 13.21 8.12 9.96 6.67 5.37 4.50 10.67 4.42 7.17 7.50 8.12 13.17
head(wind.loc)
## Station Code Latitude Longitude MeanWind
## 1 Valentia VAL 51d56'N 10d15'W 5.48
## 2 Belmullet BEL 54d14'N 10d00'W 6.75
## 3 Claremorris CLA 53d43'N 8d59'W 4.32
## 4 Shannon SHA 52d42'N 8d55'W 5.38
## 5 Roche's Point RPT 51d48'N 8d15'W 6.36
## 6 Birr BIR 53d05'N 7d53'W 3.65
# Converting character representation (such as 51d56’N) to proper numerical coordinates, and
# Converting the station locations to a SpatialPointsDataFrame object that e.g. can be plotted
library(sp)
wind.loc$y = as.numeric(char2dms(as.character(wind.loc[["Latitude"]])))
wind.loc$x = as.numeric(char2dms(as.character(wind.loc[["Longitude"]])))
coordinates(wind.loc) = ~x+y
proj4string(wind.loc) = "+proj=longlat +datum=WGS84"
head(wind.loc)
## coordinates Station Code Latitude Longitude MeanWind
## 1 (-10.25, 51.93333) Valentia VAL 51d56'N 10d15'W 5.48
## 2 (-10, 54.23333) Belmullet BEL 54d14'N 10d00'W 6.75
## 3 (-8.983333, 53.71667) Claremorris CLA 53d43'N 8d59'W 4.32
## 4 (-8.916667, 52.7) Shannon SHA 52d42'N 8d55'W 5.38
## 5 (-8.25, 51.8) Roche's Point RPT 51d48'N 8d15'W 6.36
## 6 (-7.883333, 53.08333) Birr BIR 53d05'N 7d53'W 3.65
# Visualizing the station locations on the map of Ireland
library(mapdata)
map("worldHires", xlim = c(-11,-5.4), ylim = c(51,55.5))
plot(wind.loc, add=TRUE, pch=16)
text(coordinates(wind.loc), pos=1, label=wind.loc$Station)
# Recoding the time columns to an appropriate time data structure
wind$time = ISOdate(wind$year+1900, wind$month, wind$day)
wind$jday = as.numeric(format(wind$time, '%j'))
head(wind[-c(4:15)])
## year month day time jday
## 1 61 1 1 1961-01-01 12:00:00 1
## 2 61 1 2 1961-01-02 12:00:00 2
## 3 61 1 3 1961-01-03 12:00:00 3
## 4 61 1 4 1961-01-04 12:00:00 4
## 5 61 1 5 1961-01-05 12:00:00 5
## 6 61 1 6 1961-01-06 12:00:00 6
# E.g. the wind speed time series data for Dublin
plot(DUB~time, wind, type= 'l', ylab = "windspeed (knots)", main = "Dublin")
windsqrt = sqrt(0.5148 * as.matrix(wind[4:15]))
Jday = 1:366
windsqrt = windsqrt - mean(windsqrt)
daymeans = sapply(split(windsqrt, wind$jday), mean)
plot(daymeans ~ Jday)
lines(lowess(daymeans ~ Jday, f = 0.1))
meanwind = lowess(daymeans ~ Jday, f = 0.1)$y[wind$jday]
velocity = apply(windsqrt, 2, function(x) { x - meanwind })
Spatial dependence:
pts = coordinates(wind.loc[match(names(wind[4:15]), wind.loc$Code),])
dists = spDists(pts, longlat=TRUE)
corv = cor(velocity)
sel = !(as.vector(dists) == 0)
plot(as.vector(corv[sel]) ~ as.vector(dists[sel]), xlim = c(0,500), ylim = c(.4, 1), xlab = "distance (km)", ylab = "correlation")
ros = rownames(corv) == "ROS"
dists.nr = dists[!ros,!ros]
corv.nr = corv[!ros,!ros]
sel = !(as.vector(dists.nr) == 0)
plot(as.vector(corv.nr[sel]) ~ as.vector(dists.nr[sel]), pch = 3,xlim = c(0,500), ylim = c(.4, 1), xlab = "distance (km)", ylab = "correlation")
points(corv[ros,!ros] ~ dists[ros,!ros], pch=16, cex=.5)
xdiscr = 1:500
lines(xdiscr, .968 * exp(- xdiscr/750))
Temporal dependence:
acf(velocity[,7])
tdiscr = 0:40
lines(tdiscr, exp(- tdiscr/1.5))
acf_vel <- acf(velocity, plot=FALSE, type="correlation", lag.max = 30)
#acf_vel <- acf(velocity, plot=FALSE, type="covariance", lag.max = 30)
lags <- c(0:30)
plot(lags,acf_vel$acf[,1,1],ylab="acf")
lines(lags, exp(- lags/1.5))
for (j in 2:12){
par(new=TRUE)
plot(lags,acf_vel$acf[,j,j],yaxt='n',ylab="acf")
}
The goal of the lecture is to fit a basic spatio-temporal model to capture spatio and temporal dependencies in the data, and illustrate the model use in the so-called kriging.
One could put spatio-temporal data into an R object of the corresponding class, which can then be manipulated more conveniently e.e. in visualization.
wind.loc = wind.loc[match(names(wind[4:15]), wind.loc$Code),]
pts = coordinates(wind.loc[match(names(wind[4:15]), wind.loc$Code),])
rownames(pts) = wind.loc$Station
pts = SpatialPoints(pts, CRS("+proj=longlat +datum=WGS84"))
library(spacetime)
library(rgdal)
utm29 = CRS("+proj=utm +zone=29 +datum=WGS84")
pts = spTransform(pts, utm29)
wind.data = stConstruct(velocity, space = list(values = 1:ncol(velocity)), time = wind$time, SpatialObj = pts, interval = TRUE)
class(wind.data)
## [1] "STFDF"
## attr(,"package")
## [1] "spacetime"
# Basic plots
scales=list(x=list(rot = 45))
stplot(wind.data, mode = "xt", scales = scales, xlab = NULL)
#stplot(wind.data[1:12,"1963"], mode = "xt", scales = scales, xlab = NULL)
stplot(wind.data[1:12,"1974-11-01::1974-11-30"], mode = "xt", scales = scales, xlab = NULL)
#stplot(wind.data[1:12,"1963"], mode = "xt", scales = scales, xlab = NULL)
stplot(wind.data[1:12,"1974-11-01::1974-11-30"], mode = "tp", scales = scales, xlab = NULL)
We need a covariance model for (zero-mean) random variables \(X(t,u)\) in time \(t\) and space \(u\). We are looking for a stationary covariance model satisfying \[ C(h,v) = EX(t,u) X(t+h,u+v). \] The separable model assumes that we can write: \[ C(h,v) = C_t(h)C_u(v) = C_0 \overline C_t(h)\overline C_u(v), \] where the bar indicates that the quantities are standardized and the constant \(C_0\) is known as the sill (in this case, just the variance). That is, we can model the spatial domain covariance separate from the time domain covariance. A typical separable model is the exponential model given by \[ C(h,v) = \sigma^2 \exp\{- |h|/a_t\} \exp\{- \|v\|/a_u\}, \] where \(a\) is called the range for the two underlying univariate exponential models. Note: the exponential model is analogous to the AR model.
Non-separable models are also used, e.g. (Sherman (2011), p. 129) \[ C(h,v) = \frac{\sigma^2}{(|h|^{2\gamma}+1)^\tau} \exp\Big\{ - \frac{c\|v\|^{2\gamma}}{(|h|^{2\gamma}+1)^{\beta\gamma}} \Big\}, \] where \(\tau\) determines the smoothness of the temporal correlation, while \(\gamma\) determines the smoothness of the spatial correlation, \(c\) determines the strength of the spatial correlation, and \(\beta\in(0,1)\) determines the strength of space/time interaction.
For the wind.data, we shall use below the separable exponential model with the following parameters, motivated by the dependence consideration above:
library(gstat)
v = vgmST("separable", space = vgm(1, "Exp", 750*1000), time = vgm(1, "Exp", 1.5 * 3600 * 24),sill=0.3)
Note: The model could also be estimated. (To be done.)
Kriging is a method of interpolation originally used in geostatistics. It assumes that the points can be modeled by a Gaussian process with given covariance. The inputs to a kriging estimator are the known data points and a covariance model. Under suitable assumptions, it provides the best linear unbiased prediction.
The goal of kriging is to come up with weights \(\lambda(t,u)\) to be applied to the data for each considered “new” point \((t,u)\). Assuming mean \(0\), suppose the data (to be used for kriging) is available at points \((t_i,u_i)\), \(i=1,\ldots,n\). We are thus looking for the best linear predictor in the form \[ X^*(t,u) = \lambda(t,u)'\ \mathbf{X} = \sum_{i=1}^n \lambda_i(t,u) X(t_i,u_i), \] where \(\lambda(t,u)\) is the vector of weights and \(\mathbf{X} = (X(t_1,u_1),\ldots,X(t_n,u_n))'\) is the vector of observed values.
We wish to minimize the prediction error, which leads to the conditions: \[ E (X(t,u) - \sum_{i=1}^n \lambda_i(t,u) X(t_i,u_i)) X(t_j,u_j) = 0,\quad j=1,\ldots,n, \] and hence the following system of equations: \[ \sum_{i=1}^n \lambda_i(t,u) C(t_i-t_j,u_i-u_j) = C(t_j-t,u_j-u) ,\quad j=1,\ldots,n. \] The standard errors of the prediction estimates can also be given, and the prediction validity can be examined.
library(maptools)
m = map2SpatialLines(
#map("worldHires", xlim = c(-11,-5.4), ylim = c(51,55.5), plot=F))
map("worldHires", xlim = c(-11.5,-6.0), ylim = c(51.3,55.0), plot=F))
proj4string(m) = "+proj=longlat +datum=WGS84"
m = spTransform(m, utm29)
#prediction grid in space
grd = SpatialPixels(SpatialPoints(makegrid(m, n = 300)),proj4string = proj4string(m))
#arbitrarily choose month for prediction
wind.data2 = wind.data[, "1961-04"]
#wind.data2 = wind.data[, "1961-04::1964-04"]
#prediction grid in time
n = 10
library(xts)
tgrd <- xts(1:n, seq(min(index(wind.data2)), max(index(wind.data2)),length = n))
#prediciton grid in space-time
pred.grd <- STF(grd, tgrd)
#Kriging model
pred <- krigeST(values ~ 1, wind.data2, pred.grd,modelList=v, computeVar = TRUE)
#results of kriging model in right format
wind.ST <- STFDF(grd, tgrd, as.data.frame(pred$var1.pred))
wind.STvar <- STFDF(grd, tgrd, as.data.frame(pred$var1.var))
colnames(wind.ST@data) <- "sqrt_speed"
colnames(wind.STvar@data) <- "sqrt_speed"
#plot results
library(RColorBrewer)
layout <- list(list("sp.lines", m, col = "gray"),list("sp.points", pts, first = FALSE, cex = 0.5))
stplot(wind.ST, col.regions = brewer.pal(11, "RdBu")[-c(10, 11)],at = seq(-1.375, 1, by = 0.25),par.strip.text = list(cex = 0.7), sp.layout = layout,animate=0)
stplot(wind.STvar, col.regions = brewer.pal(11, "RdBu")[-c(10, 11)],at = seq(0, .15, by = 0.025),par.strip.text = list(cex = 0.7), sp.layout = layout,animate=0)
Other analyses are used for spatio-temporal data as well, for example, empirical orthogonal functions (EOFs) which is similar to principal component analysis. See Cressie and Wikle (2015).
In fact, in Haslett and Raftery (1989), the velocities are modeled through a long memory model as \[ X_{jt} = \mu_j + (I-B)^{-d}\phi(B)^{-1} \theta(B) Z_{jt},\quad j=1,\ldots,12, \] where \(\mathbf{Z}_t = (Z_{1t},\ldots,Z_{12,t})'\sim \mbox{WN}(0,\sigma^2_Z R)\) and \(R = (r_{jk})\) with \(r_{jk} = \alpha \exp\{-\beta d_{jk}\}\), \(j\neq k\). Long memory models will be discussed next.
Q0: If really motivated and appropriate, supplement your analysis of multivariate time series with spatio-temporal analysis.
Q1: In addition to covariance, define a variogram and relate the two notions. Explain how these are estimated in practice non-parametrically, and provide an example in R.
Q2: Describe some ways to decide on what covariance/variogram model to fit to spatio-temporal data (separable, non-separable, isotropic, non-isotropic, etc).
Bivand, R.S., E. Pebesma, and V. Gómez-Rubio. 2013. Applied Spatial Data Analysis with R. Use R! Springer.
Cressie, N., and C. K. Wikle. 2015. Statistics for Spatio-Temporal Data. John Wiley & Sons.
Gräler, B., E. Pebesma, and G. Heuvelink. 2016. “Spatio-Temporal Interpolation Using Gstat.” The R Journal 8 (1): 204–18.
Haslett, J., and A. E. Raftery. 1989. “Space-Time Modelling with Long-Memory Dependence: Assessing Ireland’s Wind Power Resource.” Applied Statistics, 1–50.
Pebesma, E. 2012. “Spacetime: Spatio-Temporal Data in R.” Journal of Statistical Software 51 (7): 1–30.
Sherman, M: 2011. Spatial Statistics and Spatio-Temporal Data: Covariance Functions and Directional Properties. John Wiley & Sons.