CC-BY-SA, Edzer Pebesma 2015.

Time, Space, Spacetime in R

Package requirements

The packages used in this course are as follows:

pkgs <- c(
  "sp", # foundational spatial package
  "spacetime", # core spacetime classes and methods
  "stpp", # space-time point pattern simulation
  "foreign", # for loading data
  "plm", # time series for panel data
  "zoo", # for time-series analysis
  "xts", # extensible time series analysis
  "geonames", # query the geonames api
  "geosphere", # for plotting on geographic coordinates
  "gstat" # geostatistics
)

It is likely you will need to install some of these on your computer. To find out which ones need to be loaded we can as R, using require().

# Which packages are already installed?
reqs <- vapply(pkgs, require, character.only = TRUE, FUN.VALUE = logical(1))
reqs # print which packages you need
##        sp spacetime      stpp   foreign       plm       zoo       xts 
##      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE 
##  geonames geosphere     gstat 
##      TRUE      TRUE      TRUE
# Install the ones that are needed
if(!all(reqs)){
  install.packages(pkgs[!reqs])
}

Data in R

Data in R are often organized in vectors,

a = c(15, 20, 30)
a
## [1] 15 20 30
b = c("blue", "brown", "blue")
b
## [1] "blue"  "brown" "blue"

in matrices

m = matrix(1:4, 2, 2)
m
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

or in data.frames

d = data.frame(a, b = c("blue", "brown", "brown"), c = c(65, 77, 69.6))
d
##    a     b    c
## 1 15  blue 65.0
## 2 20 brown 77.0
## 3 30 brown 69.6

Such data has little meaning, because it is unclear what the variables refer to, and what the records refer to. A more descriptive way would be

Patients = data.frame(PatientID = a, EyeColor = b, Weight_kg = c(65, 77, 69.6))
Patients
##   PatientID EyeColor Weight_kg
## 1        15     blue      65.0
## 2        20    brown      77.0
## 3        30     blue      69.6

where another table would be needed for the personal details related to PatientID.

Note here that

The weight numbers

Patients$Weight_kg
## [1] 65.0 77.0 69.6

carry no information about their measurement unit, allowing for meaningless computations such as

with(Patients, Weight_kg + PatientID)
## [1] 80.0 97.0 99.6

The answer is correct, though, as + requires two numeric arguments. If we try to add eye color to body weight, we get a warning if EyeColor is a factor (by default, data.frame converts character vectors into factors!),

with(Patients, EyeColor + Weight_kg)
## Warning in Ops.factor(EyeColor, Weight_kg): '+' not meaningful for factors
## [1] NA NA NA

or in case EyeColor is character, we get an error:

Patients$EyeColor = as.character(Patients$EyeColor)
print(try(with(Patients, EyeColor + Weight_kg)))
## [1] "Error in EyeColor + Weight_kg : non-numeric argument to binary operator\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in EyeColor + Weight_kg: non-numeric argument to binary operator>

Take home messages:

  1. data.frame may change your information (convert character into factor), which is useful for some purposes (categorical variables, models), but not for others (DNA sequences).
  2. computations that are syntactically correct are not necessarily meaningful
  3. never ever ignore errors or warnings, but drill down into where they come from
  4. Records in a table (rows in a data.frame) often refer to objects, variables (columns) often to properties of these objects.

Exercises

  1. how can the data.frame command above be modified such that the character variable is not modified into a factor?
  2. how can R be configured such that this is always the case?
  3. in a long script, warnings are printed at the end. How can you make them print where they happen, or convert them into errors? (hint: ?options)
  4. how is a data.frame related to a list?
  5. try to understand the following commands
  6. matrix and data.frame both represent two-dimensional organisations of data; why don't we use matrix for everything?

Temporal data in R

How do we represent time?

People communicate time by using words, which can be written down as character information. Phrases like Sunday, 16 Aug 2015 and 18:36 are widely understood (although language dependent) to indicate date and time. Combined, we sometime see

if(Sys.info()["sysname"] == "Linux"){
  system("date > date.file") # works on unix systems!
  readLines("date.file")     # read & print back in R
}
## [1] "Tue Aug 25 12:34:56 CEST 2015"

There are actually a lot of different ways in which we can represent time information, and still understand it. For computers, this is less easy:

data(wind, package = "gstat")
head(wind[,1:7])
##   year month day   RPT   VAL   ROS   KIL
## 1   61     1   1 15.04 14.96 13.17  9.29
## 2   61     1   2 14.71 16.88 10.83  6.50
## 3   61     1   3 18.50 16.88 12.33 10.13
## 4   61     1   4 10.58  6.63 11.75  4.58
## 5   61     1   5 13.33 13.25 11.42  6.17
## 6   61     1   6 13.21  8.12  9.96  6.67
data(Produc, package = "plm")
head(Produc[,1:6])
##     state year     pcap     hwy   water    util
## 1 ALABAMA 1970 15032.67 7325.80 1655.68 6051.20
## 2 ALABAMA 1971 15501.94 7525.94 1721.02 6254.98
## 3 ALABAMA 1972 15972.41 7765.42 1764.75 6442.23
## 4 ALABAMA 1973 16406.26 7907.66 1742.41 6756.19
## 5 ALABAMA 1974 16762.67 8025.52 1734.85 7002.29
## 6 ALABAMA 1975 17316.26 8158.23 1752.27 7405.76
library(foreign)
read.dbf(system.file("shapes/sids.dbf", package="maptools"))[1:5,c(5,9:14)]
##          NAME BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79
## 1        Ashe  1091     1      10  1364     0      19
## 2   Alleghany   487     0      10   542     3      12
## 3       Surry  3188     5     208  3616     6     260
## 4   Currituck   508     1     123   830     2     145
## 5 Northampton  1421     9    1066  1606     3    1197

ISO 8601

ISO 8601 - Data elements and interchange formats - Information interchange - Representation of dates and times tries to create some order in this mess. It uses e.g. 2015-08-07 for Aug 7 2015, and 2015-08-07T18:30:27+00:00 to specify a given time in a particular time zone on that date.

ISO 8601 implicitly defines time stamps to refer to time intervals: 2015-08-07 refers to the full day of Aug 7, i.e. from 2015-08-07 00:00 up to but not including 2015-08-08 00:00. The time stamp 2015-08-07 00:00 has a duration of one second. 2015-08 refers to a full month.

Who knows what time it is?

Coordinated Universal Time UTC “It is, within about 1 second, mean solar time at 0 degree longitude; it does not observe daylight saving time.'' (formerly: GMT). It keeps track of the Earth's rotation and International Atomic Time (TAI), by introducing leap seconds now and then. Real-time approximations to UTC are broadcast wireless by GPS and radio time signals; computers usually use the network time protocol to synchronize clocks.

Does R know all this?

Kind of - for instance

as.Date(.leap.seconds)
##  [1] "1972-07-01" "1973-01-01" "1974-01-01" "1975-01-01" "1976-01-01"
##  [6] "1977-01-01" "1978-01-01" "1979-01-01" "1980-01-01" "1981-07-01"
## [11] "1982-07-01" "1983-07-01" "1985-07-01" "1988-01-01" "1990-01-01"
## [16] "1991-01-01" "1992-07-01" "1993-07-01" "1994-07-01" "1996-01-01"
## [21] "1997-07-01" "1999-01-01" "2006-01-01" "2009-01-01" "2012-07-01"
## [26] "2015-07-01"

To find out what time it is, R of course relies on the system clock. The R source code is updated every time a new leap second is announced. It is based on the code that operating systems use to handle date/time.

Representing time in R

Although one could argue that time is represented by a real number (\(\mathbb{R}\)), for the sequence

c(55, 60, 65, 70)
## [1] 55 60 65 70

it is not clear to which points in time we refer to: are these years since 1900, or seconds since the start of Today? To fix this, one needs to set an offset (when is zero?) and a unit (how long takes a unit increas?).

R has two built-in formats: for date it has Date, for time POSIXt:

(d = Sys.Date())
## [1] "2015-08-25"
class(d)
## [1] "Date"
print(as.numeric(d), digits = 15)
## [1] 16672
(t = Sys.time())
## [1] "2015-08-25 12:34:56 CEST"
class(t)
## [1] "POSIXct" "POSIXt"
print(as.numeric(t), digits = 15)
## [1] 1440498896.6631

Date is stored as the (integer) number of days since 1970-01-01 (in the local time zone), time is stored as the (fractional) number of seconds since 1970-01-01 00:00 UTC. We can see that time is printed in the current time zone.

R also understands'' time differences, e.g.

d - (d+1)
## Time difference of -1 days
d - (d+1.1)
## Time difference of -1.1 days
t - (t+24*3600)
## Time difference of -1 days
t - t+24*3600
## Time difference of 86400 secs

Exercise

  1. why do the last two expressions print a different result?

POSIXct uses one double to representation a time instance, POSIXlt represents time as a list (each time stamp one list), and is convenient to isolate time components:

t
## [1] "2015-08-25 12:34:56 CEST"
(lt = as.POSIXlt(t))
## [1] "2015-08-25 12:34:56 CEST"
unlist(lt)
##               sec               min              hour              mday 
## "56.663104057312"              "34"              "12"              "25" 
##               mon              year              wday              yday 
##               "7"             "115"               "2"             "236" 
##             isdst              zone            gmtoff 
##               "1"            "CEST"            "7200"
sapply(lt, class)
##         sec         min        hour        mday         mon        year 
##   "numeric"   "integer"   "integer"   "integer"   "integer"   "integer" 
##        wday        yday       isdst        zone      gmtoff 
##   "integer"   "integer"   "integer" "character"   "integer"
lt$sec
## [1] 56.6631

Exercise

  1. explain all the fields of a POSIXlt object

Time zones in R

Time zones make it easy to understand whether it is day or night without asking where you are, and take care of daylight saving time. There is a time zone database and a wikipedia entry, and R uses this either directly or through the GNU C library.

Sys.setenv(TZ="CET")
as.POSIXct("2015-03-28") - as.POSIXct("2015-03-27") # regular
## Time difference of 1 days
as.POSIXct("2015-03-30") - as.POSIXct("2015-03-29") # 23 hrs: DST starts
## Time difference of 23 hours
as.POSIXct("2015-10-26") - as.POSIXct("2015-10-25") # 25 hours: DST ends
## Time difference of 1.041667 days

whereas in UTC,

Sys.setenv(TZ="UTC")
as.POSIXct("2015-03-30") - as.POSIXct("2015-03-29")
## Time difference of 1 days
as.POSIXct("2015-10-26") - as.POSIXct("2015-10-25")
## Time difference of 1 days
Sys.setenv(TZ="CET")

having irregular intervals (day lengths) makes it complicated to compute daily means from e.g. hourly values; working with UTC may solve this problem. Climate scientists often model climate for years with 360 days, to get rid of the complexity of handling unequal month lengths and leap years. Aligning such data with POSIX time would be fun!

Reading time data into R

Date and time are understood from character strings (or factor) when entered like this:

as.Date("2015-08-17")
## [1] "2015-08-17"
as.POSIXct("2015-08-17")
## [1] "2015-08-17 CEST"
as.POSIXct("2015-08-17 12:25")
## [1] "2015-08-17 12:25:00 CEST"

but not like this

as.POSIXct("2015-08-07T18:30:27Z")
## [1] "2015-08-07 CEST"

where everything after the T is ignored, without warning!

Arbitrary time formats can be specified using strptime, e.g.

strptime("2015-08-07T18:30:27Z", "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
## [1] "2015-08-07 18:30:27 UTC"

The (implicit) interval information of ISO 8601 time expressions is lost when read as POSIXt objects, as

identical(as.POSIXct("2015-08-17"), as.POSIXct("2015-08-17 00:00"))
## [1] TRUE

To convert numerical values into dates or time, the offset needs to be specified:

print(try(as.Date(365)))
## [1] "1971-01-01"
as.Date(365, origin = "1970-01-01")
## [1] "1971-01-01"
as.POSIXct(365 * 24 * 3600, origin = "1970-01-01")
## [1] "1971-01-01 01:00:00 CET"

although package zoo takes a somewhat different take on this:

library(zoo)
as.Date(365)
## [1] "1971-01-01"

Packages with tools to deal with time

The CRAN Time Series Analysis Task View describes all packages dealing with time, time series data, and its analysis. Several packages provide other ways of representing time; package lubridate for instance provides explicit representation of time intervals.

Packages for analyzing time series data

The time series task view is complete here, but I will mention two important packages: zoo and xts.

Zoo is for ordered observations, where the ordering does not have to be time:

library(zoo)
zoo(rnorm(10), 1:10)
##           1           2           3           4           5           6 
##  0.27072595 -2.25979086 -1.11897038  0.35686920 -0.25076671 -0.12688160 
##           7           8           9          10 
## -0.05619057  1.49861427 -1.47440469  0.63996567

it provides a lot of convenient functions, including aggregate, na.fill, and e.g. moving average filters. See for instance ?aggregate.zoo for many examples aggregating time series to daily, monthly, quarterly or yearly values.

Package xts builds on top of zoo (xts is a subclass of zoo), and requires that data are ordered by time. It allows for different time classes (e.g. Date and POSIXct) but works with POSIXct under the hood. xts adds very convenient ISO 8601 style time interval selection:

library(xts)
x = xts(1:365, as.Date("2015-01-01")+0:364)
nrow(x)
## [1] 365
nrow(x["2015-05"]) # all days in May
## [1] 31
nrow(x["2015-05/06"]) # all days in May and June
## [1] 61

Spatial data in R

How far from home am I?

Geonames is a gazetteer (geographical dictionary or directory) that has a collection of place names, place types and geographical coordinates in WGS84, along with several other things.

You can register yourself at geonames.

options(geonamesUsername = "myUserName") # register, then use your own name
library(geonames)
pts = rbind(
 LA = GNsearch(name = "Lancaster University", adminCode1 = "ENG"),
 MS = GNsearch(name = "Münster")[1,]
)[c("lng", "lat")]
pts
##         lng      lat
## LA -2.78679  54.0082
## MS  7.62571 51.96236

It turns out that pts is still filled with character information,

class(pts$lng)
## [1] "character"

so we need to transform it into numeric:

pts = sapply(pts, as.numeric) # this drops rownames, so:
rownames(pts) = c("LA", "MS") # Lancaster, Muenster
pts
##         lng      lat
## LA -2.78679 54.00820
## MS  7.62571 51.96236

R has a function called dist, which computes (euclidean) distances in \(\mathbb{R}^n\):

dist(pts)
##          LA
## MS 10.61158

which returns 10.6, but 10.6 what? Degrees? Nautical miles? Oranges? No, plane nonsense, and dist is only useful for geographical coordinates (long/lat) in small areas close to the equator: our data are not in planar space, so we need to be more clever.

Package sp provides classes for points, lines, polygons and grids which register their coordinate reference system (CRS), and chooses appropriate distance functions:

library(sp)
pts = SpatialPoints(pts, CRS("+proj=longlat +datum=WGS84"))
spDists(pts) # km, great circle distance over the WGS84 ellipsoid
##          [,1]     [,2]
## [1,]   0.0000 734.5819
## [2,] 734.5819   0.0000

package geosphere provides several alternative spherical/ellipsoidal distance measures:

library(geosphere)
distHaversine(pts[1],pts[2])
## [1] 733229.8
# distGeo(pts[1],pts[2]) # different from spDist: # commented out - bug in geosphere?
spDists(pts)[2,1]*1000 - distHaversine(pts[1],pts[2])  # difference in m
## [1] 1352.072

Subsetting sp objects

as we have seen, a SpatialPoints object can be subsetted by its index, but also by its row name; pts[1] and pts[1,] yield the same: we think of these geometry-only objects as having only records:

pts[1] # selects record 1, but looks ambiguous
## SpatialPoints:
##         lng     lat
## LA -2.78679 54.0082
## Coordinate Reference System (CRS) arguments: +proj=longlat
## +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
pts[1,] # looks less ambiguous
## SpatialPoints:
##         lng     lat
## LA -2.78679 54.0082
## Coordinate Reference System (CRS) arguments: +proj=longlat
## +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
pts["MS",]
## SpatialPoints:
##        lng      lat
## MS 7.62571 51.96236
## Coordinate Reference System (CRS) arguments: +proj=longlat
## +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

If we add attributes to pts, e.g. by

pts$pop = c(299.708, 45.952) # according to Wikipedia
pts$area = c(302.9, 19.2) # I asked google
pts
##           coordinates     pop  area
## 1 (-2.78679, 54.0082) 299.708 302.9
## 2 (7.62571, 51.96236)  45.952  19.2

we see that the class of pts now has changed from SpatialPoints to SpatialPointsDataFrame, the reason being that in addition to geometry (locations), the object now has attributes (in a data.frame slot). We can subset or manipulate these attributes in the same fashion as we can with plane data.frames:

pts[1] # now selects a variable!
##           coordinates     pop
## 1 (-2.78679, 54.0082) 299.708
## 2 (7.62571, 51.96236)  45.952
pts[,1] # does the same, less ambiguous
##           coordinates     pop
## 1 (-2.78679, 54.0082) 299.708
## 2 (7.62571, 51.96236)  45.952
pts["pop"] # does the same, by name
##           coordinates     pop
## 1 (-2.78679, 54.0082) 299.708
## 2 (7.62571, 51.96236)  45.952
pts[,"pop"] # does the same
##           coordinates     pop
## 1 (-2.78679, 54.0082) 299.708
## 2 (7.62571, 51.96236)  45.952
pts[["pop"]] # extract vector
## [1] 299.708  45.952
pts$pop      # synonymous
## [1] 299.708  45.952
pts[1,"pop"] # select 1 geometry, 1 variable
##           coordinates     pop
## 1 (-2.78679, 54.0082) 299.708
pts[1,]$pop  # select 1 geometry, extract variale
## [1] 299.708
(pts$popDensity = pts$pop / pts$area) # create & add new variable
## [1] 0.9894619 2.3933333

We can even select using the spatial predicate intersect, as in

library(spacetime)
data(air) # loads the boundary of Germany, called DE_NUTS1
proj4string(DE_NUTS1) = proj4string(pts) # semantically identical, but syntactically different
## Warning in ReplProj4string(obj, CRS(value)): A new CRS was assigned to an object with an existing CRS:
## +init=epsg:4326 +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0
## without reprojecting.
## For reprojection, use function spTransform in package rgdal
pts[DE_NUTS1,] # selects the point(s) in pts intersecting with polygon DE_NUTS1
##           coordinates    pop area popDensity
## 2 (7.62571, 51.96236) 45.952 19.2   2.393333
plot(DE_NUTS1, axes = TRUE)
points(pts[DE_NUTS1,], col = 'red', pch = 16)

plot of chunk unnamed-chunk-39

Note that

  1. Coordinate reference systems (CRS) need to be defined in order to decide which distance metrics are meaningful, and whether objects can be meaningfully combined by comparing coordinates (e.g., pts and DE_NUTS1)
  2. sp objects register coordinate reference systems, and warn/err in case of mismatch

Exercises

  1. Create a login for geonames, and retrieve the coordinates of your home address, or else use some other gazetteer to retrieve those
  2. Compute how far away from home you currently are, using different distance metrics.

Spatiotemporal data in R

The SpatioTempofal Task View describes the packages that represent and/or analyze spatiotemporal data available on CRAN.

How are spatiotemporal data organized in package spacetime?

Essentially, spacetime distinguishes between three types of spatiotemporal data:

  1. regular data where for fixed locations (or regions) observations are collected for the same time instances or intervals (e.g. socio-economic data or data from fixed sensors)
  2. irregular data for which observations are arbitrarily distributed in space-time
  3. trajectories, data for moving objects.

Support for trajectories in package spacetime is rudimentary, and is more developed in package trajectories (see below)

Irregular spacetime data are stored in STI objects, or STIDF if they have attributes (magnitude of an earth quake, cause of a disease):

showClass("STI")
## Class "STI" [package "spacetime"]
## 
## Slots:
##                               
## Name:       sp    time endTime
## Class: Spatial     xts POSIXct
## 
## Extends: "ST"
## 
## Known Subclasses: "STIDF"

the sp and time slots need to have the same number of records, and are simply matched by order to identify the location and time of a particular event. A data slot of STIDF has the same number of records and is matched accordingly.

Regular spacetime data have recurrent observations for fixed spatial entities. The STF, space-time-full, data structure represents this by

showClass("STF")
## Class "STF" [package "spacetime"]
## 
## Slots:
##                               
## Name:       sp    time endTime
## Class: Spatial     xts POSIXct
## 
## Extends: "ST"
## 
## Known Subclasses: "STFDF"

Although the class structure seems identical, sp and time may have different number of records, and the assumption here is that every geometry record has a time series of data of length nrow(time). The data slot of STFDF has length(sp) * nrow(time) records, space cycling fastest.

A third type, STF (space-time-sparse), defined as

showClass("STS")
## Class "STS" [package "spacetime"]
## 
## Slots:
##                                       
## Name:    index      sp    time endTime
## Class:  matrix Spatial     xts POSIXct
## 
## Extends: "ST"
## 
## Known Subclasses: "STSDF"

mimics STF but does not store all cell (space x time) values; it keeps an index vector to those cells that are filled, in order to be efficient for very sparse ST regular layouts.

Note that

  1. the sp slot can contain any type of spatial data (points, lines, polygons, grid)
  2. time does not need to be regular, only the sequence of time points (or intervals) is identical for each space feature
  3. endTime can be used to define intervals; by default STI objects have zero interval width (indicating time instance), STF have interval length equal to the time to next observation (consecutive intervals).

Quite often, regular or irregular data are aggregated to new spatial and/or temporal units. Package spacetime tries hard to accomodate all possibilities.

Analysis of regular spatiotemporal data (air quality sensor data)

Package spacetime contains a dataset called air, which contains daily PM\(_{10}\) values for rural air quality stations over Germany, from 1998-2009.

library(sp)
library(spacetime)
data(air)
# rural = STFDF(stations, dates, data.frame(PM10 = as.vector(air))) # stations not there!
dim(rural)
## Error in eval(expr, envir, enclos): object 'rural' not found
# stbox(rural) # commented for now: Error: object 'stbox' not found

We now aggregate the 2008 daily values to NUTS-1 region-average daily values by

x = as(rural[,"2008"], "xts")
## Error in .class1(object): object 'rural' not found
apply(x, 1, mean, na.rm=TRUE)[1:5]
## 2015-01-01 2015-01-02 2015-01-03 2015-01-04 2015-01-05 
##          1          2          3          4          5
dim(rural[,"2008"])
## Error in eval(expr, envir, enclos): object 'rural' not found
x = aggregate(rural[,"2008"], DE_NUTS1, mean, na.rm=TRUE)
## Error in aggregate(rural[, "2008"], DE_NUTS1, mean, na.rm = TRUE): error in evaluating the argument 'x' in selecting a method for function 'aggregate': Error: object 'rural' not found
dim(x)
## [1] 365   1
stplot(x, mode = "tp", par.strip.text = list(cex=.6))
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'stplot' for signature '"xts"'

We can aggregate to the complete region, by which the object becomes a xts object (unless we would specify simplify=FALSE in the call to aggregate):

x = aggregate(rural[,"2008"], DE_NUTS1, mean, na.rm=TRUE)
## Error in aggregate(rural[, "2008"], DE_NUTS1, mean, na.rm = TRUE): error in evaluating the argument 'x' in selecting a method for function 'aggregate': Error: object 'rural' not found
class(x)
## [1] "xts" "zoo"
plot(x[,"PM10"])
## Error in plot(x[, "PM10"]): error in evaluating the argument 'x' in selecting a method for function 'plot': Error in `[.xts`(x, , "PM10") : subscript out of bounds
## Calls: [ -> [.xts

Some stations contain only missing values for certain time periods; we can de-select those, and aggregate to monthly means:

x = as(rural[,"2008"], "xts")
## Error in .class1(object): object 'rural' not found
apply(x, 2, mean, na.rm=TRUE)[1:5]
## [1] 183  NA  NA  NA  NA
sel = which(!apply(as(rural[,"2008"], "xts"), 2, function(x) all(is.na(x))))
## Error in .class1(object): object 'rural' not found
x = aggregate(rural[sel, "2008"], "month", mean, na.rm=TRUE)
## Error in aggregate(rural[sel, "2008"], "month", mean, na.rm = TRUE): error in evaluating the argument 'x' in selecting a method for function 'aggregate': Error: object 'rural' not found
stplot(x, mode = "tp", par.strip.text = list(cex=.6))
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'stplot' for signature '"xts"'

Next, we use zoo::as.yearqtr to aggregate to quarterly values:

library(zoo)
x = aggregate(rural[sel,"2005::2011"], as.yearqtr, median, na.rm=TRUE)
## Error in aggregate(rural[sel, "2005::2011"], as.yearqtr, median, na.rm = TRUE): error in evaluating the argument 'x' in selecting a method for function 'aggregate': Error: object 'rural' not found
stplot(x, mode = "tp", par.strip.text = list(cex=.6))
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'stplot' for signature '"xts"'

Finally, we compute a yearly mean values for 2008 and 2009, for the whole country:

DE.years = STF(DE_NUTS1, as.Date(c("2008-01-01", "2009-01-01")))
x_subset <- aggregate(rural[,"2008::2009"], DE.years, mean, na.rm=TRUE)
## Error in aggregate(rural[, "2008::2009"], DE.years, mean, na.rm = TRUE): error in evaluating the argument 'x' in selecting a method for function 'aggregate': Error: object 'rural' not found

We can do some space-time geostatistics on these data, e.g. by

rr = rural[,"2005::2010"]
## Error in eval(expr, envir, enclos): object 'rural' not found
unsel = which(apply(as(rr, "xts"), 2, function(x) all(is.na(x))))
## Error in .class1(object): object 'rr' not found
r5to10 = rr[-unsel,] # remove series that are empty for this time period
## Error in eval(expr, envir, enclos): object 'rr' not found
summary(r5to10)
## Error in summary(r5to10): error in evaluating the argument 'object' in selecting a method for function 'summary': Error: object 'r5to10' not found
dim(r5to10)
## Error in eval(expr, envir, enclos): object 'r5to10' not found
rn = row.names(r5to10@sp)[4:7]
## Error in row.names(r5to10@sp): object 'r5to10' not found
rn
## Error in eval(expr, envir, enclos): object 'rn' not found

To keep computation time in bounds, we select 100 random time instances, rbind the Spatial objects, and compute a pooled (pure spatial) variogram:

rs = sample(dim(r5to10)[2], 100)
## Error in sample(dim(r5to10)[2], 100): object 'r5to10' not found
lst = lapply(rs, function(i) { x = r5to10[,i]; x$ti = i; x} )
## Error in lapply(rs, function(i) {: object 'rs' not found
pts = do.call(rbind, lst)
## Error in do.call(rbind, lst): object 'lst' not found
library(gstat)
v = variogram(PM10~ti, pts[!is.na(pts$PM10),], dX=0)
## Error in eval(expr, envir, enclos): object 'PM10' not found
vmod = fit.variogram(v, vgm(1, "Exp", 200, 1))
## Error in inherits(object, "gstatVariogram"): object 'v' not found
plot(v, vmod)
## Error in plot(v, vmod): error in evaluating the argument 'x' in selecting a method for function 'plot': Error: object 'v' not found
vmod
## Error in eval(expr, envir, enclos): object 'vmod' not found

The full, pre-computed space-time variogram is obtained by

rr = rural[,"2005::2010"]
unsel = which(apply(as(rr, "xts"), 2, function(x) all(is.na(x))))
r5to10 = rr[-unsel,]
vv = variogram(PM10~1, r5to10, width=20, cutoff = 200, tlags=0:5)

but we will load pre-computed values from package gstat:

data(vv, package = "gstat")
vv <- vv[c("np", "dist", "gamma", "id", "timelag", "spacelag")]

The following two graphs first show a variogram map in space (x) and time (y), and then a set of spatial variograms where color denotes the time lag:

print(plot(vv), split = c(1,1,1,2), more = TRUE)
print(plot(vv, map = FALSE), split = c(1,2,1,2))

plot of chunk unnamed-chunk-53

We see that lag(spatial lag = 0, time lag = 0) is always missing, which is typical for regular data without replicates.

We can fit a metric variogram model to these data by

metricVgm <- vgmST("metric",
                   joint=vgm(50,"Exp",100,0),
                   stAni=50)
metricVgm <- fit.StVariogram(vv, metricVgm)
attr(metricVgm, "optim")$value
## [1] 60.60854
plot(vv, metricVgm)

plot of chunk unnamed-chunk-54

or a separable model by

# commented - get error "Error in switch(model$stModel,...: EXPR must be a length 1 vector ...)"
# sepVgm <- vgmST("separable",
#                 space=vgm(0.9,"Exp", 123, 0.1),
#                 time =vgm(0.9,"Exp", 2.9, 0.1),
#                 sill=100)
# sepVgm <- fit.StVariogram(vv, sepVgm, method = "L-BFGS-B",
#                           lower = c(10,0,0.01,0,1),
#                           upper = c(500,1,20,1,200))
# attr(sepVgm, "optim")$value
# plot(vv, list(sepVgm, metricVgm))

These models can then be used for spatiotemporal kriging; the interested reader is refered to vignettes of package gstat, and examples and demo scripts using the function krigeST found there.

Analysis of events (ST point pattern data)

we'll analyse the events from the foot-and-mouth (fmd) disease data that come with the stpp package:

library(stpp)
data("fmd")
data("northcumbria")
head(fmd)
##           X      Y ReportedDay
## [1,] 350850 557590          28
## [2,] 337320 569500          28
## [3,] 341730 572090          28
## [4,] 350410 527040          28
## [5,] 347800 541850          29
## [6,] 356900 541120          29
head(northcumbria)
##             x        y
## [1,] 346618.7 583056.7
## [2,] 340011.7 575941.6
## [3,] 338487.0 573908.6
## [4,] 332896.5 574162.8
## [5,] 332896.5 573400.4
## [6,] 333658.8 571621.6

To make the northcumbria polygon more useful, we'll convert it into a SpatialPolygons object by

nc = rbind(northcumbria, northcumbria[1,]) # close polygon
library(sp)
nc = SpatialPolygons(list(Polygons(list(Polygon(nc)), "NC")))

Next, we'll creat a SpatialPoints object for the locations

pts = SpatialPoints(fmd[,1:2])     # CRS unknown!
plot(nc, axes = TRUE)
plot(pts, cex = 0.5, add = TRUE, col = "#88888888")

plot of chunk unnamed-chunk-58

We can do some simple cell counts by creating a grid first

grd = SpatialPixels(SpatialPoints(makegrid(nc, n = 100)))
plot(grd)
plot(nc, add = TRUE)

plot of chunk unnamed-chunk-59

and then assigning a dummy (1) attribute, and aggregating this using sum:

pts$id = rep(1, length(pts))
image(aggregate(pts, grd, sum))
plot(nc, add = TRUE)
points(pts, cex = .3, col = "#88888888")
title("cell counts")

plot of chunk unnamed-chunk-60

and get an idea of the temporal development by computing average event time per cell

day = as.Date("2001-01-01") + fmd[,3] # also unknown!
pts$day = day
image(aggregate(pts["day"], grd, mean))
plot(nc, add = TRUE)
points(pts, cex = .3, col = "#88888888")
title("cell mean event time")

plot of chunk unnamed-chunk-61

We can create an irregular space-time object (STI) from this, without attributes (DF), by

library(spacetime)
sti = STI(pts, day)
plot(sti) # space-time layout

plot of chunk unnamed-chunk-62

This plot shows the space-time (time=x, space=y) layout of the individual events, where points are simply numbered. Note that STI objects are always time ordered.

We can create a plot that shows attributes if we add an attribute; here we add a simple index, normalized to \([0,1]\), to indicate temporal distribution.

stidf = STIDF(pts, day, data.frame(one = (0:(length(pts)-1))/(length(pts)-1)))
stplot(stidf, sp.layout = nc, key.space = "right", cex = .5,
    xlim = bbox(nc)[1,], ylim = bbox(nc)[2,],
    main = "colour indicates quintile")

plot of chunk unnamed-chunk-63

A variety of the number of observations per grid cell per time interval is obtained by aggregating stidf by a regular space-time grid; we create this grid by

grd = SpatialPixels(SpatialPoints(makegrid(nc, n = 25)))
n = 6 # number of time classes
tcuts = seq(min(index(sti)), max(index(sti)), length.out=n)
grd = STF(grd, tcuts[1:(n-1)])
plot(grd)

plot of chunk unnamed-chunk-64

STF creates end times of the time intervals by default by

Sys.setenv(TZ = "UTC")
delta(tcuts[1:(n-1)])
## [1] "2001-03-04 UTC" "2001-04-07 UTC" "2001-05-11 UTC" "2001-06-14 UTC"
## [5] "2001-07-18 UTC"

The number of events per space-time blocks is then computed by aggregate, and plotted with stplot:

# commented due to error flagged: "Error in over(ax(x, "STI") ..."
# a = aggregate(stidf, grd, sum)
# summary(a)
# stplot(a, xlim = bbox(nc)[1,], ylim = bbox(nc)[2,], 
#   col.regions=gray((45:1)/51), sp.layout = list(nc, first=FALSE),
#   main = "number of observations")

Moving objects

A third type of spatiotemporal data concerns trajectories, describing the paths of objects moving through space. Data consists of sequences of fixes, time-stamped locations measurements. Methods for data analysis have been developed by very different communities, ranging from the transportation and logistics sector to animal ecology. Problems addressed include

  1. modelling movement from sample data
  2. home range estimation
  3. predicting behaviour of individuals, or arrival times
  4. studying group behaviour and interactions

Package trajectories provides classes and some methods for handling and analysing such data, building upon the work in packages sp and spacetime.