Spatial correlation and prediction

\( \newcommand{\E}{{\rm E}} % E expectation operator \newcommand{\Var}{{\rm Var}} % Var variance operator \newcommand{\Cov}{{\rm Cov}} % Cov covariance operator \newcommand{\Cor}{{\rm Corr}} \)

What is spatial correlation?

Idea from time series: look at lagged correlations, and the \(h\)-scatterplot.

What is it? Plots of (or correlation between) \(Z(s)\) and \(Z(s+h)\), where \(s+h\) denotes the location \(s\) shifted by the lag \(h\) (a time lag, or a spatial distance).

Random variables: expectation, variance, covariance

Random variable: \(Z\) follows a probability distribution, specified by a probability density function \(f(z)\) or a distribution function \(F(z)=\Pr(Z \le z)\)

Expectation: \(\E(Z) = \int_{-\infty}^{\infty} u f(u)du\) – center of mass, mean.

Variance: \(\Var(Z)=\E(Z-\E(Z))^2\) – mean squared distance from mean; measure of spread; square root: standard deviation of \(Z\).

Covariance: \(\Cov(X,Y)=\E((X-\E(X))(Y-\E(Y)))\) – mean product; can be negative; \(\Cov(X,X)=\Var(X)\).

Correlation: \(r_{XY}=\frac{\Cov(X,Y)}{\sqrt{\Var(X)\Var(Y)}}\) – covariance normalized to \([-1,1]\); \(-1\) or \(+1\) means perfect (negative or positive) linear correlation.

Normal distribution

library(mvtnorm)
r = 0.0
Sigma = cbind(c(1, r), c(r, 1))
out = rmvnorm(500, c(5,5), Sigma)
x = out[,1]
y = out[,2]
plot(x, y)
model = lm(y~x)
summary(model)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.337 -0.655 -0.019  0.687  3.232 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.9027     0.2313   21.19   <2e-16 ***
## x             0.0206     0.0450    0.46     0.65    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.992 on 498 degrees of freedom
## Multiple R-squared:  0.00042,    Adjusted R-squared:  -0.00159 
## F-statistic: 0.209 on 1 and 498 DF,  p-value: 0.647
abline(model) # add the fitted regression line
mse = round(mean((x-y)^2), digits=3)
title(paste("correlation:", round(cor(x,y),digits=3), "MSE", mse))


r = .5
Sigma = cbind(c(1, r), c(r, 1))
out = rmvnorm(500, c(5,5), Sigma)
x = out[,1]
y = out[,2]
plot(x, y)
model = lm(y~x)
summary(model)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0262 -0.5605  0.0136  0.5552  2.5215 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.4964     0.1894    13.2   <2e-16 ***
## x             0.4999     0.0378    13.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.852 on 498 degrees of freedom
## Multiple R-squared:  0.26,   Adjusted R-squared:  0.259 
## F-statistic:  175 on 1 and 498 DF,  p-value: <2e-16
abline(model) # add the fitted regression line
mse = round(mean((x-y)^2), digits=3)
title(paste("correlation:", round(cor(x,y),digits=3), "MSE", mse))


r = .9
Sigma = cbind(c(1, r), c(r, 1))
out = rmvnorm(500, c(5,5), Sigma)
x = out[,1]
y = out[,2]
plot(x, y)
model = lm(y~x)
summary(model)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2730 -0.2809 -0.0206  0.2946  1.1040 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.4001     0.1057    3.79  0.00017 ***
## x             0.9199     0.0205   44.88  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.429 on 498 degrees of freedom
## Multiple R-squared:  0.802,  Adjusted R-squared:  0.801 
## F-statistic: 2.01e+03 on 1 and 498 DF,  p-value: <2e-16
abline(model) # add the fitted regression line
mse = round(mean((x-y)^2), digits=3)
title(paste("correlation:", round(cor(x,y),digits=3), "MSE", mse))


r = .95
Sigma = cbind(c(1, r), c(r, 1))
out = rmvnorm(500, c(5,5), Sigma)
x = out[,1]
y = x + 0.1 * rnorm(500) # overwrite y: x plus a little noise, nearly perfect correlation
plot(x, y)
model = lm(y~x)
summary(model)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30446 -0.07428  0.00012  0.07436  0.27446 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.04980    0.02331   -2.14    0.033 *  
## x            1.00954    0.00456  221.57   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1 on 498 degrees of freedom
## Multiple R-squared:  0.99,   Adjusted R-squared:  0.99 
## F-statistic: 4.91e+04 on 1 and 498 DF,  p-value: <2e-16
abline(model) # add the fitted regression line
mse = round(mean((x-y)^2), digits=3)
title(paste("correlation:", round(cor(x,y),digits=3), "MSE", mse))


How can correlation help prediction?

Problem:

Questions

Given observation \(z(s_1)\), how to predict \(z(s_0)\)?

Obviously, given only \(z(s_1)\), the best predictor for \(z(s_0)\) is \(\hat{z}(s_0)=z(s_1)\).

But what is the error variance, i.e. \(\Var(\hat{z}(s_0)-z(s_0))\)?

Estimation error

Let both \(z(s_1)\) and \(z(s_0)\) come from a field that has variance 1, i.e. \(\Var(z(s_0)) = \Var(z(s_1))=1\), and that has a constant mean: \(\E(z(s_0)) = \E(z(s_1))=m\)

Then, \[\Var(\hat{z}(s_0)-z(s_0)) = \Var(z(s_1)-z(s_0))\]

As both have the same mean, this can be written as \[\E(\hat{z}(s_0)-z(s_0))^2 = \Var(z(s_1)) + \Var(z(s_0)) - 2\Cov(z(s_1),z(s_0))\]

As both have variance 1, this equals \(2(1-r)\) with \(r\) the correlation between \(z(s_0)\) and \(z(s_1)\). Examples follow.
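The relation \(\Var(z(s_1)-z(s_0)) = 2(1-r)\) can be checked by simulation. The sketch below (reusing the mvtnorm package from the examples above, with an arbitrary \(r=0.7\)) compares the empirical error variance with the theoretical value:

```r
# check Var(z(s1) - z(s0)) = 2 (1 - r) by simulation:
library(mvtnorm)
set.seed(1)
r = 0.7
Sigma = cbind(c(1, r), c(r, 1))       # unit variances, correlation r
out = rmvnorm(100000, c(0, 0), Sigma) # columns: z(s1), z(s0)
var(out[,1] - out[,2])                # close to 2 * (1 - 0.7) = 0.6
```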

Suppose we know the mean

If we know the mean \(\mu\), it may be a good idea to use a compromise between the observation and the mean, e.g. \[\hat{z}(s_0) = (1-r) \mu + r z(s_1)\]
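A quick simulation (a sketch, with \(r\) and \(\mu\) chosen for illustration) shows that this compromise indeed has a smaller mean squared error than using the observation alone:

```r
library(mvtnorm)
set.seed(1)
r = 0.5; mu = 5
Sigma = cbind(c(1, r), c(r, 1))          # unit variances, correlation r
out = rmvnorm(100000, c(mu, mu), Sigma)
z1 = out[,1]; z0 = out[,2]
mean((z1 - z0)^2)                        # naive predictor z1: MSE ~ 2(1-r) = 1
mean(((1 - r) * mu + r * z1 - z0)^2)     # compromise: MSE ~ 1 - r^2 = 0.75
```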

Next problems…

What is Geostatistical Interpolation?

Geostatistical interpolation (kriging) uses linear predictors \[\hat{z}(s_0) = \sum_{i=1}^n \lambda_i z(s_i)\] with the weights \(\lambda_i\) chosen such that the variance of the prediction error is minimized.

All that is needed is variances and correlations.

Random variables

Random variables (RVs) are numeric variables whose outcomes are subject to chance.

The cumulative distribution function \(F_Z(\cdot)\) gives, for every outcome \(z\), the probability that the RV \(Z\) does not exceed it: \[P(Z \le z) = F_Z(z) = \int_{-\infty}^z f_Z(u)du\] where \(f_Z(\cdot)\) is the probability density function of \(Z\). The total probability is 1: \(\int_{-\infty}^{\infty} f_Z(u)du = 1\).

Random variables have an expectation (mean): \(E(Z) = \int_{-\infty}^{\infty} u f_Z(u) du\) and a variance: \(\Var(Z) = E[(Z-E(Z))^2]\).

Try to think of \(E(Z)\) as \(\frac{1}{n}\sum_{i=1}^{n} z_i\), with \(n \rightarrow \infty\).
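These definitions can be verified numerically for the normal distribution, using base R (a sketch; the particular numbers are arbitrary):

```r
# the CDF is the integral of the density (standard normal example):
integrate(dnorm, -Inf, 1.96)$value  # matches pnorm(1.96), about 0.975
pnorm(1.96)

# expectation: integral definition vs. sample-mean approximation
integrate(function(u) u * dnorm(u, mean = 2), -Inf, Inf)$value  # exactly 2
set.seed(1)
mean(rnorm(100000, mean = 2))                                   # close to 2
```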

Two random variables \(X\) and \(Y\) have covariance defined as \(\Cov(X,Y) = E[(X-E(X))(Y-E(Y))]\)

Correlation and covariance

Correlation is scaled covariance, scaled by the product of the standard deviations. For two variables \(X\) and \(Y\), it is \[\Cor(X,Y) = \frac{\Cov(X,Y)}{\sqrt{\Var(X)\Var(Y)}}\]

It is quite easy to show that \(|\Cov(X,Y)| \le \sqrt{\Var(X)\Var(Y)}\), so correlation ranges from -1 to 1. For this, note that \(\Cov(X,X)=\Var(X)\) and \(\Cov(X,-X)=-\Var(X)\).

It is perhaps easier to think of covariance as unscaled correlation.

Note: a large covariance does not imply a strong correlation; it may simply reflect large variances.
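For example, rescaling one variable inflates the covariance by the same factor but leaves the correlation unchanged (an illustrative sketch):

```r
set.seed(1)
x = rnorm(1000)
y = x + rnorm(1000, sd = 2)          # y only weakly related to x
c(cov(x, y), cor(x, y))              # modest covariance, correlation ~ 0.45
c(cov(100 * x, y), cor(100 * x, y))  # covariance 100x larger, correlation unchanged
```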

The quadratic form

We will not consider single random variables, but rather large collections of them. In fact, we will consider each observation \(z(s_i)\) as a realisation (outcome) of a random variable \(Z(s_i)\), and consider the \(Z\) variable at all other locations also as separate random variables, say \(Z(s_0)\) for any \(s_0\) in the domain of interest.

Let \(Z = [Z(s_1)\ Z(s_2)\ \ldots\ Z(s_n)]'\); then \(\Var(Z)=V\) is the covariance matrix of the vector \(Z\), with \(i,j\)-th element \(\Cov(Z(s_i),Z(s_j))\), implying it has variances on the diagonal.

Then, it is easy to show that for non-random weights \(\lambda = [\lambda_1 \ldots \lambda_n]'\) the quadratic form \(\lambda'Z = \sum_{i=1}^n \lambda_i Z(s_i)\) has variance \[ \Var(\lambda'Z) = \lambda' \Var(Z) \lambda = \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j \Cov(Z(s_i),Z(s_j)) = \lambda'V\lambda\]
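The identity \(\Var(\lambda'Z)=\lambda'V\lambda\) is easy to verify by simulation (a sketch with an arbitrary valid covariance matrix \(V\) and weight vector \(\lambda\)):

```r
library(mvtnorm)
set.seed(1)
V = cbind(c(1, .5, .2), c(.5, 1, .5), c(.2, .5, 1)) # a positive definite covariance matrix
lambda = c(0.2, 0.3, 0.5)
Z = rmvnorm(100000, sigma = V)  # rows are realisations of (Z(s_1), Z(s_2), Z(s_3))
var(as.vector(Z %*% lambda))    # empirical variance of lambda'Z
t(lambda) %*% V %*% lambda      # theoretical lambda' V lambda = 0.63
```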

Why do we need this?

When we predict (interpolate), we're forming linear combinations \(\sum_{i=1}^n \lambda_i Z(s_i)\), and want to know the variance of \(\sum_{i=1}^n \lambda_i Z(s_i) - Z(s_0)\), the interpolation error variance. Only then can we find the weights for which it is minimal.

What is the scalar \(\Var(\sum_{i=1}^n \lambda_i Z(s_i)-Z(s_0))\)? Write as

\[\Var(\lambda'Z - Z(s_0)) = \Var(\lambda'Z) + \Var(Z(s_0)) - 2\Cov(\lambda'Z,Z(s_0))\] \[=\lambda'V\lambda + \sigma_0^2 - 2\sum_{i=1}^n \lambda_i \Cov(Z(s_i),Z(s_0)) \] with \(\sigma_0^2 = \Var(Z(s_0))\)

So, we need variances of all \(Z(s_i)\), including for all \(s_0\), and all covariances between pairs \(Z(s_i)\) and \(Z(s_j)\), including all \(s_0\).

Suppose we know all that

Kriging: find weights \(\lambda\) such that \(\Var(Z(s_0)-\hat{Z}(s_0))= \Var(Z(s_0)-\sum_{i=1}^n\lambda_i Z(s_i))\) is minimized, and we have the best (minimum variance) linear predictor.

Best linear prediction weights: Let \(V=\Var(Z)\ \ (n\times n)\) and \(v=\Cov(Z(s_0),Z)\ \ (n\times 1)\), and scalar \(\Var(Z(s_0)) = \sigma^2_0\).

Expected squared prediction error \(\E(Z(s_0)-\hat{Z}(s_0))^2 = \sigma^2(s_0)\)

Replace \(Z\) with \(Z-\mu\) (or assume \(\mu=0\))

\[\sigma^2(s_0) = \E(Z(s_0)-\lambda'Z)^2 = \E(Z(s_0))^2 - 2 \lambda'\E(Z(s_0) Z)+\lambda'\E(Z Z')\lambda \]

\[ = \Var(Z(s_0)) - 2 \lambda'\Cov(Z(s_0),Z) + \lambda'\Var(Z)\lambda = \sigma^2_0 - 2 \lambda'v + \lambda'V\lambda \]

Choose \(\lambda\) such that \(\frac{\partial \sigma^2(s_0)}{\partial\lambda} = -2 v' + 2\lambda'V = 0\)

\(\lambda' = v' V^{-1}\)

BLP/Simple kriging:

  1. \(\hat{Z}(s_0) = \mu + v'V^{-1} (Z-\mu)\)
  2. \(\sigma^2(s_0) = \sigma^2_0 - v'V^{-1}v\)
library(sp)
cov = function(h) exp(-h) # exponential covariance function (masks stats::cov)

sk = function(data, newdata, mu, cov) {
    library(sp) # for spDists
    V = cov(spDists(data))          # covariances between observation locations
    v = cov(spDists(data, newdata)) # covariances observations -- prediction locations
    mu + t(v) %*% solve(V, data[[1]] - mu) # simple kriging: mu + v' V^{-1} (Z - mu)
}

# prediction location at (0,1):
newdata = SpatialPoints(cbind(0,1))

# observation location at (1,1), with attribute value (y) 3:
pts = SpatialPoints(cbind(1,1))
data = SpatialPointsDataFrame(pts, data.frame(z = 3))

sk(data, newdata, 0, cov) # mu = 0
##       [,1]
## [1,] 1.104
newdata = SpatialPoints(cbind(.1 * 0:20, 1))
sk(data, newdata, 0, cov) # mu = 0
##        [,1]
##  [1,] 1.104
##  [2,] 1.220
##  [3,] 1.348
##  [4,] 1.490
##  [5,] 1.646
##  [6,] 1.820
##  [7,] 2.011
##  [8,] 2.222
##  [9,] 2.456
## [10,] 2.715
## [11,] 3.000
## [12,] 2.715
## [13,] 2.456
## [14,] 2.222
## [15,] 2.011
## [16,] 1.820
## [17,] 1.646
## [18,] 1.490
## [19,] 1.348
## [20,] 1.220
## [21,] 1.104

Plotting them:

newdata = SpatialPoints(cbind(seq(-4,6,by=.1), 1))
Z = sk(data, newdata, 0, cov) # mu = 0
plot(coordinates(newdata)[,1], Z, type='l', ylim = c(0,3))
points(as(data, "data.frame")[,c(2,1)], col='red', pch=16)
abline(0,0, col='blue', lty=2)

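The sk function returns only the predictions. Along the same lines, the simple kriging variance \(\sigma^2(s_0) = \sigma^2_0 - v'V^{-1}v\) can be computed; the sketch below introduces a hypothetical helper sk_var (not part of the code above), reusing the exponential covariance with \(\sigma^2_0 = cov(0) = 1\):

```r
library(sp)
cov = function(h) exp(-h) # exponential covariance function, as above

# simple kriging variance at each prediction location:
sk_var = function(data, newdata, cov) {
    V = cov(spDists(data))            # covariances between observations
    v = cov(spDists(data, newdata))   # covariances observations -- prediction points
    cov(0) - colSums(v * solve(V, v)) # sigma0^2 - v' V^{-1} v, per location
}

# same single observation at (1,1) as above, value 3:
pts = SpatialPoints(cbind(1, 1))
data = SpatialPointsDataFrame(pts, data.frame(z = 3))
newdata = SpatialPoints(cbind(c(0, 1), c(1, 1))) # one point away, one at the data
sk_var(data, newdata, cov) # larger away from the data; zero at the data point
```

As expected, the kriging variance is zero at an observation location and grows towards \(\sigma^2_0\) with distance from the data.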