Spatial correlation and prediction

\( \newcommand{\E}{{\rm E}} % E expectation operator \newcommand{\Var}{{\rm Var}} % Var variance operator \newcommand{\Cov}{{\rm Cov}} % Cov covariance operator \newcommand{\Cor}{{\rm Corr}} \)

What is spatial correlation?

Idea from time series: look at lagged correlations, and the \(h\)-scatterplot.

What is it? Plots of (or correlation between) \(Z(s)\) and \(Z(s+h)\), where \(s+h\) denotes the location \(s\) shifted by the lag \(h\) (a time lag, or a spatial distance).

Random variables: expectation, variance, covariance

Random variable: \(Z\) follows a probability distribution, specified by a probability density function \(f(z)\) or a distribution function \(F(z)=\Pr(Z \le z)\)

Expectation: \(\E(Z) = \int_{-\infty}^{\infty} u f(u)du\) – center of mass, mean.

Variance: \(\Var(Z)=\E(Z-\E(Z))^2\) – mean squared distance from mean; measure of spread; square root: standard deviation of \(Z\).

Covariance: \(\Cov(X,Y)=\E((X-\E(X))(Y-\E(Y)))\) – mean product; can be negative; \(\Cov(X,X)=\Var(X)\).

Correlation: \(r_{XY}=\frac{\Cov(X,Y)}{\sqrt{\Var(X)\Var(Y)}}\) – covariance normalized to \([-1,1]\); \(-1\) or \(+1\) means perfect (negative or positive) linear correlation.

Normal distribution

library(mvtnorm)
r = 0.0
Sigma = cbind(c(1, r), c(r, 1))
out = rmvnorm(500, c(5,5), Sigma)
x = out[,1]
y = out[,2]
plot(x, y)
model = lm(y~x)
summary(model)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.337 -0.655 -0.019  0.687  3.232 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.9027     0.2313   21.19   <2e-16 ***
## x             0.0206     0.0450    0.46     0.65    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.992 on 498 degrees of freedom
## Multiple R-squared:  0.00042,    Adjusted R-squared:  -0.00159 
## F-statistic: 0.209 on 1 and 498 DF,  p-value: 0.647
abline(model) # add the fitted regression line
mse = round(mean((x-y)^2), digits=3)
title(paste("correlation:", round(cor(x,y),digits=3), "MSE", mse))


r = .5
Sigma = cbind(c(1, r), c(r, 1))
out = rmvnorm(500, c(5,5), Sigma)
x = out[,1]
y = out[,2]
plot(x, y)
model = lm(y~x)
summary(model)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0262 -0.5605  0.0136  0.5552  2.5215 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.4964     0.1894    13.2   <2e-16 ***
## x             0.4999     0.0378    13.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.852 on 498 degrees of freedom
## Multiple R-squared:  0.26,   Adjusted R-squared:  0.259 
## F-statistic:  175 on 1 and 498 DF,  p-value: <2e-16
abline(model) # add the fitted regression line
mse = round(mean((x-y)^2), digits=3)
title(paste("correlation:", round(cor(x,y),digits=3), "MSE", mse))


r = .9
Sigma = cbind(c(1, r), c(r, 1))
out = rmvnorm(500, c(5,5), Sigma)
x = out[,1]
y = out[,2]
plot(x, y)
model = lm(y~x)
summary(model)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2730 -0.2809 -0.0206  0.2946  1.1040 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.4001     0.1057    3.79  0.00017 ***
## x             0.9199     0.0205   44.88  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.429 on 498 degrees of freedom
## Multiple R-squared:  0.802,  Adjusted R-squared:  0.801 
## F-statistic: 2.01e+03 on 1 and 498 DF,  p-value: <2e-16
abline(model) # add the fitted regression line
mse = round(mean((x-y)^2), digits=3)
title(paste("correlation:", round(cor(x,y),digits=3), "MSE", mse))


r = .95
Sigma = cbind(c(1, r), c(r, 1))
out = rmvnorm(500, c(5,5), Sigma)
x = out[,1]
y = x + 0.1 * rnorm(500) # overwrite y: x plus a little noise, nearly perfect correlation
plot(x, y)
model = lm(y~x)
summary(model)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30446 -0.07428  0.00012  0.07436  0.27446 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.04980    0.02331   -2.14    0.033 *  
## x            1.00954    0.00456  221.57   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1 on 498 degrees of freedom
## Multiple R-squared:  0.99,   Adjusted R-squared:  0.99 
## F-statistic: 4.91e+04 on 1 and 498 DF,  p-value: <2e-16
abline(model) # add the fitted regression line
mse = round(mean((x-y)^2), digits=3)
title(paste("correlation:", round(cor(x,y),digits=3), "MSE", mse))


How can correlation help prediction?

Problem:

Questions

Given observation \(z(s_1)\), how to predict \(z(s_0)\)?

Obviously, given only \(z(s_1)\), the best predictor for \(z(s_0)\) is \(\hat{z}(s_0)=z(s_1)\).

But what is the error variance, i.e. \(\Var(\hat{z}(s_0)-z(s_0))\)?

Estimation error

Let both \(z(s_1)\) and \(z(s_0)\) come from a field that has variance 1, i.e. \(\Var(z(s_0)) = \Var(z(s_1))=1\), and that has a constant mean: \(\E(z(s_0)) = \E(z(s_1))=m\)

Then, \[\Var(\hat{z}(s_0)-z(s_0)) = \Var(z(s_1)-z(s_0))\]

As both have the same mean, this can be written as \[\E(\hat{z}(s_0)-z(s_0))^2 = \Var(z(s_1)) + \Var(z(s_0)) - 2\Cov(z(s_1),z(s_0))\]

As both have variance 1, this equals \(2(1-r)\) with \(r\) the correlation between \(z(s_0)\) and \(z(s_1)\). Examples follow.
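The relation \(\Var(z(s_1)-z(s_0)) = 2(1-r)\) can be checked by simulation. The sketch below (reusing the mvtnorm package from the examples above, with an arbitrary \(r=0.7\)) compares the empirical error variance with the theoretical value:

```r
# check Var(z(s1) - z(s0)) = 2 (1 - r) by simulation:
library(mvtnorm)
set.seed(1)
r = 0.7
Sigma = cbind(c(1, r), c(r, 1))       # unit variances, correlation r
out = rmvnorm(100000, c(0, 0), Sigma) # columns: z(s1), z(s0)
var(out[,1] - out[,2])                # close to 2 * (1 - 0.7) = 0.6
```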

Suppose we know the mean

If we know the mean \(\mu\), it may be a good idea to use a compromise between the observation and the mean, e.g. \[\hat{z}(s_0) = (1-r) \mu + r z(s_1)\]
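A quick simulation (a sketch, with \(r\) and \(\mu\) chosen for illustration) shows that this compromise indeed has a smaller mean squared error than using the observation alone:

```r
library(mvtnorm)
set.seed(1)
r = 0.5; mu = 5
Sigma = cbind(c(1, r), c(r, 1))          # unit variances, correlation r
out = rmvnorm(100000, c(mu, mu), Sigma)
z1 = out[,1]; z0 = out[,2]
mean((z1 - z0)^2)                        # naive predictor z1: MSE ~ 2(1-r) = 1
mean(((1 - r) * mu + r * z1 - z0)^2)     # compromise: MSE ~ 1 - r^2 = 0.75
```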

Next problems…

What is Geostatistical Interpolation?

Geostatistical interpolation (kriging) uses linear predictors \[\hat{z}(s_0) = \sum_{i=1}^n \lambda_i z(s_i)\] with the weights \(\lambda_i\) chosen such that the variance of the prediction error is minimized.

All that is needed is variances and correlations.

Random variables

Random variables (RVs) are numeric variables whose outcomes are subject to chance.

The cumulative distribution function \(F_Z(\cdot)\) gives, for every outcome \(z\), the probability that the RV \(Z\) does not exceed it: \[P(Z \le z) = F_Z(z) = \int_{-\infty}^z f_Z(u)du\] where \(f_Z(\cdot)\) is the probability density function of \(Z\). The total probability is 1: \(\int_{-\infty}^{\infty} f_Z(u)du = 1\).

Random variables have an expectation (mean): \(E(Z) = \int_{-\infty}^{\infty} u f_Z(u) du\) and a variance: \(\Var(Z) = E[(Z-E(Z))^2]\).

Try to think of \(E(Z)\) as \(\frac{1}{n}\sum_{i=1}^{n} z_i\), with \(n \rightarrow \infty\).
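These definitions can be verified numerically for the normal distribution, using base R (a sketch; the particular numbers are arbitrary):

```r
# the CDF is the integral of the density (standard normal example):
integrate(dnorm, -Inf, 1.96)$value  # matches pnorm(1.96), about 0.975
pnorm(1.96)

# expectation: integral definition vs. sample-mean approximation
integrate(function(u) u * dnorm(u, mean = 2), -Inf, Inf)$value  # exactly 2
set.seed(1)
mean(rnorm(100000, mean = 2))                                   # close to 2
```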

Two random variables \(X\) and \(Y\) have covariance defined as \(\Cov(X,Y) = E[(X-E(X))(Y-E(Y))]\)

Correlation and covariance

Correlation is scaled covariance, scaled by the product of the standard deviations. For two variables \(X\) and \(Y\), it is \[\Cor(X,Y) = \frac{\Cov(X,Y)}{\sqrt{\Var(X)\Var(Y)}}\]

It is quite easy to show that \(|\Cov(X,Y)| \le \sqrt{\Var(X)\Var(Y)}\), so correlation ranges from -1 to 1. For this, note that \(\Cov(X,X)=\Var(X)\) and \(\Cov(X,-X)=-\Var(X)\).

It is perhaps easier to think of covariance as unscaled correlation.

Note: a large covariance does not imply a strong correlation; it may simply reflect large variances.
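For example, rescaling one variable inflates the covariance by the same factor but leaves the correlation unchanged (an illustrative sketch):

```r
set.seed(1)
x = rnorm(1000)
y = x + rnorm(1000, sd = 2)          # y only weakly related to x
c(cov(x, y), cor(x, y))              # modest covariance, correlation ~ 0.45
c(cov(100 * x, y), cor(100 * x, y))  # covariance 100x larger, correlation unchanged
```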

The quadratic form

We will not consider single random variables, but rather large collections of them. In fact, we will consider each observation \(z(s_i)\) as a realisation (outcome) of a random variable \(Z(s_i)\), and consider the \(Z\) variable at all other locations also as separate random variables, say \(Z(s_0)\) for any \(s_0\) in the domain of interest.

Let \(Z = [Z(s_1)\ Z(s_2)\ \ldots\ Z(s_n)]'\); then \(\Var(Z)=V\) is the covariance matrix of the vector \(Z\), with \(i,j\)-th element \(\Cov(Z(s_i),Z(s_j))\), implying it has variances on the diagonal.

Then, it is easy to show that for non-random weights \(\lambda = [\lambda_1 \ldots \lambda_n]'\) the quadratic form \(\lambda'Z = \sum_{i=1}^n \lambda_i Z(s_i)\) has variance \[ \Var(\lambda'Z) = \lambda' \Var(Z) \lambda = \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j \Cov(Z(s_i),Z(s_j)) = \lambda'V\lambda\]
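The identity \(\Var(\lambda'Z)=\lambda'V\lambda\) is easy to verify by simulation (a sketch with an arbitrary valid covariance matrix \(V\) and weight vector \(\lambda\)):

```r
library(mvtnorm)
set.seed(1)
V = cbind(c(1, .5, .2), c(.5, 1, .5), c(.2, .5, 1)) # a positive definite covariance matrix
lambda = c(0.2, 0.3, 0.5)
Z = rmvnorm(100000, sigma = V)  # rows are realisations of (Z(s_1), Z(s_2), Z(s_3))
var(as.vector(Z %*% lambda))    # empirical variance of lambda'Z
t(lambda) %*% V %*% lambda      # theoretical lambda' V lambda = 0.63
```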

Why do we need this?

When we predict (interpolate), we're forming linear combinations \(\sum_{i=1}^n \lambda_i Z(s_i)\), and want to know the variance of \(\sum_{i=1}^n \lambda_i Z(s_i) - Z(s_0)\), the interpolation error variance. Only then can we find the weights for which it is minimal.

What is the scalar \(\Var(\sum_{i=1}^n \lambda_i Z(s_i)-Z(s_0))\)? Write as

\[\Var(\lambda'Z - Z(s_0)) = \Var(\lambda'Z) + \Var(Z(s_0)) - 2\Cov(\lambda'Z,Z(s_0))\] \[=\lambda'V\lambda + \sigma_0^2 - 2\sum_{i=1}^n \lambda_i \Cov(Z(s_i),Z(s_0)) \] with \(\sigma_0^2 = \Var(Z(s_0))\)

So, we need variances of all \(Z(s_i)\), including for all \(s_0\), and all covariances between pairs \(Z(s_i)\) and \(Z(s_j)\), including all \(s_0\).

Suppose we know all that

Kriging: find weights \(\lambda\) such that \(\Var(Z(s_0)-\hat{Z}(s_0))= \Var(Z(s_0)-\sum_{i=1}^n\lambda_i Z(s_i))\) is minimized, and we have the best (minimum variance) linear predictor.

Best linear prediction weights: Let \(V=\Var(Z)\ \ (n\times n)\) and \(v=\Cov(Z(s_0),Z)\ \ (n\times 1)\), and scalar \(\Var(Z(s_0)) = \sigma^2_0\).

Expected squared prediction error \(\E(Z(s_0)-\hat{Z}(s_0))^2 = \sigma^2(s_0)\)

Replace \(Z\) with \(Z-\mu\) (or assume \(\mu=0\))

\[\sigma^2(s_0) = \E(Z(s_0)-\lambda'Z)^2 = \E(Z(s_0))^2 - 2 \lambda'\E(Z(s_0) Z)+\lambda'\E(Z Z')\lambda \]

\[ = \Var(Z(s_0)) - 2 \lambda'\Cov(Z(s_0),Z) + \lambda'\Var(Z)\lambda = \sigma^2_0 - 2 \lambda'v + \lambda'V\lambda \]

Choose \(\lambda\) such that \(\frac{\partial \sigma^2(s_0)}{\partial\lambda} = -2 v' + 2\lambda'V = 0\)

\(\lambda' = v' V^{-1}\)

BLP/Simple kriging:

  1. \(\hat{Z}(s_0) = \mu + v'V^{-1} (Z-\mu)\)
  2. \(\sigma^2(s_0) = \sigma^2_0 - v'V^{-1}v\)
library(sp)
cov = function(h) exp(-h) # exponential covariance function (masks stats::cov)

sk = function(data, newdata, mu, cov) {
    library(sp) # for spDists
    V = cov(spDists(data))          # covariances between observation locations
    v = cov(spDists(data, newdata)) # covariances observations -- prediction locations
    mu + t(v) %*% solve(V, data[[1]] - mu) # simple kriging: mu + v' V^{-1} (Z - mu)
}

# prediction location at (0,1):
newdata = SpatialPoints(cbind(0,1))

# observation location at (1,1), with attribute value (y) 3:
pts = SpatialPoints(cbind(1,1))
data = SpatialPointsDataFrame(pts, data.frame(z = 3))

sk(data, newdata, 0, cov) # mu = 0
##       [,1]
## [1,] 1.104
newdata = SpatialPoints(cbind(.1 * 0:20, 1))
sk(data, newdata, 0, cov) # mu = 0
##        [,1]
##  [1,] 1.104
##  [2,] 1.220
##  [3,] 1.348
##  [4,] 1.490
##  [5,] 1.646
##  [6,] 1.820
##  [7,] 2.011
##  [8,] 2.222
##  [9,] 2.456
## [10,] 2.715
## [11,] 3.000
## [12,] 2.715
## [13,] 2.456
## [14,] 2.222
## [15,] 2.011
## [16,] 1.820
## [17,] 1.646
## [18,] 1.490
## [19,] 1.348
## [20,] 1.220
## [21,] 1.104

Plotting them:

newdata = SpatialPoints(cbind(seq(-4,6,by=.1), 1))
Z = sk(data, newdata, 0, cov) # mu = 0
plot(coordinates(newdata)[,1], Z, type='l', ylim = c(0,3))
points(as(data, "data.frame")[,c(2,1)], col='red', pch=16)
abline(0,0, col='blue', lty=2)

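The sk function returns only the predictions. Along the same lines, the simple kriging variance \(\sigma^2(s_0) = \sigma^2_0 - v'V^{-1}v\) can be computed; the sketch below introduces a hypothetical helper sk_var (not part of the code above), reusing the exponential covariance with \(\sigma^2_0 = cov(0) = 1\):

```r
library(sp)
cov = function(h) exp(-h) # exponential covariance function, as above

# simple kriging variance at each prediction location:
sk_var = function(data, newdata, cov) {
    V = cov(spDists(data))            # covariances between observations
    v = cov(spDists(data, newdata))   # covariances observations -- prediction points
    cov(0) - colSums(v * solve(V, v)) # sigma0^2 - v' V^{-1} v, per location
}

# same single observation at (1,1) as above, value 3:
pts = SpatialPoints(cbind(1, 1))
data = SpatialPointsDataFrame(pts, data.frame(z = 3))
newdata = SpatialPoints(cbind(c(0, 1), c(1, 1))) # one point away, one at the data
sk_var(data, newdata, cov) # larger away from the data; zero at the data point
```

As expected, the kriging variance is zero at an observation location and grows towards \(\sigma^2_0\) with distance from the data.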