\( \newcommand{\E}{{\rm E}} % E expectation operator \newcommand{\Var}{{\rm Var}} % Var variance operator \newcommand{\Cov}{{\rm Cov}} % Cov covariance operator \newcommand{\Cor}{{\rm Corr}} \)
Idea from time series: look at lagged correlations, and the \(h\)-scatterplot.
What is it? A plot of (or the correlation between) \(Z(s)\) and \(Z(s+h)\), where \(s+h\) is location \(s\) shifted by a lag \(h\) (a separation in time or in space).
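As a minimal sketch, an \(h\)-scatterplot for a time series, using the LakeHuron series shipped with R; the lag \(h = 1\) is an arbitrary choice:
z = as.numeric(LakeHuron)
h = 1
n = length(z)
plot(z[1:(n-h)], z[(1+h):n], xlab = "z(s)", ylab = "z(s+h)")
cor(z[1:(n-h)], z[(1+h):n]) # the lag-h correlation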
Random variable: \(Z\) follows a probability distribution, specified by a probability density function \(f(z)\) (for a discrete \(Z\): \(f(z)= \Pr(Z=z)\)) or by a distribution function \(F(z)=\Pr(Z \le z)\).
Expectation: \(\E(Z) = \int_{-\infty}^{\infty} u\,f(u)\,du\) – center of mass, mean.
Variance: \(\Var(Z)=\E(Z-\E(Z))^2\) – mean squared distance from mean; measure of spread; square root: standard deviation of \(Z\).
Covariance: \(\Cov(X,Y)=\E((X-\E(X))(Y-\E(Y)))\) – mean product of deviations from the means; can be negative; \(\Cov(X,X)=\Var(X)\).
Correlation: \(r_{XY}=\frac{\Cov(X,Y)}{\sqrt{\Var(X)\Var(Y)}}\) – covariance normalized to \([-1,1]\); \(-1\) or \(+1\) indicates perfect (negative or positive) linear correlation.
library(mvtnorm)
r = 0.0
Sigma = cbind(c(1, r), c(r, 1))
out = rmvnorm(500, c(5,5), Sigma)
x = out[,1]
y = out[,2]
plot(x, y)
model = lm(y~x)
summary(model)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.337 -0.655 -0.019 0.687 3.232
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.9027 0.2313 21.19 <2e-16 ***
## x 0.0206 0.0450 0.46 0.65
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.992 on 498 degrees of freedom
## Multiple R-squared: 0.00042, Adjusted R-squared: -0.00159
## F-statistic: 0.209 on 1 and 498 DF, p-value: 0.647
abline(model) # add the fitted regression line to the scatterplot
mse = round(mean((x-y)^2), digits=3)
title(paste("correlation:", round(cor(x,y),digits=3), "MSE", mse))
r = .5
Sigma = cbind(c(1, r), c(r, 1))
out = rmvnorm(500, c(5,5), Sigma)
x = out[,1]
y = out[,2]
plot(x, y)
model = lm(y~x)
summary(model)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0262 -0.5605 0.0136 0.5552 2.5215
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.4964 0.1894 13.2 <2e-16 ***
## x 0.4999 0.0378 13.2 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.852 on 498 degrees of freedom
## Multiple R-squared: 0.26, Adjusted R-squared: 0.259
## F-statistic: 175 on 1 and 498 DF, p-value: <2e-16
abline(model) # add the fitted regression line to the scatterplot
mse = round(mean((x-y)^2), digits=3)
title(paste("correlation:", round(cor(x,y),digits=3), "MSE", mse))
r = .9
Sigma = cbind(c(1, r), c(r, 1))
out = rmvnorm(500, c(5,5), Sigma)
x = out[,1]
y = out[,2]
plot(x, y)
model = lm(y~x)
summary(model)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2730 -0.2809 -0.0206 0.2946 1.1040
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.4001 0.1057 3.79 0.00017 ***
## x 0.9199 0.0205 44.88 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.429 on 498 degrees of freedom
## Multiple R-squared: 0.802, Adjusted R-squared: 0.801
## F-statistic: 2.01e+03 on 1 and 498 DF, p-value: <2e-16
abline(model) # add the fitted regression line to the scatterplot
mse = round(mean((x-y)^2), digits=3)
title(paste("correlation:", round(cor(x,y),digits=3), "MSE", mse))
r = .95
Sigma = cbind(c(1, r), c(r, 1))
out = rmvnorm(500, c(5,5), Sigma)
x = out[,1]
y = out[,2]
y = x + 0.1 * rnorm(500) # overwrite y: a nearly perfect linear relation with small noise
plot(x, y)
model = lm(y~x)
summary(model)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.30446 -0.07428 0.00012 0.07436 0.27446
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.04980 0.02331 -2.14 0.033 *
## x 1.00954 0.00456 221.57 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1 on 498 degrees of freedom
## Multiple R-squared: 0.99, Adjusted R-squared: 0.99
## F-statistic: 4.91e+04 on 1 and 498 DF, p-value: <2e-16
abline(model) # add the fitted regression line to the scatterplot
mse = round(mean((x-y)^2), digits=3)
title(paste("correlation:", round(cor(x,y),digits=3), "MSE", mse))
Problem:
Given observation \(z(s_1)\), how to predict \(z(s_0)\)?
Obviously, given only \(z(s_1)\), the best predictor for \(z(s_0)\) is \(\hat{z}(s_0)=z(s_1)\).
But what is the error variance, i.e. \(\Var(\hat{z}(s_0)-z(s_0))\)?
Let both \(z(s_1)\) and \(z(s_0)\) come from a field that has variance 1, i.e. \(\Var(z(s_0)) = \Var(z(s_1))=1\), and that has a constant mean: \(\E(z(s_0)) = \E(z(s_1))=\mu\).
Then, \[\Var(\hat{z}(s_0)-z(s_0)) = \Var(z(s_1)-z(s_0))\]
As both have the same mean, the difference \(\hat{z}(s_0)-z(s_0)\) has mean zero, so this can be written as \[\E(\hat{z}(s_0)-z(s_0))^2 = \Var(z(s_1)) + \Var(z(s_0)) - 2\Cov(z(s_1),z(s_0))\]
As both have variance 1, this equals \(2(1-r)\), with \(r\) the correlation between \(z(s_0)\) and \(z(s_1)\).
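As a quick numerical check, the error variance \(2(1-r)\) for the correlations used in the simulations above, which the MSE values printed in the plot titles approximate:
r = c(0, 0.5, 0.9, 0.95)
2 * (1 - r)
## [1] 2.0 1.0 0.2 0.1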
If we know the mean \(\mu\), it may be a good idea to use a compromise between the observation and the mean, e.g. \[\hat{z}(s_0) = (1-r) \mu + r z(s_1)\]
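A one-line sketch of this compromise, with hypothetical values \(\mu = 5\), \(z(s_1) = 7\) and \(r = 0.5\):
mu = 5; z1 = 7; r = 0.5
(1 - r) * mu + r * z1 # half-way between the mean and the observation
## [1] 6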
Geostatistical interpolation (kriging) uses linear predictors \[\hat{z}(s_0) = \sum_{i=1}^n \lambda_i z(s_i)\] with weights chosen such that the expected squared prediction error is minimized.
All that is needed is variances and correlations.
Random variables (RVs) are numeric variables whose outcomes are subject to chance.
The cumulative distribution function \(F_Z(\cdot)\) of the RV \(Z\) gives the probability that \(Z\) does not exceed a value \(z\): \[P(Z \le z) = F_Z(z) = \int_{-\infty}^z f_Z(u)du\] where \(f_Z(\cdot)\) is the probability density function of \(Z\). The total probability is 1: \(\int_{-\infty}^{\infty} f_Z(u)du = 1\).
Random variables have an expectation (mean): \(E(Z) = \int_{-\infty}^{\infty} u f_Z(u) du\) and a variance: \(\Var(Z) = E[(Z-E(Z))^2]\).
Try to think of \(E(Z)\) as the limit of the sample mean \(\frac{1}{n}\sum_{i=1}^{n} z_i\) as \(n \rightarrow \infty\).
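A Monte Carlo sketch of this limit, for a hypothetical normal RV with mean 5:
mean(rnorm(100, mean = 5)) # rough
mean(rnorm(1e6, mean = 5)) # much closer to E(Z) = 5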
Two random variables \(X\) and \(Y\) have covariance defined as \(\Cov(X,Y) = E[(X-E(X))(Y-E(Y))]\)
Correlation is scaled covariance, scaled by the variances. For two variables \(X\) and \(Y\), it is \[\Cor(X,Y) = \frac{\Cov(X,Y)}{\sqrt{\Var(X)\Var(Y)}}\]
It is quite easy to show (it is the Cauchy–Schwarz inequality) that \(|\Cov(X,Y)| \le \sqrt{\Var(X)\Var(Y)}\), so correlation ranges from -1 to 1. The bounds are attained: note that \(\Cov(X,X)=\Var(X)\) and \(\Cov(X,-X)=-\Var(X)\).
It is perhaps easier to think of covariance as unscaled correlation.
Note: a large covariance does not imply a strong correlation; it may simply reflect large variances of \(X\) and \(Y\).
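A quick illustration: rescaling both variables by 100 inflates the covariance by a factor \(100^2\), but leaves the correlation unchanged:
x = rnorm(1000)
y = x + rnorm(1000)
cov(100 * x, 100 * y) # large covariance
cor(100 * x, 100 * y) # identical to cor(x, y)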
We will not consider single random variables, but rather large collections of them. In fact, we will consider each observation \(z(s_i)\) as a realisation (outcome) of a random variable \(Z(s_i)\), and consider the \(Z\) variable at all other locations also as separate random variables, say \(Z(s_0)\) for any \(s_0\) in the domain of interest.
Let \(Z = [Z(s_1)\ Z(s_2)\ \ldots\ Z(s_n)]'\); then \(\Var(Z)=V\) is the covariance matrix of the vector \(Z\), with \(i,j\)-th element \(\Cov(Z(s_i),Z(s_j))\), implying it has variances on the diagonal.
Then, it is easy to show that for non-random weights \(\lambda = [\lambda_1 \ldots \lambda_n]'\) the linear combination \(\lambda'Z = \sum_{i=1}^n \lambda_i Z(s_i)\) has variance \[ \Var(\lambda'Z) = \lambda' \Var(Z) \lambda = \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j \Cov(Z(s_i),Z(s_j)) = \lambda'V\lambda,\] a quadratic form in \(\lambda\).
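A small numerical check of this identity, for a hypothetical \(2 \times 2\) covariance matrix and equal weights:
V = matrix(c(1, .5, .5, 1), 2, 2) # unit variances, covariance 0.5
lambda = c(.5, .5)
t(lambda) %*% V %*% lambda # variance of the mean of two correlated RVs: 0.75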
When we predict (interpolate), we're forming linear combinations, \(\sum_{i=1}^n \lambda_i Z(s_i)\), and want to know the variance of \(\sum_{i=1}^n \lambda_i Z(s_i) - Z(s_0)\), the interpolation error variance. Only then can we find weights such that it is minimum.
What is the scalar \(\Var(\sum_{i=1}^n \lambda_i Z(s_i)-Z(s_0))\)? Write it as
\[\Var(\lambda'Z - Z(s_0)) = \Var(\lambda'Z) + \Var(Z(s_0)) - 2\Cov(\lambda'Z,Z(s_0))\] \[=\lambda'V\lambda + \sigma_0^2 - 2\sum_{i=1}^n \lambda_i \Cov(Z(s_i),Z(s_0)) \] with \(\sigma_0^2 = \Var(Z(s_0))\)
So, we need the variances of all \(Z(s_i)\) and of \(Z(s_0)\) for every prediction location \(s_0\), and all covariances between pairs \(Z(s_i)\) and \(Z(s_j)\), including the pairs involving \(Z(s_0)\).
Kriging: find weights \(\lambda\) such that \(\Var(Z(s_0)-\hat{Z}(s_0))= \Var(Z(s_0)-\sum_{i=1}^n\lambda_i Z(s_i))\) is minimized, and we have the best (minimum variance) linear predictor.
Best linear prediction weights: Let \(V=\Var(Z)\ \ (n\times n)\) and \(v=\Cov(Z(s_0),Z)\ \ (n\times 1)\), and scalar \(\Var(Z(s_0)) = \sigma^2_0\).
Expected squared prediction error \(\E(Z(s_0)-\hat{Z}(s_0))^2 = \sigma^2(s_0)\)
Replace \(Z\) with \(Z-\mu\) (or assume \(\mu=0\))
\[\sigma^2(s_0) = \E(Z(s_0)-\lambda' Z)^2 = \E(Z(s_0))^2 - 2 \lambda'\E(Z(s_0) Z)+\lambda'\E(Z Z')\lambda \]
\[ = \Var(Z(s_0)) - 2 \lambda'\Cov(Z(s_0),Z) + \lambda'\Var(Z)\lambda = \sigma^2_0 - 2 \lambda'v + \lambda'V\lambda \]
Choose \(\lambda\) such that \(\frac{\partial \sigma^2(s_0)}{\partial\lambda} = -2 v' + 2\lambda'V = 0\)
\(\lambda' = v' V^{-1}\); substituting this back gives the minimized prediction error variance \(\sigma^2(s_0) = \sigma^2_0 - v'V^{-1}v\).
BLP/Simple kriging:
library(sp)
cov = function(h) exp(-h) # exponential covariance as a function of distance h (masks stats::cov)
sk = function(data, newdata, mu, cov) {
	library(sp) # for spDists
	V = cov(spDists(data))          # covariances between observation locations
	v = cov(spDists(data, newdata)) # covariances between observations and prediction locations
	mu + t(v) %*% solve(V, data[[1]] - mu) # BLP: mu + v'V^{-1}(z - mu)
}
# prediction location at (0,1):
newdata = SpatialPoints(cbind(0,1))
# observation location at (1,1), with attribute value (y) 3:
pts = SpatialPoints(cbind(1,1))
data = SpatialPointsDataFrame(pts, data.frame(z = 3))
sk(data, newdata, 0, cov) # mu = 0
## [,1]
## [1,] 1.104
newdata = SpatialPoints(cbind(.1 * 0:20, 1)) # 21 prediction locations from (0,1) to (2,1)
sk(data, newdata, 0, cov) # mu = 0
## [,1]
## [1,] 1.104
## [2,] 1.220
## [3,] 1.348
## [4,] 1.490
## [5,] 1.646
## [6,] 1.820
## [7,] 2.011
## [8,] 2.222
## [9,] 2.456
## [10,] 2.715
## [11,] 3.000
## [12,] 2.715
## [13,] 2.456
## [14,] 2.222
## [15,] 2.011
## [16,] 1.820
## [17,] 1.646
## [18,] 1.490
## [19,] 1.348
## [20,] 1.220
## [21,] 1.104
Plotting them:
newdata = SpatialPoints(cbind(seq(-4,6,by=.1), 1)) # longer transect along y = 1
Z = sk(data, newdata, 0, cov) # mu = 0
plot(coordinates(newdata)[,1], Z, type='l', ylim = c(0,3)) # predictions along the transect
points(as(data, "data.frame")[,c(2,1)], col='red', pch=16) # the observation, as (x-coordinate, value)
abline(0,0, col='blue', lty=2) # the mean mu = 0, to which predictions revert far from the observation
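The sk function returns only the predictor. A minimal companion sketch (a hypothetical helper skvar, not part of the code above) gives the corresponding simple kriging variance \(\sigma^2(s_0) = \sigma^2_0 - v'V^{-1}v\) along the same transect, where \(\sigma^2_0\) equals cov(0) = exp(0) = 1 for this unit-variance field:
skvar = function(data, newdata, cov) {
	V = cov(spDists(data))          # covariances between observation locations
	v = cov(spDists(data, newdata)) # covariances between observations and prediction locations
	cov(0) - diag(t(v) %*% solve(V, v)) # sigma_0^2 - v'V^{-1}v
}
plot(coordinates(newdata)[,1], skvar(data, newdata, cov), type='l')
The variance drops to zero at the observation location and approaches the field variance 1 far from it.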