Spatial-Statistics-Course/14-Point-Pattern-Analysis-IV.Rmd at master · paezha/Spatial-Statistics-Course · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
---
title: "14 Point Pattern Analysis IV"
output: html_notebook
---

# Point Pattern Analysis IV

*NOTE*: You can download the source files for this book from [here](https://github.com/paezha/Spatial-Statistics-Course). The source files are in the format of R Notebooks. Notebooks are pretty neat, because the allow you execute code within the notebook, so that you can work interactively with the notes.

In the last practice/session your learning objectives included:

1. Learning about clustered and dispersed (or regular) patterns.
2. Learning the concept of nearest neighbors.
3. Learning about distance-based methods for point pattern analysis.
4. Learning about the $G$-function for the analysis of event-to-event nearest neighbor distances.

If you wish to work interactively with this chapter you will need the following:

* An R markdown notebook version of this document (the source file).

* A package called `geog4ga3`.

## Learning Objectives

In this chapter, you will:

1. Learn about the $F$- or empty space function.
2. Consider the issue of patterns at multiple scales.
3. Learn about the $K$-function.
4. Apply both of these techniques using a simple example.

## Suggested Readings

- Bailey TC and Gatrell AC [-@Bailey1995] Interactive Spatial Data Analysis, Chapter 3. Longman: Essex.
- Baddeley A, Rubak E, Turner R [-@Baddeley2015] Spatial Point Pattern: Methodology and Applications with R, Chapters 7 - 8. CRC: Boca Raton.
- Bivand RS, Pebesma E, Gomez-Rubio V [-@Bivand2008] Applied Spatial Data Analysis with R, Chapter 7. Springer: New York.
- Brunsdon C and Comber L [-@Brunsdon2015R] An Introduction to R for Spatial Analysis and Mapping, Chapter 6, 6.1 - 6.6. Sage: Los Angeles.
- O'Sullivan D and Unwin D [-@Osullivan2010] Geographic Information Analysis, 2nd Edition, Chapter 5. John Wiley & Sons: New Jersey.

## Preliminaries

As usual, it is good practice to clear the working space to make sure that you do not have extraneous items there when you begin your work. The command in `R` to clear the workspace is `rm` (for "remove"), followed by a list of items to be removed. To clear the workspace from _all_ objects, do the following:
```{r}
rm(list = ls())
```

Note that `ls()` lists all objects currently on the workspace.

Load the libraries you will use in this activity:
```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(spatstat)
library(geog4ga3)
```

Load the datasets that you will use for this practice:
```{r}
data("pp1_df")
data("pp2_df")
data("pp3_df")
data("pp4_df")
data("pp5_df")
```

These five dataframes include the coordinates of events set in the space of a unit square. To convert these dataframes into `ppp` objects we first define a window:
```{r}
# We use "owin" to define a window of coordinates which is in the five dataframes.
W <- owin(c(0, 1), c(0, 1))
```

And then use the function `as.ppp` to convert into `ppp`:
```{r}
# `as.ppp()` is a function that we use to convert dataframes into ppp objects
pp1.ppp <- as.ppp(pp1_df, W = W)
pp2.ppp <- as.ppp(pp2_df, W = W)
pp3.ppp <- as.ppp(pp3_df, W = W)
pp4.ppp <- as.ppp(pp4_df, W = W)
pp5.ppp <- as.ppp(pp5_df, W = W)
```

## Motivation

Distance-based approaches like the $\hat{G}$-function provide a useful complement to density-based approached. They can be implemented in more ways than we have seen so far.

In this practice, you will learn about two more tools for conducting distance-based analysis, the $\hat{F}$-function and the $\hat{K}$-function.

## F-function

The $\hat{G}$-function was defined as the cumulative distribution of the distances from _events_ to their nearest neighboring _event_. The $\hat{F}$-function is based on the same premise, but instead of using event-to-event distances, it uses point-to-event distances.

Recall that a point is an arbitrary location on a map that is not necessarily the location of an event. It may well be (and typically is) _empty space_. For this reason, the $\hat{F}$-function is sometimes called the _empty space function_: when there is more empty space in a region, the distance from a point to the nearest neighboring event is typically longer.

More formally, this function is defined as follows, with $d_{ik}$ as the distance from the _point_ at $i$ (not necessarily an event!) to its nearest neighboring event at location $k$:
$$
\hat{F}(x)=\frac{(d_{ik}\le x, \forall i)}{n}
$$

Again, we use the hat notation to indicate that the function is estimated from the data.

The theoretical distribution of this function is known (based on a null landscape generated by a spatially random Poisson process: remember that a Poisson process is a type of random process that consists of points randomly located on a landscape). It is as follows:
$$
F_{pois}(x) = 1 - exp(-\lambda \pi x^2).
$$

Notice that the distribution is in fact identical to that for $G$. This makes sense: if the distribution of events is spatially random, the distribution of empty space in the region must be random as well!

The interpretation of $\hat{F}(x)$ is the opposite of $\hat{G}(x)$: when the empirical $\hat{F}(x)$ is greater than the theoretical function, this suggests that _empty spaces are closer to events_ than expected, compared to the null landscape, as in a dispersed pattern. On the contrary, when the empirical function is less than the theoretical function, this would suggest a clustered pattern, since the events tend to be far away from the points used to calculate the function.

The $\hat{F}$-function can be implemented in at least two ways: (1) by using a fine grid to measure the distance to events; or (2) by measuring the distance to events from randomly drawn coordinates. The implementation in `spatstat` is the first one, which results in a pixel-based image of empty space.

We can illustrate this function with the point pattern `pp1.ppp`. First, we verify that `pp1.ppp` is already a `ppp` object:
```{r}
class(pp1.ppp)
```

Begin by plotting the pattern:
```{r}
plot(pp1.ppp)
```

An empty space map is obtained by means of the `distmap()` function:
```{r}
# The "distmap()" function computes the distance map of point pattern X and returns the distance map as a pixel image

empty_space_map1 <- distmap(pp1.ppp)
```

The plot of this is:
```{r}
plot(empty_space_map1)
```

Similar to the Stienen diagrams that you used previously, this map shows the distance from any location on the map to the nearest event: the smaller the value, the closer the point is to an event. It is evident in this pixel image that the values are mostly smaller, illustrating that points are closer to events.

Compare the map above to `pp2.ppp`:
```{r}
empty_space_map2 <- distmap(pp2.ppp)
plot(empty_space_map2)
```

In the second point pattern, there is more open space in the region. This is also apparent from the symbols map:
```{r}
plot(pp2.ppp)
```

The $\hat{F}$-function is implemented in `spatstat` as `Fest()` (for F-estimated), and it requires a `ppp` object as an input. Another possible input is whether a correction is to be used. This refers to boundary corrections. Since we have not yet discussed them, select "none":
```{r}
# The "Fest()" function computes an estimate of the empty space function, and it also called the "point to nearest event" distribution. This function estimates the nearest neighbors of a point (in this example, for pp1)
f_pattern1 <- Fest(pp1.ppp, correction = "none")
```

This function can be plotted as follows:
```{r}
plot(f_pattern1)
```

The black line is the empirical function, and we see that it is in general very similar to the theoretical function that corresponds to a null landscape. Compare to the second pattern:
```{r}
f_pattern2 <- Fest(pp2.ppp, correction = "none")
plot(f_pattern2)
lines(x = c(0, 0.097), y = c(0.4, 0.4), col = "blue", lty = "dotted")
lines(x = c(0.045, 0.045), y = c(0.0, 0.4), col = "blue", lty = "dotted")
lines(x = c(0.097, 0.097), y = c(0.0, 0.4), col = "blue", lty = "dotted")
```

In the empirical (black) pattern, points on a grid tend to be more distant from events than what you would expect from the null landscape. For example, whereas under the theoretical function 40% of points have a nearest event that is at a distance of approximately 0.045 or less, under the empirical function, the events are generally more distant from the points, and for the same value of F (0.4 or 40%) the distance is closer to 0.1. See:
```{r}
# Repeat the plot of the F-function of `pp2.ppp` and use the function `lines()` to add lines to compare the distances for a given value of F, say 0.4 (or 40%)
plot(f_pattern2)
lines(x = c(0, 0.097), y = c(0.4, 0.4), col = "blue", lty = "dotted")
lines(x = c(0.045, 0.045), y = c(0.0, 0.4), col = "blue", lty = "dotted")
lines(x = c(0.097, 0.097), y = c(0.0, 0.4), col = "blue", lty = "dotted")
```

This suggests that the points are clustered. Try plotting the $\hat{G}$-functions for the patterns in this example, and compare.

## $\hat{K}$-function

A limitation of the two techniques that we have seen so far is that they deal with a single scale: the distance to the first nearest neighbor (or, more generally, to the $k$-th nearest neighbor; these functions can be used for the 2nd, 3rd, and so on nearest neighbor!). Their single scale nature means that these functions can easily miss patterns when they are only evident at different scales.

Consider for instance the following point pattern:
```{r}
plot(pp3.ppp)
```

The events above initially appear to be clustered. However, at a different scale, a second pattern becomes evident. In fact, what we observe is a _regular_ distribution of _clusters_. At a smaller scale, a single cluster may actually be a random distribution of events. In contrast, the following pattern appears to be a random distribution of regularly spaced events:
```{r}
plot(pp4.ppp)
```

Whereas the last point pattern is of clusters of dispersed events that are themselves regularly spaced:
```{r}
plot(pp5.ppp)
```

Both $\hat{G}(x)$ or $\hat{F}(x)$ when applied to any of these patterns will strongly hint at clustering at the scale of the first nearest neighbor. Regrettably, they fail to detect patterns that might exist at other scales. For instance:
```{r}
f_pattern3 <- Fest(pp3.ppp, correction = "none")
plot(f_pattern3)
```

```{r}
g_pattern3 <- Gest(pp3.ppp, correction = "none")
plot(g_pattern3)
```

A different technique, called the $\hat{K}$-function, is designed to detect patterns at multiple scales [see @Ripley1976; and @Haase1995spatial]. The intuition behind the function is as follows.

Imagine that you visit every on of the events in the point patter in sequence. Each time you visit an event you do the following: first, you create a circle with radius "x" centered on the event, and then you count the number of events that are within the circle. Then you increase "x" by some distance, and repeat the process. Once that you have created the last circle (which will be suitably large to capture patterns at that scale), you move and visit the next event in the pattern and repeat the exact same process. These counts of events at distances "x" are aggregated and normalized by the estimated intensity of the point pattern.

More formally, this is (with $A$ as the area of the region):
$$
\hat{K}(x)=\frac{1}{\hat{\lambda}A}\sum_{i}\sum_{j\neq i}(d_{ij}\le x).
$$

As before, the theoretical values for this function are known for the case of a null landscape generated by a Poisson process:
$$
K_{pois}(x)=\pi x^2.
$$
When the empirical function is greater than the theoretical function, this would suggest that events are typically surrounded by _more_ events at that distance than what the null landscape would have. This is interpreted as evidence of clustering.

In contrast, when the empirical function is less than the theoretical one, this would suggest that events are typically surrounded by _fewer_ events at that distance than what would be expected from a null landscape. This is interpreted as dispersion.

The $\hat{K}$-function is implemented in the package `spatstat` as `Kest()`.

To see how this function works, plot `pp3.ppp` once more:
```{r}
plot(pp3.ppp)
```

Next, use `Kest()` to calculate and plot the $\hat{K}$-function:
```{r}
# `Kest()` function estimates nearest neighbors of a point on multiple scales, identifying more than just the distance to the first nearest neighbor. Here, we are applying the K-function to `pp3.ppp`. As before, ignore the correction; we will discuss this later
k_pattern3 <- Kest(pp3.ppp, correction = "none")
plot(k_pattern3)
```

As seen from the plot, the function is suggestive of clustering at smaller scales, but regularity at a larger scale.

Try this now with the last pattern:
```{r}
plot(pp5.ppp)
```

If you calculate and plot the $\hat{K}$-function:
```{r}
k_pattern5 <- Kest(pp5.ppp, correction = "none")
plot(k_pattern5)
```

You will see that the plot correctly suggests dispersion at the very small scale, followed by clustering at an intermediate scale. There are indeed clusters of nine events surrounded by empty space, before other clusters of regular events are detected at the largest scale, following a regular pattern.

Of the distance-based techniques that you have seen so far, $\hat{G}(x)$ and $\hat{F}(x)$ are often used as complements. The $\hat{K}(x)$ is useful when exploring multi-scale patterns.

This concludes the chapter, and our coverage of distance-based techniques.