andreas/README.Rmd at main · BigelowLab/andreas · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
---
title: "andreas"
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

Andreas (Andrew) Copernicus was the lesser known brother of [Nicolaus Copernicus](https://en.wikipedia.org/wiki/Nicolaus_Copernicus).  This package serves as suite of functions and scripts to access and manage data from the [Copernicus Marine Data Store](https://data.marine.copernicus.eu/products).  This package leverages the tools provided by the [`copernicus` R package](https://github.com/BigelowLab/copernicus).

This document is divided into two parts: "Fetching Remote Data" and "Working with Local Archives".  You don't need to fetch data is you already have a local archive.

# Fetching Remote Data

You only need this part if you are creating or maintaining a local archive of data. See this [wiki](https://github.com/BigelowLab/andreas/wiki/Fetching-a-new-data) page for an example of building a local dataset.

# Working with Local Archives

For most needs, you are interested in working with a previously downloaded dataset.

## The local archive

The local archive is a collection of one or more products (and their datasets) for a particular region of the world.  For example, we have an archive for the Northwest Atlantic that extends from Cape Hatteras to Flemish Cap ("chfc").  The archive is a directory called "chfc" within which there is one subdirectory for each product.  Within each product directory there may be series of subdirectories organized by year (*e.g.* "2022") and then by month-day (*e.g.* "0521").  Within the month-day subdirectories there will be one or more GeoTIFF files (one file for each variable).
At the product level you will find a text-based file called "database".  It is a comma delimited file that (in theory) keeps track of the files that have been archived.  We leverage this database to quickly find a subset of the entire database.


### Using the database

The database is very light and easy to filter for just the records you might need. Note that depth is a character data type; this provides you with flexibility to define depth as 'surface' or '50-75' or something like that.

Let's walk through reading the database, filtering it for a subset, reading the files and finally displaying.

```{r database, message = FALSE}
suppressPackageStartupMessages({
  library(andreas)
  library(copernicus)
  library(dplyr)
  library(stars)
  library(twinkle)
})

path = copernicus::copernicus_path("chfc", "GLOBAL_ANALYSISFORECAST_PHY_001_024")
db <- andreas::read_database(path) |>
  dplyr::glimpse()
```

It may look complicated, but this permits the storage of multiple datasets per product, and is a very flexible system.

```{r count}
db |> dplyr::count(id, depth, variable)
```

#### Filtering the database
Next we can filter the database to select just a few variables and dates.

```{r filter}
db = db |>
  dplyr::filter(variable %in% c("uo", "vo"),
                dplyr::between(date, as.Date("2024-07-01"), as.Date("2024-07-04")))
db
```

#### Reading rasters

Now we can read in the files.

```{r read_files}
s = andreas::read_andreas(db, path)
s
```
#### Extracting point data

It's pretty easy to extract values at various points.  We provide a sample of sites (some buoys and two airports).  Below we can extract point data in long form.

```{r extract}
sites = read_buoys()
values = extract_points(s, sites)
values
```

Or we can do the same in wide format.

```{r extract_wide}
values = extract_points(s, sites, form = "wide")
values
```

If you explore the data you'll notice that the last two points (both airports) return NA values which is expected for a marine data set.


### Static variables

Some products have static variables (model coordinates, depth, mask, etc).  We don't retain any on a usual basis except for `deptho`, `mask` and `deptho_lev` (model level at bottom).  We augment the base variables with a look up table, `lut`, which maps land-based pixels to their nearest water-based pixel. Others include `TPI`, `TRI`, `roughness`, `slope` and `aspect`.  Learn more about those [here](https://rspatial.github.io/terra/reference/terrain.html).

You can read static variables easily by providing just the variable names and the data path.

```{r static}
static = read_static(name = c("mask", "deptho"), path = path)
static
```

#### Remapping points located over land

The `lut` is useful when you are extracting point data from rasters, but some (or all!) of your points are close to water but not over water. The location of an animal washed up on a beach is a classic example (or a sturgeon report from nearshore) - they just miss being located over a "wet" pixel.  It happens.  Here's how you can find the closest available "wet" pixel given one or more points (some of which are on shore.)  The original locations are shown with open purple circles, and the remapped locations are shown with closed orange circles.

```{r remap}
lut = read_static(name = "lut", path = path)
y = remap_to_water_pixel(sites, lut)

plot(static['deptho'], axes = TRUE, reset = FALSE,
     extent = sf::st_union(sites) |> sf::st_buffer(dist = 5000),
     main = "")
points(sites, pch = 1, col = "purple", cex = 1.3)
points(y, pch = 20, col = "orange")
```