An implementation of the threat-based [STAR biodiversity metric by Muir et al](https://www.nature.com/articles/s41559-021-01432-0) (also known as STAR(t)).
See [method.md](method.md) for a description of the methodology, or `scripts/run.sh` for how to execute the pipeline.
## Checking out the code
The code is available on GitHub and can be checked out from there:
There are some additional inputs required to run the pipeline, which should be p
The script also assumes you have a Postgres database containing the IUCN Redlist.
## Species data acquisition
There are two scripts for getting the species data from the Redlist. For those in the IUCN with access to the database version of the redlist, use `extract_species_data_psql.py`.
For those outside the IUCN, there is a script called `extract_species_data_redli
There are two ways to run the pipeline. The easiest is to use Docker, if you have it available, as it will manage all the dependencies for you; you can also check out the code and run it locally, which requires a little more effort.
Either way, the pipeline itself is run using [Snakemake](https://snakemake.readthedocs.io/en/stable/), a tool designed to run data-science pipelines made up of many different scripts and sources of information. Snakemake tracks dependencies, making it easier to re-run the pipeline: only the parts that depend on what changed will be rerun. However, in STAR the initial processing of the raster layers is very slow, so we've configured Snakemake to never regenerate those rasters unless they have been deleted manually.
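To illustrate how Snakemake's dependency tracking works, here is a minimal, hypothetical rule (the file names and script are invented for illustration and do not match the real pipeline's rules):

```
rule species_aoh:
    input:
        "data/species/{species}.geojson"
    output:
        "data/aohs/{species}.tif"
    shell:
        "python3 calculate_aoh.py {input} {output}"
```

Snakemake re-runs a rule only when its outputs are missing or older than its inputs, which is why manually deleting a generated raster is what forces the slow raster-generation steps to run again.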
Because you do not always need to run the full pipeline for a specific job, the Snakemake script has multiple targets you can invoke:
* prepare: Generate the necessary input rasters for the STAR pipeline.
* species_data: Extract species data into GeoJSON files from the Redlist database.
* aohs: Just generate the species AOHs and summary CSV.
* validation: Run model validation.
* occurrence_validation: Run occurrence validation - this can be VERY SLOW, as it fetches occurrence data from GBIF.
* threats: Generate the STAR(t) raster layers.
* all: Do everything except occurrence validation.
There is a configuration file in `config/config.yaml` that is used to set experimental parameters such as which taxa to run the pipeline for.
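As a sketch only, such a configuration might look something like this (the keys and values here are hypothetical — check `config/config.yaml` itself for the pipeline's actual schema):

```yaml
# Hypothetical example of experimental parameters
taxa:
  - AMPHIBIA
  - AVES
```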
### Running with Docker
A Dockerfile is included, based on the GDAL container image, which sets up everything ready to use. You can build it using:
```shell
$ docker buildx build -t star .
```
Note that depending on how many CPU cores you provide, you will probably need to give Docker more memory than the out-of-the-box setting (which is a few GB). We recommend giving it as much as you can.
You can then invoke the run script using this image. You should map an external folder into the container as a place to store the intermediate data and final results, and you should provide details about the Postgres instance with the IUCN Redlist:
```shell
$ docker run --rm -v /some/local/dir:/data \
    -p 5432:5432 \
    -e DB_HOST=localhost \
    -e DB_NAME=iucnredlist \
    -e DB_PASSWORD=supersecretpassword \
    -e DB_USER=postgres \
    -e GBIF_USERNAME=myusername \
    -e GBIF_PASSWORD=mypassword \
    -e GBIF_EMAIL=myemail \
    star --cores 8 all
```
### Running without Docker
If you prefer not to use Docker, you will need:
* GDAL
* R (required for validation)
* [Reclaimer](https://github.com/quantifyearth/reclaimer/) - a Go tool for fetching data from Zenodo
* [Littlejohn](https://github.com/quantifyearth/littlejohn/) - a Go tool for running scripts in parallel
If you are using macOS please note that the default Python install that Apple ships is now several years out of date (Python 3.9, released Oct 2020) and you'll need to install a more recent version (for example, using [homebrew](https://brew.sh)).
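A quick way to confirm which Python version is on your `PATH` before running the pipeline:

```shell
# Print the version of the default python3 on your PATH;
# if it reports an old release (e.g. 3.9), install a newer one first.
python3 --version
```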