This repository contains the functionality to standardize European Seabirds at Sea (ESAS) data to a Darwin Core Archive that can be harvested by OBIS and GBIF.
To republish the data:
- Clone this repository to your computer.
- Download all public ESAS data from ICES.
- Unzip the download and move the files to the repository in a `data/raw` directory. The directory (and the files it contains) is ignored by git, so you will have to create it.
- Open the repository in RStudio by opening the `esas2obis.Rproj` file.
- Open the Darwin Core mapping script `dwc_mapping.Rmd`.
- Click `Run > Run All` to transform the data to Darwin Core files using SQL. This will take a while.
- Verify that all steps in the mapping script ran without errors.
- Verify in git or GitHub Desktop that the sample data are not affected (changes would indicate updates or issues in the mapping).
- Upload the Darwin Core files to the EurOBIS IPT.
- Validate the Darwin Core Archive (by EurOBIS staff).
- Publish the dataset to OBIS and GBIF (by EurOBIS staff).
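
The unzip step of the workflow above can also be scripted. The following is a minimal sketch, not part of this repository: the function name `unpack_esas_download` and the zip file name in the usage note are hypothetical, and it only assumes that the ICES download is a regular zip file.

```python
import zipfile
from pathlib import Path


def unpack_esas_download(zip_path, repo_root="."):
    """Extract a downloaded zip into <repo_root>/data/raw and list the files there."""
    raw_dir = Path(repo_root) / "data" / "raw"
    # data/raw is ignored by git, so it does not exist in a fresh clone.
    raw_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(raw_dir)
    # Return the extracted file names so they can be checked before running the mapping.
    return sorted(p.name for p in raw_dir.iterdir() if p.is_file())
```

For example, `unpack_esas_download("esas_download.zip")` (a placeholder file name), run from the repository root, would create `data/raw` if needed and return the names of the files it now contains.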
- Dataset on IMIS: source for the metadata and landing page for the DOI (https://doi.org/10.14284/601)
- Dataset on the EurOBIS IPT: source for the data
- Dataset on OBIS
- Dataset on GBIF
ESAS data is structured in 4 hierarchical tables: campaigns, samples, positions and observations.
The Event core contains three types of events:
- Campaigns (`type=cruise`) with an `eventID`, date range, and remarks.
- Samples (`type=sample`) with an `eventID`, `parentEventID` (the campaign), single date and remarks.
- Positions (`type=subSample`) with an `eventID`, `parentEventID` (the sample), datetime and location.
The `eventID`s are created by concatenating the parent identifiers, e.g. `<campaignID>_<sampleID>_<positionID>` for a position. This makes them unique within the dataset and easy to understand.
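
The identifier scheme can be illustrated with a short Python sketch. The actual transformation is done in SQL; the helper names `make_event_id` and `make_parent_event_id` are hypothetical and only mirror the concatenation rule described above.

```python
def make_event_id(campaign_id, sample_id=None, position_id=None):
    """Join the available identifiers with underscores, parent first."""
    if position_id is not None and sample_id is None:
        raise ValueError("a position needs a sample identifier")
    parts = [campaign_id, sample_id, position_id]
    return "_".join(str(p) for p in parts if p is not None)


def make_parent_event_id(campaign_id, sample_id=None, position_id=None):
    """Return the eventID one level up, or None for a campaign (no parent)."""
    if position_id is not None:
        return make_event_id(campaign_id, sample_id)
    if sample_id is not None:
        return make_event_id(campaign_id)
    return None
```

With made-up identifiers, `make_event_id(110, 4, 7)` gives `"110_4_7"` for a position, and `make_parent_event_id(110, 4, 7)` gives `"110_4"`, the `eventID` of its sample.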
Record-level terms such as `institutionCode`, `datasetName`, `license` and `rightsHolder` are included as well.
See the SQL file (`sql/dwc_event.sql`) for the full transformation.
The Occurrence extension contains the observations, with the following terms:
- `eventID` (the position) and `occurrenceID`.
- `basisOfRecord` (always `HumanObservation`) and `occurrenceStatus` (always `present`).
- `scientificName`, `scientificNameID` (WoRMS identifier), `kingdom` (always `Animalia`) and `vernacularName`.
- `individualCount`, `sex`, `lifeStage`, `behavior`, `associatedTaxa` (also expressed as measurements or facts).
- `occurrenceRemarks`.
The `occurrenceID`s are created similarly to the `eventID`s, as `<campaignID>_<sampleID>_<positionID>_<observationID>`.
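
As an illustration of the terms above, the sketch below maps a single observation to an Occurrence record in Python. It is not the repository's SQL: the function name `to_occurrence` and the input keys (`campaignID`, `sampleID`, `positionID`, `observationID`, `count`) are assumptions chosen for the example, and only a subset of the terms is shown.

```python
def to_occurrence(obs):
    """Map one observation (a dict with hypothetical keys) to Occurrence terms."""
    # The eventID of the position is the parent of the occurrence.
    event_id = f"{obs['campaignID']}_{obs['sampleID']}_{obs['positionID']}"
    return {
        "eventID": event_id,
        "occurrenceID": f"{event_id}_{obs['observationID']}",
        # Constant values, as listed above.
        "basisOfRecord": "HumanObservation",
        "occurrenceStatus": "present",
        "kingdom": "Animalia",
        # Values taken from the observation itself.
        "scientificName": obs["scientificName"],
        "individualCount": obs.get("count"),
    }
```

Calling it with a made-up observation such as `{"campaignID": 110, "sampleID": 4, "positionID": 7, "observationID": 12, "scientificName": "Fulmarus glacialis", "count": 1}` yields an `occurrenceID` of `"110_4_7_12"` attached to the position `"110_4_7"`.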
See the SQL file (`sql/dwc_occurrence.sql`) for the full transformation.
The EMOF extension contains all other ESAS data, with the following terms:
- `eventID`: identifier of sample or position (there are no campaign measurements).
- `occurrenceID` (where applicable): identifier of the occurrence.
- `measurementType`: lowercase description of the measurement.
- `measurementTypeID` (where applicable): link to a definition of the measurement. Where possible, we use the BODC Parameter Usage Vocabulary (P01) or fall back to ESAS vocabularies maintained by ICES (e.g. https://vocab.ices.dk/services/rdf/collection/UseOfBinoculars).
- `measurementValue`: human readable value or description, lowercased where appropriate.
- `measurementValueID` (where applicable): IRI for the value. These mostly link to values in ESAS vocabularies maintained by ICES (e.g. https://vocab.ices.dk/services/rdf/collection/UseOfBinoculars/2), except for platform code (C17), sex (S10) and life stage (S11).
- `measurementUnit` (where applicable): unit of the measurement.
- `measurementUnitID`: link to a definition of the unit, with XXXX for not applicable and UUUU for dimensionless (e.g. `individualCount`).
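
To show how these terms fit together for a vocabulary-based measurement, here is a hedged Python sketch. The function name `vocab_measurement` and its parameters are hypothetical. The URL pattern follows the `UseOfBinoculars` examples above, and the unit identifier is stored as the bare code `XXXX` rather than the full link, which is not reproduced here.

```python
ICES_VOCAB = "https://vocab.ices.dk/services/rdf/collection"


def vocab_measurement(event_id, label, collection, code, value):
    """Build one EMOF record for a sample or position value from an ICES vocabulary."""
    return {
        "eventID": event_id,
        "occurrenceID": None,  # only set for observation-level measurements
        "measurementType": label.lower(),
        "measurementTypeID": f"{ICES_VOCAB}/{collection}",
        "measurementValue": value,
        "measurementValueID": f"{ICES_VOCAB}/{collection}/{code}",
        "measurementUnit": None,
        # Vocabulary values have no unit: "not applicable" (code only, assumption).
        "measurementUnitID": "XXXX",
    }
```

For instance, `vocab_measurement("110_4", "Use of binoculars", "UseOfBinoculars", 2, "example description")` produces a record whose `measurementValueID` is `https://vocab.ices.dk/services/rdf/collection/UseOfBinoculars/2`. The event identifier and description are placeholders.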
The ESAS terms behaviour and association can contain multiple values for a single observation. These are split into at most three measurement or fact records per term.
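
The splitting of multi-value terms might look like the following Python sketch. The separator `~` is an assumption (the delimiter used in the raw ESAS files is not stated here), the function name `split_multivalue` is hypothetical, and the real transformation is done in SQL.

```python
def split_multivalue(event_id, occurrence_id, measurement_type, raw, sep="~", limit=3):
    """Split a multi-value field into one measurement or fact record per value."""
    values = [v.strip() for v in (raw or "").split(sep) if v.strip()]
    if len(values) > limit:
        # Fail loudly instead of silently dropping values beyond the expected maximum.
        raise ValueError(f"expected at most {limit} values, got {len(values)}")
    return [
        {
            "eventID": event_id,
            "occurrenceID": occurrence_id,
            "measurementType": measurement_type,
            "measurementValue": value,
        }
        for value in values
    ]
```

Raising an error on more than three values is a design choice of this sketch, so that unexpected input is noticed. An empty or missing field simply yields no records.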
See Table 1 for an overview and the SQL file (`sql/dwc_mof.sql`) for the full transformation.
| table | measurement or fact | type | example |
|---|---|---|---|
| sample | platform code | vocab | BELGICA |
| sample | platform class | vocab | ship |
| sample | platform side | vocab | left |
| sample | platform height | number | |
| sample | transect width | integer | 300 |
| sample | sampling method | vocab | ship-based transect method with distance estimation and snapshot for flying birds |
| sample | primary sampling | boolean | True |
| sample | target taxa | vocab | all species recorded (standard) |
| sample | distance bins | string | 0\|50\|100\|200\|300 |
| sample | use of binoculars | vocab | Binoculars used extensively for scanning ahead and to the side, naked eye used for close observations (e.g. for cetacean monitoring) |
| sample | number of observers | integer | 2 |
| position | distance | number | 0.7 |
| position | area | number | 0.21 |
| position | wind force | vocab | moderate breeze |
| position | visibility | vocab | C |
| position | glare | vocab | weak |
| position | sun angle | integer | |
| position | cloud cover | vocab | |
| position | precipitation | vocab | none |
| position | ice cover | integer | 0 |
| position | observation conditions | vocab | |
| observation | group identifier | string | 12 |
| observation | in transect | boolean | True |
| observation | individual count | integer | 1 |
| observation | observation distance | vocab | 100-200 |
| observation | life stage | vocab | adult |
| observation | moult | vocab | active primary moult |
| observation | plumage | vocab | non-breeding (winter) plumage |
| observation | sex | vocab | female |
| observation | travel direction | vocab | 45 |
| observation | prey | vocab | medium fish, unidentified (ca. 2-5x bill length) |
| observation | association x 3 | vocab | associated with observation base |
| observation | behaviour x 3 | vocab | scavenging |
The repository structure is based on Cookiecutter Data Science and the Checklist recipe. Files and directories indicated with GENERATED should not be edited manually.
├── README.md              : Description of this repository
├── LICENSE                : Repository license
├── esas2obis.Rproj        : RStudio project file
├── .gitignore             : Files and directories to be ignored by git
│
├── src
│   └── dwc_mapping.Rmd    : Darwin Core mapping script
│
├── sql                    : Darwin Core transformations
│   ├── dwc_event.sql
│   ├── dwc_occurrence.sql
│   └── dwc_mof.sql
│
└── data
    ├── processed          : Darwin Core output of mapping script GENERATED
    └── processed_sample   : Darwin Core sample output of mapping script for git comparison GENERATED
MIT License for the code and documentation in this repository. The included data is released under another license.