Animal-associated microbiome biological/environmental metadata

  Minimal biological/environmental metadata for animal-associated microbiome

Category MIXS ID Metadata field Brief description Definition Example of Annotation Source
Sample metadata [“ENA Marine Microalgae Checklist; Checklist: ERC000043”] collected_by Who collected the sample Name of person or institute that collected the sample Freie Universität Berlin “ENA Marine Microalgae Checklist; Checklist: ERC000043”
[MIXS:0000001] samp_size Amount or size of the collected sample The total amount or size (volume (ml), mass (g) or area (m2)) of sample collected 3 g feces “GSC MIXS: MIMAG”, “GSC MIXS: MIGSBacteria”, “GSC MIMS: Metagenome or Environmental”, “Minimum Information about a Single Ampligied Genome (MiSAG)”, “Minimum Information about an Uncultivated Virus Genome (Miuvig)”
[MIXS:0000011] collection_date Date at which the sample was collected The time of sampling, either as an instance (single point in time) or interval. In case no exact time is available, the date/time can be right truncated. [ISO8601] compliant 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008; Except: 2008-01; 2008 all are [ISO8601] compliant “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIXS: MIMAG”, “GSC MIXS: MIGSBacteria”, “GSC MIMS: Metagenome or Environmental”, “Minimum Information about a Single Ampligied Genome (MiSAG)”, “Minimum Information about an Uncultivated Virus Genome (Miuvig)”
[MIXS:0000026] source_mat_id Identifier(s) of source material A unique identifier assigned to a [material sample] used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples. The INSDC qualifiers /specimen_voucher, /bio_material, or /culture_collection may or may not share the same value as the source_mat_id field. For instance, the /specimen_voucher qualifier and source_mat_id may both contain 'UAM:Herps:14', referring to both the specimen voucher and sampled tissue with the same identifier. However, the /culture_collection qualifier may refer to a value from an initial culture (e.g. ATCC:11775) while source_mat_id would refer to an identifier from some derived culture from which the nucleic acids were extracted (e.g. xatc123 or ark:/2154/R2) DOG_FECAL_0001 “GSC MIXS: MIMAG”, “GSC MIXS: MIGSBacteria”, “GSC MIMS: Metagenome or Environmental”, “Minimum Information about a Single Ampligied Genome (MiSAG)”, “Minimum Information about an Uncultivated Virus Genome (Miuvig)”
[MIXS:0000092] project_name project name under which sampling and sequencing was done Name of the project within which the sequencing was organized Canine Gut Microbiome Sequencing Project
[MIXS:0000110] samp_store_temp Temperature at which the sample was stored Temperature at which sample was stored, e.g. -80 ˚C -80 ˚C
[MIXS:0000113] temp temperature Temperature of the sample at the time of sampling 37 ˚C “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIXS: MIMAG”, “GSC MIXS: MIGSBacteria”, “GSC MIMS: Metagenome or Environmental”, “Minimum Information about a Single Ampligied Genome (MiSAG)”, “Minimum Information about an Uncultivated Virus Genome (Miuvig)”
[MIXS:0000183] salinity salinity The total concentration of all dissolved salts in a liquid or solid sample. While salinity can be measured by a complete chemical analysis, this method is difficult and time consuming. More often, it is instead derived from the conductivity measurement. This is known as practical salinity. These derivations compare the specific conductance of the sample to a salinity standard such as seawater 0 practical salinity unit (PSU) “GSC MIxS: Host-associatedMIMS”, “GSC MIxS: Human-associatedMIMS”, “GSC MIxS Human Associated; ENA Checklist: ERC000014”
[MIXS:0000249] samp_dis_stage Disease stage of sampled host Stage of the disease at the time of sample collection, e.g. inoculation, penetration, infection, growth and reproduction, dissemination of pathogen infection
[MIXS:0000752] misc_param miscellaneous parameter Any other measurement performed or parameter collected, that is not listed here household pet; kibble diet
[MIXS:0000753] oxy_stat_samp oxygenation status of sample Oxygenation status of sample anaerobic
[MIXS:0000754] perturbation perturbation Type of perturbation, e.g. chemical administration, physical disturbance, etc., coupled with perturbation regimen including how many times the perturbation was repeated, how long each perturbation lasted, and the start and end time of the entire perturbation period; can include multiple perturbation types praziquantel 5 mg kg⁻¹ PO, 24 h pre-sampling
[MIXS:0001107] samp_name sample name A local identifier or name for the material sample used for extracting nucleic acids, and subsequent sequencing. It can refer either to the original material collected or to any derived sub-samples. It can have any format, but we suggest that you make it concise, unique and consistent within your lab, and as informative as possible. INSDC requires every sample name from a single Submitter to be unique. Use of a globally unique identifier for the field source_mat_id is recommended in addition to sample_name Dog_001_Feces_2008-01-23 “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIXS: MIMAG”, “GSC MIXS: MIGSBacteria”, “GSC MIMS: Metagenome or Environmental”, “Minimum Information about a Single Ampligied Genome (MiSAG)”, “Minimum Information about an Uncultivated Virus Genome (Miuvig)”
[MIXS:0001216] microb_cult_med Microbiological culture medium (applicable only if microorganism can be cultivated) Composition of processed material providing the needed nourishment for microorganisms or cells to grow in vitro. This field accepts terms listed under culture medium [OBI:0000079]. If the proper descriptor is not listed please use text to describe the culture medium minimal defined medium [MCO:0000881] “MIMS: Metagenome/Environmental, Human-Associated; Version 6.0 Package”, MSI-ECWSG (Morrison et al. (2007))
[MIXS:0001317] samp_store_sol Solution in which the sample was stored Solution within which sample was stored, if any RNALater [NCIT:C63348]
[MIXS:0001320] samp_taxon_id Taxonomical identifier of sample NCBI taxon id of the sample. May be a single taxon or mixed taxa sample. Use ‘synthetic metagenome’ for mock community/positive controls, or ‘blank sample’ for negative controls. Expected_value: [NCBI taxonomy ID] Gut Metagenome [NCBITaxon:749906] “GSC MIXS: MIMAG”, “GSC MIXS: MIGSBacteria”, “GSC MIMS: Metagenome or Environmental”, “Minimum Information about a Single Ampligied Genome (MiSAG)”, “Minimum Information about an Uncultivated Virus Genome (Miuvig)”
[] microbial_isolate Can the microbial isolate be cultured in vitro. Y/N N
Site metadata [MIXS:0000009] lat_lon geographic location (latitude and longitude) The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees, limited to 8 decimal points, and in WGS84 system 52.454456 13.293950
[MIXS:0000010] geo_loc_name geographic location (country and/or sea,region) Geographic location (country and/or sea,region). The geographical origin of the sample as defined by the country or sea name followed by specific region name. Country or sea names should be chosen from the [INSDC country list], or the [GAZ ontology] Germany: Berlin “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIXS: MIMAG”, “GSC MIXS: MIGSBacteria”, “GSC MIMS: Metagenome or Environmental”, “Minimum Information about a Single Ampligied Genome (MiSAG)”, “Minimum Information about an Uncultivated Virus Genome (Miuvig)”
[MIXS:0000012] env_broad_scale Broad-scale environmental context Report the major environmental system the sample or specimen came from. System identifiers should provide a coarse, general environmental context of where the sampling was done. Recommended use of EnvO's biome class: [ENVO_00000428]. If more than one term applies to the field, | should be used to separate them. urban biome [ENVO:01000249] “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIXS: MIMAG”, “GSC MIXS: MIGSBacteria”, “GSC MIMS: Metagenome or Environmental”, “Minimum Information about a Single Ampligied Genome (MiSAG)”, “Minimum Information about an Uncultivated Virus Genome (Miuvig)”
[MIXS:0000013] env_local_scale Local-scale environmental context Report the entity or entities which are in the sample or specimen’s local vicinity and which you believe have significant causal influences on your sample or specimen. Entry should be of a smaller environmental context than env_broad_scale. Terms, such as anatomical sites, from other [OBO Library] ontologies which interoperate with EnvO (e.g. [UBERON]) are accepted in this field. If more than one term applies to the field, | should be used to separate them. household environment [ENVO:03501339] “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIXS: MIMAG”, “GSC MIXS: MIGSBacteria”, “GSC MIMS: Metagenome or Environmental”, “Minimum Information about a Single Ampligied Genome (MiSAG)”, “Minimum Information about an Uncultivated Virus Genome (Miuvig)”
[MIXS:0000014] env_medium Environmental medium Report the environmental material(s) immediately surrounding the sample or specimen at the time of sampling. Recommended use of EnvO’s subclasses of environmental material [ENVO:00010483]. Terms from other [OBO ontologies] are permissible as long as they reference mass/volume nouns (e.g. air, water, blood) and not discrete, countable entities (e.g. a tree, a leaf, a table top). If more than one term applies to the field, | should be used to separate them. fecal material [ENVO:00002003] “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIXS: MIMAG”, “GSC MIXS: MIGSBacteria”, “GSC MIMS: Metagenome or Environmental”, “Minimum Information about a Single Ampligied Genome (MiSAG)”, “Minimum Information about an Uncultivated Virus Genome (Miuvig)”, “GSC MIxS Human Associated; ENA Checklist: ERC000014”
[MIXS:0000018] depth depth The vertical distance below local surface. For sediment or soil samples depth is measured from sediment or soil surface, respectively. Depth can be reported as an interval for subsurface samples 0 m “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIxS: Host-associatedMIMS”, “GSC MIXS: MIMAG”, “GSC MIXS: MIGSBacteria”, “GSC MIMS: Metagenome or Environmental”, “Minimum Information about a Single Ampligied Genome (MiSAG)”, “Minimum Information about an Uncultivated Virus Genome (Miuvig)”
[MIXS:0000093] elev elevation Elevation of the sampling site is its height above a fixed reference point, most commonly the mean sea level. Elevation is mainly used when referring to points on the earth's surface, while altitude is used for points above the surface, such as an aircraft in flight or a spacecraft in orbit. Origin elevation in m 34 m “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIxS: Host-associatedMIMS”, “GSC MIXS: MIMAG”, “GSC MIXS: MIGSBacteria”, “GSC MIMS: Metagenome or Environmental”, “Minimum Information about a Single Ampligied Genome (MiSAG)”, “Minimum Information about an Uncultivated Virus Genome (Miuvig)”
[MIXS:0000094] alt altitude Heights of objects such as airplanes, space shuttles, rockets, atmospheric balloons and heights of places such as atmospheric layers and clouds. It is used to measure the height of an object which is above the earth's surface. In this context, the altitude measurement is the vertical distance between the earth's surface above sea level and the sampled position in the air not applicable “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIxS: Host-associatedMIMS”, “GSC MIXS: MIMAG”, “GSC MIXS: MIGSBacteria”, “GSC MIMS: Metagenome or Environmental”, “Minimum Information about a Single Ampligied Genome (MiSAG)”, “Minimum Information about an Uncultivated Virus Genome (Miuvig)”
Site/host metadata [MIXS:0000751] chem_administration List of chemicals administered to sampled host or site List of chemical compounds administered to host or on site where sampling occurred. Can include multiple compounds separated by |. For compounds consult [chemical entities of biological interest ontology (chebi) (v 163)] praziquantel [CHEBI:45267] “GSC MIxS: Host-associatedMIMS”, “GSC MIxS: Human-associatedMIMS”
Host metadata [MIXS:0000031] host_disease_stat Disease status of the sampled host List of diseases with which the host has been diagnosed; can include multiple diagnoses. The value of the field depends on host, non-human host diseases are free text. cestode infection (Dipylidium caninum) OR healthy “GSC MIxS: Host-associatedMIMS”, “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIxS Human Associated; ENA Checklist: ERC000014”, “GSC MIxS: Human-associatedMIMS”, “GSC MIXS: MIGSBacteria”, “Minimum Information about Viral Genome Sequence (MigsVi)”, “Minimum Information about an Uncultivated Virus Genome (Miuvig)”
[MIXS:0000248] host_common_name Common name of the sampled host Common name of the host dog “GSC MIxS: Host-associatedMIMS”, “ENA Host Associated Checklist; Checklist: ERC000013”
[MIXS:0000250] host_taxid Taxonomy identifier of sampled host NCBI taxon id of the host [NCBI taxonomy ID] Canis lupus familiaris [NCBI:txid9615] “GSC MIxS: Host-associatedMIMS”, “ENA Host Associated Checklist; Checklist: ERC000013”, MSI-ECWSG (Morrison et al. (2007))
[MIXS:0000251] host_life_stage Life stage of the sampled host Description of life stage of host adult
[MIXS:0000255] host_age Age of sampled host Age of host at the time of sampling; relevant scale depends on species and study, e.g. could be seconds for amoebae or centuries for trees 5 y “GSC MIxS: Host-associatedMIMS”, “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIxS Human Associated; ENA Checklist: ERC000014”, “GSC MIxS: Human-associatedMIMS”
[MIXS:0000256] host_length Length of sampled host The length of host 45 cm “GSC MIxS: Host-associatedMIMS”, “ENA Host Associated Checklist; Checklist: ERC000013”
[MIXS:0000260] host_color Color of sampled host The color of host brown
[MIXS:0000261] host_shape Morphological shape of sampled host Morphological shape of host Slender [PATO:0002212]
[MIXS:0000263] host_tot_mass Total mass of the sampled host Total mass of the host at collection, the unit depends on host 22 kg “GSC MIxS: Host-associatedMIMS”, “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIxS Human Associated; ENA Checklist: ERC000014”, “GSC MIxS: Human-associatedMIMS”
[MIXS:0000264] host_height Height of sampled host The height of subject 55 cm “GSC MIxS: Host-associatedMIMS”, “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIxS Human Associated; ENA Checklist: ERC000014”, “GSC MIxS: Human-associatedMIMS”
[MIXS:0000274] host_body_temp Body temperature of sampled host Core body temperature of the host when sample was collected 38.5 ˚C
[MIXS:0000365] host_genotype Observed genotype of sampled host Observed genotype domestic dog reference genome CanFam3.1
[MIXS:0000859] genetic_mod genetic modification Genetic modifications of the genome of an organism, which may occur naturally by spontaneous mutation, or be introduced by some experimental means, e.g. specification of a transgene or the gene knocked-out or details of transient transfection none
[MIXS:0000861] host_subject_id Identifier assigned to sampled host A unique identifier by which each subject can be referred to, de-identified Dog_001
[MIXS:0000862] urobiom_sex Physical sex of the sampled host Physical sex of the host Male [PATO:0000384] “GSC MIxS: Host-associatedMIMS”, “ENA Host Associated Checklist; Checklist: ERC000013”
[MIXS:0000866] host_body_habitat host body habitat Original body habitat where the sample was obtained from digestive tract [UBERON:0001555]
[MIXS:0000867] host_body_site Sampled body site of the host Name of body site where the sample was obtained from, such as a specific organ or tissue (tongue, lung, etc.). Recommended use of [FMA] or [UBERON] ontologies colon [UBERON:0001155] “GSC MIxS: Host-associatedMIMS”, “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIxS Human Associated; ENA Checklist: ERC000014”, “GSC MIxS: Human-associatedMIMS”
[MIXS:0000869] host_diet Diet of sampled host Type of diet depending on the host, for animals omnivore, herbivore etc., for humans high-fat, Mediterranean etc.; can include multiple diet types omnivore [ecocore:00000082] “GSC MIxS: Host-associatedMIMS”, “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIxS Human Associated; ENA Checklist: ERC000014”, “GSC MIxS: Human-associatedMIMS”
[MIXS:0000871] host_growth_cond Growth conditions of sampled host Literature reference giving growth conditions of the host individual housing [XCO:0000034]
[MIXS:0000874] host_phenotype Identified phenotype of sampled host Phenotype of human or other host. Use terms from the phenotypic quality ontology (pato) or the Human Phenotype Ontology (HP) Body condition score 5/9
[MIXS:0000875] gravidity Gravidity of sampled host Whether or not subject is gravid, and if yes date due or date post-conception, specifying which is used Non-gravid
[MIXS:0000888] host_body_product Body product that was examined and sampled from host Substance produced by the body, e.g. stool or mucus, where the sample was obtained from. Use terms from the foundational model of anatomy ontology [FMA] or Uber-anatomy ontology [UBERON]. feces [UBERON:0001988] “GSC MIxS: Host-associatedMIMS”, “ENA Host Associated Checklist; Checklist: ERC000013”, “GSC MIxS Human Associated; ENA Checklist: ERC000014”, “GSC MIxS: Human-associatedMIMS”
[MIXS:0001298] host_symbiont observed symbionts of sampled host The taxonomic name of the organism(s) found living in mutualistic, commensalistic, or parasitic symbiosis with the specific host. The sampled symbiont can have its own symbionts. For example, parasites may have hyperparasites (=parasites of the parasite) Bacteroides vulgatus [NCBI:txid821]
[MIXS:0001307] type_of_symbiosis type of symbiosis with sampled host Type of biological interaction established between the symbiotic host organism being sampled and its respective host Commensalism [ECOCORE:00000025]
[MIXS:0001308] host_specificity Specificity of symbiont of the sampled host Level of specificity of symbiont-host interaction: e.g. generalist (symbiont able to establish associations with distantly related hosts) or species-specific generalist
[MIXS:0001313] host_cellular_loc Cellular location of symbiont within the sampled host The localization of the symbiotic host organism within the host from which it was sampled: e.g. intracellular if the symbiotic host organism is localized within the cells or extracellular if the symbiotic host organism is localized outside of cells lumen of intestine [UBERON:0018543]
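
The format rules stated for collection_date (right-truncated ISO 8601) and lat_lon (decimal degrees, at most 8 decimal places) can be checked before submission. The following is a minimal sketch, not an official MIxS validator: the function names are hypothetical, the single-space separator between latitude and longitude is inferred from the example value in the table, and the date check verifies only the shape of the string, not calendar validity.

```python
import re
from typing import Tuple

# Shape of a right-truncated ISO 8601 value: YYYY, YYYY-MM, YYYY-MM-DD,
# optionally followed by a time and a UTC offset. This checks the pattern
# only; it does not reject impossible dates such as month 13.
_DATE_RE = re.compile(
    r"^\d{4}"
    r"(-\d{2}"
    r"(-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2})?(Z|[+-]\d{2}:\d{2})?)?"
    r")?)?$"
)

# Latitude and longitude in decimal degrees, up to 8 decimal places each.
# Assumption: the two numbers are separated by one space, as in the example.
_LATLON_RE = re.compile(
    r"^(-?\d{1,2}(?:\.\d{1,8})?) (-?\d{1,3}(?:\.\d{1,8})?)$"
)


def is_valid_collection_date(value: str) -> bool:
    """Return True if value has the shape of a right-truncated ISO 8601 date."""
    return _DATE_RE.match(value) is not None


def parse_lat_lon(value: str) -> Tuple[float, float]:
    """Parse 'lat lon' in decimal degrees; raise ValueError if malformed."""
    match = _LATLON_RE.match(value)
    if match is None:
        raise ValueError(f"not a 'lat lon' pair in decimal degrees: {value!r}")
    lat, lon = float(match.group(1)), float(match.group(2))
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {value!r}")
    return lat, lon


if __name__ == "__main__":
    # Example values taken from the table above.
    for date in ("2008-01-23T19:23:10+00:00", "2008-01-23", "2008-01", "2008"):
        print(date, is_valid_collection_date(date))
    print(parse_lat_lon("52.454456 13.293950"))
```

Keeping the two checks as separate small functions lets a submission script report which field failed rather than rejecting the whole record.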

Animal-associated - Ontology recommendations

NCBI organismal classification - NCBITAXON - An ontology representation of the NCBI organismal taxonomy.

Biological Spatial Ontology - BSPO - An ontology for representing spatial concepts, anatomical axes, gradients, regions, planes, sides and surfaces. These concepts can be used at multiple biological scales and in a diversity of taxa, including plants, animals and fungi. The BSPO is used to provide a source of anatomical location descriptors for logically defining anatomical entity classes in anatomy ontologies.

Uber-anatomy ontology - UBERON - Uberon is an integrated cross-species anatomy ontology representing a variety of entities classified according to traditional anatomical criteria such as structure, function and developmental lineage. The ontology includes comprehensive relationships to taxon-specific anatomical ontologies, allowing integration of functional, phenotype and expression data.

Cell Ontology - CL - The Cell Ontology is a structured controlled vocabulary for cell types in animals.

Neuro Behavior Ontology - NBO - An ontology of human and animal behaviours and behavioural phenotypes.

The BRENDA Tissue Ontology - BTO - A structured controlled vocabulary for the source of an enzyme comprising tissues, cell lines, cell types and cell cultures.

Gene Ontology - GO - The Gene Ontology (GO) provides a framework and set of concepts for describing the functions of gene products from all organisms.

Chemical Entities of Biological Interest - ChEBI - An open-access database and ontology of chemical entities. The chemical entities in ChEBI are either naturally occurring molecules or synthetic compounds used to intervene in the processes of living organisms. ChEBI uses the nomenclature, symbolism and terminology endorsed by the International Union of Pure and Applied Chemistry (IUPAC) and the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB). ChEBI also incorporates an ontological classification, whereby the relationships between chemical entities or classes of entities and their parents and/or children are defined; this enables queries based for example on chemical class and role.

The Environment Ontology - ENVO - ENVO is an ontology which represents knowledge about environments, environmental processes, ecosystems, habitats, and related entities.

An ontology of core ecological entities - ECOCORE - An ontology of core ecological entities.

Foundational Model of Anatomy Ontology - FMA - The FMA is a domain ontology that represents a coherent body of explicit declarative knowledge about human anatomy. Its ontological framework can be applied and extended to all other species. The Foundational Model of Anatomy (FMA) ontology is one of the information resources integrated in the distributed framework of the Anatomy Information System developed and maintained by the Structural Informatics Group at the University of Washington.
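
Many example values in the table pair a label with an ontology identifier, such as urban biome [ENVO:01000249], and several fields allow multiple terms separated by |. The sketch below splits such a value and reports whether each term's prefix belongs to the ontologies recommended in this section. It is an illustration under stated assumptions: the names parse_terms, Term, and RECOMMENDED_PREFIXES are hypothetical, prefixes are compared case-insensitively, and a prefix outside the list (for example PATO) is only flagged, not rejected.

```python
import re
from typing import List, NamedTuple, Optional

# Prefixes of the ontologies recommended in this section, upper-cased.
RECOMMENDED_PREFIXES = {
    "NCBITAXON", "BSPO", "UBERON", "CL", "NBO", "BTO",
    "GO", "CHEBI", "ENVO", "ECOCORE", "FMA",
}


class Term(NamedTuple):
    label: str                # free-text label, or the whole value if unparsed
    prefix: Optional[str]     # ontology prefix, upper-cased; None if absent
    local_id: Optional[str]   # identifier after the colon; None if absent
    recommended: bool         # True if prefix is in RECOMMENDED_PREFIXES


# Matches "label [PREFIX:ID]"; the space before the bracket is optional.
_TERM_RE = re.compile(
    r"^(?P<label>.*?)\s*\[(?P<prefix>[A-Za-z]+):(?P<id>[A-Za-z0-9_]+)\]$"
)


def parse_terms(value: str) -> List[Term]:
    """Split a '|'-separated field value into labelled ontology terms."""
    terms = []
    for part in value.split("|"):
        part = part.strip()
        if not part:
            continue
        match = _TERM_RE.match(part)
        if match is None:
            # Free text without an ontology identifier is kept as a label.
            terms.append(Term(part, None, None, False))
            continue
        prefix = match.group("prefix").upper()
        terms.append(
            Term(match.group("label"), prefix, match.group("id"),
                 prefix in RECOMMENDED_PREFIXES)
        )
    return terms


if __name__ == "__main__":
    value = "urban biome [ENVO:01000249] | household environment [ENVO:03501339]"
    for term in parse_terms(value):
        print(term)
```

Returning unparsed free text as a label, rather than raising an error, reflects that some fields in the table (for example non-human host diseases) accept free text.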

Readers of this repository who are unsure how to use EnvO terms should consult the EnvO usage documentation: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS.

References

“ENA Host Associated Checklist; Checklist: ERC000013.” https://www.ebi.ac.uk/ena/browser/view/ERC000013.

“ENA Marine Microalgae Checklist; Checklist: ERC000043.” https://www.ebi.ac.uk/ena/browser/view/ERC000043.

“GSC MIMS: Metagenome or Environmental.” https://genomicsstandardsconsortium.github.io/mixs/0010007/.

“GSC MIxS Human Associated; ENA Checklist: ERC000014.” https://www.ebi.ac.uk/ena/browser/view/ERC000014.

“GSC MIxS: Host-associatedMIMS.” https://genomicsstandardsconsortium.github.io/mixs/0016002/.

“GSC MIxS: Human-associatedMIMS.” https://genomicsstandardsconsortium.github.io/mixs/0016003/.

“GSC MIXS: MIGSBacteria.” https://genomicsstandardsconsortium.github.io/mixs/0010003/.

“GSC MIXS: MIMAG.” https://genomicsstandardsconsortium.github.io/mixs/0010011/.

“MIMS: Metagenome/Environmental, Human-Associated; Version 6.0 Package.” https://www.ncbi.nlm.nih.gov/biosample/docs/packages/MIMS.me.human-associated.5.0/.

“Minimum Information about a Single Ampligied Genome (MiSAG).” https://genomicsstandardsconsortium.github.io/mixs/0010010/.

“Minimum Information about an Uncultivated Virus Genome (Miuvig).” https://genomicsstandardsconsortium.github.io/mixs/0010012/.

“Minimum Information about Viral Genome Sequence (MigsVi).” https://genomicsstandardsconsortium.github.io/mixs/0010005/.

Morrison, Norman, Daniel Bearden, Jacob G. Bundy, Timothy Collette, Fraser Currie, Matthew Davey, Migdalia Dominguez, et al. 2007. “Standard Reporting Requirements for Biological Samples in Metabolomics Experiments: Environmental Context.” Metabolomics 3 (2): 203–10. https://doi.org/10.1007/s11306-007-0067-1.