| SEP | 10 |
|---|---|
| Title | simplify description of sequence features and sub-parts |
| Authors | Raik Gruenberg <raik.gruenberg at gmail com> |
| Editor | James McLaughlin |
| Type | Data Model |
| SBOL Version | 3.0 |
| Replaces | |
| Status | Accepted |
| Created | 20-Sep-2016 |
| Last modified | 31-Aug-2019 |
| Issue | #25 |
There are two very different types of 'part annotation'. (1) part composition relationships -- These always point to an existing (and presumably re-usable) sub-component. Sequence location or indeed sequence information may or may not be available. (2) Classic sequence feature annotations -- As known from the genbank format, these only apply to clearly specified sequence regions but often do not point to meaningful sub-parts.
Currently, Component alone is sufficient to describe sub-part relations without any sequence information. This however is the exception in synthetic biology practice. Both SequenceAnnotation and Component are needed for SBOL encoding of actual genetic designs with parts and sub-parts (because Component lacks a location field). Conversely, simple sequence features can be described using SequenceAnnotation alone (as of SBOL 2.0) but this possibility is not widely known and additional Component and ComponentDefinition are often created instead.
We propose to modify Component and SequenceAnnotation such that Component is solely responsible for the description of part - subpart relationships (with or without sequence) and SequenceAnnotation is solely responsible for the description of genbank-style sequence features. SequenceAnnotation should be renamed to SequenceFeature.
- 1. Rationale
  - 1.1 current situation
  - 1.2 Goals of the proposal
- 2. Specification
  - 2.1 Add location field to Component
  - 2.2 Rename SequenceAnnotation to SequenceFeature
  - 2.3 Restrict SequenceFeature to sequence feature annotation
  - 2.4 Let SequenceConstraint.object point to SequenceFeature
- 3. Example or Use Case
- 4. Backwards Compatibility
  - 4.1 suggested transition path
  - 4.2 Conversion of SBOL 2.1 records to 3.0
  - 4.3 Backwards conversion of SBOL 3.x to 2.1 records
- 5. Discussion
- 6. Competing SEPs
- References
- Copyright
The current SequenceAnnotation class has a dual purpose:
(1) Its primary role is to specify the location of "sub-parts" within the sequence of a parent ComponentDefinition. To this end, SequenceAnnotation links one or more Location records with a Component. The Component, in turn, refers to a ComponentDefinition (via its definition field). This ComponentDefinition is the description of the actual sub-part. Actual physical composition is therefore defined like this:
ComponentDefinition -[sequenceAnnotation]-> SequenceAnnotation -[component]-> Component -[definition]-> ComponentDefinition
The directionality (which one is parent and which one is a sub-part) is frequently confused. Moreover, the parent ComponentDefinition also directly links to the sub-part Component via a component field. This is necessary so that composition can be described before any sequences (and thus sequence locations) are known. An additional chain of references is therefore needed, in parallel to the one shown above:
ComponentDefinition -[component]-> Component -[definition]-> ComponentDefinition
Current SBOL 2.1 part - subpart relations are summarized in the following figure:

Adding to this redundancy, both Component and SequenceAnnotation may have role properties that diverge from the role (functional classification) of the target ComponentDefinition. Whether a diverging role is attached to Component or SequenceAnnotation is an arbitrary choice. This invites conflicting implementations and interpretations of this field.
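The two parallel reference chains described above can be illustrated with a minimal sketch. The Python classes below are a simplified stand-in for the SBOL 2.1 data model, not any library's API; the field names `components` and `sequence_annotations` and the 1-based inclusive `Location` are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Location:
    start: int  # 1-based, inclusive (assumed convention for this sketch)
    end: int


@dataclass
class Component:
    definition: "ComponentDefinition"  # the sub-part that is instantiated
    roles: List[str] = field(default_factory=list)


@dataclass
class SequenceAnnotation:
    locations: List[Location]
    component: Optional[Component] = None  # None for a plain sequence feature
    roles: List[str] = field(default_factory=list)


@dataclass
class ComponentDefinition:
    name: str
    components: List[Component] = field(default_factory=list)
    sequence_annotations: List[SequenceAnnotation] = field(default_factory=list)


# A promoter placed at positions 1..35 of a parent device.
promoter = ComponentDefinition("promoter")
device = ComponentDefinition("device")
sub = Component(definition=promoter)

# Chain without sequence: parent -[component]-> Component -[definition]-> sub-part
device.components.append(sub)

# Chain with sequence: parent -[sequenceAnnotation]-> SequenceAnnotation
#                      -[component]-> Component -[definition]-> sub-part
device.sequence_annotations.append(
    SequenceAnnotation(locations=[Location(1, 35)], component=sub)
)

# Both chains end at the same sub-part definition.
via_component = device.components[0].definition
via_annotation = device.sequence_annotations[0].component.definition
```

Note that `roles` appears on both `Component` and `SequenceAnnotation`, which is the ambiguity pointed out above.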
Evidently, SBOL makes the description of "undefined", "loose bag" part composition without any sequence information relatively easy. By contrast, the description of actual genetic designs with actual sequences is surprisingly complex and redundant. This is unfortunate because the latter is, by far, the overwhelming use case of SBOL. It also hinders adoption by sequence-level designers and tool developers.
(2) The secondary role of SequenceAnnotation is to simply annotate regions of interest within a given sequence. Arguably, this should be its primary role (hence the name) as it is a very common use case in practice. A SequenceAnnotation without component can be created and linked to a region of, e.g., DNA. SequenceAnnotation inherits name and description fields from Identified and is therefore sufficient for the description of "flat" sequence features. In practice, however, most tools mix SequenceAnnotation and Component even for simple sequence features:
(1) Restrict the use of SequenceAnnotation to annotations of features which do not fall into the part - subpart category. As a welcome side effect, this should make it much easier to move back and forth between SBOL and large bodies of existing genbank-formatted information and related software.
(2) Simplify the part - subpart relationship via Component so that it no longer requires SequenceAnnotation.
(3) Create a syntactic parallel between sequence/physical and functional part-subpart relations in SBOL -- The Component class will be equivalent in syntax and meaning to the existing Participation class. For programmers, the pattern ComponentDefinition -> Component(role) -> ComponentDefinition will look and feel like the already established pattern Interaction -> Participation(role) -> ComponentDefinition.
(4) Remove ambiguity as to how things can / should be expressed at the sequence layer to aid meaningful data exchange.
Add the following optional field to Component:
- [0..n] location pointing to a Location on the parent ComponentDefinition sequence; if location is missing, this indicates a part / sub-part relationship for which sequence details have not (yet) been determined.
The Location record(s) specified by a Component are subject to the same restrictions currently in place for SequenceAnnotation Location. Concretely, two Location records attached to the same Component MUST NOT overlap in their range as it would not be clear what that means. The Location of two separate Components may overlap.
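The non-overlap rule for Location records on a single Component could be checked as in the following minimal sketch. `Range`, `is_localized` and `validate` are hypothetical names chosen for this illustration, and positions are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List


@dataclass(frozen=True)
class Range:
    """A contiguous region on the parent sequence (1-based, inclusive)."""
    start: int
    end: int

    def overlaps(self, other: "Range") -> bool:
        return self.start <= other.end and other.start <= self.end


@dataclass
class Component:
    """Sub-part instance with the proposed optional [0..n] location field."""
    definition: str  # URI of the sub-part ComponentDefinition
    locations: List[Range] = field(default_factory=list)

    def is_localized(self) -> bool:
        # No location means: sub-part relation without sequence details (yet).
        return bool(self.locations)

    def validate(self) -> None:
        # Locations attached to the same Component MUST NOT overlap.
        for a, b in combinations(self.locations, 2):
            if a.overlaps(b):
                raise ValueError(f"overlapping locations on one Component: {a}, {b}")


unplaced = Component("http://example.org/terminator")
split_gene = Component("http://example.org/gene", [Range(1, 120), Range(300, 450)])
split_gene.validate()  # passes: the two ranges are disjoint
```

The check is deliberately local to one Component; locations of two separate Components are allowed to overlap and are not compared.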
- rename class SequenceAnnotation to SequenceFeature
- rename sequenceAnnotation field of ComponentDefinition to sequenceFeature
Remove the following fields from SequenceFeature (formerly SequenceAnnotation):
- component -- SequenceAnnotation is no longer used for part - subpart relations
- roleIntegration -- there is no sub-part/definition that role fields may be in conflict with
Update the specification to clarify usage of existing fields:
- [0..n] role pointing to a SequenceOntology term (optional), corresponds to genbank type field
- [0..1] name corresponding to genbank name field (optional but now RECOMMENDED)
- [0..1] description corresponds to genbank description field (optional)
Moreover, a validation rule is needed: SequenceFeature can only be used
if an actual sequence record is specified for the parent ComponentDefinition.
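A minimal sketch of this validation rule, assuming a simplified in-memory model: the class layout and the function `validation_errors` are hypothetical, and the Sequence Ontology identifier is used only as an illustrative role value.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SequenceFeature:
    locations: List[Tuple[int, int]]                # (start, end), 1-based inclusive
    roles: List[str] = field(default_factory=list)  # Sequence Ontology terms
    name: Optional[str] = None                      # optional but RECOMMENDED
    description: Optional[str] = None


@dataclass
class ComponentDefinition:
    sequence: Optional[str] = None
    sequence_features: List[SequenceFeature] = field(default_factory=list)


def validation_errors(cd: ComponentDefinition) -> List[str]:
    """Return rule violations; an empty list means the record is valid."""
    errors = []
    if cd.sequence_features and cd.sequence is None:
        errors.append(
            "SequenceFeature requires a sequence on the parent ComponentDefinition"
        )
    return errors


# SO:0000318 is the Sequence Ontology term for start_codon.
start = SequenceFeature([(1, 3)], roles=["SO:0000318"], name="start codon")
with_sequence = ComponentDefinition(sequence="ATGAAATAA", sequence_features=[start])
without_sequence = ComponentDefinition(sequence_features=[start])
```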
SequenceConstraint.object and SequenceConstraint.subject can point to either ComponentInstance derivatives (as before) or to SequenceFeature.
This change allows constraints to be anchored on sequence regions that are not actually sub-parts. Examples may be start / stop codons, transcription start sites or specific mutations.
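As a rough sketch, the widened constraint can be modelled with a union type for both ends. The classes below are simplified placeholders, and the helpers `precedes` and `follows` are hypothetical conveniences for this illustration, not part of SBOL.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Component:
    name: str


@dataclass(frozen=True)
class SequenceFeature:
    name: str


# Either end of a constraint may be a sub-part or a plain sequence feature.
Anchor = Union[Component, SequenceFeature]


@dataclass
class SequenceConstraint:
    restriction: str
    subject: Anchor
    object: Anchor


def precedes(first: Anchor, second: Anchor) -> SequenceConstraint:
    return SequenceConstraint("precedes", subject=first, object=second)


def follows(later: Anchor, earlier: Anchor) -> SequenceConstraint:
    # There is no "follows" restriction, so subject and object are swapped.
    return precedes(earlier, later)


tss = SequenceFeature("transcription start site")
rbs = Component("RBS")

# "The RBS follows the transcription start site" puts the feature in subject.
constraint = follows(rbs, tss)
```

The example shows why allowing SequenceFeature only in the object field would not be enough: expressing that a sub-part comes after a feature requires the feature to be the subject.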
Example use cases for the modified SequenceAnnotation are feature annotations such as START or STOP codons, mutations, highlighting regions referred to in a paper, sequence conflicts, etc., all mainly intended for human consumption. Over the evolution of a design, sequence features may later be formalized into re-usable subparts (i.e. Components). It is therefore conceivable that a sequence editor reads in a genbank file with many sequence features and offers the user an easy conversion of some of those features into sub-parts. This, in fact, is a workflow already used and supported by the Benchling sequence editor (http://benchling.com).
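The promotion of a feature into a sub-part could look roughly like the sketch below. It assumes a simplified in-memory model with single-range features and 1-based inclusive coordinates; `promote_to_subpart` is a hypothetical helper and does not describe how any existing editor implements this workflow.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # (start, end), 1-based inclusive


@dataclass
class SequenceFeature:
    name: str
    location: Range


@dataclass
class Component:
    definition: "ComponentDefinition"
    locations: List[Range] = field(default_factory=list)


@dataclass
class ComponentDefinition:
    name: str
    sequence: Optional[str] = None
    components: List[Component] = field(default_factory=list)
    sequence_features: List[SequenceFeature] = field(default_factory=list)


def promote_to_subpart(parent: ComponentDefinition,
                       feature: SequenceFeature) -> Component:
    """Turn a flat feature of `parent` into a re-usable sub-part."""
    start, end = feature.location
    # The new sub-part gets its own definition and its own copy of the sequence.
    sub_definition = ComponentDefinition(feature.name, parent.sequence[start - 1:end])
    component = Component(sub_definition, [feature.location])
    parent.sequence_features.remove(feature)
    parent.components.append(component)
    return component


orf = ComponentDefinition("orf", "ATGAAATAA")
orf.sequence_features.append(SequenceFeature("start codon", (1, 3)))
orf.sequence_features.append(SequenceFeature("stop codon", (7, 9)))

promoted = promote_to_subpart(orf, orf.sequence_features[1])
```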
Implement all changes at once in SBOL v 3.0.
- remove intermediate SequenceAnnotation and move SequenceAnnotation.location to Component
- optionally, try to flatten SequenceAnnotation - Component - ComponentDefinition chains of trivial annotations into SequenceFeature records
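The mandatory part of this conversion can be sketched on plain dictionaries as follows. The record layout (keys such as `components`, `sequenceAnnotations` and `sequenceFeatures`) is an assumption made for this illustration and is not a real serialization format. The optional flattening step is not implemented, and roles on removed annotations are simply dropped because the text above does not say how to carry them over.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def convert_2_1_to_3_0(cd: Record) -> Record:
    """Move annotation locations onto Components; keep flat features."""
    # Every Component starts out without a location.
    components = {c["id"]: {**c, "locations": []} for c in cd.get("components", [])}
    features: List[Record] = []

    for annotation in cd.get("sequenceAnnotations", []):
        target = annotation.get("component")
        if target is None:
            # Plain sequence feature: becomes a SequenceFeature record.
            features.append({k: v for k, v in annotation.items() if k != "component"})
        else:
            # Intermediate annotation: its locations move to the Component.
            components[target]["locations"].extend(annotation["locations"])

    return {
        "id": cd["id"],
        "components": list(components.values()),
        "sequenceFeatures": features,
    }


legacy = {
    "id": "device",
    "components": [
        {"id": "c1", "definition": "promoter"},
        {"id": "c2", "definition": "terminator"},
    ],
    "sequenceAnnotations": [
        {"id": "a1", "component": "c1", "locations": [(1, 35)]},
        {"id": "a2", "component": None, "name": "start codon", "locations": [(40, 42)]},
    ],
}
converted = convert_2_1_to_3_0(legacy)
```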
- conversion of localized Component:
  (1) create SequenceAnnotation record pointing to subpart Component
  (2) move location from Component to SequenceAnnotation
- conversion of non-localized Component: no change required
- conversion of SequenceFeature:
  (1) rename SequenceFeature to SequenceAnnotation
  (2) rename ComponentDefinition.sequenceFeature field to sequenceAnnotation
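The backward conversion steps can be sketched on plain dictionaries as shown below. The record layout and the `_annotation` suffix used to name the re-created records are assumptions made for this illustration only.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def convert_3_x_to_2_1(cd: Record) -> Record:
    """Re-create SequenceAnnotation records from Components and features."""
    components: List[Record] = []
    annotations: List[Record] = []

    for component in cd.get("components", []):
        locations = component.get("locations", [])
        # The 2.1 Component has no location field.
        components.append({k: v for k, v in component.items() if k != "locations"})
        if locations:
            # Localized Component: locations move to a new SequenceAnnotation.
            annotations.append({
                "id": component["id"] + "_annotation",  # hypothetical naming scheme
                "component": component["id"],
                "locations": list(locations),
            })
        # Non-localized Component: no change required.

    for feature in cd.get("sequenceFeatures", []):
        # SequenceFeature is renamed back to SequenceAnnotation.
        annotations.append({**feature, "component": None})

    return {
        "id": cd["id"],
        "components": components,
        "sequenceAnnotations": annotations,
    }


modern = {
    "id": "device",
    "components": [
        {"id": "c1", "definition": "promoter", "locations": [(1, 35)]},
        {"id": "c2", "definition": "terminator", "locations": []},
    ],
    "sequenceFeatures": [{"id": "f1", "name": "start codon", "locations": [(40, 42)]}],
}
legacy = convert_3_x_to_2_1(modern)
```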
As an added benefit, the proposed change creates a symmetry between the sequence
and the functional layer of SBOL: Component is now the equivalent of
Participation. The former describes a physical part - subpart relation whereas
the latter describes a functional part - subpart relation. Both specify one or
more role properties, both point to a (sub)ComponentDefinition. Currently,
this parallel is obfuscated by the multiple direct and indirect references
between parent and sub-part ComponentDefinition.
- It was pointed out that SequenceAnnotation already can have its own name and description fields as it is derived from Identified. The SEP was changed accordingly.
- Renaming SequenceAnnotation to SequenceFeature was universally considered a good idea (for symmetry with genbank, bioinformatics practice and in order to avoid confusion with "Annotation" in SBOL and SBML).
- At COMBINE, it was suggested that SequenceConstraint should also be allowed to point to SequenceFeature. This would avoid construction of Component - ComponentDefinition chains for, e.g., mutations or other simple features that are not sub-parts but nevertheless restrict/orient the positioning of other Components. This change has been incorporated into the SEP.
- Originally, this SequenceConstraint -> SequenceFeature link was restricted to the SequenceConstraint.object field. This was meant to enforce that Components (sub-parts) can be anchored to sequence features but not the other way round. However, the types of constraints allowed assume that the directionality of a SequenceConstraint can be freely chosen. We can say that part A precedes part B but we cannot say that part A "follows" part B. In SBOL, the latter is expressed as "part B precedes part A" (i.e. subject and object of the SequenceConstraint are reversed). For this reason, both object and subject of the constraint need to be allowed to point to SequenceFeature.
- At COMBINE, it was suggested to rename SequenceConstraint into ComponentConstraint -- this should be put into a separate SEP.
- It was suggested to put additional restrictions on the use of Component.location so that a fully specified sequence can more easily be pieced together from Component sequences. This raises an important issue with the current data model, which does not allow an easy distinction between partially defined and fully specified sequences. However, the editors consider this an orthogonal problem which should be addressed separately.
- The original SEP (see github history) suggested a step-wise introduction starting with SBOL 2.2. This would have led to a hybrid data model where both usage patterns could co-exist and was eventually considered too complex. Instead, the SEP is considered as a clean backward-incompatible change for SBOL v 3.0.
The following SEPs make complementary suggestions for further simplification of the SBOL data model:
- SEP 15 (Issue) -- rename Component -> SubPart and ComponentDefinition -> Component
- SEP 25 (Issue) -- merge Module(Definition) with Component(Definition) and remove FunctionalComponent
None.

To the extent possible under law, SBOL developers have waived all copyright and related or neighboring rights to SEP 010. This work is published from: United States.

