Some suggestions and proposals for annotations in SpatialData #975

@selmanozleyen

Hi,

I'd like to start this conversation first and then open more specific issues for the points you agree with. I have some suggestions for modifying and generalizing the internals of SpatialData annotations.

One row can only link to one spatial element

Currently, a row in a table can annotate at most one spatial element. For example, if sdata['table'][i] annotates sdata['shape'][i], then sdata['table'][i] can't also annotate sdata['label'][i].

Take, for example, this test code I wrote (#946):

sdata = concatenate(
    {
        "labels": blobs_annotating_element("blobs_labels"),
        "shapes": blobs_annotating_element("blobs_circles"),
        "points": blobs_annotating_element("blobs_points"),
        "multiscale_labels": blobs_annotating_element("blobs_multiscale_labels"),
    },
    concatenate_tables=True,
)
third_elems = sdata.tables["table"].obs["instance_id"] == 3
subset_sdata = subset_sdata_by_table_mask(sdata, "table", third_elems)
# Here more than one row in the table has instance_id 3.
# Just to be able to annotate a cell in another region, I had to duplicate the count information etc.

My conclusion

Because we store each row-to-row mapping in the table itself, we have to "explode" the table, and that forces us to duplicate the count information.
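To make the duplication concrete, here is a toy sketch in plain Python. It does not use SpatialData or AnnData; the dictionaries, the count values, and the region list are made up for illustration only.

```python
# Toy stand-in for an annotation table. The count values are invented.
counts_per_cell = {1: [5, 0, 2], 2: [1, 3, 0], 3: [0, 4, 4]}
regions = ["blobs_labels", "blobs_circles", "blobs_points"]

# Current scheme: the link lives in the rows themselves,
# so every cell needs one row per region it annotates.
exploded_rows = [
    {"region": region, "instance_id": cell_id, "counts": counts}
    for region in regions
    for cell_id, counts in counts_per_cell.items()
]

rows_for_cell_3 = [row for row in exploded_rows if row["instance_id"] == 3]
print(len(counts_per_cell), "cells become", len(exploded_rows), "rows")
print(len(rows_for_cell_3), "copies of the counts for instance_id 3")
```

Every extra region multiplies the number of rows, even though the counts for a given cell never change.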

One row can only link to one item of a spatial element

I think a one-to-many relationship is something we'd actually like to have for points. We already have it implicitly for labels, and we can support it by generalizing the current annotation scheme.
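As a rough illustration of what one-to-many means here, the sketch below groups made-up points by the instance they belong to and does the same for the pixels of a tiny label raster. The data, the background value 0, and the plain-Python containers are assumptions for the example, not SpatialData structures.

```python
from collections import defaultdict

# Made-up points; "contained_in_shape_id" is an illustrative column name.
points = [
    {"x": 0.5, "y": 1.0, "contained_in_shape_id": 3},
    {"x": 0.7, "y": 1.2, "contained_in_shape_id": 3},
    {"x": 4.0, "y": 2.5, "contained_in_shape_id": 7},
]

# One table row (identified by its instance id) maps to many points.
points_by_instance = defaultdict(list)
for index, point in enumerate(points):
    points_by_instance[point["contained_in_shape_id"]].append(index)

# Labels already behave this way: one label value covers many pixels.
labels = [
    [0, 3, 3],
    [0, 3, 7],
]
pixels_by_instance = defaultdict(list)
for row_idx, row in enumerate(labels):
    for col_idx, value in enumerate(row):
        if value != 0:  # 0 is treated as background in this sketch
            pixels_by_instance[value].append((row_idx, col_idx))

print(dict(points_by_instance))
print(dict(pixels_by_instance))
```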

My suggestion to solve both issues

Ultimately we want a mapping {src_key: {dst_element_name: (dst_access, dst_kind, link_kind, dst_instance_key)}}.

  • dst_access is the access method of the dst element, for example "value" or "key". Currently we use "value" for labels, since a raster image has no columns, and "key" for shapes and points, since they have a column we can match against.
  • dst_kind is the kind of the dst element, for example "labels", "shapes", "points".
  • link_kind is the kind of the link, for example "one-to-one", "one-to-many".
  • dst_instance_key is the key of the dst element if dst_access is "key".

Currently dst_kind serves no purpose, since the other fields already define the kind of linking we want, but I added it for future flexibility.
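One way to picture a single entry of this mapping is as a small typed record with a validation step. The sketch below is hypothetical: LinkSpec and validate_link are names invented for illustration, the allowed values are only those mentioned in this proposal, and requiring dst_instance_key to be None for "value" access is an assumption.

```python
from typing import NamedTuple, Optional, Tuple

# Values taken from the examples in this proposal; the final vocabulary is open.
DST_ACCESS = {"value", "key"}
LINK_KINDS = {"one-to-one", "one-to-many"}


class LinkSpec(NamedTuple):
    """Hypothetical record for (dst_access, dst_kind, link_kind, dst_instance_key)."""

    dst_access: str
    dst_kind: str
    link_kind: str
    dst_instance_key: Optional[Tuple[str, ...]] = None


def validate_link(spec: LinkSpec) -> LinkSpec:
    """Check the constraints described in the bullet list above."""
    if spec.dst_access not in DST_ACCESS:
        raise ValueError(f"unknown dst_access: {spec.dst_access!r}")
    if spec.link_kind not in LINK_KINDS:
        raise ValueError(f"unknown link_kind: {spec.link_kind!r}")
    # dst_instance_key is only meaningful when the element is accessed by key.
    if spec.dst_access == "key" and not spec.dst_instance_key:
        raise ValueError("dst_access 'key' requires a dst_instance_key")
    # Assumption: "value" access carries no key at all.
    if spec.dst_access == "value" and spec.dst_instance_key is not None:
        raise ValueError("dst_access 'value' must not set dst_instance_key")
    return spec


labels_link = validate_link(LinkSpec("value", "label", "one-to-one"))
circles_link = validate_link(LinkSpec("key", "shape", "one-to-one", ("shape_id",)))
```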

The user interface might look like this:

mapping = {
    "instance_id": {
        "blobs_labels": ("value", "label", "one-to-one", None), 
        "blobs_circles": ("key",   "shape", "one-to-one", ("shape_id",)),
        "parts_of_a_cell": ("key",   "shape", "one-to-many", ("shape_id",)),
        "blobs_points": ("key",   "point", "one-to-many", ("contained_in_shape_id",)),
    },
}
add_links(sdata, "table", mapping)

The mapping would be stored in exploded, normalized form, for example in sdata.tables["table"].uns["row_mappings"]:

| src_instance_key | dst_elem_name | dst_instance_key | dst_access | dst_kind | link_kind |
| --- | --- | --- | --- | --- | --- |
| "instance_id" | "blobs_labels" | ... | "value" | "label" | "one-to-one" |
| "instance_id" | "blobs_circles" | ... | "key" | "shape" | "one-to-one" |
| "instance_id" | "parts_of_a_cell" | ... | "key" | "shape" | "one-to-many" |
| "instance_id" | "blobs_points" | ... | "key" | "point" | "one-to-many" |
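A minimal sketch of how the nested mapping could be flattened into rows like the ones in this table. flatten_mapping is a hypothetical helper, not an existing SpatialData function, and a list of dictionaries stands in for whatever structure would actually be written to .uns.

```python
def flatten_mapping(mapping):
    """Turn {src_key: {dst_elem_name: (access, kind, link_kind, key)}} into flat rows."""
    rows = []
    for src_key, targets in mapping.items():
        for dst_name, (dst_access, dst_kind, link_kind, dst_key) in targets.items():
            rows.append(
                {
                    "src_instance_key": src_key,
                    "dst_elem_name": dst_name,
                    "dst_instance_key": dst_key,
                    "dst_access": dst_access,
                    "dst_kind": dst_kind,
                    "link_kind": link_kind,
                }
            )
    return rows


# Same example mapping as in the proposed user interface.
mapping = {
    "instance_id": {
        "blobs_labels": ("value", "label", "one-to-one", None),
        "blobs_circles": ("key", "shape", "one-to-one", ("shape_id",)),
        "parts_of_a_cell": ("key", "shape", "one-to-many", ("shape_id",)),
        "blobs_points": ("key", "point", "one-to-many", ("contained_in_shape_id",)),
    },
}

row_mappings = flatten_mapping(mapping)
for row in row_mappings:
    print(row)
```

The result has one row per (source key, destination element) pair, so its size depends on the number of links rather than on the number of observations in the table.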

I think we can make these changes in a backwards-compatible way, and they would open up a lot of possibilities for future extensions.

Bonus points: this would also make #293 (comment) easier to achieve, since the mapping descriptions are much smaller than an extra column in .obs.
