Context
Cross-level translation shipped with single-position input only (team decision,
2026-06-02): only single VRS Alleles representing a single-position change are
translated to the levels the assay did not map. Every multivariant input is
persisted at its authoritative assay level with no cross-level fill, and the API
records a cross_level_translation annotation status of skipped for it. This
issue tracks lifting that limitation.
Part of epic VariantEffect/mavedb-api#746.
When single-variant support shipped, the machinery that previously attempted some of
these cases — translate_from_protein_range, _adjacent_protein_haplotype_range,
_AA_TO_CODONS, MAX_RANGE_CODON_POSITIONS, and the non-single-allele branches in
mapping_records._build_record — was removed. This is therefore a redesign, not a
guard relaxation: _build_record currently routes every multivariant input (and
multi-AA single-Allele delins) to assay-level-only persistence.
Scope note: this is about multivariant input. A single-position change whose codon
edits land on non-adjacent bases already correctly emits a genomic g.[a;b] Haplotype
output — that ships and is not in scope here.
In-scope cases
Case 1: Adjacent protein haplotype / multi-AA delins
p.[Ala2Val;Pro3Gly] and p.Ala2_Pro3delinsVG are semantically identical — adjacent
amino acid positions occupy adjacent coding positions with no nucleotide gap. These
normalize to a single contiguous coding allele (c.4_9delinsXXXXXX), not a Haplotype,
and need reverse_translate_hgvs_p extended (or a dedicated range path) to accept range
inputs. (Previously scoped out of this issue; folded back in now that single-variant-only
shipped.)
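The position arithmetic behind Case 1 can be sketched as follows. This is a minimal illustration, assuming standard CDS numbering (c.1 is the first base of the start codon, so amino acid k occupies c.3k-2 through c.3k). The helper names `coding_range_for_residues` and `residues_are_adjacent` are hypothetical and not part of translate.py.

```python
def coding_range_for_residues(aa_start: int, aa_end: int) -> tuple[int, int]:
    """Return the 1-based inclusive c. range covering residues aa_start..aa_end."""
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("expected 1 <= aa_start <= aa_end")
    # Residue k occupies coding positions 3k-2 .. 3k.
    return 3 * aa_start - 2, 3 * aa_end


def residues_are_adjacent(aa_positions: list[int]) -> bool:
    """True when the residue positions form one gap-free run (Case 1)."""
    ordered = sorted(set(aa_positions))
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))


# p.[Ala2Val;Pro3Gly] and p.Ala2_Pro3delinsVG both cover residues 2..3.
print(coding_range_for_residues(2, 3))   # (4, 9)
print(residues_are_adjacent([2, 3]))     # True
print(residues_are_adjacent([2, 4]))     # False (this is Case 2)
```

A check like `residues_are_adjacent` is one way to decide whether a protein haplotype collapses to the single contiguous allele of Case 1 or needs the Haplotype path of Case 2.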
Case 2: Non-adjacent protein haplotype
p.[Ala2Val;Gly4Asp] — positions 2 and 4 with position 3 unchanged. Coding positions
4–6 and 10–12 have a nucleotide gap at 7–9. Each member is reverse-translated
independently and the results are combined into a coding Haplotype. Unlike Case 1, the
output is a Haplotype of coding Alleles, not a flat Allele.
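As a rough sketch of how independently reverse-translated members combine, the snippet below takes one candidate list per member and emits one coding haplotype string per combination. The candidate edits are illustrative only: they assume reference codons GCT at residue 2 and GGT at residue 4, which the issue does not specify. `assemble_coding_haplotypes` is a hypothetical helper, and the real path would build VRS objects rather than HGVS strings.

```python
from itertools import product


def assemble_coding_haplotypes(per_member_candidates: list[list[str]]) -> list[str]:
    """Combine per-member coding edits into HGVS-style c.[a;b] haplotype strings.

    Each inner list holds the candidate edits for one protein member,
    written without the "c." prefix.
    """
    return [
        "c.[" + ";".join(combo) + "]"
        for combo in product(*per_member_candidates)
    ]


# Hypothetical candidates for p.[Ala2Val;Gly4Asp], assuming reference codons
# GCT (c.4_6) and GGT (c.10_12).
candidates = [
    ["5C>T", "5_6delinsTC"],      # Ala2Val: GCT -> GTT or GTC
    ["11G>A", "11_12delinsAC"],   # Gly4Asp: GGT -> GAT or GAC
]
for haplotype in assemble_coding_haplotypes(candidates):
    print(haplotype)
```

With two candidates per member this yields four haplotypes. The count is the product of the per-member candidate counts, which is the fan-out that the bounding question is about.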
Case 3: Multi-member nucleotide haplotype
A genomic or coding haplotype assay variant (e.g. g.[a;b;c] or c.[a;b;c]).
Cross-level translation requires per-member c→g or g→c (deterministic,
member-independent) followed by c→p. The c→p step is only correct when no two members
share a codon — if two variants land in the same codon, independent
AssemblyMapper.c_to_p calls give wrong results and the combined coding sequence must be
translated as a unit instead (detected by comparing (position-1)//3 codon indices).
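The codon-sharing check described above could look roughly like this. It is a sketch under the assumption that each member has already been reduced to a 1-based inclusive CDS range (no UTR or intronic offsets). `members_share_codon` is a hypothetical name, not an existing function in the codebase.

```python
def codon_index(c_pos: int) -> int:
    """Zero-based codon index of a 1-based CDS position."""
    return (c_pos - 1) // 3


def members_share_codon(member_ranges: list[tuple[int, int]]) -> bool:
    """True if any two members touch the same codon.

    member_ranges holds one (start, end) CDS range per haplotype member.
    """
    seen: set[int] = set()
    for start, end in member_ranges:
        codons = set(range(codon_index(start), codon_index(end) + 1))
        if seen & codons:
            return True
        seen |= codons
    return False


print(members_share_codon([(4, 4), (6, 6)]))    # True: both fall in codon index 1
print(members_share_codon([(5, 5), (11, 11)]))  # False: codon indices 1 and 3
```

Comparing sets of codon indices, rather than a single index per member, also covers members such as a delins that spans a codon boundary.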
Open design question — bounding the fan-out
The previous implementation used an arbitrary 3-member cap. The team's direction is to
design the bound deliberately rather than ship an unexplained constant — a cap may still
be the answer, but as a conclusion, not a default. Inputs to the decision:
- Protein haplotypes: the Cartesian product of per-position candidates grows
multiplicatively. Per-position candidate counts are typically 1–5 (valid edits to the
reference codon, not all codons encoding the target amino acid).
- The database has ~3.2M variants matching p.[%] and score sets with up to 7-member
protein haplotypes, so any bound must be justified against the realistic distribution,
not just the common case.
- Options to weigh: a member cap, a candidate-count/product budget, or per-case bounds.
Whatever bound is chosen, skipped inputs must be logged (INFO) and persisted at the native
assay level, and the API's cross_level_translation status should reflect the skip.
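To make the candidate-count/product budget option concrete, here is a minimal sketch that computes the fan-out from per-position candidate counts without enumerating it, and logs at INFO when an input is skipped. The function name `should_translate` and the budget value are placeholders for illustration. No bound has been chosen.

```python
import logging
from math import prod

logger = logging.getLogger(__name__)


def should_translate(variant_id: str, candidate_counts: list[int], max_products: int) -> bool:
    """Return False (and log at INFO) when the candidate product exceeds the budget."""
    total = prod(candidate_counts)
    if total > max_products:
        logger.info(
            "Skipping cross-level translation for %s: %d candidate haplotypes exceeds budget %d",
            variant_id, total, max_products,
        )
        return False
    return True


# A 7-member protein haplotype at the upper end of the typical 1-5 candidates
# per position fans out to 5**7 = 78125 combinations.
print(prod([5] * 7))                                # 78125
print(should_translate("example", [5] * 7, 1000))   # False
print(should_translate("example", [2, 2], 1000))    # True
```

A product budget and a member cap are not equivalent: seven members with one candidate each produce a single haplotype, while three members with five candidates each produce 125.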
Implementation
Translator (translate.py)
- Reintroduce a range path for Case 1 (single contiguous coding allele from a multi-AA /
adjacent-haplotype change).
- translate_from_protein_haplotype(members, transcript): per-member reverse
translation, product of candidates, Haplotype assembly per product (Case 2).
- translate_from_nucleotide_haplotype(members, transcript): per-member c→g / g→c lift;
c→p skipped if any two members share a codon (Case 3).
- Route through translate_other_levels based on whether the input is a single Allele, a
multi-AA Allele, or a Haplotype.
_build_record (mapping_records.py)
- Replace the single-position / assay_is_single_allele routing with dispatch to the new
paths, subject to the chosen bound.
- Set translation_attempted=True on the draft for inputs that are now translated (drives
the API cross_level_translation status).
Acceptance criteria
- Multi-AA delins / adjacent protein haplotypes (≤ bound) produce a single contiguous
coding allele + genomic projection.
- Non-adjacent protein haplotypes (≤ bound) produce coding and genomic Haplotype drafts.
- Multi-member nucleotide haplotypes (≤ bound) produce cross-level drafts; c→p is skipped
(protein omitted) when any two members share a codon.
- Inputs above the bound are persisted at the native assay level only, with an INFO log and
a skipped cross_level_translation status.
- Integration tests cover: multi-AA delins, 2-member non-adjacent protein haplotype,
2-member nucleotide haplotype (non-sharing codons), 2-member nucleotide haplotype
(sharing a codon → protein skipped), and an over-bound input (native level only).