feat: amend new fields to interval schema#317
feat: amend new fields to interval schema#317project-defiant wants to merge 5 commits intomasterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request adds two new fields to the Interval schema to enhance genomic interval data representation: biosampleFromSourceId to capture the original biosample identifier from data sources, and qualityControls to store quality control flags.
Changes:
- Added
biosampleFromSourceIdfield to store the original biosample identifier as defined by datasource providers - Added
qualityControlsfield to track quality control flags for interval data - Updated both the entity model and GraphQL schema definitions to maintain consistency
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| app/models/entities/Intervals.scala | Added two new required fields to the Interval case class: biosampleFromSourceId (String) and qualityControls (Vector[String]) |
| app/models/gql/Objects.scala | Added corresponding GraphQL DocumentField definitions for the new fields with appropriate descriptions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
Comments suppressed due to low confidence (1)
app/models/entities/EnhancerToGene.scala:27
- The PR description states this change only adds two new fields (biosampleFromSourceId and qualityControls) to the Intervals dataset. However, the actual changes include a significant refactoring that renames "Interval" to "EnhancerToGene" throughout the codebase. This is a breaking API change that should be clearly documented in the PR description. If this refactoring is intentional, please update the PR description to reflect the full scope of changes. If it's unintentional, consider splitting the new field additions from the renaming refactoring into separate PRs.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| case class ClickhouseSettings( | ||
| defaultDatabaseName: String, | ||
| intervals: DbTableSettings, | ||
| enhancerToGene: DbTableSettings, |
There was a problem hiding this comment.
This configuration field rename from "intervals" to "enhancerToGene" will require updating any existing application.conf files in production environments. This should be documented in the PR description or migration guide, along with any necessary database table renaming if the table name changed from "intervals" to "enhancer_to_gene".
| case class EnhancerToGeneQuery(chromosome: String, | ||
| start: Int, | ||
| end: Int, | ||
| tableName: String, | ||
| offset: Int, | ||
| size: Int | ||
| ) extends Queryable |
There was a problem hiding this comment.
The file name "IntervalsQuery.scala" does not match the renamed case class "EnhancerToGeneQuery". Following the established codebase convention where query files are named after their contained case classes (e.g., TargetsQuery.scala contains TargetsQuery, QW2V.scala contains QW2V), this file should be renamed to "EnhancerToGeneQuery.scala".
| val tableName = getTableWithPrefixOrDefault(defaultOTSettings.clickhouse.enhancerToGene.name) | ||
| val page = pagination.getOrElse(Pagination.mkDefault).offsetLimit | ||
| val intervalsQuery = IntervalsQuery( | ||
| val intervalsQuery = EnhancerToGeneQuery( |
There was a problem hiding this comment.
The variable name "intervalsQuery" is inconsistent with the renamed case class "EnhancerToGeneQuery". For better code clarity and consistency with the refactoring, this variable should be renamed to "enhancerToGeneQuery".
| @@ -998,11 +998,11 @@ class Backend @Inject() (implicit | |||
| logger.info(s"Total intervals found: $total") | |||
There was a problem hiding this comment.
The log message "Total intervals found" uses outdated terminology. It should be updated to reflect the new naming convention, such as "Total enhancer to gene mappings found" or similar, to be consistent with the refactoring throughout the codebase.
| logger.info(s"Total intervals found: $total") | |
| logger.info(s"Total enhancer to gene mappings found: $total") |
| .executeQuery[Interval, Query](intervalsQuery.query) | ||
| .map(intervals => Intervals(total, intervals)) | ||
| .executeQuery[EnhancerToGene, Query](intervalsQuery.query) | ||
| .map(intervals => EnhancerToGenes(total, intervals)) |
There was a problem hiding this comment.
The lambda parameter "intervals" should be renamed to "enhancerToGenes" to be consistent with the refactoring and to better reflect what the data represents. This improves code readability and maintains naming consistency throughout the method.
| .map(intervals => EnhancerToGenes(total, intervals)) | |
| .map(enhancerToGenes => EnhancerToGenes(total, enhancerToGenes)) |
| "enhancerToGenes", | ||
| enhancerToGenesImp, | ||
| description = Some( | ||
| "Regulatory enhancer/promoter regions to gene (target) predictions overlapping with this variant's location. These intervals link regulatory regions to target genes based on experimental data for specific tissues or cell types." |
There was a problem hiding this comment.
The description still uses the old terminology "These intervals link" instead of reflecting the new naming convention. Consider updating to be more consistent with the refactoring, such as "These enhancer-to-gene mappings link" or keeping "intervals" as a more general descriptive term while using "enhancerToGenes" for the field name. The current mix may cause confusion.
| "Regulatory enhancer/promoter regions to gene (target) predictions overlapping with this variant's location. These intervals link regulatory regions to target genes based on experimental data for specific tissues or cell types." | |
| "Enhancer-to-gene mappings that link regulatory enhancer/promoter regions to target genes when they overlap this variant's location. These predictions are derived from experimental data for specific tissues or cell types." |
| @@ -1491,6 +1491,10 @@ object Objects extends OTLogging { | |||
| ), | |||
| DocumentField("studyId", "Identifier of the study providing the experimental data"), | |||
| DocumentField("biosampleName", "Name of the biosample where the interval was identified"), | |||
There was a problem hiding this comment.
The field description for "biosampleName" still refers to "interval" in the phrase "where the interval was identified". For consistency with the refactoring, this should be updated to refer to "regulatory region" or "enhancer/promoter region" instead of "interval".
| DocumentField("biosampleName", "Name of the biosample where the interval was identified"), | |
| DocumentField("biosampleName", | |
| "Name of the biosample where the regulatory region was identified" | |
| ), |
| ctx.value.position, | ||
| ctx.value.position, | ||
| ctx.arg(pageArg) | ||
| ) |
There was a problem hiding this comment.
This PR introduces a breaking API change by renaming the GraphQL field from "intervals" to "enhancerToGenes". This will break existing GraphQL queries that use the old field name. Consider: 1) Providing a deprecation period where both field names are supported, or 2) Documenting this as a breaking change in the PR description with migration instructions for API consumers, or 3) Bumping the API version if applicable.
| ) | |
| ) | |
| ), | |
| Field( | |
| "intervals", | |
| enhancerToGenesImp, | |
| description = Some( | |
| "Deprecated: use `enhancerToGenes` instead. Regulatory enhancer/promoter regions to gene (target) predictions overlapping with this variant's location." | |
| ), | |
| arguments = pageArg :: Nil, | |
| complexity = Some(complexityCalculator(pageArg)), | |
| deprecationReason = Some("Use `enhancerToGenes` instead."), | |
| resolve = ctx => | |
| ctx.ctx.getEnhancerToGenes(ctx.value.chromosome, | |
| ctx.value.position, | |
| ctx.value.position, | |
| ctx.arg(pageArg) | |
| ) |
Context
This PR adds the new fields
biosampleFromSourceIdqualityControlsto the Intervals dataset.