Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
🔇 Additional comments (6)
📝 WalkthroughWalkthroughThe pull request adds COCO annotation import and export support to the DagsHub data engine. Changes include updating the annotation importer to handle COCO format, adding a new Changes
Sequence DiagramsequenceDiagram
participant User
participant QueryResult
participant FieldResolver
participant AnnotationCollector
participant ImageDownloader
participant COCOExporter
participant FileSystem
User->>QueryResult: export_as_coco(download_dir, annotation_field, ...)
QueryResult->>FieldResolver: _resolve_annotation_field(annotation_field)
FieldResolver-->>QueryResult: resolved_field_name
QueryResult->>AnnotationCollector: Collect all annotations from resolved field
AnnotationCollector-->>QueryResult: annotations list
alt annotations exist
QueryResult->>ImageDownloader: Download images to download_dir/data
ImageDownloader-->>QueryResult: Downloaded image paths
QueryResult->>COCOExporter: export_to_coco_file(annotations, images, categories)
COCOExporter->>FileSystem: Write annotations.json
FileSystem-->>COCOExporter: Write confirmation
COCOExporter-->>QueryResult: Output path
QueryResult-->>User: Return Path to COCO annotations
else no annotations
QueryResult-->>User: Raise RuntimeError
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dagshub/data_engine/annotation/importer.py (1)
83-92:⚠️ Potential issue | 🟡 MinorAdd else clause to handle unsupported annotation types.
If
annotations_typeis not one of the supported values,annotation_dictremains unbound, causing anUnboundLocalErroron line 92.Proposed fix
elif self.annotations_type == "coco": annotation_dict, _ = load_coco_from_file(annotations_file) + else: + raise ValueError(f"Unsupported annotation type: {self.annotations_type}") return annotation_dict
♻️ Duplicate comments (1)
dagshub/__init__.py (1)
1-1:⚠️ Potential issue | 🔴 CriticalVersion changes should not be included in PRs.
The version bump has been flagged previously as a blocker issue. Version changes should be handled separately from feature PRs, typically during the release process.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 13ca3252-8076-4a28-a78a-8200f0235dc6
📒 Files selected for processing (9)
dagshub/__init__.pydagshub/auth/token_auth.pydagshub/data_engine/annotation/importer.pydagshub/data_engine/annotation/metadata.pydagshub/data_engine/model/query_result.pysetup.pytests/data_engine/annotation_import/test_coco.pytests/data_engine/conftest.pytests/mocks/repo_api.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build (3.11)
- GitHub Check: build (3.10)
- GitHub Check: build (3.9)
- GitHub Check: build (3.13)
- GitHub Check: build (3.12)
🔇 Additional comments (11)
dagshub/auth/token_auth.py (1)
40-40: Readability improvement with no behavior change.Line 40 is an equivalent condition and reads more idiomatically in Python. No concerns here.
dagshub/data_engine/annotation/metadata.py (2)
284-284: Move import to module level.The import should be at the top of the file with other imports, not inside the function.
Proposed fix
Add to imports at top of file:
from dagshub_annotation_converter.converters.coco import load_coco_from_json_stringThen remove line 284.
274-294: LGTM on method implementation.The method correctly:
- Parses COCO JSON via the converter
- Rewrites all annotation filenames to the datapoint's path (expected behavior when adding annotations to a specific datapoint)
- Extends the existing annotations list
- Persists via
_update_datapoint()The filename rewriting is intentional here since you're associating external annotations with a specific datapoint.
tests/data_engine/annotation_import/test_coco.py (2)
20-23: Consider moving mock to shared fixture.This
mock_source_prefixfixture should be integrated into_create_mock_datasource()inconftest.pyso it works for all datasource fixture users, not just this test module.
1-218: Good test coverage for COCO functionality.The tests comprehensively cover:
- Import from disk and error handling for missing files
- Label Studio task conversion
add_coco_annotationfilename rewriting_resolve_annotation_fieldedge cases (explicit, auto, no fields, alphabetical ordering)- Export with various configurations (bbox coordinates, empty annotations error, custom classes, custom filename, multiple datapoints)
The helper functions (
_make_coco_json,_write_coco,_make_image_bbox,_make_qr) are well-structured and reusable.tests/mocks/repo_api.py (1)
116-118: LGTM!The mock
idproperty correctly mirrors the realRepoAPI.idsignature with a constant return value appropriate for test scenarios.tests/data_engine/conftest.py (1)
8-8: LGTM!Explicitly setting
source_type = DatasourceType.REPOSITORYensures the mocked datasource state correctly exercises REPOSITORY-specific code paths (e.g.,path_parts()revision extraction), which is necessary for the new COCO annotation tests.Also applies to: 29-29
dagshub/data_engine/model/query_result.py (3)
783-791: LGTM!Good refactoring to centralize annotation field resolution. The alphabetical sorting provides deterministic selection when multiple annotation fields exist, and the error message is clear when no fields are found.
871-917: LGTM on overall export_as_coco structure.The method follows the established pattern from
export_as_yolo: resolves annotation field, collects annotations, prefixes filenames with source prefix, downloads images, and exports to the target format. Error handling for empty annotations is appropriate.
901-903: No changes required. The code correctly assignsdict(classes)tocontext.categories, which matches the documented COCO format specification of{id: name}. The importedCategoriesclass is from the YOLO module and should not be used withCocoContext, as it is format-specific to YOLO, not COCO.> Likely an incorrect or invalid review comment.dagshub/data_engine/annotation/importer.py (1)
160-163: Defensive handling for None filenames.This change prevents
TypeErrorwhenremap_funcis called withNone. The fallback tonew_filenameis reasonable since the annotation belongs to the image keyed by that filename in the dictionary.However, a
Nonefilename from the converter could indicate a bug upstream. Consider logging when this fallback is triggered to aid debugging.
| # FIXME: roll back to main after merging | ||
| # "dagshub-annotation-converter>=0.1.12", | ||
| "dagshub-annotation-converter @ " | ||
| + "git+https://github.com/DagsHub/" | ||
| + "dagshub-annotation-converter@coco_converter#egg=dagshub-annotation-converter", |
There was a problem hiding this comment.
VCS branch dependency is not suitable for release.
Pinning to a Git branch (coco_converter) in install_requires will break reproducibility and may fail for end users if the branch is rebased, force-pushed, or deleted. This dependency pattern is acceptable for local development but should not be merged to main.
Ensure the upstream dagshub-annotation-converter PR is merged and a versioned release is published before merging this PR, then revert to a version-pinned dependency (e.g., dagshub-annotation-converter>=0.1.13).
Based on changes to dagshub-annotation-converter to the same effect: DagsHub/dagshub-annotation-converter#26
Implementation Details
The PR extends annotation support through a multi-layered approach:
Importer Module Enhancements (
importer.py):AnnotationTypeliteral to include "coco",Metadata Annotations Support (
metadata.py):add_coco_annotationmethod to load COCO-format JSON and integrate annotations into metadataQuery Result Export API (
query_result.py):export_as_coco()- generates COCO JSON with optional class mapping and custom output filename_resolve_annotation_fieldto auto-select available annotation field with alphabetical fallbackTest Coverage:
test_coco.py(218 lines): Import/export flows, filename rewriting, field resolution, class mapping, output customizationCode Quality and Testing Infrastructure:
ruff.tomllinting configuration with comprehensive rule set including:__init__.pyandconftest.pyDatasourceType.REPOSITORYassignment in test fixtures (tests/data_engine/conftest.py)idproperty toMockRepoAPItest utilitytoken_auth.py(no behavioral change - logical equivalence in boolean expression)Documentation Note: Per PR comments, the supported formats list at dagshub.com/docs/use_cases/annotation/#supported-formats-for-annotation_imports requires manual update to reflect COCO BBOX, and COCO Segmentation support.