Pull requests: datajuicer/data-juicer

Add normal map op, optimal flow op, and universal segmentation op for videos. Labels: dj:multimodal, dj:op, enhancement
#970 opened Apr 27, 2026 by Qirui-jiao (Collaborator)
[WIP] feat(agent): training-ready data recipes, learnable-value mappers, cross-model similarity. Labels: agent, dj:op, dj:post-tuning
#969 opened Apr 20, 2026 by yxdyc (Collaborator)
[WIP] feat: add persistent custom operator registry
#968 opened Apr 15, 2026 by cmgzn (Collaborator)
Add face keypoints/animal pose ops & Extend ops for frame-sequence input. Labels: dj:op, enhancement
#966 opened Apr 14, 2026 by Qirui-jiao (Collaborator)
refactor: declarative schema for configuration
#963 opened Apr 8, 2026 by cmgzn (Collaborator)
better parallelism in partitioned ray executor
#945 opened Mar 17, 2026 by cyruszhang (Collaborator), Draft
[WIP] feat: Integrate ElasticJuicer Core Modules
#934 opened Mar 11, 2026 by fengrui-z (Collaborator)
1 of 4 tasks
Feat: update vla ops and add val pipeline demo
#931 opened Mar 6, 2026 by Cathy0908 (Collaborator)
[WIP] arXiv/PDF to Markdown mappers + dj-op one-shot runner. Labels: dj:op
#917 opened Feb 14, 2026 by yxdyc (Collaborator)
[WIP] Multi-branch executor. Labels: dj:core, enhancement
#916 opened Feb 13, 2026 by yxdyc (Collaborator)
[WIP] feat: Add combined_logical_filter operator with AND/OR support. Labels: dj:op
#914 opened Feb 13, 2026 by yxdyc (Collaborator)
Feat: Support paimon, iceberg, hudi, delta lake, hdfs data source.
#911 opened Feb 11, 2026 by Dludora (Collaborator)
[WIP] Feat: Add RayImageBTSMinhashDeduplicator
#897 opened Jan 29, 2026 by Dludora (Collaborator)
Depth seg new op. Labels: dj:op
#862 opened Dec 22, 2025 by archernsy
[NewOp] Add group_diversity_filter op
#745 opened Jul 22, 2025 by lingzhq (Collaborator)
Add lidar object segmentation op
#736 opened Jul 14, 2025 by Qirui-jiao (Collaborator)
[WIP] add lidar object detection op
#721 opened Jun 26, 2025 by Cathy0908 (Collaborator)
Optimization framework. Labels: dj:core, dj:efficiency
#702 opened Jun 13, 2025 by cyruszhang (Collaborator), Draft
[NewOp] Add domain_diversity_selector based on DaaR principles
#699 opened Jun 12, 2025 by lingzhq (Collaborator)
[WIP] deduping benchmark suite
#607 opened Mar 4, 2025 by cyruszhang (Collaborator)