Add data organization extension and restructure Data Efficacy documentation#20

Merged: dialogueeeeee merged 12 commits into main from add-data-ordering on May 24, 2026

@dialogueeeeee
Collaborator

Summary

This PR restructures the DELT repository into a scalable Data Efficacy codebase and adds the ACL 2026 follow-up work, Demystifying Data Organization for Enhanced LLM Training.

Main changes:

  • Adds new data ordering methods for the ACL 2026 data organization work.
  • Keeps the original DELT scoring, selection, training, and evaluation pipeline intact.
  • Reorganizes documentation into paper-specific pages under docs/.
  • Updates the root README to introduce Data Efficacy, list the supported works, and document usage of the shared pipeline.

Validation

  • Checked markdown links locally.
  • Checked whitespace with git diff --check.
  • Full pre-training, SFT, and evaluation experiments require the DELT GPU environment.
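
For reviewers who want to repeat the link check, the snippet below is a minimal sketch of one way to verify relative markdown links. It is not necessarily the tool used for this PR: the function name `broken_relative_links` and the regular expression are hypothetical, and the sketch only handles inline `[text](target)` links, skipping external URLs and in-page anchors.

```python
import re
from pathlib import Path

# Matches inline markdown links of the form [text](target) or [text](target "title").
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def broken_relative_links(root):
    """Return (markdown file, target) pairs whose relative target does not exist.

    Hypothetical helper for illustration; it is not part of the DELT repository.
    """
    root = Path(root)
    broken = []
    for md in sorted(root.rglob("*.md")):
        for target in LINK_RE.findall(md.read_text(encoding="utf-8")):
            # External URLs, mail links, and in-page anchors are not checked here.
            if target.startswith(("http://", "https://", "mailto:", "#")):
                continue
            # Drop any "#section" suffix before testing the file path.
            path = target.split("#", 1)[0]
            if not (md.parent / path).exists():
                broken.append((str(md.relative_to(root)), target))
    return broken
```

Calling `broken_relative_links(".")` from the repository root would return an empty list when every relative link under the root README and docs/ resolves to an existing file.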

@dialogueeeeee
Collaborator Author

@microsoft-github-policy-service agree company="Microsoft"

@dialogueeeeee merged commit 0477543 into main on May 24, 2026
4 checks passed