- [2026/01] Released v1.0.0-alpha.0:
  - Major updates: Refactored the code base by moving hardware-specific (multi-chip) support into plugin repositories such as TransformerEngine-FL and vllm-plugin-FL. These plugins build on top of FlagOS, a unified open-source AI system software stack. We also re-initialized the Git commit history to reduce the repository size.
  - Compatibility caveats: If you are using or upgrading from a version earlier than v1.0.0-alpha.0, please use the `main-legacy` branch. It will continue to receive critical bug fixes and minor updates for a period of time.
- [2025/09] Released v0.9.0:
  - Training & fine-tuning: Added LoRA for efficient fine-tuning, improved the autotuner for cross-chip heterogeneous training, and enabled distributed RWKV training.
  - Inference & serving: Introduced DiffusionEngine for FLUX.1-dev, Qwen-Image, and Wan2.1-T2V, supporting multi-model automatic orchestration and dynamic scaling.
  - Embodied AI: Full lifecycle support for Robobrain, Robotics, and PI0, plus semantic retrieval for MCP-based skills for RoboOS.
  - Elasticity & fault tolerance: Automatically detects task status (errors, hangs, etc.) and records it periodically.
  - Hardware & system: Broader chip support, an upgraded patch mechanism with file-level diffs, and enhanced CI/CD for different chips.
- [2025/04] Released v0.8.0:
  - Introduced a new flexible and robust multi-backend mechanism and updated vendor adaptation methods.
  - Enabled heterogeneous prefill-decoding disaggregation across vendor chips within a single instance via FlagCX (beta).
  - Upgraded DeepSeek-V3 pre-training with the new Megatron-LM and added heterogeneous pre-training across different chips for MoE models such as DeepSeek-V3.
- [2025/02] Released v0.6.5:
  - Added support for DeepSeek-V3 distributed pre-training (beta) and DeepSeek-V3/R1 serving across multiple chips.
  - Introduced an auto-tuning feature for serving and a new CLI feature for one-click deployment.
  - Enhanced the CI/CD system to support more chips and integrated the workflow of FlagRelease.
- [2024/11] Released v0.6.0:
  - Introduced general multi-dimensional heterogeneous parallelism and CPU-based communication between different chips.
  - Added full support for LLaVA-OneVision, achieving SOTA results on the Infinity-MM dataset.
  - Open-sourced the optimized CFG implementation and accelerated the generation and understanding tasks for Emu3.
  - Implemented the auto-tuning feature and enhanced the CI/CD system.
- [2024/04] Released v0.3:
  - Achieved heterogeneous hybrid training of the Aquila2-70B-Expr model on a cluster using both NVIDIA and Iluvatar chips. Adapted the Aquila2 series to AI chips from six different manufacturers.
- [2023/11] Released v0.2:
  - Introduced training support for Aquila2-70B-Expr, enabling heterogeneous training across chips with the same or compatible architectures.
-
[2023/10] Released v0.1:
- Supported Aquila models with optimized training schemes for Aquila2-7B and Aquila2-34B, including parallel strategies, optimizations, and hyper-parameter settings.