Change History

  • [2026/01] Released v1.0.0-alpha.0:

    • Major updates: Refactored the codebase by moving hardware-specific (multi-chip) support into plugin repositories such as TransformerEngine-FL and vllm-plugin-FL. These plugins build on top of FlagOS, a unified open-source AI system software stack. We also re-initialized the Git commit history to reduce repository size.

    • Compatibility caveats: If you are using or upgrading from a version earlier than v1.0.0-alpha.0, please use the main-legacy branch. It will continue to receive critical bug fixes and minor updates for a period of time.

  • [2025/09] Released v0.9.0:

    • Training & fine-tuning: Added LoRA for efficient fine-tuning, improved the autotuner for cross-chip heterogeneous training, and enabled distributed RWKV training.

    • Inference & serving: Introduced DiffusionEngine for FLUX.1-dev, Qwen-Image, and Wan2.1-T2V, supporting multi-model automatic orchestration and dynamic scaling.

    • Embodied AI: Added full lifecycle support for Robobrain, Robotics, and PI0, plus semantic retrieval of MCP-based skills in RoboOS.

    • Elasticity & fault tolerance: Added automatic detection of task status (errors, hangs, etc.), which is recorded periodically.

    • Hardware & system: Broadened chip support, upgraded the patch mechanism with file-level diffs, and enhanced CI/CD for different chips.

  • [2025/04] Released v0.8.0:

    • Introduced a new flexible and robust multi-backend mechanism and updated vendor adaptation methods.

    • Enabled heterogeneous prefill-decoding disaggregation across vendor chips within a single instance via FlagCX (beta).

    • Upgraded DeepSeek-V3 pre-training with the new Megatron-LM and added heterogeneous pre-training across different chips for MoE models like DeepSeek-V3.

  • [2025/02] Released v0.6.5:

    • Added support for DeepSeek-V3 distributed pre-training (beta) and DeepSeek-V3/R1 serving across multiple chips.

    • Introduced an auto-tuning feature for serving and a new CLI feature for one-click deployment.

    • Enhanced the CI/CD system to support more chips and integrated the FlagRelease workflow.

  • [2024/11] Released v0.6.0:

    • Introduced general multi-dimensional heterogeneous parallelism and CPU-based communication between different chips.

    • Added full support for LLaVA-OneVision, achieving SOTA results on the Infinity-MM dataset.

    • Open-sourced the optimized CFG implementation and accelerated the generation and understanding tasks for Emu3.

    • Implemented the auto-tuning feature and enhanced the CI/CD system.

  • [2024/04] Released v0.3:

    • Achieved heterogeneous hybrid training of the Aquila2-70B-Expr model on a cluster using both NVIDIA and Iluvatar chips. Adapted the Aquila2 series to AI chips from six different manufacturers.

  • [2023/11] Released v0.2:

    • Introduced training support for Aquila2-70B-Expr, enabling heterogeneous training across chips with the same or compatible architectures.

  • [2023/10] Released v0.1:

    • Supported Aquila models with optimized training schemes for Aquila2-7B and Aquila2-34B, including parallel strategies, optimizations, and hyper-parameter settings.