Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces training configurations and run scripts for flux2_klein and longcat_image models, alongside refactoring model loaders to extract load_transformer methods. Feedback highlights critical issues: the newly added models lack the required load_full_weights_for_resume method, which will cause crashes when resuming training. Additionally, the configuration files contain absolute local paths that hinder portability, and the run scripts should allow overriding CUDA_VISIBLE_DEVICES for better flexibility. Finally, a copy-paste error in run_longcat_image_t2i_lora.sh needs to be corrected as the GPU allocation does not match the single-process configuration.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| def load_transformer(self, model_path=None): | ||
| model_path = model_path or self.config["model"]["pretrained_model_name_or_path"] | ||
| return Flux2Transformer2DModel.from_pretrained(model_path, subfolder="transformer").to(self.device, dtype=self.running_dtype) |
There was a problem hiding this comment.
The BaseModel class defines load_full_weights_for_resume as a required method for resuming full training, raising NotImplementedError if not implemented. Since Flux2KleinModel does not implement this method, attempting to resume full training will crash. Implementing it using the newly parameterized load_transformer method resolves this issue.
| def load_transformer(self, model_path=None): | |
| model_path = model_path or self.config["model"]["pretrained_model_name_or_path"] | |
| return Flux2Transformer2DModel.from_pretrained(model_path, subfolder="transformer").to(self.device, dtype=self.running_dtype) | |
| def load_transformer(self, model_path=None): | |
| model_path = model_path or self.config["model"]["pretrained_model_name_or_path"] | |
| return Flux2Transformer2DModel.from_pretrained(model_path, subfolder="transformer").to(self.device, dtype=self.running_dtype) | |
| def load_full_weights_for_resume(self, resume_ckpt_path): | |
| self.transformer = self.load_transformer(resume_ckpt_path) |
| def load_transformer(self, model_path=None): | ||
| model_path = model_path or self.config["model"]["pretrained_model_name_or_path"] | ||
| transformer = LongCatImageTransformer2DModel.from_pretrained(model_path, subfolder="transformer").to(self.device, dtype=self.running_dtype) | ||
| attention_backend = self.config["model"].get("attention_backend", None) | ||
| if attention_backend is not None: | ||
| self.transformer.set_attention_backend(attention_backend) | ||
| transformer.set_attention_backend(attention_backend) | ||
| return transformer |
There was a problem hiding this comment.
The BaseModel class defines load_full_weights_for_resume as a required method for resuming full training. Since LongCatImageModel does not implement this method, attempting to resume full training will crash. Implementing it using the newly parameterized load_transformer method resolves this issue.
| def load_transformer(self, model_path=None): | |
| model_path = model_path or self.config["model"]["pretrained_model_name_or_path"] | |
| transformer = LongCatImageTransformer2DModel.from_pretrained(model_path, subfolder="transformer").to(self.device, dtype=self.running_dtype) | |
| attention_backend = self.config["model"].get("attention_backend", None) | |
| if attention_backend is not None: | |
| self.transformer.set_attention_backend(attention_backend) | |
| transformer.set_attention_backend(attention_backend) | |
| return transformer | |
| def load_transformer(self, model_path=None): | |
| model_path = model_path or self.config["model"]["pretrained_model_name_or_path"] | |
| transformer = LongCatImageTransformer2DModel.from_pretrained(model_path, subfolder="transformer").to(self.device, dtype=self.running_dtype) | |
| attention_backend = self.config["model"].get("attention_backend", None) | |
| if attention_backend is not None: | |
| transformer.set_attention_backend(attention_backend) | |
| return transformer | |
| def load_full_weights_for_resume(self, resume_ckpt_path): | |
| self.transformer = self.load_transformer(resume_ckpt_path) |
| @@ -0,0 +1,92 @@ | |||
| model: | |||
| name: flux2_klein | |||
| pretrained_model_name_or_path: /data/nvme1/models/FLUX.2-Klein-9B | |||
| @@ -0,0 +1,95 @@ | |||
| model: | |||
| name: longcat_image | |||
| pretrained_model_name_or_path: /data/nvme1/models/meituan-longcat/LongCat-Image | |||
There was a problem hiding this comment.
| model: | ||
| name: qwen_image | ||
| pretrained_model_name_or_path: /path/to/Qwen/Qwen-Image-2512 | ||
| pretrained_model_name_or_path: /data/nvme1/wushuo/hf_models/models/Qwen-Image-2512 |
There was a problem hiding this comment.
The configuration file has been updated with absolute paths specific to a local environment (/data/nvme1/...). This reduces the portability of the configuration and can cause immediate failures for other users. It is highly recommended to use placeholder paths (e.g., /path/to/...) as was done previously, allowing users to configure their own paths.
pretrained_model_name_or_path: /path/to/Qwen/Qwen-Image-2512| @@ -0,0 +1,9 @@ | |||
| #!/bin/bash | |||
| # full parameters train use fsdp2 by default | |||
| export CUDA_VISIBLE_DEVICES=0,1,2,3 | |||
There was a problem hiding this comment.
| @@ -0,0 +1,8 @@ | |||
| #!/bin/bash | |||
|
|
|||
| export CUDA_VISIBLE_DEVICES=2 | |||
| @@ -0,0 +1,9 @@ | |||
| #!/bin/bash | |||
| # full parameters train use fsdp2 by default | |||
| export CUDA_VISIBLE_DEVICES=4,5,6,7 | |||
There was a problem hiding this comment.
| # full parameters train use fsdp2 by default | ||
| export CUDA_VISIBLE_DEVICES=4,5,6,7 | ||
| export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True |
There was a problem hiding this comment.
There are a couple of issues in this script:
- The comment on line 2 refers to 'full parameters train use fsdp2 by default', but this is a LoRA training script.
CUDA_VISIBLE_DEVICESis set to 4 GPUs (4,5,6,7), but--nproc_per_nodeis set to1. This is likely a copy-paste error from the full training script. It should target a single GPU by default, and ideally allow user override via${CUDA_VISIBLE_DEVICES:-...}.
| # full parameters train use fsdp2 by default | |
| export CUDA_VISIBLE_DEVICES=4,5,6,7 | |
| export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True | |
| export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-4} | |
| export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True |
No description provided.