Skip to content

[Train]: add longcat & flux full scripts#1171

Open
Watebear wants to merge 1 commit into
mainfrom
train
Open

[Train]: add longcat & flux full scripts#1171
Watebear wants to merge 1 commit into
mainfrom
train

Conversation

@Watebear

Copy link
Copy Markdown
Collaborator

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces training configurations and run scripts for flux2_klein and longcat_image models, alongside refactoring model loaders to extract load_transformer methods. Feedback highlights critical issues: the newly added models lack the required load_full_weights_for_resume method, which will cause crashes when resuming training. Additionally, the configuration files contain absolute local paths that hinder portability, and the run scripts should allow overriding CUDA_VISIBLE_DEVICES for better flexibility. Finally, a copy-paste error in run_longcat_image_t2i_lora.sh needs to be corrected as the GPU allocation does not match the single-process configuration.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +39 to +41
def load_transformer(self, model_path=None):
model_path = model_path or self.config["model"]["pretrained_model_name_or_path"]
return Flux2Transformer2DModel.from_pretrained(model_path, subfolder="transformer").to(self.device, dtype=self.running_dtype)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The BaseModel class defines load_full_weights_for_resume as a required method for resuming full training, raising NotImplementedError if not implemented. Since Flux2KleinModel does not implement this method, attempting to resume full training will crash. Implementing it using the newly parameterized load_transformer method resolves this issue.

Suggested change
def load_transformer(self, model_path=None):
model_path = model_path or self.config["model"]["pretrained_model_name_or_path"]
return Flux2Transformer2DModel.from_pretrained(model_path, subfolder="transformer").to(self.device, dtype=self.running_dtype)
def load_transformer(self, model_path=None):
model_path = model_path or self.config["model"]["pretrained_model_name_or_path"]
return Flux2Transformer2DModel.from_pretrained(model_path, subfolder="transformer").to(self.device, dtype=self.running_dtype)
def load_full_weights_for_resume(self, resume_ckpt_path):
self.transformer = self.load_transformer(resume_ckpt_path)

Comment on lines +40 to +46
def load_transformer(self, model_path=None):
model_path = model_path or self.config["model"]["pretrained_model_name_or_path"]
transformer = LongCatImageTransformer2DModel.from_pretrained(model_path, subfolder="transformer").to(self.device, dtype=self.running_dtype)
attention_backend = self.config["model"].get("attention_backend", None)
if attention_backend is not None:
self.transformer.set_attention_backend(attention_backend)
transformer.set_attention_backend(attention_backend)
return transformer

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The BaseModel class defines load_full_weights_for_resume as a required method for resuming full training. Since LongCatImageModel does not implement this method, attempting to resume full training will crash. Implementing it using the newly parameterized load_transformer method resolves this issue.

Suggested change
def load_transformer(self, model_path=None):
model_path = model_path or self.config["model"]["pretrained_model_name_or_path"]
transformer = LongCatImageTransformer2DModel.from_pretrained(model_path, subfolder="transformer").to(self.device, dtype=self.running_dtype)
attention_backend = self.config["model"].get("attention_backend", None)
if attention_backend is not None:
self.transformer.set_attention_backend(attention_backend)
transformer.set_attention_backend(attention_backend)
return transformer
def load_transformer(self, model_path=None):
model_path = model_path or self.config["model"]["pretrained_model_name_or_path"]
transformer = LongCatImageTransformer2DModel.from_pretrained(model_path, subfolder="transformer").to(self.device, dtype=self.running_dtype)
attention_backend = self.config["model"].get("attention_backend", None)
if attention_backend is not None:
transformer.set_attention_backend(attention_backend)
return transformer
def load_full_weights_for_resume(self, resume_ckpt_path):
self.transformer = self.load_transformer(resume_ckpt_path)

@@ -0,0 +1,92 @@
model:
name: flux2_klein
pretrained_model_name_or_path: /data/nvme1/models/FLUX.2-Klein-9B

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Avoid committing absolute local paths (e.g., /data/nvme1/...) to the repository as they are environment-specific and break portability. Please use placeholder paths like /path/to/... instead.

    pretrained_model_name_or_path: /path/to/FLUX.2-Klein-9B

@@ -0,0 +1,95 @@
model:
name: longcat_image
pretrained_model_name_or_path: /data/nvme1/models/meituan-longcat/LongCat-Image

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Avoid committing absolute local paths (e.g., /data/nvme1/...) to the repository as they are environment-specific and break portability. Please use placeholder paths like /path/to/... instead.

    pretrained_model_name_or_path: /path/to/meituan-longcat/LongCat-Image

model:
name: qwen_image
pretrained_model_name_or_path: /path/to/Qwen/Qwen-Image-2512
pretrained_model_name_or_path: /data/nvme1/wushuo/hf_models/models/Qwen-Image-2512

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The configuration file has been updated with absolute paths specific to a local environment (/data/nvme1/...). This reduces the portability of the configuration and can cause immediate failures for other users. It is highly recommended to use placeholder paths (e.g., /path/to/...) as was done previously, allowing users to configure their own paths.

    pretrained_model_name_or_path: /path/to/Qwen/Qwen-Image-2512

@@ -0,0 +1,9 @@
#!/bin/bash
# full parameters train use fsdp2 by default
export CUDA_VISIBLE_DEVICES=0,1,2,3

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Allowing the user to override CUDA_VISIBLE_DEVICES via environment variables makes the script much more flexible for multi-user or multi-GPU environments.

Suggested change
export CUDA_VISIBLE_DEVICES=0,1,2,3
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0,1,2,3}

@@ -0,0 +1,8 @@
#!/bin/bash

export CUDA_VISIBLE_DEVICES=2

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Allowing the user to override CUDA_VISIBLE_DEVICES via environment variables makes the script much more flexible.

Suggested change
export CUDA_VISIBLE_DEVICES=2
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}

@@ -0,0 +1,9 @@
#!/bin/bash
# full parameters train use fsdp2 by default
export CUDA_VISIBLE_DEVICES=4,5,6,7

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Allowing the user to override CUDA_VISIBLE_DEVICES via environment variables makes the script much more flexible for multi-user or multi-GPU environments.

Suggested change
export CUDA_VISIBLE_DEVICES=4,5,6,7
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-4,5,6,7}

Comment on lines +2 to +4
# full parameters train use fsdp2 by default
export CUDA_VISIBLE_DEVICES=4,5,6,7
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There are a couple of issues in this script:

  1. The comment on line 2 refers to 'full parameters train use fsdp2 by default', but this is a LoRA training script.
  2. CUDA_VISIBLE_DEVICES is set to 4 GPUs (4,5,6,7), but --nproc_per_node is set to 1. This is likely a copy-paste error from the full training script. It should target a single GPU by default, and ideally allow user override via ${CUDA_VISIBLE_DEVICES:-...}.
Suggested change
# full parameters train use fsdp2 by default
export CUDA_VISIBLE_DEVICES=4,5,6,7
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-4}
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant