support mtp for gemma4 by WANDY666 · Pull Request #1316 · ModelTC/LightLLM

WANDY666 · 2026-05-22T08:15:24Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces comprehensive support for the Gemma-4 model family, including multimodal vision capabilities and Multi-Token Prediction (MTP) assistant models. Key technical additions include heterogeneous attention mechanisms for sliding window and full attention layers, tanh-approximate GELU activations in MoE kernels, and a specialized eagle_frozen_kv MTP mode. The implementation also features a new reasoning parser for Gemma-4's Harmony-like format and updates to various Triton kernels. Feedback on the code changes suggests adopting more idiomatic PyTorch advanced indexing for row selection in the MTP post-layer inference and improving robustness by replacing bare except blocks with except Exception in configuration utilities.
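For readers unfamiliar with the "tanh-approximate GELU" mentioned in the summary, the following is a minimal scalar sketch of the standard formula next to the exact erf-based GELU. It is not taken from the PR's Triton kernels; the function names `gelu_exact` and `gelu_tanh` are illustrative only.

```python
import math


def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Compare the two variants on a few sample inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
```

The two variants agree closely but not bit-for-bit, which is presumably why a model trained with the tanh form needs the matching activation in its kernels.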

gemini-code-assist · 2026-05-22T08:17:21Z

+            token_num, num_selected, H
+        )
+        # Sparse logits: dot product per token vs its selected rows.


Using advanced indexing is more idiomatic and readable than index_select followed by a view when selecting rows from a weight matrix. PyTorch's advanced indexing handles this pattern efficiently.

Suggested change

token_num, num_selected, H
)
# Sparse logits: dot product per token vs its selected rows.
selected_embeddings = lm_head_w[selected_vocab]
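As an illustration of the reviewer's point, here is a small sketch that checks the two row-selection styles produce the same tensor. The names `lm_head_w`, `selected_vocab`, `token_num`, `num_selected` and `H` come from the snippet above; the `index_select(...).view(...)` line is an assumed reconstruction of the original code (the full diff is not shown here), and `hidden`, the tensor sizes and the `einsum` step are assumptions made only for this example.

```python
import torch

torch.manual_seed(0)
vocab_size, H = 32, 8            # assumed toy sizes
token_num, num_selected = 4, 3

lm_head_w = torch.randn(vocab_size, H)   # [vocab_size, H]
hidden = torch.randn(token_num, H)       # [token_num, H], hypothetical hidden states
selected_vocab = torch.randint(0, vocab_size, (token_num, num_selected))

# Style 1 (assumed original): index_select needs a 1-D index, so flatten, then reshape.
rows_a = lm_head_w.index_select(0, selected_vocab.reshape(-1)).view(
    token_num, num_selected, H
)

# Style 2 (suggested): advanced indexing with a 2-D index returns [token_num, num_selected, H].
rows_b = lm_head_w[selected_vocab]

same_rows = torch.equal(rows_a, rows_b)

# Sparse logits: dot product of each token's hidden state with its selected rows.
sparse_logits = torch.einsum("tkh,th->tk", rows_b, hidden)
print(same_rows, tuple(sparse_logits.shape))
```

Both styles gather the same rows; the suggested form simply drops the manual flatten and reshape.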

gemini-code-assist · 2026-05-22T08:17:21Z

+            return [eos_token_id]
+        elif isinstance(eos_token_id, list):
+            return list(eos_token_id)
+    except:


Avoid using bare except: as it can catch unexpected errors like KeyboardInterrupt or SystemExit, making debugging difficult. Use except Exception: instead.

Suggested change

-    except:
+    except Exception:
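A minimal sketch of how a helper like the one under review might look with the suggested change applied. Only the `int` and `list` branches are taken from the diff above; the function name `get_eos_token_ids`, the JSON-string input, the warning log and the empty-list fallback are assumptions for illustration and are not the PR's actual code.

```python
import json
import logging

logger = logging.getLogger(__name__)


def get_eos_token_ids(config_text: str) -> list:
    """Hypothetical helper: return eos_token_id as a list, or [] if it cannot be read."""
    try:
        eos_token_id = json.loads(config_text)["eos_token_id"]
        if isinstance(eos_token_id, int):
            return [eos_token_id]
        elif isinstance(eos_token_id, list):
            return list(eos_token_id)
    except Exception:
        # Catches malformed JSON or a missing key, but lets KeyboardInterrupt
        # and SystemExit propagate. The fallback below is an assumption.
        logger.warning("could not read eos_token_id", exc_info=True)
    return []


print(get_eos_token_ids('{"eos_token_id": 1}'))
print(get_eos_token_ids('{"eos_token_id": [1, 106]}'))
print(get_eos_token_ids("not json"))
```

Logging inside the handler keeps the failure visible instead of silently swallowing it, which addresses the debugging concern raised in the comment.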

gemini-code-assist · 2026-05-22T08:17:22Z

+        if model_type in ["gemma4"]:
+            logger.info("Gemma4 uses tanh-approximate-gelu for FFN")
+            return True
+    except:


Avoid using bare except: as it can catch unexpected errors. Use except Exception: instead to follow best practices for error handling.

Suggested change

-    except:
+    except Exception:
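The difference the reviewer describes can be demonstrated directly. In this small standalone sketch (the function names are illustrative and not part of the PR), a bare `except:` swallows `KeyboardInterrupt` and `SystemExit`, while `except Exception:` does not, because both derive from `BaseException` rather than `Exception`.

```python
def swallowed_by_bare_except(exc: BaseException) -> bool:
    try:
        raise exc
    except:  # noqa: E722 - deliberately bare for the demonstration
        return True


def swallowed_by_except_exception(exc: BaseException) -> bool:
    try:
        raise exc
    except Exception:
        return True
    except BaseException:
        # Reached only for exceptions that `except Exception` did not catch.
        # Caught here solely so the demo can report the result.
        return False


for exc in (ValueError("bad config"), KeyboardInterrupt(), SystemExit(1)):
    print(
        type(exc).__name__,
        swallowed_by_bare_except(exc),
        swallowed_by_except_exception(exc),
    )
```

Only the `ValueError` is handled by both forms; the other two are caught by the bare `except:` alone.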

WANDY666 and others added 30 commits April 30, 2026 02:47

support 31B

21c3eeb

fix

99b790c

Merge branch 'main' of https://github.com/ModelTC/LightLLM into support_gemma4

4c30c73

support moe

15a5379

support e4b (PLE and shared_kv)

83f4983

support visual module

d969a5f

optimize sliding window

08f066d

fix

7678de8

simplify

63c658a

minor improvements

300e577

fix

50822f0

fix attention cuda graph

b4b13cc

fused gelu gate up

f19074b

add out_dtype

5b61450

minor improvements

c0ca212

fix eos_token_ids

9499a00

for HF format

de7e220

Merge branch 'main' of https://github.com/ModelTC/LightLLM into support_gemma4

bfc59ff

fix window_size

109d27c

fix window_size

2ea258e

fix

b297af5

add reasoning_parser for gemma4

7a81e85

[fix]ple support cudagraph

d619534

fix PLE illegal memory access

c2578c0

support sliding_window_right

d744cbc

fix notes

05a0db8

tune in H200

6f1bd2e

fix

90643db

fix

a2b74ab

fix

e606e05

hiworldwzj and others added 25 commits May 20, 2026 02:32

fix

afa0194

fix

46ce6af

fix

0188c10

fix

393ec69

fix

e96c2b7

fix

f806326

fix

c5b2b81

fix

3bd46d7

Merge branch 'support_gemma4' of https://github.com/ModelTC/LightLLM into support_gemma4

91051f0

fix

fb75045

fix

7c664c3

fix

0d35e8b

fix

74a4b1f

fix

d2df0a0

fix

3491641

fix

c8812f2

fix

8f160b5

fix

ee92fee

fix

131a163

fix

6d7729f

fix

87da477

Merge branch 'main' of https://github.com/ModelTC/LightLLM into support_gemma4

819497c

format

c57e062

finish

6ebf9db

fix

e682f9b

gemini-code-assist Bot reviewed May 22, 2026

WANDY666 added 4 commits May 22, 2026 08:41

Merge https://github.com/ModelTC/LightLLM into gemma4_mtp

c28f085

format

6099413

format

ba47045

format

33d7ceb

support mtp for gemma4 #1316
WANDY666 wants to merge 60 commits into main from gemma4_mtp

2 participants
