Fix moderation filter checking wrong model in arena side-by-side views #3842

Open
Chessing234 wants to merge 1 commit into lm-sys:main from Chessing234:fix/moderation-filter-arena-bypass

Summary

  • Fixes a copy-paste bug where all_conv_text_right read from states[0] (left model) instead of states[1] (right model) in two arena files
  • This meant the right-side model's conversation was never checked by the moderation filter, allowing content policy violations to bypass detection
  • The same bug was already fixed in gradio_block_arena_named.py (commit 34eca62) but missed in gradio_block_arena_anony.py and gradio_block_arena_vision_named.py

Fixes #3794

Changes

  • fastchat/serve/gradio_block_arena_anony.py line 310: states[0] → states[1]
  • fastchat/serve/gradio_block_arena_vision_named.py line 245: states[0] → states[1]
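
To illustrate why the index matters, here is a minimal, self-contained sketch of the failure mode. It is not the FastChat code: `FakeState`, `toy_filter`, and both `build_text_*` helpers are hypothetical stand-ins invented for this example, and the way the two transcripts are concatenated is an assumption. It only shows that reading `states[0]` twice hides the right-side conversation from whatever check runs afterwards.

```python
from dataclasses import dataclass


@dataclass
class FakeState:
    """Hypothetical stand-in for one side's conversation state."""
    transcript: str


def toy_filter(text: str) -> bool:
    """Hypothetical moderation check: flags text containing a marker word."""
    return "FORBIDDEN" in text


def build_text_buggy(states, user_text):
    left = states[0].transcript
    right = states[0].transcript  # bug: reads the left side a second time
    return left + right + "\nuser: " + user_text


def build_text_fixed(states, user_text):
    left = states[0].transcript
    right = states[1].transcript  # fix: reads the right side
    return left + right + "\nuser: " + user_text


# Only the right-side transcript contains content the toy filter should flag.
states = [FakeState("left: hello"), FakeState("right: FORBIDDEN reply")]

buggy_flagged = toy_filter(build_text_buggy(states, "hi"))
fixed_flagged = toy_filter(build_text_fixed(states, "hi"))
```

With the buggy helper the right-side text never reaches the filter, so `buggy_flagged` stays false even though the right transcript contains the marker. With the corrected index, `fixed_flagged` is true.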

Test plan

  • Verify moderation filter triggers on right-side model content in anonymous arena
  • Verify moderation filter triggers on right-side model content in vision named arena
  • Confirm no regression in left-side model moderation

🤖 Generated with Claude Code

In both gradio_block_arena_anony.py and gradio_block_arena_vision_named.py,
`all_conv_text_right` was reading from `states[0]` instead of `states[1]`,
meaning the right-side model's conversation was never checked by the
moderation filter. This was already fixed in gradio_block_arena_named.py
(commit 34eca62) but missed in these two files.

Fixes lm-sys#3794

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Moderation Filter Bypass via Wrong State Index in Arena Side-by-Side Views

1 participant