
Native Qwen3-Reranker CausalLM support in RerankCalculatorOV #4063

Open
ambeckley wants to merge 1 commit into openvinotoolkit:main from ambeckley:ambeckley/native-qwen3-reranker-support

Conversation


@ambeckley ambeckley commented Mar 17, 2026

Summary

  • Adds native support for Qwen3-Reranker models (all sizes, e.g. 0.6B and 8B) using the CausalLM architecture, exported with --task text-generation
  • Auto-detects Qwen3 via model_type in config.json — no changes needed for existing reranker models
  • Applies server-side chat template formatting and CausalLM graph postprocessing (yes/no logit extraction via PrePostProcessor), so clients use the standard /v3/rerank API with no workarounds
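
The server-side formatting mentioned above presumably follows the prompt layout published on the Qwen3-Reranker model card; the sketch below illustrates that layout and is not code taken from this PR's diff (the instruction string and template text are assumptions):

```cpp
#include <cassert>
#include <string>

// Illustrative only: builds a Qwen3-Reranker prompt for one query-document
// pair, following the chat template from the Qwen3-Reranker model card.
// The exact strings used by RerankCalculatorOV may differ.
std::string buildQwen3RerankPrompt(const std::string& query,
                                   const std::string& document) {
    // Default instruction suggested by the model card (assumption).
    const std::string instruct =
        "Given a web search query, retrieve relevant passages that answer the query";
    return
        "<|im_start|>system\n"
        "Judge whether the Document meets the requirements based on the Query "
        "and the Instruct provided. Note that the answer can only be \"yes\" "
        "or \"no\".<|im_end|>\n"
        "<|im_start|>user\n"
        "<Instruct>: " + instruct + "\n"
        "<Query>: " + query + "\n"
        "<Document>: " + document + "<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>\n\n</think>\n\n";
}
```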

Motivation

Qwen3-Reranker models use the CausalLM architecture, not cross-encoder text-classification. The current workaround (#3578) requires a community-modified seq-cls model. This PR enables all official Qwen3-Reranker model sizes to work natively through OVMS without client modifications, and it remains backwards compatible with tomaarsen/Qwen3-Reranker-*-seq-cls models.

Changes

src/rerank/rerank_servable.hpp

  • Added isQwen3, hasPositionIds, hasBeamIdx detection flags
  • Override applyPrePostProcessing() to:
    • Parse config.json for model_type: "qwen3"
    • Detect position_ids and beam_idx model inputs
    • Check output dimensionality (3D = CausalLM, 2D = text-classification with warning)
    • Look up yes/no token IDs via tokenizer
    • Build PrePostProcessor graph: Slice last token → Squeeze → Gather yes/no logits → Subtract (yes - no), producing [batch, 1] output compatible with existing sigmoid scoring
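
The net effect of that graph is that the existing sigmoid scoring turns the yes/no logit difference into a relevance score in (0, 1), equivalent to a softmax over the two tokens. A minimal numeric sketch of that final step (plain C++, independent of the actual PrePostProcessor graph):

```cpp
#include <cassert>
#include <cmath>

// Sketch: the score produced downstream of the Slice/Squeeze/Gather/Subtract
// graph. `yesLogit` and `noLogit` are the last-token logits at the
// tokenizer's "yes"/"no" token IDs; the graph emits their difference, and
// the existing rerank path applies a sigmoid to map it into (0, 1).
double qwen3RerankScore(double yesLogit, double noLogit) {
    double diff = yesLogit - noLogit;      // Subtract node output
    return 1.0 / (1.0 + std::exp(-diff));  // existing sigmoid scoring
}
```

Note that sigmoid(yes − no) equals the softmax probability of "yes" over the pair {yes, no}, so the CausalLM output plugs into the existing scoring unchanged.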

src/rerank/rerank_calculator_ov.cc

  • Added Qwen3 chat template input formatting in PrepareInputsForRerankModel()
  • Compute position_ids from attention mask for CausalLM models
  • Zero-fill beam_idx for CausalLM models
  • Guard token_type_ids creation with !isQwen3 check (CausalLM uses position_ids, not token_type_ids, as 3rd input)

Model Export

Models must be exported with --task text-generation (not the default text-classification):

optimum-cli export openvino --model Qwen/Qwen3-Reranker-8B --task text-generation --weight-format int8 Qwen3-Reranker-8B-causal-int8-ov

The default text-classification export produces a model with an untrained random classification head that outputs garbage scores.

Test plan

  • Tested with Qwen3-Reranker-0.6B (int8) on CPU — correct relevance scores
  • Tested with Qwen3-Reranker-8B (int8) on CPU and Intel Arc GPU — correct relevance scores
  • Verified existing non-Qwen3 reranker models are unaffected (isQwen3 = false, no code path changes)
  • CI/unit tests (to be added if maintainers request)

Commit message

Qwen3-Reranker models use CausalLM architecture instead of cross-encoder
text-classification, requiring different input formatting and output
postprocessing. This enables OVMS to natively serve Qwen3-Reranker models
(all sizes: 0.6B, 8B) exported with --task text-generation via the
standard /v3/rerank API, with no client-side workarounds needed.

Changes:
- Auto-detect Qwen3 models via model_type in config.json
- Apply server-side chat template formatting for query-document pairs
- Add CausalLM graph postprocessing (Slice/Squeeze/Gather/Subtract)
  to extract yes/no logits from 3D output, producing scores compatible
  with existing sigmoid scoring
- Handle CausalLM-specific inputs (position_ids, beam_idx)
- Guard token_type_ids to avoid conflicts with CausalLM input layout
- Warn if model was exported as text-classification (random head weights)

Tested with Qwen3-Reranker-0.6B and Qwen3-Reranker-8B (int8) on
CPU and Intel Arc GPU, producing correct relevance scores.
