
Native Qwen3-Reranker CausalLM support in RerankCalculatorOV #4063

Open
ambeckley wants to merge 1 commit into openvinotoolkit:main from ambeckley:ambeckley/native-qwen3-reranker-support

Conversation


@ambeckley ambeckley commented Mar 17, 2026

Summary

  • Adds native support for Qwen3-Reranker models (all sizes, e.g. 0.6B and 8B) using the CausalLM architecture, exported with --task text-generation
  • Auto-detects Qwen3 via model_type in config.json — no changes needed for existing reranker models
  • Applies server-side chat template formatting and CausalLM graph postprocessing (yes/no logit extraction via PrePostProcessor), so clients use the standard /v3/rerank API with no workarounds
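
The server-side formatting mentioned above presumably follows the prompt layout published on the Qwen3-Reranker model card; the sketch below illustrates that layout and is not code taken from this PR's diff (the instruction string and template text are assumptions):

```cpp
#include <cassert>
#include <string>

// Illustrative only: builds a Qwen3-Reranker prompt for one query-document
// pair, following the chat template from the Qwen3-Reranker model card.
// The exact strings used by RerankCalculatorOV may differ.
std::string buildQwen3RerankPrompt(const std::string& query,
                                   const std::string& document) {
    // Default instruction suggested by the model card (assumption).
    const std::string instruct =
        "Given a web search query, retrieve relevant passages that answer the query";
    return
        "<|im_start|>system\n"
        "Judge whether the Document meets the requirements based on the Query "
        "and the Instruct provided. Note that the answer can only be \"yes\" "
        "or \"no\".<|im_end|>\n"
        "<|im_start|>user\n"
        "<Instruct>: " + instruct + "\n"
        "<Query>: " + query + "\n"
        "<Document>: " + document + "<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>\n\n</think>\n\n";
}
```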

Motivation

Qwen3-Reranker models use the CausalLM architecture, not cross-encoder text-classification. The current workaround (#3578) requires a community-modified seq-cls model. This PR enables all official Qwen3-Reranker model sizes to work natively through OVMS without client modifications, and it remains backwards compatible with tomaarsen/Qwen3-Reranker-*-seq-cls models.

Changes

src/rerank/rerank_servable.hpp

  • Added isQwen3, hasPositionIds, hasBeamIdx detection flags
  • Override applyPrePostProcessing() to:
    • Parse config.json for model_type: "qwen3"
    • Detect position_ids and beam_idx model inputs
    • Check output dimensionality (3D = CausalLM, 2D = text-classification with warning)
    • Look up yes/no token IDs via tokenizer
    • Build PrePostProcessor graph: Slice last token → Squeeze → Gather yes/no logits → Subtract (yes - no), producing [batch, 1] output compatible with existing sigmoid scoring
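
The net effect of that graph is that the existing sigmoid scoring turns the yes/no logit difference into a relevance score in (0, 1), equivalent to a softmax over the two tokens. A minimal numeric sketch of that final step (plain C++, independent of the actual PrePostProcessor graph):

```cpp
#include <cassert>
#include <cmath>

// Sketch: the score produced downstream of the Slice/Squeeze/Gather/Subtract
// graph. `yesLogit` and `noLogit` are the last-token logits at the
// tokenizer's "yes"/"no" token IDs; the graph emits their difference, and
// the existing rerank path applies a sigmoid to map it into (0, 1).
double qwen3RerankScore(double yesLogit, double noLogit) {
    double diff = yesLogit - noLogit;      // Subtract node output
    return 1.0 / (1.0 + std::exp(-diff));  // existing sigmoid scoring
}
```

Note that sigmoid(yes − no) equals the softmax probability of "yes" over the pair {yes, no}, so the CausalLM output plugs into the existing scoring unchanged.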

src/rerank/rerank_calculator_ov.cc

  • Added Qwen3 chat template input formatting in PrepareInputsForRerankModel()
  • Compute position_ids from attention mask for CausalLM models
  • Zero-fill beam_idx for CausalLM models
  • Guard token_type_ids creation with !isQwen3 check (CausalLM uses position_ids, not token_type_ids, as 3rd input)

Model Export

Models must be exported with --task text-generation (not the default text-classification):

optimum-cli export openvino --model Qwen/Qwen3-Reranker-8B --task text-generation --weight-format int8 Qwen3-Reranker-8B-causal-int8-ov

The default text-classification export produces a model with an untrained random classification head that outputs garbage scores.

Test plan

  • Tested with Qwen3-Reranker-0.6B (int8) on CPU — correct relevance scores
  • Tested with Qwen3-Reranker-8B (int8) on CPU and Intel Arc GPU — correct relevance scores
  • Verified existing non-Qwen3 reranker models are unaffected (isQwen3 = false, no code path changes)
  • CI/unit tests (to be added if maintainers request)

Commit message

Qwen3-Reranker models use CausalLM architecture instead of cross-encoder
text-classification, requiring different input formatting and output
postprocessing. This enables OVMS to natively serve Qwen3-Reranker models
(all sizes: 0.6B, 8B) exported with --task text-generation via the
standard /v3/rerank API, with no client-side workarounds needed.

Changes:
- Auto-detect Qwen3 models via model_type in config.json
- Apply server-side chat template formatting for query-document pairs
- Add CausalLM graph postprocessing (Slice/Squeeze/Gather/Subtract)
  to extract yes/no logits from 3D output, producing scores compatible
  with existing sigmoid scoring
- Handle CausalLM-specific inputs (position_ids, beam_idx)
- Guard token_type_ids to avoid conflicts with CausalLM input layout
- Warn if model was exported as text-classification (random head weights)

Tested with Qwen3-Reranker-0.6B and Qwen3-Reranker-8B (int8) on
CPU and Intel Arc GPU, producing correct relevance scores.
