Native Qwen3-Reranker CausalLM support in RerankCalculatorOV#4063
Open
ambeckley wants to merge 1 commit intoopenvinotoolkit:mainfrom
Open
Native Qwen3-Reranker CausalLM support in RerankCalculatorOV#4063ambeckley wants to merge 1 commit intoopenvinotoolkit:mainfrom
ambeckley wants to merge 1 commit intoopenvinotoolkit:mainfrom
Conversation
Qwen3-Reranker models use CausalLM architecture instead of cross-encoder text-classification, requiring different input formatting and output postprocessing. This enables OVMS to natively serve Qwen3-Reranker models (all sizes: 0.6B, 8B) exported with --task text-generation via the standard /v3/rerank API, with no client-side workarounds needed. Changes: - Auto-detect Qwen3 models via model_type in config.json - Apply server-side chat template formatting for query-document pairs - Add CausalLM graph postprocessing (Slice/Squeeze/Gather/Subtract) to extract yes/no logits from 3D output, producing scores compatible with existing sigmoid scoring - Handle CausalLM-specific inputs (position_ids, beam_idx) - Guard token_type_ids to avoid conflicts with CausalLM input layout - Warn if model was exported as text-classification (random head weights) Tested with Qwen3-Reranker-0.6B and Qwen3-Reranker-8B (int8) on CPU and Intel Arc GPU, producing correct relevance scores.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
--task text-generationmodel_typeinconfig.json— no changes needed for existing reranker models/v3/rerankAPI with no workaroundsMotivation
Qwen3-Reranker models use
CausalLMarchitecture, not cross-encoder text-classification. The current workaround (#3578) requires community-modified seq-cls model. This PR enables all official Qwen3-Reranker model sizes to work natively through OVMS without client modifications. It is still backwards compatibility with tomaarsen/Qwen3-Reranker-*-seq-cls models.Changes
src/rerank/rerank_servable.hppisQwen3,hasPositionIds,hasBeamIdxdetection flagsapplyPrePostProcessing()to:config.jsonformodel_type: "qwen3"position_idsandbeam_idxmodel inputs[batch, 1]output compatible with existing sigmoid scoringsrc/rerank/rerank_calculator_ov.ccPrepareInputsForRerankModel()position_idsfrom attention mask for CausalLM modelsbeam_idxfor CausalLM modelstoken_type_idscreation with!isQwen3check (CausalLM uses position_ids, not token_type_ids, as 3rd input)Model Export
Models must be exported with
--task text-generation(not the defaulttext-classification):optimum-cli export openvino --model Qwen/Qwen3-Reranker-8B --task text-generation --weight-format int8 Qwen3-Reranker-8B-causal-int8-ovThe default text-classification export produces a model with an untrained random classification head that outputs garbage scores.
Test plan