chore: release v0.13.0 — Phi-3 support + unified server#81

Merged
unamedkr merged 1 commit into main from release/v0.13.0
Apr 12, 2026

@unamedkr
Collaborator

v0.13.0 Release

Version bump + release notes + release workflow update.

See full changelog in `docs/RELEASE_NOTES.md`.

Key changes since v0.12.1

Release checklist

  • Version bumped in pyproject.toml + __init__.py
  • Release notes written
  • Staged sdist header synced
  • Release workflow includes quant-server-unified
  • 35/35 tests pass
  • Tag v0.13.0 after merge
  • GitHub Release created from tag
  • PyPI publish workflow triggered

🤖 Generated with Claude Code

v0.13.0 highlights:
- Phi-3 / Phi-3.5 architecture fully supported (fused QKV/FFN, LongRoPE)
- Phi-3.5-mini Q8_0 as default model (2x faster than Q4_K_M on NEON)
- quant-server-unified (quant.h-based, no sync divergence)
- ChatML template marker filter (BPE-split stop detection)
- 16 chat-cache bugs eliminated across 2 audit passes
- ChatContextOverflow exception in Python

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@unamedkr merged commit b60ce4e into main on Apr 12, 2026
2 of 3 checks passed
@unamedkr deleted the release/v0.13.0 branch on April 12, 2026 at 10:54

Development

Successfully merging this pull request may close these issues.

SmolLM2-1.7B server inference regression after 91814d4 (Phi-3 CPU fallback)
