From ef9113e04f8c487a396bf8cb95b256d4655d7c4b Mon Sep 17 00:00:00 2001 From: quantumaikr Date: Sun, 12 Apr 2026 10:59:40 +0900 Subject: [PATCH] feat(feedback-quick-wins): act on 2026-04-12 external user report MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Four scoped fixes addressing the highest-impact items in docs/feedback/2026-04-12_0900.md. Each is independently useful and nothing experimental — Phi-3 architecture support is intentionally deferred to a separate PR. ## Changes ### P0-A — SmolLM2-1.7B as the recommended default External tester measured SmolLM2-1.7B at ~12.5 tok/s vs Llama-3.2-1B at ~2.3 tok/s on Apple M3. Same llama arch family, but vocab 49K vs 128K. The lm_head matmul (vocab × hidden_dim per token) is the bottleneck — fewer params don't help if the vocab is bigger. - Add SmolLM2-1.7B-Instruct (Q8) to `_MODEL_REGISTRY` - Add `smollm2:1.7b` and bare `smollm2` aliases (the bare alias now points at 1.7B; users wanting the demo model ask for `smollm2:135m`) - `cmd_chat_default` now uses SmolLM2-1.7B - Module + class docstrings + CLI help epilog all updated to reflect the new recommendation ### P0-B — Hard-fail load on unsupported architecture Previously: loading a Phi-3 GGUF reported `loaded N layers (0 self_attn)` in the success log and returned a model that produced page after page of garbage tokens. Phi-3 uses fused `attn_qkv` projection which the loader doesn't recognize. Now: when `tq_load_gguf` finishes a model with zero standard self_attn layers AND no DeltaNet weights, it logs a clear ERROR naming the architecture and returns NULL. Callers see the failure immediately instead of debugging garbage output. ``` tq_load_gguf: ERROR — model architecture 'phi3' is not supported. Detected 0 self_attn layers and no DeltaNet weights. ... ``` ### P0-C — ChatML template marker filter External tester reported `<|im_start|>`, `<|im_end|>`, `assistant` etc. leaking into chat output. 
Root cause: BPE tokenizers fragment these markers across multiple tokens, so the existing per-token strstr check in the generation loop never matches. Fix: a 32-byte lookahead filter inside `chat_accum_callback`. The filter buffers the most recent text, scans for known markers, and: - `<|im_start|>` at the very start of the response → strip the `<|im_start|>assistant\n` header (model is echoing the chat prompt) - any END marker (`<|im_end|>`, `<|eot_id|>`, `</s>`, `<|endoftext|>`, `<|im_start|>` mid-response, `<|start_header_id|>`, `<|eom_id|>`) → emit clean prefix, set `stop_requested`, fast-path loop checks the flag and breaks. Streaming latency cost: ~CHAT_LOOKAHEAD bytes (32) of in-flight buffer. Verified by a standalone harness that drives the filter with simulated token streams (8 cases including BPE-split markers — all pass). ### P1-C — `docs/supported_models.md` New page documenting the architecture compatibility matrix, the vocab size → speed relationship, why Phi-3 is hard, and how to report a broken model. Linked from the feedback file. ## Verified - ctest --test-dir build → 35/35 passed - cmake --build build → all targets clean (no new warnings) - wasm/build.sh → 320K bundle rebuilt - Standalone chat_accum filter test → 8/8 passed - Python `from quantcpp import Model` + `available_models()` works - `quantcpp --help` epilog reflects new defaults quant.h and src/engine/tq_generate.c kept in lockstep (filter logic mirrored byte-for-byte).
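The invariant behind the P0-C filter can be checked in isolation. The C sketch below is a simplified, hypothetical model of `chat_accum_callback`, not the engine's code. The names `filt_t`, `filt_push`, and `filt_finish` are made up for illustration. Header-skip mode and the `cached_text` accumulator are left out, and released text is collected in a buffer instead of being passed to a user callback. The sketch assumes only the four end markers in `END_MARKERS`, all shorter than the 32-byte lookahead. Because every marker is shorter than the lookahead, a marker split across BPE tokens is still entirely inside `pending` when its last byte arrives.

```c
#include <assert.h>
#include <string.h>

/* Simplified, hypothetical model of the P0-C lookahead filter.
 * Header-skip mode and cached_text accumulation are omitted;
 * released text is appended to `out` instead of a user callback. */
#define LOOKAHEAD   32   /* bytes held back; longer than any marker */
#define PENDING_CAP 128  /* size of the holding buffer */

typedef struct {
    char pending[PENDING_CAP];
    int  pending_len;
    int  stop;           /* set once a complete end marker is seen */
    char out[1024];      /* everything released to the "user" */
    int  out_len;
} filt_t;

static const char *const END_MARKERS[] = {
    "<|im_end|>", "<|eot_id|>", "</s>", "<|endoftext|>", NULL,
};

/* Append n bytes to the visible output (truncating at capacity). */
static void filt_emit(filt_t *f, const char *p, int n) {
    int room = (int)sizeof f->out - 1 - f->out_len;
    if (n > room) n = room;
    if (n <= 0) return;
    memcpy(f->out + f->out_len, p, (size_t)n);
    f->out_len += n;
    f->out[f->out_len] = '\0';
}

/* Remove n bytes from the front of pending. */
static void filt_drop(filt_t *f, int n) {
    memmove(f->pending, f->pending + n, (size_t)(f->pending_len - n));
    f->pending_len -= n;
}

/* Earliest offset of any complete end marker in pending, or -1. */
static int filt_find_end(const filt_t *f) {
    int best = -1;
    for (int i = 0; END_MARKERS[i]; i++) {
        int m = (int)strlen(END_MARKERS[i]);
        for (int p = 0; p + m <= f->pending_len; p++) {
            if (memcmp(f->pending + p, END_MARKERS[i], (size_t)m) == 0) {
                if (best < 0 || p < best) best = p;
                break;
            }
        }
    }
    return best;
}

/* Feed one decoded token. Long tokens are fed in chunks so pending
 * never overflows and no bytes are silently dropped. */
void filt_push(filt_t *f, const char *tok) {
    size_t n = strlen(tok);
    while (n > 0 && !f->stop) {
        assert(f->pending_len <= LOOKAHEAD);
        int chunk = n > (size_t)(PENDING_CAP - LOOKAHEAD)
                        ? PENDING_CAP - LOOKAHEAD : (int)n;
        memcpy(f->pending + f->pending_len, tok, (size_t)chunk);
        f->pending_len += chunk;
        tok += chunk;
        n -= (size_t)chunk;

        int pos = filt_find_end(f);
        if (pos >= 0) {             /* marker complete: keep prefix, stop */
            filt_emit(f, f->pending, pos);
            f->pending_len = 0;
            f->stop = 1;
            return;
        }
        /* An incomplete marker must start within the last LOOKAHEAD
         * bytes (markers are shorter), so the rest is safe to release. */
        if (f->pending_len > LOOKAHEAD) {
            int safe = f->pending_len - LOOKAHEAD;
            filt_emit(f, f->pending, safe);
            filt_drop(f, safe);
        }
    }
}

/* End of generation: release whatever is still held back. */
void filt_finish(filt_t *f) {
    if (!f->stop) filt_emit(f, f->pending, f->pending_len);
    f->pending_len = 0;
}
```

Pushing `"Hello"`, `" wor"`, `"ld."`, `"<|im"`, `"_e"`, `"nd|>"` leaves `out` equal to `Hello world.` with `stop` set, even though no single token contains `<|im_end|>`. This is exactly the BPE-split case that a per-token `strstr` misses.

Feeding oversized tokens in chunks is a deliberate departure from the patch. The patch forwards a token larger than `CHAT_PENDING_CAP` without scanning it, and its `chat_accum_emit` clamps each call to `CHAT_PENDING_CAP` bytes.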
## Deferred - Phi-3 (`attn_qkv` / `gate_up_proj`) loader support — separate PR with prototype + validation gate - Server fallback in pure Python (so `quantcpp serve` works without a CMake build) — separate PR - Server request queueing / 429 — separate PR Co-Authored-By: Claude Opus 4.6 (1M context) --- bindings/python/quantcpp/__init__.py | 35 +++- bindings/python/quantcpp/cli.py | 29 ++-- docs/feedback/2026-04-12_0900.md | 195 ++++++++++++++++++++++ docs/supported_models.md | 117 +++++++++++++ quant.h | 239 +++++++++++++++++++++++++-- src/engine/tq_generate.c | 174 +++++++++++++++++-- wasm/quant.wasm | Bin 293858 -> 297397 bytes 7 files changed, 739 insertions(+), 50 deletions(-) create mode 100644 docs/feedback/2026-04-12_0900.md create mode 100644 docs/supported_models.md diff --git a/bindings/python/quantcpp/__init__.py b/bindings/python/quantcpp/__init__.py index e9559ef..bb8061a 100644 --- a/bindings/python/quantcpp/__init__.py +++ b/bindings/python/quantcpp/__init__.py @@ -4,11 +4,18 @@ Quick start: from quantcpp import Model - m = Model.from_pretrained("Llama-3.2-1B") + m = Model.from_pretrained("SmolLM2-1.7B") print(m.ask("What is gravity?")) -Note: SmolLM2-135M downloads faster but produces low-quality output. -Use Llama-3.2-1B (~750 MB, one-time download) for good results. +Model selection guide: + SmolLM2-1.7B (1.7 GB, vocab 49K) — recommended. ~12 tok/s on Apple M3. + Llama-3.2-1B (750 MB, vocab 128K) — smaller download but slower + due to large vocab (~2 tok/s on M3). + SmolLM2-135M (138 MB, vocab 49K) — demo only, low-quality output. + +Larger vocab = slower lm_head matmul → more params with a smaller vocab +often beats fewer params with a larger vocab. See docs/supported_models.md +for the architecture support matrix.
""" try: @@ -53,17 +60,37 @@ class ChatContextOverflow(RuntimeError): Path.home() / ".cache" / "quantcpp")) # name → (HuggingFace repo, filename, approx size in MB) +# Note: download URL is constructed as +# https://huggingface.co/{repo}/resolve/main/{filename} +# Verify both fields against the actual HuggingFace listing before +# adding new entries — there is no integrity check at runtime. _MODEL_REGISTRY = { + # 138 MB demo model. Tokenizer + arch are llama-compatible but the + # model is too small to produce coherent output for general chat. + # Listed only so users can verify the install/load path quickly. "SmolLM2-135M": ( "Felladrin/gguf-Q8_0-SmolLM2-135M-Instruct", "smollm2-135m-instruct-q8_0.gguf", 135, ), + # Recommended default for first-time users on Apple Silicon / typical + # laptops. vocab 49K keeps the lm_head matmul small, so even on a + # mid-range M-series chip an external tester measured ~12 tok/s, enough for + # interactive chat. Same llama arch family as SmolLM2-135M, so it + # exercises the most-tested code path. + "SmolLM2-1.7B": ( + "bartowski/SmolLM2-1.7B-Instruct-GGUF", + "SmolLM2-1.7B-Instruct-Q8_0.gguf", + 1700, + ), "Qwen3.5-0.8B": ( "unsloth/Qwen3.5-0.8B-GGUF", "Qwen3.5-0.8B-Q4_K_M.gguf", 508, ), + # Smaller download than SmolLM2-1.7B but slower at inference time + # because of the 128K Llama-3 vocab (2.6x larger lm_head; ~5x lower tok/s on M3). + # Kept in the registry for users who specifically want a Llama model. "Llama-3.2-1B": ( "hugging-quants/Llama-3.2-1B-Instruct-Q4_K_M-GGUF", "llama-3.2-1b-instruct-q4_k_m.gguf", @@ -170,7 +197,7 @@ class Model: Examples -------- - >>> m = Model.from_pretrained("SmolLM2-135M") + >>> m = Model.from_pretrained("SmolLM2-1.7B") >>> m.ask("What is gravity?") 'Gravity is a force that attracts ...'
diff --git a/bindings/python/quantcpp/cli.py b/bindings/python/quantcpp/cli.py index 830204f..8a5fe73 100644 --- a/bindings/python/quantcpp/cli.py +++ b/bindings/python/quantcpp/cli.py @@ -18,9 +18,13 @@ import json -# Ollama-style short aliases → canonical _MODEL_REGISTRY keys +# Ollama-style short aliases → canonical _MODEL_REGISTRY keys. +# Plain "smollm2" without a size suffix points at the 1.7B model — that's +# the recommended default. Users who explicitly want the 135M demo model +# need to ask for it by full name. MODEL_ALIASES = { - "smollm2": "SmolLM2-135M", + "smollm2": "SmolLM2-1.7B", + "smollm2:1.7b": "SmolLM2-1.7B", "smollm2:135m": "SmolLM2-135M", "qwen3.5": "Qwen3.5-0.8B", "qwen3.5:0.8b": "Qwen3.5-0.8B", @@ -329,8 +333,13 @@ def cmd_client(args): def cmd_chat_default(args): - """Backwards-compatible default: auto-download Llama-3.2-1B and chat.""" - args.model = args.model or "Llama-3.2-1B" + """Backwards-compatible default: auto-download SmolLM2-1.7B and chat. + + Default switched from Llama-3.2-1B to SmolLM2-1.7B (2026-04-12) after + user feedback that Llama-3.2-1B's 128K vocab makes it ~5x slower at + interactive chat than SmolLM2-1.7B's 49K vocab on Apple Silicon. + """ + args.model = args.model or "SmolLM2-1.7B" args.threads = getattr(args, "threads", 4) args.max_tokens = getattr(args, "max_tokens", 256) args.temperature = getattr(args, "temperature", 0.7) @@ -354,19 +363,19 @@ def main(): client PROMPT Send a request to a running serve (default: SSE streaming) examples: - quantcpp pull llama3.2:1b + quantcpp pull smollm2 # recommended: small vocab → fast quantcpp list - quantcpp run llama3.2:1b - quantcpp run llama3.2:1b "What is gravity?" - quantcpp serve llama3.2:1b --port 8080 + quantcpp run smollm2 + quantcpp run smollm2 "What is gravity?" + quantcpp serve smollm2 --port 8080 quantcpp client "What is gravity?" 
# streams from :8080 quantcpp client "Hi" --url http://localhost:8081 quantcpp client "Hi" --no-stream # single JSON response backwards-compat (no subcommand): - quantcpp # default chat with Llama-3.2-1B + quantcpp # default chat with SmolLM2-1.7B quantcpp "What is gravity?" # one-shot - quantcpp --model SmolLM2-135M # different model + quantcpp --model llama3.2:1b # different model """, ) diff --git a/docs/feedback/2026-04-12_0900.md b/docs/feedback/2026-04-12_0900.md new file mode 100644 index 0000000..c925007 --- /dev/null +++ b/docs/feedback/2026-04-12_0900.md @@ -0,0 +1,195 @@ +# quant.cpp User Feedback — First-Time Setup & Usage Experience + +**Date**: 2026-04-12 +**Environment**: macOS (Apple M3, 8-core CPU, 10-core GPU, 16GB Unified Memory) +**Version tested**: v0.10.1 → v0.12.0 (pip) + latest main (source build) +**Tested by**: End-user (developer, first-time quant.cpp user) + +--- + +## Summary + +I went through the entire flow: pip install, `quantcpp serve`, a Metal GPU build, wiring up a chat web UI, and comparing a range of models. Overall the "install → download model → inference" path was very streamlined, but I found room for improvement in model compatibility and speed. + +--- + +## 1. What worked well + +### 1.1 Installation is very simple +- `pip install quantcpp` installs everything in one line. Zero dependencies. +- `Model.from_pretrained("Llama-3.2-1B")` downloads and caches the model automatically. Very convenient. + +### 1.2 OpenAI-compatible API server +- `quantcpp serve llama3.2:1b --port 8080` starts the server in one line. +- The `/v1/chat/completions` endpoint is compatible with the OpenAI SDK, so existing code can be reused. +- SSE streaming (`stream: true`) works correctly. +- CORS headers (`Access-Control-Allow-Origin: *`) are included by default, so frontend integration works immediately. + +### 1.3 CLI added in v0.12.0 +- One-line questions with `quantcpp "What is gravity?"` greatly lower the barrier to trying it out. +- `quantcpp` (interactive mode) is also intuitive. + +### 1.4 KV cache reuse (latest main) +- In a continued conversation, prefill is skipped from the second request onward, cutting response time by ~50%. +- First request 27 s → second request 14 s (Llama-3.2-1B). + +### 1.5 Metal GPU auto-detection +- Building with `TQ_BUILD_METAL=ON` automatically detects and enables the Apple Silicon GPU. +- Batched matmul dispatch switches to Metal with no extra configuration. + +### 1.6 Strong performance with SmolLM2-1.7B +- Reaches ~12.5 tok/s with a small-vocab (49K) model, fast enough for real-time conversation.
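+- Output quality is also clean and accurate (e.g., "The capital of South Korea is Seoul."). + +--- + +## 2. Areas for improvement + +### 2.1 CLI missing from the pip package (v0.10.1) +- **Problem**: PyPI v0.10.1 had no `quantcpp` CLI entry point: `zsh: command not found: quantcpp`. +- **Resolution**: Fixed from v0.11.0 onward by adding `cli.py` + an entry point. +- **Suggestion**: Publishing new versions to PyPI promptly would greatly improve the first-run experience. + +### 2.2 `quantcpp serve` requires the quant-server binary +- **Problem**: Running `quantcpp serve` after `pip install quantcpp` fails with a `quant-server binary not found` error. +- Users have to build it themselves with CMake (`TQ_BUILD_SERVER=ON`) and copy it onto their PATH. +- **Suggestion**: Ship the server binary in the pip package, or provide a pure-Python fallback server. + +### 2.3 Llama-3.2-1B is extremely slow +- **Problem**: Llama-3.2-1B (Q4_K_M) runs at only ~2.3 tok/s on Apple M3. + - ~27 s to generate 60 tokens, ~67 s for 200 tokens. + - Effectively unusable for interactive chat. +- **Root-cause analysis**: The 128,256-entry vocab is the bottleneck; every token requires a 128K-dimensional output projection. +- **Comparison**: In the same environment, SmolLM2-1.7B (Q8, vocab 49K) reaches ~12.5 tok/s, about 5x faster. +- **Suggestion**: + - Consider making SmolLM2-1.7B the default recommended model. + - Or add a note to the model selection guide that "a larger vocab means slower generation". + +### 2.4 SmolLM2-135M output quality +- **Problem**: SmolLM2-135M is fast (0.3 s), but its output is garbage HTML-like text. +- **Suggestion**: Present the 135M model as a "quantization demo" only, and add wording that sets low expectations for output quality. + +### 2.5 Gemma-4-E2B compatibility +- **Problem**: gemma-4-E2B-it-Q4_K_M.gguf loads successfully, but inference output is completely broken (multilingual garbage tokens). +- The server log reports a normal load, which makes it hard for the user to find the cause. +- **Suggestion**: Document the supported models/architectures, and show a warning when an unsupported model is loaded. + +### 2.6 Phi-3.5-mini-instruct architecture not supported (new) +- **Problem**: `Phi-3.5-mini-instruct-Q8_0.gguf` (3.9GB) loads, but attention layer mapping fails. + - Server log: `loaded 32 layers (0 self_attn)`; zero self_attn layers are detected. + - Output: pure garbage tokens (`uffrasspkeryensonisatcreteBUG...`). + - Raw speed is extremely fast at 0.85 s per 80 tokens (thanks to the 32K vocab). +- **Impact**: With a 32K vocab, Phi-3/Phi-3.5 would be the best model for speed, but it is unusable. +- **Suggestion**: + - Add attention layer mapping for the Phi-3 (`phi3`) architecture. + - Once supported, this model could become the best recommendation for both "speed + quality". + - Show the user a warning when `self_attn=0` is detected. + +### 2.7 Qwen3.5-0.8B output quality (new) +- **Problem**: Qwen3.5-0.8B (Q4_K_M) loads in the server, but its output is completely broken.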
+ - Presumably a compatibility issue caused by the DeltaNet hybrid architecture. + - Also slow at ~33 s per 60 tokens (vocab 248K). +- **Suggestion**: Document the support status of the Qwen family. + +### 2.8 Limited Metal GPU speedup (small models) +- **Problem**: With a 1B model the Metal GPU is enabled, but there is no noticeable speed difference. +- A source code comment even states "Metal Q4 batch → 38 tok/s vs CPU Q4 → 95 tok/s (SmolLM2)". +- For small models, GPU dispatch overhead outweighs the compute time. +- **Suggestion**: Add logic that switches between CPU and GPU automatically based on model size, or provide a `--device cpu/gpu` option. + +### 2.9 Server handles one request at a time (no concurrency) +- **Problem**: While the first request is being processed, a second request is completely blocked. +- In the chat UI, a follow-up question waits 3+ minutes. +- **Suggestion**: Queue requests and return a busy status (429 or Retry-After), or provide a request-cancellation API. + +### 2.10 Chat template residue +- **Problem**: Template tokens such as `<|im_start|>`, `<|im_end|>`, and `assistant` leak into responses. +- Especially frequent with Llama-3.2-1B. Mild with SmolLM2-1.7B, limited to fragments like `<|im_ennd|>`. +- **Suggestion**: Automatically strip stop tokens/template markers on the server side. + +--- + +## 3. Per-model benchmarks (Apple M3, 16GB RAM, Metal GPU build) + +| Model | Quant | File Size | Vocab | tok/s | 60-token Time | Quality | Architecture | +|-------|-------|-----------|------:|------:|--------------:|---------|-------------| +| SmolLM2-135M | Q8 | 138MB | 49K | ~300 | 0.3s | Unusable (garbage) | llama | +| Qwen3.5-0.8B | Q4_K_M | 508MB | 248K | ~1.8 | ~33s | Broken (garbage) | qwen/deltanet | +| Llama-3.2-1B | Q4_K_M | 770MB | 128K | ~2.3 | ~27s | Usable (artifacts) | llama | +| **SmolLM2-1.7B** | **Q8** | **1.7GB** | **49K** | **~12.5** | **~5s** | **Good (clean)** | **llama** | +| Gemma-4-E2B | Q4_K_M | 2.9GB | 262K | ~10 | ~5s | Broken (compat) | gemma4 hybrid | +| Phi-3.5-mini | Q8 | 3.9GB | 32K | ~94* | ~0.85s* | Broken (0 self_attn) | phi3 | + +*\* The Phi-3.5 figures do not reflect real inference because attention is not working; they indicate the expected speed once the model is properly supported.* + +### Key Insights + +1. **Vocab size has the largest impact on speed.** Vocab size and quantization format, more than parameter count, determine real-world speed. + - SmolLM2-1.7B (vocab 49K): 12.5 tok/s + - Llama-3.2-1B (vocab 128K): 2.3 tok/s; 2.6x the vocab → 5.4x slower +2. **Q8 can be faster than Q4.** Q4 has higher dequantization overhead than Q8, and Q8 is more efficient with NEON SIMD. +3. **Only the llama architecture works reliably.** The phi3, gemma4, and qwen/deltanet architectures load, but inference is broken. +4.
**Phi-3.5 support would be a game changer.** A 32K vocab plus 3.8B params could deliver the best "speed + quality" combination. + +--- + +## 4. Architecture compatibility matrix (new) + +| Architecture | GGUF Load | Tokenizer | Attention | Inference | Status | +|-------------|-----------|-----------|-----------|-----------|--------| +| llama (SmolLM2, Llama) | OK | OK | OK | OK | **Fully supported** | +| llama (Llama-3.2 GQA) | OK | OK | OK | Slow | Supported (vocab bottleneck) | +| phi3 (Phi-3.5-mini) | OK | OK | **FAIL (0 self_attn)** | Garbage | **Not supported** | +| gemma4 (Gemma-4-E2B) | OK | OK | Partial | Garbage | **Not supported** | +| qwen/deltanet (Qwen3.5) | OK | OK | Unknown | Garbage | **Not supported** | + +**Suggestion**: Include this matrix in the README or docs so users can check compatibility before choosing a model. + +--- + +## 5. Suggested priorities + +| Priority | Item | Impact | Effort | +|----------|------|--------|--------| +| **P0** | Attention mapping for the Phi-3 (`phi3`) architecture | Unlocks the best model | Medium | +| **P0** | Automatically strip chat template tokens | Immediate output-quality improvement | Low | +| **P0** | Make SmolLM2-1.7B the default recommended model | Much better first experience | Low | +| P1 | Ship the server binary in the pip package | One step from install to running server | Medium | +| P1 | Warning/error when loading an unsupported architecture | Saves debugging time | Low | +| P1 | Warning message when `self_attn=0` is detected | Compatibility problems are noticed immediately | Low | +| P2 | Concurrent request handling in the server (or queueing) | Multiple users / back-to-back conversations | High | +| P2 | Document the architecture compatibility matrix | Model selection guidance | Low | +| P2 | Automatic CPU/GPU switching based on vocab size | Best performance chosen automatically | Medium | +| P3 | `--device cpu/gpu` CLI option | User control | Low | + +--- + +## 6. Test environment details + +``` +Hardware: Apple M3, 8-core CPU, 10-core GPU, 16GB Unified Memory +OS: macOS 15 (Darwin 24.5.0) +Python: 3.14.3 +Compiler: AppleClang 16.0.0 +Xcode: installed (Metal shader compilation enabled) +quantcpp: v0.10.1 (pip) → v0.12.0 (pip) → latest main (source) +Build: cmake -DTQ_BUILD_METAL=ON -DTQ_BUILD_SERVER=ON -DCMAKE_BUILD_TYPE=Release +``` + +--- + +## 7.
Model files tested + +``` +~/.cache/quantcpp/smollm2-135m-instruct-q8_0.gguf (138 MB) +~/.cache/quantcpp/Qwen3.5-0.8B-Q4_K_M.gguf (508 MB) +~/.cache/quantcpp/llama-3.2-1b-instruct-q4_k_m.gguf (770 MB) +~/.cache/quantcpp/Phi-3.5-mini-instruct-Q8_0.gguf (3.9 GB) — NEW +~/dev/projects/TurboQuant.cpp/models/SmolLM2-1.7B-Instruct-Q8_0.gguf (1.7 GB) +~/dev/projects/TurboQuant.cpp/models/gemma-4-E2B-it-Q4_K_M.gguf (2.9 GB) +``` + +--- + +*This feedback was generated based on a hands-on first-time user experience session on 2026-04-12.* +*Updated with Phi-3.5-mini-instruct and Qwen3.5-0.8B architecture compatibility findings.* diff --git a/docs/supported_models.md b/docs/supported_models.md new file mode 100644 index 0000000..5e9600f --- /dev/null +++ b/docs/supported_models.md @@ -0,0 +1,117 @@ +# Supported Models + +quant.cpp loads GGUF files from HuggingFace, but only some model +architectures are fully wired through the inference path. This page +tracks what works, what loads-but-fails, and how to pick a model. + +## TL;DR — Recommended models + +| Use case | Model | Why | +|---|---|---| +| **First-time install** | `SmolLM2-1.7B` (Q8) | Fastest end-to-end on a laptop. Vocab 49K keeps the lm_head matmul small (~12 tok/s on Apple M3). | +| Smaller download | `Llama-3.2-1B` (Q4_K_M) | 750 MB vs 1.7 GB, but ~5x slower at inference time due to 128K vocab. | +| Quick smoke test | `SmolLM2-135M` (Q8) | 138 MB download to verify the install path. Output quality is poor — not for real use.
| + +```bash +# CLI quickstart +quantcpp run smollm2 # SmolLM2-1.7B (recommended) +quantcpp run smollm2:135m # SmolLM2-135M (smoke test only) +quantcpp run llama3.2:1b # smaller download, slower +``` + +```python +# Python quickstart +from quantcpp import Model +m = Model.from_pretrained("SmolLM2-1.7B") +print(m.ask("What is gravity?")) +``` + +## Architecture compatibility matrix + +| Architecture | GGUF Load | Tokenizer | Attention | Inference | Status | +|---|:---:|:---:|:---:|:---:|---| +| **llama** (SmolLM2, Llama-3.x, Mistral) | ✅ | ✅ | ✅ | ✅ | **Fully supported** | +| llama with 128K vocab (Llama-3.2-1B) | ✅ | ✅ | ✅ | slow | Supported, vocab is the bottleneck | +| **gemma** (Gemma 2) | ✅ | ✅ | ✅ | ✅ | Supported | +| **gemma3** | ✅ | ✅ | ✅ | ✅ | Supported with hybrid sliding-window attention | +| **gemma4** (Gemma-4-E2B / E4B) | ✅ | ✅ | ⚠️ | ⚠️ | Partial — some Q4_K_M variants produce garbage; report with file SHA256 | +| **qwen** / **qwen2** | ✅ | ✅ | ✅ | ✅ | Supported | +| **qwen3.5** (DeltaNet hybrid) | ✅ | ✅ | partial | ⚠️ | Partial — pure-attention layers work, DeltaNet hybrid still being validated | +| **phi3** / **phi3.5** (fused QKV) | ❌ | — | — | — | **Not supported** — uses `attn_qkv`, see "Why phi3 is hard" below | + +✅ = works · ⚠️ = loads but inference is unreliable · ❌ = load fails fast with a clear error (since 2026-04-12) + +If you load an unsupported architecture, the loader now prints: + +``` +tq_load_gguf: ERROR — model architecture 'phi3' is not supported. + Detected 0 self_attn layers and no DeltaNet weights. + This usually means the model uses fused QKV projection + (e.g., Phi-3 `attn_qkv`) which quant.cpp does not yet handle. + See docs/supported_models.md for the architecture support matrix. +``` + +…and `tq_load_gguf` returns NULL, so callers can fail-fast instead of +silently producing garbage tokens. + +## Why vocab size dominates speed + +quant.cpp generates one token at a time. 
Every token requires a +`lm_head` matmul of shape `[hidden_dim, vocab_size]`. Both models below +have `hidden_dim = 2048`: + +| Model | vocab_size | lm_head multiply-adds/token | +|---|---:|---:| +| SmolLM2-1.7B | 49,152 | 100 M | +| Llama-3.2-1B | 128,256 | 263 M | + +Llama-3.2-1B has fewer parameters (~1.2B vs 1.7B) but its lm_head matmul +is 2.6x bigger, and on CPU it dominates wall time. External user +benchmarks on Apple M3 (8-core CPU, 16 GB RAM): + +| Model | tok/s | 60-token latency | +|---|---:|---:| +| SmolLM2-1.7B (Q8, vocab 49K) | ~12.5 | ~5 s | +| Llama-3.2-1B (Q4_K_M, vocab 128K) | ~2.3 | ~27 s | + +**Take-away**: when picking a model for an embedded / laptop scenario, +vocab size is a better predictor of interactive latency than parameter +count. Pick the smallest vocab that produces output you're happy with. + +## Why phi3 is hard + +Phi-3 / Phi-3.5 uses a *fused* QKV projection: instead of three separate +tensors `attn_q.weight`, `attn_k.weight`, `attn_v.weight`, it ships one +`attn_qkv.weight` with all three projections concatenated along the +output dimension. + +quant.cpp's GGUF loader currently looks for the three-tensor layout +(`blk.N.attn_q.weight` etc.). When it loads a Phi-3 GGUF, none of those +names match → 0 self_attn layers detected → forward pass runs against +zero-initialized attention weights → garbage tokens. + +Adding Phi-3 support requires either: + +1. **Loader splits** `attn_qkv.weight` into the three views at load time + and writes them into the existing `wq`/`wk`/`wv` slots, OR +2. **Forward path** learns to dispatch a fused QKV matmul when the + loader detects the fused tensor. + +Option (1) is simpler but doubles the working set during load. Option +(2) is the right long-term answer.
There's a tracking issue / spike in +progress; until then Phi-3 is the highest-value missing architecture for +quant.cpp's "speed + quality" target (Phi-3.5-mini has vocab 32K plus +3.8B params — it would beat both SmolLM2-1.7B and Llama-3.2-1B at +interactive use). + +## Reporting an unsupported model + +If you tried a model that's not in the matrix above, please open an +issue with: + +- The HuggingFace repo + filename +- The exact `tq_load_gguf:` log lines (including `architecture = '...'`) +- The first ~50 generated tokens (so we can see whether it's garbage, + partial garbage, or just wrong-language) + +Don't include the model file itself — link to the HuggingFace page. diff --git a/quant.h b/quant.h index 36cbbb2..136d1e4 100644 --- a/quant.h +++ b/quant.h @@ -11940,6 +11940,39 @@ tq_model_t* tq_load_gguf(const char* path) { n_attn_layers, c->n_layers); } + /* Hard-fail when neither standard self_attn (`blk.N.attn_q.weight`) nor + * DeltaNet (`blk.N.ssm_a`) was detected on any layer. The GGUF loaded + * fine but every layer is missing its attention block — typically + * because the architecture uses fused QKV (Phi-3 `attn_qkv`) or some + * other naming convention we don't recognize yet. + * + * Without this check the load returns successfully, the forward pass + * runs against zero-initialized attention weights, and the user gets + * pages of garbage tokens with no clear error to debug. The previous + * behavior was reported by an external user (2026-04-12 feedback) as + * the worst part of the first-time experience: "loaded 32 layers + * (0 self_attn)" looked like a success log. + * + * Listed architectures that hit this path: + * - phi3 / phi3.5 (uses fused `blk.N.attn_qkv.weight`) + * - any future fused-QKV architecture we haven't ported yet + * + * Hybrid models with at least ONE self_attn layer (e.g., Qwen3.5 + * DeltaNet) are NOT affected — they hit the branch above and proceed. 
*/ + if (n_attn_layers == 0 && c->delta_n_heads == 0) { + fprintf(stderr, + "tq_load_gguf: ERROR — model architecture '%s' is not supported.\n" + " Detected 0 self_attn layers and no DeltaNet weights.\n" + " This usually means the model uses fused QKV projection\n" + " (e.g., Phi-3 `attn_qkv`) which quant.cpp does not yet handle.\n" + " See docs/supported_models.md for the architecture support matrix.\n", + gguf->arch[0] ? gguf->arch : "unknown"); + /* tq_free_model owns gguf_ctx (set above at line 11463) and will + * close it as part of the teardown — do not double-close. */ + tq_free_model(model); + return NULL; + } + /* Set up layer_is_sliding for Gemma hybrid attention. * Detect from K tensor shape: sliding and full layers have different K output dims. * The MAJORITY of layers are sliding (e.g., 25/30 or 28/35). */ @@ -15874,36 +15907,197 @@ int tq_generate_continue(tq_model_t* model, * Pass cached_text_io == NULL to disable text-prefix tracking. * ============================================================================ */ +/* ChatML / template-marker filter ---------------------------------------- + * + * The model can generate template tokens like `<|im_start|>`, `<|im_end|>`, + * `</s>`, etc. as REGULAR text bytes (not special tokens). When + * that happens the BPE tokenizer fragments them across multiple tokens, + * and a per-token strstr check (like the existing `should_stop` logic) + * never matches. The user sees the marker leak into their stream. + * + * This filter holds the most recent CHAT_LOOKAHEAD bytes of generated + * text in `pending` and only flushes bytes that are guaranteed to NOT + * be the start of a marker. When a full marker is matched: + * - `<|im_start|>` at the very beginning of the response → header + * skip mode (drop until next '\n'). The model is regurgitating the + * `<|im_start|>assistant\n` prefix that the prompt template already + * contains; we silently strip it.
+ * - any END marker → emit the prefix, drop the marker and everything + * after, set `stop_requested` so the generation loop can break. + * + * Cost: each token is delayed by ~CHAT_LOOKAHEAD bytes worth of stream. + * For typical English (3-4 chars/token), that's ~8-10 tokens of latency + * before the first token shows up. After that, streaming is steady-state + * with the same latency window. + * ----------------------------------------------------------------------- */ +#define CHAT_PENDING_CAP 128 +#define CHAT_LOOKAHEAD 32 + typedef struct { char* buf; size_t len; size_t cap; - int tainted; /* 1 if accumulation ever failed → buf is incomplete */ + int tainted; /* 1 if accumulation ever failed → buf incomplete */ + /* Lookahead filter state */ + char pending[CHAT_PENDING_CAP]; + int pending_len; + int in_header; /* skipping <|im_start|>...\n */ + int stop_requested; /* end marker hit → caller should break */ void (*user_cb)(const char*, void*); void* user_data; } chat_accum_t; -static void chat_accum_callback(const char* tok, void* u) { - chat_accum_t* ctx = (chat_accum_t*)u; - if (!tok) return; - /* Always pass through to the user's callback first — losing tokens - * from the user's stream because of an INTERNAL realloc failure is - * far worse than a stale cached_text on the next turn. */ - if (ctx->user_cb) ctx->user_cb(tok, ctx->user_data); +/* Emit n bytes from `p` to BOTH the user callback and accum.buf. + * Used after the marker filter has decided the bytes are safe. */ +static void chat_accum_emit(chat_accum_t* ctx, const char* p, int n) { + if (n <= 0) return; + /* User callback gets a NUL-terminated copy. 
*/ + char tmp[CHAT_PENDING_CAP + 1]; + if (n > CHAT_PENDING_CAP) n = CHAT_PENDING_CAP; + memcpy(tmp, p, (size_t)n); + tmp[n] = '\0'; + if (ctx->user_cb) ctx->user_cb(tmp, ctx->user_data); if (ctx->tainted) return; - size_t tlen = strlen(tok); - if (ctx->len + tlen + 1 > ctx->cap) { - size_t new_cap = (ctx->cap + tlen + 64) * 2; + if (ctx->len + (size_t)n + 1 > ctx->cap) { + size_t new_cap = (ctx->cap + (size_t)n + 64) * 2; char* nb = (char*)realloc(ctx->buf, new_cap); if (!nb) { ctx->tainted = 1; return; } - ctx->buf = nb; - ctx->cap = new_cap; + ctx->buf = nb; ctx->cap = new_cap; } - memcpy(ctx->buf + ctx->len, tok, tlen); - ctx->len += tlen; + memcpy(ctx->buf + ctx->len, tmp, (size_t)n); + ctx->len += (size_t)n; ctx->buf[ctx->len] = '\0'; } +/* Drop n bytes from the front of pending. */ +static void chat_accum_drop(chat_accum_t* ctx, int n) { + if (n <= 0) return; + if (n > ctx->pending_len) n = ctx->pending_len; + memmove(ctx->pending, ctx->pending + n, + (size_t)(ctx->pending_len - n)); + ctx->pending_len -= n; +} + +/* Find first occurrence of marker `m` in haystack[0..hlen). -1 if none. */ +static int chat_find_marker(const char* h, int hlen, const char* m) { + int mlen = (int)strlen(m); + if (hlen < mlen) return -1; + for (int p = 0; p + mlen <= hlen; p++) { + if (h[p] == m[0] && memcmp(h + p, m, (size_t)mlen) == 0) return p; + } + return -1; +} + +/* Markers that signal "stop generating now". <|im_start|> is included + * because if the model emits it MID-response (after generating real + * content), it's hallucinating a new chat turn and we should stop. */ +static const char* const CHAT_END_MARKERS[] = { + "<|im_end|>", "<|eot_id|>", "</s>", "<|endoftext|>", + "<|im_start|>", "<|start_header_id|>", "<|eom_id|>", + NULL, +}; + +static void chat_accum_callback(const char* tok, void* u) { + chat_accum_t* ctx = (chat_accum_t*)u; + if (!tok || ctx->stop_requested) return; + int tlen = (int)strlen(tok); + if (tlen == 0) return; + + /* Make room.
If pending would overflow, flush the safe prefix
+     * (everything but the last LOOKAHEAD bytes) first. */
+    if (ctx->pending_len + tlen > CHAT_PENDING_CAP) {
+        int emit = ctx->pending_len - CHAT_LOOKAHEAD;
+        if (emit > 0) {
+            if (!ctx->in_header) chat_accum_emit(ctx, ctx->pending, emit);
+            chat_accum_drop(ctx, emit);
+        }
+    }
+    /* Pathological: token still doesn't fit after the flush above.
+     * Emit pending + token raw in CAP-sized chunks; no marker scan. */
+    if (ctx->pending_len + tlen > CHAT_PENDING_CAP) {
+        if (!ctx->in_header) {
+            chat_accum_emit(ctx, ctx->pending, ctx->pending_len);
+            for (int o = 0; o < tlen; o += CHAT_PENDING_CAP) chat_accum_emit(ctx, tok + o, tlen - o);
+        }
+        ctx->pending_len = 0;
+        return;
+    }
+    memcpy(ctx->pending + ctx->pending_len, tok, (size_t)tlen);
+    ctx->pending_len += tlen;
+
+    /* State machine: drain pending as far as possible. */
+    int progress = 1;
+    while (progress) {
+        progress = 0;
+        if (ctx->in_header) {
+            int nl = -1;
+            for (int i = 0; i < ctx->pending_len; i++) {
+                if (ctx->pending[i] == '\n') { nl = i; break; }
+            }
+            if (nl >= 0) {
+                chat_accum_drop(ctx, nl + 1);
+                ctx->in_header = 0;
+                progress = 1;
+            } else {
+                /* No newline yet — drop everything (it's all in header) */
+                ctx->pending_len = 0;
+                return;
+            }
+        }
+        /* Scan for the EARLIEST end marker in pending. */
+        int em_pos = -1;
+        const char* em_str = NULL;
+        for (int i = 0; CHAT_END_MARKERS[i]; i++) {
+            int p = chat_find_marker(ctx->pending, ctx->pending_len,
+                                     CHAT_END_MARKERS[i]);
+            if (p >= 0 && (em_pos < 0 || p < em_pos)) {
+                em_pos = p; em_str = CHAT_END_MARKERS[i];
+            }
+        }
+        if (em_pos >= 0) {
+            /* Special case: <|im_start|> at the very start of the
+             * response → strip the header (don't stop). The model is
+             * echoing the chat-template prefix. */
+            if (em_pos == 0 && ctx->len == 0 && em_str &&
+                strcmp(em_str, "<|im_start|>") == 0) {
+                chat_accum_drop(ctx, 12); /* len("<|im_start|>") */
+                ctx->in_header = 1;
+                progress = 1;
+                continue;
+            }
+            /* Otherwise: emit clean prefix, discard rest, request stop. */
+            if (em_pos > 0) {
+                chat_accum_emit(ctx, ctx->pending, em_pos);
+            }
+            ctx->pending_len = 0;
+            ctx->stop_requested = 1;
+            return;
+        }
+    }
+
+    /* Safe portion: keep the trailing LOOKAHEAD bytes (any in-flight
+     * marker is at most this long), flush the rest. */
+    if (!ctx->in_header && ctx->pending_len > CHAT_LOOKAHEAD) {
+        int emit = ctx->pending_len - CHAT_LOOKAHEAD;
+        chat_accum_emit(ctx, ctx->pending, emit);
+        chat_accum_drop(ctx, emit);
+    }
+}
+
+/* Generation finished — flush any leftover pending bytes. Called once
+ * before reading accum.buf for the cached_text update. */
+static void chat_accum_finish(chat_accum_t* ctx) {
+    if (ctx->in_header) {
+        /* Stuck mid-header (no '\n' arrived) → drop the rest. */
+        ctx->pending_len = 0;
+        return;
+    }
+    if (ctx->pending_len > 0) {
+        chat_accum_emit(ctx, ctx->pending, ctx->pending_len);
+        ctx->pending_len = 0;
+    }
+}
+
 int tq_generate_chat_text(tq_model_t* model,
                           tq_tokenizer_t* tokenizer,
                           tq_state_t* state,
@@ -15929,9 +16123,10 @@ int tq_generate_chat_text(tq_model_t* model,
         }
     }
 
-    chat_accum_t accum = { .buf = NULL, .len = 0, .cap = 0, .tainted = 0,
-                           .user_cb = config->on_token,
-                           .user_data = config->user_data };
+    chat_accum_t accum;
+    memset(&accum, 0, sizeof(accum));
+    accum.user_cb = config->on_token;
+    accum.user_data = config->user_data;
     void (*orig_cb)(const char*, void*) = config->on_token;
     void* orig_ud = config->user_data;
     config->on_token = chat_accum_callback;
@@ -16052,6 +16247,9 @@ int tq_generate_chat_text(tq_model_t* model,
         int piece_len = (int)strlen(piece ? piece : "");
         if (config->on_token && piece)
             config->on_token(piece, config->user_data);
+        /* The chat_accum filter may have detected an end marker
+         * spanning multiple tokens — break before forwarding more. */
+        if (accum.stop_requested) break;
         if (output && piece && output_pos + piece_len < output_size - 1) {
             memcpy(output + output_pos, piece, piece_len);
             output_pos += piece_len;
@@ -16100,6 +16298,11 @@ int tq_generate_chat_text(tq_model_t* model,
             output, output_size);
     }
 
+    /* Drain the marker filter's lookahead buffer before reading
+     * accum.buf for the cached_text update. Without this, the last
+     * ~32 bytes of clean output would be silently lost. */
+    chat_accum_finish(&accum);
+
     config->on_token = orig_cb;
     config->user_data = orig_ud;
 
diff --git a/src/engine/tq_generate.c b/src/engine/tq_generate.c
index 0211a83..f3a69a4 100644
--- a/src/engine/tq_generate.c
+++ b/src/engine/tq_generate.c
@@ -834,36 +834,165 @@ int tq_generate_continue(tq_model_t* model,
  * exactly like tq_generate_continue.
  * ============================================================================ */
 
+/* ChatML / template-marker filter ----------------------------------------
+ *
+ * The model can generate template tokens like `<|im_start|>`, `<|im_end|>`,
+ * ``, etc. as REGULAR text bytes (not special tokens). When
+ * that happens the BPE tokenizer fragments them across multiple tokens,
+ * and a per-token strstr check (like the existing `should_stop` logic)
+ * never matches. The user sees the marker leak into their stream.
+ *
+ * This filter holds the most recent CHAT_LOOKAHEAD bytes of generated
+ * text in `pending` and only flushes bytes that are guaranteed to NOT
+ * be the start of a marker. When a full marker is matched:
+ *   - `<|im_start|>` at the very beginning of the response → header
+ *     skip mode (drop until next '\n').
+ *   - any END marker → emit prefix, drop the rest, set stop_requested.
+ *
+ * Mirrored byte-for-byte with the version in quant.h.
+ * ---------------------- */
+#define CHAT_PENDING_CAP 128
+#define CHAT_LOOKAHEAD 32
+
 typedef struct {
     char* buf;
     size_t len;
     size_t cap;
-    int tainted; /* 1 if accumulation ever failed → buf is incomplete */
+    int tainted;
+    char pending[CHAT_PENDING_CAP];
+    int pending_len;
+    int in_header;
+    int stop_requested;
     void (*user_cb)(const char*, void*);
     void* user_data;
 } chat_accum_t;
 
-static void chat_accum_callback(const char* tok, void* u) {
-    chat_accum_t* ctx = (chat_accum_t*)u;
-    if (!tok) return;
-    /* Always pass through to the user's callback first — losing tokens
-     * from the user's stream because of an INTERNAL realloc failure is
-     * far worse than a stale cached_text on the next turn. */
-    if (ctx->user_cb) ctx->user_cb(tok, ctx->user_data);
+static void chat_accum_emit(chat_accum_t* ctx, const char* p, int n) {
+    if (n <= 0) return;
+    char tmp[CHAT_PENDING_CAP + 1];
+    if (n > CHAT_PENDING_CAP) n = CHAT_PENDING_CAP;
+    memcpy(tmp, p, (size_t)n);
+    tmp[n] = '\0';
+    if (ctx->user_cb) ctx->user_cb(tmp, ctx->user_data);
     if (ctx->tainted) return;
-    size_t tlen = strlen(tok);
-    if (ctx->len + tlen + 1 > ctx->cap) {
-        size_t new_cap = (ctx->cap + tlen + 64) * 2;
+    if (ctx->len + (size_t)n + 1 > ctx->cap) {
+        size_t new_cap = (ctx->cap + (size_t)n + 64) * 2;
         char* nb = (char*)realloc(ctx->buf, new_cap);
         if (!nb) { ctx->tainted = 1; return; }
-        ctx->buf = nb;
-        ctx->cap = new_cap;
+        ctx->buf = nb; ctx->cap = new_cap;
     }
-    memcpy(ctx->buf + ctx->len, tok, tlen);
-    ctx->len += tlen;
+    memcpy(ctx->buf + ctx->len, tmp, (size_t)n);
+    ctx->len += (size_t)n;
     ctx->buf[ctx->len] = '\0';
 }
 
+static void chat_accum_drop(chat_accum_t* ctx, int n) {
+    if (n <= 0) return;
+    if (n > ctx->pending_len) n = ctx->pending_len;
+    memmove(ctx->pending, ctx->pending + n,
+            (size_t)(ctx->pending_len - n));
+    ctx->pending_len -= n;
+}
+
+static int chat_find_marker(const char* h, int hlen, const char* m) {
+    int mlen = (int)strlen(m);
+    if (hlen < mlen) return -1;
+    for (int p = 0; p + mlen <= hlen; p++) {
+        if (h[p] == m[0] && memcmp(h + p, m, (size_t)mlen) == 0) return p;
+    }
+    return -1;
+}
+
+static const char* const CHAT_END_MARKERS[] = {
+    "<|im_end|>", "<|eot_id|>", "", "<|endoftext|>",
+    "<|im_start|>", "<|start_header_id|>", "<|eom_id|>",
+    NULL,
+};
+
+static void chat_accum_callback(const char* tok, void* u) {
+    chat_accum_t* ctx = (chat_accum_t*)u;
+    if (!tok || ctx->stop_requested) return;
+    int tlen = (int)strlen(tok);
+    if (tlen == 0) return;
+
+    if (ctx->pending_len + tlen > CHAT_PENDING_CAP) {
+        int emit = ctx->pending_len - CHAT_LOOKAHEAD;
+        if (emit > 0) {
+            if (!ctx->in_header) chat_accum_emit(ctx, ctx->pending, emit);
+            chat_accum_drop(ctx, emit);
+        }
+    }
+    if (ctx->pending_len + tlen > CHAT_PENDING_CAP) {
+        if (!ctx->in_header) {
+            chat_accum_emit(ctx, ctx->pending, ctx->pending_len);
+            for (int o = 0; o < tlen; o += CHAT_PENDING_CAP) chat_accum_emit(ctx, tok + o, tlen - o);
+        }
+        ctx->pending_len = 0;
+        return;
+    }
+    memcpy(ctx->pending + ctx->pending_len, tok, (size_t)tlen);
+    ctx->pending_len += tlen;
+
+    int progress = 1;
+    while (progress) {
+        progress = 0;
+        if (ctx->in_header) {
+            int nl = -1;
+            for (int i = 0; i < ctx->pending_len; i++) {
+                if (ctx->pending[i] == '\n') { nl = i; break; }
+            }
+            if (nl >= 0) {
+                chat_accum_drop(ctx, nl + 1);
+                ctx->in_header = 0;
+                progress = 1;
+            } else {
+                ctx->pending_len = 0;
+                return;
+            }
+        }
+        int em_pos = -1;
+        const char* em_str = NULL;
+        for (int i = 0; CHAT_END_MARKERS[i]; i++) {
+            int p = chat_find_marker(ctx->pending, ctx->pending_len,
+                                     CHAT_END_MARKERS[i]);
+            if (p >= 0 && (em_pos < 0 || p < em_pos)) {
+                em_pos = p; em_str = CHAT_END_MARKERS[i];
+            }
+        }
+        if (em_pos >= 0) {
+            if (em_pos == 0 && ctx->len == 0 && em_str &&
+                strcmp(em_str, "<|im_start|>") == 0) {
+                chat_accum_drop(ctx, 12);
+                ctx->in_header = 1;
+                progress = 1;
+                continue;
+            }
+            if (em_pos > 0) {
+                chat_accum_emit(ctx, ctx->pending, em_pos);
+            }
+            ctx->pending_len = 0;
+            ctx->stop_requested = 1;
+            return;
+        }
+    }
+
+    if (!ctx->in_header && ctx->pending_len > CHAT_LOOKAHEAD) {
+        int emit = ctx->pending_len - CHAT_LOOKAHEAD;
+        chat_accum_emit(ctx, ctx->pending, emit);
+        chat_accum_drop(ctx, emit);
+    }
+}
+
+static void chat_accum_finish(chat_accum_t* ctx) {
+    if (ctx->in_header) {
+        ctx->pending_len = 0;
+        return;
+    }
+    if (ctx->pending_len > 0) {
+        chat_accum_emit(ctx, ctx->pending, ctx->pending_len);
+        ctx->pending_len = 0;
+    }
+}
+
 int tq_generate_chat_text(tq_model_t* model,
                           tq_tokenizer_t* tokenizer,
                           tq_state_t* state,
@@ -905,9 +1034,10 @@ int tq_generate_chat_text(tq_model_t* model,
 
     /* Wrap user callback to capture generated text into a buffer for the
      * next call's cached_text update. */
-    chat_accum_t accum = { .buf = NULL, .len = 0, .cap = 0, .tainted = 0,
-                           .user_cb = config->on_token,
-                           .user_data = config->user_data };
+    chat_accum_t accum;
+    memset(&accum, 0, sizeof(accum));
+    accum.user_cb = config->on_token;
+    accum.user_data = config->user_data;
     void (*orig_cb)(const char*, void*) = config->on_token;
     void* orig_ud = config->user_data;
     config->on_token = chat_accum_callback;
@@ -1039,6 +1169,9 @@ int tq_generate_chat_text(tq_model_t* model,
         int piece_len = (int)strlen(piece ? piece : "");
         if (config->on_token && piece)
             config->on_token(piece, config->user_data);
+        /* The chat_accum filter may have detected an end marker
+         * spanning multiple tokens — break before forwarding more. */
+        if (accum.stop_requested) break;
         if (output && piece && output_pos + piece_len < output_size - 1) {
             memcpy(output + output_pos, piece, piece_len);
             output_pos += piece_len;
@@ -1088,6 +1221,11 @@ int tq_generate_chat_text(tq_model_t* model,
             output, output_size);
     }
 
+    /* Drain the marker filter's lookahead buffer before reading
+     * accum.buf for the cached_text update. Without this, the last
+     * ~32 bytes of clean output would be silently lost.
*/ + chat_accum_finish(&accum); + /* Restore the original callback before returning to caller */ config->on_token = orig_cb; config->user_data = orig_ud; diff --git a/wasm/quant.wasm b/wasm/quant.wasm index f018484ec65b5e37499d72dfa881b5ad97eade97..477218d12224aa19f4eb475f84b767177220a75a 100755 GIT binary patch delta 36044 zcmcJ&3w%^X5;uCfdy<)CCdmYnkQZ=fKnTeD{f2`g4|$2G=&}L}C_B4HhMQ6nOtqDBE9E2}|KK~RE=iYRjb)qT#%WFY(f?stFJUDD@tS65e8 zS5;S6cc1y{*UW!L8O4)36bhC0zt2r zr3FxumYv2$PL7e$JewQLXu^|%rjI8H7V>#AGs8Y3B~^$_;Y%|Np41}1QoV+kdD1*a z%UqV5#2DjAN&FlUXvJC^!J;;8oA}td9>ZV(VR+klJYMJjDd`4F7kMHtoh7BQD`Vc=iQhNVZ!yjRm{m1(tYsPK{U6!$jvB5?jJI`p!&NtRDHpF<6 zu?viKj14s&XKa{J&e(8cEn_2$ml?azc!RN##umm#8P6~_+IX6=F~)kv#u_g%c99Wd zY@G2TV;39$WNf^#nXyZZO2#6W8k-oq%y@yZ2}T8Dmm3=yyTW*$u}Q{rjI}hLWqisM zV*^XMf*BU`w|$D;&z@kb*#m4h`-pwWs@VtZKK2&d#@=LGSrvPYy~>c(t+sSsY?P=^4_AeG-tx~^!D!3)OOPrs#?Kmsq z#_%*|8m6Z>Wbi@sKWpdlXdBNt$(xTeYxD8wc+Vqz%kk(j&*fd^>r2AMydmfdI5FKfzXkO>O`0wYyFh9^YpBf5-ZAJXRaTex9+!t2N zLh;VPP-1mv+PNBvJ5WYE)TL?TQWUj}M(TyMJi@#yt#4ammY3K`olL?*!0&fx^bf%( z`)63Z1nvG3?VNs?gVJUh!{}{B#s*5VgJY@rtAlr^0|WIe3H+f(P6sCsAfz0na8klV^q%ID`O#5PdYWPh{B%0s_Pe^$=iwYvltmOHiYSJ`_8+cX{=dWu==yrlx zts`0m!v^R#Og^fBL=Ab0!G6mKK{!$0L!Cd42jV;AidhkQ_!JMCr}S28Ge-=<$cPef zf+?vELlhutrjm(+cz|^Qot8N?&1mjdqvf%RW+LZClh3q>hcl^;LBqB;%S^6DGpkNS zf6r=Ps5BaILctludWdcHK0mva13Nc6uet7;$uErW5(ER5CW+X*KD#|f@=12b^GIUO zq(R9twTOP}mhj}Y*tI>lhqzU272(OU$s3;2uE!t0vNqi4$%E)R*rlF*7IN|KE2b`; z$<5Le(G59M!%Cx!I{jx*YOeHO>os(&Qf%!eH9kp9DaoB&-CAjNq$HVqsew z43b4<{Xp&+`p`DXsU?C*{o)#y!-;_90b%<<)nYgszM{lCU(lpiQbZD zId*Yt-Yd#b6D+!|47E~MUT3Hu$$##w9rh-6xMg&C!C2jCBfis$J0;;H$K;(Zc&8xI z7?UQLhH5l=pm5h&8x)BR=0y)T{Y!owmub9-y{>Vc-0VZu5~uDZ>F&k1G_v_yE%jL1 zQ%Q(nP?_mdEqP5nP8iMZW;Z{vQb{`4SYrOnF0L#No#N?R`dP`rzF|)qH-%Lhy{_er zY)kaVmP=V>R2w`0`##D$U&>d0UwK7mcn|ZVgj+j)ZY5X8f?$|c;`a&s<@qDY+BKagvT_^fU?%t zw?ls`$3K1v<2#NfMy)5ZqRm(P_?t(gZB`eWTaOyW zs9L!q99m|EASs@C1--aW;S|;zN27DsWTlcBF)_wfN29l|?o6m(T&+-lwpyV+vD!g> 
zK_?sZ%|yID!rRU+J$tJ~@vRnew6=WB1=QNdYgB8$tx>Jzu60`*`#)P7`9E75R@Yh? zFru~R*Q&nlUmF^@J(2t>gXq7}uJcx6u}P&RcB$fWA#r%p6NP;1(dey6G%KSUpYTyr zFF!Gq*Dj7Od^$f`_+(!G!nZJr!9t6*fW^2VEhXRIg3%OLySQ@9lZ{yY;ZDX_cZV^Nzf6?25y3vGCvg}s&V%FX}%_ zIz_28c^n7|o>vHZJ-@_}#Lu3ulSH98 zos_`o@J7zseX`MIaj(JMm#(u;8}q`+y15j~7zU0mAbZi?_;_Pl4u0CLo{*NYmB( zp*oZV)vtP?9sy~ZfNn}x0Rfb26H%Hr%0?PxK%*=K$|QyISbdZMN5BjX0hAp!74qth z(UF@j*GWNi>!tzDW7A4{to&=`mx2~O17dF^RoAlMO_iRPmN0fx^zN67*#(t9zg*zO zOq=<}xr}eCtsJoR1D^iuN%C)Kjgo4WpNtl~b#wFOEX>Fq!cvV7LZ{#c%nW~VtE_&f8;fft zh*l~a?fR!+BPw@(Fb*sG%7PF7;vsr&-1|K%bm%cmKOl)VILpc%-}hVewokhAn%|=T z{G=r%pMTOhx8^r2Wu5>n_y<_{*W_!i2VtM~?X@2H_^0#L`T!odMjB?L)6g@!cx0L( z205UhPk#EO;0NxD4%jz^lGXbr@rvV>{6AlF-?m!97xqVQ+uJ1SsrflnM{ppH;C;U_ zE0stAo%#D)=N$YE4m8%17Ca7%UsymzM}CW5w!djktu6|xB9AUQ{#*2J6rIpTxvD5x z7ya>D^i>pT};Rpiq}i%&*tQM6PS6{(_BU9{|Ew8dAUoHAY1NfiZj(TbDNE52%) zvr-rJQbp;yXw}K+yNnbbT67iQ@@K)|XtOhGQ*^INlrRznZ=_Mho zIxd%U*0lvKRMZ+wErh^6TEp7TxdPmBTpr-;cI8?6h`q?TR1*lbmIKy_;1`fE;1U#g7#WH9f`reLh%fUj_!^yw&uE;8 zE{o75C8Eo=(M9YrP&DV91z(dye7S@#Ih;~|tn$p1GsZ6;d5Pm!fIK~Z;2s+7*3M`O zdzU!$&5)N^wK?*1)!33u_6P5Z-VI4m;I&+y7#vs zDh43ZN=4@lVwbk%Wk+I<6tjVhd19Z>gqdX?>6y(&BDrWb>x$eZvl)WQ0DEXID-+0Q zUBXfU7VC07yj-LuH?Rr5#ual;UCJxwF!h(Kzw*lnD?V$%#L@-|K5MAp(uEDw*xgXU z*hLN0cyAG#dp4?KV{d0k>}(}+#$t9AIv1;6%zB8kLgX!NptXk@DrkIH12xJTDoDG# zff{$;&1RjIs+4=!?6Vf2t4-y)d)Se)mZG~o28=yN>g>_QxJy z$!0g9oQw+gQh<$%?cBoVV;dmW?=?1)L`y#XI-5lRO%S(FriV|qvw7)!|Cg~5d)W@o zevyqnWwY4Z6OUKEU#wtiC#{`~b@% z=zkqxJ)ENcx+wUKE;{!c1a_Yq5S#i9bd7C}?f916<)P_w++jAAuR11UhgtK!i_6F& zz|}d8|HQnZVIz%G47B{1a)nYaK!#x*(2fi|6!ma{%95E!STiqNM=LdW8^uL(hFhzS zAqc6ehHaFlq5KyYU&sJ@wY(&Mu|eDq9${t+8@IDA75RR9VXB+RM^mv+kFZ-9-+5eK z@*`{E%-^^D$Y?u6793@IKg6JU+@t4liprV7oyXL0H;r#|&dx|79V;?v*`$VVARI zv9e#-#U6HV?7&I(BD?4%uqxDpnM4yy027Npm`8<}6`FwC@{(}{!kArA{F}}i}Rh+&h9(QVSlpLCOOZ^ z+IjZYInOE&3n|kZowC2Na-#u6ZnW2O-DvsJpG^DFC?h}GYq@^3JWOTeNTbYDWv(MF zo1I1(dD18&Pugp_p0vE^G}Ep$%E*=WTCOWCmr_0QrBOz{wAXTdY5A|ytYK&R!F>LI z?@T{<{F;p8!%2*TgeVmzOW$6p!G 
zYfQc?U}n;oMUqUz?=O$t#rSi{o;$JYD-R1kBo^`UhnRogZWe-|7xC)dF*B84k;3kf zw>0J(2vt!Qf0CCn`F<9Uv7z#gY~GPAkuPTRHta%Klg*#+c_9U%sIhINI)3n{hW13v zmgYHTi*;cZT#R*18&ShTttdUn0-R#?vA1$K_n@Vn`Fsww)VqLJQTlrUpM^AbO(DM; z-uiC&QB&TQMdYcbd_G^gTVC6YKSPu@YtF}TUcNgvC&WEGZ)lKW*cC~|W>OnbmPCJ7 zCYddROXRW^ymf&tE=O@IP=>-%1+p|LSQdM~1^A;_$bXZ6J0Hymn^7WM7 z(TR7XbW0~bE9d?X8Li_l#0H!nXGo9r;7_Eod*nkF-_7oc6%Xdmh~#AqK5eZ$J(%AVi(bHAV*J^Ya?~(> zVH%dgY8#Gbpp){+VSFeS@AAiCoEGsiYdG)P-~v-l8qWLEy4+ctlcUS* zQ;g^MWjq<=VQG-$=;T-r*?0s$r^q&LDUNYVt!La)Vcb%bag&or@MivaBX-vaeufdN z9KpYBIAVc4K#Ex8$@fR{Lh9;~k-E!`M)8IN7xFKoc;C@yb{XpHbozfpF5E6-y;9{V zZfW?Bz)hAnRf&EqIeM%`>xf5Bw=J;mi_OO5me|d+jFBPX%;0JkEybDu`K4@3L6{E- zCk+W>;}q)yWdD(BKx?tdTx=PA3|NdJzKVqyCM$&Z_yyN`v?D9!O&*wBO^IAEmbXnsOCgK{(8-Fie0byP zlSHp5Mg)e2w2x7JQU)&Ktpg~~Otg?Q@FH$yZs-#nNJj+VI?(K59g_E7#4qDJm&org zf_*MsA}5UlpO8k!@zCWvmw*o-0m5kF$P=fd9@t)A!cQ3<1WZn4gd3e|5zahyahj2D z1+6l>ATHCi%FoQKSQ1OSnD=1)tCn!^9jW`VWbV~XZwxD-LSriji10L zGViuS2zlC@k@Ee^dE?yS7zfkeM%)P43 zcwWVW{h&cmS`ii&d1)Z6h8HUm%aOu|3~>P;4nxne|6B|m2qy!Ktb%$A_S5xA=}4Tz z*uz!km$IJ_u&qcBu!w#@bXsMUa-D9r-(NzC(Ff=-#bq)t_5%*Nz*K>aM#uw70J4e> zFD&-V0x|mpbp|zI0)51512PMck*W&7Dc=-Ux5z9|244`(A}Q2~sgssEDbh(hByh4} zYZM~g7oZ_1Bm{+oppbOZNjIJJ(utvyK{`=EnP7j_Z6k!S1y6B+#^f%SR@lr~Wiw-` zJwm~07@J_8|EE^+qB%SqZL|IIvAN2_ghas6>=(+QWKbwb$@w9~a@z7?p*(sU8EQ<) z*bvT>6!Gz)Y)U4CatLH%D4*UYhnkqlvb2z$BTqfXQ(9WJ0_RX#!VwImn8uL$lLPU> z2&c>0bNR$T3fWx{3vQ>#kLU81Y3Wp6NPo<*M<&hVW+Z(G)}NH)3+0eP)8pi@7lV|q zUnY==z`)Y0H`p^QhJ;h~n+T_xUI>`Lp#V`3$^-zm{|o#fEWzMG7Go#QuvW1@{D|Vx zIkp3kMRIv-7{nJT?Z`%Onq{Pksv}fKU{qjMASj~VgoPe>&2*3s@yC3KrHTpT<|zmV zA?{e{;20N4dM2a+CIoVygJQpl}+ zKw_E2!$JYS8K_4Et*t;gdNLA}s}3Y#K*Q7^34nE5D8;Ji9qcj$9vpaJ1~K;`!7(Lz zLr*bDnH2t36K^A$cq7)a_rV_Q1>5tk-V09iYEOVhV|*w8;)AxNPaF~UpuK=*fj}~{ z00e^#=Z6DBR2j$&_|Q$7aVH|dGH^WR46oA}*iJN75K0k)T13<|9WZ@n3LO{YCB-mi zen8-fM2Xh!np$jBlcSJsW)QoAms#XfVDmcF-ZDfTC`^j!9U8`3-|tM-hKZe)32IqN ztRedoKeiE|RIJHDl1USJ61oZ~a!nd(pT+vI8hrtaxNcFg$P+PBu@Zu|_(Q3hS&$MK 
zgSNaEbPK|g3r$0LKkP788KWTqsYuZ`h)v^}5QbE<@rZB)Zz*Q#=&&~p0A^$QH;fqS z9fd5evOEQpK%8e0hX1_#qcz|69^#hfyV8EMUkgi}e! zKp$yFDinGq*inNl?2kbW19%4Yz5>fc!G$46nhylVd6!u*Id!VwO~dMpbf5?c#0e`l zl1LLAGwTN0qk6m@x)7m}_PAY!9HiN9u^MgAqSRR_M$mPPNhbsb1pkH=gajXiAqisr zirLUZf()DF11GZRc__wEotVRrnu6YFJjgPjccUXVcyKG^m6|k%AFQ za+Lv=1fjkl9!`at2chN= zO^kxpZ_UNqKv0bvQCc9j9?hc-mANjU;$HVh4P z1D0PIWMojLts8K$C}d;2nTvJ|Q)M#r&1VRKM z4}?8HWy>)->a$7@dPhP?d={H&@!{aO2hZOd7tj-}8GP2lgWj32nZ8kBOv|tk{t|1y zm{$M*+SxpEe+Ej^g)juPCRN#@(J13)&X5o>T&qmPkJhnS7n}<$zw{2aSJgAEg?-$| zWzHjLABHJB+yqtdJTlb8?z~rZKGSSs`e|iK6Bz8E-?f7Po@jbTg#AkD_3VWnkT66r z*^?N+?o3b$2c7k~I_m=`FSE#O#Z6!h$cmB_sD{A*8jk=Ql-6JcPeS2SZV!n$YU?S$ zr({V!ynqKHBs}Pr79VW;VGpD?}a2_dS~|N$-ChZeorUxrc<44Y|G#(6s#(Qjy_dw&*w8ncJjSt!yPcu*soUj}- zEjTv`K^DW04g3~L)u{}2g!EfKDpgl=7F1oXp3fUcl%hM`4cJnNbI7FV>TeL|~DYMgjqkj+BJlO-JL?6kc*!)65_=K20OWDYqHQOf?`U zTII<{6Q&<3?@=o6C6&h_0c!~GJ1tJ-)8eKN?KXS#tbqZPwYTwJ5ip;eKU*w(|b zw%#+;mi$OZ>fQN3*?I^@y6TbF77QN*Bi-qfD;PFE1hpd6Y#4!$s}&fNg6o=ttDjO7 zkQQCqYAa7>=px^-0h6bI>gqeYxW~N6E0LPmf^3_d{IUvf4u$&L;Rem}a zsV1D59(=NDAL!5)A_Q3Q$y-W!tIWDC)(Qkzkv*w5zcl*DuC^mF6Lmv82|VnD1UUL3 zf$K>@CM?{}N0QeI97{GE6StGiEbt#mHoQ|LIkXB)HuLGEXL4-OLVi6qGh)9j;(iv< zo7{*~J95ZiSXh!przEe26zXz6?x77r@XFpBhk-!U-PtwAGxD+7pw1W~O+}=VOyY61 zfgXapY-JRq3CN)-Xyr}V3Et$f+ZXfEUQqMVJ$#kmwY%lq`*=TIx<@{LAAgF{OUtm? 
zGBoy88OQO3q4LrP_$^32dVtSGa$b}_%&QN@zKHVs)Hcza2l=ifwlvo75q>vgBDP}{ z-|Jy_%STu9caY3jgZSUkU2@$Tem}b_)_X1A%XsOBvEY*s0{-v-dGS;H2)kVlTF0a8 z_Sl|vd^`hOo2U5~3Vq53-pGs1tinzFcI>5{7=mN2%d#}(KosvHlt^n7J54( z1@eG+6L&|6LpKpzTZyZVh=25ib-)88x`>zI24AtYkPTDTrOvi+D>RIv0|c{z4GAF} zgjFON3R*v@B_x9X467oFPeeEkmRIdhV3^=6%&`n@BRmvzfROMc-5g}Yo)vi*`VL@# zj|it?hXjjM;tZ{{Qcc84lfWZ@XuVOCfLhd;8Ae2>3|fuq9?(7-u@M9U9J+KT6y0ZZ zik8bRSI|AHl*lFP3A0J@GBPzOiUy})jFPReK#4sXtT@rz_*FE6)hK87g42unkfJXL z6PTQACDVMUn<9ZGfMYMJLs+H~TYO=)t3x|A?mi9O&%y4HZ7n%gldvvHib2j!4QQu; z6*D7no()3eA@Wi+9BuP}43~e{V*mgQ7)7`6h>#JuOu=BII;86S!c@!C&KO*p|S`X8{jnxlLn!Jr+4Bm3$$-}>FcsO>zSYq<7hHvm6 ze;7*^4>rcZNdM9sYys*1Vj>;ZT9XSW+mC>@$1?o51!<6_7lSActZ7yFVZ%(J@jv+c z_8&jlO!$S1AD#kjZ2&QChXFBhS&YVHi*bN63Z+Mj!wX;lphNV0VF3nNH-;{z!QZD= zsZk0bnLMJ#Dxq)hyu9|8!;e&cNxh-&CJjEi>(rW0-d`0Pa1l`;bko9uV*wXRxBddP z4zpsnPJU%w>+wz2YZ2zn?(ToLA9CNO@7|{e>92S!OI?|!Wed! zl}yyl>_*nh>6#Wt(c&d%yX)N(_tM!Su{=mz_2VmHw+P6hfc zm|qf^a&5_^8Plgsn||%2YfiPKjYZ;!7<_!|+iQQl=ii%%E@b!zA9&~O-ygj9;nxP( zw7G+j6*maW{`Vk&WKxSb7DSXRKluB{Kb#?JP&Lolqz(BnX%OSlu*I0xup$Y;aN^nY zBMxfN{KwS;rac{4$K_IG<8UYmv*;Wr+6&8$nVTjKrIutfMv!Ei-~nQ?VchzF7E&FM z?d<$JI!q^pV1SZr17ww-P5Noi#^abM%mRdtEFWl18bW^})}h|P0zA>)9<~SyU|-RG ze8Xhyl^H=l=|~*xX`Lw{nq6ZON+mti1`#F{I!Klj^CQ_}JwKu^+9Y9?pve*ZCxRN7 z9pfiQ6+Nz-9PQ{aoCL=J>mv+6=1z^Ns;@+%vydRWnHoPeM)Z2;EVD$tiBVAxd8cX7 zv8f8Vni-slg4&6n5*5~!7%)9qIFzP1CmZ}BuT1)bjSVB9QiwGPmbE4lKLyaSPla&0 z>(IrvyvVO@!w)W!ooe{ZK+PgfI1OvxA}-g}@ae5~Dj%4#_V%~RdRhn6VRQ^9vBzY} z8?Id>yX@!TuDsDDc)7Yr-d3(G6TD-NFuXBS(MJj z_9~JRY_B4@5&Noyd_kxWy)I*jw*h+@3w5LSl?=N`NFjZQyS5=|>rU;hXQ2U<|11j? znyKgV3KkkjN56Z7%p=OkBW1;4h?gZWtJ?vtkFs0Qj)PmALNRX2)M>k^LiEy#2 zu$_{RZQYmp2i4H&G5nMU6P}dij;bC!d*yZujirM0TL;ys>H?^7a5N2#FL<+@`V9{? 
zMyZBtVT74g(F;LbDZk;}2V6j|1)Yg2gn%F(WC6~4qNjCkP#s7tW!J)N;r)bRXOx=7VfJvg})4o_|mseWX*py-c<4q=RSOK=*xe z#CN<`3Z~*7yrRhT$L{$K$51+Gc#|Pe1q$(|zZXHhnoN84v}}c;rxVl&Qp@1?yf?pp zr@ZicK8`NX{3{Q$ zpXI&3@~a9qziC)4v0-P2WMIOT1;^ml$n+C@82{#woN$7hJ!|rcgMX`A?{XMk7++9k z9r?}(B~w%mtNjysVXWUlG(x*5y5o@Cc!DqBTmK{5|Hem$_8=reEU89nRi_u*>YQRV zqWQ!+@o%~6H$EzP8-X>J3xDH{eT`Mg@_jP(B)^152g)fY`GMRrLQocmAX|a`D*yUB zKb~7br4@0dIjZzj?3zFLFN`mLUrzaxKblugVC8XOxeDy{?_y0(^Cv}KO&ZE;;4)~V zlVatm@`vA$d$~9sT1-?T;!;XzO5@PvE3n7EjeRe~4?J(@k1X8X?6Sb@9tn3@2dhL; ze?*&h%GzX6+W8N(LVJxTD21dQrxZM@rPgcXS}#y&w|*{Hd&Ol_tJ8|j?mY!!EM}LU zVkcEX5TlAJRmD{*RFyt^Q_D-CUW;Tn``<}sBgik+sfer7RMmOru>6Zpv`H_g;&K!t zss*8Vz$dO`HS(BGEJ!b-qB6JWmJ~4_Mb#_Js-j9&4ox9d zS*}eJ<8+rxvZ*a%C&5)Xa59oE{$fMOF&V;}UQXrZZuztf@slp^EFa4j=Vj+=G-U)< z=7QA(#T5>kjhR4GO68?)`L;~ejM+$Bo?Z)8v}&=c&?9zdBd2IwHtl4C#u|dCaUn)D z784xgO|nEVy_(9a-SX~P&cK@TnJh6;w$4Jxphjk8JM|RpBho9VUWHrlmu%4`xt#LK zWrrNmMHc0VOgS${T#Cv&bHw#X1~w7%k$~`V*|RitrBtKT#gd;Z{t5_4TuZqsSM*G; zMKr{!O?CRWKUYx)45Q`ZJkdM7hRSQ)@^|wbDx1g7%NNI3Xb*)`$P5u+H2~Gx9qasV zN)8@AATKEt{nD$bxXQ)zc%dWt*JY2?lvWT#g$uE?sl%rdc~LXbF}n5ODvkgP~%#9d$%R6C&ImP1fvEguHZV^5*+Mq8AB;1sm8F3 zaF@x<=AwHa;G77HOy{v^fn?Fy0YtmQ)P(UXA$>25ChP`Y4c9dwOPh<{MazJT%t$Gv zaMYF(?o!b0C`fN@66F5o;ym`FJO}Ye^LPq?$xtD+YH=XM0q&!KH^{q9(LAq)N^9au z`>N7q`{h60}@JU{!Ho{S?^K631@_vG-&1L!uSWt01U~IH&;%>g{jkrWT@; z8_zO4gXGZ`WV;By9Po6tsz$bIDO$30d0|U&0jr4(>@0#zt~p2K$)CoET)F!k@ltvj z)hI&^tve6560PLWQ6fXW-%8n_KU;~W>7@iz>H@WCEn2al9NU^8Vx_G`C#Es(ECR+_ zwKb3LYiW$j+h~j}<-lGd2k=MRh&DESr*p-7vb2rTEq}W+;Xv+!_71tvwHMpvsdE+Z z!VU?*x^F*qbbwoQ63?Yq5hYbFC0}$pvywweX&3RFtnQ?c_w9Nn94L9Vn}hs6-2mLa zs{&rx{j9(#JssdaJpp`ocLn@&&$9ww)!PAvhGc`pTj>>4qr$DRXOOPZhu1Wb8&gG&9BGMgdO1Or14;|- z0!vBQT1#~F)-)*&Vt-Hmc|K(Jm?ipHs{l+M#7atOIZG);h6za-HKi6!Dc0wPQkQLC zVUY^&@pC&e9h9>Mi&p8S1YYV=R*pQmd6L7ru*^G8WT)3A1E@CH;cvh55`g|mKs5wZ z;{tt}2z1f;A`b&H^L)`Py_#UFU08Yk**Mr}h$vuCu?`oABl52El{^#;1y3^M>)l0f z-~yx1S@--UNI-`3?kL^Fc);`Q32bbJGQglhLQEm{xwGj%9 zYM7F)a@Q!4;vZric2U%f5(S<2*ezERs_M9wkLZ>^vs*smRzu4La_DH$B)y92RJqO1 
z8tpXi44od2&&IgpAyda<{H`3W#;^6*dW<#YV`D|T^m0OqD^+%%J_QiCdwQ%mC!H=} zS!Hffmy4XDra9f$BA&4YCxDY!&eb#6@9YP*3q9eHy^{5fUJfENWarC8SFoh^a+r+;mkVEdDS?#&OtD5j05ExQvS=jVxm=v1 zA)BYyLMw5w-2rNPg~E0I6{3HVGSPB>I_Q4*3emR^-$a0mX<|{oo`xvdtV+MOJLmU&LiBy@u$h zap~AL5p)FQ;fbP`Y8Me*L?6&@K(|Y0E>*h&|0>$^&F{$Be--B-jXj8zrB@TSY8Tti zD;2h*SBf6Jcaj;xUivmlVS=g%v9IcH$I;_x34D zJ~4YVja5$(0h&Jk^EWZRa0E=SNgasW=fl~mkztl5G+q2FxLgw%-KL3LE~i{2wgm2@ zTx>rEhgkQ?Hdl+M0%bNL#i+vh< zl(09J>#q?L0xL8y1Hu(DP>gf459%C_oCjsEV)0AxAzgi~ErSoq0W-xUj{9_v&lF7t zNL~8dZjD^pT*C28o4>S+>Uc{lHSiZ(5rZ}mB3mhs&lD#cJ*?ZzfOcX_ELdbc{LtUU z1iAI^;#_(A-^IX2k2nCCHozkQ$df<(UAz)prE#u*j|gaEt&*|XVrKABoqOjeI`>i8 zWsd08<1wAPbEnRItRAuYfH{WiaSiq4=Y~}XA|IFKb3~h9xy~*99`PZ}BIWY4Iif>O zO$%(<;z&etD9znyK^W4yUSxa9ea=&v`&8;Y$y?`&M$9kE=88kOWPI~HaZ?v{xnOwkHUBKGeDHc}ZU!=A&#y zDST2WG|u-Nz=KtxRzZH{r0RG+Jw0RLNO#9QSj3egEpjD z@OqgyUo`7;9fg<1Kz1lF>X;1^L0*ESlOf0??OJO$Le7Ad6r5zOmGkC{&aD(}aIOJQ z{-W&+JfVm2_3heWe?sn@FA7<;{ARv5pWP_imWcN3205WbG>s@OV7JlcfqEhi(B__E z0Ni`k$YL5Ya`Q8h(vUfgJz2m1ndtwMcK_WAGphdq-7B1jr~d12UdZDmqH#8Hfn5k2 ziNJSVn7i;gkv}1k5l@5nR9;#$Rs@v#vf%*2(49!H)Um>(_~z~m%c#3X;T&la3F#5y_>{E=u-2WMK+S| zH;aq$HC=i4&4}5}mtWtEh_7Gf+#=dRLlA4niVNZp*Zy<5{deEi?LQ}H+@g3ctgP#M z@NxoIt2kSsp4Z^_V$!A+@bfZui+}>i&u$TgnHyChji6Q5A{eqZ%H##YOslBNex`+- zyg;<-sbKEx0GJnSsZG&qHVY<-Zsuxb-=tf6c!y!NgKgd<%NB^v?Ufc9;3*C)!9U0K zQLNqdC7l$jQdjx!VE}-@O8MggaUnY-``jvqfX>oeMb^bo-#89zwi~I@UD{kvWz_)_ z@Zec(gt(^bpSnj3XEt$yAlTaar~K|#(PIG4v*_(C8ss7jH&&TY0xXhn;{qp}h*(#A zU(`srwXQ#=!Kv1Z@-Md`yapO?6S+fRbTo<@LQz8~>>-Fb^-lCt^nFQ&@A zXn9PqB4V*TU+XyAQ}{xPTek=;NAw=awwc$659m6s_H52wLUl(MCMzs+8A&* zPfNpV8iouP?)F%(x#fChpe87~gf%r(4}|Iy@F~{oaky`9-5y8&^O2_~Oc!}oedJa0 zmBpg(2!*he5Yh_VLHI@-!WBgd-y1eQJpqBETkGT7DhDnRrQ;Mntj{5=ShdmoM9gsT zzNzbY$)o*8JA85|-+D6<@3uO;UNB*s%)CQP?V;@7LEt4RI6x^ph#G3Ah8B@F1K(10 ztwRU26uc#$xkJ1LngEM+K13FxeNvg>Tu?U7lvR(Dr@-EPyQoelJLM;%QxFgxTc zcZ$v$Zx!LKIt$*NXX4#y<8`I-?fQ7%mKk@6P^5_b1dxmn14um!>UZK$Z^yw7u<@Nl z)bG|u{jOWC2Sdpc+9;9qGAbe^8hI=<(=xgsd{Id$s5`$Yp~nK0fpsscne~;r;cnA 
zyFGHzJ)+ehs8)MyCy}>Tjayz;O?!3f4G}DQt9ITha&5-%t&e4|Ja~^7KFn_|ZEe?6 zSWdKx$8yklt00{5?Ebh88?o|ZdDAj+QRXKq*DC)KM^R{n_KEy_nfSenvIbSe1&U5M zBjiUQ z)-H!7rl}qF67df70R%Yfi_ha4-isYHh|K416E@fO)hA(}Tkc}{PhzY8Nq~{3my2^D zWgwxPNT67XL&6tvB*a=N627pj>lg#N^<{k$zI4l7EH#NF)Fi;*8v&4GxN9vnD-frF zcE-`MKaP$invVT04x1HU)u-btw>;#GQfgcWezZ(E)epoG zVb6^RYy_?e{-!<=-?-&q##uz-TMcfDga%k7lBR0;rc`z zcFSENjwBLsBmqW#dA~@Vtw^XQ5~|}!_#uu2OwptpKe*Lx5`L^t0zT!42piq4bxAmy zNWxLg{Vd(_Dr&VVuGQMOR^f0^tFLSFq->xH5S}Q032c% zN5p^Qh}hdk5%FIaflb8m`a~Re%UvRVP9);z1Q<<3p(di7h$xRE;+Hrg;EoU{esK}l zMEqKxh+o}umxvRIM4U)~(L|`34y-66Dk%KvXv1%DRA5piDt>b@*i@XXPsK^M+{N;H zA{D>ahmn6S7by{%T$OnsB1+?k_#=*pmvk8D4;O(=#Hsp3oN~)uBK}My;?Mdpng})b zlU-1OB3uA)IB`0Th%fL<4dBFS7lBPgslI4JemhN9rC8i0=)*z{Q8z;^tPi7!IAU|6 zhKRt&6I~(}>FR2xT%?Iu#v{QAI>l#SyVIjtF>>M8r}Tflb6+^@+I4 zEq95yJCTUH>%(XwPS`|L5D^t|MBEcc1YBey;vN@)O~kVLL@aa5T_WyHBm!UhjF%ft z#2+>hGp`|CLj8T3K7<2 z-HMXstFRm5m)|~u!$5x7ewCPfw#<&Q*P}Sp<(Kz9isOlX`R1dDy!xf@F>wQu(#I5z z&5t=a@@4kpVwo2MSA=UEa^K_FqpW&dg|+?A3mSdpL;+U+^;I<`5f{U<%%VWz$PGqb$gL6Sctx>1@5602(s~+bE z-8O6*9;#2nLvFb%Y_eXnXjLY_cpkb>cq8LA4U<7bO&kp?<7n8bor0BaeVc}d>(lVC zTkdl6A2t?O*8Y(I2Ci^nj0=)EHak10IXUc~iC zz8{mF){2=8goih+6>YyyV534A>E=UQ=#khiQ8gM;N7{lPa3t2dTT<+1nHiM^iS zYMs3Xn_hj9y*jq(S+Rs828Zt-osVHc6l@R(9W>LWbB(UC;$2PW8u_=4qPt$@=7tdj z?uJ@e3UtzA+ztsAyG2N_lBD$kVqhjP;=@W#&(~^rd=WIMKW^%qoe-XC#R>k0bRM$8NNcpMc_|LB)v z)@{; zRE+D0o$2@&XzwKC#!~J{-TimpGOQjLO!lOF{RNTJ z7e|Q@wMe-D@mAVqvv9MCzB@@rjFxhIat0`zkik%*HMjZS-MH&W@CD%JGvemgK18ZfhdOQkbRf^6b#oexB9c~)B35^^a1=!=2!i2mEo57knbTbs#z>j40Na_k!BbQF^ z8yOXZD$v20`It(kgTwmp8Eyi7!zBF>a0G?h_r( zaCB+C8V(2$xDLKgvew%=fFCHcbs%2;v_#U5Y%=b07lwq*E)geyYq|sr&JaJGo+I(Q z7&XB*60f29ktB*9kxx9N8xR32FbhlqmANJc7QfgW>SnXV7^meB!jP_S>!wr7jueEG zl^nDYMpzAL*L6^c(;4_~mT913zEu^Z&+^*$0n{sYYwc@nun}euAB`duxVVEGN{HI@ z4ATo(=*J^A^)l+vMSyS6%sO;l)dF(x5fQi~`rcoL0t!1#Fik^nAa)61q>$3oVIW0g z_Gh_t0wJKU^$767^wC${l4&$?NQ{&O(c&Jw0H=^30yNTH%x<^-lYU2$_9ke&<*FA2 zzAQ;*HJ?6HeZEU7Sr?kk4 zq*CYTBZjby^yNZe#SJCzLzkxEQ4ks$is5(y-x2nl%84hd$Y 
zPDpU)1GnGkFdZ&w;p1NP36xZ`YfnBB3BLI;9!dXRI5m&<@$HX^c*VCrCL_U_zG+C{ zWX?c>Z-2}}f^UD!MS^dClptx+13##kYW4y01xRyy;L;t^Zas174rw0tI4RBVflGHt zyZ6MUJEQ}8;nE#a+)sO!(t*8k>253@@O`P5D5DuRlPGPD%_T}hJ#gy|X}AY2-63s3 zDPYZlmgYHr)hQq1C zMY@_rvg&2AZMLd`_X5s!`d&98UJUrv>DZQ(YC=)_`!w|O#H*r3+ME4w4NzT#C*p9ivzvVWRnfEl?apJb;wm{%%Bn7C(7hwJ48OOIud(jhq#mB`}c$27Lz$XvaO)IVwJnagAD2Ly`m4N zf=M4aICVdcob#Z~G{`QLpmuBNMsWA}-lsbJ`TM6&v#%9t;(*BZfZtZss|Bw1ce zle{foumbiW13x~c3Xsunh3RQ;q=zw)svlkjIC5`H1nZ^1(h|T*62ZDDuz(9TNu!&a z2-iu$`CT{-HY*XVodQcu0Gm+)yM6jR&qihRV-UEpuH!!{6Shnj^&l(TgbTe?Sx{3Nx4C+hTPTPEdeEpCaio(MO#3-V$&o`p5;4D0Zx&iR5Ida}N zA}9Y$o^?{?Z_bgcsr*bH716hHd@<0M_S)v zPvlI#KvGdoRMh7SL=*Wma`pKFQA9pbRi7_Dl}{sQ=i6-YsQfCOFF*NKe6Lj+KIZ?y zZUqzM_(ou=XWqa_n751sFcQUjVBHj0ofXVH!*0oo4#rs;lFfPu+9?EehELa@kqCxy z0Xd0wj|8_Kww4Mk(e6z?!_sKOm6FiA1`JoC86+*J*J`e6HPH-`QaE7R(m)*0i9|a{ zO5(sNO9SZuSfUvuow31W#UYXPH^{VZ2GMW( zW}m{LD9ekW41O`VDaw$SbE*k8=B;#{*-U3PIn@M?v6Zben^UIz?yz_yjtW|)g&Prm z(50zJqO8VFmZCy^@i~jKIwBgwSMF#sQs`?mi{2FcHn^DfdgI# z4)_3<@*}?H{E>@KJ!yRUQQ`ZjN?!XT@Y!&{=fDA<0|$J7`~63J8f>?VPd#aT`cdKA zT_yi^RLssn7Jkkif9-^R+iHv#StWy*5%gJ)XG0}q3 zYNUPwNjjBgFy2^}9mmbMte?dA9E=C>;IEw!*6P<|xBVp6@<27J2s7){Uq$ss+5Tsd z-Kx2k9SOrF)2GRrAb1Ni+5zhSNL=@Fm5bXsgjs^^nj|Mc0@=3G@W z{l=071LPwuJiQASOrP`jN!LtUFu>^dw>k6i7Z8)?Ts5iWx@%|0K560cW<+MqnKEtC z)mLA4RUdQ6xN&2~nO~KzF#kSh+VolGlxwHXxTa+KR6tKRTc0z(wRz2abM~APbN+R6 z=gzsdWcsuYsit}UbOM<^&Fo~(pFZoVNmEKnW}CC7+%)~#`R0__(@+5*vr49no?c?! 
zF#VdVXOzsRYU5`B*>&?H*G-u<>n8K>)2GayZ7zDxU^M)DMOr2rQyKc(tk`7bn&NZja z0TQD3CNw<*w9J}LEnhr+I*O*w??~O~rus2Sb$EV?5d zo)I~pzRmne9I}GZK8|M=GOWe#$PUGxF6mp|L0J;Yk_>AXZq5_~zT#bmFAO7({+a5Z zwewy1ezE7m{Pnw-Ka=u3_(wP!;a`?vZQG^VN`2xz;D^9eyeIFR>A8R}d|!Sr)6**T zU+*J7U|2iem#1fX@;m%Y?^viT{DAqhGYB93F;ar`*T7$tZA2MsWLW?DKn|VdX>s09 zcu&tqaehAi#eZ0+tgQwR2-t`;y9s5gKPg%C2+D}K{}4m67Hjf%VKL=FK(jSkqT zhyhU$QG-zj6*MY`B5GJK)-wnqDoVHnSCId=y5GD^MzX*E@%jASCH-Dkb#--hb#--h z_nVg@&A&a?tfE!22#3SQ6uz$QRuQ1$&%TupO+HeaJpw@3VW^+w3j&CaYnuvkmMuwvjDmkFrPD!)ytY>>;+8 zEn*S&AbWsqXYa9X>|OQ_+sd}E&1{pOt!J+=A8X?|aBtw%$Oqz_ygdh5F*io|nQ54= z(xAZy(f_P{2O}L^ZIV7d$gGbKMkcx*8zaP?8fEODtU*xH}CLsOj#}Xfm{cHJfr45vBenMof+e)38#YGrm-> znM55-1^5UQz_ZECq^Ti=4jq6VL>_O}chuUWganKi)q{_Dt@TIQ;BYTct-?cjWjX$H z(JN3*kL4@1rq8f;uK8q*`$np|=BTYN6-Og&nz!;VRCGdakeqWga(?rIP{mQ!$uKMz z$%S;8{zq!4S_8C2MMz%HY7%NXKp}feL!e2ZZxiX_$63AGn9 z5gtga&?{zz>ESpYw8-eK(r&vL1i_245PHKR)d-=dp3qprdr7}LPn|XEqkC2&1!p=#-)a%dY@42`jT`4 z!n$cDKR3Qf=y%u=H;H*oRws_+-&vhokV3eu;;GD2DDHd|^o>EGZ__N3=k4B!&zC1+YfDxlXtvZ}xoNzTx^x(xfnO0B~ znV2&?s0KPy+F=iKH;|38kkT9)=5ORgL$+2MX3}WCGGge(Kk0RQH>6Q}FtRmw>{zX7 zrt37uPiauH)*w?*FWpeWT{H$Tpp0Ord$%d?Rb_yQIAoy=uo{QF!2my& z|Lkd{ken!mGA#ikUII#7Wg&OW(6Pd81&Ichbip)Kg^|65+fQ4WNUSU`@^g#7<~NAk zOxCh?OzPiS?oc&xYMNU&Exx8Ptv09X&LwM9Xy%cD?Pd1xlDNpy(bRCWd6bQ-o)$dG z(-!x)k^=ogEr^(M+~DQeQNmEv&w^&G01e2-9u4&(rzGN<#WEF78Kr7Qm4-{mG=^?f9%>qNf(`|3-&zRB1P5wm+M{(0Ys)qMzG z@I_>P_m1qX$b~(IMNV{|%-*iPu*V_JpZ>0TbgvgLXKSkYqcd4{{bqJ@f`Pww!qr=( zV(EycKnIo%YzlPFV@;MixGE#^@?#yFDtWx=k{2v%vc9#;nyk`G*T69+{$6r>+qF_cJuzCw{cxxyjc#{D(T~HL(x9)OYEl;ltZGt&9$eLAse?_I8oIj4QctdKvabI$UFyQ8+mndqKHVT9 zu0@pGSVYzPp5}}n+8EitrtQFk8(B<)lOdiBER<}S)D@YqA*Pws$*7ORuv8xgVxYqt zBj-QUiq~(9Oj&z=e4cCX_=J8ipEiOX_-uqfk3s@ zbBgL^)mf4ApPNpd^7eBLog#FnxEng9I`Dih>v(Lf(-XZ1_ng1VI%UiVCF%N8EhUHB z$g*@Yeq9@xP-V6_+E5@h#-E`vj;xJ5P!(!S@=Tkg-OX@zNVJo6CxhfSRVF{WHp15B zaTG6H7vjg(Mh30R&LpMqYwZ5O##jP&={kkIP-CxJmr$8$SJqtf@6*K#QQWPHA6;km zi{sxH(>X(99BF8pFQ#+m$&64NQ=Gh5)US<*R}X&cL!P$% 
z7`ZOAw(wXr$0DsZ|AQ@zJhHivm)~2xesd{L`tlf-ZO0bz z0)c(+TuX!^eDlOI`OtYX9T z0+X14yQ>`P6TiyWM)ADJqo09`W06-rE289^&${N8AA_^C2@K&M3cyV!&*2?#x3gdG zcaV>NI$yc@aM3j75F6vC1)c_v@*84M3>8d1y7NiF=iD0^{N-hoJo)9Ny!K#q(pS4V zG5gau%~`O4SzjEpCw^s?hd2OxEq9%fJ^vV%Z&>_U1{Qbx+yVkEJQn%ut`^x98iZ~F z$fZGxjz#VRXo&{pDo~OJEj<=_b61NIl^SF!P>Kev)OkfJFIDGxg48id8Ah{2(|{HH05B))ln39BwYxjm`+_n*7?nyZL zIr7?xp6Rt%2z4O|hY@l>-l-FJMn}2W4JU@MPGNbYf{5?YuPjtCdn{UF-M26^qR?UYAV@?(>Cy+}ck!CpbogRy6DSaY9EZR|%~+=DQ`Mk6GxbmX zmyf)}{wqM9(|@XD3)sHIo^Oe~#FDL$=ahV55mr9wrO`+;c8L3qoFa=COtI=uG5J6y z%l6?xhj{>sR%f!Cg`lU7x$Ika=Jt27pq>^?GMm9;4A8KueJ2`f$+|H1sT|*m&EUTc zh;C`c{_wJM<>n4-8BAdG%8u-EGW5}Ho$QCso!PA_XH6FtO$F-T`m)anhu~A6=avkfA+y&P$#Cw7E$aGdA zkkO%xd4Luzxfz}?(wVogNhuk%(@sw0wbPjT%hg}GJIqQ?3z$;g1mLTt02kfaM2XLu z0!*0IM2Q`<*!0tKb>BU#g`6>)?K>@W+8lNj8W{a}4l5C-#b`0Vi5eei3Xrp)i4u#N z0yMk3i4ynS&8D4}EB`(0y3+xoXl^Py9Ymt3C9-ZI>q6~qelN>SC4K4m5W9)<@4t>d zv4mZhLpd2$Y`u@28{NKv-H0H5bkOT;Dyf!y_6;_T5DE}fPojq}wy+s#eD+t-F`uxl zoP8*>ce3l)Mmcw<*6`yy*(Nu8LT>z;T~8W+<~Pivl(cUhdyxn1=|02}e!zixM|Ru8@(DC%59>wgygi7WB7MV7kJ)MG z?^J&D>hCa=*uByB_Ob;oOjmN^PwaBO;TQSZPpnmV<-G_sV*zNL!hd4Mu#n;B6rZg5 zMfo?Lp~x_-IqEG0|3f{{^p7l*QhpIqPhW%LezbHn*^nDB`@uJ~Jb~LRL%Zifx#xan zFD^Iyg5V-(Wgrl2*KHcgjj8!XPTS9Nvrb#3e0V=ITgOP-D^`$hpo+%igL3rC{p_EN z?>i_j{Drl)=TbTA7e@ONvef~mHz*96R9$*frOgLZIFqV;^B~KRj~`&y|7n|o!c7#p z#y@#`JXTKpQ*r|+NfxnvHAT_SlQlmFZ{7n z48eD#$Jq=O=e2h$s0TyB0-}ntF?$0*A_RP+j_L9Em>z8vAXuDi$MmAPhq3d)xl_yT#RH;@OQ+;W1o^a7@1 zZ#$3411DHv+PR$MUe3orQ!tA1PO?V6umAMEuN?IU)6Q>_c7Br^oL{-%52n3ez{vYe zZt#BP27;0M3mCb-$(G~($^(FnC;t~P@_&=9IR6)UMJn&~{x87f|0Y|G|EtJ!q`XMi zZ*!C<3?%Y|lP$*+mJ3gTA995OBUd=ta$I5g#wpgcFTA)V|G)Qz7eByT!vHLPnO}X* z95wDs&^kQnJsRc>ks+{#7GOx;IpsW87@=xxqL{{1QuwgjG`!vtLNU=qZFq+Ks11J>t)A7EcSJj%Z_7Iq+jVVu z4x1&9x8--x>&!Fw>uj9tRm=ynanW0f`9;h_LN7$G_?Z0K&vWG;?Rnd1UVFZb@pT`_ zS32;Cls4Utf3T>jR>XsIqWt15@`N2c?-U-R(3p-|AUIHK9jGc zbWCUdAf?}S=Bp`P*oF7-*;GbjUa;%#-;1}Ep02zo+lHM>u$WCo1J3En?_qPJwO#q& zu{f7!b>}aze?|{>M=rJRj-LElHa@zqCr`shc=T8aU+G~BK<^2!l>z-iX-w#f;B 
z@F)Z+8rJYEehn`_AHlDPS-(^mI5M(9Pb2H+f%Qw(*6*s3ys`DWe3_!ZAmz6>p9j0u&*B&{R`~)>+6KSC zVnNvHAeIAg7kq9nQia0+!}Gyea{BqaMOs4Mqv!LI7LD^j*T6DV7dtdd{&GII5=bSW z=vS^w945vRsAv9eLT0CJ=ixeFyzuUkwpB3oI9Rh!1v9PUyp}9+6ELk&w=d`by#ZMY8!)$)Cb*S0N@qRcnrd9Lj%-UH|_7`5l2Gq^s zkTM*yeCtiD`K?_I?{Crjw+-)a)BEm*_dTqasKCh0|i51mJI7{dV-zha4QV! ztwwMEvfok^>{DIFNANZhG5^ znW5}ep@8WQToz2j-{JD-zd}B<_sZa8K7k+DCvTa|C-WoUM?aa&{Q>^PX1Qk?AHX+% zFI)eef5V1LaUK5~+4HNf<6m;VwMd@4frr9~hZ&e&C@dD_l=#r0NCH7OC7D4NCAnCU zA}I`}P+|sCDJcqiC@BtlDd`yWQPL&oHH9C>kVF!^be*}GAJ0J8orVqg~0@HH| zA7x@_m2m7nDH_N}hmvG6vE_}P1}C9Z^m)F1nG{0!8qrF6O%3!4xrc<3^qbpEH8G80 zhcvYyC;;}FsRdrhl6hos2hR+PL5Sy~#tEi}Y8;FDb;ZaY1jk_%_U??4s%7MCI{(AD z8Pkom0UwIhg5Gg5&>I1?DdYw-!6d8P#NKOC0eItOffgbE5Wfe!2pnU8%EQad1UT$R z#KH*72qC18TA<;WUJHUaXo5E8I+U=w0wwf)5<(iOpiqkFj#+AYM+VcpW?CaYOh4>* zV<~#jJ#WT1oK?jhjx2QI3-CXJOxQAb%jTr)KKf8=4w5(?E}N~ zV4Ka*>mk2h581OMF*^a;mtqA@r2INp%K`=$wO64+BKpsR)Z)r${v&u|=DQQR=rU%U|_5jf#h}fi? zX=YO3g4n)bn!UdQeMo^F52Xx)iZE~3X#qL-lSp<-HbIxybj29aDy<}w3CUok6@YdG zN`u0#Qz?P9wH6C7h=|9D$Q3r-a6B=NywIVTp5RPGdkjf*Qy611)C_yYW1u{=XN-7L zYC$M%D(QilHWF$JhsQkz8~iTHpnt;)uqsL(t@ivWE46@fOGB=y)*$5>L1k~70eaLS zysQ*^5JoUbkfPTarRa?QGbr2iPY479g|Zl*7`eAT)yNcS;)VrL493`Ujir)k?jqsOri7(kimsZ5NC zG$3Q-OhM;iC}2R0MAu;6g=$-vZelQ6V0*)HBLfE;un0rzT?JFICsl{4U1%#>j%T#b zRbZI_oO_zyOhZRE3aYSnKxw_K!J9_w)o1%`fhN$X6O z6;`_MSAC4;Vida^E9gS|1iAtLhQs{uLFfcO#yBqvC6V4@Xrw9qO&SJ4GeDr3sivSe z>Hx9>Xv~<9AOn~ReTxkRNR0XrG{Rc1z@FRDd^hy)0*lNOcEkfQ%?ClE59+Yaj^5Lp z;*Pz!F|ue3LqRcyM*&eiO@JX%JqmHtK&HAwE(~SAA7}I+%2@eEX2+N8WYd_TAr8@E zh-(!fvlzkwGDrliIpKj%0Srie61uQqgwO!VQ3J%yVNlSuHHMi(y48#Z3G|`a$PnIA zVE@S<0ZKFaXO9y21(e|~prn*W2y)c;prND3M-nP@p=+F)IYvw;|+v@c?KgTf&stbux<+4Z2a$cv>;K>KKLk{!Z=3PumwpBu~o z_jzXwvVJs%(j&FQW!0fx=* zsF2yp7M?w}^vF_5H5`4*ymHb^UKj?uP-M)-B{)KYow*X;o9w)qI<CLyvtj1ei21_LJ1uCkud5u7Y3& zB$*U9_65X&`6F)ZyR^<=g(2^p#e0V{fPo=w&jDlwFoaJxVtuBc#3O^LaU(bk&zJN8QX+pPrXh{ihaOm6>^|X84wkba+$_NAkJd+E z2TLw71dvpS7$FcdRk;`#R(wdvi!p%`gF=iZT6%-Ci%T9R*M10m*HY^*6el4Lri8FE 
z6%=uClvHOyqI%N^RiwvxLFDdCiDBz{fPv*N41t_Dm$%KL6|;k2)p5kB0P6ao=pp-T z;{GtqKbGh;BVaL3dVy6u8LUDiFjFQH9H=Qm0*9#>iLZorL;?q^3lhxXJ&<5o)EfyF z*Zq+MO3?UZvsnp02T5LUEU%Nz3|itQo0+uqO*Zp;^9e|@Ky4DzZ1UQYqZi)AZ${)Z zT04(>Sy=C(!6}aU@(iPgxoO8NaJ8S?du(28;y~1~tvB>FqIa=PIeY&NDnRFw5UW)C zZy4)~6b9lShLs-kyG3k24tVIG=tT?om}GGCbOm23c*PDmY!UCzD|g7bi}+KNc3O-R=qTEg$A@NIPHBYeA?-4)HK6pWP{Q*769u zGy2S0K7p}$FF(h>R@CdB=gpH58$bLqpN)|HEm2+)7>SWe>VO#sS{E1@-55o@+cjG- zmpm|x7sy9H66AkiXEOzn3M4+cd_F=ZF@dX7XozzLf>;%BjHEa-glvS0lo?WjGim1M zWQ6RMyR+R$I~i0B5`!cj=moD7FSxiv07-1-C?qTpc7KoDx&n zfdcPu&&h`lVD;-oeh?;uS_lVKN9<&{Ie<;Q@Qw!I$qwZzk*|lA<7nDaGefDCvl3 z9g;5aevtHl)%0SC(L30OUWX%2hw`I>-RW%t!gb#Eh`|k@x5!b5#vKtOev?z!Wqm<~fbeW|-F#7L{a8V_-$Ja}Hpp>X}7W5v8u(ApEb zuFyOfO{guiDv@HIQY$!GvSHRxi|ddFXb9?Id4@g<2l{IQu-MQ+?VBP_N~`suW64K3u~d(=fs?@1bHOg*q{HQlyjOP>O!&h=it0UC$`J@c}x7 zhoo31zzWlA7ya5pC%tuom>|Ih>4fOoO~z7z?z@o(V7fGAaJ$HcF0}0qujJ}zC(`8I>g)3zTALAyhE~0OJ0A7hdkr>Ae)YS z^boJipRYEN={8tzQ|TS;$@Ku^_sYK<=Dkxr#9tLcLeYl~V@o8U@lA%P7~E4}ojl50 zxG}$170b)qqEPnxmFMvvx5|mX@^O6YUb*I1-Z@+C8weV0*flV)Yryb#xIFMHb`9#j zldX^O(W$R}4(o#zMvDCE7`8(nIL6QBJHC~l9^>QL(rEkR{A?E%G1r~oJ(xH8#0h?R z62EJ=Oc7!#|LqfblMo^Hfm|ZQRqS4QN{BpGEAw1p_=WSn*L2ghy9NIf9t@l*+mC_c zS({fH_*=td4(%jKfI;iN@8B=elGGRaB%Zu1JO|QuS|LB!h~y zlKqlIXTJX|IW0--$t|ZM<#9!_RFPVFO|m$WTS?H$IB2$lei|)H5x?TL&iitjM?98S zL1-0mXt@e)*Pdv|E1nd2wSK^B{S8gYQ}9DIvMx;=Njd_sDdYngA|oYVLEhXQJ(@0l z=6U;mVxgX9x7*B~;ZV0#@JY!i74z~f^7rPVyz71{iAioBrBJ;+ltOlONUgfK3JVn3 zJv-&v3~|AkHU3hwXD>m)oo>CvdMZTHqe4}2g$h-n7i8B=QJ7ar(8@Sy3kBW2PhOQN z+ND(xxB_5okXW_yu}pCZ^yWmSxGk-mK;;hTt}HPDAeJx6E6TN^WurS2zIlP3w zL-o|49_Ch4aBGp&+9l@93ceDPOX3R&3yGn<$2 z3c{#xFs?5Uhcrf4`E*d6okj69pp_F^xr6pop|~hUs8D0n(;~3m8x!buZvykW}r-no`c}_^2kyK95aycU;E)%uEQu%&? 
zXe)mYi80`MWNUFF64e@C5xB1R0J+{1Yi*k%aWUY~ydt@(Nc2jpBY2$y-(RFyMfrJY zwFIhlK%cgWaoH+5rmZ;0f}4P#Y!M;W0Ff(aIMwQ*y8X3Xa!Rr2pH@ZSDu>RR;+XQ^ zB=2Yk4X-44r2~(&i%F|YPHrzcr&SQV!ht{BUi2I@jJ6To*6yNW1M0Xh56@o|wZoAW zjMzGQd#~uEf!99RYpKmc@8IUYR@XpJ%RuQ#wz?4Qa#`3x^z4h=$uP<|Xf=VH75EH@ zhFBkms0!nIBKm#^RoE453^i0B@97}=6raGh5?PFTO0j@EPTNfN*w2ZLhCWIxv*rE{ z;%v56cIhaFn0pABY!V^Y#Ub}q$aCc4j-pjwEkSGJp#2o|xv%B>9mQC4JrP08RfJX* zht^-A&5{E;i9u{t^o~xVEzheYtjaj7vlP}(yXD5tq6-2ZwA?UMR45>icP8^iB`Q#Y zj-J-a&Rs+i^UDcc#8BuGJ746>`iq1&TGK_i7)zIr_Ypbr%dTR5S~(RgM?r1(R&*0> zQSjYvVgL#zcNaLzK{BpKSYKmw>keJMV7v&(Yr4Z@tBo$`E^rRqrn#dBXx34QI;Vu} zsc5e2EBeGUBf(6L>`@|qjur0HOS~^{>#GKLX7B%A=wmsz52kii?N{>O{Z*m6`<}K? zG|(@u*&X|fXVa>P#VUuz@B5#&AhGz9oOD*Kq8z#4EEKM!!j(?pa|ZsG!Y>U{g+Cl5 zHp%+46_?Ac|AOEA?1swDK3hC1M-Nqn{y6);6uN#$L!rtcD70p{D)iT(|E19OVGV^& z4nwo*Xav?J$2xlba8@q@Rrq!y!>Y$Dr(HJ!(-y4DHYUW7A z#r{7*uN)~d(yE9?l|$pMk%_!U*|{Qxu{7x!B{rv35_+YB{?#aje*T3@souN{s`Qu9 zB9vA^Xca)y17Ysyn6|z)nzU7ZIa*Az76OUfj`@_r?U+L;7TM)gxSZ;+&Eyw`8hCrZ z!QuuU=@)j`AYI-x26TI!r`q>4J`K77XspQVgt;4$0kTtu8NkXT7>EVQl<}$|5;YEq%CT|si(<(4q(RwhoY*hFo}l>b zIv$L?(@mtyilL%I+)z9umrX$T&L1Lj<%bghe|VzqxCu=bxNu^uz}*u?dRhhXQsMBj zYGUIin-`$T<%Co2;GBIyW1J0yQ%^oiy*nnlZwM#GahCkOt;oQ}zf~($e_t?3^!J`) z?Xx>2AeT)N1zq=m1j($9NY#N<%(nla`*K5_p;i|^IK_UPB)X;5DsRcDx&4LzL(SbY ze^T^7>m2JRht>}lf`v^s3pGTuCXR*unuXtO7WO;Ez(Ro>dy&XVtD-VEr)pcAqMVqv80+lMYgyEgYw6VMM2S@QHlwr zqp5tcTHUAZE)i`54kk!huD~#P=_Ml6tp*q-ISN&TT1B5MvIporn~`JhJ0tCx$#Fa%?-5G~UxNFEiAJZ`%pUOSY1&r2&O zoN@;z<4UCheXkVV!T)tvV*QzWm9h?x0wSA_6wT#FSBf?om?HyMK~3rqAhqg5%m}x= z3Yc3hc*hrCB|5olbDRauimOCFzWrVK%~c|o(%-HUg_P!8E!uSacq^k|)y;JEjN^Pl zmYIolj1|kChUj3HnXR*;*IX?wU}?2fajjGF``3W^ba~_&(VM8Ad9AKx__d-F|9Gps z=JN>$n8rJ|(IdVpyE;6+MdTdD}P3aBwPoTCm}SugF-YB-1K z5JS4WdMf6mN`cB4WJ(QnK%`T<7) zAlDGXwh^Yq7nzEK5Zzii<9gAx`KoQYp?BL2_2tXl>0)a4+PD}3wjJp38O5lqf3?T_ znQz@=qwCFj0Eyts)0JxdI9;uKhR-0m$XjQK%X(qybbd#$%urk@IfGF^NSwha+bZ(aMqA zZWiAJ7VF|Ca7d0KU5n)_w}?yI>G(R_64DcFEjW#TNTdJjBaQx$v}TGH=~CxL-!iOE 
zj6i!iaHhzTvu29#n=jF|(y2d0cLT-N5_#S~FkvscRSazYa12PhqXc|d1OEU}esrto zAg}p{cs1~-rhe!9RA)QuQQ7UEVrpQi&aL}Y=Ps4c{!{c0Jf?G(zoT;>le^}Lj0`i? zTBZS8cN$h9PRT8k1-FTIfyZ_33dG~wX#L}I!foQr?Aq4wcCho86!ha@x!t86x$ZWR z<$}kpo^m69yq1g8X)0G9zfCm9(N*v5;s@NRIC{IdwVOJ~09VmtyNVvGDz`z7m8xP= z$$<4>ccqe*vhohmCb!YW7s#Eyb!4UlZjoQzA#%evV`~)0&TxE&wur0@#8j*m1Jt*M za2O3aPY(!mA`E)|p=-nQ*L|^1PT7htIEFa%iYOPI{m_dKoB@GHq>oOfTFW(mjQIOc zHypFbM^q|EO+2iYlYk&7G2L)Ox6>V_2jPEQ|%MF*OqM2zVH)7 zS?yXA+^?FTR;1Wtv!0ey%EdYC7Wrhk=+sJyuNkC3CyTWBOzdz6`q7>Na1WM?j4+7` z`wv=DaKsf8aLhf*A#X+!Lr=z-S=m@p%^<0jHbXkDt1J&p7|!x)xixDnH+;aZL1ZFO z2Ag29b72K(yOq(^NyrO3AITv)Cagt^c!Mf+8c{x(D9R^oZjxdL4sqB9Au?(YtxjU? zcq%5;WG&UFWa=y!K#%M@3l4*={aCDejNjv|7Uka4BNl9{Jo|Vmv+=CdE8N=5COa=ZS78a{oNhv8_5`i**z9 zA+Gkdy7rat>e|_Vco8J~nY4pD(<8-kEake9_ip9ok}8 z9bw1U$&ATg1mZxIG4R8s~3nN*Lcis zX7@X=BVc9RY@KQ~150menpz#&<*Rl1*S^w{sBSd6v1^09Opmoz+cU14_M&bB!_gpo zhca-6^`e||wXG&O$VrWV8 z9+8=HP?wF$hwl*u(4Cj=5i`xb{Vac=pFVToSJ#@D^=dy%vHdW{nDv<)y-?hdac6&A z+U}>0Rnqpnu#WVLe!WmU$4BFa40Z@m3puqlR2!{QV>W5Mmi%X!%_-=y-j=-nm>8yw zS2TP{U#kGR@QVER{h|{FBPJ&Jj5O3CY@$i4*qCv6RhOykrEJ5ivPXqz8-7h?1}?N- zYp6n7TH=fjhTwjS+r)Mu9YcfqE2@|UsO!}t*VdH{ijW%8)`uOyHLi`@BvX|A1=kxj z4%suDPrz&Ii`4vu^LDD?kes^bfz>gFBv?%J22mRJ(2hujfuY@yFVcM zja2l?i5@MfWAtj`=qFKEp%)Z%}-kWm$1EPGKqK73obQkM0n%QVdjM05d zmq~_QvmWV)gJt>FTZwevZlIeCA-pZ`eo$OqqKx7`y5|D5*h49tlsc-Xj@E&;95$)4 z);-^9wb&$EN5t#k3Ak9nLt%kS(;M*=>-NpMKC-pe@}8=#n{A%Fv5wi&z!P=M7TJA~ z=&I?~5Z#*7(B1lHx?63!j+(yHnC?4r{$ddf7t@FZ%ZMa^)zgrEH;(ic94&(w-%TXF ztug6s4qSJJQYF+;Np)1B4t)|b)q0P0Q)?n6k0tF@AHC-kw&lFNG4buhcafF{*6+|w zB1p`~$@%>_re~Ro?)!;!KWI$%1Np~d@v^2{PISvbmj+h6oAseC1BaAbJ3C|GUcp9|vv5!*By>*l_vY?hRGL(#4EPKcosS^!~wnGuL z79yrY_Gd>!g(RWaIK@)mmH|>)^2hY_UsSv+4v-Mj&va7O;9oWkSOXr`lf_33Y?Io3 zB!8Ep?H~-TPS{N%|GGAp`07M^NQgYu&|7Wntt!{nnNdq+!M)llh^3F$uY3*#N#6TrGZPDNk0^LdkIeSt?J}nadt$+`UH|FDO2k!9kO(GxP$Yx7L9@HCc zN4N&VF|N+KxH{+RI_n&ocAdK#*SX7qJ09=1HWf$W-zMO=2Fa(Fige9HIWbWVCaCL3 znPM8SJB}H9QrvA5a%}IO#?0(-;HTGx?=-rt3oa+3L$|Y^n?L~WkgIF_Qvsm 
zxsw*Od!0ILfqdVXhwmM@LuFs1YEkRH1RSLd>h>m>s3Ruo;+Xg$j)~vfD<*z$2-r;g z*qDhQ9k|29Pl-(Ylz^j|z-361L@hB<8^^@{I3_SVlO*;#1Z*aLZp_5b4%}hlmqaG; zK^*19sL>0hCMG<3MAQ%qHE}E)h+_ec2eEL#DQ>e+-9!aeANCJ+QexYWfl9;HBW8!EW6YxW*=|>#` zHWR-#X5v=|?l5sIk%?moIGPDH#gTa+E-K=ezgxI@9_;;$&kS&4ilt z!9+dTrg{W4V|L+>I3~Wv$88~rKO6!!6Q>$8ams-^RLXT=t6^#?$3l=|Qub)jOzgKM zQAbSF#W8WGF0Q7?S(=GE9RfBJvosE^8fdbd<-i>#W+yT+yD^StLb*X?7l?`4I40)A zF#&gzBr(S!U^6kdF%xqgxWmLzp7)LX4#Fj)2F;Nr8#Jo5r;64%)^Be*;6Z0E0 zG2ekZOe{!bVnJgZ&BSq=i7H~EDvpV}cCs zIAJqUNlaA6F>!Ak6HjzdOx)`bu$j28F%$PWaA)eh-=^Zs;P*Gik>TgC<+EX>7>ji? zPA)D28?^SSAa*LiPHYaZP=u_hDFJ&m(n9+|WT=LbBp{~APbRl!@3 z{I*6>>-09Z8@@t>%N@%^c6h9|htpA{9(QtMmS|BN565(eFLLUz zd05<-hs6%uq4H3pYEkP$2{UnIoza>ka$59=Roc_EhXbBsL=Et=BF;(0;hEN}A`D60w=f%_} zBggHkMEg);zb{MVd0F(PD)CQ&D=xCWS_}?6uIqU7O~dMgHPqwL!7qwWT!BiR{X4Sz zA-ghq#w%hj@8F>;Wapr>5FbNa2p>Ojm|d<*{A}-lE|;IZCVJ}SZf*!6V!AtvgQi_{ zPj^IuRWHRJv7{_Q0=b!>Xf`x|g{HXTW8M4}w(xjt=b#yFMbKshLb?df! z^yLl1+f}b;p3syY|4dVQqJa`A5v`nYwE(y4a3x4?*dRiFTonqW(s5yx_0t9s9JB}D zdc@Iu?1BW&E6_g({OU zd}IO4_ZZA>3-vI&jiW1&w0nab<3ja02z!jI)a_rn*|18`n=8@&5#$f(kNi7F{jq_8 zRdMQ2j6bU1LTr@2ltddF^Eob<(^j8N@kvebn}Jqe+2wVSor}$5fwN=yh#U`f8~o;+ z0k;o+5@0RAvzS(tC_5icH`MFMX98mnyU6jt(0~wnh&V@Av za^YA2E?j0Jl-8(c(Tm*%8yC>`I?(?vyoiYqYZ9)e4V=+7=sM{$@E^lJj(-CGf_;D} zz$`8@m6@JWv(H1Xik4wPpO(PpG+mcBu~SI;rnG`?50sJx1(Wu*CBnfMPN0A!S-aa- zseVx71fQwE&gLqpf~^VmP>9z;Jviv2BT56qh9g2HK>SFO%UaPbP!O+#P(d8MA{893 zHVb1?jlGa2>1{;p6=-!?h3_Z$ra^e%G}la_PY&Vt=Yl;T1A`0)Rnc{%qtZcB>0+cJ$TSaufpn7# zn?3_^w-0Sc>O%jj8~CUicc;*ED#2awX}FLUN+yGeuYFO+LhsGVP%fN8Qg=m4u~XFJ z_Lu0#N*L|TV#6C2kmPO~19fK&_o&J8=96{p0ygl~f|x3-)_ z9gz;Z;T{$~`9t6R@|xXy@li-{cXAi)*p zX-Kk4_;e&;++8k1np47OBIyg_w;|0f!7WInJ$jMniBBD1gOk$y65QiM+Orp?c%*0b z#w|#s_(JT>X-jM(QQ8WdN|Xk1&xF!Y2`)h*Z7r)di(DTr!@-u} zY(Fl;$&WXSK7y{S2DXUo03N98416LBmbtuKmTVClhUx-CY%@(38iqM`la8*lTevyd z$$(bOz%wiky{ixM(PdbDw-U5&eOGkK*Z1%2jR`m1qBd!#Uj+UkYu**D{Y(4fMkcQQ z@quxW5p_PZhlYYvZ<( zVFG!3e%m70k#zhjfC3<+KR#;T$w&)fzEHnk1a#!iOhoIg(EJH#Wr=7#6q?UL>#fO6 
zPsHn@@VpM5Mw`}vCM)-gOwB#mxuyZ2F^g6C)P(X^B%&25G?@KHjhm8)W-7Gg1hmPE z;HNZ_RiVO5O2C_xNG?~Qxf9SPB%)<1G*<%J*f=!#!p}I>6YHr_aS+u>K1BfTd85`1 zPee;mXt-Y02<@CiG?zl74`jzH!k|U?e3&hS=;&X>^|6W6c_BBRvl`-&}@!5 zrUpMv7E_x}#&^>ss~6e%dm0e%yG{frJ?LcYo+i5z{M!b&OT($3I~lvC$;$xeb<^an z^`dvH?}68BAwFJ}+7UmJK@*=MHd!I>ohEDRMNa;AF(eoL)(=%i0`Fw(0W$|t+3p57 zehQ4>B+pL9w_t|gyBgs5l{12qggO~u;x1s=`cB2_ zv*C7q9b@%%K4jNt!|nQ@th&BS58U;Q*C{$8bjc<@L^b3fPaxD=eDccCPlmiOB zd)h;X0VW~aU|Y!02NZ{py8uo?9)4IHxK?QyIoSW*Py`b6V38c{1&Xo+^_ia(8P$R6Z830yZ}C=$&M zDPp5qwVg;bJCowjOl1%C(D`T>WBMA@cEqVOb|}K*>tR$tV1ZSqq() zeKLoz(sX7^otbm8Sq@xHD@*6LqFj0Laq(#12DWJB6lx9#F{;_3FDQjFVp)nUe6+bN zlu22#+izk;h<=0)%|dh36Uw0(`VpU?rAPVeZ{q409_YvLKsSa5dcX_+2Om*9fu4LYer9yd!I6@DI_N(xG-b^$&4lKDvc| zm~(^vna&MN{Y>WuCPJ>ro8vN~-cw>BFRlkYVP>jdSp(&hevx4``p?W#F!=^^<>SZJ zqF*rAj!gU}ntU$XRTM7QUs!wvU z`G1X$&2hb*D}!BJCE{<_-x$5Iiz_)jx^9qbVM?;y3VF^5R~_pSZ8OsKiAfDOQR;fa z$Cqr8%WrXw%s#$_dEK;h2Y-vUGA|2)m#vb`bahMn67Oy|U~c?4)>a%)2Y&54%m;QB5{nw5@vT=d6C4bWc$e*ycDaGK+b-rHsVKU}R_ zJ@`KJrsV^ipHF}BUn)MCjZBsa3LihbH@f#O*Hq@& z^8@qt!Bu0UoxF6uD@Wci-_?rO{3s*yT_L{jCmEgZYRQ-Em!HjdRgBsBGsrOf%m~tE u=DdKC2g!u~gS<`