Commit 29dce4a ("dequant")
Parent: dfc80ee
8 files changed: 386 additions, 194 deletions
README.md (13 additions, 0 deletions)

````diff
@@ -108,6 +108,19 @@ Optional CUDA/ROCm backends can be enabled with `-DUSE_CUDA=ON` / `-DUSE_ROCM=ON`
 
 `t81-convert`, `t81-gguf`, and `t81-qat` automate quantize→export→train flows with progress reporting and validation hooks. Browse [docs/references/cli-usage.md](docs/references/cli-usage.md), [docs/diagrams/cli-workflows-mermaid.md](docs/diagrams/cli-workflows-mermaid.md), and [examples/cli-examples.md](examples/cli-examples.md) for recipes.
 
+
+### Dequantizing for downstream runtimes
+
+Use the new `t81-dequant` helper (backed by `t81.dequantize_gguf_to_float`) to rewrite a TQ1_0/TQ2_0 bundle into float32 before handing it to stock llama.cpp, Ollama, or LM Studio builds that lack ternary support:
+
+```bash
+t81-dequant model-tq1.gguf model-compatible-f16.gguf
+```
+
+That command expands the quantized tensors into a new file while preserving the standard GGUF metadata, so the result works with existing loaders. Keep the original `model-tq1.gguf` around for runtimes that already understand TQ tensors, and only run `t81-dequant` when you need immediate compatibility.
+
+For a zero-disk workaround you can also dequantize on the fly (via `t81.dequantize_gguf_to_float` or a small loader patch) before instantiating `llama_cpp.Llama`; see the docs for an example monkey patch if you want to load `model-tq1.gguf` directly without producing an intermediate copy.
 
 ## GGUF v4 compliance
 
 t81’s GGUF exports already mirror the llama.cpp conventions; v4’s mandatory `gguf_header` additions are worth calling out for everybody writing their own converter:
````
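The ternary-to-float expansion the README section above describes can be illustrated with a toy sketch. This is not the real TQ1_0/TQ2_0 bit layout (those pack trits far more densely); it only shows the per-block arithmetic of decoding stored trits `{0, 1, 2}` as `scale * {-1, 0, +1}`:

```python
# Toy ternary (de)quantization: each block stores one float scale plus a list
# of trits in {0, 1, 2}. Illustrative only; not the actual TQ1_0/TQ2_0 format.

def dequantize_block(scale: float, trits: list[int]) -> list[float]:
    """Shift stored trits {0,1,2} to {-1,0,+1} and apply the block scale."""
    return [scale * (t - 1) for t in trits]

def quantize_block(values: list[float], scale: float) -> list[int]:
    """Round value/scale to the nearest trit and shift it into {0,1,2}."""
    return [max(0, min(2, round(v / scale) + 1)) for v in values]

weights = [0.5, -0.5, 0.0, 0.5]
trits = quantize_block(weights, scale=0.5)   # [2, 0, 1, 2]
restored = dequantize_block(0.5, trits)      # [0.5, -0.5, 0.0, 0.5]
```

A real converter would iterate over every tensor in the file, apply the decode per block, and rewrite the tensor type tag in the metadata, which is the work `t81-dequant` automates.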

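For context on the compliance note above: the fixed GGUF preamble that every version shares (magic bytes, version, tensor count, metadata key/value count) can be read with a few lines of Python. The v4-specific `gguf_header` additions the README alludes to are not covered here:

```python
import struct

def read_gguf_header(blob: bytes) -> dict:
    """Parse the fixed GGUF preamble: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata KV count (all little-endian)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", blob, 0)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file: magic={magic!r}")
    return {"version": version, "tensor_count": n_tensors, "kv_count": n_kv}

# Synthetic header for demonstration: version 3, 291 tensors, 24 KV entries.
blob = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
hdr = read_gguf_header(blob)
```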
pyproject.toml (2 additions, 1 deletion)

```diff
@@ -33,9 +33,10 @@ torch = [
     "accelerate>=0.20",
     "datasets>=2.13",
 ]
-dev = ["pytest", "pybind11>=2.12", "cibuildwheel>=2.15"]
+dev = ["pytest>=9.0", "pybind11>=2.12", "cibuildwheel>=2.15"]
 
 [project.scripts]
 t81-convert = "t81.convert:main"
 t81-qat = "t81.scripts.t81_qat:main"
 t81-gguf = "t81.scripts.t81_gguf:main"
+t81-dequant = "t81.scripts.t81_dequant:main"
```

python/bindings.cpp (3 additions, 0 deletions)

```diff
@@ -30,6 +30,9 @@
 
 namespace py = pybind11;
 namespace core = t81::core;
+using t81::DeviceType;
+using t81::ScalarType;
+using t81::TensorMetadata;
 
 static std::string
 decimal_string(const core::bigint &value) {
```

src/llama.cpp (1 addition, 0 deletions)

```diff
@@ -0,0 +1 @@
+Subproject commit 482211438dd671224a7f176b7480b4ded424212c
```

t81/__init__.py (2 additions, 0 deletions)

```diff
@@ -18,6 +18,7 @@
     "gguf",
     "read_gguf",
     "write_gguf",
+    "dequantize_gguf_to_float",
     "convert",
     "Linear",
     "ternary",
@@ -34,6 +35,7 @@
 _LAZY_MEMBERS: dict[str, tuple[str, str]] = {
     "read_gguf": (".gguf", "read_gguf"),
     "write_gguf": (".gguf", "write_gguf"),
+    "dequantize_gguf_to_float": (".gguf", "dequantize_gguf_to_float"),
     "convert": (".convert", "convert"),
     "Linear": (".nn", "Linear"),
     "ternary": (".qat", "ternary"),
```
