AGENTS.md
+1 line changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ This file helps AI agents discover and understand how to work with this reposito
 - Added production-ready Python bindings (`python/bindings.cpp`) plus packaging helpers (`setup.py`, `pyproject.toml`) that expose `Limb`/`BigInt` helpers, Montgomery contexts, NumPy quantization utilities, and a tutorial notebook `examples/ternary_quantization_demo.ipynb`.
 - Added `t81.hardware.TernaryEmulator`, documentation for hardware simulation, and `examples/ternary_hardware_sim_demo.ipynb` so agents can explore ternary gate/circuit modeling, fuzzy AI decisions, and power-aware PyTorch inference workflows.
 - Added `docs/references/cli-usage.md` (linked from `docs/index.md`) to cover `t81-convert`, `t81-gguf`, and `t81-qat` usage with the CPU/offloading tips we surfaced for low-memory Apple Silicon.
+- Added a unified `t81` console script that exposes `convert`/`gguf` subcommands while preserving the legacy `t81-convert`/`t81-gguf` wrappers, plus updated docs/tests to reference the new entry point.
 - Added `docs/diagrams/cli-workflows-mermaid.md` to visualize the `t81-convert`, `t81-gguf`, and `t81-qat` workflows for future contributors looking at the CLI surface.
 - Extended `examples/ternary_qat_inference_comparison.py` so it now runs train + validation loops, logs compression ratios + per-step losses, and correlates the ternary threshold history with measured GEMM latencies.
 - Added `scripts/quantize_measure.py`, which chains `t81-convert` → `AutoModel.from_pretrained_t81` → latency/compression stats so you can automate quantize→measure in other pipelines.
- **CLI & GGUF/QAT workflows** — `t81 convert`, `t81 gguf`, and `t81-qat` (the legacy `t81-convert`/`t81-gguf` aliases still work) automate quantize→export→train flows. Follow [docs/references/cli-usage.md](docs/references/cli-usage.md).

 ## Highlights

 - **Balanced-ternary core**: `t81::Int` (an alias for `t81::core::limb`) ships overflow-aware arithmetic, canonical I/O, and deterministic hashing.
 - **Ternary-friendly GEMMs**: `t81::linalg::gemm_ternary` packs balanced ternary matrices into AVX/NEON-accelerated kernels with alpha/beta semantics mirrored in the Python binding.
-- **Python, CLI, and Torch helpers**: Pybind11 bindings expose quantize/dequantize utilities and `t81.torch`/`t81.nn`, while `t81-convert`, `t81-gguf`, and `t81-qat` automate quantize/export/train workflows.
+- **Python, CLI, and Torch helpers**: Pybind11 bindings expose quantize/dequantize utilities and `t81.torch`/`t81.nn`, while `t81 convert`, `t81 gguf`, and `t81-qat` (and the legacy `t81-convert`/`t81-gguf` aliases) automate quantize/export/train workflows.
 - **Normative docs & demos**: Architecture notes, CLI references, and runnable demos live under `docs/`, `examples/`, and `bench/`.
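
The "alpha/beta semantics" mentioned for `gemm_ternary` presumably follow the conventional BLAS form `C = alpha * (A @ B) + beta * C`. The sketch below is a plain-Python reference for that formula with a ternary left operand; it is not the t81lib kernel, and the function name `gemm_reference` and its argument order are assumptions made for illustration only.

```python
from typing import List

Matrix = List[List[float]]


def gemm_reference(a: List[List[int]], b: Matrix, c: Matrix,
                   alpha: float = 1.0, beta: float = 0.0) -> Matrix:
    """Return alpha * (a @ b) + beta * c, where a holds only -1, 0, or +1.

    Hypothetical reference helper, not part of t81lib.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out: Matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                trit = a[i][k]
                if trit not in (-1, 0, 1):
                    raise ValueError("left operand must be balanced ternary")
                # A ternary weight never multiplies: it adds, subtracts, or skips.
                if trit == 1:
                    acc += b[k][j]
                elif trit == -1:
                    acc -= b[k][j]
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out
```

Because every weight is -1, 0, or +1, the inner loop needs no multiplications, which is the property packed ternary kernels typically exploit.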

 ## Quick start
@@ -80,7 +80,7 @@ On macOS or other PEP 668-enforced environments, activate a virtualenv before ru

 ### 2a. CLI-friendly Pipx install

-If you prefer shell-level access to `t81-convert`, `t81-gguf`, `t81-qat`, and `t81-dequant`, pipx can install the repo and then inject the torch extras:
+If you prefer shell-level access to the unified `t81` CLI (with `convert`/`gguf` subcommands) plus `t81-qat` and `t81-dequant`, pipx can install the repo and then inject the torch extras:
@@ -121,7 +121,17 @@ Optional CUDA/ROCm backends can be enabled with `-DUSE_CUDA=ON` / `-DUSE_ROCM=ON

 ## CLI helpers

-`t81-convert`, `t81-gguf`, and `t81-qat` automate quantize→export→train flows with progress reporting and validation hooks. Browse [docs/references/cli-usage.md](docs/references/cli-usage.md), [docs/diagrams/cli-workflows-mermaid.md](docs/diagrams/cli-workflows-mermaid.md), and [examples/cli-examples.md](examples/cli-examples.md) for recipes.
+`t81 convert`, `t81 gguf`, `t81 info`, and `t81-qat` automate quantize→export→train flows with progress reporting and validation hooks (the legacy `t81-convert`/`t81-gguf` names still work). Browse [docs/references/cli-usage.md](docs/references/cli-usage.md), [docs/diagrams/cli-workflows-mermaid.md](docs/diagrams/cli-workflows-mermaid.md), and [examples/cli-examples.md](examples/cli-examples.md) for recipes.
+
+### Large models & GGUF streaming
+
+When targeting multi-gigabyte models (Llama 3.x or Gemma 3.x checkpoints) you can still run the CLI helpers without triggering macOS’s OOM killer, but you need to pin everything to CPU memory:
+
+- Set `ACCELERATE_DISABLE=1` (and `HF_ACCELERATE_DISABLE=1` when you launch a `transformers` command) so Accelerate never offloads tensors to `meta`/disk and the helpers can call `.to("cpu")`.
+- Prefer `--force-cpu-device-map` or `--device-map none/cpu` so `t81 convert`/`t81 gguf` (and the legacy `t81-convert`/`t81-gguf` wrappers) keep checkpoint shards on host RAM.
+- The Python GGUF reader now streams metadata/tensor infos directly from a file handle, seeks to each tensor block, and only buffers one tensor at a time, so `t81 gguf`/`t81-dequant` (and `t81-gguf`/`t81-dequant` scripts for compatibility) can handle the resulting bundles without reading the entire file into memory.
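
The streaming pattern in the last bullet (read the index from a file handle, seek to each tensor block, buffer one tensor at a time) can be sketched as below. This is a minimal illustration over a made-up toy container, not the real GGUF layout and not t81lib's reader; the helper names `build_toy_file`, `read_index`, and `iter_tensors` are all hypothetical.

```python
import io
import struct
from typing import BinaryIO, Dict, Iterator, List, Tuple

# Toy layout (an assumption for this sketch, NOT GGUF):
#   uint32 count, then per tensor: uint16 name_len, name, uint64 offset, uint64 nbytes,
#   followed by the raw tensor payloads at the recorded absolute offsets.


def build_toy_file(tensors: Dict[str, bytes]) -> bytes:
    """Serialize name -> payload pairs into the toy layout."""
    named = [(name.encode("utf-8"), payload) for name, payload in tensors.items()]
    index_size = 4 + sum(2 + len(name) + 16 for name, _ in named)
    index = struct.pack("<I", len(named))
    data = b""
    for name, payload in named:
        offset = index_size + len(data)
        index += struct.pack("<H", len(name)) + name
        index += struct.pack("<QQ", offset, len(payload))
        data += payload
    return index + data


def read_index(fh: BinaryIO) -> List[Tuple[str, int, int]]:
    """Read only the small index: (name, offset, nbytes) per tensor."""
    (count,) = struct.unpack("<I", fh.read(4))
    entries = []
    for _ in range(count):
        (name_len,) = struct.unpack("<H", fh.read(2))
        name = fh.read(name_len).decode("utf-8")
        offset, nbytes = struct.unpack("<QQ", fh.read(16))
        entries.append((name, offset, nbytes))
    return entries


def iter_tensors(fh: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield one tensor payload at a time instead of loading the whole file."""
    for name, offset, nbytes in read_index(fh):
        fh.seek(offset)             # jump straight to this tensor's block
        yield name, fh.read(nbytes)  # only this payload is held in memory


blob = build_toy_file({"w": b"\x01\x02\x03", "bias": b"\xff"})
for name, payload in iter_tensors(io.BytesIO(blob)):
    print(name, len(payload))
```

With a generator, peak memory is bounded by the index plus the largest single tensor, provided the caller drops each payload before requesting the next.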
+
+If you still see `NotImplementedError: Cannot copy out of meta tensor` or a kernel that dies while building Matplotlib’s font cache, repeat the cache setup from [docs/troubleshooting.md](docs/troubleshooting.md#large-gguf-conversions) (`MPLCONFIGDIR`, `FONTCONFIG_PATH`, etc.) before rerunning the CLI.
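
For scripted runs, the CPU-pinning advice above can be captured in a small launcher. The environment variables and the `--force-cpu-device-map` flag come from the text; the positional `model`/`output` arguments and their order are assumptions, so check `docs/references/cli-usage.md` for the real argument layout before relying on this sketch.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple


def cpu_pinned_invocation(
    model: str,
    output: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build a command and environment that keep a conversion on CPU.

    Hypothetical helper: the positional arguments are placeholders.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Keep Accelerate from offloading tensors to meta/disk.
    env["ACCELERATE_DISABLE"] = "1"
    env["HF_ACCELERATE_DISABLE"] = "1"
    # Assumed argument order; only the flag itself is taken from the docs.
    cmd = ["t81", "convert", model, output, "--force-cpu-device-map"]
    return cmd, env
```

The pair can then be passed to `subprocess.run(cmd, env=env, check=True)`. Building the environment as a copy avoids mutating the parent process's variables.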
docs/index.md
+3 -3 lines changed: 3 additions & 3 deletions
@@ -24,17 +24,17 @@ to understand the balanced ternary engine without digging through specs immediat
 - **Python cookbook** — [`docs/python-cookbook.md`](python-cookbook.md) gathers recipes that mix `t81lib.pack_dense_matrix`, `t81.torch.TernaryTensor`, and the CLI helpers.
 - **PyTorch how-to** — [`docs/torch.md`](torch.md) walks through `t81.torch`, `t81.nn`, conversion helpers, and how the CLI scripts mirror the Python flows.
-- **CLI reference** — [`docs/references/cli-usage.md`](references/cli-usage.md) lists the `t81-convert`, `t81-gguf`, and `t81-qat` helpers
+- **CLI reference** — [`docs/references/cli-usage.md`](references/cli-usage.md) lists the unified `t81 convert`/`t81 gguf` helpers (with legacy `t81-convert`/`t81-gguf` aliases) plus `t81-qat`
 plus the common flags for exporting GGUF bundles and running QAT.
 - **Hardware & energy reference** — [`docs/references/hardware-emulation.md`](references/hardware-emulation.md) connects `t81.hardware.TernaryEmulator`
 with the Python quantization helpers plus the new [`scripts/quantize_measure.py`](../scripts/quantize_measure.py) automation that chains
 [`examples/ternary_qat_inference_comparison.py`](../examples/ternary_qat_inference_comparison.py) to kick off a mini `t81.trainer` QAT loop, print the ternary
 threshold schedule, and compare `torch.matmul` vs. `t81lib.gemm_ternary` latency so you can prototype
 entirely inside Python before launching the CLI helpers.
 - **CLI automation & energy benchmarking** — [`scripts/quantize_measure.py`](../scripts/quantize_measure.py) and [`scripts/quantize_energy_benchmark.py`](../scripts/quantize_energy_benchmark.py)
-chain `t81-convert`/`t81-gguf` runs with latency/energy measurement so you can report quantization impact directly
+chain `t81 convert`/`t81 gguf` runs with latency/energy measurement so you can report quantization impact directly
 from command-line workflows.
 - **Use cases & demos** — [`docs/use-cases.md`](use-cases.md) and [`examples/README.md`](../examples/README.md) capture the canonical scripts, notebooks, and research stories.
 - **Hardware simulation** — [`docs/hardware.md`](hardware.md) details `t81.hardware.TernaryEmulator`, fuzzy helpers, and the visualizer notebook.