
Commit 260105c

committed
cli
1 parent f9f5f78 commit 260105c

16 files changed

Lines changed: 641 additions & 167 deletions

AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ This file helps AI agents discover and understand how to work with this reposito
 - Added production-ready Python bindings (`python/bindings.cpp`) plus packaging helpers (`setup.py`, `pyproject.toml`) that expose `Limb`/`BigInt` helpers, Montgomery contexts, NumPy quantization utilities, and a tutorial notebook `examples/ternary_quantization_demo.ipynb`.
 - Added `t81.hardware.TernaryEmulator`, documentation for hardware simulation, and `examples/ternary_hardware_sim_demo.ipynb` so agents can explore ternary gate/circuit modeling, fuzzy AI decisions, and power-aware PyTorch inference workflows.
 - Added `docs/references/cli-usage.md` (linked from `docs/index.md`) to cover `t81-convert`, `t81-gguf`, and `t81-qat` usage with the CPU/offloading tips we surfaced for low-memory Apple Silicon.
+- Added a unified `t81` console script that exposes `convert`/`gguf` subcommands while preserving the legacy `t81-convert`/`t81-gguf` wrappers, plus updated docs/tests to reference the new entry point.
 - Added `docs/diagrams/cli-workflows-mermaid.md` to visualize the `t81-convert`, `t81-gguf`, and `t81-qat` workflows for future contributors looking at the CLI surface.
 - Extended `examples/ternary_qat_inference_comparison.py` so it now runs train + validation loops, logs compression ratios + per-step losses, and correlates the ternary threshold history with measured GEMM latencies.
 - Added `scripts/quantize_measure.py`, which chains `t81-convert` → `AutoModel.from_pretrained_t81` → latency/compression stats so you can automate quantize→measure in other pipelines.
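The AGENTS.md entry describes one `t81` console script with `convert`/`gguf` subcommands that coexists with the legacy `t81-convert`/`t81-gguf` wrappers. The commit page does not show that code, so the following is only a sketch of one way to structure it with `argparse` subparsers. The names `run_convert`, `run_gguf`, `legacy_convert`, and `legacy_gguf` are hypothetical, and the handlers only print instead of converting anything.

```python
"""Hypothetical sketch of a unified `t81` entry point with legacy wrappers.

The real t81lib implementation is not shown in this commit; run_convert and
run_gguf stand in for the existing convert/gguf logic.
"""
import argparse
import sys
from typing import Callable, Optional, Sequence


def run_convert(args: argparse.Namespace) -> int:
    # Placeholder for the existing `t81-convert` logic.
    print(f"convert {args.source} -> {args.output}")
    return 0


def run_gguf(args: argparse.Namespace) -> int:
    # Placeholder for the existing `t81-gguf` logic.
    print(f"gguf {args.source} -> {args.output}")
    return 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="t81")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, handler in (("convert", run_convert), ("gguf", run_gguf)):
        cmd = sub.add_parser(name)
        cmd.add_argument("source")
        cmd.add_argument("output")
        # Each subcommand records which handler should run.
        cmd.set_defaults(handler=handler)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    return args.handler(args)


def _legacy(command: str) -> Callable[[Optional[Sequence[str]]], int]:
    # Legacy wrappers prepend the subcommand name and route through the same
    # parser, so `t81-convert X Y` and `t81 convert X Y` share one code path.
    def entry(argv: Optional[Sequence[str]] = None) -> int:
        rest = list(sys.argv[1:] if argv is None else argv)
        return main([command, *rest])

    return entry


legacy_convert = _legacy("convert")
legacy_gguf = _legacy("gguf")
```

In a packaging setup, `main`, `legacy_convert`, and `legacy_gguf` would each be registered as a separate console-script entry point (for example under `[project.scripts]` in `pyproject.toml`); the module path to use there is not given in this commit. Routing the legacy names through `main` keeps argument parsing in one place.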

README.md

Lines changed: 14 additions & 4 deletions
@@ -46,13 +46,13 @@ It is **not** a drop-in replacement for PyTorch or NumPy, but a focused toolkit

 - **C++ limb/bigint & numerics** — build locally, include `<t81/t81lib.hpp>`, and verify the `tests/unit/` suite. Starts: [Quick start](#quick-start) & [docs/api-overview.md](docs/api-overview.md).
 - **Python quantization & helpers** — `pip install .[torch]` unlocks `t81lib`/`t81`, NumPy wrappers, and `t81.torch`/`t81.nn`. See [docs/python-install.md](docs/python-install.md) & [docs/python-api.md](docs/python-api.md).
-- **CLI & GGUF/QAT workflows** — `t81-convert`, `t81-gguf`, and `t81-qat` automate quantize→export→train flows. Follow [docs/references/cli-usage.md](docs/references/cli-usage.md).
+- **CLI & GGUF/QAT workflows** — `t81 convert`, `t81 gguf`, and `t81-qat` (the legacy `t81-convert`/`t81-gguf` aliases still work) automate quantize→export→train flows. Follow [docs/references/cli-usage.md](docs/references/cli-usage.md).

 ## Highlights

 - **Balanced-ternary core**: `t81::Int` (an alias for `t81::core::limb`) ships overflow-aware arithmetic, canonical I/O, and deterministic hashing.
 - **Ternary-friendly GEMMs**: `t81::linalg::gemm_ternary` packs balanced ternary matrices into AVX/NEON-accelerated kernels with alpha/beta semantics mirrored in the Python binding.
-- **Python, CLI, and Torch helpers**: Pybind11 bindings expose quantize/dequantize utilities and `t81.torch`/`t81.nn`, while `t81-convert`, `t81-gguf`, and `t81-qat` automate quantize/export/train workflows.
+- **Python, CLI, and Torch helpers**: Pybind11 bindings expose quantize/dequantize utilities and `t81.torch`/`t81.nn`, while `t81 convert`, `t81 gguf`, and `t81-qat` (and their legacy `t81-convert`/`t81-gguf` aliases) automate quantize/export/train workflows.
 - **Normative docs & demos**: Architecture notes, CLI references, and runnable demos live under `docs/`, `examples/`, and `bench/`.

 ## Quick start
@@ -80,7 +80,7 @@ On macOS or other PEP 668-enforced environments, activate a virtualenv before ru

 ### 2a. CLI-friendly Pipx install

-If you prefer shell-level access to `t81-convert`, `t81-gguf`, `t81-qat`, and `t81-dequant`, pipx can install the repo and then inject the torch extras:
+If you prefer shell-level access to the unified `t81` CLI (with `convert`/`gguf` subcommands) plus `t81-qat` and `t81-dequant`, pipx can install the repo and then inject the torch extras:

 ```bash
 pipx install --python python3 /Users/t81dev/Desktop/t81lib
@@ -121,7 +121,17 @@ Optional CUDA/ROCm backends can be enabled with `-DUSE_CUDA=ON` / `-DUSE_ROCM=ON

 ## CLI helpers

-`t81-convert`, `t81-gguf`, and `t81-qat` automate quantize→export→train flows with progress reporting and validation hooks. Browse [docs/references/cli-usage.md](docs/references/cli-usage.md), [docs/diagrams/cli-workflows-mermaid.md](docs/diagrams/cli-workflows-mermaid.md), and [examples/cli-examples.md](examples/cli-examples.md) for recipes.
+`t81 convert`, `t81 gguf`, `t81 info`, and `t81-qat` automate quantize→export→train flows with progress reporting and validation hooks (the legacy `t81-convert`/`t81-gguf` names still work). Browse [docs/references/cli-usage.md](docs/references/cli-usage.md), [docs/diagrams/cli-workflows-mermaid.md](docs/diagrams/cli-workflows-mermaid.md), and [examples/cli-examples.md](examples/cli-examples.md) for recipes.
+
+### Large models & GGUF streaming
+
+When targeting multi-gigabyte models (Llama 3.x or Gemma 3.x checkpoints), you can still run the CLI helpers without triggering macOS’s OOM killer, but you need to pin everything to CPU memory:
+
+- Set `ACCELERATE_DISABLE=1` (and `HF_ACCELERATE_DISABLE=1` when you launch a `transformers` command) so Accelerate never offloads tensors to `meta`/disk and the helpers can call `.to("cpu")`.
+- Prefer `--force-cpu-device-map` or `--device-map none/cpu` so `t81 convert`/`t81 gguf` (and the legacy `t81-convert`/`t81-gguf` wrappers) keep checkpoint shards in host RAM.
+- The Python GGUF reader now streams metadata/tensor infos directly from a file handle, seeks to each tensor block, and only buffers one tensor at a time, so `t81 gguf` and `t81-dequant` (and the legacy `t81-gguf` script) can handle the resulting bundles without reading the entire file into memory.
+
+If you still see `NotImplementedError: Cannot copy out of meta tensor` or a kernel that dies while building Matplotlib’s font cache, repeat the cache setup from [docs/troubleshooting.md](docs/troubleshooting.md#large-gguf-conversions) (`MPLCONFIGDIR`, `FONTCONFIG_PATH`, etc.) before rerunning the CLI.

 ### Dequantizing for downstream runtimes
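The streaming note above describes a reader that seeks to each tensor block and buffers only one tensor at a time. A minimal sketch of that access pattern follows, assuming the tensor infos (name, absolute byte offset, byte length) have already been parsed from the GGUF header. `TensorInfo` and `iter_tensor_blocks` are illustrative names, not the t81lib reader, and header/metadata decoding is deliberately left out; note that GGUF stores per-tensor offsets relative to the aligned tensor-data section, so a real reader would add that base before seeking.

```python
"""Sketch of streaming tensor reads: seek to each block, buffer one at a time.

Assumes tensor infos with absolute offsets were already parsed from the file
header; GGUF metadata and tensor-info decoding is intentionally omitted.
"""
from dataclasses import dataclass
from typing import BinaryIO, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class TensorInfo:
    name: str
    offset: int  # absolute position of the tensor block in the file
    nbytes: int  # size of the tensor block in bytes


def iter_tensor_blocks(
    handle: BinaryIO, infos: Iterable[TensorInfo]
) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, raw bytes) pairs, reading one tensor block per step."""
    for info in infos:
        handle.seek(info.offset)
        data = handle.read(info.nbytes)
        if len(data) != info.nbytes:
            # A short read means the file ends before the declared block does.
            raise EOFError(f"truncated tensor block for {info.name!r}")
        yield info.name, data
```

Typical use is `with open(path, "rb") as fh:` followed by a loop over `iter_tensor_blocks(fh, infos)`. Because the generator yields one block at a time, peak memory is bounded by the largest single tensor rather than the file size, provided the caller does not keep references to earlier blocks.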

docs/index.md

Lines changed: 3 additions & 3 deletions
@@ -24,17 +24,17 @@ to understand the balanced ternary engine without digging through specs immediat
 - **Python cookbook** — [`docs/python-cookbook.md`](python-cookbook.md) gathers recipes that mix `t81lib.pack_dense_matrix`, `t81.torch.TernaryTensor`, and the CLI helpers.
 - **Python install paths** — [`docs/python-install.md`](python-install.md) explains pip/pipx builds, validation tips, and CLI helper installs.
 - **PyTorch how-to** — [`docs/torch.md`](torch.md) walks through `t81.torch`, `t81.nn`, conversion helpers, and how the CLI scripts mirror the Python flows.
-- **CLI reference** — [`docs/references/cli-usage.md`](references/cli-usage.md) lists the `t81-convert`, `t81-gguf`, and `t81-qat` helpers
+- **CLI reference** — [`docs/references/cli-usage.md`](references/cli-usage.md) lists the unified `t81 convert`/`t81 gguf` helpers (with legacy `t81-convert`/`t81-gguf` aliases) and `t81-qat`
 plus the common flags for exporting GGUF bundles and running QAT.
 - **Hardware & energy reference** — [`docs/references/hardware-emulation.md`](references/hardware-emulation.md) connects `t81.hardware.TernaryEmulator`
 with the Python quantization helpers plus the new [`scripts/quantize_measure.py`](../scripts/quantize_measure.py) automation that chains
-`t81-convert` → measurement.
+`t81 convert` → measurement.
 - **Python demos** — the [`examples/`](../examples/) scripts/notebooks track `t81.torch` + `t81.nn` workflows; add
 [`examples/ternary_qat_inference_comparison.py`](../examples/ternary_qat_inference_comparison.py) to kick off a mini `t81.trainer` QAT loop, print the ternary
 threshold schedule, and compare `torch.matmul` vs. `t81lib.gemm_ternary` latency so you can prototype
 entirely inside Python before launching the CLI helpers.
 - **CLI automation & energy benchmarking** — [`scripts/quantize_measure.py`](../scripts/quantize_measure.py) and [`scripts/quantize_energy_benchmark.py`](../scripts/quantize_energy_benchmark.py)
-chain `t81-convert`/`t81-gguf` runs with latency/energy measurement so you can report quantization impact directly
+chain `t81 convert`/`t81 gguf` runs with latency/energy measurement so you can report quantization impact directly
 from command-line workflows.
 - **Use cases & demos** — [`docs/use-cases.md`](use-cases.md) and [`examples/README.md`](../examples/README.md) capture the canonical scripts, notebooks, and research stories.
 - **Hardware simulation** — [`docs/hardware.md`](hardware.md) details `t81.hardware.TernaryEmulator`, fuzzy helpers, and the visualizer notebook.

docs/references/README.md

Lines changed: 4 additions & 4 deletions
@@ -1,18 +1,18 @@
 ### Reference index

 This folder collects lightweight guides for the console scripts and the GGUF
-plugin surface. For a broader overview of the CLI helpers (`t81-convert`,
-`t81-gguf`, and `t81-qat`), see `cli-usage.md`.
+plugin surface. For a broader overview of the CLI helpers (`t81 convert`,
+`t81 gguf`, and `t81-qat`, plus the legacy `t81-convert`/`t81-gguf` aliases), see `cli-usage.md`.

 ### GGUF export (llama.cpp / Ollama / LM Studio)

 ```bash
-t81-convert meta-llama/Llama-3.2-3B-Instruct llama3.2-3b-t81.gguf --quant TQ1_0
+t81 convert meta-llama/Llama-3.2-3B-Instruct llama3.2-3b-t81.gguf --quant TQ1_0
 ```

 When writing a converted checkpoint or GGUF bundle, append `--force-cpu-device-map`
 so that Accelerate keeps parameters on CPU/disk instead of dispatching them to
 `meta`. The default `device_map="auto"` path can offload modules to disk and
 triggers `NotImplementedError: Cannot copy out of meta tensor` when `save_pretrained`
 runs. Using the new flag ensures everything stays serializable, and you can rerun
-`t81-gguf`/`t81-convert` with it whenever you hit that error.
+`t81 gguf`/`t81 convert` (or the legacy `t81-gguf`/`t81-convert` scripts) with it whenever you hit that error.
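The README and this reference note combine three knobs for large conversions: the unified `t81 convert` entry point (or its legacy alias), `--force-cpu-device-map`, and the `ACCELERATE_DISABLE`/`HF_ACCELERATE_DISABLE` variables. A sketch of a wrapper that assembles them for a scripted rerun follows, assuming the positional `MODEL OUTPUT` order and the `--quant` flag from the `t81 convert` example in this file. `build_convert_command` and `run_pinned_convert` are hypothetical helpers, not t81lib APIs, and the environment variables are set only because the README recommends them.

```python
"""Hypothetical helper: run `t81 convert` pinned to CPU memory.

Uses only the flags named in these docs (--quant, --force-cpu-device-map) and
the environment variables the README recommends; function names are illustrative.
"""
import os
import shutil
import subprocess
from typing import Dict, List, Optional, Tuple


def build_convert_command(
    model: str, output: str, quant: Optional[str] = None
) -> Tuple[List[str], Dict[str, str]]:
    # Prefer the unified entry point; use the legacy alias only when it is
    # the one actually installed on PATH.
    if shutil.which("t81") is None and shutil.which("t81-convert") is not None:
        cmd = ["t81-convert"]
    else:
        cmd = ["t81", "convert"]
    cmd += [model, output]
    if quant:
        cmd += ["--quant", quant]
    # Keep parameters off the `meta` device so save_pretrained can serialize them.
    cmd.append("--force-cpu-device-map")

    # Copy the current environment and add the variables the README lists.
    env = dict(os.environ)
    env["ACCELERATE_DISABLE"] = "1"
    env["HF_ACCELERATE_DISABLE"] = "1"
    return cmd, env


def run_pinned_convert(model: str, output: str, quant: Optional[str] = None) -> int:
    cmd, env = build_convert_command(model, output, quant)
    return subprocess.run(cmd, env=env, check=False).returncode
```

Separating command construction from execution makes the flag set easy to log or test before launching a multi-gigabyte conversion.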
