Instead of having the LLM repeatedly re-describe common operations in natural language, you define those operations once in human-readable `.txt` catalogs, index them for semantic search, and have the LLM output compact `.vcs` programs that reference operations by numeric IDs. A `.vcs` (vectorized code stack) program is then interpreted by a plugin for your software environment, which bundles a small interpreter with static code modules corresponding to the human-readable operations.

This saves tokens twice: reasoning tokens are reduced by offloading operation lookup to a client-side vector search, and output tokens are reduced by the compact `.vcs` format in the response. The interpreter plugin is Turing complete and mimics a CPU architecture, which keeps it fast and requires only static function calls, so each operation can be implemented however the user likes for their environment.

If the LLM gets confused while generating a `.vcs` file (for example, when the vector search returns insufficient results because of a poorly written human description), it can fall back to reading the human-readable catalog or the corresponding function library directly. Early benchmarks suggest roughly a 90% token reduction when the initial prompt contains sufficient keywords.
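To make the idea concrete, here is a minimal sketch of a catalog and interpreter, assuming a hypothetical `.vcs` text format in which each line is `<op_id> <args...>`. The operation names, the three-entry catalog, and the format itself are all illustrative placeholders, not the actual specification:

```python
# Hypothetical sketch: static function modules correlating to catalog entries.
# Each catalog entry pairs a human-readable description (what gets indexed
# for vector search) with a static implementation chosen for the environment.

def op_push(stack, value):
    """Push a numeric literal onto the stack."""
    stack.append(float(value))

def op_add(stack):
    """Pop two values and push their sum."""
    stack.append(stack.pop() + stack.pop())

# op_id -> (human-readable description, static implementation)
CATALOG = {
    1: ("push a literal onto the stack", op_push),
    2: ("pop two values, push their sum", op_add),
}

def run_vcs(program: str):
    """Interpret a .vcs program: one 'op_id arg...' instruction per line,
    dispatched to static functions over a shared stack (CPU-like model)."""
    stack = []
    for line in program.strip().splitlines():
        op_id, *args = line.split()
        _, fn = CATALOG[int(op_id)]
        fn(stack, *args)
    return stack
```

For example, the compact program `"1 2\n1 3\n2"` pushes 2 and 3 and then adds them, leaving 5.0 on the stack, while the LLM only had to emit numeric IDs rather than re-describe the operations.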