
Optimize token usage in default list_ tools #2016

Open

kerobbi wants to merge 23 commits into main from kerobbi/response-optimisation

Conversation

@kerobbi
Contributor

@kerobbi kerobbi commented Feb 16, 2026

Summary

Adds a generic response optimisation package that reduces token usage in list_ tool responses by applying six strategies at runtime: nested object flattening, URL elimination, zero-value elimination, whitespace normalisation, collection summarisation, and fill-rate filtering.

Each relevant tool passes its own config via functional options:

  • WithPreservedFields - keys exempt from destructive strategies
  • WithCollectionExtractors - extract important values from nested collections instead of summarising as [N items]
  • WithMaxDepth - override flatten depth if needed
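The functional options above might be implemented roughly as follows. This is a sketch only: the option names come from the PR description, but the config struct, defaults, and signatures are assumptions.

```go
package main

import "fmt"

// optimizeConfig collects per-tool settings. The field names follow the
// PR description; the exact layout is an assumption.
type optimizeConfig struct {
	preservedFields map[string]bool
	maxDepth        int
}

// OptimizeListOption mutates the config: the functional-options pattern.
type OptimizeListOption func(*optimizeConfig)

// WithPreservedFields exempts the given keys from destructive strategies.
func WithPreservedFields(keys ...string) OptimizeListOption {
	return func(c *optimizeConfig) {
		for _, k := range keys {
			c.preservedFields[k] = true
		}
	}
}

// WithMaxDepth overrides the flatten depth.
func WithMaxDepth(d int) OptimizeListOption {
	return func(c *optimizeConfig) { c.maxDepth = d }
}

// newConfig applies options over (assumed) defaults.
func newConfig(opts ...OptimizeListOption) optimizeConfig {
	cfg := optimizeConfig{preservedFields: map[string]bool{}, maxDepth: 2}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := newConfig(WithPreservedFields("html_url", "draft"), WithMaxDepth(3))
	fmt.Println(cfg.maxDepth, len(cfg.preservedFields))
}
```

Each tool call then passes only the options it needs, and unspecified settings fall back to the pipeline defaults.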

Why

Current JSON responses are token heavy, with lots of redundant nested objects, URL fields, and zero-value noise. A single list_pull_requests tool call at 25 items can use 180k tokens. This pipeline brings that down to 55k (~69% reduction) without losing anything the model actually needs.

What changed

  • Added pkg/response - generic optimisation pipeline with per-tool configuration via functional options
  • Added tests covering all strategies and config interactions
  • Updated existing tests for flattened output format
  • Wired response.OptimizeList into 6 default list_ tools (list_pull_requests, list_issues, list_commits, list_tags, list_releases, list_branches)
  • Did not wire list_issue_types, as its data is already flat and the pipeline produces no benefit

Token reduction

  • list_commits: ~21-25%
  • list_tags: ~73-74%
  • list_releases: ~91-94%
  • list_issues: ~5-15%
  • list_pull_requests: ~54-73%
  • list_branches: ~5-9%

Measured using OpenAI's tiktoken library (o200k_base) at 2, 10, and 25 items. Model accuracy was validated with GPT 5.1 and Opus 4.5 across 6 prompts per tool; no degradation observed.

MCP impact

  • No tool or API changes
  • Tool schema or behavior changed
  • New tool added

Prompts tested (tool changes only)

Security / limits

  • No security or limits impact
  • Auth / permissions considered
  • Data exposure, filtering, or token/size limits considered

Tool renaming

  • I am renaming tools as part of this PR (e.g. as part of a consolidation effort)
    • I have added the new tool aliases in deprecated_tool_aliases.go
  • I am not renaming tools as part of this PR

Note: if you're renaming tools, you must add the tool aliases. For more information on how to do so, please refer to the official docs.

Lint & tests

  • Linted locally with ./script/lint
  • Tested locally with ./script/test

Docs

  • Not needed
  • Updated (README / docs / examples)

@kerobbi kerobbi changed the title WIP: Add response optimisation for list_ tools Add response optimisation to list_ tools in default toolsets Feb 18, 2026
@kerobbi kerobbi marked this pull request as ready for review February 18, 2026 11:01
@kerobbi kerobbi requested a review from a team as a code owner February 18, 2026 11:01
Copilot AI review requested due to automatic review settings February 18, 2026 11:01
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a response optimization package (pkg/response) that reduces token usage in list_ tool responses by 5-94% (depending on the tool). The optimization applies six strategies: nested object flattening with dot-notation keys, URL field elimination (except preserved ones), zero-value removal, whitespace normalization, collection summarization to [N items] or extracted fields, and fill-rate filtering to remove rarely-populated fields.

The package provides two configuration mechanisms:

  • preservedFields: exempts specific keys (html_url, draft, prerelease) from destructive optimizations
  • collectionFieldExtractors: controls extraction of important subfields from collections (e.g., labels→name, requested_reviewers→login)

Changes:

  • Added pkg/response package with optimization pipeline and comprehensive test coverage
  • Wired response.MarshalItems() into 6 list tools: list_commits (depth 3 for commit.author.name), list_tags, list_releases, list_branches, list_pull_requests, list_issues
  • Updated existing tests to work with flattened output structure (map[string]any instead of typed structs)
  • Intentionally excluded list_issue_types (already flat, no benefit)

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
pkg/response/optimize.go Core optimization pipeline with flattening, URL removal, zero-value elimination, whitespace normalization, collection handling, and fill-rate filtering
pkg/response/optimize_test.go Comprehensive test coverage for all optimization strategies and configuration interactions
pkg/github/repositories.go Integrated response.MarshalItems into list_commits (depth 3), list_branches, list_tags, list_releases
pkg/github/repositories_test.go Updated test assertions to work with flattened map structure instead of MinimalCommit structs
pkg/github/pullrequests.go Integrated response.MarshalItems into list_pull_requests
pkg/github/issues.go Integrated response.MarshalItems into list_issues, renamed response variable to issueResponse to avoid package name shadowing
pkg/github/issues_test.go Updated test assertions to work with map[string]any structure instead of typed structs, removed unused verifyOrder field

@tonytrg
Contributor

tonytrg commented Feb 19, 2026

I have left a few comments, let me know what you think.

@kerobbi kerobbi changed the title Add response optimisation to list_ tools in default toolsets Optimize token usage in default list_ tools Feb 19, 2026
@tonytrg
Contributor

tonytrg commented Feb 23, 2026

Thematically they fit our current approach of creating minimal versions.
I'd rather see us create minimal versions of every list tool call object (if they don't exist already); without looking, I assume we also offer the single get variant for most of them.

The approach of handling raw JSON strings and manually cutting down objects with default values, or having to manually append exceptions, imo justifies hand-crafting the reduced minimal version instead.

Afterwards there is still room for sensible optimizations for list based responses.

Like:

  • transform to minimal version
  • additionally apply any extra optimization only applicable on lists

Conversion from original to optimized struct

Requires:

- defaultFillRateThreshold = 0.1 // default proportion of items that must have a key for it to survive
- minFillRateRows          = 3   // minimum number of items required to apply fill-rate filtering
- defaultMaxDepth

preservedFields      map[string]bool
collectionExtractors map[string][]string

While we do already have a way to reduce bloated structs.
Example minimalPR, see:

r, err := response.OptimizeList(prs,

minimalPR := convertToMinimalPullRequest(pr)

Problems I see:

  • two different optimized/minimal representations of a struct
    • maintaining two different code sections instead of sharing code
    • I would say that the minimized representation from a single get_ and its list_ method having the same fields is a hard criterion
  • harder to imagine the output/final state for the reduced variant in optimizedlist, with this approach of passing configs/exemptions

@kerobbi
Contributor Author

kerobbi commented Feb 23, 2026

@tonytrg I've wired minimal types into the 4 tools that were missing (list_pull_requests, list_issues, list_releases, list_tags) and removed collectionExtractors entirely, as with minimal types that mechanism is redundant.

The optimisation pipeline now does: flatten -> URL removal -> zero-value removal -> whitespace normalisation -> fill rate filtering. Reasoning behind why I kept each (plus the preservedFields config) is:

  • flatten: minimal types still have nested structs (e.g., MinimalUser, MinimalPRBranch)
  • URL removal: MinimalUser includes two URL fields, which might be useful in get_ tools, but honestly noise in list_ (this also acts as a safety net if someone adds a URL field to a minimal type in the future)
    • an alternative here would be separate struct variants (i.e., get vs list), but imo the pipeline has less boilerplate for the same result
  • Zero-value removal: similar tradeoff and alternative solution as the above
    • omitempty would work but it affects get_ tools too (e.g., N * merged: false or all the reaction counts are just noise in a list_ response imo).
  • whitespace normalisation and fill rate filtering: these are runtime only transformations
  • preservedFields: minimal config needed to protect html_url from URL removal, draft/prerelease from zero value removal

On get/list tools field parity, I have somewhat mixed feelings. These tool types serve different enough purposes that they benefit from different field sets imo. List tool calls are done first to find items, then get tool calls to drill in; having fewer fields in list and richer fields in get feels like the right separation to me. The minimal types are now shared between get_ and list_, so we at least have a consistent base representation. The pipeline just strips noise that doesn't add value at scale, making list output a strict subset of get output rather than a potentially entirely different shape.

@kerobbi kerobbi requested a review from tonytrg February 24, 2026 09:43
tonytrg
tonytrg previously approved these changes Feb 24, 2026
}

// flattenInto is the recursive worker for flattenTo.
func flattenInto(item map[string]any, prefix string, result map[string]any, depth int, maxDepth int) {
Contributor


Code style nit: I'd much prefer this function to return its result rather than return void and fill an out-parameter in the body. Is there any particular reason you chose to do it this way?
If depth >= maxDepth we would just return an empty map; maybe we should keep it as-is?

Contributor Author


This was to avoid creating + merging a new map at each recursive layer.
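The accumulator approach being described can be sketched as follows. This is an assumed shape, not the PR's actual code; it reproduces the depth semantics discussed in this thread (depth starts at 1, and nested maps at maxDepth are skipped rather than emitted).

```go
package main

import "fmt"

// flattenInto writes flattened keys into result using dot notation,
// threading the accumulator through the recursion so no per-level map
// is allocated and merged. Sketch only, assumptions labeled above.
func flattenInto(item map[string]any, prefix string, result map[string]any, depth, maxDepth int) {
	for key, value := range item {
		full := key
		if prefix != "" {
			full = prefix + "." + key
		}
		if nested, ok := value.(map[string]any); ok {
			// Nested maps at maxDepth are dropped entirely: this is
			// the boundary behavior debated in this thread.
			if depth < maxDepth {
				flattenInto(nested, full, result, depth+1, maxDepth)
			}
			continue
		}
		result[full] = value
	}
}

// flattenTo is the entry point; depth starts at 1.
func flattenTo(item map[string]any, maxDepth int) map[string]any {
	result := make(map[string]any, len(item))
	flattenInto(item, "", result, 1, maxDepth)
	return result
}

func main() {
	in := map[string]any{
		"title": "issue",
		"user":  map[string]any{"login": "alice", "org": map[string]any{"name": "acme"}},
	}
	out := flattenTo(in, 2)
	fmt.Println(out["title"], out["user.login"], out["user.org.name"])
}
```

With maxDepth=2, "user.login" survives but "user.org.name" does not, which is the boundary case the next comment's test exercises.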

Contributor


I think for the 2-3 levels of nesting, with each item having ~5 nested objects, that'd be 3-5 additional map allocations per nested item. In the grand scheme of things (the marshalling/unmarshalling pipeline) it's negligible. I know the accumulator is idiomatic Go code, but I much prefer clean functions that simply return their result (although it's a matter of taste, so I am not pushing on this).
What needs to be addressed is that we always drop the last level in nested structures due to the depth < maxDepth check. You can reproduce it by copy-pasting this test and running it:

t.Run("silently drops nested maps at maxDepth boundary", func(t *testing.T) {
	// With defaultMaxDepth=2, depth starts at 1.
	// Result: "user.org" and its contents are silently lost.
	input := map[string]any{
		"title": "issue",
		"user": map[string]any{
			"login": "alice",
			"org": map[string]any{
				"name": "acme",
			},
		},
	}
	result := flattenTo(input, defaultMaxDepth)

	assert.Equal(t, "issue", result["title"])
	assert.Equal(t, "alice", result["user.login"])
	// BUG: org data is silently dropped — neither "user.org" nor
	// "user.org.name" appear in the result.
	assert.Nil(t, result["user.org"], "nested map at maxDepth is silently dropped")
	assert.Nil(t, result["user.org.name"], "nested map contents at maxDepth are lost")
})

Contributor Author


Regarding the code style nit, more than happy to address it in a follow up PR.

As for the second point, that is actually intentional. The main reason for this is that deeply nested data is not useful in a list response imo.

I did just notice that the PR description is generic though and focused on flattening only, my bad! It completely slipped my mind, as this behaviour was part of the nested-object-flattening strategy as a whole.

Will edit the PR description and add a test to cover this behaviour.

if !preserved && isURLKey(key) {
continue
}
if !preserved && isZeroValue(value) {
Contributor


Won't isZeroValue cause fields like draft in pull request to be removed? I don't think we want to strip important fields 🤔

Contributor Author

@kerobbi kerobbi Feb 24, 2026


draft specifically is not removed due to being part of WithPreservedFields. However, for fields like merged: false for example on open PRs, zero value removal / isZeroValue is intentional. N * merged: false is noise in a list response where the get_ tool variant can be used as a fallback for details.
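For reference, a zero-value check of the kind being discussed might look like the sketch below. The PR's real isZeroValue may differ; this version just covers the types encoding/json produces when decoding into map[string]any.

```go
package main

import "fmt"

// isZeroValue reports whether a JSON-decoded value is the zero value
// for its type. Sketch of the strategy under discussion, not the PR's
// actual implementation.
func isZeroValue(v any) bool {
	switch t := v.(type) {
	case nil:
		return true
	case bool:
		return !t
	case string:
		return t == ""
	case float64: // encoding/json decodes all JSON numbers to float64
		return t == 0
	case []any:
		return len(t) == 0
	case map[string]any:
		return len(t) == 0
	}
	return false
}

func main() {
	// merged:false, empty strings, and zero counts are all stripped
	// unless the key is in preservedFields.
	fmt.Println(isZeroValue(false), isZeroValue("fix: bug"), isZeroValue(float64(0)))
}
```

This is why booleans like draft and protected need explicit preservation: false is indistinguishable from "absent" after this pass.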

Contributor


What about protected? That's one example; I didn't go through the whole list of boolean fields that are important but will be stripped. It looks like a lot of additional configuration needs to be added to make sure we aren't truncating important data?

Contributor Author

@kerobbi kerobbi Feb 24, 2026


protected is actually a very good catch, as there's no get_ branch tool as a fallback. I've added it to WithPreservedFields in list_branches.

Regarding the broader concern, there are two alternatives I considered (i.e., what I also briefly mentioned in #2016 (comment)):

  • either add omitempty on relevant fields (this breaks get_ tools)
    • alternative to not breaking get_ tools is having separate types for get and list, and this is a lot more boilerplate imo
  • no zero value removal at all (-> leads to noisy lists)

Overall, there is a risk of maybe leaving out a field, but other than the one you caught just now, all the other tools have get_ fallbacks, which the model can use.


// Wrap optimized issues with pagination metadata
issueResponse := map[string]any{
"issues": json.RawMessage(optimizedIssues),
Contributor


For empty issues list we used to return an empty list {"issues":[],...}
which is no longer the case after this optimization

Contributor Author

@kerobbi kerobbi Feb 24, 2026


OptimizeList still produces [] on an empty slice 🤔 I've also just tested it against an issue-less repo and this is what was returned:

{"issues":[],"pageInfo":{"endCursor":"","hasNextPage":false,"hasPreviousPage":false,"startCursor":""},"totalCount":0}

Contributor Author


Added a nil guard in OptimizeList, so that [] is returned in that case too.
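A guard of the kind described might look like this sketch (hypothetical; the real pipeline applies its strategies before marshalling). The point is that json.Marshal of a nil slice yields null, which the guard avoids.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OptimizeList with the nil guard described above: a nil or empty slice
// short-circuits to a JSON empty array instead of marshalling to "null".
// Sketch only; the optimization strategies are omitted here.
func OptimizeList[T any](items []T) ([]byte, error) {
	if len(items) == 0 {
		return []byte("[]"), nil
	}
	return json.Marshal(items)
}

func main() {
	var none []int
	b, _ := OptimizeList(none)
	fmt.Println(string(b)) // "[]" rather than "null"
}
```

Without the guard, an issue-less repo would produce {"issues":null,...} instead of the {"issues":[],...} shape the tool used to return.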


// OptimizeList optimizes a list of items by applying flattening, URL removal, zero-value removal,
// whitespace normalization, and fill-rate filtering.
func OptimizeList[T any](items []T, opts ...OptimizeListOption) ([]byte, error) {
Contributor


OptimizeList does marshal + unmarshal + marshal. For large list responses this may introduce noticeable latency.

Contributor Author

@kerobbi kerobbi Feb 24, 2026


This unfortunately is the tradeoff of having a generic optimisation pipeline. An alternative was using Go reflection (which the spike did), but that's significantly more complex (to read and maintain) imo. It also still has some of the drawbacks of this solution.

I do agree that the full JSON round trip potentially adds overhead when a large number of items is requested, but it should be in the low ms range at worst (which I'd dare say is negligible vs the actual API call?).

@kerobbi kerobbi requested a review from JoannaaKL February 24, 2026 15:34
}

if len(maps) >= minFillRateRows {
maps = filterByFillRate(maps, defaultFillRateThreshold, cfg)
Contributor


There is another issue with the ordering.
The pipeline order is:

  1. flattenTo
  2. optimizeItem
  3. filterByFillRate

So zero-value removal artificially deflates fill rates, causing fill-rate filtering to wipe out keys that had legitimate non-zero values on some items.

Concrete example with list_branches (no preserved fields):

10 branches: 1 has protected: true, 9 have protected: false. optimizeItem strips protected: false from 9 items (it's a zero value).
Then filterByFillRate sees protected on only 1/10 items --> minCount = int(0.1 * 10) = 1 → 1 > 1 is false and it's removed.
The one branch that was protected: true loses that information entirely.

Contributor Author


That order is intentional (and intentionally aggressive). I might sound like a broken record here, but what the optimisation pipeline should do imo is optimise for a workflow where list tools are for scanning, and get tools are for getting in-depth details.

Stripping everything that isn't high signal at the list level should be fine, the worst thing that could happen in practice is that the model ends up using the get_ tool to get more specifics.

With that being said, I did spend some time exploring a couple of alternatives:

  1. Swapping the order (fill rate first, then zero value removal):
    • this might be better for JSON, but it creates a problem for other output formats (e.g., for a CSV-like format, a sparse bool with 1 true and 99 empty cells adds noise, which I'm pretty sure scales linearly with the number of items).
  2. Skipping bools in zero value removal entirely:
    • no data loss, but keeps N * false values in the output, which does add up at scale and goes against the entire goal of this work

Keeping the current order and having WithPreservedFields is imo the most explicit way of controlling what is retained (i.e., high priority fields) and it works consistently across formats.


optimizedIssues, err := response.OptimizeList(minimalIssues)
if err != nil {
return nil, nil, fmt.Errorf("failed to optimize issues: %w", err)
Contributor


This is different from list_pull_requests, where if a call to OptimizeList fails we return: if err != nil { return utils.NewToolResultErrorFromErr("failed to marshal response", err), nil, nil }

@JoannaaKL
Contributor

I added some tests to my branch, just to show you what I mean #2081

@kerobbi
Contributor Author

kerobbi commented Feb 24, 2026

I added some tests to my branch, just to show you what I mean #2081

Tried my best to use these as a reference and address every concern here.

Regarding the whitespace normalisation vs "markdown with code blocks, bullet lists, and intentional line breaks", I don't think there's a quick fix for this without making the approach smarter (e.g., skipping content inside code fences?).

It is a valid concern and I'll try revisiting this in a follow up, but overall I am of the idea that list_ tools are for scanning, and get_ tools should be used to return the full body.

Imo this is acceptable for now (I did not consider this specific scenario during evals, but overall, when a prompt such as summarise the last N items from X was used, the model never had a problem giving accurate replies).
