Skip to content

feat(tui): word-level highlighting in edit_file diff view#2900

Open
rumpl wants to merge 1 commit into
docker:mainfrom
rumpl:improve-diff-view
Open

feat(tui): word-level highlighting in edit_file diff view#2900
rumpl wants to merge 1 commit into
docker:mainfrom
rumpl:improve-diff-view

Conversation

@rumpl
Copy link
Copy Markdown
Member

@rumpl rumpl commented May 26, 2026

Inside a modified line, highlight only the specific tokens that changed instead of painting the whole line. Paired delete/insert lines are run through a word-aware LCS diff; differing segments are rendered with a stronger, more saturated background tint derived from the active theme's DiffAddBg / DiffRemoveBg colors.

Screenshot 2026-05-26 at 22 51 13

@rumpl rumpl marked this pull request as ready for review May 26, 2026 20:51
@rumpl rumpl requested a review from a team as a code owner May 26, 2026 20:51
@rumpl rumpl force-pushed the improve-diff-view branch from 165b08e to 1807f83 Compare May 26, 2026 20:58
Copy link
Copy Markdown

@docker-agent docker-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🔴 CRITICAL

This PR adds word-level diff highlighting using an LCS-based tokenizer, a new applyWordEmphasis function, and a deriveEmphasisBg colour transform. The approach is solid, but a high-severity byte-offset mismatch was found in the prepareContentdiffWords pipeline that affects all tab-indented files (i.e., essentially all Go code in this project).

Findings summary:

  • 🔴 HIGH — Tab expansion in prepareContent creates byte-offset divergence between word-diff segments and chroma tokens; wrong emphasis and potential panic on tab-indented lines (render.go:626)
  • 🟡 MEDIUMdiffWords equal-region guard only checks old-side position; new-side equal tokens may be silently dropped if LCS output is asymmetric (wordiff.go:107)
  • 🟢 LOWderiveEmphasisBg lightness cap at 0.40 can produce a darker emphasis for dark-theme backgrounds with l ∈ [0.40, 0.50) (colorutil.go:450)

eq := strings.Join(oldTokens[oldPos:d.Start], "")
if eq != "" {
oldSegs = append(oldSegs, wordSegment{Text: eq, Changed: false})
newSegs = append(newSegs, wordSegment{Text: eq, Changed: false})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] diffWords equal-region uses old-side tokens for both sides — new-side equal content may be lost

In diffWords, when the LCS diff has a gap between consecutive entries (an equal region), the code uses oldTokens[oldPos:d.Start] as the equal text and appends it to both oldSegs and newSegs:

if d.Start > oldPos {
    eq := strings.Join(oldTokens[oldPos:d.Start], "")
    if eq != "" {
        oldSegs = append(oldSegs, wordSegment{Text: eq, Changed: false})
        newSegs = append(newSegs, wordSegment{Text: eq, Changed: false})  // old-side text used for new side!
    }
}

This is only safe because the LCS invariant guarantees that equal regions have identical tokens on both sides (oldTokens[oldPos:d.Start] == newTokens[newPos:d.ReplStart]). However, the guard d.Start > oldPos does not have a corresponding d.ReplStart > newPos check. If the lcs.DiffLines library returns any edge case where the old-side gap is empty but the new-side gap is non-empty (e.g., degenerate or optimized output), the new-side equal tokens would be silently dropped. The consequence is that concat(newSegs) != newLine, causing misaligned byte offsets in applyWordEmphasis and corrupted emphasis rendering.

Recommendation: Use newTokens[newPos:d.ReplStart] for newSegs and guard with d.ReplStart > newPos, making the intent explicit and robust against any library behaviour:

if d.Start > oldPos {
    eq := strings.Join(oldTokens[oldPos:d.Start], "")
    if eq != "" {
        oldSegs = append(oldSegs, wordSegment{Text: eq, Changed: false})
    }
}
if d.ReplStart > newPos {
    eq := strings.Join(newTokens[newPos:d.ReplStart], "")
    if eq != "" {
        newSegs = append(newSegs, wordSegment{Text: eq, Changed: false})
    }
}

if len(wordSegs) > 0 {
tokens = applyWordEmphasis(tokens, wordSegs)
}
lineStyle := getLineStyle(line.Kind)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] Tab expansion in prepareContent causes byte-offset mismatch with word-diff segments — wrong emphasis and potential panic on tab-indented lines

prepareContent() expands each \t to 4 spaces before passing content to syntaxHighlight. The chroma tokens produced therefore have byte offsets relative to the tab-expanded string. However, diffWords is called with the raw (non-tab-expanded) content:

// renderSplitSide — uses tab-expanded content for tokens:
content := prepareContent(line.Content)    // \t → "    " (4 bytes)
tokens := syntaxHighlight(content, filePath)

if len(wordSegs) > 0 {
    tokens = applyWordEmphasis(tokens, wordSegs)  // segments have unexpanded byte offsets!
}

// Word segs built with unexpanded content:
oldSegs, newSegs = diffWords(
    strings.TrimRight(pair.old.Content, "\r\n"),  // \t is still 1 byte here
    strings.TrimRight(pair.new.Content, "\r\n"),
)

applyWordEmphasis maps segment byte ranges onto chroma token byte ranges. For any line containing a \t before a changed word, the segment byte positions are shifted left by (tabWidth-1) * numTabsBefore relative to the chroma token byte positions. This causes:

  1. Wrong emphasis: Changed tokens are highlighted at the wrong positions (or not at all).
  2. Potential panic: The slice text[cursor-tokStart : end-tokStart] can produce a negative or out-of-bounds index if end (derived from the segment range) exceeds the actual chroma token's byte length.

Since Go source files — including the test fixtures in this PR — are almost always tab-indented, this affects every real-world usage of the word-level emphasis.

Fix: Call applyWordEmphasis using segments derived from prepareContent-processed input (i.e., pass prepareContent(...) to diffWords as well), or expand tabs in the content passed to diffWords before computing segments.

Comment thread pkg/tui/styles/colorutil.go Outdated
// drop. The magnitude is intentionally modest so changed tokens remain
// readable against the surrounding row color.
if l < 0.5 {
l = min(l+0.18, 0.40)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] deriveEmphasisBg lightness cap can make emphasis darker than the row background for certain dark themes

For dark-theme diff backgrounds where lightness is in the range [0.40, 0.50), the formula l = min(l+0.18, 0.40) clamps the output down to 0.40, which is less than the input. For example:

  • Input l = 0.45min(0.63, 0.40) = 0.40 (emphasis is darker than background)
  • Input l = 0.48min(0.66, 0.40) = 0.40 (emphasis is darker than background)

An emphasis background darker than the surrounding row background defeats the purpose: changed tokens become harder to see rather than easier.

if l < 0.5 {
    l = min(l+0.18, 0.40)  // cap at 0.40 can REDUCE lightness for inputs in [0.40, 0.50)
} else {
    l = max(l-0.18, 0.65)
}

Note: Typical dark-theme diff backgrounds tend to have very low lightness (l ≈ 0.10–0.25), so this range may not be hit with default themes. But any custom theme or third-party theme with a medium-dark diff background would trigger this.

Recommendation: Change the cap to 0.50 (or remove it entirely and just clamp to [0, 1]) so the emphasis colour is always at least as light as its source:

if l < 0.5 {
    l = min(l+0.18, 0.50)
}

@aheritier aheritier added area/tui For features/issues/fixes related to the TUI kind/feat PR adds a new feature (maps to feat: commit prefix) labels May 26, 2026
Inside a modified line, highlight only the specific tokens that changed
instead of painting the whole line. Paired delete/insert lines are run
through a word-aware LCS diff; differing segments are rendered with a
stronger, more saturated background tint derived from the active theme's
DiffAddBg / DiffRemoveBg colors.

New helpers:
- pkg/tui/styles: DiffAddEmphBg / DiffRemoveEmphBg colors and matching
  styles, computed in rebuildStyles via a deriveEmphasisBg HSL boost so
  every theme gets sensible emphasis colors automatically.
- pkg/tui/components/tool/editfile/wordiff.go: tokenizer + word-level
  diff that respects identifier, whitespace, and punctuation boundaries.
@rumpl rumpl force-pushed the improve-diff-view branch from 1807f83 to 39dc76a Compare May 26, 2026 21:03
@rumpl
Copy link
Copy Markdown
Member Author

rumpl commented May 26, 2026

Addressed all three review findings in 39dc76a:

  • HIGH (tab offsets): buildLineWordDiffs and the split-view inline call now feed prepareContent-processed text into diffWords, so segment byte offsets align with the chroma tokens. Added TestRenderEditFile_TabIndentedLineDoesNotPanic as a regression guard.
  • MEDIUM (asymmetric LCS gaps): Old- and new-side equal regions are now handled with independent guards and each side uses its own tokens. Added TestDiffWords_SegmentsReconstructInputs to ensure concat(segs) == input for several edge cases.
  • LOW (lightness cap): Cap raised from 0.40 to 0.50 so the emphasis is never darker than its source.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/tui For features/issues/fixes related to the TUI kind/feat PR adds a new feature (maps to feat: commit prefix)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants