Skip to content

translations: add "make prune-translations" to remove stale translations#22392

Merged
medyagh merged 9 commits intokubernetes:masterfrom
developowl:feat/make_prune_translations
Jan 15, 2026
Merged

translations: add "make prune-translations" to remove stale translations#22392
medyagh merged 9 commits intokubernetes:masterfrom
developowl:feat/make_prune_translations

Conversation

@developowl
Copy link
Copy Markdown
Member

@developowl developowl commented Jan 7, 2026

fixes #21468

Problem(Issue)

Each language.json contained duplicate and orphan keys (keys that were deleted from strings.txt but still persisted in the JSON files).

Proposed Solution

  • Validation: Compare strings.txt with each language JSON file to identify duplicates and orphan keys.
  • Data Flow Audit: Verified the synchronization process from strings.txt to individual translation files.
  • Data Integrity: Ensured that existing, valid string: translation pairs are preserved during the cleanup process.
  • Strict Matching: Only 100% exact duplicates are targeted for removal to avoid false positives from minor differences (e.g., punctuation or casing).
  • Implementation:
    • Developed the pruning logic under the cmd/ directory.
    • Added a new Makefile target for easy execution.
  • Verification: Completed local testing to ensure the logic works as expected without regressions.

Implementation Details

Analyzing the current system architecture.

  • extract.go : extract logic file
  • writeStringsToFiles: strings to file function

🚨 Root Cause

// Remove translations from the file that are empty and were not extracted
for k, v := range currentTranslations {
			if _, ok := e.translations[k]; !ok && len(v.(string)) == 0 {
				delete(currentTranslations, k)
			}
		}
  • A key is only deleted when both of the following conditions are met:
  1. !ok : The key is missing from e.translations (it is no longer used in the current source code).
  2. len == 0 : The translation value is an empty string.

Data Retention Strategy

To ensure that we don't lose any existing translations, I've implemented a strict pruning logic. As shown in the table below, even if a string is no longer detected in the source code (!ok), it is preserved if it already has a translated value. Only unused strings with empty values are removed.

Screenshot 2026-01-07 at 19 03 22

Solution 1) Easy version (I don't choose.)

// Remove translations from the file that are empty and were not extracted
		
		for k, v := range currentTranslations {
			if _, ok := e.translations[k]; !ok {
				delete(currentTranslations, k)
			}
		}
  • Immediate deletion if the key is missing from e.translations.
  • Disadvantage: Requires caution when running extract (potential safety/data loss issues).
  • A single command performs too many roles (violates the Single Responsibility Principle).

Solution 2) create prune-translations & update-translations also, removed the deletion logic from the extract

extract: 
prune-translations:   
update-translations:  
    $(MAKE) extract
    $(MAKE) prune-translations

Adhering to the Single Responsibility Principle (SRP):

  • extract: Handles only string extraction from the source code.
  • prune-translations: Handles only the cleanup (pruning) of obsolete or orphan keys.
  • update-translations: A wrapper command that performs the entire workflow (extraction and pruning) at once.

prune-translations Job

  1. Extract keys from strings.txt.
  2. Compare the extracted keys with the keys in each locale-specific JSON file.
  3. Remove duplicate or obsolete (deleted) keys from each JSON file.
  4. Compare the final key counts of each JSON file with strings.txt after the cleanup.

Add prune-translations, update-translations target into Makefile

.PHONY: prune-translations
prune-translations: ## remove stale translations that no longer exist in strings.txt
	go run cmd/prune-translations/prune-translations.go

.PHONY: update-translations
update-translations: extract prune-translations ## extract strings and remove stale translations

Test!

before test(ko json)
  • Before test
  • Example: ko.json
  • Duplicate key: L168 & L170

Test Case 1) extract -> prune-translations

extract and prune

Test Case 2) Only update-translations

update

After Test

after test(ko json)
  • Example: ko.json
  • Duplicate key is not availabler.

To maintainer.

@medyagh
My apologies for the delay due to personal reasons. I have remained committed to this issue and have finally completed the task. While I considered Solution 1) as a simpler alternative, I felt it assigned too many responsibilities to the extract command alone. I would appreciate it if you could positively review the prune-translations and update-translations commands I have implemented. Thank you for your patience.🥝

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: developowl
Once this PR has been reviewed and has the lgtm label, please assign comradeprogrammer for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from nirs and prezha January 7, 2026 10:25
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 7, 2026
@minikube-bot
Copy link
Copy Markdown
Collaborator

Can one of the admins verify this patch?

Copy link
Copy Markdown
Member

@medyagh medyagh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@developowl thank you for this PR

plz fix the lint and gomft
Please run 'make fmt' and commit the changes

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new prune-translations command to remove stale translation keys from language JSON files, implementing the Single Responsibility Principle by separating cleanup logic from the extraction process.

Key Changes:

  • Removed automatic cleanup logic from the extract command that previously deleted empty, unused translation keys
  • Created new prune-translations command that removes all keys from translation JSON files that are not present in strings.txt
  • Added Makefile targets for prune-translations (standalone) and update-translations (combined extract + prune workflow)

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
pkg/minikube/extract/extract.go Removed the automatic cleanup logic that deleted empty, unused translations during extraction
cmd/prune-translations/prune-translations.go New command that reads valid keys from strings.txt and removes stale keys from all translation JSON files
Makefile Added prune-translations target to run the pruning tool and update-translations target to execute both extract and prune operations

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cmd/prune-translations/prune-translations.go Outdated
Comment thread cmd/prune-translations/prune-translations.go Outdated
Comment thread cmd/prune-translations/prune-translations.go Outdated
Comment thread cmd/prune-translations/prune-translations.go Outdated
Comment thread cmd/prune-translations/prune-translations.go Outdated
Comment thread cmd/prune-translations/prune-translations.go Outdated
Comment thread cmd/prune-translations/prune-translations.go Outdated
Comment thread cmd/prune-translations/prune-translations.go Outdated
Copy link
Copy Markdown
Member

@medyagh medyagh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@developowl I think this would be cool to be added to our

"make generate-docs"

so when this job runs

make generate-docs

it also includes translations

@developowl
Copy link
Copy Markdown
Member Author

developowl commented Jan 8, 2026

@medyagh Thank you for the feedback!🥝 I've reflected all your reviews.
Please check this commit, which changes generate-docs dependency from extract to update-translations. So it now include both extracting strings(extract) and pruning(prune-translations)
Thank you :)

// This tool removes stale translations (keys not in strings.txt) from language JSON files.
// Usage: go run cmd/prune-translations/prune-translations.go (from minikube root)

package main
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about add a unit test for this , and you can make dummy test data and expect it to to do the right thing, and make sure we dont ever delete real translations by mistake

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@medyagh
I wrote a unit test. I added dummy data to each unit test function. Is this correct? Please review my unit test. :)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// Read valid keys from strings.txt
data, err := os.ReadFile("translations/strings.txt")
if err != nil {
panic("Run from minikube root directory. strings.txt not found.")
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error_message "Run from minikube root directory. strings.txt not found." is unclear about the actual problem. If the file exists but has read permissions issues, this message would be misleading. Consider using a message that indicates the actual error or providing the original error details.

Suggested change
panic("Run from minikube root directory. strings.txt not found.")
panic(fmt.Errorf("failed to read translations/strings.txt: %w", err))

Copilot uses AI. Check for mistakes.
- Extract PruneTranslations() function with configurable directory path
- Add PruneResult struct to return detailed results
- Extract pruneFile() helper function
- Improve error handling with wrapped errors

This refactoring enables unit testing without touching real translation files.
Add comprehensive unit tests using temporary directories to ensure
we never accidentally delete real translations:

- TestPruneTranslations_RemovesStaleKeys: verify stale keys are removed
- TestPruneTranslations_PreservesValidKeys: verify valid keys are kept
- TestPruneTranslations_MultipleFiles: test multiple file handling
- TestPruneTranslations_EmptyTranslationFile: edge case handling
- TestPruneTranslations_MissingStringsFile: error handling
- TestPruneTranslations_InvalidStringsJSON: error handling
- TestPruneTranslations_DoesNotTouchRealTranslations: safety check
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 11, 2026
Copy link
Copy Markdown
Member Author

@developowl developowl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a unit test. I added dummy data to each unit test function. Is this correct? Please review my unit test. :)

@developowl
Copy link
Copy Markdown
Member Author

@medyagh
More information of unit test. 🥝

Test cases:

  • TestPruneTranslations_RemovesStaleKeys - Verifies stale keys are correctly removed
  • TestPruneTranslations_PreservesValidKeys - Ensures valid keys are never deleted
  • TestPruneTranslations_MultipleFiles - Tests processing of multiple translation files
  • TestPruneTranslations_EmptyTranslationFile - Handles empty JSON files
  • TestPruneTranslations_MissingStringsFile - Error handling for missing strings.txt
  • TestPruneTranslations_InvalidStringsJSON - Error handling for invalid JSON
  • TestPruneTranslations_DoesNotTouchRealTranslations - Safety check to ensure tests use isolated temp directories

All tests use t.TempDir() with dummy test data, ensuring real translation files are never modified.

Run tests:

go test -v ./cmd/prune-translations/...

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

func main() {
results, err := PruneTranslations("translations")
if err != nil {
panic(err)
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using panic for error handling in main() is not idiomatic Go. Consider using log.Fatalf or os.Exit instead to provide better error messages and stack traces. For example: log.Fatalf(\"Failed to prune translations: %v\", err)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

Comment thread cmd/prune-translations/prune-translations.go Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

k8s-ci-robot commented Jan 15, 2026

@developowl: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
integration-docker-containerd-linux-x86 4235db3 link true /test integration-docker-containerd-linux-x86
integration-kvm-crio-linux-x86 4235db3 link false /test integration-kvm-crio-linux-x86
integration-kvm-containerd-linux-x86 4235db3 link true /test integration-kvm-containerd-linux-x86
integration-vfkit-docker-macos-arm 25ed770 link false /test integration-vfkit-docker-macos-arm

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@medyagh medyagh changed the title Feat: make prune-translations translations: add "make prune-translations" to remove stale translations Jan 15, 2026
@medyagh
Copy link
Copy Markdown
Member

medyagh commented Jan 15, 2026

thank you @developowl

@medyagh medyagh merged commit de35686 into kubernetes:master Jan 15, 2026
26 of 40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

some duplicate contexts are exist(/translations)

5 participants