Skip to content

fix(run): add explicit UTF-8 encoding to prompt file operations (#604)#648

Closed
sergio-sisternes-epam wants to merge 1 commit intomicrosoft:mainfrom
sergio-sisternes-epam:fix/604-prompt-encoding
Closed

fix(run): add explicit UTF-8 encoding to prompt file operations (#604)#648
sergio-sisternes-epam wants to merge 1 commit intomicrosoft:mainfrom
sergio-sisternes-epam:fix/604-prompt-encoding

Conversation

@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator

Summary\n\nFixes #604apm run start crashes with UnicodeDecodeError when .prompt.md contains non-ASCII characters on Windows systems with non-UTF-8 locale encoding (CP950, CP936, CP932).\n\n## Root Cause\n\nThree open() calls in script_runner.py did not pass an explicit encoding parameter:\n- PromptCompiler.compile() — reads the source .prompt.md\n- PromptCompiler.compile() — writes the compiled output file\n- ScriptRunner._execute_script() — reads the compiled file\n\nWithout encoding=\"utf-8\", Python uses the platform default (CP950 on affected systems), which cannot decode UTF-8 multi-byte sequences.\n\n## Fix\n\nAdded encoding=\"utf-8\" to all three open() calls. Full audit of the codebase confirmed no other prompt-related open() calls are affected.\n\n## Changes\n\n- src/apm_cli/core/script_runner.py — Add encoding=\"utf-8\" to 3 open() calls\n- tests/unit/test_script_runner.py — Add 2 tests for CJK and Cyrillic content in prompt compilation\n- CHANGELOG.md — Add fix entry\n\n## Testing\n\nAll 3792 tests pass. New tests verify:\n- CJK characters with frontmatter and parameter substitution\n- Cyrillic + emoji content without frontmatter\n

…osoft#604)

All three open() calls in PromptCompiler and ScriptRunner that handle
.prompt.md files now specify encoding="utf-8", preventing
UnicodeDecodeError on systems where the default locale is not UTF-8
(e.g., Windows CP950).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 9, 2026 17:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a Windows-specific UnicodeDecodeError in apm run start by forcing prompt compilation/read paths to use UTF-8, and adds regression tests + a changelog entry.

Changes:

  • Add encoding="utf-8" to prompt read/write open() calls in the script runner prompt compilation flow.
  • Add unit tests covering prompt compilation with non-ASCII content.
  • Add an Unreleased changelog entry for the fix.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
src/apm_cli/core/script_runner.py Forces UTF-8 for reading source prompts, writing compiled output, and reading compiled prompts to avoid Windows locale decode errors.
tests/unit/test_script_runner.py Adds regression tests for compiling prompts containing non-ASCII characters.
CHANGELOG.md Records the fix under ## [Unreleased] -> ### Fixed.

Comment on lines +338 to +345
cjk_content = (
"---\n"
"description: 国際化テスト\n"
"---\n"
"\n"
"你好世界!こんにちは ${input:name}!\n"
"Ünïcödé résumé naïve café"
)
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test introduces non-ASCII characters directly in a Python source file (CJK and accented Latin in string literals). Repo encoding rules require all source files to remain printable ASCII; please represent these characters via Unicode escape sequences (e.g., \uXXXX/\UXXXXXXXX) while still writing UTF-8 bytes to the prompt file, so the test coverage remains but the source stays ASCII-only.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +369 to +371
prompt_path.write_text(
"Привет ${input:who}! 🚀", encoding="utf-8"
)
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test includes Cyrillic text and an emoji in the Python source string literal. The repo encoding rules require Python source files to be printable ASCII only; please rewrite the literal using Unicode escape sequences (e.g., \u041f\u0440... and \U0001F680) to keep the source ASCII while still exercising UTF-8 file I/O.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +334 to +336
tmp_dir = tempfile.mkdtemp()
try:
prompt_dir = Path(tmp_dir)
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests manually manage a temp directory via tempfile.mkdtemp()/shutil.rmtree(). The rest of this file already uses tempfile.TemporaryDirectory() context managers; switching these new tests to TemporaryDirectory() would reduce cleanup risk (e.g., if rmtree fails on Windows due to open handles) and match existing conventions.

Copilot uses AI. Check for mistakes.

### Fixed

- Add explicit UTF-8 encoding to prompt file read/write operations to prevent `UnicodeDecodeError` on non-UTF-8 default locales (e.g., Windows CP950) (#604)
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changelog entries are expected to end with the PR number (e.g., '(#648)') rather than the linked issue number. Please update this line to reference the PR ID, and optionally mention the issue in the text (e.g., 'Fixes #604') if desired.

Copilot generated this review using guidance from repository custom instructions.
@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator Author

Closing this PR — there are already two community contributions addressing #604:

Deferring to the existing community PRs. Thank you to both contributors!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants