fix(run): add explicit UTF-8 encoding to prompt file operations (#604) #648
sergio-sisternes-epam wants to merge 1 commit into microsoft:main
…osoft#604) All three open() calls in PromptCompiler and ScriptRunner that handle .prompt.md files now specify encoding="utf-8", preventing UnicodeDecodeError on systems where the default locale is not UTF-8 (e.g., Windows CP950). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Fixes a Windows-specific UnicodeDecodeError in apm run start by forcing prompt compilation/read paths to use UTF-8, and adds regression tests + a changelog entry.
Changes:
- Add `encoding="utf-8"` to prompt read/write `open()` calls in the script runner prompt compilation flow.
- Add unit tests covering prompt compilation with non-ASCII content.
- Add an Unreleased changelog entry for the fix.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/apm_cli/core/script_runner.py` | Forces UTF-8 for reading source prompts, writing compiled output, and reading compiled prompts to avoid Windows locale decode errors. |
| `tests/unit/test_script_runner.py` | Adds regression tests for compiling prompts containing non-ASCII characters. |
| `CHANGELOG.md` | Records the fix under `## [Unreleased]` -> `### Fixed`. |
    cjk_content = (
        "---\n"
        "description: 国際化テスト\n"
        "---\n"
        "\n"
        "你好世界!こんにちは ${input:name}!\n"
        "Ünïcödé résumé naïve café"
    )
This test introduces non-ASCII characters directly in a Python source file (CJK and accented Latin in string literals). Repo encoding rules require all source files to remain printable ASCII; please represent these characters via Unicode escape sequences (e.g., \uXXXX/\UXXXXXXXX) while still writing UTF-8 bytes to the prompt file, so the test coverage remains but the source stays ASCII-only.
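A minimal sketch of what the requested change could look like, assuming a standalone snippet rather than the PR's actual test (the constant names and the file name `escaped.prompt.md` are hypothetical). The source stays printable ASCII, while the bytes written to disk are still multi-byte UTF-8:

```python
import tempfile
from pathlib import Path

# ASCII-only source: every non-ASCII character is spelled as an escape.
CJK_GREETING = "\u4f60\u597d\u4e16\u754c"  # four CJK characters
ACCENTED = "caf\u00e9"                      # ends in U+00E9 (e with acute)
ROCKET = "\U0001F680"                       # emoji outside the BMP

content = CJK_GREETING + " ${input:name} " + ACCENTED + " " + ROCKET + "\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this illustration.
    prompt_path = Path(tmp_dir) / "escaped.prompt.md"
    prompt_path.write_text(content, encoding="utf-8")

    # The file holds real UTF-8 bytes even though the source is ASCII.
    raw_bytes = prompt_path.read_bytes()
    round_tripped = prompt_path.read_text(encoding="utf-8")
```

The escapes are resolved when Python parses the source, so the string written to the file is identical to one built from literal characters.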
    prompt_path.write_text(
        "Привет ${input:who}! 🚀", encoding="utf-8"
    )
This test includes Cyrillic text and an emoji in the Python source string literal. The repo encoding rules require Python source files to be printable ASCII only; please rewrite the literal using Unicode escape sequences (e.g., \u041f\u0440... and \U0001F680) to keep the source ASCII while still exercising UTF-8 file I/O.
    tmp_dir = tempfile.mkdtemp()
    try:
        prompt_dir = Path(tmp_dir)
These tests manually manage a temp directory via tempfile.mkdtemp()/shutil.rmtree(). The rest of this file already uses tempfile.TemporaryDirectory() context managers; switching these new tests to TemporaryDirectory() would reduce cleanup risk (e.g., if rmtree fails on Windows due to open handles) and match existing conventions.
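As a rough illustration of the suggested pattern (the helper `write_and_read_prompt` and the file name are hypothetical, not taken from the repository), a `TemporaryDirectory()` context manager removes the directory on exit from the `with` block, so no explicit `shutil.rmtree()` call is needed:

```python
import os
import tempfile
from pathlib import Path


def write_and_read_prompt(text: str) -> tuple[str, str]:
    """Round-trip text through a temp prompt file; return (text, dir path)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Hypothetical file name, used only for this illustration.
        prompt_path = Path(tmp_dir) / "sample.prompt.md"
        prompt_path.write_text(text, encoding="utf-8")
        result = prompt_path.read_text(encoding="utf-8")
    # The directory is cleaned up here, whether or not the body raised.
    return result, tmp_dir


text, used_dir = write_and_read_prompt("Hello ${input:who}!")
```

Both file handles are closed before the block exits, which matters on Windows, where an open handle can prevent the directory from being deleted.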
    ### Fixed

    - Add explicit UTF-8 encoding to prompt file read/write operations to prevent `UnicodeDecodeError` on non-UTF-8 default locales (e.g., Windows CP950) (#604)
Closing this PR, as there are already two community contributions addressing #604.
Deferring to the existing community PRs. Thank you to both contributors!
## Summary

Fixes #604: `apm run start` crashes with `UnicodeDecodeError` when `.prompt.md` contains non-ASCII characters on Windows systems with non-UTF-8 locale encoding (CP950, CP936, CP932).

## Root Cause

Three `open()` calls in `script_runner.py` did not pass an explicit `encoding` parameter:
- `PromptCompiler.compile()`: reads the source `.prompt.md`
- `PromptCompiler.compile()`: writes the compiled output file
- `ScriptRunner._execute_script()`: reads the compiled file

Without `encoding="utf-8"`, Python uses the platform default (CP950 on affected systems), which cannot decode UTF-8 multi-byte sequences.

## Fix

Added `encoding="utf-8"` to all three `open()` calls. A full audit of the codebase confirmed no other prompt-related `open()` calls are affected.

## Changes

- `src/apm_cli/core/script_runner.py`: Add `encoding="utf-8"` to 3 `open()` calls
- `tests/unit/test_script_runner.py`: Add 2 tests for CJK and Cyrillic content in prompt compilation
- `CHANGELOG.md`: Add fix entry

## Testing

All 3792 tests pass. New tests verify:
- CJK characters with frontmatter and parameter substitution
- Cyrillic + emoji content without frontmatter
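The root cause can be reproduced on any platform with a small sketch. This is not the code from `script_runner.py`. The helper `read_prompt` and the file name are hypothetical, and the CP950 locale is simulated by passing the codec name explicitly instead of relying on the platform default:

```python
import tempfile
from pathlib import Path

# Hypothetical prompt text; escapes keep this snippet ASCII-only.
PROMPT_TEXT = "\u4f60\u597d\u4e16\u754c ${input:name}"


def read_prompt(path: Path, encoding: str) -> str:
    """Read a prompt file with an explicitly named codec."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    prompt_path = Path(tmp_dir) / "hello.prompt.md"
    prompt_path.write_bytes(PROMPT_TEXT.encode("utf-8"))

    # Explicit UTF-8 round-trips regardless of the platform locale.
    utf8_result = read_prompt(prompt_path, "utf-8")

    # Stand-in for open() without encoding= on a CP950 system.
    try:
        cp950_result = read_prompt(prompt_path, "cp950")
        cp950_failed = False
    except UnicodeDecodeError:
        cp950_result = None
        cp950_failed = True
```

Reading the UTF-8 bytes through a legacy codec either raises `UnicodeDecodeError` or returns text that no longer matches what was written. Only the explicit `encoding="utf-8"` read is guaranteed to return the original string.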