[BUG] write tool produces LF-only .bat/.cmd files on Windows and doesn't handle non-UTF-8 code pages #31276

@LifetimeVip

Description

On Windows, the write tool produces .bat / .cmd files that fail in two ways:

  1. Line endings: LF-only (\n) instead of CRLF (\r\n). cmd.exe expects CRLF; an LF-only script exits immediately with no error.

  2. Code page: The file contains UTF-8 encoded non-ASCII text, but cmd.exe interprets the bytes using the system's active code page (e.g., 936 GBK for zh-CN, 932 Shift-JIS for ja-JP, 949 for ko-KR, 1251 for ru-RU, 1252 for Western European). Result: non-ASCII characters display as garbage.

The line-ending issue affects every Windows user. The code page issue affects any Windows system whose active code page is not UTF-8 (65001), regardless of language.

Root Cause

Line endings

write.ts passes AI-generated content directly to fs.writeWithDirs() → fs.writeFileString() with no line-ending normalization. The AI model generates LF (\n) by default.

edit.ts already handles this correctly at packages/opencode/src/tool/edit.ts:22-33:

function normalizeLineEndings(text: string): string {
  return text.replaceAll("\r\n", "\n")
}
function detectLineEnding(text: string): "\n" | "\r\n" {
  return text.includes("\r\n") ? "\r\n" : "\n"
}
function convertToLineEnding(text: string, ending: "\n" | "\r\n"): string {
  if (ending === "\n") return text
  return text.replaceAll("\n", "\r\n")
}

But these are local to edit.ts and are not used in write.ts.
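
Reusing only convertToLineEnding in write.ts would not be enough, because model output can mix LF and CRLF. The sketch below shows why normalization has to come first. The two helpers are re-declared (with regex replace rather than replaceAll) only so the snippet runs on its own, and the sample content is made up for illustration.

```typescript
type LineEnding = "\n" | "\r\n"

// Re-declared from the edit.ts excerpt so this snippet is self-contained.
function normalizeLineEndings(text: string): string {
  return text.replace(/\r\n/g, "\n")
}
function convertToLineEnding(text: string, ending: LineEnding): string {
  if (ending === "\n") return text
  return text.replace(/\n/g, "\r\n")
}

// Illustrative content with mixed endings: first line CRLF, second line LF.
const mixed = "@echo off\r\necho hi\n"

// Converting without normalizing doubles the carriage return on the CRLF line.
const naive = convertToLineEnding(mixed, "\r\n")
// Normalizing first yields clean CRLF on every line.
const clean = convertToLineEnding(normalizeLineEndings(mixed), "\r\n")

console.log(JSON.stringify(naive)) // "@echo off\r\r\necho hi\r\n"
console.log(JSON.stringify(clean)) // "@echo off\r\necho hi\r\n"
```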

Encoding

write.ts (line 47) reads existing files with TextDecoder("utf-8", { ignoreBOM: true }). writeFileString() on Node.js writes UTF-8. This is correct — the write tool always handles UTF-8 properly.

The problem is that cmd.exe on Windows defaults to the active code page (e.g., code page 936 for Chinese, 932 for Japanese). A .bat file written as UTF-8 will have its non-ASCII bytes misinterpreted. The fix is to instruct cmd.exe to switch to UTF-8 with chcp 65001 >nul as the second line, but the write tool has no mechanism to ensure this.

Global scope: this is NOT a Chinese-specific issue. Every Windows system outside of English/Western European locales uses a non-UTF-8 code page by default. On Japanese Windows (932), Korean Windows (949), Russian Windows (1251), etc., UTF-8 .bat files without chcp 65001 will all show garbled non-ASCII text.
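
To make the misinterpretation concrete, here is a minimal sketch that encodes a string as UTF-8 and then reads each byte back as a single character. It uses Latin-1 as a stand-in for a single-byte code page because that needs no lookup table; cmd.exe would use whatever code page is active (936, 932, and so on), so the exact garbage differs, but the mechanism is the same. The function name is hypothetical.

```typescript
// Hypothetical helper: encode as UTF-8, then reinterpret every byte as one
// Latin-1 character. For bytes in the 0xA0-0xFF range this matches what
// code page 1252 would display.
function misreadUtf8AsLatin1(text: string): string {
  const bytes = new TextEncoder().encode(text)
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("")
}

// "é" is the two UTF-8 bytes 0xC3 0xA9, which Latin-1 shows as "Ã" and "©".
const shown = misreadUtf8AsLatin1("café")
console.log(shown) // "cafÃ©"

// Pure ASCII survives unchanged, which is why ASCII-only scripts are unaffected.
const asciiShown = misreadUtf8AsLatin1("echo hello")
console.log(asciiShown) // "echo hello"
```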

Proposed Fix

Fix 1: Line endings

Move normalizeLineEndings, detectLineEnding, and convertToLineEnding from edit.ts into a shared utility (e.g., packages/opencode/src/util/line-endings.ts).

In write.ts, between reading the existing file (line 47) and writing (line 64), add:

// Preserve existing line endings; for new .bat/.cmd on Windows, use CRLF
if (exists) {
  const ending = detectLineEnding(contentOld)
  contentNew = convertToLineEnding(normalizeLineEndings(contentNew), ending)
} else if (process.platform === "win32" && /\.(bat|cmd)$/i.test(filepath)) {
  contentNew = convertToLineEnding(normalizeLineEndings(contentNew), "\r\n")
}

Logic:

  • Existing files: detect and preserve the original file's line ending style. If the file uses CRLF, the new content will use CRLF too. This matches edit.ts behavior.
  • New .bat/.cmd files on Windows: default to CRLF (Windows batch files require CRLF to avoid crashes).
  • All other new files: no transformation (keep LF, which is the cross-platform git standard).
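
The three rules above can be sketched as a pure function, which also makes them easy to unit test. This is only an illustration: applyLineEndingPolicy is a hypothetical name, the platform is passed in as a parameter instead of reading process.platform, and the helpers are re-declared so the snippet stands alone.

```typescript
type LineEnding = "\n" | "\r\n"

const normalizeLineEndings = (text: string): string => text.replace(/\r\n/g, "\n")
const detectLineEnding = (text: string): LineEnding =>
  text.includes("\r\n") ? "\r\n" : "\n"
const convertToLineEnding = (text: string, ending: LineEnding): string =>
  ending === "\n" ? text : text.replace(/\n/g, "\r\n")

// Hypothetical helper (not in the codebase): the Fix 1 decision as a pure function.
// contentOld is undefined when the target file does not exist yet.
function applyLineEndingPolicy(
  contentNew: string,
  contentOld: string | undefined,
  filepath: string,
  platform: string,
): string {
  if (contentOld !== undefined) {
    // Existing file: keep whatever line-ending style it already uses.
    return convertToLineEnding(normalizeLineEndings(contentNew), detectLineEnding(contentOld))
  }
  if (platform === "win32" && /\.(bat|cmd)$/i.test(filepath)) {
    // New batch file on Windows: force CRLF.
    return convertToLineEnding(normalizeLineEndings(contentNew), "\r\n")
  }
  // Any other new file: leave the content untouched.
  return contentNew
}

// New .bat on Windows gets CRLF (the extension match is case-insensitive).
const newBat = applyLineEndingPolicy("echo hi\n", undefined, "run.BAT", "win32")
// An existing CRLF file keeps CRLF, on any platform.
const keepCrlf = applyLineEndingPolicy("a\nb", "x\r\ny", "notes.txt", "linux")
// A new .bat on a non-Windows platform is left as LF.
const unixBat = applyLineEndingPolicy("a\nb", undefined, "run.bat", "linux")
```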

Fix 2: Code page for .bat/.cmd on Windows

When writing .bat/.cmd files on Windows that contain non-ASCII characters, automatically insert chcp 65001 >nul as the second line — unless the file already has a chcp command or the user explicitly opted out.

Implementation in write.ts:

function ensureChcpUtf8(content: string): string {
  // The caller is expected to invoke this only for .bat/.cmd files on Windows.
  // Nothing to do when the content is pure ASCII.
  const hasNonAscii = /[\x80-\uFFFF]/.test(content)
  if (!hasNonAscii) return content

  const lines = content.split(/\r?\n/)
  // Don't inject if any line already starts with a chcp command (with or without a leading @)
  if (lines.some(l => /^\s*@?chcp\s+\d+/i.test(l))) return content

  // If the first line is @echo off/on, insert chcp as line 2.
  // Otherwise insert it at the beginning (it becomes line 1).
  if (lines.length > 0 && /^@echo\s+(off|on)/i.test(lines[0].trim())) {
    lines.splice(1, 0, "chcp 65001 >nul")
  } else {
    lines.unshift("chcp 65001 >nul")
  }
  return lines.join("\n")  // the caller will handle CRLF via Fix 1
}

Then, for .bat/.cmd files on Windows, call contentNew = ensureChcpUtf8(contentNew) before the line-ending step from Fix 1 and before writing.

Why this is safe:

  • Only fires for .bat/.cmd on Windows with non-ASCII text
  • Skips if ANY chcp is already present (no double injection)
  • Insertion respects @echo off positioning (goes on line 2, not before it)
  • Line ending normalization (Fix 1) runs after this, so CRLF is still applied

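Caveat: the check only looks for a line where chcp is the command at the start of the line. A chcp inside a rem comment does not suppress injection, but a chcp that selects a non-UTF-8 code page, or one in a branch that never runs, does. These are minor edge cases that can be refined.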
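
As a rough illustration of the ordering (chcp injection first, CRLF conversion second), here is a sketch with a condensed re-statement of the injection logic. prepareBatchContent is a hypothetical wrapper, the chcp detection also accepts a leading @ (an assumption of this sketch), and the sample scripts are invented.

```typescript
// Condensed re-statement of the injection logic, so the snippet runs on its own.
// Assumption of this sketch: an existing "@chcp ..." line also counts as present.
function ensureChcpUtf8(content: string): string {
  if (!/[\x80-\uFFFF]/.test(content)) return content
  const lines = content.split(/\r?\n/)
  if (lines.some((l) => /^\s*@?chcp\s+\d+/i.test(l))) return content
  const afterEcho = /^@echo\s+(off|on)/i.test((lines[0] ?? "").trim())
  // Insert as line 2 after "@echo off/on", otherwise as line 1.
  lines.splice(afterEcho ? 1 : 0, 0, "chcp 65001 >nul")
  return lines.join("\n")
}

// Hypothetical wrapper: Fix 2 first, then the CRLF step from Fix 1.
function prepareBatchContent(content: string): string {
  const withChcp = ensureChcpUtf8(content)
  return withChcp.replace(/\r\n/g, "\n").replace(/\n/g, "\r\n")
}

// With "@echo off": chcp lands on line 2.
const withEcho = prepareBatchContent("@echo off\necho 你好\n")
// Without "@echo off": chcp becomes line 1.
const bare = prepareBatchContent("echo こんにちは\n")
// Already has chcp: no second injection.
const already = prepareBatchContent("@echo off\nchcp 65001 >nul\necho привет\n")
// Pure ASCII: no injection, only CRLF conversion.
const ascii = prepareBatchContent("@echo off\necho hello\n")

console.log(JSON.stringify(withEcho)) // "@echo off\r\nchcp 65001 >nul\r\necho 你好\r\n"
```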

Why not just use UTF-8 BOM?

Adding a UTF-8 BOM (byte order mark) to .bat files would also tell cmd.exe to interpret them as UTF-8 on recent Windows 10/11. However:

  • On older Windows, a BOM breaks the first line: cmd.exe reads the BOM bytes as part of the @echo off command, so that command is not recognized and does not take effect
  • BOM before @echo off violates the well-known Windows batch file convention
  • BOM is an invisible character that confuses users and tools

So chcp 65001 >nul is the safer, more compatible approach.

Testing

  1. On any non-English Windows (zh-CN, ja-JP, ko-KR, ru-RU, etc.):
    • Write a .bat file with non-ASCII characters → should run without crash, Chinese/Japanese/etc. should display correctly
  2. On English Windows:
    • Same test → should work (chcp 65001 is a no-op on UTF-8 codepage systems but harmless)
  3. On Unix:
    • No behavior change (process.platform !== "win32" guards all new logic)
  4. Editing an existing CRLF file via write:
    • CRLF should be preserved (same as edit.ts behavior)
  5. Editing an existing LF file via write:
    • LF should be preserved (no unwanted conversion)
