Conversation
📝 WalkthroughWalkthroughThe PR adds UTF-8 BOM detection/stripping utilities and applies them across changelog workflows. New Utf8TextNormalization and a Changelog-facing forwarder expose BOM constants, StripLeadingUtf8Bom and HasUtf8Bom. Services that read or write changelog YAML (serialization, bundling, file creation, artifact preparation, GitHub release, and docs-builder command) now strip any leading U+FEFF and write UTF-8 without emitting a BOM. Tests were added to validate the normalization utilities and that generated/processed YAML files do not contain a BOM. Sequence Diagram(s)sequenceDiagram
participant Producer as Docs-builder / Generator
participant Service as Changelog Service
participant Normalizer as Utf8Normalization Utility
participant FS as File System
Producer->>Service: provide YAML content (may include BOM)
Service->>Normalizer: StripLeadingUtf8Bom(text)
Normalizer-->>Service: normalized text (no leading U+FEFF)
Service->>FS: Write file with UTF8Encoding(emitBOM:false)
FS-->>Service: file persisted (no BOM bytes)
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches✨ Simplify code
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/Elastic.Documentation.Configuration/ReleaseNotes/ReleaseNotesSerialization.cs (1)
378-383: 💤 Low valuePrivate helper duplicates the shared utility — consider consolidating.
ReleaseNotesSerialization.StripLeadingUtf8BomChar(private) is a local copy ofChangelogUtf8Normalization.StripLeadingUtf8BomChar. BecauseElastic.Documentation.Configurationcannot referenceElastic.Changelog.Utilities, this duplication is understandable, but it means the two implementations could diverge.If the BOM-normalization utility ever needs to handle edge cases (e.g., multiple leading BOMs, or ZERO-WIDTH NO-BREAK SPACE variants), the private copy will need a separate update. Consider moving
ChangelogUtf8Normalization(or a minimalStringUtilitiesbase) toElastic.Documentation.Configurationso it can be shared from a lower-level assembly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Elastic.Documentation.Configuration/ReleaseNotes/ReleaseNotesSerialization.cs` around lines 378 - 383, ReleaseNotesSerialization.StripLeadingUtf8BomChar duplicates ChangelogUtf8Normalization.StripLeadingUtf8BomChar causing maintenance drift; consolidate by moving the BOM/string normalization helper into a lower-level/shared assembly consumed by both projects (either relocate ChangelogUtf8Normalization or create a minimal StringUtilities in Elastic.Documentation.Configuration) and update ReleaseNotesSerialization to call the shared method (keep the same behavior but ensure edge cases like multiple BOMs or related zero-width variants are handled in the single shared implementation).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In
`@src/Elastic.Documentation.Configuration/ReleaseNotes/ReleaseNotesSerialization.cs`:
- Around line 378-383: ReleaseNotesSerialization.StripLeadingUtf8BomChar
duplicates ChangelogUtf8Normalization.StripLeadingUtf8BomChar causing
maintenance drift; consolidate by moving the BOM/string normalization helper
into a lower-level/shared assembly consumed by both projects (either relocate
ChangelogUtf8Normalization or create a minimal StringUtilities in
Elastic.Documentation.Configuration) and update ReleaseNotesSerialization to
call the shared method (keep the same behavior but ensure edge cases like
multiple BOMs or related zero-width variants are handled in the single shared
implementation).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2edaba19-33b8-49bc-b017-273766006b75
📒 Files selected for processing (12)
src/Elastic.Documentation.Configuration/ReleaseNotes/ReleaseNotesSerialization.cssrc/services/Elastic.Changelog/Bundling/ChangelogBundleAmendService.cssrc/services/Elastic.Changelog/Bundling/ChangelogBundlingService.cssrc/services/Elastic.Changelog/Creation/ChangelogFileWriter.cssrc/services/Elastic.Changelog/Evaluation/ChangelogPrepareArtifactService.cssrc/services/Elastic.Changelog/GithubRelease/GitHubReleaseChangelogService.cssrc/services/Elastic.Changelog/Utilities/ChangelogUtf8Normalization.cssrc/tooling/docs-builder/Commands/ChangelogCommand.cstests/Elastic.Changelog.Tests/Changelogs/BundleChangelogsTests.cstests/Elastic.Changelog.Tests/Creation/ChangelogCreationServiceTests.cstests/Elastic.Changelog.Tests/Evaluation/ChangelogPrepareArtifactServiceTests.cstests/Elastic.Changelog.Tests/Utilities/ChangelogUtf8NormalizationTests.cs
| await _fileSystem.File.WriteAllTextAsync(amendFilePath, yaml, Encoding.UTF8, ctx); | ||
| // Strip any leading BOM to ensure clean UTF-8 output for tooling compatibility | ||
| var normalizedYaml = ChangelogUtf8Normalization.StripLeadingUtf8BomChar(yaml); | ||
| await _fileSystem.File.WriteAllBytesAsync(amendFilePath, Encoding.UTF8.GetBytes(normalizedYaml), ctx); |
There was a problem hiding this comment.
I believe we can simply do WriteAllText with new UTF8Encoding(false) instead of Encoding.UTF8.
That way we allocate less strings, see also: https://danielwertheim.se/utf-8-bom-adventures-in-c/
There was a problem hiding this comment.
460f419 is my attempt to address this comment as well as #3243 (review)
The AI summary for that change is as follows:
Summary of Changes in 460f419
This commit addresses PR review feedback by consolidating BOM helper duplication and optimizing UTF-8 write style:
🔄 Consolidated BOM/String Normalization
- Created shared utility: New
Utf8TextNormalization.csinElastic.Documentation/Text/with enhanced BOM stripping that removes all consecutive leading U+FEFF characters (handles double-BOM scenarios) - Updated
ReleaseNotesSerialization.cs: Removed privateStripLeadingUtf8BomCharmethod, now uses sharedUtf8TextNormalization.StripLeadingUtf8Bom - Converted
ChangelogUtf8Normalization.cs: Now serves as a thin forwarder to the shared implementation, maintaining backward compatibility
⚡ Optimized UTF-8 File Writes
Replaced WriteAllBytesAsync(path, Encoding.UTF8.GetBytes(content)) with WriteAllTextAsync(path, content, Utf8NoBom) in all changelog writers:
ChangelogFileWriter.csChangelogBundlingService.csChangelogBundleAmendService.csGitHubReleaseChangelogService.csChangelogPrepareArtifactService.cs
Each now uses static readonly UTF8Encoding Utf8NoBom = new(encoderShouldEmitUTF8Identifier: false) to avoid per-write byte array allocations.
✅ Enhanced Testing
- New test file:
Utf8TextNormalizationTests.cswith comprehensive shared utility tests - Enhanced existing tests: Added consecutive BOM test cases in
ChangelogUtf8NormalizationTests.cs - Edge case coverage: Tests for multiple BOMs, BOM-only strings, and BOMs in middle/end positions
The changes eliminate code duplication, improve memory efficiency, and provide more robust BOM handling across the codebase.
There was a problem hiding this comment.
🧹 Nitpick comments (4)
src/Elastic.Documentation/Text/Utf8TextNormalization.cs (1)
20-20: ⚡ Quick winMake
Utf8BomBytesimmutable to callers.
public static readonly byte[]is still externally mutable (Utf8TextNormalization.Utf8BomBytes[0] = ...). For a shared normalization constant, expose a read-only view instead.Proposed change
-public static readonly byte[] Utf8BomBytes = [0xEF, 0xBB, 0xBF]; +private static readonly byte[] Utf8BomBytesValue = [0xEF, 0xBB, 0xBF]; +public static ReadOnlySpan<byte> Utf8BomBytes => Utf8BomBytesValue;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Elastic.Documentation/Text/Utf8TextNormalization.cs` at line 20, Utf8BomBytes is currently a public mutable byte[]; make the array immutable to callers by keeping a private static readonly byte[] backing field (e.g. _utf8BomBytes with the same byte values 0xEF,0xBB,0xBF) and replace the public symbol Utf8BomBytes with a read-only view such as a public static ReadOnlySpan<byte> Utf8BomBytes => _utf8BomBytes (or public static ReadOnlyMemory<byte> if preferred); this prevents external mutation while preserving the constant bytes.src/services/Elastic.Changelog/GithubRelease/GitHubReleaseChangelogService.cs (1)
312-312: ⚡ Quick winAdd
ConfigureAwait(false)to the async file write.Line 312 introduces a new library async I/O await without
ConfigureAwait(false), which can cause unnecessary context capture.Suggested patch
- await _fileSystem.File.WriteAllTextAsync(filePath, normalizedContent, Utf8NoBom, ctx); + await _fileSystem.File.WriteAllTextAsync(filePath, normalizedContent, Utf8NoBom, ctx).ConfigureAwait(false);As per coding guidelines: "Use ConfigureAwait(false) in library code."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/Elastic.Changelog/GithubRelease/GitHubReleaseChangelogService.cs` at line 312, The await call to _fileSystem.File.WriteAllTextAsync(filePath, normalizedContent, Utf8NoBom, ctx) captures the synchronization context; update the await in GitHubReleaseChangelogService (method containing this call) to append ConfigureAwait(false) so the Task does not capture context (i.e., await _fileSystem.File.WriteAllTextAsync(...).ConfigureAwait(false)); keep the same arguments (filePath, normalizedContent, Utf8NoBom, ctx) and preserve existing error handling/flow.src/services/Elastic.Changelog/Bundling/ChangelogBundleAmendService.cs (1)
267-267: ⚡ Quick winUse
ConfigureAwait(false)on this new await.Line 267 is a library async write and should avoid context capture.
Suggested patch
- await _fileSystem.File.WriteAllTextAsync(amendFilePath, normalizedYaml, Utf8NoBom, ctx); + await _fileSystem.File.WriteAllTextAsync(amendFilePath, normalizedYaml, Utf8NoBom, ctx).ConfigureAwait(false);As per coding guidelines: "Use ConfigureAwait(false) in library code."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/Elastic.Changelog/Bundling/ChangelogBundleAmendService.cs` at line 267, The await in ChangelogBundleAmendService (the call to _fileSystem.File.WriteAllTextAsync(amendFilePath, normalizedYaml, Utf8NoBom, ctx)) is capturing the synchronization context; change the await to use ConfigureAwait(false) to avoid context capture in library code (i.e., await _fileSystem.File.WriteAllTextAsync(..., ctx).ConfigureAwait(false)) so the write operates without capturing the caller context.src/services/Elastic.Changelog/Creation/ChangelogFileWriter.cs (1)
56-56: ⚡ Quick winApply
ConfigureAwait(false)for this library async I/O call.Line 56 should use
ConfigureAwait(false)to follow library async guidance.Suggested patch
- await fileSystem.File.WriteAllTextAsync(filePath, normalizedContent, Utf8NoBom, ctx); + await fileSystem.File.WriteAllTextAsync(filePath, normalizedContent, Utf8NoBom, ctx).ConfigureAwait(false);As per coding guidelines: "Use ConfigureAwait(false) in library code."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/Elastic.Changelog/Creation/ChangelogFileWriter.cs` at line 56, In ChangelogFileWriter (the async method that calls fileSystem.File.WriteAllTextAsync), append ConfigureAwait(false) to the awaited I/O call: change the await fileSystem.File.WriteAllTextAsync(filePath, normalizedContent, Utf8NoBom, ctx) usage so it calls .ConfigureAwait(false) on the Task (keep the same parameters: filePath, normalizedContent, Utf8NoBom, ctx) to avoid capturing the synchronization context in this library code.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/Elastic.Documentation/Text/Utf8TextNormalization.cs`:
- Line 20: Utf8BomBytes is currently a public mutable byte[]; make the array
immutable to callers by keeping a private static readonly byte[] backing field
(e.g. _utf8BomBytes with the same byte values 0xEF,0xBB,0xBF) and replace the
public symbol Utf8BomBytes with a read-only view such as a public static
ReadOnlySpan<byte> Utf8BomBytes => _utf8BomBytes (or public static
ReadOnlyMemory<byte> if preferred); this prevents external mutation while
preserving the constant bytes.
In `@src/services/Elastic.Changelog/Bundling/ChangelogBundleAmendService.cs`:
- Line 267: The await in ChangelogBundleAmendService (the call to
_fileSystem.File.WriteAllTextAsync(amendFilePath, normalizedYaml, Utf8NoBom,
ctx)) is capturing the synchronization context; change the await to use
ConfigureAwait(false) to avoid context capture in library code (i.e., await
_fileSystem.File.WriteAllTextAsync(..., ctx).ConfigureAwait(false)) so the write
operates without capturing the caller context.
In `@src/services/Elastic.Changelog/Creation/ChangelogFileWriter.cs`:
- Line 56: In ChangelogFileWriter (the async method that calls
fileSystem.File.WriteAllTextAsync), append ConfigureAwait(false) to the awaited
I/O call: change the await fileSystem.File.WriteAllTextAsync(filePath,
normalizedContent, Utf8NoBom, ctx) usage so it calls .ConfigureAwait(false) on
the Task (keep the same parameters: filePath, normalizedContent, Utf8NoBom, ctx)
to avoid capturing the synchronization context in this library code.
In
`@src/services/Elastic.Changelog/GithubRelease/GitHubReleaseChangelogService.cs`:
- Line 312: The await call to _fileSystem.File.WriteAllTextAsync(filePath,
normalizedContent, Utf8NoBom, ctx) captures the synchronization context; update
the await in GitHubReleaseChangelogService (method containing this call) to
append ConfigureAwait(false) so the Task does not capture context (i.e., await
_fileSystem.File.WriteAllTextAsync(...).ConfigureAwait(false)); keep the same
arguments (filePath, normalizedContent, Utf8NoBom, ctx) and preserve existing
error handling/flow.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b11b7d00-8826-404a-96d5-625803a67b3c
📒 Files selected for processing (11)
src/Elastic.Documentation.Configuration/ReleaseNotes/ReleaseNotesSerialization.cssrc/Elastic.Documentation/Text/Utf8TextNormalization.cssrc/services/Elastic.Changelog/Bundling/ChangelogBundleAmendService.cssrc/services/Elastic.Changelog/Bundling/ChangelogBundlingService.cssrc/services/Elastic.Changelog/Creation/ChangelogFileWriter.cssrc/services/Elastic.Changelog/Evaluation/ChangelogPrepareArtifactService.cssrc/services/Elastic.Changelog/GithubRelease/GitHubReleaseChangelogService.cssrc/services/Elastic.Changelog/Utilities/ChangelogUtf8Normalization.cssrc/tooling/docs-builder/Commands/ChangelogCommand.cstests/Elastic.Changelog.Tests/Utilities/ChangelogUtf8NormalizationTests.cstests/Elastic.Documentation.Configuration.Tests/Text/Utf8TextNormalizationTests.cs
✅ Files skipped from review due to trivial changes (1)
- tests/Elastic.Changelog.Tests/Utilities/ChangelogUtf8NormalizationTests.cs
🚧 Files skipped from review as they are similar to previous changes (5)
- src/Elastic.Documentation.Configuration/ReleaseNotes/ReleaseNotesSerialization.cs
- src/services/Elastic.Changelog/Utilities/ChangelogUtf8Normalization.cs
- src/tooling/docs-builder/Commands/ChangelogCommand.cs
- src/services/Elastic.Changelog/Evaluation/ChangelogPrepareArtifactService.cs
- src/services/Elastic.Changelog/Bundling/ChangelogBundlingService.cs
Fixes #3191
AI implementation details
The implementation ensures that:
BOM Detection Utility Created
ChangelogUtf8Normalizationclass in/src/services/Elastic.Changelog/Utilities/StripLeadingUtf8BomChar()- removes BOM character from stringsHasUtf8Bom()- detects BOM in byte arraysFile Writers Updated
All key YAML writing operations now strip BOMs before writing:
ChangelogFileWriter- changelog creationChangelogBundlingService- bundling operationsChangelogBundleAmendService- bundle amendmentsGitHubReleaseChangelogService- GitHub release processingFile Readers Updated
ChangelogPrepareArtifactService- replacedFile.Copywith read-normalize-write for YAML filesChangelogCommandinit - normalizes content when updating existingchangelog.ymlComprehensive Tests Added
ChangelogUtf8NormalizationTests- 12 utility tests covering all scenariosBundleChangelogsTests- BOM handling in bundling operationsChangelogCreationServiceTests- BOM-free creation verificationChangelogPrepareArtifactServiceTests- File copying normalization testsVerification Complete
Generative AI disclosure
Tool(s) and model(s) used: composer-2, claude-4-sonnet-thinking