Skip to content

fix(core): Sanitize lone surrogates in log body and attributes#20245

Draft
antonis wants to merge 2 commits intodevelopfrom
fix/sanitize-lone-surrogates-in-logs
Draft

fix(core): Sanitize lone surrogates in log body and attributes#20245
antonis wants to merge 2 commits intodevelopfrom
fix/sanitize-lone-surrogates-in-logs

Conversation

@antonis
Copy link
Copy Markdown
Contributor

@antonis antonis commented Apr 13, 2026

Summary

Lone UTF-16 surrogates (U+D800–U+DFFF not part of a valid pair) in log message bodies or string attribute values/keys cause serde_json on the server to reject the entire queued log batch. This means one bad log entry silently drops all healthy logs in the same flush.

This PR sanitizes unpaired surrogates by replacing them with U+FFFD (replacement character) at log capture time, scoped exclusively to the logs code path (packages/core/src/logs/internal.ts).

  • Sanitizes log body (string messages)
  • Sanitizes log attribute string values
  • Sanitizes log attribute keys
  • Uses native String.prototype.toWellFormed() when available (Node 20+, modern browsers), with a manual charCodeAt fallback for older runtimes (Hermes, Node 18)
  • Fast path avoids allocation when no surrogates are present

Fixes getsentry/sentry-react-native#5186

Test plan

  • Unit tests for _INTERNAL_removeLoneSurrogates covering: clean strings, empty strings, valid emoji (surrogate pairs), lone high/low surrogates, surrogates at boundaries, consecutive lone surrogates, mixed valid pairs + lone surrogates, exact repro case from issue
  • Integration tests for _INTERNAL_captureLog verifying sanitization of body, attribute values, and attribute keys
  • TypeScript compilation passes (ES2020 target)
  • All 55 log tests pass

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 13, 2026

Semver Impact of This PR

🟢 Patch (bug fixes)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

Core

  • Automatically disable truncation when span streaming is enabled in LangGraph integration by andreiborza in #20231
  • Automatically disable truncation when span streaming is enabled in LangChain integration by andreiborza in #20230
  • Automatically disable truncation when span streaming is enabled in Google GenAI integration by andreiborza in #20229
  • Automatically disable truncation when span streaming is enabled in Anthropic AI integration by andreiborza in #20228
  • Automatically disable truncation when span streaming is enabled in Vercel AI integration by andreiborza in #20232
  • Automatically disable truncation when span streaming is enabled in OpenAI integration by andreiborza in #20227
  • Add enableTruncation option to Vercel AI integration by nicohrubec in #20195
  • Add enableTruncation option to Google GenAI integration by andreiborza in #20184
  • Add enableTruncation option to Anthropic AI integration by andreiborza in #20181
  • Add enableTruncation option to LangGraph integration by andreiborza in #20183
  • Add enableTruncation option to LangChain integration by andreiborza in #20182
  • Add enableTruncation option to OpenAI integration by andreiborza in #20167
  • Export a reusable function to add tracing headers by JPeer264 in #20076

Deps

  • Bump axios from 1.13.5 to 1.15.0 by dependabot in #20180
  • Bump hono from 4.12.7 to 4.12.12 by dependabot in #20118
  • Bump defu from 6.1.4 to 6.1.6 by dependabot in #20104

Other

  • (cloudflare) Propagate traceparent to RPC calls - via fetch by JPeer264 in #19991

Bug Fixes 🐛

Deno

  • Handle reader.closed rejection from releaseLock() in streaming by andreiborza in #20187
  • Avoid inferring invalid span op from Deno tracer by Lms24 in #20128

Other

  • (ci) Prevent command injection in ci-metadata workflow by fix-it-felix-sentry in #19899
  • (core) Sanitize lone surrogates in log body and attributes by antonis in #20245
  • (e2e) Add op check to waitForTransaction in React Router e2e tests by copilot-swe-agent in #20193
  • (node-integration-tests) Fix flaky kafkajs test race condition by copilot-swe-agent in #20189

Internal Changes 🔧

Deps

  • Bump hono from 4.12.7 to 4.12.12 in /dev-packages/e2e-tests/test-applications/cloudflare-hono by dependabot in #20119
  • Bump axios from 1.13.5 to 1.15.0 in /dev-packages/e2e-tests/test-applications/nestjs-basic by dependabot in #20179

Other

  • (bugbot) Add rules to flag test-flake-provoking patterns by Lms24 in #20192
  • (ci) Remove codecov steps from jobs that produce no coverage/JUnit data by mydea in #20244
  • (deps-dev) Bump vite from 7.2.0 to 7.3.2 in /dev-packages/e2e-tests/test-applications/tanstackstart-react by dependabot in #20107
  • (react) Remove duplicated test mock by s1gr1d in #20200
  • (size-limit) Bump failing size limit scenario by Lms24 in #20186
  • Fix flaky ANR test by increasing blocking duration by JPeer264 in #20239
  • Add automatic flaky test detector by nicohrubec in #18684

🤖 This preview updates automatically when you update the PR.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Body sanitization skipped for parameterized string messages
    • Replaced the primitive-only string guard with isString(message) and added a regression test to ensure fmt/parameterize messages with lone surrogates are sanitized in log bodies.

Create PR

Or push these changes by commenting:

@cursor push 5a6da5722e
Preview (5a6da5722e)
diff --git a/packages/core/src/logs/internal.ts b/packages/core/src/logs/internal.ts
--- a/packages/core/src/logs/internal.ts
+++ b/packages/core/src/logs/internal.ts
@@ -7,7 +7,7 @@
 import type { Integration } from '../types-hoist/integration';
 import type { Log, SerializedLog } from '../types-hoist/log';
 import { consoleSandbox, debug } from '../utils/debug-logger';
-import { isParameterizedString } from '../utils/is';
+import { isParameterizedString, isString } from '../utils/is';
 import { getCombinedScopeData } from '../utils/scopeData';
 import { _getSpanForScope } from '../utils/spanOnScope';
 import { timestampInSeconds } from '../utils/time';
@@ -162,7 +162,7 @@
   const serializedLog: SerializedLog = {
     timestamp,
     level,
-    body: typeof message === 'string' ? _INTERNAL_removeLoneSurrogates(message) : message,
+    body: isString(message) ? _INTERNAL_removeLoneSurrogates(String(message)) : message,
     trace_id: traceContext?.trace_id,
     severity_number: severityNumber ?? SEVERITY_TEXT_TO_SEVERITY_NUMBER[level],
     attributes: sanitizeLogAttributes({

diff --git a/packages/core/test/lib/logs/internal.test.ts b/packages/core/test/lib/logs/internal.test.ts
--- a/packages/core/test/lib/logs/internal.test.ts
+++ b/packages/core/test/lib/logs/internal.test.ts
@@ -1280,6 +1280,18 @@
       expect(logBuffer?.[0]?.body).toBe('bad surrogate \uFFFD here');
     });
 
+    it('sanitizes lone surrogates in parameterized log message body', () => {
+      const options = getDefaultTestClientOptions({ dsn: PUBLIC_DSN, enableLogs: true });
+      const client = new TestClient(options);
+      const scope = new Scope();
+      scope.setClient(client);
+
+      _INTERNAL_captureLog({ level: 'error', message: fmt`bad surrogate ${'\uD800'} here` }, scope);
+
+      const logBuffer = _INTERNAL_getLogBuffer(client);
+      expect(logBuffer?.[0]?.body).toBe('bad surrogate \uFFFD here');
+    });
+
     it('sanitizes lone surrogates in log attribute values', () => {
       const options = getDefaultTestClientOptions({ dsn: PUBLIC_DSN, enableLogs: true });
       const client = new TestClient(options);

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

Reviewed by Cursor Bugbot for commit afbd06e. Configure here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 13, 2026

size-limit report 📦

Path Size % Change Change
@sentry/browser 25.72 kB - -
@sentry/browser - with treeshaking flags 24.21 kB - -
@sentry/browser (incl. Tracing) 42.73 kB - -
@sentry/browser (incl. Tracing, Profiling) 47.35 kB - -
@sentry/browser (incl. Tracing, Replay) 81.54 kB - -
@sentry/browser (incl. Tracing, Replay) - with treeshaking flags 71.11 kB - -
@sentry/browser (incl. Tracing, Replay with Canvas) 86.25 kB - -
@sentry/browser (incl. Tracing, Replay, Feedback) 98.45 kB - -
@sentry/browser (incl. Feedback) 42.51 kB - -
@sentry/browser (incl. sendFeedback) 30.39 kB - -
@sentry/browser (incl. FeedbackAsync) 35.38 kB - -
@sentry/browser (incl. Metrics) 27.04 kB - -
@sentry/browser (incl. Logs) 27.44 kB +0.96% +259 B 🔺
⛔️ @sentry/browser (incl. Metrics & Logs) (max: 28 kB) 28.12 kB +0.95% +263 B 🔺
@sentry/react 27.48 kB - -
@sentry/react (incl. Tracing) 45.05 kB - -
@sentry/vue 30.56 kB - -
@sentry/vue (incl. Tracing) 44.59 kB - -
@sentry/svelte 25.74 kB - -
CDN Bundle 28.41 kB - -
CDN Bundle (incl. Tracing) 43.75 kB - -
⛔️ CDN Bundle (incl. Logs, Metrics) (max: 30 kB) 30.05 kB +0.89% +265 B 🔺
⛔️ CDN Bundle (incl. Tracing, Logs, Metrics) (max: 45 kB) 45.1 kB +0.61% +273 B 🔺
CDN Bundle (incl. Replay, Logs, Metrics) 68.82 kB +0.34% +228 B 🔺
CDN Bundle (incl. Tracing, Replay) 80.64 kB - -
CDN Bundle (incl. Tracing, Replay, Logs, Metrics) 81.94 kB +0.35% +280 B 🔺
CDN Bundle (incl. Tracing, Replay, Feedback) 86.17 kB - -
CDN Bundle (incl. Tracing, Replay, Feedback, Logs, Metrics) 87.45 kB +0.29% +247 B 🔺
CDN Bundle - uncompressed 82.99 kB - -
CDN Bundle (incl. Tracing) - uncompressed 129.77 kB - -
CDN Bundle (incl. Logs, Metrics) - uncompressed 87.88 kB +0.85% +738 B 🔺
CDN Bundle (incl. Tracing, Logs, Metrics) - uncompressed 133.92 kB +0.56% +738 B 🔺
CDN Bundle (incl. Replay, Logs, Metrics) - uncompressed 210.86 kB +0.36% +738 B 🔺
CDN Bundle (incl. Tracing, Replay) - uncompressed 246.65 kB - -
CDN Bundle (incl. Tracing, Replay, Logs, Metrics) - uncompressed 250.79 kB +0.3% +738 B 🔺
CDN Bundle (incl. Tracing, Replay, Feedback) - uncompressed 259.56 kB - -
CDN Bundle (incl. Tracing, Replay, Feedback, Logs, Metrics) - uncompressed 263.69 kB +0.29% +738 B 🔺
@sentry/nextjs (client) 47.47 kB - -
@sentry/sveltekit (client) 43.2 kB - -
@sentry/node-core 57.86 kB +0.02% +6 B 🔺
@sentry/node 174.93 kB +0.01% +10 B 🔺
@sentry/node - without tracing 97.97 kB +0.03% +20 B 🔺
@sentry/aws-serverless 115.22 kB +0.02% +20 B 🔺

View base workflow run

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 13, 2026

node-overhead report 🧳

Note: This is a synthetic benchmark with a minimal express app and does not necessarily reflect the real-world performance impact in an application.

Scenario Requests/s % of Baseline Prev. Requests/s Change %
GET Baseline 8,688 - 8,882 -2%
GET With Sentry 1,780 20% 1,782 -0%
GET With Sentry (error only) 5,841 67% 5,873 -1%
POST Baseline 1,207 - 1,203 +0%
POST With Sentry 586 49% 597 -2%
POST With Sentry (error only) 1,060 88% 1,069 -1%
MYSQL Baseline 3,264 - 3,197 +2%
MYSQL With Sentry 421 13% 479 -12%
MYSQL With Sentry (error only) 2,649 81% 2,657 -0%

View base workflow run

* Sanitizes serialized log attributes by replacing lone surrogates in both
* keys and string values with U+FFFD.
*/
function sanitizeLogAttributes(attributes: Attributes): Attributes {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m: just thinking out loud here but wdyt about doing this within serializeAttributes so that this applies to all telemetry attributes? I'm assuming the same limitation occurs for metrics and span attributes? Ideally we can keep this code path as performant as possible, so if we do it in the same loop instead of having to loop over the object again, it should be a bit faster.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good feedback @Lms24 👍 I hesitated a bit to make a wider change in the scope of this PR to avoid possible side effects and limited the scope just to logs which I guess are more vulnerable to free-form user/developer text input. Looking at it again I think it makes sense to do this in serializeAttributes.
I'll be happy to revise to cover everything if it makes more sense from the Js maintainer's perspective 🙇

* JSON-escaped form (e.g. `\uD800`). Replacing them at the SDK level ensures
* only the offending characters are lost instead of the whole payload.
*/
export function _INTERNAL_removeLoneSurrogates(str: string): string {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm given this is quite a lot of code and hence a bundle size impact, have we considered alternatives?

  • can we remove prior to serde parsing in Relay?
  • would it be the end of the world if we only use the toWellFormed native method but avoid the fallback? Not sure if ReactNative supports this already.

If both options are a "No", that's fine, too. Just want to make sure we have our bases covered.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove prior to serde parsing in Relay?

Yes, handling this in Relay makes a lot more sense. The SDK side fix is probably more of an extra check if a relay fix is deployed. I'll check what would this involve.

would it be the end of the world if we only use the toWellFormed native method but avoid the fallback? Not sure if ReactNative supports this already.

I think it is supported. I'll further look into this.

@antonis antonis marked this pull request as draft April 13, 2026 12:35
antonis and others added 2 commits April 13, 2026 14:36
Lone UTF-16 surrogates (U+D800–U+DFFF) in log message bodies or
attribute strings cause serde_json on the server to reject the entire
log batch. This replaces unpaired surrogates with U+FFFD at capture
time, scoped to the logs path only.

Fixes getsentry/sentry-react-native#5186

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`parameterize`/`fmt` creates a `String` object via `new String()`, so
`typeof message` returns `'object'` not `'string'`, bypassing the
sanitization. Use `String(message)` to coerce to a primitive before
sanitizing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@antonis antonis force-pushed the fix/sanitize-lone-surrogates-in-logs branch from 574a082 to b61af8f Compare April 13, 2026 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logs: entire log queue is dropped if a single log is bad

2 participants