Skip to content

fix: improve serialization error messages and add default=str fallback#965

Draft
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
devin/1774545807-improve-serialization-error-message
Draft

fix: improve serialization error messages and add default=str fallback#965
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
devin/1774545807-improve-serialization-error-message

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

Summary

Improves the user-facing warning messages in airbyte_message_to_string() and adds a second serialization fallback to prevent unhandled exceptions (and consequent deadlocks in the concurrent source pipeline) when record data contains types that neither orjson nor stdlib json can serialize.

Before: The warning said "There was an error during the serialization of an AirbyteMessage: Type is not JSON serializable: complex. This might impact the sync performances." — misleading (actual impact is a deadlock, not a perf hit), leaks implementation details, and json.dumps() on the next line could also throw unhandled.

After:

  • orjson → json fallback: "Record serialization fell back to slower method. Sync will continue with reduced performance."
  • json → json(default=str) fallback: "Record contains a value that could not be serialized to JSON. The value was converted to a string representation."
  • Technical exception details logged at DEBUG level only
  • Each warning logged once (separate global flags) to avoid log spam

Resolves https://github.com/airbytehq/airbyte-internal-issues/issues/16049

Related: #954 (overlapping scope — this PR supersedes it with the addition of message improvements)

Review & Testing Checklist for Human

  • TypeError catch scope may be too narrow: The inner fallback catches TypeError only. json.dumps() can also raise ValueError (circular references) or OverflowError. If those occur, the exception still propagates unhandled. Decide if the catch should be broadened to Exception here as a last-resort deadlock prevention, or if the narrow TypeError is acceptable since circular references in AirbyteMessage data are extremely unlikely (data originates from JSON API responses).
  • Silent data coercion via default=str: When this fallback fires, non-serializable values are silently converted to strings (e.g., complex(0.0423, 0)"(0.0423+0j)"). This is data quality degradation. Confirm this is acceptable as a last-resort path, and that connector-level fixes (like the companion PR fix(source-google-search-console): guard numeric fields against complex types (AI-Triage PR) airbyte#75426 for source-google-search-console) are the right place to handle data correctness.
  • Verify the deadlock scenario is addressed: The original issue reports deadlocks in source-google-search-console with complex type values. Ideally, run the connector with a search_analytics_by_query stream to confirm the sync completes without hanging.

Notes

  • The root cause of complex values entering the data flow was separately fixed in CDK PR #579 (Jinja interpolation rejecting complex types, merged in v6.54.5). This PR hardens the serialization layer as defense-in-depth.
  • PR #954 addresses only the fallback logic without improving the message text. This PR combines both concerns. If this PR is merged, fix: fail fast on non-JSON-serializable types in serialization fallback (AI-Triage PR) #954 can be closed.
  • The CDK version is managed via semantic-pr-release-drafter — no manual version bump is needed. The fix: title prefix will trigger a patch release.

Link to Devin session: https://app.devin.ai/sessions/132db25cd0f34fa995908b456e5ee57c

- Replace misleading warning message with clear, guideline-compliant text
- Add json.dumps(default=str) fallback to prevent deadlocks when both
  orjson and json.dumps fail on non-serializable types (e.g. complex)
- Catch specific TypeError instead of broad Exception for json fallback
- Log technical details at DEBUG level, user-facing messages at WARNING
- Add separate flag to log second fallback warning only once

Resolves airbytehq/airbyte-internal-issues#16049

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link
Copy Markdown

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1774545807-improve-serialization-error-message#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1774545807-improve-serialization-error-message

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

"The value was converted to a string representation."
)
logger.debug("json serialization error: %s", json_error)
_HAS_LOGGED_FOR_SERIALIZATION_FALLBACK = True
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a false positive. _HAS_LOGGED_FOR_SERIALIZATION_FALLBACK is already declared at module level on line 52 and is read/written inside the function via global _HAS_LOGGED_FOR_SERIALIZATION_FALLBACK. The suggested fix describes exactly what this PR already does.

@github-actions
Copy link
Copy Markdown

PyTest Results (Fast)

3 935 tests  +1   3 924 ✅ +1   7m 34s ⏱️ +35s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit ea995b9. ± Comparison against base commit acafc75.

@github-actions
Copy link
Copy Markdown

PyTest Results (Full)

3 938 tests  +1   3 926 ✅ +1   11m 16s ⏱️ +27s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit ea995b9. ± Comparison against base commit acafc75.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants