feat(cdk): add sentry_sdk.capture_message() to MemoryMonitor for high-memory warnings (AI-Triage PR)#958
Conversation
…-memory warnings Add Sentry capture_message() call when memory usage exceeds 90% threshold. The Sentry alert fires once per sync to reduce noise. Uses a top-level try/except ImportError guard so environments without sentry_sdk degrade gracefully. Resolves airbytehq/airbyte-internal-issues#16092 Co-Authored-By: bot_apk <apk@cognition.ai>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1774305843-sentry-memory-monitor-capture-message#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1774305843-sentry-memory-monitor-capture-messagePR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
sentry_sdk is a soft import in memory_monitor.py, provided at runtime by the connector-execution-service. It is not a declared CDK dependency. Co-Authored-By: bot_apk <apk@cognition.ai>
|
Thanks for picking this up! One change I'd like to make on the import approach: Instead of the soft import with import sentry_sdkRationale:
This means:
The See https://github.com/airbytehq/airbyte-internal-issues/issues/16092 for the full context. |
There was a problem hiding this comment.
Pull request overview
This PR enhances the CDK’s MemoryMonitor so high-memory warnings (≥90% of container limit) are also surfaced in Sentry (via sentry_sdk.capture_message()), enabling aggregation/fingerprinting beyond connector logs.
Changes:
- Soft-import
sentry_sdkand emit a one-shot Sentry warning perMemoryMonitorinstance when memory exceeds the threshold. - Add unit tests covering Sentry capture behavior, one-shot semantics, below-threshold behavior, and graceful degradation when Sentry is unavailable.
- Update
deptryconfiguration to ignore the soft-importedsentry_sdkdependency.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
airbyte_cdk/utils/memory_monitor.py |
Soft-imports Sentry SDK and emits a one-shot Sentry warning on high memory. |
unit_tests/utils/test_memory_monitor.py |
Adds unit tests validating Sentry capture behavior and fallback behavior. |
pyproject.toml |
Adds sentry_sdk to deptry DEP001 ignore list to match soft-import pattern. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
airbyte_cdk/utils/memory_monitor.py
Outdated
| self._sentry_alerted = True | ||
| sentry_sdk.capture_message( | ||
| "Source memory usage at %d%% of container limit (%.2f / %.2f GB)." | ||
| % (usage_percent, usage_gb, limit_gb), | ||
| level="warning", | ||
| ) |
There was a problem hiding this comment.
_sentry_alerted is set to True before calling sentry_sdk.capture_message(). If capture_message() raises (misconfiguration, transport error, unexpected SDK behavior), this will both crash the sync and permanently suppress future alerts for this monitor instance. Consider wrapping the Sentry call in a broad try/except (and logging at debug) and only setting _sentry_alerted = True after a successful capture.
| self._sentry_alerted = True | |
| sentry_sdk.capture_message( | |
| "Source memory usage at %d%% of container limit (%.2f / %.2f GB)." | |
| % (usage_percent, usage_gb, limit_gb), | |
| level="warning", | |
| ) | |
| try: | |
| sentry_sdk.capture_message( | |
| "Source memory usage at %d%% of container limit (%.2f / %.2f GB)." | |
| % (usage_percent, usage_gb, limit_gb), | |
| level="warning", | |
| ) | |
| self._sentry_alerted = True | |
| except Exception as exc: | |
| logger.debug( | |
| "Failed to send memory usage alert to Sentry: %s", | |
| exc, | |
| exc_info=True, | |
| ) |
airbyte_cdk/utils/memory_monitor.py
Outdated
| sentry_sdk.capture_message( | ||
| "Source memory usage at %d%% of container limit (%.2f / %.2f GB)." | ||
| % (usage_percent, usage_gb, limit_gb), | ||
| level="warning", |
There was a problem hiding this comment.
The warning message format string is duplicated: once in logger.warning(...) and again in the capture_message(...) call (with manual % formatting). To avoid the two drifting over time (and breaking Sentry fingerprinting expectations), build the message once (e.g., message = ...) and reuse it for both logging and Sentry capture.
airbyte_cdk/utils/memory_monitor.py
Outdated
| from typing import Optional | ||
|
|
||
| try: | ||
| import sentry_sdk | ||
| except ImportError: | ||
| sentry_sdk = None # type: ignore[assignment] |
There was a problem hiding this comment.
The soft-import fallback uses sentry_sdk = None # type: ignore[assignment], which suppresses type checking for this symbol. To keep mypy coverage stronger, consider explicitly typing the variable (e.g., sentry_sdk: object | None = None / Optional[Any]) and using # type: ignore[import-not-found] (or a TYPE_CHECKING import) instead of ignoring the assignment.
| from typing import Optional | |
| try: | |
| import sentry_sdk | |
| except ImportError: | |
| sentry_sdk = None # type: ignore[assignment] | |
| from typing import Any, Optional | |
| sentry_sdk: Optional[Any] | |
| try: | |
| import sentry_sdk as _sentry_sdk # type: ignore[import-not-found] | |
| sentry_sdk = _sentry_sdk | |
| except ImportError: | |
| sentry_sdk = None |
Address PR feedback from @pnilan: - Add sentry-sdk as a required dependency in pyproject.toml - Use standard top-level import instead of try/except ImportError - Remove sentry_sdk from deptry DEP001 ignore list - Drop the 'sentry_sdk is not None' guard (always available) - Deduplicate warning message format string (build once, reuse) - Replace graceful degradation test with message content test Co-Authored-By: bot_apk <apk@cognition.ai>
|
Patrick Nilan (@pnilan) Thanks for the feedback! I've addressed all your points in 374528a:
The |
|
Responding to Copilot's suggestions on the original diff: Comment 4 (try/except around Comment 5 (deduplicate message format string): Valid suggestion. Implemented in 374528a — the message is now built once and reused for both Comment 6 (typing for soft import): This suggestion is moot — the soft import has been replaced with a standard hard import per Patrick Nilan (@pnilan)'s feedback. No longer applicable. |
Co-Authored-By: bot_apk <apk@cognition.ai>
…endency upgrades Co-Authored-By: bot_apk <apk@cognition.ai>
|
I think this should be fixed before merge. I’d suggest making the Sentry send explicitly best-effort, e.g.: if not self._sentry_alerted and sentry_sdk is not None:
try:
sentry_sdk.capture_message(
"Source memory usage at %d%% of container limit (%.2f / %.2f GB)."
% (usage_percent, usage_gb, limit_gb),
level="warning",
)
except Exception:
logger.debug("Failed to send high-memory warning to Sentry.", exc_info=True)
else:
self._sentry_alerted = TrueTwo reasons for this structure:
I’d also add a regression test that patches |
…sient failure - Wrap capture_message() in try/except so Sentry failures never abort the sync - Move _sentry_alerted = True to else clause (only set after successful send) - Log Sentry failures at DEBUG level with exc_info for diagnostics - Add regression test: capture_message raises → check_memory_usage does not raise - Add test: transient failure → retry succeeds on next check interval Co-Authored-By: bot_apk <apk@cognition.ai>
|
Patrick Nilan (@pnilan) Addressed your feedback in cfc7f66 — thanks for the catch, this is a much better structure. Changes:
All 19 tests pass locally (ruff format + lint clean). |
Summary
Adds a
sentry_sdk.capture_message()call toMemoryMonitor.check_memory_usage()so that high-memory warnings (≥90% of container limit) are surfaced in Sentry, not just in connector logs. This enables Sentry fingerprinting rules (already merged in #494) to aggregate these events.Key design decisions:
_sentry_alertedflag ensures only one Sentry event perMemoryMonitorlifetime, reducing noise. Thelogger.warning()continues to fire on every check interval as before.sentry-sdkis declared as a required dependency (>=2.0,<3.0) inpyproject.toml. It is imported directly at module level — no soft import / fallback pattern.capture_message()call is wrapped intry/except Exceptionso a Sentry failure can never abort a sync._sentry_alertedis set only in theelseclause (after successful send), allowing automatic retry on subsequent check intervals if a transient failure occurs.logger.warning()andsentry_sdk.capture_message(), avoiding string duplication.Resolves https://github.com/airbytehq/airbyte-internal-issues/issues/16092
Updates since initial revision
Round 1 — Per Patrick Nilan (@pnilan)'s review feedback:
try/except ImportError) to hard dependency —sentry-sdkis now declared inpyproject.tomland imported directly.sentry_sdkfrom theDEP001ignore list inpyproject.toml(no longer needed).poetry.lockwas regenerated with Poetry 2.0.1 after adding the dependency.Round 2 — Per Patrick Nilan (@pnilan)'s second review:
sentry_sdk.capture_message()intry/except Exception— a Sentry failure can never escapecheck_memory_usage()and crash the sync._sentry_alerted = Truefrom before thecapture_message()call to theelseclause (only set after a successful send). This allows retry on transient failures instead of permanently suppressing alerts after a single failure.DEBUGlevel withexc_info=Truefor diagnostics.test_sentry_failure_does_not_crash_syncandtest_sentry_retries_after_transient_failure.Review & Testing Checklist for Human
sentry-sdkas a hard CDK dependency is acceptable. This addssentry-sdk >=2.0,<3.0as a required dependency for all CDK consumers. Confirm this aligns with the runtime environment (connector-execution-service already provides it, but now it's also pulled in at install time).poetry.lockchanges. The lock file was regenerated with Poetry 2.0.1; the diff should be minimal (only the newsentry-sdkentry and updated content-hash). Verify no unintended transitive dependency changes were introduced.except Exceptionaroundcapture_message()is acceptable. This intentionally catches all exceptions to ensure observability never breaks the sync. Confirm this aligns with the team's exception handling conventions — the tradeoff is that Sentry SDK bugs would be silently logged at DEBUG rather than surfacing.logger.warning()format change is safe. The call changed from lazy formatting (logger.warning("...", arg1, arg2)) to eager formatting (message = "..." % (args); logger.warning(message)). Functionally equivalent but log aggregation tools that rely on the unformatted template string may behave differently.Test plan: Run
poetry run pytest unit_tests/utils/test_memory_monitor.py -v— all 19 tests should pass (13 existing + 6 new). To verify end-to-end, deploy a connector withsentry_sdkinitialized and trigger memory pressure above 90% of cgroup limit; confirm a single Sentry event appears with message format"Source memory usage at X% of container limit (Y / Z GB).".Notes
capture_messagecalled once on high memory, (2) one-shot behavior across multiple checks, (3) no call below threshold, (4) message includes memory percentage and GB values, (5) Sentry failure does not crash sync and allows retry, (6) transient failure retries successfully on next check.logger.warning()format exactly, so Sentry fingerprinting rules can group them.mainand unrelated to this PR.Link to Devin session: https://app.devin.ai/sessions/61cc78be4ed1488dbeec9afe64bdea24