Skip to content

feat(cdk): add sentry_sdk.capture_message() to MemoryMonitor for high-memory warnings (AI-Triage PR)#958

Open
devin-ai-integration[bot] wants to merge 6 commits intomainfrom
devin/1774305843-sentry-memory-monitor-capture-message
Open

feat(cdk): add sentry_sdk.capture_message() to MemoryMonitor for high-memory warnings (AI-Triage PR)#958
devin-ai-integration[bot] wants to merge 6 commits intomainfrom
devin/1774305843-sentry-memory-monitor-capture-message

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot commented Mar 23, 2026

Summary

Adds a sentry_sdk.capture_message() call to MemoryMonitor.check_memory_usage() so that high-memory warnings (≥90% of container limit) are surfaced in Sentry, not just in connector logs. This enables Sentry fingerprinting rules (already merged in #494) to aggregate these events.

Key design decisions:

  • One-shot per sync: A _sentry_alerted flag ensures only one Sentry event per MemoryMonitor lifetime, reducing noise. The logger.warning() continues to fire on every check interval as before.
  • Hard dependency: sentry-sdk is declared as a required dependency (>=2.0,<3.0) in pyproject.toml. It is imported directly at module level — no soft import / fallback pattern.
  • Best-effort Sentry reporting: The capture_message() call is wrapped in try/except Exception so a Sentry failure can never abort a sync. _sentry_alerted is set only in the else clause (after successful send), allowing automatic retry on subsequent check intervals if a transient failure occurs.
  • Message deduplication: The warning message is formatted into a local variable shared by both logger.warning() and sentry_sdk.capture_message(), avoiding string duplication.

Resolves https://github.com/airbytehq/airbyte-internal-issues/issues/16092

Updates since initial revision

Round 1 — Per Patrick Nilan (@pnilan)'s review feedback:

  • Switched from soft import (try/except ImportError) to hard dependency — sentry-sdk is now declared in pyproject.toml and imported directly.
  • Removed sentry_sdk from the DEP001 ignore list in pyproject.toml (no longer needed).
  • Replaced the "graceful degradation when sentry_sdk unavailable" test with a test verifying message content details (percentage + GB values).
  • poetry.lock was regenerated with Poetry 2.0.1 after adding the dependency.

Round 2 — Per Patrick Nilan (@pnilan)'s second review:

  • Wrapped sentry_sdk.capture_message() in try/except Exception — a Sentry failure can never escape check_memory_usage() and crash the sync.
  • Moved _sentry_alerted = True from before the capture_message() call to the else clause (only set after a successful send). This allows retry on transient failures instead of permanently suppressing alerts after a single failure.
  • Sentry failures are logged at DEBUG level with exc_info=True for diagnostics.
  • Added two regression tests: test_sentry_failure_does_not_crash_sync and test_sentry_retries_after_transient_failure.

Review & Testing Checklist for Human

  • Verify sentry-sdk as a hard CDK dependency is acceptable. This adds sentry-sdk >=2.0,<3.0 as a required dependency for all CDK consumers. Confirm this aligns with the runtime environment (connector-execution-service already provides it, but now it's also pulled in at install time).
  • Inspect poetry.lock changes. The lock file was regenerated with Poetry 2.0.1; the diff should be minimal (only the new sentry-sdk entry and updated content-hash). Verify no unintended transitive dependency changes were introduced.
  • Verify the broad except Exception around capture_message() is acceptable. This intentionally catches all exceptions to ensure observability never breaks the sync. Confirm this aligns with the team's exception handling conventions — the tradeoff is that Sentry SDK bugs would be silently logged at DEBUG rather than surfacing.
  • Verify logger.warning() format change is safe. The call changed from lazy formatting (logger.warning("...", arg1, arg2)) to eager formatting (message = "..." % (args); logger.warning(message)). Functionally equivalent but log aggregation tools that rely on the unformatted template string may behave differently.

Test plan: Run poetry run pytest unit_tests/utils/test_memory_monitor.py -v — all 19 tests should pass (13 existing + 6 new). To verify end-to-end, deploy a connector with sentry_sdk initialized and trigger memory pressure above 90% of cgroup limit; confirm a single Sentry event appears with message format "Source memory usage at X% of container limit (Y / Z GB).".

Notes

  • The 6 new tests cover: (1) capture_message called once on high memory, (2) one-shot behavior across multiple checks, (3) no call below threshold, (4) message includes memory percentage and GB values, (5) Sentry failure does not crash sync and allows retry, (6) transient failure retries successfully on next check.
  • The message format string intentionally matches the existing logger.warning() format exactly, so Sentry fingerprinting rules can group them.
  • CI failures for MyPy and Pytest (Fast) are pre-existing on main and unrelated to this PR.

Link to Devin session: https://app.devin.ai/sessions/61cc78be4ed1488dbeec9afe64bdea24


Open with Devin

…-memory warnings

Add Sentry capture_message() call when memory usage exceeds 90% threshold.
The Sentry alert fires once per sync to reduce noise. Uses a top-level
try/except ImportError guard so environments without sentry_sdk degrade
gracefully.

Resolves airbytehq/airbyte-internal-issues#16092

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link
Copy Markdown

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1774305843-sentry-memory-monitor-capture-message#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1774305843-sentry-memory-monitor-capture-message

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

sentry_sdk is a soft import in memory_monitor.py, provided at runtime
by the connector-execution-service. It is not a declared CDK dependency.

Co-Authored-By: bot_apk <apk@cognition.ai>
@pnilan
Copy link
Copy Markdown
Contributor

Thanks for picking this up! One change I'd like to make on the import approach:

Instead of the soft import with try/except ImportError, let's add sentry-sdk as a required dependency in pyproject.toml and use a standard top-of-file import:

import sentry_sdk

Rationale:

  • sentry_sdk is already present in the production runtime (installed by connector-execution-service)
  • A standard import is cleaner — no None sentinel, no type: ignore, no DEP001 ignore entry needed
  • This is a temporary approach; once Sentry capture moves to the service level, we can remove the dependency

This means:

  1. Add sentry-sdk to the dependencies in pyproject.toml
  2. Remove the try/except ImportError block and replace with import sentry_sdk
  3. Remove the sentry_sdk entry from the DEP001 ignore list
  4. The if sentry_sdk is not None guard can be dropped (it will always be available)
  5. Update the test for graceful degradation when sentry is unavailable — this scenario no longer applies

The _sentry_alerted one-shot flag is a good addition — keep that.

See https://github.com/airbytehq/airbyte-internal-issues/issues/16092 for the full context.

@pnilan Patrick Nilan (pnilan) marked this pull request as ready for review March 23, 2026 22:53
Copilot AI review requested due to automatic review settings March 23, 2026 22:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances the CDK’s MemoryMonitor so high-memory warnings (≥90% of container limit) are also surfaced in Sentry (via sentry_sdk.capture_message()), enabling aggregation/fingerprinting beyond connector logs.

Changes:

  • Soft-import sentry_sdk and emit a one-shot Sentry warning per MemoryMonitor instance when memory exceeds the threshold.
  • Add unit tests covering Sentry capture behavior, one-shot semantics, below-threshold behavior, and graceful degradation when Sentry is unavailable.
  • Update deptry configuration to ignore the soft-imported sentry_sdk dependency.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
airbyte_cdk/utils/memory_monitor.py Soft-imports Sentry SDK and emits a one-shot Sentry warning on high memory.
unit_tests/utils/test_memory_monitor.py Adds unit tests validating Sentry capture behavior and fallback behavior.
pyproject.toml Adds sentry_sdk to deptry DEP001 ignore list to match soft-import pattern.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +148 to +153
self._sentry_alerted = True
sentry_sdk.capture_message(
"Source memory usage at %d%% of container limit (%.2f / %.2f GB)."
% (usage_percent, usage_gb, limit_gb),
level="warning",
)
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_sentry_alerted is set to True before calling sentry_sdk.capture_message(). If capture_message() raises (misconfiguration, transport error, unexpected SDK behavior), this will both crash the sync and permanently suppress future alerts for this monitor instance. Consider wrapping the Sentry call in a broad try/except (and logging at debug) and only setting _sentry_alerted = True after a successful capture.

Suggested change
self._sentry_alerted = True
sentry_sdk.capture_message(
"Source memory usage at %d%% of container limit (%.2f / %.2f GB)."
% (usage_percent, usage_gb, limit_gb),
level="warning",
)
try:
sentry_sdk.capture_message(
"Source memory usage at %d%% of container limit (%.2f / %.2f GB)."
% (usage_percent, usage_gb, limit_gb),
level="warning",
)
self._sentry_alerted = True
except Exception as exc:
logger.debug(
"Failed to send memory usage alert to Sentry: %s",
exc,
exc_info=True,
)

Copilot uses AI. Check for mistakes.
Comment on lines +149 to +152
sentry_sdk.capture_message(
"Source memory usage at %d%% of container limit (%.2f / %.2f GB)."
% (usage_percent, usage_gb, limit_gb),
level="warning",
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning message format string is duplicated: once in logger.warning(...) and again in the capture_message(...) call (with manual % formatting). To avoid the two drifting over time (and breaking Sentry fingerprinting expectations), build the message once (e.g., message = ...) and reuse it for both logging and Sentry capture.

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +14
from typing import Optional

try:
import sentry_sdk
except ImportError:
sentry_sdk = None # type: ignore[assignment]
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The soft-import fallback uses sentry_sdk = None # type: ignore[assignment], which suppresses type checking for this symbol. To keep mypy coverage stronger, consider explicitly typing the variable (e.g., sentry_sdk: object | None = None / Optional[Any]) and using # type: ignore[import-not-found] (or a TYPE_CHECKING import) instead of ignoring the assignment.

Suggested change
from typing import Optional
try:
import sentry_sdk
except ImportError:
sentry_sdk = None # type: ignore[assignment]
from typing import Any, Optional
sentry_sdk: Optional[Any]
try:
import sentry_sdk as _sentry_sdk # type: ignore[import-not-found]
sentry_sdk = _sentry_sdk
except ImportError:
sentry_sdk = None

Copilot uses AI. Check for mistakes.
Address PR feedback from @pnilan:
- Add sentry-sdk as a required dependency in pyproject.toml
- Use standard top-level import instead of try/except ImportError
- Remove sentry_sdk from deptry DEP001 ignore list
- Drop the 'sentry_sdk is not None' guard (always available)
- Deduplicate warning message format string (build once, reuse)
- Replace graceful degradation test with message content test

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Patrick Nilan (@pnilan) Thanks for the feedback! I've addressed all your points in 374528a:

  1. Added sentry-sdk = ">=2.0,<3.0" to pyproject.toml dependencies
  2. Replaced try/except ImportError with standard import sentry_sdk at module level
  3. Removed sentry_sdk from the DEP001 ignore list
  4. Dropped the if sentry_sdk is not None guard
  5. Replaced the graceful degradation test with a message content test (verifies % and GB values in the Sentry message)
  6. Deduplicated the warning message format string — now built once and reused for both logger.warning() and sentry_sdk.capture_message()

The _sentry_alerted one-shot flag is preserved as requested.


Devin session

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Responding to Copilot's suggestions on the original diff:

Comment 4 (try/except around capture_message): This suggestion was made on the soft-import version. Now that sentry_sdk is a hard dependency, the sentry_sdk is not None guard is gone. Regarding wrapping capture_message() in a broad except Exception — per this repo's coding standards, we avoid catching broad exceptions unless we can add meaningful value. capture_message() is a well-tested SDK method unlikely to raise in practice; if it does, that's a genuine unexpected error that should surface rather than be silently swallowed. The flag ordering (_sentry_alerted = True before capture) is intentional — in the unlikely event of a Sentry transport error, we still want to suppress future attempts to avoid repeated failures on every check interval. Not implementing.

Comment 5 (deduplicate message format string): Valid suggestion. Implemented in 374528a — the message is now built once and reused for both logger.warning() and sentry_sdk.capture_message().

Comment 6 (typing for soft import): This suggestion is moot — the soft import has been replaced with a standard hard import per Patrick Nilan (@pnilan)'s feedback. No longer applicable.


Devin session

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 23, 2026

PyTest Results (Fast)

3 940 tests  +6   3 928 ✅ +6   6m 50s ⏱️ -19s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit cfc7f66. ± Comparison against base commit 0e57414.

♻️ This comment has been updated with latest results.

Co-Authored-By: bot_apk <apk@cognition.ai>
Copy link
Copy Markdown
Contributor Author

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Open in Devin Review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 23, 2026

PyTest Results (Full)

3 943 tests  +6   3 931 ✅ +6   11m 8s ⏱️ -6s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit cfc7f66. ± Comparison against base commit 0e57414.

♻️ This comment has been updated with latest results.

…endency upgrades

Co-Authored-By: bot_apk <apk@cognition.ai>
@pnilan
Copy link
Copy Markdown
Contributor

I think this should be fixed before merge. MemoryMonitor.check_memory_usage() runs inside the main read loop, and this PR adds an unguarded sentry_sdk.capture_message() call there. That means a failure in Sentry reporting can now escape check_memory_usage() and fail the sync, even though this is only observability.

I’d suggest making the Sentry send explicitly best-effort, e.g.:

if not self._sentry_alerted and sentry_sdk is not None:
    try:
        sentry_sdk.capture_message(
            "Source memory usage at %d%% of container limit (%.2f / %.2f GB)."
            % (usage_percent, usage_gb, limit_gb),
            level="warning",
        )
    except Exception:
        logger.debug("Failed to send high-memory warning to Sentry.", exc_info=True)
    else:
        self._sentry_alerted = True

Two reasons for this structure:

  1. A Sentry failure should never abort the sync.
  2. _sentry_alerted should only flip after a successful send. In the current code it flips before the call, so a transient Sentry failure means no event is sent and no retry happens later in the same sync.

I’d also add a regression test that patches capture_message() to raise and asserts check_memory_usage() does not raise.

…sient failure

- Wrap capture_message() in try/except so Sentry failures never abort the sync
- Move _sentry_alerted = True to else clause (only set after successful send)
- Log Sentry failures at DEBUG level with exc_info for diagnostics
- Add regression test: capture_message raises → check_memory_usage does not raise
- Add test: transient failure → retry succeeds on next check interval

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Patrick Nilan (@pnilan) Addressed your feedback in cfc7f66 — thanks for the catch, this is a much better structure.

Changes:

  1. sentry_sdk.capture_message() is now wrapped in try/except Exception — a Sentry failure can never abort the sync.
  2. _sentry_alerted = True moved to the else clause — only flips after a successful send, so a transient failure allows retry on the next check interval.
  3. Failures logged at DEBUG level with exc_info=True for diagnostics.
  4. Added two regression tests:
    • test_sentry_failure_does_not_crash_sync — patches capture_message to raise RuntimeError, asserts check_memory_usage() does not raise, verifies _sentry_alerted stays False.
    • test_sentry_retries_after_transient_failure — first call raises, second succeeds; verifies retry behavior and flag flip.

All 19 tests pass locally (ruff format + lint clean).


Devin session

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants