Skip to content

fix(cdk): retry async job download on HTTP 403 with fresh polling URL#959

Draft
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
devin/1774342850-fix-async-download-403-retry
Draft

fix(cdk): retry async job download on HTTP 403 with fresh polling URL#959
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
devin/1774342850-fix-async-download-403-retry

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

Summary

When AsyncHttpJobRepository downloads async job results (e.g., Bing Ads reports via Azure Blob Storage), the download URL may contain a time-limited SAS token that expires before the download starts. This causes an HTTP 403 that the CDK currently treats as a fatal config_error, killing all running jobs for the stream.

This PR adds retry logic to fetch_records(): when a download fails with HTTP 403, it re-polls the polling endpoint to obtain a fresh download URL and retries the full download. The retry is limited to 1 attempt (_MAX_DOWNLOAD_RETRIES = 1).

Changes:

  • Extracted download logic from fetch_records() into _download_records_for_job() (pure refactor of existing code)
  • fetch_records() now wraps the download call in a retry loop that catches AirbyteTracedException containing "status code '403'" in internal_message
  • Added _is_retriable_download_error() — string-matches on the CDK's internal error message format
  • Added _refresh_polling_response() — re-polls the polling endpoint and updates the cached response
  • Minor autoformat changes from ruff (wrapping long lambda strings)

Resolves https://github.com/airbytehq/airbyte-internal-issues/issues/16096
Related to https://github.com/airbytehq/oncall/issues/11749

Review & Testing Checklist for Human

  • Duplicate records on partial failure: If a job has multiple download targets and the 403 occurs on a later target, yield from self._download_records_for_job(job) will re-yield records from earlier targets that already succeeded. Verify whether this causes duplicate records downstream or if the platform deduplicates. This is the highest-risk item.
  • String matching fragility: _is_retriable_download_error checks for "status code '403'" in internal_message. Confirm this matches the exact string produced by HttpClient._handle_error_resolution() / DefaultErrorHandler. If the format changes, the retry silently stops working.
  • No new unit tests: The retry path is not tested. Consider whether a test simulating a 403 → re-poll → success flow should be added before merge.
  • Generator semantics: Confirm that yield from inside a try/except inside a for loop behaves correctly — specifically that the except fires if the inner generator raises mid-iteration.

Notes

  • The 403 error message in default_error_mapping.py is not changed in this PR. The fix addresses the root cause (expired download URL) rather than just improving the error string.
  • _MAX_DOWNLOAD_RETRIES = 1 means at most 2 total attempts (1 original + 1 retry). This is intentionally conservative.
  • The two formatting-only changes (lambda string wrapping) are from ruff format and are unrelated to the functional change.

Link to Devin session: https://app.devin.ai/sessions/9df259ec2e4f40f5a10f4b86151705f1

When downloading async job results (e.g., Bing Ads reports), the download
URL may contain a time-limited SAS token that expires before the download
starts. This causes an HTTP 403 that the CDK misclassifies as config_error,
preventing retries and killing all running jobs.

This change adds retry logic to AsyncHttpJobRepository.fetch_records that:
1. Catches HTTP 403 errors during the download phase
2. Re-polls the polling endpoint to obtain a fresh download URL
3. Retries the download with the new URL

Resolves airbytehq/airbyte-internal-issues#16096
Related to airbytehq/oncall#11749

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link
Copy Markdown

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1774342850-fix-async-download-403-retry#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1774342850-fix-async-download-403-retry

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

@github-actions
Copy link
Copy Markdown

PyTest Results (Fast)

3 934 tests  ±0   3 923 ✅ +1   7m 42s ⏱️ +33s
    1 suites ±0      11 💤  - 1 
    1 files   ±0       0 ❌ ±0 

Results for commit 5579e29. ± Comparison against base commit 0e57414.

@github-actions
Copy link
Copy Markdown

PyTest Results (Full)

3 937 tests  ±0   3 925 ✅ ±0   11m 13s ⏱️ -1s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 5579e29. ± Comparison against base commit 0e57414.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants