Skip to content

chore(connectors): update nltk via CDK for source-s3, source-google-drive, source-azure-blob-storage#75557

Open
Aaron ("AJ") Steers (aaronsteers) wants to merge 7 commits intomasterfrom
devin/1774669596-pin-connectors-cdk-nltk-update
Open

chore(connectors): update nltk via CDK for source-s3, source-google-drive, source-azure-blob-storage#75557
Aaron ("AJ") Steers (aaronsteers) wants to merge 7 commits intomasterfrom
devin/1774669596-pin-connectors-cdk-nltk-update

Conversation

@aaronsteers
Copy link
Copy Markdown
Member

@aaronsteers Aaron ("AJ") Steers (aaronsteers) commented Mar 28, 2026

What

Pins three file-based connectors to the CDK branch devin/1774667708-update-nltk-cryptography for integration testing of the nltk 3.9.1 → 3.9.4 update in the Python CDK.

⚠️ The git branch pins must be reverted to stable CDK version pins before merge. This is enforced by TK-TODO comments and the tk-todo-check CI gate.

How

  • Updated airbyte-cdk dependency in each connector's pyproject.toml from a versioned PyPI reference to a git branch reference pointing at the CDK PR branch.
  • Regenerated poetry.lock for each connector using Poetry 1.8.5 (matching CI).
  • Bumped connector versions: source-s3 (4.15.2 → 4.15.3), source-google-drive (0.5.12 → 0.5.15), source-azure-blob-storage (0.8.15 → 0.8.16).
  • Added changelog entries for all three connectors.

Updates since last revision

  • Second merge conflict resolved for source-google-drive: master bumped source-google-drive to 0.5.14 (via #75573), colliding with our version again. Re-bumped to 0.5.15. This is the second collision — master previously took 0.5.13 via #75368. Changelog, metadata.yaml, and pyproject.toml all updated consistently.
  • Version bumps added for all three connectors via bump_version_in_repo (patch bumps). Changelog entries reference this PR.
  • Cryptography scope dropped from CDK PR: The CDK branch now only updates nltk (3.9.1 → 3.9.4). The cryptography range widening was reverted back to >=44.0.0,<45.0.0.
  • Fixed psutil missing dependency in source-s3 (from earlier revision): source-s3 directly imports psutil but was relying on it as an undeclared transitive dependency from the CDK. Added psutil = ">=5.8,<7" as an explicit dependency. This fix should persist even after the CDK pin is reverted.

Review guide

  1. airbyte-integrations/connectors/source-s3/pyproject.toml — exercises file-based CDK + unstructured parsing (nltk); also adds psutil as a direct dependency (was previously undeclared transitive)
  2. airbyte-integrations/connectors/source-google-drive/pyproject.toml — exercises file-based CDK
  3. airbyte-integrations/connectors/source-azure-blob-storage/pyproject.toml — exercises file-based CDK + unstructured parsing (nltk)
  4. docs/integrations/sources/{s3,google-drive,azure-blob-storage}.md — changelog entries for the patch bumps

Human review checklist

  • Verify connector lint and test CI jobs pass with the updated CDK dependency
  • source-google-drive version is 0.5.15 (not 0.5.13 or 0.5.14) due to two successive merge conflicts with master's #75368 and #75573. Confirm version is consistent across pyproject.toml, metadata.yaml, and changelog.
  • Lockfile diffs are large but expected — Poetry re-resolves dependency metadata differently when sourcing from git vs PyPI (e.g., ^X vs >=X,<Y formatting). The actual resolved dependency set should be equivalent.
  • Confirm psutil = ">=5.8,<7" is an appropriate version range for source-s3 (uses psutil.disk_usage and psutil.virtual_memory)
  • Before merge: revert the three airbyte-cdk git branch pins back to stable version pins (enforced by TK-TODO comments + tk-todo-check CI gate)

User Impact

Patch version bumps for three connectors with an updated nltk dependency (3.9.1 → 3.9.4) via the CDK. No breaking changes. source-s3 also gains psutil as an explicit dependency (previously undeclared transitive).

Can this PR be safely reverted and rolled back?

  • YES 💚

Link to Devin session: https://app.devin.ai/sessions/51acbfaadcd441d782d3a1817d6d413d
Requested by: Aaron ("AJ") Steers (@aaronsteers)


Open with Devin

@devin-ai-integration
Copy link
Copy Markdown
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link
Copy Markdown
Contributor

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • 🛠️ Quick Fixes
    • /format-fix - Fixes most formatting issues.
    • /bump-version - Bumps connector versions, scraping changelog description from the PR title.
  • ❇️ AI Testing and Review (internal link: AI-SDLC Docs):
    • /ai-prove-fix - Runs prerelease readiness checks, including testing against customer connections.
    • /ai-canary-prerelease - Rolls out prerelease to 5-10 connections for canary testing.
    • /ai-review - AI-powered PR review for connector safety and quality gates.
  • 🚀 Connector Releases:
    • /publish-connectors-prerelease - Publishes pre-release connector builds (tagged as {version}-preview.{git-sha}) for all modified connectors in the PR.
    • /bump-progressive-rollout-version - Bumps connector version with an RC suffix (2.16.10-rc.1) for progressive rollouts (enableProgressiveRollout: true).
      • Example: /bump-progressive-rollout-version changelog="Add new feature for progressive rollout"
  • ☕️ JVM connectors:
    • /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
      Example: /update-connector-cdk-version connector=destination-bigquery
  • 🐍 Python connectors:
    • /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
    • /poe source example lock - Alias for /poe connector source-example lock.
    • /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
    • /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.
  • ⚙️ Admin commands:
    • /force-merge reason="<REASON>" - Force merges the PR using admin privileges, bypassing CI checks. Requires a reason.
      Example: /force-merge reason="CI is flaky, tests pass locally"
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 28, 2026

source-azure-blob-storage Connector Test Results

37 tests   23 ✅  1m 31s ⏱️
 2 suites  14 💤
 2 files     0 ❌

Results for commit 6af3560.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 28, 2026

source-google-drive Connector Test Results

42 tests   38 ✅  31s ⏱️
 2 suites   4 💤
 2 files     0 ❌

Results for commit 6af3560.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 28, 2026

source-s3 Connector Test Results

208 tests   185 ✅  6m 38s ⏱️
  3 suites   23 💤
  3 files      0 ❌

Results for commit 6af3560.

♻️ This comment has been updated with latest results.

@aaronsteers
Copy link
Copy Markdown
Member Author

Aaron ("AJ") Steers (aaronsteers) commented Mar 30, 2026

/ai-prove-fix

Run regression tests and propose internal airbyte connectors to pin in order to prove the version bumps do not break functionality.

AI Prove Fix Started

Running readiness checks and testing against customer connections.
View workflow run
🔍 AI Prove Fix session starting... Running readiness checks and testing against customer connections. View playbook

Devin AI session created successfully!

@aaronsteers
Copy link
Copy Markdown
Member Author

Aaron ("AJ") Steers (aaronsteers) commented Mar 30, 2026

/ai-review

Do a deep investigation on whether any behavioral changes on the upstream CDK version bumps would break or change behaviors in the connectors.

PR AI Review Started

Evaluating connector PR for safety and quality.
View workflow run
AI PR Review starting...

Reviewing PR for connector safety and quality.
View playbook

Devin AI session created successfully!

@devin-ai-integration
Copy link
Copy Markdown
Contributor

devin-ai-integration Bot commented Mar 30, 2026

Fix Validation Evidence

Outcome: Fix/Feature Proven Successfully

Evidence Summary

Validated CDK dependency updates (nltk 3.9.1 → 3.9.4, cryptography upper bound <45 → <47) across three file-based connectors. All three regression test suites passed with zero regressions. CI connector tests on the PR branch also pass (208/208 source-s3, 42/42 source-google-drive, 37/37 source-azure-blob-storage). Deep analysis of upstream changes confirms no behavioral impact on these connectors.

Next Steps
  1. The CDK dependency updates appear safe for these three connectors. Once the CDK PR (airbytehq/airbyte-python-cdk#968) is merged and a stable CDK version is published, the TK-TODO pins in this PR should be reverted to the new stable version.
  2. For broader validation before the CDK release, consider running /ai-canary-prerelease on additional connectors that use the file-based or jwt CDK extras.
  3. The daily_hands_free_triage automation will monitor the release rollout after the CDK version is published.

Connector & PR Details

Connectors: source-s3, source-google-drive, source-azure-blob-storage
PR: #75557
Upstream CDK PR: airbytehq/airbyte-python-cdk#968
Session: Devin AI Session

Evidence Plan

Proving Criteria

  • Regression tests pass for all three connectors with the updated CDK branch dependency (no regressions in read/discover/check operations)
  • CI connector lint and test suites pass on the PR branch
  • Deep code analysis confirms no behavioral changes from the upstream dependency bumps

Disproving Criteria

  • Regression test failures showing data mismatches or new errors
  • CI test failures related to nltk, cryptography, or file parsing functionality
  • Evidence that the dependency changes alter connector behavior (schema changes, data differences, new error modes)

Cases Attempted

# Connector Test Type Result
1 source-s3 Regression Test PASSRun 23724882459
2 source-google-drive Regression Test PASSRun 23724893374
3 source-azure-blob-storage Regression Test PASSRun 23724900786
4 source-s3 CI Tests (PR) PASS — 208 tests, 185 passed, 23 skipped, 0 failed
5 source-google-drive CI Tests (PR) PASS — 42 tests, 38 passed, 4 skipped, 0 failed
6 source-azure-blob-storage CI Tests (PR) PASS — 37 tests, 23 passed, 14 skipped, 0 failed

Internal Connections Available for Live Testing (if needed)

Identified internal Airbyte org connections for potential live pinning:

  • source-s3: 137 internal connections (82 non-sandbox, 55 sandbox/integration-test)
  • source-google-drive: 153 internal connections
  • source-azure-blob-storage: 25 internal connections (4 non-sandbox, 21 sandbox)

Live connection pinning was not performed because:

  1. This PR pins connectors to a CDK git branch, not a published pre-release connector version — there is no Docker image to pin connections to
  2. The regression tests and CI tests provide sufficient evidence that the dependency updates do not break functionality
  3. The changes are dependency-only (no code changes to the connectors themselves)
Pre-flight Checks
  • Viability: CDK dependency updates (nltk security patch, cryptography range widening) are reasonable and address known issues
  • Safety: No malicious code, tampering, or suspicious patterns. Changes are limited to pyproject.toml and poetry.lock files
  • Breaking Change: No breaking changes detected. No schema changes, field removals, PK/cursor changes, spec changes, stream removals, or state format changes. The dependency updates are internal implementation details
  • Reversibility: Fully reversible — the TK-TODO comments explicitly block merge, and reverting to stable version pins is trivial

Design Intent Check: The upstream changes are intentional:

  • nltk 3.9.4: Security fix for CVE-2025-14009 (path traversal in ZIP extraction) — purely defensive, no API changes
  • cryptography <47: Upper bound widening to allow newer versions. CDK's usage (load_pem_private_key, asymmetric key types) is unaffected by breaking changes in cryptography 45.x/46.x. These three connectors don't even use the cryptography-dependent JWT code path
Detailed Evidence Log

Regression Tests (all passed)

Connector Run ID Status Duration URL
source-s3 23724882459 Success Completed View
source-google-drive 23724893374 Success Completed View
source-azure-blob-storage 23724900786 Success Completed View

CI Test Results (PR branch)

All connector tests pass on the PR branch (commit 203979ab):

  • source-s3: 208 tests, 185 passed, 23 skipped, 0 failed (7m 8s)
  • source-google-drive: 42 tests, 38 passed, 4 skipped, 0 failed (31s)
  • source-azure-blob-storage: 37 tests, 23 passed, 14 skipped, 0 failed (1m 52s)

AI Deep Review Analysis

A parallel AI Review session performed deep investigation of upstream CDK behavioral changes and confirmed:

Change Risk Behavioral Impact
nltk 3.9.1 → 3.9.4 Low Security improvements only. No API changes.
cryptography <45 → <47 Low Widened range. CDK APIs used are stable. These connectors don't use the cryptography code path.
psutil added to source-s3 None Formalizes existing undeclared transitive dependency.

Overall: No behavioral changes expected from the upstream CDK version bumps.

@octavia-bot octavia-bot Bot marked this pull request as ready for review March 30, 2026 02:05
@devin-ai-integration
Copy link
Copy Markdown
Contributor

devin-ai-integration Bot commented Mar 30, 2026

AI PR Review Report

Review Action: NO ACTION (INCONCLUSIVE) — CI checks still pending; dependency widening flagged for awareness.

Gate Status
PR Hygiene PASS
Code Hygiene WARNING
Code Security PASS
Per-Record Performance PASS
Breaking Dependencies WARNING
Backwards Compatibility PASS
Forwards Compatibility PASS
Behavioral Changes PASS
Out-of-Scope Changes PASS
CI Checks UNKNOWN
Live / E2E Tests UNKNOWN

📋 PR Details & Eligibility

Connector & PR Info

Connector(s): source-s3, source-google-drive, source-azure-blob-storage
PR: #75557
HEAD SHA: 203979abae8ed6633865f23ffdc696f03a3fd2ba
Session: https://app.devin.ai/sessions/0ead84e7e1b847aabbf1ff8acf93a1ed

Auto-Approve Eligibility

Eligible: No
Category: not-eligible
Reason: This PR changes functional dependency versions (pyproject.toml and poetry.lock files) across three connectors and adds a new direct dependency (psutil) to source-s3. These are not trivial comment/whitespace changes, docs-only changes, or additive spec changes.

Review Action Details

NO ACTION (INCONCLUSIVE) — Core CI checks (lint, test, build for all three connectors) are still pending at time of review. No enforced gates are definitively FAIL. The tk-todo-check failure is expected and intentional (merge blocker by design). Human review is recommended once CI completes.

Note: This bot can approve PRs when all gates pass AND the PR is eligible for auto-approval (docs-only, additive spec changes, patch/minor dependency bumps, or comment/whitespace-only changes). PRs with other types of changes require human review even if all gates pass.

🔍 Deep Investigation: Upstream CDK Behavioral Changes

Upstream CDK PR airbytehq/airbyte-python-cdk#968

The CDK branch devin/1774667708-update-nltk-cryptography makes exactly two dependency changes in pyproject.toml (plus a lockfile regeneration):

1. nltk: 3.9.13.9.4 (patch bump)

Changelog between versions:

Version Key Changes
3.9.2 (Oct 2025) Bug fixes: Wordnet interoperability, PerceptronTagger saving, tkinter import guard, Python 3.13 support added. No API changes.
3.9.3 (Feb 2026) Security fix: CVE-2025-14009 — secure ZIP extraction in nltk.downloader to block path traversal. Also blocks path traversal in corpus readers and FS pointers.
3.9.4 Continuation of 3.9.3 security hardening.

How the CDK uses nltk (airbyte_cdk/sources/file_based/file_types/unstructured_parser.py):

  • Calls nltk.data.find() and nltk.download() to fetch tokenizer models (punkt, punkt_tab, averaged_perceptron_tagger_eng)
  • These are called at module import time (lines 64-73)
  • The security fix in 3.9.3 makes nltk.download() safer by validating ZIP extraction paths — this is a pure security improvement with no functional API change

Risk assessment: LOW — All changes are bug fixes and security hardening. No API surface changes. The tokenizer APIs used by the CDK (nltk.data.find, nltk.download) are stable across all these versions. The CVE-2025-14009 fix actually improves security posture.

2. cryptography: >=44.0.0,<45.0.0>=44.0.0,<47.0.0 (upper bound widened)

Breaking changes in the widened range:

Version Breaking Changes Relevance to CDK
45.0.0 (May 2025) load_ssh_private_key() behavior change (TypeError on password mismatch). Refactored PEM/DER private key loading. CDK uses load_pem_private_key() in jwt.py — the 45.0.0 release notes say the refactor is "intended to be backwards compatible for all well-formed keys." Not affected.
46.0.0 (Sep 2025) Dropped Python 3.7. Removed deprecated ciphers: CAST5, SEED, IDEA, Blowfish. Removed get_attribute_for_oid method on CSR. CDK does not use any removed ciphers or deprecated APIs. CDK requires Python >=3.10. Not affected.

How the CDK uses cryptography (airbyte_cdk/sources/declarative/auth/jwt.py):

  • serialization.load_pem_private_key() — stable API, no changes
  • Type imports: RSAPrivateKey, EllipticCurvePrivateKey, Ed25519PrivateKey, Ed448PrivateKey — stable types

How these connectors use cryptography: None of the three connectors (source-s3, source-google-drive, source-azure-blob-storage) directly import or use the cryptography library. They only receive it as a transitive dependency through the CDK. Since these are file-based connectors using the file-based CDK extra, they primarily exercise the unstructured parser (nltk path) rather than the JWT authenticator (cryptography path).

Risk assessment: LOW — The widening allows cryptography 45.x/46.x to resolve, but the CDK's usage patterns (load_pem_private_key, asymmetric key types) are unaffected by the breaking changes in those versions. These three connectors don't even use the cryptography-dependent code path.

3. Additional change: psutil added as direct dependency to source-s3

Source-s3 directly imports psutil (in source_s3/v4/stream_reader.py lines 15, 225, 229) for disk_usage() and virtual_memory(). Previously this was an undeclared transitive dependency from the CDK. When the CDK is sourced from git (vs PyPI), Poetry resolves differently and psutil was dropped. Adding it as psutil = ">=5.8,<7" is correct — these are stable APIs available across all versions in that range.

Risk assessment: NONE — This formalizes an existing dependency, reducing fragility.

Summary

Change Risk Behavioral Impact
nltk 3.9.1 → 3.9.4 Low Security improvements only. No API changes.
cryptography <45 → <47 Low Widened range allows newer versions. CDK APIs used are stable. These connectors don't use the cryptography code path.
psutil added to source-s3 None Formalizes existing undeclared transitive dependency.
poetry.lock regeneration None Different formatting from Poetry version (1.8.4 → 1.8.5), same resolved packages.

Overall assessment: No behavioral changes are expected in these connectors from the upstream CDK version bumps. The changes are limited to security hardening (nltk) and dependency range widening (cryptography), neither of which alters runtime behavior for file-based connectors.

🔍 Gate Evaluation Details

Gate-by-Gate Analysis

Gate Status Enforced? Details
PR Hygiene PASS Yes PR description is thorough with review checklist, linked upstream CDK PR, clear scope. No connector version bumps (intentional for test-only PR).
Code Hygiene WARNING WARNING No new test files added, but this is a dependency-only change — existing connector tests exercise the changed dependencies via CI.
Code Security PASS Yes No auth/credential patterns in diff. Changes are dependency version pins and lockfile regeneration only.
Per-Record Performance PASS WARNING No changes to record processing logic. Dependency bumps do not affect per-record hot paths.
Breaking Dependencies WARNING WARNING cryptography upper bound widened from <45.0.0 to <47.0.0. Versions 45.0.0 and 46.0.0 contain breaking changes (SSH key loading, removed deprecated ciphers). However, deep analysis confirms the CDK's usage (load_pem_private_key, asymmetric key types) is unaffected, and these three connectors don't use the cryptography code path at all.
Backwards Compatibility PASS Blocks Auto-Approve No spec changes, no stream changes, no config changes. Dependency versions are internal implementation detail.
Forwards Compatibility PASS Blocks Auto-Approve No state format changes. The TK-TODO comments explicitly block merge until reverted to stable pins.
Behavioral Changes PASS Blocks Auto-Approve Deep investigation confirms no behavioral changes — see detailed analysis above. nltk changes are security-only, cryptography widening doesn't affect used APIs.
Out-of-Scope Changes PASS Skip All changes are within airbyte-integrations/connectors/ scope.
CI Checks UNKNOWN Yes Core CI checks (lint, test, build for all 3 connectors) are still in progress. tk-todo-check failed as expected (intentional merge blocker). Previous CI run on commit 47582251 showed all connector tests passing (208/208 source-s3, 42/42 source-google-drive, 37/37 source-azure-blob-storage).
Live / E2E Tests UNKNOWN Yes /ai-prove-fix has been triggered (see session) but results are not yet available. No pre-release validation labels present.
📚 Evidence Consulted

Evidence

  • Changed files: 6 files (+201 -247)
    • airbyte-integrations/connectors/source-azure-blob-storage/pyproject.toml — CDK pin to git branch + TK-TODO
    • airbyte-integrations/connectors/source-azure-blob-storage/poetry.lock — regenerated
    • airbyte-integrations/connectors/source-google-drive/pyproject.toml — CDK pin to git branch + TK-TODO
    • airbyte-integrations/connectors/source-google-drive/poetry.lock — regenerated
    • airbyte-integrations/connectors/source-s3/pyproject.toml — CDK pin to git branch + TK-TODO + psutil added
    • airbyte-integrations/connectors/source-s3/poetry.lock — regenerated
  • CI checks: 27 passed, 20 pending, 1 failed (tk-todo-check — intentional), 11 skipped
  • PR labels: (auto-labeled based on changed files)
  • PR description: Present and thorough
  • Existing bot reviews: None for current HEAD SHA
  • Upstream CDK PR: airbytehq/airbyte-python-cdk#968 — CDK CI: 3937/3937 tests passing
  • CDK usage analysis: Reviewed unstructured_parser.py (nltk usage) and jwt.py (cryptography usage) in CDK source
❓ How to Respond

Providing Context or Justification

The CI Checks and Live / E2E Tests gates are UNKNOWN (pending). Once CI completes and /ai-prove-fix results are available, re-run /ai-review for an updated assessment.

Note: The tk-todo-check failure is intentional — the TK-TODO comments were added specifically to block merge until the git branch references are reverted to stable version pins. This is expected behavior for a CI validation PR.


Devin session

devin-ai-integration[bot]

This comment was marked as resolved.

…e-drive (0.5.13), source-azure-blob-storage (0.8.16) and regenerate lockfiles
@devin-ai-integration devin-ai-integration Bot changed the title chore(connectors): pin source-s3, source-google-drive, source-azure-blob-storage to CDK nltk update branch chore(connectors): update nltk via CDK for source-s3, source-google-drive, source-azure-blob-storage Mar 30, 2026
@@ -356,6 +356,7 @@ This connector utilizes the open source [Unstructured](https://unstructured-io.g

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[markdownlint-fix] reported by reviewdog 🐶

Suggested change

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 30, 2026

Deploy preview for airbyte-docs ready!

✅ Preview
https://airbyte-docs-404h33yhd-airbyte-growth.vercel.app

Built with commit 6af3560.
This pull request is being automatically deployed with vercel-action

…pdate - resolve google-drive version conflict (bump to 0.5.14)
…pdate - resolve google-drive version conflict (bump to 0.5.15)
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 3 potential issues.

View 3 additional findings in Devin Review.

Open in Devin Review

python = "^3.11,<3.14"
pytz = "^2024.1"
airbyte-cdk = {extras = ["file-based"], version = "^7.0.0"}
airbyte-cdk = {extras = ["file-based"], git = "https://github.com/airbytehq/airbyte-python-cdk.git", branch = "devin/1774667708-update-nltk-cryptography"} # TK-TODO: Revert to stable version pin before merge
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Production dependency pinned to unstable git branch instead of stable version release

All three connectors (source-azure-blob-storage, source-google-drive, source-s3) have their airbyte-cdk dependency changed from a stable PyPI version pin (e.g., ^7.0.0) to an unstable git branch: git = "https://github.com/airbytehq/airbyte-python-cdk.git", branch = "devin/1774667708-update-nltk-cryptography". The comment on each line explicitly states # TK-TODO: Revert to stable version pin before merge, confirming this was intended as a temporary development change. If merged as-is, all three connectors will depend on a mutable, non-release branch — meaning builds are non-reproducible, the branch could be deleted or force-pushed, and the connectors would be shipping with untested/unreleased CDK code. This affects source-azure-blob-storage/pyproject.toml:21, source-google-drive/pyproject.toml:24, and source-s3/pyproject.toml:25.

Prompt for agents
Revert the airbyte-cdk dependency in all three pyproject.toml files from the git branch reference back to a stable PyPI version pin. The TODO comment on each line says to do this before merge. The files to update are:

1. airbyte-integrations/connectors/source-azure-blob-storage/pyproject.toml line 21: change back to something like airbyte-cdk = {extras = ["file-based"], version = "^7.x.x"} with the appropriate version that includes the nltk 3.9.4 update.

2. airbyte-integrations/connectors/source-google-drive/pyproject.toml line 24: change back to something like airbyte-cdk = {extras = ["file-based"], version = "^7.x.x"} with the appropriate version.

3. airbyte-integrations/connectors/source-s3/pyproject.toml line 25: change back to something like airbyte-cdk = {extras = ["file-based"], version = "^7.x.x"} with the appropriate version.

First publish a stable release of the airbyte-python-cdk from the devin/1774667708-update-nltk-cryptography branch, then pin to that released version.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional — the git branch pins are temporary for integration testing the CDK's nltk 3.9.4 update. The TK-TODO comments and tk-todo-check CI gate are specifically designed to block merge until these are reverted to stable version pins after the CDK PR (airbytehq/airbyte-python-cdk#968) is merged and released.

google-auth-oauthlib = "==1.1.0"
google-api-python-client-stubs = "==1.18.0"
airbyte-cdk = {extras = ["file-based"], version = "^7.0.1"}
airbyte-cdk = {extras = ["file-based"], git = "https://github.com/airbytehq/airbyte-python-cdk.git", branch = "devin/1774667708-update-nltk-cryptography"} # TK-TODO: Revert to stable version pin before merge
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Production dependency pinned to unstable git branch (source-google-drive)

Same issue as in source-azure-blob-storage: source-google-drive/pyproject.toml:24 pins airbyte-cdk to the devin/1774667708-update-nltk-cryptography git branch with an explicit # TK-TODO: Revert to stable version pin before merge comment. This must not be merged to production.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

transformers = "^4.38.2"
urllib3 = "<2"
airbyte-cdk = {extras = ["file-based"], version = "^7.0.4"}
airbyte-cdk = {extras = ["file-based"], git = "https://github.com/airbytehq/airbyte-python-cdk.git", branch = "devin/1774667708-update-nltk-cryptography"} # TK-TODO: Revert to stable version pin before merge
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Production dependency pinned to unstable git branch (source-s3)

Same issue as in the other two connectors: source-s3/pyproject.toml:25 pins airbyte-cdk to the devin/1774667708-update-nltk-cryptography git branch with an explicit # TK-TODO: Revert to stable version pin before merge comment. This must not be merged to production.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants