Bug: file_based_stream_reader.filter_files_by_globs_and_start_date uses strict datetime parsing, rejecting valid ISO8601 dates #920

@devin-ai-integration

Description

AbstractFileBasedStreamReader.filter_files_by_globs_and_start_date() in airbyte_cdk/sources/file_based/file_based_stream_reader.py (line 108) uses datetime.strptime(self.config.start_date, self.DATE_TIME_FORMAT) where DATE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ".

This strictly requires microsecond digits in the start_date value. A valid ISO8601 date like 2025-01-01T00:00:00Z (without microseconds) is rejected with:

ValueError: time data '2025-01-01T00:00:00Z' does not match format '%Y-%m-%dT%H:%M:%S.%fZ'
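The failure can be reproduced without the CDK installed. The following is a minimal sketch that uses only the standard library and the format string quoted above; `try_strict_parse` is a hypothetical helper introduced for illustration and is not part of the CDK.

```python
from datetime import datetime

# Format string quoted in this issue as DATE_TIME_FORMAT.
DATE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def try_strict_parse(value: str):
    """Hypothetical helper: return the parsed datetime, or the error text on failure."""
    try:
        return datetime.strptime(value, DATE_TIME_FORMAT)
    except ValueError as exc:
        return str(exc)


# With microseconds: parses successfully.
with_micros = try_strict_parse("2025-01-01T00:00:00.000000Z")

# Without microseconds: strptime raises ValueError because ".%f" has nothing to match.
without_micros = try_strict_parse("2025-01-01T00:00:00Z")

print(with_micros)
print(without_micros)
```

The second call returns the same "does not match format" message shown above, since `%f` requires at least one fractional digit after the literal dot.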

Context

CDK v7.7.1 (via airbytehq/airbyte-python-cdk PR 887) fixed the spec validation side of this issue by:

  1. Updating the JSON Schema start_date pattern to accept flexible date formats
  2. Updating the Pydantic validator to use ab_datetime_try_parse

However, the runtime code path in filter_files_by_globs_and_start_date was not updated and still uses the strict datetime.strptime with %Y-%m-%dT%H:%M:%S.%fZ. This means config validation passes but the connector fails at runtime when listing/filtering files.

Impact

This affects all file-based connectors that inherit from AbstractFileBasedStreamReader, including:

  • source-sharepoint-enterprise
  • source-microsoft-sharepoint
  • source-microsoft-onedrive
  • source-s3
  • source-gcs
  • source-azure-blob-storage
  • source-sftp-bulk

The issue is triggered when the Terraform provider (or any other integration) normalizes datetime values by stripping microseconds (e.g., 2025-01-01T00:00:00.000000Z → 2025-01-01T00:00:00Z).

Steps to Reproduce

  1. Configure any file-based connector with start_date = "2025-01-01T00:00:00Z" (no microseconds)
  2. Run a sync or discover
  3. Observe ValueError from filter_files_by_globs_and_start_date

Suggested Fix

Update filter_files_by_globs_and_start_date in file_based_stream_reader.py to use the flexible ab_datetime_try_parse helper from datetime_helpers instead of strict datetime.strptime. This is consistent with the approach already taken for spec validation in CDK v7.7.1.
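As a rough illustration of the intended behavior, the sketch below accepts start_date values both with and without microseconds. It is a standard-library stand-in so that it runs without the CDK; the actual fix would presumably call ab_datetime_try_parse instead. The names `parse_start_date`, `is_on_or_after_start_date`, and `_CANDIDATE_FORMATS` are assumptions made for this example, and the comparison logic is not taken from the CDK source.

```python
from datetime import datetime
from typing import Optional

# Hypothetical list of accepted layouts. The first entry is the strict
# format from this issue; the second drops the fractional seconds.
_CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%fZ",
    "%Y-%m-%dT%H:%M:%SZ",
)


def parse_start_date(value: str) -> Optional[datetime]:
    """Try each candidate format in order; return None if none of them match."""
    for fmt in _CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def is_on_or_after_start_date(last_modified: datetime, start_date: Optional[str]) -> bool:
    """Hypothetical filter: keep a file when no start_date is set or it is recent enough."""
    if start_date is None:
        return True
    parsed = parse_start_date(start_date)
    if parsed is None:
        raise ValueError(f"Unsupported start_date format: {start_date!r}")
    return last_modified >= parsed


# Both spellings of the same instant now resolve to the same datetime.
a = parse_start_date("2025-01-01T00:00:00Z")
b = parse_start_date("2025-01-01T00:00:00.000000Z")
keep = is_on_or_after_start_date(datetime(2025, 3, 1), "2025-01-01T00:00:00Z")
```

Returning None on failure mirrors the "try parse" style named in the suggested fix, which leaves the caller to decide whether an unparseable value is an error.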

Related

  • Oncall issue: airbytehq/oncall#9390
  • CDK spec fix: airbytehq/airbyte-python-cdk PR 887 (CDK v7.7.1)

Requested by Aaron ("AJ") Steers (@aaronsteers).
