Skip to content

fix(firestore-bigquery-export): accept ISO 8601 string partition values#2813

Open
cabljac wants to merge 5 commits intonextfrom
fix/bq-part
Open

fix(firestore-bigquery-export): accept ISO 8601 string partition values#2813
cabljac wants to merge 5 commits intonextfrom
fix/bq-part

Conversation

@cabljac
Copy link
Copy Markdown
Contributor

@cabljac cabljac commented May 8, 2026

Summary

  • Restore 0.2.x behavior where Firestore string fields used as time partition values (e.g. order_week: "2026-01-01" with TIME_PARTITIONING_FIELD_TYPE=DATE) are accepted. The 0.3.0 refactor narrowed PartitionValueConverter to Firestore Timestamp / timestamp-like / Date only, silently coercing strings to NULL and landing rows in the __NULL__ partition.
  • PartitionValueConverter.convert now parses strings via new Date(value). Unparseable strings still return null and trigger the existing warning log.
  • Defensive try/catch around the BigQuery formatter switch so any serialization failure degrades to null + warn rather than crashing the row write.

Fixes #2803

Test plan

  • cd firestore-bigquery-export/firestore-bigquery-change-tracker && npm run test:local -- src/__tests__/bigquery/partitioning/converter.test.ts — 29/29 pass
  • Full tracker npm run test:local — converter + config + unit suites pass; the 8 pre-existing failures in e2e.test.ts are present on origin/next baseline (verified) and unrelated to this change
  • npx tsc --noEmit — clean
  • Maintainer to bump extension.yaml version to 0.3.2 as part of release workflow (CHANGELOG entry added)

Restores 0.2.x behavior for Firestore string partition fields. The 0.3.0
refactor of PartitionValueConverter narrowed accepted inputs to Firestore
Timestamp / timestamp-like / Date, silently coercing strings (including
ISO 8601 dates such as "2026-01-01") to NULL and landing rows in the
__NULL__ partition.

PartitionValueConverter.convert now parses strings via new Date(value).
Unparseable strings still return null and trigger the existing warning.
Adds a defensive try/catch around the BigQuery formatter switch so any
serialization failure degrades to null + warn rather than crashing the
row write.

Fixes #2803
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request restores support for ISO 8601 date/datetime strings as partition field values, addressing a regression introduced in version 0.3.0. The changes update the PartitionValueConverter to parse string inputs and include comprehensive unit tests for various date formats and edge cases. Feedback indicates that the DATETIME conversion needs to strip the 'Z' suffix for BigQuery compatibility and suggests adding a default case to the switch statement for improved robustness.

…witch

Adds default: return null to PartitionValueConverter.convert. fieldType is
typed as a strict union, so exhaustiveness is already enforced at compile
time, but the value comes from external config — a runtime mismatch falls
through cleanly to null instead of returning undefined.

Per gemini-code-assist review on #2813.
@cabljac
Copy link
Copy Markdown
Contributor Author

cabljac commented May 8, 2026

Thanks for the review. Addressing each point:

1. Stripping Z from DATETIME: I verified empirically — BigQuery.datetime("2024-01-15T10:30:00.000Z") already normalizes to "2024-01-15 10:30:00.000" (canonical BQ DATETIME with space separator). Stripping Z would actually regress the output to "2024-01-15T10:30:00.000" (T-separator passthrough). This same Z-suffixed code path has been shipping since 0.3.0 for Firestore Timestamp / Date inputs, so we'd see existing breakage if the library rejected it.

2. default: return null: Good catch — applied in d0654fa. TS exhaustiveness covers this at compile time, but fieldType comes from external config so a runtime mismatch is worth handling cleanly.

3. Logging in catch: The caller Partitioning.getPartitionValue() already logs whenever convert() returns null (index.ts L79–L86) — so the existing warn fires for the catch path too. By design, the converter is pure and the caller logs.

@cabljac cabljac marked this pull request as ready for review May 8, 2026 09:44
@cabljac cabljac requested a review from CorieW May 8, 2026 09:49
@CorieW
Copy link
Copy Markdown
Member

CorieW commented May 8, 2026

The overall direction looks right, but I think this needs one more change before merging.

The current implementation uses new Date(value) to validate string partition values. That is risky because JavaScript date parsing is too permissive and can silently normalize invalid or partial dates.

For example:

new Date("2024-02-30").toISOString(); // Wrong: "2024-02-30" is not a real date, but JS normalizes it to "2024-03-01T00:00:00.000Z".
new Date("2024-01").toISOString();    // Wrong: this is a partial date, but JS treats it as "2024-01-01T00:00:00.000Z".
new Date("1").toISOString();          // Wrong: this is not a date format we should accept, but JS parses it as "2001-01-01T00:00:00.000Z".

…ion strings

JavaScript's Date parser is too permissive for partition value validation:

  new Date("2024-02-30")        -> 2024-03-01  (silent month overflow)
  new Date("2024-01")           -> 2024-01-01  (silent partial-date fill)
  new Date("1")                 -> 2001-01-01  (bare numeric as year)
  new Date("2023-02-29")        -> 2023-03-01  (non-leap-year overflow)
  new Date("2024-01-15T10:30")  -> engine-dependent local-time interpretation

Replaces the loose new Date() check with a strict YYYY-MM-DD prefix regex
that requires an explicit timezone designator (Z or +/-HH:MM) when a time
component is present, plus a calendar validator built via setUTCFullYear
that rejects calendar-invalid components like Feb 30, non-leap-year Feb 29,
month 13, and day 32.

Per CorieW review on #2813.
@cabljac
Copy link
Copy Markdown
Contributor Author

cabljac commented May 8, 2026

Good catch, agreed — new Date() alone is too permissive. Verified your three examples plus a couple more silent normalizations on Node 22:

"2024-02-30"        -> 2024-03-01  (month overflow)
"2024-01"           -> 2024-01-01  (partial-date fill)
"1"                 -> 2001-01-01  (bare numeric as year)
"2023-02-29"        -> 2023-03-01  (non-leap-year overflow)
"2024-01-15T10:30"  -> engine-dependent (local-time interpretation)

Tightened in 4392a4f:

  • Strict regex requiring YYYY-MM-DD prefix, with optional T (or space) followed by HH:MM[:SS[.ffffff]] and a required timezone designator (Z or ±HH:MM) when the time portion is present. Rejects partial dates, bare numerics, and implicit-local-time datetimes.
  • Calendar validator built via setUTCFullYear(y, m-1, d) then comparing UTC components. Catches Feb 30, non-leap-year Feb 29, month 13, day 32, etc. Uses setUTCFullYear rather than Date.UTC() to avoid the legacy 2-digit-year quirk for years 1–99.
  • Existing pinned tests (date-only, Z-suffixed datetime, timezone-shifted DATE column) still pass; added 10 new rejection tests.

BigQuery DATE / DATETIME / TIMESTAMP all reject year 0 — the supported
range is 0001-01-01 to 9999-12-31. The previous strict validator passed
"0000-01-01" through (setUTCFullYear(0, 0, 1) yields year 0, matching
input components) but BigQuery rejects it server-side, causing the same
NULL-partition symptom this PR is fixing.

Reject yearN < 1 client-side so the user gets a clear warning log.
@cabljac
Copy link
Copy Markdown
Contributor Author

cabljac commented May 8, 2026

One more gap caught on self-review, fixed in fba4ca3.

"0000-01-01" was passing through all my validation layers:

  • regex matches (\d{4} accepts 0000)
  • calendar validator passes (setUTCFullYear(0, 0, 1) yields year 0, matching the input components)
  • new Date("0000-01-01") parses fine

But BigQuery DATE / DATETIME / TIMESTAMP all reject year 0 (the supported range is 0001-01-01 to 9999-12-31), so the row would have failed server-side and ended up as the same NULL-partition symptom this PR is meant to fix, just on a more obscure input. Added an explicit yearN < 1 guard plus tests covering year 0 (rejected), year 0001 (accepted, BQ minimum), and year 9999 (accepted, BQ maximum).

Other edge cases I probed against the regex + validator + parser chain and confirmed are handled:

  • hour 25, minute 60, offset > 24h: new Date() returns NaN, the isNaN(parsed.getTime()) check catches
  • Feb 30 with a non-UTC offset (e.g. "2024-02-30T22:00:00-08:00"): calendar validator catches the Y/M/D mismatch independent of timezone
  • single-digit month like "2024-1-15": regex \d{2} requires two digits, rejected
  • comma fractional separator ("...,123Z"): regex requires ., rejected

41/41 converter tests passing.

@cabljac
Copy link
Copy Markdown
Contributor Author

cabljac commented May 8, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the firestore-bigquery-export extension to version 0.3.2, restoring support for ISO 8601 date and datetime strings as partition field values. The implementation introduces strict regex-based parsing and calendar validation within the PartitionValueConverter to ensure data integrity before BigQuery insertion. Comprehensive tests have been added to verify various date formats and boundary conditions. Feedback suggests optimizing performance by moving the regex to a static constant.

Comment on lines +37 to +39
const m = value.match(
/^(\d{4})-(\d{2})-(\d{2})(?:[T ](\d{2}):(\d{2})(?::(\d{2})(?:\.\d+)?)?(Z|[+-]\d{2}:?\d{2}))?$/
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The ISO 8601 regex is created on every call to convert for string values. To improve performance and maintainability, consider moving this regex to a static constant outside the convert method.

  private static readonly ISO_8601_REGEX = /^(\d{4})-(\d{2})-(\d{2})(?:[T ](\d{2}):(\d{2})(?::(\d{2})(?:\.\d+)?)?(Z|[+-]\d{2}:?\d{2}))?$/;

  convert(value: unknown): string | null {
    // ... existing code ...
    } else if (typeof value === "string") {
      const m = value.match(PartitionValueConverter.ISO_8601_REGEX);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🐛 [firestore-bigquery-export] String partition values silently become NULL after 0.3.0 (regression vs 0.2.x)

2 participants