Skip to content

fix(firestore-bigquery-change-tracker): unmask BQ errors and stop spurious table updates#2817

Open
cabljac wants to merge 1 commit intonextfrom
fix/bq-change-tracker-bugs
Open

fix(firestore-bigquery-change-tracker): unmask BQ errors and stop spurious table updates#2817
cabljac wants to merge 1 commit intonextfrom
fix/bq-change-tracker-bugs

Conversation

@cabljac
Copy link
Copy Markdown
Contributor

@cabljac cabljac commented May 8, 2026

Summary

Two small but high-impact fixes in firestore-bigquery-change-tracker. Both bugs were diagnosed in detail by the issue reporters with the offending lines called out.

#2801: dead-letter path crashes on warm containers

bigquery/handleFailedTransactions.ts called db.settings({ ignoreUndefinedProperties: true }) inside the exported async body. The Firestore Node SDK contract requires settings() be called at most once and before any other method runs on the singleton. On warm Cloud Functions containers both conditions are violated by the second invocation; the resulting Firestore has already been initialized throw masks the real BigQuery insert failure that put us on the dead-letter path in the first place.

Fix: guard with a per-instance `Set`, plus a `try/catch` so that even if the singleton was used elsewhere in the process before this module's first call, the dead-letter write to the backup collection still proceeds and the real BQ error surfaces upstream.

#2194: `tableRequiresUpdate` always returns true

Two independent bugs in the same function:

  1. Clustering check:
    ```ts
    const configCluster = JSON.stringify(config.clustering); // "null" when unconfigured
    const tableCluster = JSON.stringify(metadata.clustering?.fields || []); // "[]"
    if (configCluster !== tableCluster) return true; // always true
    ```
    Fixed by normalising null/undefined to `[]` before stringifying: `JSON.stringify(config.clustering || [])`.

  2. `pathParamsColExists` strict-equality against `undefined`:
    `bigquery/index.ts` assigned the result of `fields.find(...)` (which is `Field | undefined`) to a parameter typed `boolean`. The check `!!config.wildcardIds !== pathParamsColExists` then resolved to `false !== undefined` -> always true.
    Fixed by coercing in the caller (`!!fields.find(...)`). Applied to `documentIdColExists` and `oldDataColExists` too since the function signature already promises `boolean` for all three (downstream checks happened to tolerate `undefined` for those two, but the type was being violated).

Test plan

  • `npm run build` (`tsc`) clean in `firestore-bigquery-export/firestore-bigquery-change-tracker`
  • CI integration tests against the configured BQ project (the existing `tests` directory hits a real BQ dataset; not runnable locally without `PROJECT_ID` and credentials)

The diff is intentionally tight (3 files, +23 / -8). I haven't added new integration tests since they all target a live BigQuery project; happy to add unit-level mocks for both fixes in a follow-up if reviewers prefer.

…rious table updates

Fixes #2801, fixes #2194.

#2801 - handleFailedTransactions

The dead-letter path called `db.settings({ ignoreUndefinedProperties: true })`
on every invocation. `Firestore.settings()` throws if the singleton has already
been used, so on warm Cloud Functions containers the second invocation crashed
with "Firestore has already been initialized", masking the original BigQuery
insert error that triggered the dead-letter path in the first place.

Guard the call with a per-instance Set and a try/catch: settings is applied
once per process per instance, and any throw is swallowed so the dead-letter
write to the backup collection still runs and the real BQ error surfaces.

#2194 - tableRequiresUpdate always returning true

Two bugs:

1. Clustering check: `JSON.stringify(config.clustering)` returns "null" when
   clustering is not configured, but `JSON.stringify(metadata.clustering?.fields || [])`
   returns "[]". They never match, so the table is always flagged as needing
   update. Normalise null/undefined to `[]` before stringifying.

2. `pathParamsColExists` was assigned the result of `fields.find()` (which is
   `Field | undefined`), then strict-compared against a boolean
   (`!!config.wildcardIds !== pathParamsColExists`). When the column is absent,
   `false !== undefined` is always true. The function signature already
   declares these params as `boolean`; coerce in the caller. Same fix applied
   to `documentIdColExists` and `oldDataColExists` for type-honesty (they
   already worked correctly under truthy checks but the type was being violated).
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the robustness of the BigQuery change tracker by handling undefined clustering configurations and safely applying Firestore settings during failed transactions to avoid runtime errors. It also ensures boolean values for column existence checks. Feedback suggests enhancing type safety for the Firestore instance ID and using the more idiomatic Array.prototype.some() method instead of !!find() for checking the presence of specific columns.

Comment on lines +10 to +12
const settingsApplied = new Set<string>();

const getConfiguredFirestore = (instanceId: string) => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The instanceId parameter and the settingsApplied Set should ideally account for undefined values to be type-safe, as config.firestoreInstanceId is optional in the configuration. While the current caller ensures a default value, making this utility more robust prevents potential type errors in other contexts.

Suggested change
const settingsApplied = new Set<string>();
const getConfiguredFirestore = (instanceId: string) => {
const settingsApplied = new Set<string | undefined>();
const getConfiguredFirestore = (instanceId?: string) => {

db.settings({
ignoreUndefinedProperties: true,
});
const db = getConfiguredFirestore(config.firestoreInstanceId!);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

With the updated signature of getConfiguredFirestore, the non-null assertion is no longer necessary.

Suggested change
const db = getConfiguredFirestore(config.firestoreInstanceId!);
const db = getConfiguredFirestore(config.firestoreInstanceId);

Comment on lines +344 to 346
const documentIdColExists = !!fields.find(
(column) => column.name === "document_id"
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using Array.prototype.some() is more idiomatic and readable when you only need to check for the existence of an element that matches a condition, as it returns a boolean directly.

Suggested change
const documentIdColExists = !!fields.find(
(column) => column.name === "document_id"
);
const documentIdColExists = fields.some(
(column) => column.name === "document_id"
);

Comment on lines +347 to 349
const pathParamsColExists = !!fields.find(
(column) => column.name === "path_params"
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using Array.prototype.some() is more idiomatic and readable when you only need to check for the existence of an element that matches a condition, as it returns a boolean directly.

Suggested change
const pathParamsColExists = !!fields.find(
(column) => column.name === "path_params"
);
const pathParamsColExists = fields.some(
(column) => column.name === "path_params"
);

Comment on lines +351 to 353
const oldDataColExists = !!fields.find(
(column) => column.name === "old_data"
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using Array.prototype.some() is more idiomatic and readable when you only need to check for the existence of an element that matches a condition, as it returns a boolean directly.

Suggested change
const oldDataColExists = !!fields.find(
(column) => column.name === "old_data"
);
const oldDataColExists = fields.some(
(column) => column.name === "old_data"
);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant