
refactor(bigframes): Modularize compiler routing as proxy executor#16907

Open
TrevorBergeron wants to merge 10 commits into main from tbegeron_proxy_exec

Conversation

@TrevorBergeron
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces the DualCompilerProxyExecutor to manage the transition between ibis and sqlglot compilers, moving fallback logic out of BigQueryCachingExecutor. The BigQueryCachingExecutor is updated to support specific compiler selection and shared caching. Review feedback points out several critical issues in the new proxy executor, including the use of an undefined self._main attribute, missing imports for bigquery and CacheConfig, and type errors when passing labels to ExecutionSpec. Additionally, a bug was identified in bq_caching_executor.py where extra_labels were not correctly passed to the query execution method.
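The shared-caching idea in this summary can be illustrated with a minimal sketch. `CompilerCachingExecutor`, its `execute` method, and the dict used as a cache are hypothetical stand-ins, not bigframes classes: two executors bound to different compilers receive the same cache object, so a result produced through one is reused by the other instead of being compiled again.

```python
from typing import Callable, Dict


class CompilerCachingExecutor:
    """Hypothetical executor bound to one compiler that uses a shared cache."""

    def __init__(
        self,
        compiler: str,
        compile_fn: Callable[[str], str],
        cache: Dict[str, str],
    ) -> None:
        self.compiler = compiler
        self._compile = compile_fn
        # Not copied: both executors hold a reference to the same dict.
        self._cache = cache
        self.compile_calls = 0

    def execute(self, plan: str) -> str:
        # A result produced through either executor is reused here.
        if plan in self._cache:
            return self._cache[plan]
        self.compile_calls += 1
        # Stand-in for compiling the plan and materializing its result.
        result = self._compile(plan)
        self._cache[plan] = result
        return result


shared_cache: Dict[str, str] = {}
ibis_like = CompilerCachingExecutor("ibis", lambda p: f"/* ibis */ {p}", shared_cache)
sqlglot_like = CompilerCachingExecutor(
    "sqlglot", lambda p: f"/* sqlglot */ {p}", shared_cache
)

first = sqlglot_like.execute("SELECT 1")
# Cache hit: the ibis-bound executor never invokes its compiler.
second = ibis_like.execute("SELECT 1")
```

Because the cache is keyed by the plan rather than by the compiler, switching compilers (or falling back from one to the other) does not discard work that has already been done.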

Comment thread packages/bigframes/bigframes/session/proxy_executor.py Outdated
Comment thread packages/bigframes/bigframes/session/bq_caching_executor.py Outdated
Comment thread packages/bigframes/bigframes/session/proxy_executor.py
Comment thread packages/bigframes/bigframes/session/proxy_executor.py Outdated
Comment thread packages/bigframes/bigframes/session/proxy_executor.py Outdated
Comment thread packages/bigframes/bigframes/session/proxy_executor.py Outdated
Comment thread packages/bigframes/bigframes/session/proxy_executor.py Outdated
Comment thread packages/bigframes/bigframes/session/proxy_executor.py
@TrevorBergeron TrevorBergeron changed the title refactor: Modularize compiler routing as proxy executor refactor(bigframes): Modularize compiler routing as proxy executor May 1, 2026
@TrevorBergeron TrevorBergeron added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 5, 2026
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 5, 2026
@TrevorBergeron TrevorBergeron added the kokoro:run Add this label to force Kokoro to re-run the tests. label May 5, 2026
@yoshi-kokoro yoshi-kokoro removed the kokoro:run Add this label to force Kokoro to re-run the tests. label May 5, 2026
@TrevorBergeron
Contributor Author

/gemini review

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a DualCompilerProxyExecutor to facilitate the rollout of a new sqlglot-based compiler alongside the existing ibis implementation. It also refactors BigQueryCachingExecutor to support configurable compilers and shared caching. Feedback highlights that several methods in the new proxy executor (to_sql, dry_run) are currently hardcoded to use the legacy compiler, ignoring experimental settings. Additionally, there are recommendations to improve the robustness of the fallback logic by catching a broader range of exceptions and to avoid using mutable default arguments in constructor signatures.

Comment on lines +94 to +99
return self._ibis_executor.to_sql(
array_value,
offset_column=offset_column,
ordered=ordered,
enable_cache=enable_cache,
)
Contributor


high

The to_sql method currently always routes to the ibis executor, ignoring the sql_compiler experiment option. It should follow the same routing logic as execute() to ensure consistency across the API. When falling back to ibis in stable mode, ensure the exception is logged to aid in debugging and prevent masking underlying issues.

        compiler_option = bigframes.options.experiments.sql_compiler
        if compiler_option == "legacy":
            return self._ibis_executor.to_sql(
                array_value,
                offset_column=offset_column,
                ordered=ordered,
                enable_cache=enable_cache,
            )
        elif compiler_option == "experimental":
            return self._sqlglot_executor.to_sql(
                array_value,
                offset_column=offset_column,
                ordered=ordered,
                enable_cache=enable_cache,
            )
        else:  # stable
            try:
                return self._sqlglot_executor.to_sql(
                    array_value,
                    offset_column=offset_column,
                    ordered=ordered,
                    enable_cache=enable_cache,
                )
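            except Exception:
                # Assumes a module-level logger = logging.getLogger(__name__).
                # exc_info=True keeps the traceback instead of only str(e).
                logger.warning(
                    "sqlglot-first to_sql failed, falling back to ibis", exc_info=True
                )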
                return self._ibis_executor.to_sql(
                    array_value,
                    offset_column=offset_column,
                    ordered=ordered,
                    enable_cache=enable_cache,
                )
References
  1. Avoid broad except Exception: blocks that silently return None. Instead, log the exception (e.g., using logger.warning) to aid in debugging and prevent masking underlying issues.
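The three-way branch in this suggestion is repeated for `to_sql`, `dry_run`, and execution. One way to keep those copies from drifting apart is to centralize the routing in a single helper. `DualCompilerRouter`, `route`, and the `get_option` callable are hypothetical names (`get_option` stands in for reading `bigframes.options.experiments.sql_compiler`); this is a sketch under those assumptions, not the PR's implementation.

```python
import logging
from typing import Any, Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


class DualCompilerRouter:
    """Hypothetical helper that centralizes legacy/experimental/stable routing."""

    def __init__(
        self,
        ibis_executor: Any,
        sqlglot_executor: Any,
        get_option: Callable[[], str],
    ) -> None:
        self._ibis = ibis_executor
        self._sqlglot = sqlglot_executor
        # Stand-in for reading bigframes.options.experiments.sql_compiler.
        self._get_option = get_option

    def route(self, name: str, call: Callable[[Any], T]) -> T:
        option = self._get_option()
        if option == "legacy":
            return call(self._ibis)
        if option == "experimental":
            return call(self._sqlglot)
        # Any other value is treated as "stable": sqlglot first, logged ibis fallback.
        try:
            return call(self._sqlglot)
        except Exception:
            logger.warning(
                "sqlglot-first %s failed, falling back to ibis", name, exc_info=True
            )
            return call(self._ibis)

    def to_sql(self, array_value: Any, **kwargs: Any) -> str:
        return self.route("to_sql", lambda ex: ex.to_sql(array_value, **kwargs))

    def dry_run(self, array_value: Any, *, ordered: bool = True) -> Any:
        return self.route(
            "dry_run", lambda ex: ex.dry_run(array_value, ordered=ordered)
        )
```

With a helper like this, each public method reduces to a one-line call, and the logging behavior of the stable-mode fallback lives in exactly one place. Passing a callable rather than a method name avoids string-based `getattr` lookups.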


Does not actually execute the data but will get stats and indicate any invalid query errors.
"""
return self._ibis_executor.dry_run(array_value, ordered=ordered)
Contributor


high

Similar to to_sql, the dry_run method is hardcoded to use the ibis executor. It should be updated to route to the appropriate executor based on the configuration. Ensure that any broad exception caught during the fallback in stable mode is logged to aid in debugging.

        compiler_option = bigframes.options.experiments.sql_compiler
        if compiler_option == "legacy":
            return self._ibis_executor.dry_run(array_value, ordered=ordered)
        elif compiler_option == "experimental":
            return self._sqlglot_executor.dry_run(array_value, ordered=ordered)
        else:  # stable
            try:
                return self._sqlglot_executor.dry_run(array_value, ordered=ordered)
            except Exception:
                # Assumes a module-level logger = logging.getLogger(__name__).
                logger.warning(
                    "sqlglot-first dry_run failed, falling back to ibis", exc_info=True
                )
                return self._ibis_executor.dry_run(array_value, ordered=ordered)
References
  1. Avoid broad except Exception: blocks that silently return None. Instead, log the exception (e.g., using logger.warning) to aid in debugging and prevent masking underlying issues.
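The requirement that the stable-mode fallback be logged, rather than silently swallowed, can be pinned down with a unit test. The sketch below uses the standard library's `unittest.TestCase.assertLogs`; `dry_run_stable` and the `proxy_sketch` logger name are hypothetical stand-ins for the proxy's `dry_run` routing, not code from the PR.

```python
import logging
import unittest
from typing import Any, Callable

logger = logging.getLogger("proxy_sketch")


def dry_run_stable(
    sqlglot_dry_run: Callable[[str], Any],
    ibis_dry_run: Callable[[str], Any],
    plan: str,
) -> Any:
    """Hypothetical stable-mode dry run: sqlglot first, logged ibis fallback."""
    try:
        return sqlglot_dry_run(plan)
    except Exception:
        logger.warning(
            "sqlglot-first dry_run failed, falling back to ibis", exc_info=True
        )
        return ibis_dry_run(plan)


class DryRunFallbackTest(unittest.TestCase):
    def test_fallback_is_logged(self) -> None:
        def broken(plan: str) -> Any:
            raise NotImplementedError("op not supported by sqlglot")

        with self.assertLogs("proxy_sketch", level="WARNING") as captured:
            stats = dry_run_stable(broken, lambda plan: {"bytes": 0}, "plan")

        self.assertEqual(stats, {"bytes": 0})
        self.assertEqual(len(captured.records), 1)
        self.assertIn("falling back to ibis", captured.output[0])
        # exc_info=True keeps the original traceback attached to the record.
        self.assertIsNotNone(captured.records[0].exc_info)

    def test_success_does_not_touch_ibis(self) -> None:
        ibis_calls: list = []
        stats = dry_run_stable(
            lambda plan: {"compiler": "sqlglot"},
            lambda plan: ibis_calls.append(plan),
            "plan",
        )
        self.assertEqual(stats, {"compiler": "sqlglot"})
        self.assertEqual(ibis_calls, [])
```

Such a test can be run with `python -m unittest` or any runner that collects `unittest.TestCase` subclasses.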

@@ -84,10 +81,14 @@ def __init__(
enable_polars_execution: bool = False,
publisher: bigframes.core.events.Publisher,
labels: Mapping[str, str] = {},
Contributor


medium

Using a mutable dictionary as a default value for an argument is a common Python pitfall. While it might not be causing issues here, it is generally safer to use None and initialize it to an empty dictionary inside the method to avoid shared state across instances if the dictionary were ever mutated.
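The pitfall described here can be demonstrated in isolation. The two functions below are hypothetical and unrelated to the bigframes constructor: the default `{}` is evaluated once, when the `def` statement runs, so every call that relies on the default shares a single dict.

```python
from typing import Dict, Mapping, Optional


def add_label_bad(key: str, labels: Dict[str, str] = {}) -> Dict[str, str]:
    # The default dict is created once, at definition time, so every call
    # that omits `labels` mutates and returns the same object.
    labels[key] = "1"
    return labels


def add_label_good(key: str, labels: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    # A fresh dict per call; copying also leaves the caller's mapping untouched.
    result = dict(labels or {})
    result[key] = "1"
    return result


first_bad = add_label_bad("a")
second_bad = add_label_bad("b")  # still contains "a" from the first call

first_good = add_label_good("a")
second_good = add_label_good("b")
```

In the constructor under review, `labels` is typed as a read-only `Mapping`, so the shared default only becomes a problem if some code path mutates it; a `None` default removes that possibility regardless.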

metrics: Optional[bigframes.session.metrics.ExecutionMetrics] = None,
enable_polars_execution: bool = False,
publisher: bigframes.core.events.Publisher,
labels: Mapping[str, str] = {},
Contributor


medium

Using a mutable dictionary as a default value for the labels argument is discouraged in Python. It is better to use None as the default and initialize it to {} within the __init__ method.

Suggested change
labels: Mapping[str, str] = {},
labels: Optional[Mapping[str, str]] = None,
):
labels = labels or {}
self._enable_polars_execution = enable_polars_execution

{_COMPILER_LABEL_KEY: f"sqlglot-{correlation_id}"}
),
)
except google.cloud.exceptions.BadRequest as e:
Contributor


medium

The fallback logic only catches google.cloud.exceptions.BadRequest, so if the sqlglot compiler raises locally (during compilation, before any query reaches BigQuery), execution crashes instead of falling back to ibis. Consider catching a broader range of exceptions during the sqlglot attempt, but log them (e.g., using logger.warning) so that underlying issues are not masked, as the repository guidelines require.

References
  1. Avoid broad except Exception: blocks that silently return None. Instead, log the exception (e.g., using logger.warning) to aid in debugging and prevent masking underlying issues.
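One possible way to widen the fallback without hiding unrelated failures is to treat the two failure points differently: catch any exception raised while compiling locally with sqlglot, but catch only `BadRequest` once the query has been sent. In the sketch below, `BadRequest` is a local stand-in for `google.cloud.exceptions.BadRequest`, and `compile_sqlglot`, `compile_ibis`, and `run_query` are hypothetical callables; this illustrates the reviewer's suggestion and is not the PR's code.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


class BadRequest(Exception):
    """Stand-in for google.cloud.exceptions.BadRequest (HTTP 400 from BigQuery)."""


def execute_with_fallback(
    compile_sqlglot: Callable[[str], str],
    compile_ibis: Callable[[str], str],
    run_query: Callable[[str], str],
    plan: str,
) -> str:
    # Local compilation: any failure here means sqlglot cannot handle the plan.
    try:
        sql = compile_sqlglot(plan)
    except Exception:
        logger.warning("sqlglot compilation failed, falling back to ibis", exc_info=True)
        return run_query(compile_ibis(plan))

    # Remote execution: only an invalid-query rejection triggers the fallback.
    try:
        return run_query(sql)
    except BadRequest:
        logger.warning("BigQuery rejected sqlglot SQL, retrying with ibis", exc_info=True)
        return run_query(compile_ibis(plan))
```

With this split, transient errors raised while running the query, such as timeouts, still propagate instead of triggering a second query compiled with ibis.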

@TrevorBergeron TrevorBergeron marked this pull request as ready for review May 5, 2026 20:06
@TrevorBergeron TrevorBergeron requested review from a team as code owners May 5, 2026 20:06
@TrevorBergeron TrevorBergeron requested review from chelsea-lin and tswast and removed request for a team and chelsea-lin May 5, 2026 20:06

Labels

None yet

Projects

None yet


2 participants