refactor(bigframes): Modularize compiler routing as proxy executor#16907
TrevorBergeron wants to merge 10 commits into main
Code Review
This pull request introduces the DualCompilerProxyExecutor to manage the transition between ibis and sqlglot compilers, moving fallback logic out of BigQueryCachingExecutor. The BigQueryCachingExecutor is updated to support specific compiler selection and shared caching. Review feedback points out several critical issues in the new proxy executor, including the use of an undefined self._main attribute, missing imports for bigquery and CacheConfig, and type errors when passing labels to ExecutionSpec. Additionally, a bug was identified in bq_caching_executor.py where extra_labels were not correctly passed to the query execution method.
/gemini review
Code Review
This pull request introduces a DualCompilerProxyExecutor to facilitate the rollout of a new sqlglot-based compiler alongside the existing ibis implementation. It also refactors BigQueryCachingExecutor to support configurable compilers and shared caching. Feedback highlights that several methods in the new proxy executor (to_sql, dry_run) are currently hardcoded to use the legacy compiler, ignoring experimental settings. Additionally, there are recommendations to improve the robustness of the fallback logic by catching a broader range of exceptions and to avoid using mutable default arguments in constructor signatures.
| return self._ibis_executor.to_sql( | ||
| array_value, | ||
| offset_column=offset_column, | ||
| ordered=ordered, | ||
| enable_cache=enable_cache, | ||
| ) |
The to_sql method currently always routes to the ibis executor, ignoring the sql_compiler experiment option. It should follow the same routing logic as execute() to ensure consistency across the API. When falling back to ibis in stable mode, ensure the exception is logged to aid in debugging and prevent masking underlying issues.
compiler_option = bigframes.options.experiments.sql_compiler
if compiler_option == "legacy":
    return self._ibis_executor.to_sql(
        array_value,
        offset_column=offset_column,
        ordered=ordered,
        enable_cache=enable_cache,
    )
elif compiler_option == "experimental":
    return self._sqlglot_executor.to_sql(
        array_value,
        offset_column=offset_column,
        ordered=ordered,
        enable_cache=enable_cache,
    )
else:  # stable
    try:
        return self._sqlglot_executor.to_sql(
            array_value,
            offset_column=offset_column,
            ordered=ordered,
            enable_cache=enable_cache,
        )
    except Exception as e:
        logger.warning(f"sqlglot-first to_sql failed, falling back to ibis: {e}")
        return self._ibis_executor.to_sql(
            array_value,
            offset_column=offset_column,
            ordered=ordered,
            enable_cache=enable_cache,
        )
References
- Avoid broad except Exception: blocks that silently return None. Instead, log the exception (e.g., using logger.warning) to aid in debugging and prevent masking underlying issues.
| Does not actually execute the data but will get stats and indicate any invalid query errors. | ||
| """ | ||
| return self._ibis_executor.dry_run(array_value, ordered=ordered) |
Similar to to_sql, the dry_run method is hardcoded to use the ibis executor. It should be updated to route to the appropriate executor based on the configuration. Ensure that any broad exception caught during the fallback in stable mode is logged to aid in debugging.
compiler_option = bigframes.options.experiments.sql_compiler
if compiler_option == "legacy":
    return self._ibis_executor.dry_run(array_value, ordered=ordered)
elif compiler_option == "experimental":
    return self._sqlglot_executor.dry_run(array_value, ordered=ordered)
else:  # stable
    try:
        return self._sqlglot_executor.dry_run(array_value, ordered=ordered)
    except Exception as e:
        logger.warning(f"sqlglot-first dry_run failed, falling back to ibis: {e}")
        return self._ibis_executor.dry_run(array_value, ordered=ordered)
References
- Avoid broad except Exception: blocks that silently return None. Instead, log the exception (e.g., using logger.warning) to aid in debugging and prevent masking underlying issues.
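Since the to_sql and dry_run suggestions repeat the same three-way branch, the routing could be factored into one helper. The following is only a sketch under assumptions: route_call and its parameters are hypothetical names that do not appear in the PR, and the option values "legacy", "experimental", and "stable" are taken from the suggestions above.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def route_call(
    compiler_option: str,
    ibis_call: Callable[[], Any],
    sqlglot_call: Callable[[], Any],
    *,
    op_name: str,
) -> Any:
    """Dispatch one operation by compiler option (hypothetical helper, not from the PR)."""
    if compiler_option == "legacy":
        return ibis_call()
    if compiler_option == "experimental":
        # No fallback here: errors from the sqlglot path propagate to the caller.
        return sqlglot_call()
    # Any other value is treated as "stable": try sqlglot first, then ibis.
    try:
        return sqlglot_call()
    except Exception as e:
        # Log before falling back so the sqlglot failure is not silently masked.
        logger.warning("sqlglot-first %s failed, falling back to ibis: %s", op_name, e)
        return ibis_call()
```

With a helper like this, each proxy method would pass two zero-argument callables (for example lambdas wrapping self._ibis_executor.dry_run and self._sqlglot_executor.dry_run), which keeps the routing rules in one place.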
| @@ -84,10 +81,14 @@ def __init__( | |||
| enable_polars_execution: bool = False, | |||
| publisher: bigframes.core.events.Publisher, | |||
| labels: Mapping[str, str] = {}, | |||
Using a mutable dictionary as a default value for an argument is a common Python pitfall. While it might not be causing issues here, it is generally safer to use None and initialize it to an empty dictionary inside the method to avoid shared state across instances if the dictionary were ever mutated.
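A small standalone illustration of the pitfall described here, assuming nothing about the PR's classes: SharedDefault and FreshDefault are hypothetical names used only for the demonstration.

```python
from typing import Dict, Mapping, Optional


class SharedDefault:
    # The {} default is created once, when the function is defined,
    # so every call that omits `labels` receives the same dict object.
    def __init__(self, labels: Dict[str, str] = {}):
        self.labels = labels


class FreshDefault:
    def __init__(self, labels: Optional[Mapping[str, str]] = None):
        # Copy into a new dict so each instance owns its own labels.
        self.labels = dict(labels) if labels is not None else {}


a, b = SharedDefault(), SharedDefault()
a.labels["k"] = "v"
shared_leaked = "k" in b.labels  # the mutation made through `a` is visible through `b`

c, d = FreshDefault(), FreshDefault()
c.labels["k"] = "v"
fresh_leaked = "k" in d.labels  # `d` is unaffected by the mutation made through `c`
```

The problem only appears if the default dict is mutated, which matches the comment's note that it may not be causing issues in the current code.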
| metrics: Optional[bigframes.session.metrics.ExecutionMetrics] = None, | ||
| enable_polars_execution: bool = False, | ||
| publisher: bigframes.core.events.Publisher, | ||
| labels: Mapping[str, str] = {}, |
Using a mutable dictionary as a default value for the labels argument is discouraged in Python. It is better to use None as the default and initialize it to {} within the __init__ method.
| labels: Mapping[str, str] = {}, | |
| labels: Optional[Mapping[str, str]] = None, | |
| ): | |
| labels = labels or {} | |
| self._enable_polars_execution = enable_polars_execution |
| {_COMPILER_LABEL_KEY: f"sqlglot-{correlation_id}"} | ||
| ), | ||
| ) | ||
| except google.cloud.exceptions.BadRequest as e: |
The fallback logic only catches google.cloud.exceptions.BadRequest. If the sqlglot compiler fails locally, the execution will crash instead of falling back to ibis. Consider catching a broader range of exceptions during the sqlglot attempt, but ensure they are logged (e.g., using logger.warning) to prevent masking underlying issues as per repository guidelines.
References
- Avoid broad except Exception: blocks that silently return None. Instead, log the exception (e.g., using logger.warning) to aid in debugging and prevent masking underlying issues.
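To make the difference concrete, here is a minimal sketch with stand-in types. The BadRequest class below is a local placeholder for google.cloud.exceptions.BadRequest, and LocalCompileError and run_with_fallback are hypothetical names that are not taken from the PR.

```python
import logging

logger = logging.getLogger(__name__)


class BadRequest(Exception):
    """Local stand-in for google.cloud.exceptions.BadRequest."""


class LocalCompileError(Exception):
    """Hypothetical error raised during compilation, before any request is sent."""


def run_with_fallback(primary, fallback, catch=(Exception,)):
    """Run `primary`; on a caught error, log it and run `fallback` instead."""
    try:
        return primary()
    except catch as e:
        # Log the failure so the fallback does not hide the underlying problem.
        logger.warning(
            "sqlglot attempt failed (%s), falling back to ibis: %s",
            type(e).__name__,
            e,
        )
        return fallback()


def failing_compile():
    # Simulates the sqlglot compiler failing locally.
    raise LocalCompileError("unsupported expression")
```

Calling run_with_fallback(failing_compile, ibis_path, catch=(BadRequest,)) lets the local error propagate, which is the crash the comment describes, while the default catch=(Exception,) logs the error and returns the fallback result.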