|
| 1 | +## Adding a scalar operator |
| 2 | + |
| 3 | +For an example, see commit |
| 4 | +[c5b7fdae74a22e581f7705bc0cf5390e928f4425](https://github.com/googleapis/python-bigquery-dataframes/commit/c5b7fdae74a22e581f7705bc0cf5390e928f4425). |
| 5 | + |
| 6 | +To add a new scalar operator, follow these steps: |
| 7 | + |
| 8 | +1. **Define the operation dataclass:** |
| 9 | + - In `bigframes/operations/`, find the relevant file (e.g., `geo_ops.py` for geography functions) or create a new one. |
| 10 | + - Create a new dataclass inheriting from `base_ops.UnaryOp` for unary |
| 11 | + operators, `base_ops.BinaryOp` for binary operators, `base_ops.TernaryOp` |
| 12 | + for ternary operators, or `base_ops.NaryOp` for operators with many |
| 13 | + arguments. Note that these classes count the number of column-like |
| 14 | + arguments. A function that takes only a single column but several literal |
| 15 | + values would still be a `UnaryOp`. |
| 16 | + - Define the `name` of the operation and any parameters it requires. |
| 17 | + - Implement the `output_type` method to specify the data type of the result. |
| 18 | + |
| 19 | +2. **Export the new operation:** |
| 20 | + - In `bigframes/operations/__init__.py`, import your new operation dataclass and add it to the `__all__` list. |
| 21 | + |
| 22 | +3. **Implement the user-facing function (pandas-like):** |
| 23 | + |
| 24 | + - Identify the canonical function from pandas / geopandas / awkward array / |
| 25 | + another popular Python package that this operator implements. |
| 26 | + - Find the corresponding class in BigFrames. For example, the implementation |
| 27 | + for most geopandas.GeoSeries methods is in |
| 28 | + `bigframes/geopandas/geoseries.py`. Pandas Series methods are implemented |
| 29 | + in `bigframes/series.py` or one of the accessors, such as `StringMethods` |
| 30 | + in `bigframes/operations/strings.py`. |
| 31 | + - Create the user-facing function that will be called by users (e.g., `length`). |
| 32 | + - If the SQL function's behavior differs from pandas or geopandas in a way that |
| 33 | + can't be reconciled, raise a `NotImplementedError` with an appropriate message and |
| 34 | + link to the feedback form. |
| 35 | + - Add the docstring to the corresponding file in |
| 36 | + `third_party/bigframes_vendored`, modeled after pandas / geopandas. |
| 37 | + |
| 38 | +4. **Implement the user-facing function (SQL-like):** |
| 39 | + |
| 40 | + - In `bigframes/bigquery/_operations/`, find the relevant file (e.g., `geo.py`) or create a new one. |
| 41 | + - Create the user-facing function that will be called by users (e.g., `st_length`). |
| 42 | + - This function should take a `Series` for any column-like inputs, plus any other parameters. |
| 43 | + - Inside the function, call `series._apply_unary_op`, |
| 44 | + `series._apply_binary_op`, or similar, passing the operation dataclass you |
| 45 | + created. |
| 46 | + - Add a comprehensive docstring with examples. |
| 47 | + - In `bigframes/bigquery/__init__.py`, import your new user-facing function and add it to the `__all__` list. |
| 48 | + |
| 49 | +5. **Implement the compilation logic:** |
| 50 | + - In `bigframes/core/compile/scalar_op_compiler.py`: |
| 51 | + - If the BigQuery function has a direct equivalent in Ibis, you can often reuse an existing Ibis method. |
| 52 | + - If not, define a new Ibis UDF using `@ibis_udf.scalar.builtin` to map to the specific BigQuery function signature. |
| 53 | + - Create a new compiler implementation function (e.g., `geo_length_op_impl`). |
| 54 | + - Register this function to your operation dataclass using `@scalar_op_compiler.register_unary_op` or `@scalar_op_compiler.register_binary_op`. |
| 55 | + - This implementation will translate the BigQuery DataFrames operation into the appropriate Ibis expression. |
| 56 | + |
| 57 | +6. **Add tests:** |
| 58 | + - Add system tests in the `tests/system/` directory to verify the end-to-end |
| 59 | + functionality of the new operator. Test various inputs, including edge cases |
| 60 | + and `NULL` values. |
| 61 | + |
| 62 | + Where possible, run the same test code against pandas or GeoPandas and |
| 63 | + check that the outputs match (except for dtypes where BigFrames |
| 64 | + differs from pandas). |
| 65 | + - If you are overriding a pandas or GeoPandas property, add a unit test to |
| 66 | + ensure the correct behavior (e.g., raising `NotImplementedError` if the |
| 67 | + functionality is not supported). |
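
The steps above can be sketched end to end in a small, self-contained model. This is not the BigFrames implementation: `UnaryOp`, `ScalarOpCompiler`, and `Series` below are hypothetical stand-ins for `base_ops.UnaryOp`, the scalar op compiler, and the real `Series` class, and the compiler here emits a SQL string where the real one would build an Ibis expression. The `use_spheroid` parameter and the `GeoStLengthOp` name are illustrative assumptions. The sketch only shows how the operation dataclass, the registered compiler function, and the user-facing function fit together.

```python
import dataclasses
import typing


# Step 1: operation dataclass. Hypothetical stand-in for base_ops.UnaryOp.
@dataclasses.dataclass(frozen=True)
class UnaryOp:
    name: typing.ClassVar[str]

    def output_type(self, input_type: str) -> str:
        raise NotImplementedError


@dataclasses.dataclass(frozen=True)
class GeoStLengthOp(UnaryOp):
    # One column-like argument plus one literal parameter: still a unary op.
    name: typing.ClassVar[str] = "geo_st_length"
    use_spheroid: bool = False

    def output_type(self, input_type: str) -> str:
        return "FLOAT64"


# Step 5: compiler registry. Hypothetical stand-in for the scalar op compiler;
# it produces SQL text instead of an Ibis expression to stay dependency-free.
class ScalarOpCompiler:
    def __init__(self) -> None:
        self._impls: typing.Dict[str, typing.Callable[[str, UnaryOp], str]] = {}

    def register_unary_op(self, op_type: typing.Type[UnaryOp]):
        def decorator(impl):
            self._impls[op_type.name] = impl
            return impl

        return decorator

    def compile(self, op: UnaryOp, column_sql: str) -> str:
        return self._impls[op.name](column_sql, op)


scalar_op_compiler = ScalarOpCompiler()


@scalar_op_compiler.register_unary_op(GeoStLengthOp)
def geo_length_op_impl(column_sql: str, op: GeoStLengthOp) -> str:
    flag = "TRUE" if op.use_spheroid else "FALSE"
    return f"ST_LENGTH({column_sql}, {flag})"


# Minimal stand-in for a Series that only tracks the SQL for its single column.
@dataclasses.dataclass(frozen=True)
class Series:
    column_sql: str

    def _apply_unary_op(self, op: UnaryOp) -> "Series":
        return Series(scalar_op_compiler.compile(op, self.column_sql))


# Step 4: SQL-like user-facing function that wraps the operation dataclass.
def st_length(series: Series, *, use_spheroid: bool = False) -> Series:
    return series._apply_unary_op(GeoStLengthOp(use_spheroid=use_spheroid))


result = st_length(Series("geo_col"))
print(result.column_sql)
```

In this model, adding another operator means adding one dataclass, one registered compiler function, and one thin user-facing wrapper, which mirrors the division of work described in steps 1, 4, and 5.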