Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions .gemini/common/constraints.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
## Constraints

- Only add git commits. Do not change git history.
- Follow the spec file for development.
- Check off items in the "Acceptance
criteria" and "Detailed steps" sections with `[x]`.
- Please do this as they are completed.
- Refer back to the spec after each step.
9 changes: 9 additions & 0 deletions .gemini/common/docs.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
## Documentation

If a method or property is implementing the same interface as a third-party
package such as pandas or scikit-learn, place the relevant docstring in the
corresponding `third_party/bigframes_vendored/package_name` directory, not in
the `bigframes` directory. Implementations may be placed in the `bigframes`
directory, though.

@../tools/test_docs.md
67 changes: 67 additions & 0 deletions .gemini/tasks/scalar_op.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
## Adding a scalar operator

For an example, see commit
[c5b7fdae74a22e581f7705bc0cf5390e928f4425](https://github.com/googleapis/python-bigquery-dataframes/commit/c5b7fdae74a22e581f7705bc0cf5390e928f4425).

To add a new scalar operator, follow these steps:

1. **Define the operation dataclass:**
- In `bigframes/operations/`, find the relevant file (e.g., `geo_ops.py` for geography functions) or create a new one.
- Create a new dataclass inheriting from `base_ops.UnaryOp` for unary
operators, `base_ops.BinaryOp` for binary operators, `base_ops.TernaryOp`
for ternary operators, or `base_ops.NaryOp for operators with many
arguments. Note that these operators are counting the number column-like
arguments. A function that takes only a single column but several literal
values would still be a `UnaryOp`.
- Define the `name` of the operation and any parameters it requires.
- Implement the `output_type` method to specify the data type of the result.

2. **Export the new operation:**
- In `bigframes/operations/__init__.py`, import your new operation dataclass and add it to the `__all__` list.

3. **Implement the user-facing function (pandas-like):**

- Identify the canonical function from pandas / geopandas / awkward array /
other popular Python package that this operator implements.
- Find the corresponding class in BigFrames. For example, the implementation
for most geopandas.GeoSeries methods is in
`bigframes/geopandas/geoseries.py`. Pandas Series methods are implemented
in `bigframes/series.py` or one of the accessors, such as `StringMethods`
in `bigframes/operations/strings.py`.
- Create the user-facing function that will be called by users (e.g., `length`).
- If the SQL method differs from pandas or geopandas in a way that can't be
made the same, raise a `NotImplementedError` with an appropriate message and
link to the feedback form.
- Add the docstring to the corresponding file in
`third_party/bigframes_vendored`, modeled after pandas / geopandas.

4. **Implement the user-facing function (SQL-like):**

- In `bigframes/bigquery/_operations/`, find the relevant file (e.g., `geo.py`) or create a new one.
- Create the user-facing function that will be called by users (e.g., `st_length`).
- This function should take a `Series` for any column-like inputs, plus any other parameters.
- Inside the function, call `series._apply_unary_op`,
`series._apply_binary_op`, or similar passing the operation dataclass you
created.
- Add a comprehensive docstring with examples.
- In `bigframes/bigquery/__init__.py`, import your new user-facing function and add it to the `__all__` list.

5. **Implement the compilation logic:**
- In `bigframes/core/compile/scalar_op_compiler.py`:
- If the BigQuery function has a direct equivalent in Ibis, you can often reuse an existing Ibis method.
- If not, define a new Ibis UDF using `@ibis_udf.scalar.builtin` to map to the specific BigQuery function signature.
- Create a new compiler implementation function (e.g., `geo_length_op_impl`).
- Register this function to your operation dataclass using `@scalar_op_compiler.register_unary_op` or `@scalar_op_compiler.register_binary_op`.
- This implementation will translate the BigQuery DataFrames operation into the appropriate Ibis expression.

6. **Add Tests:**
- Add system tests in the `tests/system/` directory to verify the end-to-end
functionality of the new operator. Test various inputs, including edge cases
and `NULL` values.

Where possible, run the same test code against pandas or GeoPandas and
compare that the outputs are the same (except for dtypes if BigFrames
differs from pandas).
- If you are overriding a pandas or GeoPandas property, add a unit test to
ensure the correct behavior (e.g., raising `NotImplementedError` if the
functionality is not supported).
18 changes: 18 additions & 0 deletions .gemini/tools/style_nox.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
## Code Style with nox

- We use the automatic code formatter `black`. You can run it using
the nox session `format`. This will eliminate many lint errors. Run via:

```bash
nox -r -s format
```

- PEP8 compliance is required, with exceptions defined in the linter configuration.
If you have ``nox`` installed, you can test that you have not introduced
any non-compliant code via:

```
nox -r -s lint
```

- When writing tests, use the idiomatic "pytest" style.
10 changes: 10 additions & 0 deletions .gemini/tools/test_docs.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
## Testing code samples

Code samples are very important for accurate documentation. We use the "doctest"
framework to ensure the samples are functioning as expected. After adding a code
sample, please ensure it is correct by running doctest. To run the samples
doctests for just a single method, refer to the following example:

```bash
pytest --doctest-modules bigframes/pandas/__init__.py::bigframes.pandas.cut
```
28 changes: 28 additions & 0 deletions .gemini/tools/test_nox.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
## Testing with nox

Use `nox` to instrument our tests.

- To test your changes, run unit tests with `nox`:

```bash
nox -r -s unit
```

- To run a single unit test:

```bash
nox -r -s unit-3.14 -- -k <name of test>
```

- Ignore this step if you lack access to Google Cloud resources. To run system
tests, you can execute::

# Run all system tests
$ nox -r -s system

# Run a single system test
$ nox -r -s system-3.14 -- -k <name of test>

- The codebase must have better coverage than it had previously after each
change. You can test coverage via `nox -s unit system cover` (takes a long
time). Omit `system` if you lack access to cloud resources.
9 changes: 9 additions & 0 deletions .gemini/tools/test_pytest.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
## Testing with pytest

Use `pytest` to instrument our tests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To run pytest, do we need to check if the current python environment is correct or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure AI can correctly do so. It still need human to approve bash command. For my case, I run gemini under venv, and it suites well.


- To test your changes, run `pytest`:

```bash
pytest <test_file>::<test_case>
```
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -65,3 +65,6 @@ pylintrc
pylintrc.test
dummy.pkl
.mypy_cache/

# Gemini
GEMINI.md
147 changes: 2 additions & 145 deletions GEMINI.md
Original file line number Diff line number Diff line change
@@ -1,148 +1,5 @@
# Contribution guidelines, tailored for LLM agents

## Testing
@.gemini/common/docs.md

We use `nox` to instrument our tests.

- To test your changes, run unit tests with `nox`:

```bash
nox -r -s unit
```

- To run a single unit test:

```bash
nox -r -s unit-3.14 -- -k <name of test>
```

- Ignore this step if you lack access to Google Cloud resources. To run system
tests, you can execute::

# Run all system tests
$ nox -r -s system

# Run a single system test
$ nox -r -s system-3.14 -- -k <name of test>

- The codebase must have better coverage than it had previously after each
change. You can test coverage via `nox -s unit system cover` (takes a long
time). Omit `system` if you lack access to cloud resources.

## Code Style

- We use the automatic code formatter `black`. You can run it using
the nox session `format`. This will eliminate many lint errors. Run via:

```bash
nox -r -s format
```

- PEP8 compliance is required, with exceptions defined in the linter configuration.
If you have ``nox`` installed, you can test that you have not introduced
any non-compliant code via:

```
nox -r -s lint
```

- When writing tests, use the idiomatic "pytest" style.

## Documentation

If a method or property is implementing the same interface as a third-party
package such as pandas or scikit-learn, place the relevant docstring in the
corresponding `third_party/bigframes_vendored/package_name` directory, not in
the `bigframes` directory. Implementations may be placed in the `bigframes`
directory, though.

### Testing code samples

Code samples are very important for accurate documentation. We use the "doctest"
framework to ensure the samples are functioning as expected. After adding a code
sample, please ensure it is correct by running doctest. To run the samples
doctests for just a single method, refer to the following example:

```bash
pytest --doctest-modules bigframes/pandas/__init__.py::bigframes.pandas.cut
```

## Tips for implementing common BigFrames features

### Adding a scalar operator

For an example, see commit
[c5b7fdae74a22e581f7705bc0cf5390e928f4425](https://github.com/googleapis/python-bigquery-dataframes/commit/c5b7fdae74a22e581f7705bc0cf5390e928f4425).

To add a new scalar operator, follow these steps:

1. **Define the operation dataclass:**
- In `bigframes/operations/`, find the relevant file (e.g., `geo_ops.py` for geography functions) or create a new one.
- Create a new dataclass inheriting from `base_ops.UnaryOp` for unary
operators, `base_ops.BinaryOp` for binary operators, `base_ops.TernaryOp`
for ternary operators, or `base_ops.NaryOp for operators with many
arguments. Note that these operators are counting the number column-like
arguments. A function that takes only a single column but several literal
values would still be a `UnaryOp`.
- Define the `name` of the operation and any parameters it requires.
- Implement the `output_type` method to specify the data type of the result.

2. **Export the new operation:**
- In `bigframes/operations/__init__.py`, import your new operation dataclass and add it to the `__all__` list.

3. **Implement the user-facing function (pandas-like):**

- Identify the canonical function from pandas / geopandas / awkward array /
other popular Python package that this operator implements.
- Find the corresponding class in BigFrames. For example, the implementation
for most geopandas.GeoSeries methods is in
`bigframes/geopandas/geoseries.py`. Pandas Series methods are implemented
in `bigframes/series.py` or one of the accessors, such as `StringMethods`
in `bigframes/operations/strings.py`.
- Create the user-facing function that will be called by users (e.g., `length`).
- If the SQL method differs from pandas or geopandas in a way that can't be
made the same, raise a `NotImplementedError` with an appropriate message and
link to the feedback form.
- Add the docstring to the corresponding file in
`third_party/bigframes_vendored`, modeled after pandas / geopandas.

4. **Implement the user-facing function (SQL-like):**

- In `bigframes/bigquery/_operations/`, find the relevant file (e.g., `geo.py`) or create a new one.
- Create the user-facing function that will be called by users (e.g., `st_length`).
- This function should take a `Series` for any column-like inputs, plus any other parameters.
- Inside the function, call `series._apply_unary_op`,
`series._apply_binary_op`, or similar passing the operation dataclass you
created.
- Add a comprehensive docstring with examples.
- In `bigframes/bigquery/__init__.py`, import your new user-facing function and add it to the `__all__` list.

5. **Implement the compilation logic:**
- In `bigframes/core/compile/scalar_op_compiler.py`:
- If the BigQuery function has a direct equivalent in Ibis, you can often reuse an existing Ibis method.
- If not, define a new Ibis UDF using `@ibis_udf.scalar.builtin` to map to the specific BigQuery function signature.
- Create a new compiler implementation function (e.g., `geo_length_op_impl`).
- Register this function to your operation dataclass using `@scalar_op_compiler.register_unary_op` or `@scalar_op_compiler.register_binary_op`.
- This implementation will translate the BigQuery DataFrames operation into the appropriate Ibis expression.

6. **Add Tests:**
- Add system tests in the `tests/system/` directory to verify the end-to-end
functionality of the new operator. Test various inputs, including edge cases
and `NULL` values.

Where possible, run the same test code against pandas or GeoPandas and
compare that the outputs are the same (except for dtypes if BigFrames
differs from pandas).
- If you are overriding a pandas or GeoPandas property, add a unit test to
ensure the correct behavior (e.g., raising `NotImplementedError` if the
functionality is not supported).


## Constraints

- Only add git commits. Do not change git history.
- Follow the spec file for development.
- Check off items in the "Acceptance
criteria" and "Detailed steps" sections with `[x]`.
- Please do this as they are completed.
- Refer back to the spec after each step.
@.gemini/common/constraints.md
Loading