-
Notifications
You must be signed in to change notification settings - Fork 68
chore: modularize GEMINI.md file #2479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| ## Constraints | ||
|
|
||
| - Only add git commits. Do not change git history. | ||
| - Follow the spec file for development. | ||
| - Check off items in the "Acceptance | ||
| criteria" and "Detailed steps" sections with `[x]`. | ||
| - Please do this as they are completed. | ||
| - Refer back to the spec after each step. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| ## Documentation | ||
|
|
||
| If a method or property is implementing the same interface as a third-party | ||
| package such as pandas or scikit-learn, place the relevant docstring in the | ||
| corresponding `third_party/bigframes_vendored/package_name` directory, not in | ||
| the `bigframes` directory. Implementations may be placed in the `bigframes` | ||
| directory, though. | ||
|
|
||
| @../tools/test_docs.md |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,67 @@ | ||
| ## Adding a scalar operator | ||
|
|
||
| For an example, see commit | ||
| [c5b7fdae74a22e581f7705bc0cf5390e928f4425](https://github.com/googleapis/python-bigquery-dataframes/commit/c5b7fdae74a22e581f7705bc0cf5390e928f4425). | ||
|
|
||
| To add a new scalar operator, follow these steps: | ||
|
|
||
| 1. **Define the operation dataclass:** | ||
| - In `bigframes/operations/`, find the relevant file (e.g., `geo_ops.py` for geography functions) or create a new one. | ||
| - Create a new dataclass inheriting from `base_ops.UnaryOp` for unary | ||
| operators, `base_ops.BinaryOp` for binary operators, `base_ops.TernaryOp` | ||
| for ternary operators, or `base_ops.NaryOp for operators with many | ||
| arguments. Note that these operators are counting the number column-like | ||
| arguments. A function that takes only a single column but several literal | ||
| values would still be a `UnaryOp`. | ||
| - Define the `name` of the operation and any parameters it requires. | ||
| - Implement the `output_type` method to specify the data type of the result. | ||
|
|
||
| 2. **Export the new operation:** | ||
| - In `bigframes/operations/__init__.py`, import your new operation dataclass and add it to the `__all__` list. | ||
|
|
||
| 3. **Implement the user-facing function (pandas-like):** | ||
|
|
||
| - Identify the canonical function from pandas / geopandas / awkward array / | ||
| other popular Python package that this operator implements. | ||
| - Find the corresponding class in BigFrames. For example, the implementation | ||
| for most geopandas.GeoSeries methods is in | ||
| `bigframes/geopandas/geoseries.py`. Pandas Series methods are implemented | ||
| in `bigframes/series.py` or one of the accessors, such as `StringMethods` | ||
| in `bigframes/operations/strings.py`. | ||
| - Create the user-facing function that will be called by users (e.g., `length`). | ||
| - If the SQL method differs from pandas or geopandas in a way that can't be | ||
| made the same, raise a `NotImplementedError` with an appropriate message and | ||
| link to the feedback form. | ||
| - Add the docstring to the corresponding file in | ||
| `third_party/bigframes_vendored`, modeled after pandas / geopandas. | ||
|
|
||
| 4. **Implement the user-facing function (SQL-like):** | ||
|
|
||
| - In `bigframes/bigquery/_operations/`, find the relevant file (e.g., `geo.py`) or create a new one. | ||
| - Create the user-facing function that will be called by users (e.g., `st_length`). | ||
| - This function should take a `Series` for any column-like inputs, plus any other parameters. | ||
| - Inside the function, call `series._apply_unary_op`, | ||
| `series._apply_binary_op`, or similar passing the operation dataclass you | ||
| created. | ||
| - Add a comprehensive docstring with examples. | ||
| - In `bigframes/bigquery/__init__.py`, import your new user-facing function and add it to the `__all__` list. | ||
|
|
||
| 5. **Implement the compilation logic:** | ||
| - In `bigframes/core/compile/scalar_op_compiler.py`: | ||
| - If the BigQuery function has a direct equivalent in Ibis, you can often reuse an existing Ibis method. | ||
| - If not, define a new Ibis UDF using `@ibis_udf.scalar.builtin` to map to the specific BigQuery function signature. | ||
| - Create a new compiler implementation function (e.g., `geo_length_op_impl`). | ||
| - Register this function to your operation dataclass using `@scalar_op_compiler.register_unary_op` or `@scalar_op_compiler.register_binary_op`. | ||
| - This implementation will translate the BigQuery DataFrames operation into the appropriate Ibis expression. | ||
|
|
||
| 6. **Add Tests:** | ||
| - Add system tests in the `tests/system/` directory to verify the end-to-end | ||
| functionality of the new operator. Test various inputs, including edge cases | ||
| and `NULL` values. | ||
|
|
||
| Where possible, run the same test code against pandas or GeoPandas and | ||
| compare that the outputs are the same (except for dtypes if BigFrames | ||
| differs from pandas). | ||
| - If you are overriding a pandas or GeoPandas property, add a unit test to | ||
| ensure the correct behavior (e.g., raising `NotImplementedError` if the | ||
| functionality is not supported). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| ## Code Style with nox | ||
|
|
||
| - We use the automatic code formatter `black`. You can run it using | ||
| the nox session `format`. This will eliminate many lint errors. Run via: | ||
|
|
||
| ```bash | ||
| nox -r -s format | ||
| ``` | ||
|
|
||
| - PEP8 compliance is required, with exceptions defined in the linter configuration. | ||
| If you have ``nox`` installed, you can test that you have not introduced | ||
| any non-compliant code via: | ||
|
|
||
| ``` | ||
| nox -r -s lint | ||
| ``` | ||
|
|
||
| - When writing tests, use the idiomatic "pytest" style. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| ## Testing code samples | ||
|
|
||
| Code samples are very important for accurate documentation. We use the "doctest" | ||
| framework to ensure the samples are functioning as expected. After adding a code | ||
| sample, please ensure it is correct by running doctest. To run the samples | ||
| doctests for just a single method, refer to the following example: | ||
|
|
||
| ```bash | ||
| pytest --doctest-modules bigframes/pandas/__init__.py::bigframes.pandas.cut | ||
| ``` |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| ## Testing with nox | ||
|
|
||
| Use `nox` to instrument our tests. | ||
|
|
||
| - To test your changes, run unit tests with `nox`: | ||
|
|
||
| ```bash | ||
| nox -r -s unit | ||
| ``` | ||
|
|
||
| - To run a single unit test: | ||
|
|
||
| ```bash | ||
| nox -r -s unit-3.14 -- -k <name of test> | ||
| ``` | ||
|
|
||
| - Ignore this step if you lack access to Google Cloud resources. To run system | ||
| tests, you can execute:: | ||
|
|
||
| # Run all system tests | ||
| $ nox -r -s system | ||
|
|
||
| # Run a single system test | ||
| $ nox -r -s system-3.14 -- -k <name of test> | ||
|
|
||
| - The codebase must have better coverage than it had previously after each | ||
| change. You can test coverage via `nox -s unit system cover` (takes a long | ||
| time). Omit `system` if you lack access to cloud resources. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| ## Testing with pytest | ||
|
|
||
| Use `pytest` to instrument our tests. | ||
|
|
||
| - To test your changes, run `pytest`: | ||
|
|
||
| ```bash | ||
| pytest <test_file>::<test_case> | ||
| ``` | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -65,3 +65,6 @@ pylintrc | |
| pylintrc.test | ||
| dummy.pkl | ||
| .mypy_cache/ | ||
|
|
||
| # Gemini | ||
| GEMINI.md | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,148 +1,5 @@ | ||
| # Contribution guidelines, tailored for LLM agents | ||
|
|
||
| ## Testing | ||
| @.gemini/common/docs.md | ||
|
|
||
| We use `nox` to instrument our tests. | ||
|
|
||
| - To test your changes, run unit tests with `nox`: | ||
|
|
||
| ```bash | ||
| nox -r -s unit | ||
| ``` | ||
|
|
||
| - To run a single unit test: | ||
|
|
||
| ```bash | ||
| nox -r -s unit-3.14 -- -k <name of test> | ||
| ``` | ||
|
|
||
| - Ignore this step if you lack access to Google Cloud resources. To run system | ||
| tests, you can execute:: | ||
|
|
||
| # Run all system tests | ||
| $ nox -r -s system | ||
|
|
||
| # Run a single system test | ||
| $ nox -r -s system-3.14 -- -k <name of test> | ||
|
|
||
| - The codebase must have better coverage than it had previously after each | ||
| change. You can test coverage via `nox -s unit system cover` (takes a long | ||
| time). Omit `system` if you lack access to cloud resources. | ||
|
|
||
| ## Code Style | ||
|
|
||
| - We use the automatic code formatter `black`. You can run it using | ||
| the nox session `format`. This will eliminate many lint errors. Run via: | ||
|
|
||
| ```bash | ||
| nox -r -s format | ||
| ``` | ||
|
|
||
| - PEP8 compliance is required, with exceptions defined in the linter configuration. | ||
| If you have ``nox`` installed, you can test that you have not introduced | ||
| any non-compliant code via: | ||
|
|
||
| ``` | ||
| nox -r -s lint | ||
| ``` | ||
|
|
||
| - When writing tests, use the idiomatic "pytest" style. | ||
|
|
||
| ## Documentation | ||
|
|
||
| If a method or property is implementing the same interface as a third-party | ||
| package such as pandas or scikit-learn, place the relevant docstring in the | ||
| corresponding `third_party/bigframes_vendored/package_name` directory, not in | ||
| the `bigframes` directory. Implementations may be placed in the `bigframes` | ||
| directory, though. | ||
|
|
||
| ### Testing code samples | ||
|
|
||
| Code samples are very important for accurate documentation. We use the "doctest" | ||
| framework to ensure the samples are functioning as expected. After adding a code | ||
| sample, please ensure it is correct by running doctest. To run the samples | ||
| doctests for just a single method, refer to the following example: | ||
|
|
||
| ```bash | ||
| pytest --doctest-modules bigframes/pandas/__init__.py::bigframes.pandas.cut | ||
| ``` | ||
|
|
||
| ## Tips for implementing common BigFrames features | ||
|
|
||
| ### Adding a scalar operator | ||
|
|
||
| For an example, see commit | ||
| [c5b7fdae74a22e581f7705bc0cf5390e928f4425](https://github.com/googleapis/python-bigquery-dataframes/commit/c5b7fdae74a22e581f7705bc0cf5390e928f4425). | ||
|
|
||
| To add a new scalar operator, follow these steps: | ||
|
|
||
| 1. **Define the operation dataclass:** | ||
| - In `bigframes/operations/`, find the relevant file (e.g., `geo_ops.py` for geography functions) or create a new one. | ||
| - Create a new dataclass inheriting from `base_ops.UnaryOp` for unary | ||
| operators, `base_ops.BinaryOp` for binary operators, `base_ops.TernaryOp` | ||
| for ternary operators, or `base_ops.NaryOp for operators with many | ||
| arguments. Note that these operators are counting the number column-like | ||
| arguments. A function that takes only a single column but several literal | ||
| values would still be a `UnaryOp`. | ||
| - Define the `name` of the operation and any parameters it requires. | ||
| - Implement the `output_type` method to specify the data type of the result. | ||
|
|
||
| 2. **Export the new operation:** | ||
| - In `bigframes/operations/__init__.py`, import your new operation dataclass and add it to the `__all__` list. | ||
|
|
||
| 3. **Implement the user-facing function (pandas-like):** | ||
|
|
||
| - Identify the canonical function from pandas / geopandas / awkward array / | ||
| other popular Python package that this operator implements. | ||
| - Find the corresponding class in BigFrames. For example, the implementation | ||
| for most geopandas.GeoSeries methods is in | ||
| `bigframes/geopandas/geoseries.py`. Pandas Series methods are implemented | ||
| in `bigframes/series.py` or one of the accessors, such as `StringMethods` | ||
| in `bigframes/operations/strings.py`. | ||
| - Create the user-facing function that will be called by users (e.g., `length`). | ||
| - If the SQL method differs from pandas or geopandas in a way that can't be | ||
| made the same, raise a `NotImplementedError` with an appropriate message and | ||
| link to the feedback form. | ||
| - Add the docstring to the corresponding file in | ||
| `third_party/bigframes_vendored`, modeled after pandas / geopandas. | ||
|
|
||
| 4. **Implement the user-facing function (SQL-like):** | ||
|
|
||
| - In `bigframes/bigquery/_operations/`, find the relevant file (e.g., `geo.py`) or create a new one. | ||
| - Create the user-facing function that will be called by users (e.g., `st_length`). | ||
| - This function should take a `Series` for any column-like inputs, plus any other parameters. | ||
| - Inside the function, call `series._apply_unary_op`, | ||
| `series._apply_binary_op`, or similar passing the operation dataclass you | ||
| created. | ||
| - Add a comprehensive docstring with examples. | ||
| - In `bigframes/bigquery/__init__.py`, import your new user-facing function and add it to the `__all__` list. | ||
|
|
||
| 5. **Implement the compilation logic:** | ||
| - In `bigframes/core/compile/scalar_op_compiler.py`: | ||
| - If the BigQuery function has a direct equivalent in Ibis, you can often reuse an existing Ibis method. | ||
| - If not, define a new Ibis UDF using `@ibis_udf.scalar.builtin` to map to the specific BigQuery function signature. | ||
| - Create a new compiler implementation function (e.g., `geo_length_op_impl`). | ||
| - Register this function to your operation dataclass using `@scalar_op_compiler.register_unary_op` or `@scalar_op_compiler.register_binary_op`. | ||
| - This implementation will translate the BigQuery DataFrames operation into the appropriate Ibis expression. | ||
|
|
||
| 6. **Add Tests:** | ||
| - Add system tests in the `tests/system/` directory to verify the end-to-end | ||
| functionality of the new operator. Test various inputs, including edge cases | ||
| and `NULL` values. | ||
|
|
||
| Where possible, run the same test code against pandas or GeoPandas and | ||
| compare that the outputs are the same (except for dtypes if BigFrames | ||
| differs from pandas). | ||
| - If you are overriding a pandas or GeoPandas property, add a unit test to | ||
| ensure the correct behavior (e.g., raising `NotImplementedError` if the | ||
| functionality is not supported). | ||
|
|
||
|
|
||
| ## Constraints | ||
|
|
||
| - Only add git commits. Do not change git history. | ||
| - Follow the spec file for development. | ||
| - Check off items in the "Acceptance | ||
| criteria" and "Detailed steps" sections with `[x]`. | ||
| - Please do this as they are completed. | ||
| - Refer back to the spec after each step. | ||
| @.gemini/common/constraints.md |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To run
pytest, do we need to check if the current python environment is correct or not?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure AI can correctly do so. It still need human to approve bash command. For my case, I run gemini under venv, and it suites well.