pygmt.select: Select data table subsets based on multiple spatial criteria #4564

This issue serves as the central place for discussing and tracking the implementation of the `pygmt.select` function in PyGMT. The issue will be closed when the initial implementation is complete. Progress is tracked at https://github.com/orgs/GenericMappingTools/projects/3.

## Documentation

- GMT: https://docs.generic-mapping-tools.org/dev/gmtselect.html
- GMT.jl: https://www.generic-mapping-tools.org/GMTjl_doc/documentation/modules/gmtselect
- PyGMT: https://www.pygmt.org/dev/api/generated/pygmt.select.html

## GMT Option Flags and Modifiers

☑️: *Implemented*; ⬜: *To be implemented/discussed*; ~~Strikethrough~~: *Won't implement*.

- ☑️ `-A` (`area_thresh`): Threshold for excluding small features based on area; skip polygons or coastline features smaller than this threshold.
- ☑️ `-C` (`dist2pt`): *pointfile*|*lon*/*lat***+d** *dist*. Pass all records within *dist* of any point in *pointfile* (or a single lon/lat point).
- ☑️ `-D` (`resolution`): `"full"`, `"high"`, `"intermediate"`, `"low"`, `"crude"`, or `"auto"`. Coastline resolution used with `mask_values`.
- ☑️ `-F` (`polygon`): Pass all records whose locations are inside one of the closed polygons in *polygonfile*.
- ☑️ `-G` (`mask_grid`): Pass all records that fall inside valid (non-NaN, non-zero) nodes of a grid mask.
- ☑️ `-I` (`reverse`): [**cflrsz**]. Reverse the sense of the test for one or more of the spatial criteria.
- ☑️ `-J` (`projection`): Map projection used when computing Cartesian distances from geographic coordinates.
- ☑️ `-L` (`dist2line`): *linefile***+d** *dist*[**+p**]. Pass all records within *dist* of any line segment in *linefile*.
- ☑️ `-N` (`mask_values`): *wet/dry* or *ocean/land/lake/island/pond*. Pass records based on whether they fall on land, ocean, or other geographic features.
- ☑️ `-R` (`region`): Rectangular region filter; pass only records inside the specified bounding box.
- ☑️ `-V` (`verbose`): Verbosity level.
- ~~`-X`/`-Y`~~: Not relevant to a table-processing function that produces no plot; to shift a plot origin, use `Figure.shift_origin` instead.
- ☑️ `-Z` (`z_subregion`): *min*[/*max*][**+a**][**+c** *col*][**+i**]. Pass records whose *z* (or other column) value lies within the given range.
- ☑️ `-b` (`binary`): Binary input/output.
- ☑️ `-d` (`nodata`): On input, replace values equal to the specified nodata value with NaN; on output, replace NaN with that nodata value.
- ☑️ `-e` (`find`): Pattern matching to select input rows.
- ☑️ `-f` (`coltypes`): Column data types.
- ☑️ `-g` (`gap`): Gap detection.
- ☑️ `-h` (`header`): Read/write header records.
- ☑️ `-i` (`incols`): Select input columns.
- ☑️ `-o` (`outcols`): Select output columns.
- ⬜ `-q`: Select rows by row number or range.
- ☑️ `-s` (`skiprows`): Suppress output of records whose *z*-value (or other chosen column) is NaN.
- ☑️ `-w` (`wrap`): Convert an input coordinate to a cyclical (periodic) coordinate.
- ~~`--PAR=value`~~: Use `pygmt.config` instead.
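
To illustrate how the flags above map to keyword arguments, here is a minimal sketch, not the library's implementation. The helper names `dist2pt_arg`, `z_subregion_arg`, and `select_near_point` are hypothetical and exist only for this example; the modifier strings follow the `-C` and `-Z` syntax listed above. PyGMT is imported lazily inside the wrapper so the string helpers can be tried without it installed.

```python
def dist2pt_arg(lon, lat, dist):
    """Format a single-point -C argument as 'lon/lat+ddist' (hypothetical helper)."""
    return f"{lon}/{lat}+d{dist}"


def z_subregion_arg(zmin, zmax=None, col=None):
    """Format a -Z argument as 'min[/max][+ccol]' (hypothetical helper)."""
    arg = f"{zmin}" if zmax is None else f"{zmin}/{zmax}"
    if col is not None:
        arg += f"+c{col}"
    return arg


def select_near_point(data, lon, lat, dist, zmin, zmax):
    """Keep records within `dist` of (lon, lat) whose z lies in [zmin, zmax].

    Assumes the `dist2pt` and `z_subregion` parameter names listed above.
    """
    import pygmt  # lazy import: only needed when the wrapper is actually called

    return pygmt.select(
        data=data,
        dist2pt=dist2pt_arg(lon, lat, dist),
        z_subregion=z_subregion_arg(zmin, zmax),
    )
```

For instance, `dist2pt_arg(135, 35, "100k")` yields `"135/35+d100k"`, where the `k` suffix is assumed to be GMT's kilometre unit, and `z_subregion_arg(0, 50, col=3)` yields `"0/50+c3"`.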

## Notes on Input Formats

- `data`: Accepts a file path, 2-D `numpy.ndarray`, or `pandas.DataFrame` with (x, y) in the first two columns.
- `output_type`: `"pandas"` (default), `"numpy"`, or `"file"`. Use `"file"` together with `outfile`.
- Up to seven spatial criteria can be combined in one call. By default a record is passed only if it satisfies every criterion given (logical AND); use `reverse` to invert individual tests.
- `mask_values` and `resolution` are only meaningful when testing against coastline features (criterion 5 in the GMT documentation).
- The deprecated parameters `mask` and `gridmask` (replaced by `mask_values` and `mask_grid` in v0.18.0) will be removed in v0.20.0.
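
The AND-with-inversion behavior described in the notes above can be modeled in a few lines of plain Python. This is only a conceptual sketch under stated assumptions: the function `passes` and the example predicates are hypothetical, and the single-letter codes simply mirror the `-I` letters (`c`, `f`, `l`, `r`, `s`, `z`) listed earlier. It is not how GMT evaluates the tests internally.

```python
def passes(record, tests, reverse=""):
    """Return True if `record` satisfies every test, honoring inverted tests.

    tests:   dict mapping a one-letter criterion code to a predicate.
    reverse: string of codes whose test result should be inverted.
    """
    return all(
        predicate(record) != (code in reverse)  # XOR flips the reversed tests
        for code, predicate in tests.items()
    )


# Hypothetical criteria: a bounding box ("r") and a z-range ("z").
tests = {
    "r": lambda p: 0 <= p[0] <= 10 and 0 <= p[1] <= 10,
    "z": lambda p: 100 <= p[2] <= 200,
}

records = [(5, 5, 150), (5, 5, 50), (20, 5, 150)]

kept_default = [r for r in records if passes(r, tests)]
kept_reverse_z = [r for r in records if passes(r, tests, reverse="z")]
```

Here `kept_default` contains only `(5, 5, 150)`, while `kept_reverse_z` contains only `(5, 5, 50)`: the record is still inside the box, but the z-range test is inverted.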

## Linked Pull Requests

- [x] Initial feature implementation – #1429
- [x] Add `mask_grid` (`-G`) parameter – #1429
- [x] Add `z_subregion` (`-Z`) parameter – #2123
- [x] Add inline docstring example – #2085
- [x] Rename `mask` → `mask_values` and `gridmask` → `mask_grid` (deprecation) – #3986
- [ ] Implement `-q` (row-number selection) option
- [ ] Remove deprecated `mask` and `gridmask` parameters in v0.20.0
- [ ] Add a gallery or tutorial example (e.g., extracting ship observations that lie within a country's exclusive economic zone polygon)
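
For context on the rename and removal items in this list, the sketch below shows one generic way a keyword rename with a deprecation period can work. It uses only the standard library; the decorator `rename_parameter` and the stand-in function `fake_select` are hypothetical and are not PyGMT's actual deprecation helper or the real `pygmt.select` signature.

```python
import functools
import warnings


def rename_parameter(old, new, remove_version):
    """Accept keyword `old` as an alias for `new`, emitting a FutureWarning."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"Pass either '{old}' or '{new}', not both.")
                warnings.warn(
                    f"'{old}' is deprecated; use '{new}' instead. "
                    f"It will be removed in {remove_version}.",
                    FutureWarning,
                    stacklevel=2,
                )
                kwargs[new] = kwargs.pop(old)  # forward the value under the new name
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Stand-in for illustration only; it just echoes the resolved arguments.
@rename_parameter("mask", "mask_values", remove_version="v0.20.0")
@rename_parameter("gridmask", "mask_grid", remove_version="v0.20.0")
def fake_select(data, mask_values=None, mask_grid=None):
    return {"mask_values": mask_values, "mask_grid": mask_grid}
```

Calling `fake_select(data, mask="s/k")` would warn and behave like `fake_select(data, mask_values="s/k")`. Removing the deprecated names then amounts to deleting the two decorator lines.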

## Related Issues and Discussions

- `pygmt.select` provides spatial subsetting at the data-table level; for subsetting a grid, use `pygmt.grdcut` instead.
- Combining `polygon` with `reverse="f"` inverts the polygon test, so records inside the polygons are dropped and only records outside them are passed. This is useful for excluding a known contaminated region (e.g., land stations in a marine dataset).
- The `z_subregion` (`-Z`) parameter supports multiple column tests when passed as a list, enabling multi-column range filtering in a single call.
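
The polygon-exclusion pattern mentioned above can be sketched as follows. The wrapper `exclude_inside_polygon` is a hypothetical helper that assumes the `polygon` and `reverse` parameter names from this issue. The function `point_in_polygon` is a plain Cartesian ray-casting test, included only as a reference model for checking expectations on small inputs; it is not GMT's algorithm and ignores geographic (spherical) polygon handling.

```python
def point_in_polygon(x, y, vertices):
    """Planar ray-casting test: True if (x, y) is strictly inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def exclude_inside_polygon(data, polygon_file):
    """Drop records inside the polygons in `polygon_file` (hypothetical wrapper)."""
    import pygmt  # lazy import: only needed when the wrapper is actually called

    return pygmt.select(data=data, polygon=polygon_file, reverse="f")


# Reference expectation on a toy Cartesian example.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
points = [(2, 2), (5, 2), (1, 3), (-1, -1)]
kept_outside = [p for p in points if not point_in_polygon(p[0], p[1], square)]
```

In this toy case `kept_outside` is `[(5, 2), (-1, -1)]`, which matches what an inverted polygon test is expected to pass.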
