
pygmt.select: Select data table subsets based on multiple spatial criteria #4564

@seisman

This issue serves as the central place for discussing and tracking the implementation of the pygmt.select function in PyGMT. The issue will be closed when the initial implementation is complete. Progress is tracked at PyGMT: Wrapping GMT modules.

Documentation

GMT Option Flags and Modifiers

☑️: Implemented; ⬜: To be implemented/discussed; Strikethrough: Won't implement.

  • ☑️ -A (area_thresh): Threshold for excluding small features based on area; skip polygons or coastline features smaller than this threshold.
  • ☑️ -C (dist2pt): pointfile|lon/lat+ddist. Pass all records within dist of any point in pointfile (or of a single lon/lat point).
  • ☑️ -D (resolution): "full", "high", "intermediate", "low", "crude", or "auto". Coastline resolution used with mask_values.
  • ☑️ -F (polygon): Pass all records whose locations are inside one of the closed polygons in polygonfile.
  • ☑️ -G (mask_grid): Pass all records that fall inside valid (non-NaN, non-zero) nodes of a grid mask.
  • ☑️ -I (reverse): [cfglrsz]. Reverse the sense of the test for one or more of the selection criteria.
  • ☑️ -J (projection): Map projection used when computing Cartesian distances from geographic coordinates.
  • ☑️ -L (dist2line): linefile+ddist[+p]. Pass all records within dist of any line segment in linefile.
  • ☑️ -N (mask_values): wet/dry or ocean/land/lake/island/pond. Pass records based on whether they fall on land, ocean, or other geographic features.
  • ☑️ -R (region): Rectangular region filter; pass only records inside the specified bounding box.
  • ☑️ -V (verbose): Verbosity level.
  • -X/-Y: Use Figure.shift_origin instead.
  • ☑️ -Z (z_subregion): min[/max][+a][+ccol][+i]. Pass records whose z (or other column) value lies within the given range.
  • ☑️ -b (binary): Binary input/output.
  • ☑️ -d (nodata): Replace NaN with a specified nodata value on input/output.
  • ☑️ -e (find): Pattern matching to select input rows.
  • ☑️ -f (coltypes): Column data types.
  • ☑️ -g (gap): Gap detection.
  • ☑️ -h (header): Read/write header records.
  • ☑️ -i (incols): Select input columns.
  • ☑️ -o (outcols): Select output columns.
  • -q: Select rows by row number or range.
  • ☑️ -s (skiprows): Skip rows containing NaN values.
  • ☑️ -w (wrap): Wrap repeated cycles.
  • --PAR=value: Use pygmt.config instead.
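
To make the -C (dist2pt) criterion concrete, here is a minimal pure-Python sketch of "pass all records within dist of any point". It is an illustration only, not the PyGMT or GMT implementation: the function names, the spherical Earth radius, and the use of kilometres are all assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_dist_of_any(records, points, dist_km):
    """Keep records whose (lon, lat) lies within dist_km of at least one point."""
    return [
        rec
        for rec in records
        if any(haversine_km(rec[0], rec[1], plon, plat) <= dist_km for plon, plat in points)
    ]


# Hypothetical data: three records and one reference point at the origin.
records = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)]
points = [(0.0, 0.0)]
kept = within_dist_of_any(records, points, dist_km=100.0)
```

With these inputs the first two records are kept (half a degree of longitude at the equator is roughly 56 km) and the third is dropped.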

Notes on Input Formats

  • data: Accepts a file path, 2-D numpy.ndarray, or pandas.DataFrame with (x, y) in the first two columns.
  • output_type: "pandas" (default), "numpy", or "file". Use "file" together with outfile.
  • Up to 7 spatial criteria can be combined simultaneously; all criteria must pass by default (logical AND). Use reverse to invert individual tests.
  • mask_values and resolution are only meaningful when testing against coastline features (the -N criterion).
  • The deprecated parameters mask and gridmask (replaced by mask_values and mask_grid in v0.18.0) will be removed in v0.20.0.
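
The logical-AND rule and the effect of reverse can be modelled in a few lines of plain Python. This is a sketch of the semantics described above, not how PyGMT or GMT implements them: the select helper, the criteria dictionary, and the sample records are hypothetical, and only the flag letters r and z are borrowed from the -I option.

```python
def select(records, criteria, reverse=""):
    """Keep records that pass every criterion; letters in `reverse` flip that test."""
    kept = []
    for rec in records:
        # A record passes a criterion when the test result differs from
        # "is this criterion reversed?", i.e. True normally, False when reversed.
        if all(test(rec) != (flag in reverse) for flag, test in criteria.items()):
            kept.append(rec)
    return kept


# Hypothetical (x, y, z) records.
records = [(1.0, 1.0, 5.0), (1.0, 1.0, 50.0), (9.0, 9.0, 5.0)]

criteria = {
    "r": lambda rec: 0 <= rec[0] <= 2 and 0 <= rec[1] <= 2,  # inside the bounding box
    "z": lambda rec: 0 <= rec[2] <= 10,  # z value inside the range
}

both = select(records, criteria)  # must pass the region AND the z test
outside_z = select(records, criteria, reverse="z")  # inside region, z outside the range
```

Here both contains only the first record, while outside_z contains only the second: reversing one test leaves the other criteria untouched.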

Linked Pull Requests

Related Issues and Discussions

  • pygmt.select provides spatial subsetting at the data-table level; for subsetting a grid, use pygmt.grdcut instead.
  • Combining polygon and reverse="f" efficiently excludes points that fall inside a known contaminated region (e.g., land stations in a marine dataset).
  • The z_subregion (-Z) parameter supports multiple column tests when passed as a list, enabling multi-column range filtering in a single call.
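
The polygon exclusion pattern (polygon combined with reverse="f") amounts to keeping the points that fall outside a closed polygon. The sketch below shows that idea with a standard ray-casting test in plain Python. It is an assumption-laden illustration rather than GMT's algorithm: it treats coordinates as Cartesian, ignores points exactly on the boundary, and the island polygon and station list are made up.

```python
def inside_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) is strictly inside the closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical "contaminated" region and station locations.
island = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
stations = [(2.0, 2.0), (5.0, 1.0), (-1.0, 3.0)]

# Equivalent in spirit to the polygon test with its sense reversed:
# keep only the stations that are NOT inside the polygon.
marine = [s for s in stations if not inside_polygon(s[0], s[1], island)]
```

The station at (2, 2) lies inside the square and is dropped; the other two are kept.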
