feat: add graphile-pg-trgm-plugin for pg_trgm fuzzy text matching #809
Open
pyramation wants to merge 30 commits into main from
Implements a from-scratch PostGraphile v5 native connection filter plugin, replacing the upstream postgraphile-plugin-connection-filter dependency.
New package: graphile/graphile-connection-filter/
Plugin architecture (7 plugins):
- ConnectionFilterInflectionPlugin: filter type naming conventions
- ConnectionFilterTypesPlugin: registers per-table and per-scalar filter types
- ConnectionFilterArgPlugin: injects filter arg on connections via applyPlan
- ConnectionFilterAttributesPlugin: adds per-column filter fields
- ConnectionFilterOperatorsPlugin: standard/sort/pattern/jsonb/inet/array/range operators
- ConnectionFilterCustomOperatorsPlugin: addConnectionFilterOperator API for satellite plugins
- ConnectionFilterLogicalOperatorsPlugin: and/or/not logical composition
Key features:
- Full v5 native: uses Grafast planning, PgCondition, codec system, behavior registry
- EXPORTABLE pattern for schema caching
- Preserves addConnectionFilterOperator API for PostGIS, search, pgvector, textsearch plugins
- No relation filter plugins (simplifies configuration vs upstream)
- Preset factory: ConnectionFilterPreset(options)
Also updates graphile-settings to use the new workspace package.
…Operator
filterType is for table-level filter types (UserFilter), while filterFieldType is for scalar operator types (StringFilter). Satellite plugins pass scalar type names, so the lookup must use filterFieldType to match the registration in ConnectionFilterTypesPlugin. This previously worked by coincidence since both inflections produce the same output, but would silently fail if a consumer overrode one inflection but not the other.
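The failure mode described in this commit can be sketched in isolation. The snippet below is a simplified model, not the plugin's code: the `Inflection` interface, the `Set`-based registry, and the `lookupBuggy`/`lookupFixed` helpers are hypothetical stand-ins for the real inflectors and type registry.

```typescript
// Hypothetical, simplified inflectors (the real ones live on the build's inflection object).
interface Inflection {
  filterType(typeName: string): string; // table-level, e.g. UserFilter
  filterFieldType(typeName: string): string; // scalar operator type, e.g. StringFilter
}

const defaults: Inflection = {
  filterType: (t) => `${t}Filter`,
  filterFieldType: (t) => `${t}Filter`,
};

// Registration keys scalar operator types by filterFieldType.
function registerScalarFilters(inflection: Inflection, scalars: string[]): Set<string> {
  return new Set(scalars.map((s) => inflection.filterFieldType(s)));
}

// Old lookup: uses the table-level inflector for a scalar type name.
function lookupBuggy(inflection: Inflection, registry: Set<string>, scalar: string): boolean {
  return registry.has(inflection.filterType(scalar));
}

// Fixed lookup: uses the same inflector as the registration.
function lookupFixed(inflection: Inflection, registry: Set<string>, scalar: string): boolean {
  return registry.has(inflection.filterFieldType(scalar));
}

// A consumer overrides only one of the two inflectors (hypothetical naming).
const overridden: Inflection = {
  ...defaults,
  filterFieldType: (t) => `${t}OperatorFilter`,
};

const defaultRegistry = registerScalarFilters(defaults, ["String"]);
const overriddenRegistry = registerScalarFilters(overridden, ["String"]);

// With defaults both lookups agree; with the override only the fixed one finds the type.
const buggyWithDefaults = lookupBuggy(defaults, defaultRegistry, "String");
const buggyWithOverride = lookupBuggy(overridden, overriddenRegistry, "String");
const fixedWithOverride = lookupFixed(overridden, overriddenRegistry, "String");
```

With the default inflectors both names collapse to `StringFilter`, which is why the mismatch went unnoticed.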
Adds computed column filter support — allows filtering on PostgreSQL functions that take a table row as their first argument and return a scalar. Controlled by connectionFilterComputedColumns schema option. The preset factory includes the plugin only when the option is truthy (default in preset: true, but constructive-preset sets it to false).
- Remove phantom postgraphile-plugin-connection-filter dep from graphile-pgvector-plugin (never used)
- Remove phantom postgraphile-plugin-connection-filter dep from graphile-pg-textsearch-plugin (never used)
- Update graphile-plugin-connection-filter-postgis to use graphile-connection-filter workspace dep with typed imports
- Update graphile-search-plugin to use graphile-connection-filter workspace dep with typed imports
- Replace (build as any).addConnectionFilterOperator casts with properly typed build.addConnectionFilterOperator
…on-filter
- Update search plugin, pgvector, and postgis test files to import from graphile-connection-filter instead of postgraphile-plugin-connection-filter
- Use ConnectionFilterPreset() factory instead of PostGraphileConnectionFilterPreset
- Import ConnectionFilterOperatorSpec type from graphile-connection-filter
- Fix smart quote characters in filter descriptions to match existing snapshots
…ion filter tests
- Add graphile-connection-filter as devDependency in graphile-pgvector-plugin (test file imports ConnectionFilterPreset but package had no dependency)
- Skip connectionFilterRelations tests in search plugin (relation filters are intentionally not included in the v5-native plugin; they were disabled in production via disablePlugins with the old plugin)
…toggle
- ConnectionFilterForwardRelationsPlugin: filter by FK parent relations
- ConnectionFilterBackwardRelationsPlugin: filter by backward relations (one-to-one + one-to-many with some/every/none)
- connectionFilterRelations toggle in preset (default: false)
- Un-skip relation filter tests in search plugin
- Updated augmentations, types, and exports
… at runtime
The preset factory now always includes relation plugins in the plugin list. Each plugin checks build.options.connectionFilterRelations at runtime and early-returns if disabled. This allows the toggle to be set by any preset in the chain, not just the ConnectionFilterPreset() call.
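A minimal sketch of the runtime check, assuming a much-reduced `Build` shape. Only `build.options.connectionFilterRelations` comes from the commit message; `mergeOptions`, `addRelationFilterFields`, and the `Fields` type are hypothetical helpers used to show why a later preset in the chain can flip the toggle.

```typescript
interface BuildOptions {
  connectionFilterRelations?: boolean;
}

interface Build {
  options: BuildOptions;
}

type Fields = Record<string, string>;

// Hypothetical stand-in for preset resolution: later presets override earlier ones.
function mergeOptions(...presets: BuildOptions[]): BuildOptions {
  return Object.assign({}, ...presets);
}

// The plugin is always registered; it decides at hook time whether to do anything.
function addRelationFilterFields(fields: Fields, build: Build, relationFields: Fields): Fields {
  if (!build.options.connectionFilterRelations) {
    return fields; // early return: toggle disabled, schema unchanged
  }
  return { ...fields, ...relationFields };
}

const baseFields: Fields = { name: "StringFilter" };
const relationFields: Fields = { clientByClientId: "ClientFilter" };

// The filter preset leaves the toggle off; a later preset in the chain turns it on.
const enabledBuild: Build = {
  options: mergeOptions({ connectionFilterRelations: false }, { connectionFilterRelations: true }),
};
const disabledBuild: Build = { options: mergeOptions({ connectionFilterRelations: false }) };

const withRelations = addRelationFilterFields(baseFields, enabledBuild, relationFields);
const withoutRelations = addRelationFilterFields(baseFields, disabledBuild, relationFields);
```

Had the factory decided at call time whether to include the plugin, the second preset's setting would have arrived too late to matter.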
Enables relation filter fields in the production schema:
- Forward: filter by FK parent (e.g. clientByClientId on OrderFilter)
- Backward: filter by children with some/every/none
- Codegen will pick up the new filter fields automatically
- Search plugin: isPgCondition → isPgConnectionFilter scope
- BM25 plugin: isPgCondition → isPgConnectionFilter scope
- Disable PgConditionArgumentPlugin and PgConditionCustomFieldsPlugin in preset
- Update all tests from condition: {...} to filter: {...}
- Add graphile-connection-filter devDependency to BM25 plugin
- Update search plugin graceful degradation tests to use filter
BREAKING CHANGE: The condition argument has been removed entirely.
All filtering now uses the filter argument exclusively.
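For callers migrating existing queries, the change from condition to filter amounts to wrapping each equality check in an operator. The helper below is a hypothetical migration aid, not part of the PR. It assumes the old condition argument only expressed equality, and that a null value meant an IS NULL check, which maps to the isNull operator.

```typescript
type Scalar = string | number | boolean;
type Condition = Record<string, Scalar | null>;
type Filter = Record<string, { equalTo: Scalar } | { isNull: true }>;

// Hypothetical helper: rewrite condition: {...} variables as filter: {...} variables.
function conditionToFilter(condition: Condition): Filter {
  const filter: Filter = {};
  for (const column of Object.keys(condition)) {
    const value = condition[column];
    // Assumption: null in a condition meant "IS NULL"; everything else was equality.
    filter[column] = value === null ? { isNull: true } : { equalTo: value };
  }
  return filter;
}

const migrated = conditionToFilter({ isActive: true, deletedAt: null });
// migrated is { isActive: { equalTo: true }, deletedAt: { isNull: true } }
```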
- Search plugin plugin.test.ts: condition → filter syntax, add ConnectionFilterPreset
- Server-test: condition → filter in query with equalTo operator
- Clear stale snapshots (schema-snapshot, introspection) for regeneration
- Search plugin: update snapshot keys to match renamed filter-based tests
- Schema snapshot: remove all condition arguments and XxxCondition input types
- Introspection snapshot: remove condition arg and UserCondition type
- Kept conditionType in _meta schema (unrelated to deprecated condition arg)
… behavior for pgCodecRelation, update schema snapshot with relation filter types
… types to schema snapshot
…, and proper type ordering
…y filter at applyPlan level
Top-level empty filter {} is now treated as 'no filter' (skipped) instead of
throwing an error. Nested empty objects in and/or/not and relation filters are
still rejected. This removes the need for the connectionFilterAllowEmptyObjectInput
workaround in pgvector tests.
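The empty-filter rule can be sketched as a small validation pass. This is an illustrative model only: `normalizeFilter` and `assertNoEmptyNested` are hypothetical names, the error message is invented, and for simplicity the sketch rejects every nested empty object, whereas the commit only states that empties inside and/or/not and relation filters are rejected.

```typescript
type FilterObject = { [key: string]: unknown };

function isPlainObject(value: unknown): value is FilterObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Walk nested objects (and arrays of objects, as used by and/or) and reject empties.
function assertNoEmptyNested(node: FilterObject, path: string): void {
  for (const key of Object.keys(node)) {
    const value = node[key];
    const children: unknown[] = Array.isArray(value) ? value : [value];
    children.forEach((child, index) => {
      if (!isPlainObject(child)) return;
      const childPath = Array.isArray(value) ? `${path}.${key}[${index}]` : `${path}.${key}`;
      if (Object.keys(child).length === 0) {
        throw new Error(`Empty filter object at ${childPath}`);
      }
      assertNoEmptyNested(child, childPath);
    });
  }
}

// Returns null when there is nothing to apply, otherwise the validated filter.
function normalizeFilter(filter: FilterObject | null | undefined): FilterObject | null {
  if (filter == null || Object.keys(filter).length === 0) {
    return null; // top-level {} means "no filter"
  }
  assertNoEmptyNested(filter, "filter");
  return filter;
}

const skipped = normalizeFilter({});
const kept = normalizeFilter({ name: { equalTo: "a" } });
```

Calling `normalizeFilter({ and: [{}] })` throws in this sketch, matching the stricter treatment of nested input.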
- Extract shared getQueryBuilder utility into graphile-connection-filter/src/utils.ts
- Remove duplicate getQueryBuilder from search, BM25, and pgvector plugins
- Replace (build as any).dataplanPg with build.dataplanPg (already typed on Build)
- Replace (build as any).behavior with build.behavior (already typed on Build)
- Replace (build as any).input.pgRegistry with build.input.pgRegistry (already typed)
- Remove scope destructuring as any casts (pgCodec already typed on ScopeInputObject)
- Add pgCodec comment to augmentations.ts noting it's already declared by graphile-build-pg
- Export getQueryBuilder from graphile-connection-filter for satellite plugin use
Adds index safety check for relation filter fields. When enabled (default: true), relation filter fields are only created for FKs with supporting indexes. This prevents generating EXISTS subqueries that would cause sequential scans on large tables. Uses PgIndexBehaviorsPlugin's existing relation.extensions.isIndexed metadata which is set at gather time. The check runs at schema build time with zero runtime cost. Applied to both forward and backward relation filter plugins.
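The build-time check might look roughly like the following. Only `relation.extensions.isIndexed` is taken from the commit message; the `Relation` shape, the `requireIndex` parameter name, and the `relationsEligibleForFilters` helper are hypothetical. The sketch also assumes that only an explicit `isIndexed: false` marks a relation as unindexed.

```typescript
interface Relation {
  name: string;
  extensions?: { isIndexed?: boolean };
}

// Runs once while the schema is built, so filtering here costs nothing per request.
function relationsEligibleForFilters(relations: Relation[], requireIndex: boolean = true): Relation[] {
  if (!requireIndex) {
    return relations; // check disabled: expose every relation
  }
  // Assumption: a missing flag means "not known to be unindexed".
  return relations.filter((relation) => relation.extensions?.isIndexed !== false);
}

const relations: Relation[] = [
  { name: "clientByClientId", extensions: { isIndexed: true } },
  { name: "auditLogsByOrderId", extensions: { isIndexed: false } },
  { name: "categoryByCategoryId" },
];

const safeNames = relationsEligibleForFilters(relations).map((r) => r.name);
const allNames = relationsEligibleForFilters(relations, false).map((r) => r.name);
```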
Comprehensive test coverage using graphile-test infrastructure:
- Scalar operators: equalTo, notEqualTo, distinctFrom, isNull, in/notIn, lessThan, greaterThan, like, iLike, includes, startsWith, endsWith
- Logical operators: and, or, not, nested combinations
- Relation filters: forward (child->parent), backward one-to-one, backward one-to-many (some/every/none), exists fields
- Computed column filters
- Schema introspection: filter types, operator fields, relation fields
- Options toggles: connectionFilterRelations, connectionFilterComputedColumns, connectionFilterLogicalOperators, connectionFilterAllowedOperators, connectionFilterOperatorNames
Also adds graphile/graphile-connection-filter to CI matrix (41 jobs).
Exercises multiple plugins working together in a single test database:
- Connection filter (scalar operators, logical operators, relation filters)
- PostGIS spatial filters (geometry column)
- pgvector (vector column, search function, distance ordering)
- tsvector search plugin (fullText matches, rank, orderBy)
- BM25 search (pg_textsearch body index, score, orderBy)
- Kitchen sink queries combining multiple plugins
34 test cases across 8 describe blocks, all passing locally. Added postgres-plus CI job for tests requiring PostGIS/pgvector/pg_textsearch.
…tion filter + scalar in one query
… test
The mega query now exercises all SIX plugin types in a single filter:
- tsvector (fullTextTsv)
- BM25 (bm25Body)
- relation filter (category name)
- scalar filter (isActive)
- pgvector (vectorEmbedding nearby)
- PostGIS (geom intersects polygon bbox)
Also validates returned coordinates fall within the bounding box.
New package: graphile-pg-trgm-plugin, a PostGraphile v5 plugin for pg_trgm trigram-based fuzzy text matching. Zero config, works on any text column.
Features:
- similarTo / wordSimilarTo filter operators on StringFilter
- trgm<Column> direct filter fields on connection filter types
- <column>Similarity computed score fields (0-1, null when inactive)
- SIMILARITY_<COLUMN>_ASC/DESC orderBy enum values
- TrgmSearchPreset for easy composition into presets
- connectionFilterTrgmRequireIndex option (default: false)
- 14 dedicated tests + integrated into mega query as 7th plugin type
Mega query now exercises ALL 7 plugin types in one GraphQL query: tsvector + BM25 + pgvector + PostGIS + pg_trgm + relation filter + scalar
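To give a feel for the 0-1 scores behind the similarity fields, here is a rough TypeScript approximation of how pg_trgm's similarity() is computed. The plugin itself delegates this work to PostgreSQL, so none of this code appears in the PR. The sketch is ASCII-only, and the `trigrams`, `similarity`, and `isSimilar` names are illustrative. The 0.3 default mirrors pg_trgm's stock similarity threshold, which may differ from whatever threshold the plugin applies.

```typescript
// Split into lowercase alphanumeric words, pad each with two leading spaces and
// one trailing space, then collect the distinct 3-character windows.
function trigrams(text: string): Set<string> {
  const result = new Set<string>();
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
  for (const word of words) {
    const padded = `  ${word} `;
    for (let i = 0; i + 3 <= padded.length; i++) {
      result.add(padded.slice(i, i + 3));
    }
  }
  return result;
}

// Shared trigrams divided by the size of the union of both trigram sets.
function similarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  if (ta.size === 0 || tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared += 1;
  });
  return shared / (ta.size + tb.size - shared);
}

// Threshold check in the spirit of a similarTo-style operator (assumed default 0.3).
function isSimilar(value: string, query: string, threshold: number = 0.3): boolean {
  return similarity(value, query) >= threshold;
}

const exact = similarity("cat", "cat"); // 1
const typo = similarity("hello", "hallo"); // 3 shared of 9 distinct trigrams
const typoMatches = isSimilar("hello", "hallo");
```

The "hello"/"hallo" pair shares only the word-boundary trigrams and "llo", which is why a single-letter typo in a short word already drops the score to about 0.33.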
Updated introspection and SDL snapshots to include new fields from TrgmSearchPlugin: similarTo/wordSimilarTo operators on StringFilter, *Similarity computed fields, trgm* filter fields, and SIMILARITY_* orderBy enum values.
- orderBy: [BM25_BODY_SCORE_ASC, SIMILARITY_NAME_DESC] demonstrates multi-signal relevance ranking in a single query
- Added comprehensive JSDoc explaining all 7 plugin types, the 2-phase meta system, and ORDER BY priority semantics
- Inline GraphQL comments explain each filter and score field
- Assertion verifies BM25 ASC ordering (primary sort)
- Documents important subtlety: ORDER BY priority follows schema field processing order, not the orderBy array order
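The ordering subtlety can be modelled with a toy simulation. It is a sketch of the described behaviour, not the plugins' implementation: the `ScoreSignal` shape, the SQL expression names, and the assumption that the BM25 filter is processed before the trgm filter are all hypothetical.

```typescript
type Direction = "ASC" | "DESC";

interface ScoreSignal {
  filterField: string;
  enumPrefix: string;
  sqlExpr: string;
}

// Hypothetical processing order: the order in which filter apply functions run.
const signalsInSchemaOrder: ScoreSignal[] = [
  { filterField: "bm25Body", enumPrefix: "BM25_BODY_SCORE", sqlExpr: "bm25_score" },
  { filterField: "trgmName", enumPrefix: "SIMILARITY_NAME", sqlExpr: "similarity_score" },
];

// Each signal only checks whether its enum value was requested and in which direction.
// Clauses are appended in processing order, so the orderBy array order is never consulted.
function buildOrderBy(orderBy: string[]): string[] {
  const clauses: string[] = [];
  for (const signal of signalsInSchemaOrder) {
    for (const direction of ["ASC", "DESC"] as Direction[]) {
      if (orderBy.indexOf(`${signal.enumPrefix}_${direction}`) !== -1) {
        clauses.push(`${signal.sqlExpr} ${direction}`);
      }
    }
  }
  return clauses;
}

const asWritten = buildOrderBy(["BM25_BODY_SCORE_ASC", "SIMILARITY_NAME_DESC"]).join(", ");
const reversed = buildOrderBy(["SIMILARITY_NAME_DESC", "BM25_BODY_SCORE_ASC"]).join(", ");
// Both are "bm25_score ASC, similarity_score DESC"
```

Under these assumptions, swapping the two enum values in the array changes nothing in the generated clause sequence. A caller who needs a different primary sort cannot get it by reordering the array.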
Summary
This PR adds graphile-pg-trgm-plugin, a new PostGraphile v5 plugin that provides pg_trgm trigram-based fuzzy text matching capabilities to the connection filter system. It's the 7th plugin type integrated into the ConstructivePreset, joining tsvector, BM25, pgvector, PostGIS, relation filters, and scalar filters.

New Package:
graphile-pg-trgm-plugin
- similarTo/wordSimilarTo operators on StringFilter for fuzzy matching with configurable thresholds
- trgm<Column> filter fields directly on connection filter types (e.g., trgmName, trgmDescription)
- <column>Similarity computed fields returning match quality scores (0-1 range, null when no trgm filter active)
- SIMILARITY_<COLUMN>_ASC/DESC orderBy enum values for ranking by match quality
- connectionFilterTrgmRequireIndex option (default: false): optionally restrict trgm operators to GIN-indexed columns only

Integration:
- Integrated into ConstructivePreset via TrgmSearchPreset()
- Adds pg_trgm extension + GIN trigram index to integration test seed SQL
- Mega query includes the trgmName filter as the 7th plugin type; it now exercises all 7 filter plugin types in a single GraphQL query

Plugin Architecture:
- … graphile-pg-textsearch-plugin (BM25)
- Uses the 2-phase meta system (setMeta/getMeta) to pass similarity score indices between filter apply and computed field plan phases
- Uses the addConnectionFilterOperator API from graphile-connection-filter

Updates since last revision
- The mega query uses orderBy: [BM25_BODY_SCORE_ASC, SIMILARITY_NAME_DESC] to demonstrate combining multiple scoring signals in a single query, with an assertion verifying BM25 ASC ordering.
- ORDER BY priority follows the schema's field processing order (which filter's apply runs first), not the orderBy array order. The array determines which signals are active and their direction (ASC/DESC), but the SQL clause sequence depends on which filter plugin's apply function executes first during schema evaluation. This is inherent to the 2-phase meta architecture and is thoroughly documented in the test comments.
- Updated snapshots in graphql/test and graphql/server-test: the trgm plugin adds similarTo/wordSimilarTo to every StringFilter, *Similarity computed fields to every type with text columns, and SIMILARITY_* orderBy enums.

Review & Testing Checklist for Human
- … graphile-pg-trgm-plugin package and its integration into the mega query.
- Schema bloat from auto-discovery: The plugin adds similarTo/wordSimilarTo to StringFilter (global) and <col>Similarity computed fields + trgm<Col> filter fields + SIMILARITY_* orderBy enums to every table with text columns. Review whether this level of auto-discovery is appropriate or if it should be gated by a behavior flag (e.g., attributeTrgmSimilarity). The updated snapshots show the full extent of schema additions.
- Multi-signal orderBy semantics: The mega query demonstrates combining BM25 + pg_trgm orderBy signals, but note the documented subtlety: the orderBy array order does NOT determine SQL ORDER BY priority. Instead, priority follows the schema's internal field processing order (which filter apply runs first). Review whether this behavior is acceptable or if the 2-phase meta system should be redesigned to respect array order. Test by running queries with different orderBy array orders and verifying SQL generation.
- Meta system correctness: Verify trgm-search.ts correctly uses setMeta/getMeta to pass similarity score indices. Check that concurrent queries with multiple trgm filters don't collide (e.g., filtering on both name and description in the same query).
- Performance impact: The default connectionFilterTrgmRequireIndex: false means any text column can be fuzzy-matched without a GIN index. On large tables, this could be slow. Review whether the default should be true for production.

Recommended Test Plan:
Notes