Skip to content

feat: add generic search client traits and adapters#994

Open
bitner wants to merge 1 commit intomainfrom
searchstream
Open

feat: add generic search client traits and adapters#994
bitner wants to merge 1 commit intomainfrom
searchstream

Conversation

@bitner
Copy link
Collaborator

@bitner bitner commented Mar 18, 2026

Description

Introduce a family of search client traits with blanket implementations and adapter utilities, replacing ad-hoc per-backend boilerplate with a consistent, extensible design.

New traits (stac::api)

Trait Purpose Required method
ItemsClient Single-page item search search
StreamItemsClient Stream items across all pages search_stream
CollectionsClient Fetch all collections at once collections
PagedCollectionsClient Cursor-paginated collections (future-proofing) collections_page
StreamCollectionsClient Stream all collections collections_stream
ArrowItemsClient (geoarrow) Arrow record batch output search_to_arrow
TransactionClient Write items and collections add_item, add_collection

Blanket implementations

  • CollectionsClient + Clone + Sync → StreamCollectionsClient — eagerly fetches all collections and yields as a stream; no wrapper struct needed
  • ArrowItemsClient + Sync → ItemsClient + StreamItemsClient (geoarrow feature) — collects record batches synchronously and returns owned items

Adapter utilities

  • PagedItemsStream\<T> — wraps any ItemsClient to provide StreamItemsClient via token/skip pagination (ItemCollection::next)
  • stream_pages_generic — free function driving the items pagination loop; used by PagedItemsStream and all server backends
  • stream_pages_collections_generic — collections equivalent for PagedCollectionsClient backends (ready for future paginated /collections support)
  • RecordBatchReaderAdapter\<I> (geoarrow) — bridges Iterator<Item = Result<RecordBatch, E>> to arrow_array::RecordBatchReader

Backend implementations

All three server backends (memory, duckdb, pgstac) implement the full trait family. stac-duckdb provides HrefClient (ArrowItemsClient) and SyncHrefClient (ItemsClient + CollectionsClient + StreamItemsClient via Mutex). stac-io's Client implements StreamItemsClient with HATEOAS link-following rather than token pagination.

Design notes

  • The ItemsClient + Clone → StreamItemsClient blanket cannot be added because it would overlap with the ArrowItemsClient blanket under Rust's coherence rules. Server backends use stream_pages_generic directly in their explicit StreamItemsClient impls.
  • PagedCollectionsClient has no blanket StreamCollectionsClient for the same reason (would overlap with the CollectionsClient blanket). Paginated backends call stream_pages_collections_generic in their own impl.

Related issues

  • Groundwork for federated search / streaming backends

Checklist

  • Unit tests
  • Documentation, including doctests
  • Pull request title follows conventional commits
  • Pre-commit hooks pass (prek run --all-files)

@bitner bitner requested a review from gadomski as a code owner March 18, 2026 19:33
Introduce a family of search client traits with blanket implementations
and adapter utilities to reduce boilerplate across backends.

New traits
----------
- ItemsClient: single-page item search (search required); defaults: item, items
- StreamItemsClient: streaming items across pages (search_stream required);
  defaults: collect_items, item_count, items_stream
- CollectionsClient: fetch all collections (collections required);
  default: collection point-lookup
- PagedCollectionsClient: cursor-paginated collections (collections_page required)
  for future backends that support paginated /collections
- StreamCollectionsClient: streaming collections (collections_stream required);
  default: collect_collections
- ArrowItemsClient (geoarrow feature): Arrow record batch output
  (search_to_arrow required); default: items_to_arrow
- TransactionClient: write operations

Blanket implementations
-----------------------
- CollectionsClient + Clone + Sync -> StreamCollectionsClient (eager fetch)
- ArrowItemsClient + Sync -> ItemsClient + StreamItemsClient (geoarrow feature)

Adapter utilities
-----------------
- PagedItemsStream<T>: wraps ItemsClient to provide StreamItemsClient via
  token/skip pagination using ItemCollection::next
- stream_pages_generic: free function driving the pagination loop
- stream_pages_collections_generic: collections equivalent for
  PagedCollectionsClient backends
- RecordBatchReaderAdapter<I> (geoarrow feature): bridges any
  Iterator<Item = Result<RecordBatch, E>> to arrow_array::RecordBatchReader

Documentation
-------------
Add docs/search-clients.md covering the trait family, blanket impls, adapter
conversion chart, pagination mechanics, and performance notes.
Comment on lines +49 to +60
#[cfg(feature = "geoarrow")]
#[derive(Debug, thiserror::Error)]
#[non_exhaustive]
pub enum AdapterError<E: std::error::Error + Send> {
/// Error from the wrapped client.
#[error(transparent)]
Client(E),

/// Error from STAC core (e.g. Arrow decode failure).
#[error(transparent)]
Stac(#[from] crate::Error),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to only have one error enum per crate, so this should probably go up into crates/core/src/error.rs.

Comment on lines +62 to +63
// ---------------------------------------------------------------------------
// PagedItemsStream
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Robot-code leftover? If we can nuke any of these // out-of-function comments in this PR, I'd be grateful 🙇🏼

Suggested change
// ---------------------------------------------------------------------------
// PagedItemsStream

/// is `Some`, its entries are merged into `additional_fields` of the next
/// search request (overwriting any previous pagination values). The stream ends
/// when the page is empty or `next` is `None`.
pub fn stream_pages_generic<T>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the _generic suffix, that's implied by the <T>? Same for the other function(s) in this file.

Suggested change
pub fn stream_pages_generic<T>(
pub fn stream_pages<T>(

Comment on lines +268 to +275
impl<I> std::fmt::Debug for RecordBatchReaderAdapter<I> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("RecordBatchReaderAdapter")
.field("schema", &self.schema)
.finish_non_exhaustive()
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this Debug impl?

Output = Result<impl Stream<Item = Result<Item, Self::Error>> + Send, Self::Error>,
> + Send {
// Collect synchronously while we hold `&self`.
let result: Result<Vec<Item>, AdapterError<T::Error>> = self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to not force us to load all items into memory?

Comment on lines +265 to +270
let stream = self.search_stream(search).await?;
futures::pin_mut!(stream);
let mut items = Vec::new();
while let Some(result) = stream.next().await {
items.push(result?);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you could collect streams? But maybe I'm wrong...

Comment on lines +31 to +32
async-stream.workspace = true
futures.workspace = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love pulling these in by default. We probably want to put them behind an async feature?

Comment on lines +20 to +21
futures.workspace = true
futures-core.workspace = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as core, let's put these behind an async feature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our docs/ are generally light, I prefer to use the docs.rs docs. These should probably live in the stac::api module-level documentation?

package-lock.json
crates/wasm/tests/__screenshots__
.yarn/
planfor*.md
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants