docs: Document pattern for using non-exposed stream definitions as parent streams #866

@devin-ai-integration

Summary

When building complex connectors with multi-level substream hierarchies, it's useful to add stream definitions that serve only as internal parent streams for other streams and are never exposed as top-level streams. This pattern is currently undocumented but is actively used in production connectors.

Problem

The current YAML Reference documentation explains that only entries in the top-level streams: array are exposed as runnable streams, but it doesn't explicitly document the pattern of:

  1. Defining a full stream definition in definitions that is NOT listed in streams:
  2. Using that definition solely as a parent_stream_config for another stream
  3. Optionally marking that definition with a naming convention (e.g., the __ prefix some connectors use) to signal "internal helper"

This pattern is particularly useful for 3-level nested substream hierarchies where an intermediate stream is needed to provide partition keys but shouldn't be exposed to users.

Example Implementation: Jira Connector

The Jira connector uses this pattern extensively. Here are code permalinks:

Internal/Private Stream Definitions (in definitions, NOT in streams:)

How It's Used (3-level hierarchy example)

The issue_properties_stream references the internal __issue_property_keys_substream as its parent:

issue_properties_stream:
  # ...
  retriever:
    # ...
    partition_router:
      type: SubstreamPartitionRouter
      parent_stream_configs:
        - type: ParentStreamConfig
          stream: "#/definitions/__issue_property_keys_substream"  # <-- Internal stream reference

This creates a 3-level hierarchy:

  1. issues_stream (grandparent - exposed)
  2. __issue_property_keys_substream (parent - internal, NOT exposed)
  3. issue_properties_stream (child - exposed)
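
To make the middle level concrete, here is a rough sketch of what the internal definition might look like. It is not copied from the Jira manifest: the `name`, `parent_key`, and `partition_field` values are assumptions for illustration, and the requester and record selector are left out. Only the definition keys and the parent/child relationships come from the hierarchy listed above.

```yaml
definitions:
  # Internal helper: defined here but never listed under the top-level `streams:` key.
  __issue_property_keys_substream:
    type: DeclarativeStream
    name: issue_property_keys        # assumed name, for illustration only
    retriever:
      type: SimpleRetriever
      # requester, record_selector, etc. omitted
      partition_router:
        type: SubstreamPartitionRouter
        parent_stream_configs:
          - type: ParentStreamConfig
            stream: "#/definitions/issues_stream"   # grandparent (exposed)
            parent_key: id              # assumed field on the parent record
            partition_field: issue_id   # assumed partition key name
```

The child stream then points at this definition exactly as in the issue_properties_stream snippet above, so each level partitions on records produced by the level before it.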

Top-level streams: Section

The streams section only lists the streams that should be exposed to users - the __-prefixed definitions are intentionally omitted.
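
As a minimal sketch of what that omission looks like, the fragment below lists only the two exposed streams from the hierarchy above. The real connector exposes many more streams, and the string reference style is an assumption carried over from the parent-stream snippet; a manifest may use a different reference form.

```yaml
# Only the entries listed here are exposed as runnable streams.
streams:
  - "#/definitions/issues_stream"
  - "#/definitions/issue_properties_stream"
  # "#/definitions/__issue_property_keys_substream" is intentionally absent:
  # it is reachable only through the parent_stream_configs reference.
```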

Suggested Documentation

Add a section to the YAML Reference or a new "Advanced Patterns" page that documents:

  1. Pattern: Using stream definitions as internal parent streams
  2. Use case: Multi-level substream hierarchies where intermediate streams shouldn't be exposed
  3. Naming convention: The __ prefix convention (optional but recommended for clarity)
  4. Behavior: Streams not listed in streams: are not exposed by source.streams(config), so attempting to sync one silently no-ops
  5. Testing implications: When writing mock server tests, always verify stream names against the streams: section to avoid testing non-existent streams

Context

This issue was discovered while creating comprehensive mock server tests for the Jira connector (airbytehq/airbyte#70884). The pattern caused confusion when attempting to test issue_property_keys as a stream, only to discover it's an internal-only definition.


Requested by: Aaron ("AJ") Steers (@aaronsteers)
Related PR: airbytehq/airbyte#70884
Devin session: https://app.devin.ai/sessions/f152f435f9d146688e476611ff864c30
