📚 Examples & Recipes

Minimal .visor.yaml starter

version: "1.0"
checks:
  security:
    type: ai
    schema: code-review
    prompt: "Identify security vulnerabilities in changed files"

Fast local pre-commit hook (Husky):

npx husky add .husky/pre-commit "npx -y @probelabs/visor@latest --tags local,fast --output table || exit 1"
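
To run the starter on demand, point the CLI at the config (a minimal sketch; --config and --output are the same flags used in the hook above and in the testing section below):

npx -y @probelabs/visor@latest --config .visor.yaml --output table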

Chat-like workflows (human-input + ai)

Minimal chat loop (CLI/SDK):

version: "1.0"

checks:
  ask:
    type: human-input
    group: chat
    prompt: |
      Please type your message.

  reply:
    type: ai
    group: chat
    depends_on: ask
    ai:
      disableTools: true
      allowedTools: []
      system_prompt: "You are a general assistant; follow the user's instructions."
    prompt: |
      You are a concise, friendly assistant.

      Conversation so far (oldest → newest):
      {% assign history = '' | chat_history: 'ask', 'reply' %}
      {% for m in history %}
      {{ m.role | capitalize }}: {{ m.text }}
      {% endfor %}

      Latest user message:
      {{ outputs['ask'].text }}

      Reply naturally. Keep it short (1–2 sentences).

    guarantee: "(output?.text ?? '').length > 0"
    on_success:
      goto: ask

Notes:

  • ask (human-input) produces { text, ts } by default.
  • reply (ai) responds and loops back to ask.
  • chat_history('ask','reply') merges both histories by timestamp with roles:
    • type: human-input → role: "user"
    • type: ai → role: "assistant"
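
With those defaults, the rendered history section of the reply prompt looks roughly like this (illustrative messages):

Conversation so far (oldest → newest):
User: What does the security step check?
Assistant: It scans changed files for vulnerabilities.
User: Can I run it locally?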

Slack chat using the same pattern:

version: "1.0"

slack:
  version: "v1"
  mentions: all
  threads: required

frontends:
  - name: slack
    config:
      summary:
        enabled: false

checks:
  ask:
    type: human-input
    group: chat
    prompt: |
      Please type your message. (Posted only when the workflow is waiting.)

  reply:
    type: ai
    group: chat
    depends_on: ask
    ai:
      disableTools: true
      allowedTools: []
      # For chat-style Slack flows you can optionally turn off
      # automatic PR/issue + Slack XML context and rely solely on
      # chat_history + conversation objects:
      # skip_transport_context: true
    prompt: |
      You are a concise, friendly assistant.

      Conversation so far (oldest → newest):
      {% assign history = '' | chat_history: 'ask', 'reply' %}
      {% for m in history %}
      {{ m.role | capitalize }}: {{ m.text }}
      {% endfor %}

      Latest user message:
      {{ outputs['ask'].text }}

      Reply naturally. Keep it short (1–2 sentences).

    guarantee: "(output?.text ?? '').length > 0"
    on_success:
      goto: ask

Runtime behavior:

  • First Slack message in a thread:
    • Treated as ask input.
    • reply posts into the same thread.
    • Engine loops to ask, posts a prompt, and saves a snapshot.
  • Next Slack message in the same thread:
    • Resumes from snapshot.
    • ask consumes the new message.
    • reply posts a new answer and loops again.

Accessing normalized conversation context in prompts:

{% if conversation %}
  Transport: {{ conversation.transport }}  {# 'slack', 'github', ... #}
  Thread: {{ conversation.thread.id }}
  {% for m in conversation.messages %}
    {{ m.user }} ({{ m.role }}): {{ m.text }}
  {% endfor %}
{% endif %}

  • Under Slack, conversation and slack.conversation are the same normalized object.
  • Under GitHub (PR/issue), conversation is built from the body + comment history using the same { role, user, text, timestamp } structure.
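
Because the shape is identical across transports, one prompt can branch on conversation.transport where tone or framing should differ (a sketch using only the fields listed above):

{% if conversation and conversation.transport == 'slack' %}
  You are replying in a Slack thread ({{ conversation.thread.id }}); keep it casual.
{% elsif conversation and conversation.transport == 'github' %}
  You are replying on a PR/issue; reference the discussion where helpful.
{% endif %}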

Customizing chat_history (roles, text, limits):

{% assign history = '' |
   chat_history:
     'ask',
     'clarify',
     'reply',
     direction: 'asc',
     limit: 50,
     text: {
       default_field: 'text',
       by_step: {
         'summarize': 'summary.text'
       }
     },
     roles: {
       by_step: {
         'summarize': 'system'
       }
     },
     role_map: 'ask=user,reply=assistant'
%}
{% for m in history %}
  [{{ m.step }}][{{ m.role }}] {{ m.text }}
{% endfor %}

Quick reference:

  • direction: 'asc' | 'desc', limit: N
  • text.default_field, text.by_step[step]
  • roles.by_step[step], roles.by_type[type], roles.default (see the example below)
  • role_map: 'step=role,other=role' as a compact override
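
For example, roles.by_type plus roles.default can assign roles by provider type instead of listing each step (a sketch built from the options above):

{% assign history = '' |
   chat_history:
     'ask',
     'reply',
     roles: {
       by_type: { 'human-input': 'user' },
       default: 'assistant'
     }
%}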

Advanced routing & contracts (general patterns)

Inner loops vs. closing the loop

  • Close the loop: Leaf steps use on_success: goto: <entry-step> to end the workflow and return to a single top-level human-input. Each new event (Slack message, webhook, CLI run) starts a fresh execution.
  • Inner loop: Add a local human-input and route inside a sub-flow:
    • Example shape: router → section-confirm → section-answer → section-confirm.
    • Use a control field (e.g. output.done === true) in transitions to exit the section back to the top-level entry step (see the sketch after this list).
  • This pattern is transport-agnostic and works for Slack, GitHub, HTTP workflows, etc.
  • See: examples/slack-simple-chat.yaml for a concrete implementation of both patterns.
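
A minimal sketch of the inner-loop shape above, assuming ask is the top-level human-input from the chat examples and output.done is the control field (step names and the catch-all transition are illustrative):

  section-confirm:
    type: human-input
    group: chat
    prompt: |
      Anything else for this section? (reply "done" to finish)

  section-answer:
    type: ai
    group: chat
    depends_on: section-confirm
    on_success:
      transitions:
        - when: "output.done === true"
          to: ask               # exit the section back to the entry step
        - when: "true"
          to: section-confirm   # stay in the inner loop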

Declarative routing (transitions) instead of JS

  • Prefer on_success.transitions / on_finish.transitions for branching:

    on_success:
      transitions:
        - when: "output && output.intent === 'chat'"
          to: chat-answer
        - when: "output && output.intent === 'project_help'"
          to: project-intent
  • Reserve goto_js / run_js for legacy or very dynamic use cases.

  • More details: fault-management-and-contracts.md, loop-routing-refactor.md.

  • Pattern A — central router + transitions (explicit routing):

    • Use a single “router” step that sets control fields (e.g. output.intent, output.kind).
    • Declare all branching in one place via on_success.transitions on the router:
      router:
        type: ai
        on_success:
          transitions:
            - when: "output.intent === 'chat'"
              to: chat-answer
            - when: "output.intent === 'status'"
              to: status-answer
      
      chat-answer:
        depends_on: [router]
        if: "outputs['router']?.intent === 'chat'"
      
      status-answer:
        depends_on: [router]
        if: "outputs['router']?.intent === 'status'"
    • Good when you want a single, centralized view of routing logic. Use if on branches for readability and to skip branches cleanly; reserve assume for hard dependency checks only.
  • Pattern B — distributed routing via depends_on + if:

    • Omit transitions entirely and let each branch decide whether it should run:
      router:
        type: ai
        # no on_success.transitions
      
      chat-answer:
        depends_on: [router]
        if: "outputs['router']?.intent === 'chat'"
      
      status-answer:
        depends_on: [router]
        if: "outputs['router']?.intent === 'status'"
    • The DAG (depends_on) defines possible flows; if conditions select the active branch(es) per run.
    • This works well when routing is simple or when you prefer fully local branch declarations over a central router table.

Criticality + assume + guarantee (recommended layout)

  • Apply to any workflow, not just chat:
    • external – step changes external state:
      • Examples: GitHub comments/labels, HTTP POST/PUT/PATCH/DELETE, ticket creation, updating CI/CD or incident systems, filesystem writes in a shared location.
      • If someone can look elsewhere and see a change after this step, it’s usually external.
    • internal – step changes the workflow’s control-plane:
      • Examples: forEach parents that fan out work; steps with on_* transitions/goto that decide what runs next; script/memory steps that set flags used by if/assume/guarantee.
      • If it mostly “steers” the run (not user-facing output), treat it as internal.
    • policy – step enforces org or safety rules:
      • Examples: permission checks (who may deploy/label), change windows, compliance checks (branches, commit format, DCO/CLA).
      • Often used to gate external steps (e.g. only label when policy passes).
    • info – read-only / non-critical:
      • Examples: summaries, hints, dashboards, advisory AI steps that do not gate other critical steps and do not mutate anything directly.
  • For internal / external steps, group fields in this order (a combined sketch follows this list):
    some-step:
      type: ai | script | command | ...
      group: ...
      depends_on: [...]
      criticality: internal    # or external / policy / info
      assume:
        - "upstream condition"       # never reference this step's own output here
      guarantee: "output?.field != null"   # assertions about this step's output
      schema:                           # JSON Schema when output is structured
        ...
  • Use assume for preconditions about upstream state (memory, env, outputs[...]).
  • Use guarantee for postconditions about this step’s own output (shape, control flags, size caps).
  • For info steps, contracts are recommended but optional; keep assume + guarantee adjacent when present.
  • More details: criticality-modes.md, fault-management-and-contracts.md.
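
Putting these conventions together, a hedged sketch of a policy step gating an external step (the branch check, step names, and echo stand-in are illustrative):

policy-gate:
  type: command
  group: release
  criticality: policy
  guarantee: "output?.allowed === true || output?.allowed === false"
  exec: |
    git rev-parse --abbrev-ref HEAD
  transform_js: |
    // Allow only changes on main; everything else is blocked.
    const branch = (output || '').trim();
    return { allowed: branch === 'main', branch };

announce:
  type: command
  group: release
  depends_on: [policy-gate]
  criticality: external
  assume:
    - "outputs['policy-gate']?.allowed === true"   # upstream state only
  guarantee: "output != null"
  exec: |
    echo "release checks passed"   # stand-in for a real external call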

JSON Schemas instead of schema: plain

  • For structured outputs (routers, script integrations, control signals), prefer real JSON Schema:
    router-step:
      schema:
        type: object
        properties:
          intent:
            type: string
            enum: [chat, summarize, escalate]
          target:
            type: string
        required: [intent]
  • For text responses, it can still be useful to wrap in an object:
    answer:
      schema:
        type: object
        properties:
          text: { type: string }
        required: [text]
      guarantee: "(output?.text ?? '').length > 0"
  • Use schema: plain only when output shape is genuinely unconstrained.

Tip: When you define a JSON Schema, you generally do not need to tell the model “respond only as JSON”; describe the semantics in the prompt and let the renderer/schema enforce shape.
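
For instance, a summarization step can pair a schema with a purely semantic prompt (a sketch; no "respond as JSON" instruction needed):

summarize:
  type: ai
  schema:
    type: object
    properties:
      text: { type: string }
    required: [text]
  prompt: |
    Summarize the discussion so far in 2–3 sentences,
    focusing on decisions and open questions.
  guarantee: "(output?.text ?? '').length > 0"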

Expression style (assume, guarantee, when)

  • Prefer clear, concise expressions:
    • outputs['router']?.intent === 'chat'
    • !!outputs['status-fetch']?.project
    • output?.done === true
  • Avoid noisy fallbacks like (outputs['x']?.kind ?? '') === 'status' when outputs['x']?.kind === 'status' is equivalent.
  • These conventions apply uniformly to any provider (ai, command, script, github, http_client, etc.).

Command step best practices

When using type: command steps:

Avoid external tool dependencies like jq, yq, python, etc.:

  • They may not be installed in all environments (GitHub Actions, Docker, CI)
  • Use transform_js to parse and transform output instead
  • Keep shell commands simple: grep, sed, awk, sort, head are universally available
# Bad - requires jq
extract-data:
  type: command
  exec: |
    echo "$TEXT" | grep -oE '[A-Z]+-[0-9]+' | jq -R -s 'split("\n")'
  parseJson: true

# Good - use transform_js for parsing
extract-data:
  type: command
  exec: |
    echo "$TEXT" | grep -oE '[A-Z]+-[0-9]+' | sort -u
  transform_js: |
    const lines = (output || '').trim().split('\n').filter(Boolean);
    return { data: lines, count: lines.length };

Prefer line-separated output over JSON from shell:

  • Simple to parse with transform_js
  • No need for parseJson: true
  • More robust across different shells/environments

Use transform_js for structured output:

  • The sandbox provides output (command stdout as string)
  • Return an object with the fields you need
  • Works consistently across all environments

Testing workflows with --no-mocks

The --no-mocks flag runs your test cases with real providers instead of injecting mock responses. This is essential for:

  1. Debugging integration issues - See actual API responses and errors
  2. Capturing realistic mock data - Get real output to copy into your test cases
  3. Validating credentials - Verify environment variables are set correctly
  4. Developing new workflows - Build tests incrementally with real data

Basic usage

# Run all test cases with real providers
visor test --config my-workflow.yaml --no-mocks

# Run a specific test case with real providers
visor test --config my-workflow.yaml --no-mocks --only "my-test-case"

Suggested mocks output

When running with --no-mocks, Visor captures each step's output and prints it as YAML you can copy directly into your test case:

🔴 NO-MOCKS MODE: Running with real providers (no mock injection)
   Step outputs will be captured and printed as suggested mocks

... test execution ...

📋 Suggested mocks (copy to your test case):
mocks:
  extract-keys:
    data:
      - PROJ-123
      - DEV-456
    count: 2
  fetch-issues:
    data:
      - key: PROJ-123
        summary: Fix authentication bug
        status: In Progress

Copy the YAML under mocks: into your test case's mocks: section.

Workflow for building tests

  1. Start with a minimal test case (no mocks):

    tests:
      cases:
        - name: my-new-test
          event: manual
          fixture: local.minimal
          workflow_input:
            text: "Fix bug PROJ-123"
  2. Run with --no-mocks to capture real outputs:

    visor test --config workflow.yaml --no-mocks --only "my-new-test"
  3. Copy the suggested mocks into your test case:

    tests:
      cases:
        - name: my-new-test
          event: manual
          fixture: local.minimal
          workflow_input:
            text: "Fix bug PROJ-123"
          mocks:
            extract-keys:
              data: ["PROJ-123"]
              count: 1
            # ... rest of captured mocks
  4. Add assertions based on the real data:

          expect:
            workflow_output:
              - path: issue_count
                equals: 1
  5. Run normally to verify mocks work:

    visor test --config workflow.yaml --only "my-new-test"

Debugging with --no-mocks

When a test fails with mocks, use --no-mocks to see what's actually happening:

# See real API responses and errors
visor test --config workflow.yaml --no-mocks --only "failing-test"

# Common issues revealed:
# - Missing or expired credentials
# - API endpoint changes
# - Unexpected response formats
# - Network/timeout issues

The real error messages and responses help identify whether the issue is with your mocks or the actual integration.

More examples