
[Enh]: Internal subsystem for embedding text. #3103

@JerryNixon

Description

@robertopc1

What?

An internal DAB subsystem that fetches vector embeddings for text from a configured provider.

Why?

To support future scenarios such as caching and parameter substitution.

Configuration (runtime.embeddings)

{
  "runtime": {
    "embeddings": {
      "enabled": true,
      "provider": "azure-openai",
      "base-url": "@env('EMBEDDINGS_ENDPOINT')",
      "api-key": "@env('EMBEDDINGS_API_KEY')",
      "model": "text-embedding-ada-002",
      "api-version": "2024-02-01",
      "dimensions": 1536,
      "timeout-ms": 30000,
      "endpoint": {
        "enabled": true,
        "path": "/embed",
        "roles": ["authenticated"]
      },
      "health": {
        "enabled": true,
        "threshold-ms": 1000
      }
    }
  }
}
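
The base-url and api-key values above use DAB's @env() indirection, so secrets stay in environment variables rather than in the config file. Below is a minimal sketch in Python (illustrative only) of how that placeholder resolution works conceptually; DAB performs this substitution itself when it loads the configuration.

import os
import re

# Matches DAB-style @env('NAME') placeholders.
ENV_PATTERN = re.compile(r"@env\('([^']+)'\)")

def resolve_env(value: str) -> str:
    """Replace each @env('NAME') placeholder with the NAME environment variable."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        resolved = os.environ.get(name)
        if resolved is None:
            raise KeyError(f"Environment variable {name} is not set")
        return resolved
    return ENV_PATTERN.sub(lookup, value)

# Example: with EMBEDDINGS_ENDPOINT exported, "@env('EMBEDDINGS_ENDPOINT')"
# resolves to something like "https://my-resource.openai.azure.com".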

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | bool | No | true | Master toggle for embedding subsystem |
| provider | string | Yes | - | azure-openai or openai |
| base-url | string | Yes | - | Provider base URL |
| api-key | string | Yes | - | Authentication key (can use @env() indirection) |
| model | string | Yes | OpenAI: text-embedding-3-small | Deployment name (Azure) or model name (OpenAI) |
| api-version | string | Azure only | 2024-02-01 | Azure API version (prevents deprecation issues) |
| dimensions | int | No | Model default | Output vector size (used for Redis schema alignment) |
| timeout-ms | int | No | 30000 | Request timeout in milliseconds |
| endpoint.enabled | bool | No | false | Whether to expose the /embed REST API |
| endpoint.path | string | No | /embed | Embed endpoint path (customizable) |
| endpoint.roles | string[] | No | ["authenticated"] | List of roles allowed to access the embed endpoint |
| health.enabled | bool | No | true | Enable embedding health check |
| health.threshold-ms | int | No | 5000 | Acceptable embedding latency threshold in ms for health check |

Command line

  • Easily configure via dab configure --runtime.embeddings.*
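
    For example, the proposed pattern would allow calls such as dab configure --runtime.embeddings.enabled true or dab configure --runtime.embeddings.provider azure-openai (these flag names only illustrate the proposed --runtime.embeddings.* surface; they are not an existing CLI feature).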

REST Endpoint

  • Optionally exposes /embed for embedding on demand (secured by role); an example request follows below
  • When host.mode is development, anonymous access is automatically allowed
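
A minimal sketch of calling the proposed endpoint from Python, assuming DAB is listening locally on port 5000 and that /embed accepts a JSON body with an input field and returns the embedding; the request/response shape is not defined in this issue and is an assumption here.

import requests

# Hypothetical request to the proposed /embed endpoint.
# Only the path, role-based security, and on-demand embedding are described above;
# the payload and response shape are assumptions.
resp = requests.post(
    "http://localhost:5000/embed",
    headers={"Authorization": "Bearer <token-for-authenticated-role>"},
    json={"input": "hello world"},
    timeout=30,
)
resp.raise_for_status()
embedding = resp.json()  # assumed to contain the embedding vector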

Telemetry & Caching

  • OpenTelemetry spans/metrics for latency, cache hits/misses, and error rate
  • L1 cache (SHA-256-based keys, 24h TTL; cache keys incorporate provider & model; a key sketch follows below)
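
A minimal sketch of how such a cache key could be composed, assuming the key is a SHA-256 hash over provider, model, and the input text; the exact key layout is internal to DAB and not specified in this issue.

import hashlib

def embedding_cache_key(provider: str, model: str, text: str) -> str:
    """Illustrative cache key: SHA-256 over provider, model, and the input text."""
    payload = f"{provider}|{model}|{text}".encode("utf-8")
    return "emb:" + hashlib.sha256(payload).hexdigest()

# Keys differ when the provider or model changes, so cached vectors
# produced by one model are never reused for another.
key = embedding_cache_key("azure-openai", "text-embedding-ada-002", "hello world")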

Health, Hot Reload, Startup

  • Health check covers the embedding provider & endpoint (a latency-threshold sketch follows below)
  • Listens for the hot-reload event and reloads the embeddings configuration
  • Startup logs embedding configuration info
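
A minimal sketch of the latency-threshold part of the health check, assuming it times a small probe embedding request and compares the elapsed time against health.threshold-ms; the actual check is internal to DAB, so this is illustrative only.

import time

def embedding_health(probe_embed, threshold_ms: int = 5000) -> dict:
    """Time a probe embedding call and compare it to the configured threshold.

    probe_embed is any callable that requests a single embedding from the provider.
    """
    start = time.monotonic()
    try:
        probe_embed("health check probe")
    except Exception as exc:
        return {"healthy": False, "error": str(exc)}
    elapsed_ms = (time.monotonic() - start) * 1000
    return {"healthy": elapsed_ms <= threshold_ms, "latency-ms": round(elapsed_ms)}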

REST API Reference

Azure OpenAI:

  • URL: POST {base-url}/openai/deployments/{model}/embeddings?api-version={api-version}
  • Headers: api-key: {api-key}
  • Body: { "input": string | string[], "dimensions": 1536 }

OpenAI:

  • URL: POST {base-url}/v1/embeddings
  • Headers: Authorization: Bearer {api-key}
  • Body: { "model": "text-embedding-3-small", "input": string | string[], "dimensions": 1536 }
