Commit a15806b

feat: Add Workflow Analytics Dashboards with OpenSearch integration (#229)
* feat(infra): add nginx reverse proxy and production security
  - Add nginx reverse proxy as a unified entry point at http://localhost
  - Routes: / (frontend), /api (backend), /analytics (OpenSearch Dashboards)
  - Configure OpenSearch Dashboards with the /analytics base path
  - Add production deployment with TLS and the security plugin
  - SaaS multitenancy with per-customer tenant isolation
  - Certificate generation script (just generate-certs)
  - New commands: just dev, just prod-secure

* feat(workflows): add STALE status and workflow improvements
  - Add STALE status for orphaned run records (DB/Temporal mismatch)
  - Improve status inference from trace events when the Temporal run is not found
  - Use correct TraceEventType values for status detection
  - Add amber badge color for STALE status
  - Extract WorkflowNode into a modular directory structure
  - Document all execution statuses with a transition diagram

* feat(analytics): add Security Analytics platform with OpenSearch integration

  Analytics Sink component (core.analytics.sink):
  - Index output data from any upstream node to OpenSearch
  - Auto-detect asset correlation keys (host, domain, url, ip, etc.)
  - Fire-and-forget with retry logic (3 attempts, exponential backoff)
  - Configurable index suffix and fail-on-error modes

  OpenSearch integration:
  - Daily index rotation: security-findings-{orgId}-{YYYY.MM.DD}
  - Index template with standard metadata fields
  - Multi-tenant data isolation per organization

  Analytics API:
  - POST /api/v1/analytics/query with OpenSearch DSL support
  - Auto-scope queries to the organization's index pattern
  - Rate limiting: 100 req/min per user
  - Protected routes require authentication
  - Session cookie support for analytics route auth

  UI integration:
  - Analytics Settings page with tier-based retention
  - Dashboards link in sidebar (opens in a new tab)
  - View Analytics button uses the Discover app with proper URL state
  - Uses .keyword fields for exact-match filtering

  Component SDK extensions:
  - generateFindingHash() for deduplication
  - Workflow context (workflowId, workflowName, organizationId)
  - Results output port on nuclei, trufflehog, supabase-scanner
  - Support for optional inputs in components

  Bug fixes:
  - Fix webhook URLs to include the global API prefix (ENG-115)
  - Add proper connectionType for list variable types
  - Handle invalid_value errors for placeholder fields

* feat(analytics): add multi-tenant OpenSearch Security with dynamic provisioning

* docs(analytics): add multi-tenant architecture and troubleshooting guide

  Document the OpenSearch tenant identity resolution flow, the distinction between a Clerk active-org session and mere membership, tenant provisioning details, and security guarantees. Add a troubleshooting entry for the workspace-user fallback with screenshots and diagnostic commands.

* feat(analytics): lock down Dashboards for SaaS tenants and fix saved objects

  SaaS lockdown for OpenSearch Dashboards:
  1. nginx whitelist: a PCRE negative lookahead blocks non-whitelisted /analytics/app/* routes (returns 403). Allowed: Discover, Visualize, Dashboards, Alerting, Dev Tools, Data Explorer, Home. Blocked: ISM, Security, Management, Anomaly Detection, Maps, etc. Admin retains full access via the direct Dashboards port (5601).
  2. Role permissions: replace ISM cluster permissions with Alerting permissions (monitor CRUD, alerts, destinations) for tenant roles. Add the indices:data/write/bulk cluster permission required for Dashboards saved objects (visualizations, dashboards, saved searches); without it, multitenancy's kibana_all_write grant is never reached.
  3. Default landing page set to Discover instead of Home (which exposes all plugin links, including blocked ones).

* refactor(dev): unify justfile commands and harden Dashboards lockdown

* feat(infra): lock down service ports to localhost-only in dev, disable in prod

  Base compose configs (infra.yml, full.yml) now use `expose` instead of `ports` for all internal services. The dev-ports overlay binds everything to 127.0.0.1. Only nginx port 80 remains publicly accessible.

* fix(analytics): harden tenant security roles and restore bulk write permission

* chore(infra): consolidate duplicate configs and remove orphaned files
  - Merge nginx.full.conf into nginx.prod.conf (95% identical; prod has better proxy_redirect)
  - Consolidate DB init scripts: merge temporal DB creation into 01-create-instance-databases.sh
  - Remove orphaned scripts: dev-instance-manager.sh, instance-bootstrap.sh (unreferenced)
  - Remove deprecated opensearch-security/whitelist.yml (superseded by allowlist.yml)
  - Update docker-compose.full.yml and docs to reference nginx.prod.conf

* fix(test): mock AnalyticsModule in MCP integration test

  The AnalyticsModule's controller and services depend on ConfigService and OpenSearchClient, which aren't available in the MCP test module. Use overrideModule to replace the entire AnalyticsModule with mocks. Also add an explicit ConfigModule import to AnalyticsModule.

* fix(infra): correct PM2 process names, Kafka port, and security init
  - Fix the PM2 --only filter to use instance-suffixed names (shipsec-backend-0)
  - Fix the Kafka broker port from 19092 to 9092 (matches single-listener Redpanda)
  - Add whitelist.yml, required by securityadmin.sh, alongside allowlist.yml

* refactor(ui): move Analytics Settings under the Manage sidebar section

* docs: update port references to reflect nginx routing in dev mode

  Update all documentation to reflect that services are accessed through nginx on port 80 in both dev and production modes:
  - docker/README.md: add an nginx routing table for dev mode
  - docs/installation.mdx: update service endpoints and analytics URLs
  - docs/quickstart.mdx: update access points
  - docs/architecture.mdx: update development URLs
  - docs/command-reference.mdx: update dev and prod access points
  - README.md: correct the production access URL from 8090 to 80

  Individual service ports (5173, 3211, 5601) remain available for debugging but should not be used in normal development.

* docs: update env examples and remaining port references to nginx routing

  Update environment variable examples and remaining documentation to reflect nginx routing for all application services:
  - backend/.env.example: update the OPENSEARCH_DASHBOARDS_URL comment
  - worker/.env.example: update OPENSEARCH_DASHBOARDS_URL to use the nginx route
  - docs/development/component-development.mdx: update API examples to use nginx
  - docs/user-guide.md: update the access URL from 8090 to 80
  - docs/installation.mdx: update env examples and production endpoints
  - docs/analytics.md: update dashboard access URLs and SSH tunneling
  - docs/development/workflow-analytics.mdx: update the dashboard URL example

  All services are now accessed via nginx on port 80 with proper routing:
  - Frontend: http://localhost/
  - Backend API: http://localhost/api/
  - Analytics: http://localhost/analytics/

* docs(justfile): clarify debug port messages for nginx routing

  Update development-mode port messages to explicitly state that direct service ports are for debugging only and that nginx should be used in normal development. This aligns with the port isolation pattern from PR #265.
  - Add "(debugging only, use nginx in normal development)" to port messages
  - Consistent messaging across secure and local auth modes
  - Production mode already correctly shows nginx routing and port isolation

---------

Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
Co-authored-by: Aseem Shrey <LuD1161@users.noreply.github.com>
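The daily index rotation listed above (`security-findings-{orgId}-{YYYY.MM.DD}`) could be derived roughly as follows. This is an illustrative sketch only: `dailyIndexName` is a hypothetical helper name and the UTC date handling is an assumption, not the actual worker implementation.

```typescript
// Illustrative sketch: build the daily index name from the
// security-findings-{orgId}-{YYYY.MM.DD} pattern described above.
// dailyIndexName is a hypothetical name; UTC handling is an assumption.
function dailyIndexName(orgId: string, date: Date): string {
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, '0');
  const dd = String(date.getUTCDate()).padStart(2, '0');
  return `security-findings-${orgId}-${yyyy}.${mm}.${dd}`;
}

// All documents indexed on the same day for one organization land in one
// index, which keeps per-day retention and deletion straightforward.
console.log(dailyIndexName('org_123', new Date(Date.UTC(2025, 0, 21))));
// → security-findings-org_123-2025.01.21
```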
1 parent fe8b35d commit a15806b

132 files changed

Lines changed: 9171 additions & 973 deletions

Lines changed: 190 additions & 0 deletions
@@ -0,0 +1,190 @@
# Analytics Output Port Design

## Status: Approved
## Date: 2025-01-21

## Problem Statement

When connecting a component's `rawOutput` (which contains complex nested JSON) to the Analytics Sink, OpenSearch hits the default limit of 1000 fields per index. This happens because:

1. **Dynamic mapping explosion**: Elasticsearch/OpenSearch creates a field for every unique JSON path
2. **Nested structures**: Arrays of objects like `issues[0].metadata.schema` create many paths
3. **Varying schemas**: Different scanner outputs accumulate unique field paths over time

Example error:

```
illegal_argument_exception: Limit of total fields [1000] has been exceeded
```

## Solution

### Design Decisions

1. **Each component owns its analytics schema**
   - Components output structured `list<json>` through dedicated ports (`findings`, `results`, `secrets`, `issues`)
   - Component authors define the structure appropriate for their tool
   - No generic "one schema fits all" approach

2. **Analytics Sink accepts `list<json>`**
   - Input type: `z.array(z.record(z.string(), z.unknown()))`
   - Each item in the array is indexed as a separate document
   - Rejects arbitrary nested objects (the input must be an array)

3. **Same timestamp for all findings in a batch**
   - All findings from one component execution share the same `@timestamp`
   - Captured once at the start of indexing and applied to all documents

4. **Nested `shipsec` context**
   - Workflow context is stored under the `shipsec.*` namespace
   - Prevents field-name collisions with component data
   - Clear separation: component fields at the root, system fields under `shipsec`

5. **Nested objects serialized before indexing**
   - Any nested object or array within a finding is JSON-stringified
   - Prevents field explosion from dynamic mapping
   - Trade-off: you can't query inside serialized fields directly, but index corruption is prevented

6. **No `data` wrapper**
   - The original PRD design wrapped component output in a `data` field
   - New design: finding fields sit at the top level for easier querying
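Decision 5 can be sketched as a small helper. This is illustrative only: `serializeNestedFields` is a hypothetical name, and the real logic in `/worker/src/utils/opensearch-indexer.ts` may differ.

```typescript
// Sketch: JSON-stringify nested objects/arrays so dynamic mapping only
// ever sees flat scalar fields. serializeNestedFields is a hypothetical name.
function serializeNestedFields(
  finding: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(finding)) {
    // Objects and arrays become strings; scalars (and null) pass through.
    out[key] =
      value !== null && typeof value === 'object' ? JSON.stringify(value) : value;
  }
  return out;
}

// { metadata: { schema: "public" } } becomes { metadata: "{\"schema\":\"public\"}" },
// matching the "After" document example below.
```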
### Document Structure

**Before (PRD design):**

```json
{
  "workflow_id": "...",
  "workflow_name": "...",
  "run_id": "...",
  "node_ref": "...",
  "component_id": "...",
  "@timestamp": "...",
  "asset_key": "...",
  "data": {
    "check_id": "DB_RLS_DISABLED",
    "severity": "CRITICAL",
    "metadata": { "schema": "public", "table": "users" }
  }
}
```

**After (new design):**

```json
{
  "check_id": "DB_RLS_DISABLED",
  "severity": "CRITICAL",
  "title": "RLS Disabled on Table: users",
  "resource": "public.users",
  "metadata": "{\"schema\":\"public\",\"table\":\"users\"}",
  "scanner": "supabase-scanner",
  "asset_key": "abcdefghij1234567890",
  "finding_hash": "a1b2c3d4e5f67890",

  "shipsec": {
    "organization_id": "org_123",
    "run_id": "shipsec-run-xxx",
    "workflow_id": "d1d33161-929f-4af4-9a64-xxx",
    "workflow_name": "Supabase Security Audit",
    "component_id": "core.analytics.sink",
    "node_ref": "analytics-sink-1"
  },

  "@timestamp": "2025-01-21T10:30:00.000Z"
}
```

### Component Output Ports

Components should use their existing structured list outputs:

| Component | Port | Type | Notes |
|-----------|------|------|-------|
| Nuclei | `results` | `z.array(z.record(z.string(), z.unknown()))` | Scanner + asset_key added |
| TruffleHog | `results` | `z.array(z.record(z.string(), z.unknown()))` | Scanner + asset_key added |
| Supabase Scanner | `results` | `z.array(z.record(z.string(), z.unknown()))` | Scanner + asset_key added |

All `results` ports include:
- `scanner`: Scanner identifier (e.g. `'nuclei'`, `'trufflehog'`, `'supabase-scanner'`)
- `asset_key`: Primary asset identifier from the finding
- `finding_hash`: Stable hash for deduplication (16-char hex prefix of a SHA-256 digest)

### Finding Hash for Deduplication

The `finding_hash` enables tracking findings across workflow runs:

**Generation:**

```typescript
import { createHash } from 'crypto';

function generateFindingHash(...fields: (string | undefined | null)[]): string {
  const normalized = fields.map((f) => (f ?? '').toLowerCase().trim()).join('|');
  return createHash('sha256').update(normalized).digest('hex').slice(0, 16);
}
```
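For example, the helper can be exercised like this. The function is restated so the snippet runs on its own, and the Nuclei-style field values are made up for illustration:

```typescript
import { createHash } from 'node:crypto';

// Same helper as above, restated so this snippet is self-contained.
function generateFindingHash(...fields: (string | undefined | null)[]): string {
  const normalized = fields.map((f) => (f ?? '').toLowerCase().trim()).join('|');
  return createHash('sha256').update(normalized).digest('hex').slice(0, 16);
}

// Nuclei-style key fields: templateId + host + matchedAt (illustrative values).
const a = generateFindingHash('CVE-2021-44228', 'api.example.com', '/login');
const b = generateFindingHash('cve-2021-44228 ', 'API.example.com', '/login');
console.log(a === b); // true — lowercasing and trimming make the hash stable
console.log(a.length); // 16
```

Because the input is normalized before hashing, cosmetic differences in scanner output (case, stray whitespace) do not produce distinct hashes across runs.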
**Key fields per scanner:**

| Scanner | Hash Fields |
|---------|-------------|
| Nuclei | `templateId + host + matchedAt` |
| TruffleHog | `DetectorType + Redacted + filePath` |
| Supabase Scanner | `check_id + projectRef + resource` |

**Use cases:**
- **New vs. recurring**: Is this finding appearing for the first time?
- **First-seen / last-seen**: When did we first detect this? Is it still present?
- **Resolution tracking**: Findings that stop appearing may be resolved
- **Deduplication**: Remove duplicates in dashboards across runs

### `shipsec` Context Fields

The indexer automatically adds these fields under `shipsec`:

| Field | Description |
|-------|-------------|
| `organization_id` | Organization that owns the workflow |
| `run_id` | Unique identifier for this workflow execution |
| `workflow_id` | ID of the workflow definition |
| `workflow_name` | Human-readable workflow name |
| `component_id` | Component type (e.g. `core.analytics.sink`) |
| `node_ref` | Node reference in the workflow graph |
| `asset_key` | Auto-detected or explicitly specified asset identifier |
### Querying in OpenSearch

With this structure, users can:
- Filter by organization: `shipsec.organization_id: "org_123"`
- Filter by workflow: `shipsec.workflow_id: "xxx"`
- Filter by run: `shipsec.run_id: "xxx"`
- Filter by asset: `asset_key: "api.example.com"`
- Filter by scanner: `scanner: "nuclei"`
- Filter by component-specific fields: `severity: "CRITICAL"`
- Aggregate by severity: a `terms` aggregation on the `severity` field
- Track finding history: `finding_hash: "a1b2c3d4" | sort @timestamp`
- Find recurring findings: group by `finding_hash` and count occurrences
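As a concrete illustration, a request body for `POST /api/v1/analytics/query` combining several of these filters might look like the following OpenSearch DSL. This is a sketch: the exact request envelope the API expects is not specified in this document, and the `.keyword` suffixes assume default dynamic string mappings.

```typescript
// Sketch of an OpenSearch DSL body. The backend auto-scopes the query to the
// organization's index pattern, so no index name appears here.
const query = {
  query: {
    bool: {
      filter: [
        { term: { 'scanner.keyword': 'nuclei' } },                 // exact match on scanner
        { term: { 'severity.keyword': 'CRITICAL' } },              // component-specific field
        { term: { 'shipsec.run_id.keyword': 'shipsec-run-xxx' } }, // one workflow run
      ],
    },
  },
  aggs: {
    by_severity: { terms: { field: 'severity.keyword' } }, // severity breakdown
  },
  sort: [{ '@timestamp': { order: 'desc' } }],
  size: 100,
};
```

Using `filter` rather than `must` keeps the clauses score-free and cacheable, which suits exact-match dashboard queries.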
### Trade-offs

| Decision | Pro | Con |
|----------|-----|-----|
| Serialize nested objects | Prevents field explosion | Can't query inside serialized fields |
| `shipsec` namespace | No field collisions | Slightly more verbose queries |
| No generic schema | Better fit per component | Less consistency across components |
| Same timestamp per batch | Accurate (same scan time) | Can't distinguish individual finding times |

### Implementation Files

1. `/worker/src/utils/opensearch-indexer.ts` - Add `shipsec` context, serialize nested objects
2. `/worker/src/components/core/analytics-sink.ts` - Accept `list<json>`, consistent timestamp
3. Component files - Ensure structured output; add a `results` port where missing

### Backward Compatibility

- Existing workflows connecting `rawOutput` to the Analytics Sink still work
- The Analytics Sink continues to accept any data type for backward compatibility
- The new `list<json>` processing only triggers when the input is an array
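The backward-compatible dispatch could be as simple as a type guard. This is a sketch; `isListOfJson` is a hypothetical name for whatever check the sink actually performs.

```typescript
// Sketch: the new list<json> path triggers only for arrays of plain objects;
// anything else falls through to the legacy any-type handling.
function isListOfJson(input: unknown): input is Record<string, unknown>[] {
  return (
    Array.isArray(input) &&
    input.every(
      (item) => item !== null && typeof item === 'object' && !Array.isArray(item),
    )
  );
}
```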
### Future Considerations

1. **Index templates**: Create an OpenSearch index template with explicit mappings for `shipsec.*` fields
2. **Field discovery**: Build UI to show available fields from indexed data
3. **Schema validation**: Optional strict mode to validate findings against an expected schema

Dockerfile

Lines changed: 4 additions & 0 deletions
@@ -89,6 +89,7 @@ ARG VITE_DEFAULT_ORG_ID=local-dev
 ARG VITE_GIT_SHA=unknown
 ARG VITE_PUBLIC_POSTHOG_KEY=""
 ARG VITE_PUBLIC_POSTHOG_HOST=""
+ARG VITE_OPENSEARCH_DASHBOARDS_URL=""

 ENV VITE_AUTH_PROVIDER=${VITE_AUTH_PROVIDER}
 ENV VITE_CLERK_PUBLISHABLE_KEY=${VITE_CLERK_PUBLISHABLE_KEY}
@@ -98,6 +99,7 @@ ENV VITE_DEFAULT_ORG_ID=${VITE_DEFAULT_ORG_ID}
 ENV VITE_GIT_SHA=${VITE_GIT_SHA}
 ENV VITE_PUBLIC_POSTHOG_KEY=${VITE_PUBLIC_POSTHOG_KEY}
 ENV VITE_PUBLIC_POSTHOG_HOST=${VITE_PUBLIC_POSTHOG_HOST}
+ENV VITE_OPENSEARCH_DASHBOARDS_URL=${VITE_OPENSEARCH_DASHBOARDS_URL}

 # Set working directory for frontend
 USER shipsec
@@ -129,6 +131,7 @@ ARG VITE_DEFAULT_ORG_ID=local-dev
 ARG VITE_GIT_SHA=unknown
 ARG VITE_PUBLIC_POSTHOG_KEY=""
 ARG VITE_PUBLIC_POSTHOG_HOST=""
+ARG VITE_OPENSEARCH_DASHBOARDS_URL=""

 ENV VITE_AUTH_PROVIDER=${VITE_AUTH_PROVIDER}
 ENV VITE_CLERK_PUBLISHABLE_KEY=${VITE_CLERK_PUBLISHABLE_KEY}
@@ -138,6 +141,7 @@ ENV VITE_DEFAULT_ORG_ID=${VITE_DEFAULT_ORG_ID}
 ENV VITE_GIT_SHA=${VITE_GIT_SHA}
 ENV VITE_PUBLIC_POSTHOG_KEY=${VITE_PUBLIC_POSTHOG_KEY}
 ENV VITE_PUBLIC_POSTHOG_HOST=${VITE_PUBLIC_POSTHOG_HOST}
+ENV VITE_OPENSEARCH_DASHBOARDS_URL=${VITE_OPENSEARCH_DASHBOARDS_URL}

 # Set working directory for frontend
 USER shipsec

README.md

Lines changed: 2 additions & 2 deletions
@@ -53,7 +53,7 @@ This installer will:
 - Clone the repository and start all services
 - Guide you through any required setup steps

-Once complete, visit **http://localhost:8090** to access ShipSec Studio.
+Once complete, visit **http://localhost** to access ShipSec Studio.

 ### 2. ShipSec Cloud (Preview)

@@ -79,7 +79,7 @@ cd studio
 just prod start-latest
 ```

-Access the studio at `http://localhost:8090`.
+Access the studio at `http://localhost`.

 ---

backend/.env.example

Lines changed: 17 additions & 5 deletions
@@ -32,6 +32,8 @@ AUTH_PROVIDER="local"
 # If AUTH_LOCAL_ALLOW_UNAUTHENTICATED=false, clients must present AUTH_LOCAL_API_KEY in the Authorization header.
 AUTH_LOCAL_ALLOW_UNAUTHENTICATED="true"
 AUTH_LOCAL_API_KEY=""
+# Required in production for session auth cookie signing
+SESSION_SECRET=""

 # Clerk provider options
 # Required when AUTH_PROVIDER="clerk"
@@ -44,15 +46,25 @@ PLATFORM_SERVICE_TOKEN=""
 # Optional: override request timeout in milliseconds (default 5000)
 PLATFORM_API_TIMEOUT_MS=""

-# OpenSearch configuration
-OPENSEARCH_URL="http://localhost:9200"
-OPENSEARCH_INDEX_PREFIX="logs-tenant"
-# OPENSEARCH_USERNAME=""
-# OPENSEARCH_PASSWORD=""
+# OpenSearch configuration for security analytics indexing
+# Optional: if not set, security analytics indexing will be disabled
+OPENSEARCH_URL=""
+OPENSEARCH_USERNAME=""
+OPENSEARCH_PASSWORD=""
+
+# OpenSearch Dashboards configuration for analytics visualization
+# Optional: if not set, Dashboards link will not appear in frontend sidebar
+# Dev/Prod (via nginx): "http://localhost/analytics"
+# Custom domain: "https://dashboards.example.com/analytics"
+OPENSEARCH_DASHBOARDS_URL=""

 # Secret encryption key (must be exactly 32 characters, NOT hex-encoded)
 # Generate with: openssl rand -base64 24 | head -c 32
 SECRET_STORE_MASTER_KEY="CHANGE_ME_32_CHAR_SECRET_KEY!!!!"

+# Redis configuration for rate limiting and caching
+# Optional: if not set, rate limiting will use in-memory storage (not recommended for production)
+REDIS_URL=""
+
 # Kafka / Redpanda configuration for node I/O, log, and event ingestion
 LOG_KAFKA_BROKERS="localhost:19092"

backend/package.json

Lines changed: 6 additions & 1 deletion
@@ -14,18 +14,22 @@
     "generate:openapi": "bun scripts/generate-openapi.ts",
     "migration:push": "bun x drizzle-kit push",
     "migration:smoke": "bun scripts/migration-smoke.ts",
-    "delete:runs": "bun scripts/delete-all-workflow-runs.ts"
+    "delete:runs": "bun scripts/delete-all-workflow-runs.ts",
+    "setup:opensearch": "bun scripts/setup-opensearch.ts"
   },
   "dependencies": {
     "@clerk/backend": "^2.29.5",
     "@clerk/types": "^4.101.13",
     "@grpc/grpc-js": "^1.14.3",
+    "@nest-lab/throttler-storage-redis": "^1.1.0",
     "@nestjs/common": "^10.4.22",
     "@nestjs/config": "^3.3.0",
     "@nestjs/core": "^10.4.22",
     "@nestjs/microservices": "^11.1.13",
     "@nestjs/platform-express": "^10.4.22",
     "@nestjs/swagger": "^11.2.5",
+    "@nestjs/throttler": "^6.5.0",
+    "@opensearch-project/opensearch": "^3.5.1",
     "@shipsec/backend-client": "workspace:*",
     "@shipsec/component-sdk": "workspace:*",
     "@shipsec/shared": "workspace:*",
@@ -62,6 +66,7 @@
     "@eslint/js": "^9.39.2",
     "@nestjs/testing": "^10.4.22",
     "@types/bcryptjs": "^3.0.0",
+    "@types/cookie-parser": "^1.4.10",
     "@types/express-serve-static-core": "^4.19.8",
     "@types/har-format": "^1.2.16",
     "@types/multer": "^2.0.0",
