Cortex File System - Complete Documentation

Architecture Overview
File Handler Service
Cortex File Utilities Layer
File Collection System
Tools Integration
Data Flow Diagrams
Storage Layers
Key Concepts
Complete Function Reference
Error Handling

Architecture Overview

The Cortex file system is a multi-layered architecture that handles file uploads, storage, retrieval, and management:

┌─────────────────────────────────────────────────────────────┐
│                    Cortex Application                        │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │         System Tools & Plugins                       │   │
│  │  (WriteFile, EditFile, Image, FileCollection, etc.)  │   │
│  └──────────────────┬─────────────────────────────────┘   │
│                      │                                       │
│  ┌───────────────────▼─────────────────────────────────┐   │
│  │         lib/fileUtils.js                             │   │
│  │  (Encapsulated file handler interactions)            │   │
│  └───────────────────┬─────────────────────────────────┘   │
│                      │                                       │
│  ┌───────────────────▼─────────────────────────────────┐   │
│  │         File Collection System                       │   │
│  │  (Redis hash maps: FileStoreMap:ctx:<contextId>)    │   │
│  └───────────────────┬─────────────────────────────────┘   │
└───────────────────────┼───────────────────────────────────────┘
                        │
                        │ HTTP/HTTPS
                        │
┌───────────────────────▼───────────────────────────────────────┐
│         Cortex File Handler Service                           │
│  (External Azure Function - cortex-file-handler)              │
│                                                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │   Azure Blob │  │   GCS        │  │    Redis     │       │
│  │   Storage    │  │   Storage    │  │   Metadata   │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└────────────────────────────────────────────────────────────────┘

Key Components

File Handler Service (cortex-file-handler): External Azure Function that handles actual file storage
File Utilities (lib/fileUtils.js): Cortex's abstraction layer over the file handler
File Collection System: Redis-based metadata storage for user file collections
System Tools: Pathways that use files (WriteFile, EditFile, Image, etc.)

File Handler Service

The file handler is an external Azure Function service that manages file storage and processing.

Configuration

URL: Configured via WHISPER_MEDIA_API_URL environment variable
Storage Backends: Azure Blob Storage (primary), Google Cloud Storage (optional), Local (fallback)

Key Features

1. Single Container Architecture

All files stored in a single Azure Blob Storage container
Files distinguished by blob index tags, not separate containers
No container parameter supported - always uses configured container

2. Retention Management

Temporary (default): Files tagged with retention=temporary, auto-deleted after 30 days
Permanent: Files tagged with retention=permanent, retained indefinitely
Retention changed via setRetention operation (updates blob tag, no file copying)

3. Context Scoping

contextId: Optional parameter for per-user/per-context file isolation
Redis keys: <hash>:ctx:<contextId> for context-scoped files
Falls back to the unscoped key (<hash>) when no context-scoped entry is found
Strongly recommended for multi-tenant applications
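
The key scheme and fallback described above can be sketched as follows. This is a minimal illustration that uses an in-memory Map in place of Redis; buildMetadataKey and lookupWithFallback are hypothetical helper names, not functions from the file handler.

```javascript
// Hypothetical helpers illustrating context-scoped keys with unscoped fallback.
// A Map stands in for the file handler's Redis store.

function buildMetadataKey(hash, contextId) {
  // Scoped form: <hash>:ctx:<contextId>; unscoped form: <hash>
  return contextId ? `${hash}:ctx:${contextId}` : hash;
}

function lookupWithFallback(store, hash, contextId) {
  const scoped = store.get(buildMetadataKey(hash, contextId));
  if (scoped !== undefined) return scoped;
  // Fall back to the unscoped key when no scoped entry exists
  const unscoped = store.get(buildMetadataKey(hash, null));
  return unscoped !== undefined ? unscoped : null;
}

const metadataStore = new Map([
  ["abc123:ctx:user-456", { filename: "scoped.pdf" }],
  ["def789", { filename: "unscoped.pdf" }],
]);

console.log(lookupWithFallback(metadataStore, "abc123", "user-456"));
console.log(lookupWithFallback(metadataStore, "def789", "user-456"));
```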

4. Hash-Based Deduplication

Files identified by xxhash64 hash
Duplicate uploads return existing file URLs
Hash stored in Redis for fast lookups
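
As a rough illustration of hash-based deduplication, the sketch below keeps an in-memory Map keyed by content hash. It deliberately uses FNV-1a as a stand-in hash so that it runs without third-party packages; the service itself uses xxhash64. The names fnv1a64Hex and uploadWithDedup, and the storage.example URL, are assumptions made for this example.

```javascript
// Stand-in 64-bit hash (FNV-1a). The real service uses xxhash64; this is only
// here so the sketch runs without extra dependencies.
function fnv1a64Hex(bytes) {
  let hash = 0xcbf29ce484222325n;
  const prime = 0x100000001b3n;
  for (const byte of bytes) {
    hash ^= BigInt(byte);
    hash = (hash * prime) & 0xffffffffffffffffn;
  }
  return hash.toString(16).padStart(16, "0");
}

// Hypothetical in-memory uploader: returns the existing record when the
// content hash is already known, otherwise stores a new record.
function uploadWithDedup(store, bytes, filename) {
  const hash = fnv1a64Hex(bytes);
  const existing = store.get(hash);
  if (existing) return { ...existing, deduplicated: true };
  const record = { hash, filename, url: `https://storage.example/${hash}` };
  store.set(hash, record);
  return { ...record, deduplicated: false };
}

const dedupStore = new Map();
const reportBytes = new TextEncoder().encode("quarterly report");
const firstUpload = uploadWithDedup(dedupStore, reportBytes, "report.txt");
const secondUpload = uploadWithDedup(dedupStore, reportBytes, "copy-of-report.txt");
console.log(firstUpload.deduplicated, secondUpload.deduplicated);
```

The second call returns the record created by the first, so both callers end up with the same URL and only one entry is stored.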

5. Short-Lived URLs

All operations return shortLivedUrl (5-minute expiration, configurable)
Provides secure, time-limited access
Preferred for LLM file access

API Endpoints

POST `/file-handler` - Upload File

// FormData:
{
  file: <FileStream>,
  hash: "abc123",           // Optional: for deduplication
  contextId: "user-456",    // Optional: for scoping
  requestId: "req-789"      // Optional: for tracking
}

// Response:
{
  url: "https://storage.../file.pdf?long-lived-sas",
  shortLivedUrl: "https://storage.../file.pdf?short-lived-sas",
  gcs: "gs://bucket/file.pdf",  // If GCS configured
  hash: "abc123",
  filename: "file.pdf"
}

GET `/file-handler` - Retrieve/Process File

// Query Parameters:
{
  hash: "abc123",                    // Check if file exists
  checkHash: true,                    // Enable hash check
  contextId: "user-456",              // Optional: for scoping
  shortLivedMinutes: 5,               // Optional: URL expiration
  fetch: "https://example.com/file",  // Download from URL
  save: true                          // Save converted document
}

// Response (checkHash):
{
  url: "https://storage.../file.pdf",
  shortLivedUrl: "https://storage.../file.pdf?short-lived",
  gcs: "gs://bucket/file.pdf",
  hash: "abc123",
  filename: "file.pdf",
  converted: {                        // If file was converted
    url: "https://storage.../converted.csv",
    gcs: "gs://bucket/converted.csv"
  }
}

DELETE `/file-handler` - Delete File

// Query Parameters:
{
  hash: "abc123",           // Delete by hash
  contextId: "user-456",    // Optional: for scoping
  requestId: "req-789"      // Or delete all files for requestId
}

POST/PUT `/file-handler` - Set Retention

// Body:
{
  hash: "abc123",
  retention: "permanent",   // or "temporary"
  contextId: "user-456",   // Optional: for scoping
  setRetention: true
}

// Response:
{
  hash: "abc123",
  filename: "file.pdf",
  retention: "permanent",
  url: "https://storage.../file.pdf",  // Same URL (tag updated)
  shortLivedUrl: "https://storage.../file.pdf?new-sas",
  gcs: "gs://bucket/file.pdf"
}

Cortex File Utilities Layer

Location: lib/fileUtils.js

This is Cortex's abstraction layer, which encapsulates all file handler interactions. Code elsewhere should not make direct axios calls to the file handler; every request goes through these functions.

Core Functions

URL Building

buildFileHandlerUrl(baseUrl, params)

Handles separator detection (? vs &)
Properly encodes all parameters
Skips null/undefined/empty values
Used by all file handler operations
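
The behavior listed above can be approximated with a few lines. This is a sketch of the described rules, not the code in lib/fileUtils.js; the function name buildFileHandlerUrlSketch and the handler.example host are made up for illustration.

```javascript
// Illustrative sketch only; the real buildFileHandlerUrl may differ in detail.
function buildFileHandlerUrlSketch(baseUrl, params = {}) {
  const pairs = [];
  for (const [key, value] of Object.entries(params)) {
    // Skip null, undefined, and empty-string values
    if (value === null || value === undefined || value === "") continue;
    pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
  }
  if (pairs.length === 0) return baseUrl;
  // Use "&" when the base URL already carries a query string, "?" otherwise
  const separator = baseUrl.includes("?") ? "&" : "?";
  return `${baseUrl}${separator}${pairs.join("&")}`;
}

const plainUrl = buildFileHandlerUrlSketch("https://handler.example/api/file-handler", {
  hash: "abc 123",
  checkHash: true,
  contextId: null,
});
const urlWithExistingQuery = buildFileHandlerUrlSketch(
  "https://handler.example/api/file-handler?code=key",
  { hash: "abc123" }
);
console.log(plainUrl);
console.log(urlWithExistingQuery);
```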

File Upload

uploadFileToCloud(fileInput, mimeType, filename, pathwayResolver, contextId)

Input Types: URL string, base64 string, or Buffer
Process:
1. Converts input to Buffer
2. Computes xxhash64 hash
3. Checks if file exists via checkHashExists (deduplication)
4. If exists, returns existing URLs
5. If not, uploads via file handler POST
Returns: {url, gcs, hash}
ContextId: Passed in formData body (not URL)

File Retrieval

checkHashExists(hash, fileHandlerUrl, pathwayResolver, contextId, shortLivedMinutes)

Checks if file exists by hash
Returns short-lived URL (prefers converted version)
Returns: {url, gcs, hash, filename} or null
Makes single API call (optimized)

fetchFileFromUrl(fileUrl, requestId, contextId, save)

Downloads file from URL via file handler
Processes based on file type
Used by: azureVideoTranslatePlugin, azureCognitivePlugin

File Deletion

deleteFileByHash(hash, pathwayResolver, contextId)

Deletes file from cloud storage
Handles 404 gracefully (file already deleted)
Returns: true if deleted, false if not found

Retention Management

setRetentionForHash(hash, retention, contextId, pathwayResolver)

Sets file retention to 'temporary' or 'permanent'
Best-effort operation (logs warnings on failure)
Used by: addFileToCollection when permanent=true

Short-Lived URL Resolution

ensureShortLivedUrl(fileObject, fileHandlerUrl, contextId, shortLivedMinutes)

Resolves file object to use short-lived URL
Updates GCS URL if converted version exists
Used by: Tools that send files to LLMs
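
One way to picture the resolution step is as a pure merge of a checkHash-style response into the file object. The helper below is hypothetical (applyShortLivedUrl is not part of lib/fileUtils.js), and two of its choices are assumptions: keeping the original object when the lookup returns nothing, and replacing gcs with the converted file's GCS URI when one is present.

```javascript
// Hypothetical pure helper: merges a checkHash-style lookup result into a file
// object. Field names follow the response shapes documented above.
function applyShortLivedUrl(fileObject, lookupResult) {
  if (!lookupResult) return fileObject; // assumption: keep the original on a failed lookup
  const resolved = { ...fileObject };
  resolved.url = lookupResult.shortLivedUrl || lookupResult.url || fileObject.url;
  // Assumption: when a converted version exists, its GCS URI replaces the original
  if (lookupResult.converted && lookupResult.converted.gcs) {
    resolved.gcs = lookupResult.converted.gcs;
  }
  return resolved;
}

const storedFile = { hash: "abc123", url: "https://storage.example/file.xlsx", gcs: "gs://bucket/file.xlsx" };
const resolvedFile = applyShortLivedUrl(storedFile, {
  url: "https://storage.example/file.xlsx?long-lived-sas",
  shortLivedUrl: "https://storage.example/file.xlsx?short-lived-sas",
  converted: { gcs: "gs://bucket/converted.csv" },
});
console.log(resolvedFile.url, resolvedFile.gcs);
```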

Media Chunks

getMediaChunks(file, requestId, contextId)

Gets chunked media file URLs
Used by: Media processing workflows

Cleanup

markCompletedForCleanUp(requestId, contextId)

Marks request as completed for cleanup
Used by: azureCognitivePlugin

File Collection System

Location: lib/fileUtils.js + pathways/system/entity/tools/sys_tool_file_collection.js

The file collection system stores file metadata in Redis hash maps, one per context (FileStoreMap:ctx:<contextId>), with each entry keyed by file hash. Writes use atomic hash map operations, so concurrent updates to different files do not overwrite one another.

Storage Architecture

Redis Hash Maps
└── FileStoreMap:ctx:<contextId>
    └── Hash Map (hash → fileData JSON)
        └── File Entry (JSON):
            {
              // CFH-managed fields (preserved from file handler)
              url: "https://storage.../file.pdf",
              gcs: "gs://bucket/file.pdf",
              filename: "uuid-based-filename.pdf",  // CFH-managed
              
              // Cortex-managed fields (user metadata)
              id: "timestamp-random",
              displayFilename: "user-friendly-name.pdf",  // User-provided name
              mimeType: "application/pdf",
              tags: ["pdf", "report"],
              notes: "Quarterly report",
              hash: "abc123",
              permanent: true,
              addedDate: "2024-01-15T10:00:00.000Z",
              lastAccessed: "2024-01-15T10:00:00.000Z"
            }

Key Features

1. Atomic Operations

Uses Redis hash map commands (HSET, HGET, HDEL), each of which executes atomically
No version-based locking needed, because each file is written to its own hash field
Direct hash map access: FileStoreMap:ctx:<contextId> → {hash: fileData}

2. Caching

In-memory cache with 5-second TTL
Reduces Redis load for read operations
Cache invalidated on writes
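
A TTL cache with write invalidation can be sketched in a few lines. The class below is illustrative only; CollectionCache and its injectable clock are assumptions made so the expiry logic is easy to follow and test, and they do not mirror the actual cache in lib/fileUtils.js.

```javascript
// Minimal TTL cache sketch. The clock is injectable so expiry can be exercised
// without waiting; names here are illustrative.
class CollectionCache {
  constructor(ttlMs = 5000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  get(contextId) {
    const entry = this.entries.get(contextId);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(contextId); // stale: drop and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(contextId, value) {
    this.entries.set(contextId, { value, storedAt: this.now() });
  }

  invalidate(contextId) {
    this.entries.delete(contextId); // call after any write to the collection
  }
}

let fakeTime = 0;
const collectionCache = new CollectionCache(5000, () => fakeTime);
collectionCache.set("user-123", [{ hash: "abc123" }]);
```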

3. Field Ownership

CFH-managed fields: url, gcs, filename (UUID-based, managed by file handler)
Cortex-managed fields: id, displayFilename, tags, notes, mimeType, permanent, addedDate, lastAccessed
When merging data, CFH fields are preserved, Cortex fields are updated
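
The merge rule can be expressed as a small function. This is a sketch under the field split listed above; mergeFileEntry and CFH_FIELDS are hypothetical names, and the real merge may handle more cases.

```javascript
const CFH_FIELDS = ["url", "gcs", "filename"];

// Hypothetical merge: CFH-managed fields come from the existing entry when
// present; everything else (Cortex-managed) is taken from the update.
function mergeFileEntry(existing, update) {
  const merged = { ...existing, ...update };
  for (const field of CFH_FIELDS) {
    if (existing && existing[field] !== undefined) {
      merged[field] = existing[field];
    }
  }
  return merged;
}

const mergedEntry = mergeFileEntry(
  { url: "https://storage.example/uuid.pdf", gcs: "gs://bucket/uuid.pdf", filename: "uuid.pdf", tags: ["old"] },
  { filename: "renamed.pdf", displayFilename: "Report.pdf", tags: ["report"] }
);
console.log(mergedEntry);
```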

Core Functions

Loading

loadFileCollection(contextId, contextKey, useCache)

Loads collection from Redis hash map FileStoreMap:ctx:<contextId>
Returns array of file entries (sorted by lastAccessed, most recent first)
Uses cache if available and fresh (5-second TTL)
Converts hash map entries to array format
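
The conversion and sort steps might look like the sketch below. It assumes the raw input has the shape of an HGETALL result (an object mapping each hash to a JSON string); skipping entries that fail to parse is an assumption of this example, not documented behavior.

```javascript
// Sketch: turn an HGETALL-style result ({ hash: jsonString }) into an array
// sorted by lastAccessed, most recent first.
function hashMapToCollection(rawHash) {
  const files = [];
  for (const [hash, json] of Object.entries(rawHash)) {
    try {
      files.push({ ...JSON.parse(json), hash });
    } catch {
      // Assumption: ignore corrupt entries in this sketch
    }
  }
  // Entries without a valid lastAccessed sort last
  const accessTime = (file) => Date.parse(file.lastAccessed) || 0;
  return files.sort((a, b) => accessTime(b) - accessTime(a));
}

const rawCollection = {
  abc123: JSON.stringify({ displayFilename: "old.pdf", lastAccessed: "2024-01-15T10:00:00.000Z" }),
  def456: JSON.stringify({ displayFilename: "new.pdf", lastAccessed: "2024-02-01T10:00:00.000Z" }),
};
console.log(hashMapToCollection(rawCollection).map((f) => f.displayFilename));
```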

Saving

saveFileCollection(contextId, contextKey, collection)

Saves collection to Redis hash map (only updates changed entries)
Uses atomic HSET operations per file
Optimized to only write files that actually changed
Returns true if successful, false on error

Metadata Updates

updateFileMetadata(contextId, hash, metadata)

Updates Cortex-managed metadata fields atomically
Preserves all CFH-managed fields
Updates only specified fields (displayFilename, tags, notes, mimeType, dates, permanent)
Used for: Updating lastAccessed, modifying tags/notes without full reload

Adding Files

addFileToCollection(contextId, contextKey, url, gcs, filename, tags, notes, hash, fileUrl, pathwayResolver, permanent)

Adds file entry to collection via atomic HSET operation
If fileUrl provided, uploads file first via uploadFileToCloud()
If permanent=true, sets retention to permanent via setRetentionForHash()
Merges with existing CFH data if file with same hash already exists
Returns file entry object with id

Processing Chat History Files

syncAndStripFilesFromChatHistory(chatHistory, contextId, contextKey)

Files IN collection: stripped from message (replaced with placeholder), tools can access them
Files NOT in collection: left in message as-is (model sees them directly)
Updates lastAccessed for collection files
Used by: sys_entity_agent to process incoming chat history

File Entry Schema

{
  id: string,                    // Unique ID: "timestamp-random" (Cortex-managed)
  url: string,                   // Azure Blob Storage URL (CFH-managed)
  gcs: string | null,            // Google Cloud Storage URL (CFH-managed)
  filename: string | null,       // UUID-based storage filename (CFH-managed)
  displayFilename: string | null, // User-friendly filename (Cortex-managed)
  mimeType: string | null,      // MIME type (Cortex-managed)
  tags: string[],               // Searchable tags (Cortex-managed)
  notes: string,                // User notes/description (Cortex-managed)
  hash: string,                 // File hash for deduplication (used as Redis key)
  permanent: boolean,           // Whether file is permanent (Cortex-managed)
  addedDate: string,            // ISO timestamp when added (Cortex-managed)
  lastAccessed: string          // ISO timestamp of last access (Cortex-managed)
}

Field Ownership Notes:

filename: Managed by CFH, UUID-based storage filename
displayFilename: Managed by Cortex, user-provided friendly name
When displaying files, prefer displayFilename with fallback to filename
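
A display helper following this rule could be as small as the sketch below. getDisplayName is a hypothetical name, and the final "(unnamed file)" fallback is an assumption for entries that have neither field.

```javascript
// Prefer the user-facing name, then the storage filename.
function getDisplayName(entry) {
  return entry.displayFilename || entry.filename || "(unnamed file)";
}

console.log(getDisplayName({ displayFilename: "Quarterly Report.pdf", filename: "uuid-1.pdf" }));
console.log(getDisplayName({ displayFilename: null, filename: "uuid-2.pdf" }));
```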

Tools Integration

System Tools That Use Files

1. WriteFile (`sys_tool_writefile.js`)

Flow:

User provides content and filename
Creates Buffer from content
Calls uploadFileToCloud() with contextId
Calls addFileToCollection() with permanent=true
Returns file info with fileId

Key Code:

const uploadResult = await uploadFileToCloud(
    fileBuffer, mimeType, filename, resolver, contextId
);
const fileEntry = await addFileToCollection(
    contextId, contextKey, uploadResult.url, uploadResult.gcs,
    filename, tags, notes, uploadResult.hash, null, resolver, true
);

2. EditFile (`sys_tool_editfile.js`)

Flow:

User provides file identifier and modification
Resolves file via resolveFileParameter() → finds in collection
Downloads file content via axios.get(file.url)
Modifies content (line replacement or search/replace)
Uploads modified file via uploadFileToCloud() (creates new hash)
Updates collection entry atomically via updateFileMetadata() with new URL/hash
Deletes old file version (if not permanent) via deleteFileByHash()

Key Code:

const foundFile = await resolveFileParameter(fileParam, contextId, contextKey);
const oldHash = foundFile.hash;
const uploadResult = await uploadFileToCloud(
    fileBuffer, mimeType, filename, resolver, contextId
);
// Update the collection entry atomically so it points at the new upload (url, gcs, hash)
await updateFileMetadata(contextId, foundFile.hash, {
    url: uploadResult.url,
    gcs: uploadResult.gcs,
    hash: uploadResult.hash
});
if (!foundFile.permanent) {
    await deleteFileByHash(oldHash, resolver, contextId);
}

3. FileCollection (`sys_tool_file_collection.js`)

Tools:

AddFileToCollection: Adds file to collection (with optional upload)
SearchFileCollection: Searches files by filename, tags, notes
ListFileCollection: Lists all files with filtering/sorting
RemoveFileFromCollection: Removes files (deletes from cloud if not permanent)

Key Code:

// Add file
await addFileToCollection(contextId, contextKey, url, gcs, filename, tags, notes, hash, fileUrl, resolver, permanent);

// Remove file (with permanent check)
if (!fileInfo.permanent) {
    await deleteFileByHash(fileInfo.hash, resolver, contextId);
}

4. Image Tools (`sys_tool_image.js`, `sys_tool_image_gemini.js`)

Flow:

Generates/modifies image
Gets image URL
Uploads via uploadFileToCloud()
Adds to collection with permanent=true

5. ReadFile (`sys_tool_readfile.js`)

Flow:

Resolves file via resolveFileParameter() → finds in collection
Downloads file content via axios.get(file.url)
Validates file is text-based via isTextMimeType()
Returns content with line/character range support

6. ViewImage (`sys_tool_view_image.js`)

Flow:

Finds file in collection
Resolves to short-lived URL via ensureShortLivedUrl()
Returns image URL for display

7. AnalyzeFile (`sys_tool_analyzefile.js`)

Flow:

Extracts files from chat history via extractFilesFromChatHistory()
Generates file message content via generateFileMessageContent()
Injects files into chat history via injectFileIntoChatHistory()
Uses Gemini Vision model to analyze files

Plugins That Use Files

1. AzureVideoTranslatePlugin

Flow:

Receives video URL
If not from Azure storage, uploads via fetchFileFromUrl()
Uses uploaded URL for video translation

Key Code:

const response = await fetchFileFromUrl(videoUrl, this.requestId, contextId, false);
const resultUrl = Array.isArray(response) ? response[0] : response.url;

2. AzureCognitivePlugin

Flow:

Receives file for indexing
If not text file, converts via fetchFileFromUrl() with save=true
Uses converted text file for indexing
Marks completed via markCompletedForCleanUp()

Key Code:

const data = await fetchFileFromUrl(file, requestId, contextId, true);
url = Array.isArray(data) ? data[0] : data.url;

Data Flow Diagrams

File Upload Flow

User/LLM Request
    │
    ▼
System Tool (WriteFile, Image, etc.)
    │
    ▼
uploadFileToCloud()
    │
    ├─► Convert input to Buffer
    ├─► Compute xxhash64 hash
    ├─► checkHashExists() ──► File Handler GET /file-handler?checkHash=true
    │                           │
    │                           ├─► File exists? ──► Return existing URLs
    │                           │
    │                           └─► File not found ──► Continue
    │
    └─► Upload via POST ──► File Handler POST /file-handler
        │                      │
        │                      ├─► Store in Azure Blob Storage
        │                      ├─► Store in GCS (if configured)
        │                      ├─► Store metadata in Redis
        │                      └─► Return {url, gcs, hash, shortLivedUrl}
        │
        └─► addFileToCollection()
            │
            ├─► If permanent=true ──► setRetentionForHash() ──► File Handler POST /file-handler?setRetention=true
            │
            └─► Save to Redis hash map (atomic operation)
                │
                └─► Redis HSET FileStoreMap:ctx:<contextId> <hash> <fileData>
                    │
                    ├─► Merge with existing CFH data (if hash exists)
                    ├─► Preserve CFH fields (url, gcs, filename)
                    └─► Update Cortex fields (displayFilename, tags, notes, etc.)

File Retrieval Flow

User/LLM Request (e.g., "view file.pdf")
    │
    ▼
System Tool (ViewImage, ReadFile, etc.)
    │
    ▼
resolveFileParameter()
    │
    ├─► Find in collection via findFileInCollection()
    │   │
    │   └─► Matches by: ID, filename, hash, URL, or fuzzy filename
    │
    └─► ensureShortLivedUrl()
        │
        └─► checkHashExists() ──► File Handler GET /file-handler?checkHash=true&shortLivedMinutes=5
            │                      │
            │                      ├─► Check Redis for hash metadata
            │                      ├─► Generate short-lived SAS token
            │                      └─► Return {url, gcs, hash, filename, shortLivedUrl}
            │
            └─► Return file object with shortLivedUrl
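
The matching step in the flow above could be sketched as follows. This is not the actual findFileInCollection; findFileSketch is a hypothetical stand-in, and the "fuzzy" step here is a simple case-insensitive substring test chosen for illustration.

```javascript
// Hypothetical matcher. Exact matches are tried first; the fuzzy step is a
// case-insensitive substring test, which is an assumption.
function findFileSketch(collection, query) {
  if (!query) return null;
  const exact = collection.find(
    (f) =>
      f.id === query ||
      f.hash === query ||
      f.url === query ||
      f.displayFilename === query ||
      f.filename === query
  );
  if (exact) return exact;
  const needle = String(query).toLowerCase();
  const fuzzy = collection.find((f) =>
    [f.displayFilename, f.filename].some(
      (name) => typeof name === "string" && name.toLowerCase().includes(needle)
    )
  );
  return fuzzy || null;
}

const sampleCollection = [
  { id: "1-a", hash: "abc123", url: "https://storage.example/uuid-1.pdf", filename: "uuid-1.pdf", displayFilename: "Quarterly Report.pdf" },
  { id: "2-b", hash: "def456", url: "https://storage.example/uuid-2.png", filename: "uuid-2.png", displayFilename: "logo.png" },
];
console.log(findFileSketch(sampleCollection, "quarterly"));
```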

File Edit Flow

User/LLM Request (e.g., "edit file.txt, replace line 5")
    │
    ▼
EditFile Tool
    │
    ├─► resolveFileParameter() ──► Find file in collection
    │
    ├─► Download file content ──► axios.get(file.url)
    │
    ├─► Modify content (line replacement or search/replace)
    │
    ├─► uploadFileToCloud() ──► Upload modified file
    │   │
    │   └─► Returns new {url, gcs, hash}
    │
    └─► updateFileMetadata() ──► Redis HSET (atomic update)
        │
        ├─► Keep the CFH-managed filename and existing Cortex metadata
        ├─► Replace url, gcs, hash with the values from the new upload
        └─► If update succeeds:
            └─► Delete old file (if not permanent)
                └─► deleteFileByHash() ──► File Handler DELETE /file-handler?hash=oldHash

File Deletion Flow

User/LLM Request (e.g., "remove file.pdf from collection")
    │
    ▼
RemoveFileFromCollection Tool
    │
    ├─► Load collection ──► findFileInCollection() for each fileId
    │
    ├─► Capture file info (hash, permanent) from collection
    │
    └─► Redis HDEL FileStoreMap:ctx:<contextId> <hash> (atomic deletion)
        │
        └─► Async deletion (fire and forget)
            │
            ├─► For each file:
            │   │
            │   ├─► If permanent=true ──► Skip deletion (keep in cloud)
            │   │
            │   └─► If permanent=false ──► deleteFileByHash()
            │       │
            │       └─► File Handler DELETE /file-handler?hash=hash&contextId=contextId
            │           │
            │           ├─► Delete from Azure Blob Storage
            │           ├─► Delete from GCS (if configured)
            │           └─► Remove from Redis metadata

Storage Layers

Layer 1: Cloud Storage (File Handler)

Azure Blob Storage (Primary)

Container: Single container (configured via AZURE_STORAGE_CONTAINER_NAME)
Naming: UUID-based filenames
Organization: By requestId folders
Access: SAS tokens (long-lived and short-lived)
Tags: Blob index tags for retention (retention=temporary or retention=permanent)
Lifecycle: Azure automatically deletes retention=temporary files after 30 days

Google Cloud Storage (Optional)

Enabled: If GCP_SERVICE_ACCOUNT_KEY configured
URL Format: gs://bucket/path
Usage: Media file chunks, converted files
No short-lived URLs: the file handler returns plain gs:// URIs for GCS, which carry no expiring token (unlike Azure SAS URLs)

Local Storage (Fallback)

Used: If Azure not configured
Served: Via HTTP on configured port

Layer 2: Redis Metadata (File Handler)

Purpose: Fast hash lookups, file metadata caching

Key Format:

Unscoped: <hash>
Context-scoped: <hash>:ctx:<contextId>
Legacy (migrated): <hash>:<containerName> (auto-migrated on read)

Data Stored:

{
  url: "https://storage.../file.pdf?long-lived-sas",
  shortLivedUrl: "https://storage.../file.pdf?short-lived-sas",
  gcs: "gs://bucket/file.pdf",
  hash: "abc123",
  filename: "file.pdf",
  timestamp: "2024-01-15T10:00:00.000Z",
  converted: {
    url: "https://storage.../converted.csv",
    gcs: "gs://bucket/converted.csv"
  }
}

Layer 3: File Collection (Cortex Redis Hash Maps)

Purpose: User-facing file collections with metadata

Storage: Redis hash maps (FileStoreMap:ctx:<contextId>)

Format:

// Redis Hash Map Structure:
// Key: FileStoreMap:ctx:<contextId>
// Value: Hash map where each entry is {hash: fileDataJSON}

// Example hash map entry:
{
  "abc123": JSON.stringify({
    // CFH-managed fields
    url: "https://storage.../file.pdf",
    gcs: "gs://bucket/file.pdf",
    filename: "uuid-based-name.pdf",
    
    // Cortex-managed fields
    id: "1736966400000-abc123",
    displayFilename: "user-friendly-name.pdf",
    mimeType: "application/pdf",
    tags: ["pdf", "report"],
    notes: "Quarterly report",
    hash: "abc123",
    permanent: true,
    addedDate: "2024-01-15T10:00:00.000Z",
    lastAccessed: "2024-01-15T10:00:00.000Z"
  })
}

Features:

Atomic operations (each Redis HSET/HDEL/HGET command executes atomically)
In-memory caching (5-second TTL)
Direct hash map access (no versioning needed)
Context-scoped isolation (FileStoreMap:ctx:<contextId>)

Key Concepts

1. Context Scoping (`agentContext`)

Purpose: Per-user/per-context file isolation with optional cross-context reading

Usage:

agentContext: Array of context objects, each with:
- contextId: Context identifier (required)
- contextKey: Encryption key for this context (optional, null for unencrypted)
- default: Boolean indicating the default context for write operations (required)
Stored in Redis with scoped keys: FileStoreMap:ctx:<contextId>

Benefits:

Keeps identical file hashes from different users in separate, context-scoped keys
Enables per-user file management
Supports multi-tenant applications
Multiple contexts allow reading files from secondary contexts (e.g., workspace files)
Separate encryption keys allow user-encrypted files alongside unencrypted shared workspace files
Centralized context management (single parameter instead of multiple)

Example:

// Upload with contextId (from default context)
const agentContext = [
    { contextId: "user-123", contextKey: userContextKey, default: true }
];
await uploadFileToCloud(fileBuffer, mimeType, filename, resolver, agentContext[0].contextId);

// Check hash with contextId
await checkHashExists(hash, fileHandlerUrl, null, agentContext[0].contextId);

// Delete with contextId
await deleteFileByHash(hash, resolver, agentContext[0].contextId);

// Load merged collection (reads from both contexts)
// User context is encrypted (userContextKey), workspace is not (null)
// Uses a separate name because agentContext is already declared above
const mergedAgentContext = [
    { contextId: "user-123", contextKey: userContextKey, default: true },
    { contextId: "workspace-456", contextKey: null, default: false }  // Shared workspace, unencrypted
];
const collection = await loadMergedFileCollection(mergedAgentContext);

// Resolve file from any context in mergedAgentContext
const url = await resolveFileParameter("file.pdf", mergedAgentContext);

agentContext Behavior:

Files are read from all contexts in the array (union)
Each context uses its own encryption key (contextKey)
Shared workspaces typically use contextKey: null (unencrypted) since they're shared between users
Writes/updates only go to the context marked as default: true, using its contextKey
Deduplication: if a file exists in multiple contexts (same hash), the first context takes precedence
Files from non-default contexts bypass inCollection filtering (all files accessible)
The default context is used for all write operations (uploads, updates, deletions)
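
The read and write rules above can be illustrated with in-memory data. In this sketch, collectionsById stands in for per-context Redis reads and is a hypothetical input shape; deduplication is by hash only, which is a simplification of the behavior described in this document.

```javascript
// In-memory sketch of merging collections across contexts.
function mergeCollections(agentContext, collectionsById) {
  const seen = new Set();
  const merged = [];
  for (const { contextId } of agentContext) {
    for (const file of collectionsById[contextId] || []) {
      if (seen.has(file.hash)) continue; // earlier context takes precedence
      seen.add(file.hash);
      merged.push(file);
    }
  }
  return merged;
}

// Writes target the single context marked default: true.
function getDefaultContext(agentContext) {
  return agentContext.find((ctx) => ctx.default === true) || null;
}

const exampleContexts = [
  { contextId: "user-123", contextKey: "user-key", default: true },
  { contextId: "workspace-456", contextKey: null, default: false },
];
const exampleCollections = {
  "user-123": [{ hash: "h1", displayFilename: "mine.pdf" }],
  "workspace-456": [
    { hash: "h1", displayFilename: "shared-copy.pdf" },
    { hash: "h2", displayFilename: "handbook.pdf" },
  ],
};
console.log(mergeCollections(exampleContexts, exampleCollections));
```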

agentContext Security Note:

agentContext allows reading files from multiple contexts, including files that bypass inCollection filtering
Important: agentContext should be treated as a privileged, server-derived value
Server-side authorization MUST verify that any contexts in agentContext are restricted to trusted, same-tenant contexts (e.g., derived from workspace membership) before use
Never accept agentContext directly from untrusted client inputs without validation
Only the default context should be used for write operations - non-default contexts are read-only
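
A server-side guard along these lines might look like the following sketch. validateAgentContext is a hypothetical helper, not part of Cortex; allowedContextIds is assumed to come from a trusted source such as workspace membership, and requiring exactly one default context is an assumption of this example.

```javascript
// Hypothetical server-side check run before agentContext is used.
function validateAgentContext(agentContext, allowedContextIds) {
  if (!Array.isArray(agentContext) || agentContext.length === 0) {
    throw new Error("agentContext must be a non-empty array");
  }
  const allowed = new Set(allowedContextIds);
  for (const ctx of agentContext) {
    if (!ctx || typeof ctx.contextId !== "string" || !allowed.has(ctx.contextId)) {
      throw new Error(`Context not permitted: ${ctx && ctx.contextId}`);
    }
  }
  // Assumption: exactly one context is the write target
  const defaults = agentContext.filter((ctx) => ctx.default === true);
  if (defaults.length !== 1) {
    throw new Error("Exactly one context must be marked default");
  }
  return agentContext;
}

const requestedContexts = [
  { contextId: "user-123", contextKey: null, default: true },
  { contextId: "workspace-456", contextKey: null, default: false },
];
validateAgentContext(requestedContexts, ["user-123", "workspace-456"]);
```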

2. Permanent Files (`permanent` flag)

Purpose: Indicate files that should be kept indefinitely

Storage:

Stored in file collection entry: permanent: true
Sets blob index tag: retention=permanent
Prevents deletion from cloud storage

Usage:

// Add permanent file
await addFileToCollection(
    contextId, contextKey, url, gcs, filename, tags, notes, hash,
    null, resolver, true  // permanent=true
);

// Check before deletion
if (!file.permanent) {
    await deleteFileByHash(file.hash, resolver, contextId);
}

Behavior:

Permanent files are not deleted from cloud storage when removed from collection
Retention set via setRetentionForHash() (best-effort)
Default: permanent=false (temporary, 30-day retention)
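
When several entries are removed from a collection at once, the permanent check amounts to a simple partition. The helper below is a hypothetical sketch (partitionForCloudDeletion is not a Cortex function) that separates hashes to delete from cloud storage from those to leave in place.

```javascript
// Split removed entries into hashes to delete from cloud storage and hashes
// to keep because the entry is marked permanent.
function partitionForCloudDeletion(removedEntries) {
  const toDelete = [];
  const toKeep = [];
  for (const entry of removedEntries) {
    (entry.permanent ? toKeep : toDelete).push(entry.hash);
  }
  return { toDelete, toKeep };
}

const removalPlan = partitionForCloudDeletion([
  { hash: "abc123", permanent: true },
  { hash: "def456", permanent: false },
  { hash: "ghi789" }, // permanent unset: treated as temporary
]);
console.log(removalPlan);
```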

3. Hash Deduplication

Purpose: Avoid storing duplicate files

Process:

Compute xxhash64 hash of file content
Check if hash exists via checkHashExists()
If exists, return existing URLs (no upload)
If not, upload and store hash

Benefits:

Saves storage space
Faster uploads (skip if duplicate)
Consistent file references

4. Short-Lived URLs

Purpose: Secure, time-limited file access

Features:

5-minute expiration (configurable)
Always included in file handler responses
Preferred for LLM file access
Automatically generated on checkHash operations

Usage:

// Resolve to short-lived URL
const fileWithShortLivedUrl = await ensureShortLivedUrl(
    fileObject, fileHandlerUrl, contextId, 5  // 5 minutes
);
// fileWithShortLivedUrl.url is now short-lived URL

5. Atomic Operations

Purpose: Ensure thread-safe collection modifications

Process:

Redis hash map operations (HSET, HDEL, HGET) are atomic
No version-based locking needed
Direct hash map updates per file (not full collection replacement)

Functions:

addFileToCollection(): Atomic HSET operation
updateFileMetadata(): Atomic HSET operation (updates single file)
loadFileCollection(): Atomic HGETALL operation
File removal: Atomic HDEL operation

Benefits:

No version conflicts (each file updated independently)
Faster operations (no retry loops)
Simpler code (no locking logic needed)

Complete Function Reference

File Handler Operations

`buildFileHandlerUrl(baseUrl, params)`

Builds file handler URL with query parameters.

Parameters:
- baseUrl: File handler service URL
- params: Object with query parameters (null/undefined skipped)
Returns: Complete URL with encoded parameters
Used by: All file handler operations

`fetchFileFromUrl(fileUrl, requestId, contextId, save)`

Downloads and processes file from URL.

Parameters:
- fileUrl: URL to fetch
- requestId: Request ID for tracking
- contextId: Optional context ID
- save: Whether to save converted file (default: false)
Returns: Response data (object or array)
Used by: azureVideoTranslatePlugin, azureCognitivePlugin

`uploadFileToCloud(fileInput, mimeType, filename, pathwayResolver, contextId)`

Uploads file to cloud storage with deduplication.

Parameters:
- fileInput: URL string, base64 string, or Buffer
- mimeType: MIME type (optional)
- filename: Filename (optional, inferred if not provided)
- pathwayResolver: Optional resolver for logging
- contextId: Optional context ID for scoping
Returns: {url, gcs, hash}
Process:
1. Converts input to Buffer
2. Computes hash
3. Checks if exists (deduplication)
4. Uploads if not exists
Used by: All tools that upload files

`checkHashExists(hash, fileHandlerUrl, pathwayResolver, contextId, shortLivedMinutes)`

Checks if file exists by hash.

Parameters:
- hash: File hash
- fileHandlerUrl: File handler URL
- pathwayResolver: Optional resolver for logging
- contextId: Optional context ID
- shortLivedMinutes: URL expiration (default: 5)
Returns: {url, gcs, hash, filename} or null
Used by: Upload deduplication, file resolution

`deleteFileByHash(hash, pathwayResolver, contextId)`

Deletes file from cloud storage.

Parameters:
- hash: File hash
- pathwayResolver: Optional resolver for logging
- contextId: Optional context ID
Returns: true if deleted, false if not found
Handles: 404 gracefully (file already deleted)

`setRetentionForHash(hash, retention, contextId, pathwayResolver)`

Sets file retention (temporary or permanent).

Parameters:
- hash: File hash
- retention: 'temporary' or 'permanent'
- contextId: Optional context ID
- pathwayResolver: Optional resolver for logging
Returns: Response data or null
Used by: addFileToCollection when permanent=true

`ensureShortLivedUrl(fileObject, fileHandlerUrl, contextId, shortLivedMinutes)`

Resolves file to use short-lived URL.

Parameters:
- fileObject: File object with hash and url
- fileHandlerUrl: File handler URL
- contextId: Optional context ID
- shortLivedMinutes: URL expiration (default: 5)
Returns: File object with url updated to short-lived URL
Used by: Tools that send files to LLMs

`getMediaChunks(file, requestId, contextId)`

Gets chunked media file URLs.

Parameters:
- file: File URL
- requestId: Request ID
- contextId: Optional context ID
Returns: Array of chunk URLs

`markCompletedForCleanUp(requestId, contextId)`

Marks request as completed for cleanup.

Parameters:
- requestId: Request ID
- contextId: Optional context ID
Returns: Response data or null

File Collection Operations

`loadFileCollection(contextId, contextKey, useCache)`

Loads file collection from Redis hash map.

Parameters:
- contextId: Context ID (required)
- contextKey: Optional encryption key
- useCache: Whether to use cache (default: true)
Returns: Array of file entries (sorted by lastAccessed, most recent first)
Process:
1. Checks in-memory cache (5-second TTL)
2. Loads from Redis hash map FileStoreMap:ctx:<contextId>
3. Filters by inCollection (only returns global files or chat-specific files)
4. Converts hash map entries to array format
5. Updates cache
Used by: Primary file collection operations

`loadFileCollectionAll(contextId, contextKey)`

Loads ALL files from a context, bypassing inCollection filtering.

Parameters:
- contextId: Context ID (required)
- contextKey: Optional encryption key
Returns: Array of all file entries (no filtering)
Used by: loadMergedFileCollection when loading files from all contexts

`loadMergedFileCollection(agentContext)`

Loads merged file collection from one or more contexts.

Parameters:
- agentContext: Array of context objects, each with { contextId, contextKey, default } (required)
Returns: Array of file entries from all contexts (deduplicated by hash/url/gcs)
Process:
1. Loads first context collection via loadFileCollectionAll() with its contextKey
2. Tags each file with _contextId (internal, stripped before returning to callers)
3. For each additional context, loads collection via loadFileCollectionAll() with its contextKey
4. Deduplicates: earlier contexts take precedence if same file exists in multiple
5. Returns merged collection (with _contextId stripped before returning)
Used by: syncAndStripFilesFromChatHistory, getAvailableFiles, resolveFileParameter, file tools
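A minimal sketch of the merge and deduplication steps above, assuming the per-context collections have already been loaded. `collectionsByContext` and both function names are hypothetical; only the precedence rule and the `_contextId` tag come from the description.

```javascript
// Sketch only: `collectionsByContext` stands in for the per-context results
// of loadFileCollectionAll(); the helper names here are illustrative.
function mergeCollectionsSketch(agentContext, collectionsByContext) {
  const seen = new Set();
  const merged = [];
  for (const { contextId } of agentContext) {
    for (const file of collectionsByContext[contextId] || []) {
      // A file counts as a duplicate if its hash, url, or gcs was already seen.
      const keys = [file.hash, file.url, file.gcs].filter(Boolean);
      if (keys.some((k) => seen.has(k))) continue; // earlier context wins
      keys.forEach((k) => seen.add(k));
      merged.push({ ...file, _contextId: contextId }); // internal ownership tag
    }
  }
  return merged;
}

// Strip the internal tag before handing the collection to callers.
function stripContextTag(files) {
  return files.map(({ _contextId, ...rest }) => rest);
}
```

Because contexts are visited in array order, placing the preferred context first in `agentContext` decides which copy of a duplicated file survives.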

`saveFileCollection(contextId, contextKey, collection)`

Saves file collection to Redis hash map (optimized - only updates changed entries).

Parameters:
- contextId: Context ID
- contextKey: Optional encryption key (unused, kept for compatibility)
- collection: Array of file entries
Returns: true if successful, false on error
Process:
1. Compares each file with current state
2. Only updates files that changed (optimized)
3. Uses atomic HSET operations per file
4. Preserves CFH-managed fields, updates Cortex-managed fields
Used by: Tools that need to save multiple file changes
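The change-detection idea in steps 1-3 can be sketched as below. A `Map` stands in for the Redis hash, `map.set()` stands in for one HSET per file, and comparing serialized JSON is an assumed comparison strategy; the field-ownership merge in step 4 is not modelled here.

```javascript
// Sketch only: `currentHash` is a Map standing in for the Redis hash, and
// map.set() stands in for one HSET per changed file.
function saveChangedEntriesSketch(currentHash, collection) {
  let writes = 0;
  for (const file of collection) {
    if (!file.hash) continue; // the hash is the field key, so it is required
    const next = JSON.stringify(file);
    // Skip entries whose serialized form is unchanged.
    if (currentHash.get(file.hash) === next) continue;
    currentHash.set(file.hash, next);
    writes += 1;
  }
  return writes; // number of entries actually written
}
```

Note that a string comparison like this is sensitive to key order, so two entries with the same fields in a different order would be treated as changed.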

`updateFileMetadata(contextId, hash, metadata)`

Updates Cortex-managed metadata fields atomically.

Parameters:
- contextId: Context ID (required)
- hash: File hash (used as Redis key)
- metadata: Object with fields to update (displayFilename, tags, notes, mimeType, addedDate, lastAccessed, permanent)
Returns: true if successful, false on error
Process:
1. Loads existing file data from Redis
2. Merges metadata (preserves CFH fields, updates Cortex fields)
3. Writes back via atomic HSET
4. Invalidates cache
Used by: Search operations (updates lastAccessed), EditFile (updates URL/hash)
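The merge in step 2 might look like the sketch below. The list of Cortex-managed fields is taken from the `metadata` parameter description; silently ignoring any other key is an assumption about how CFH-managed fields are protected.

```javascript
// Cortex-managed fields, taken from the metadata parameter list above.
const CORTEX_FIELDS = ['displayFilename', 'tags', 'notes', 'mimeType', 'addedDate', 'lastAccessed', 'permanent'];

// Sketch only: returns a new entry with Cortex-managed fields updated.
function mergeCortexMetadataSketch(existing, metadata) {
  // CFH-managed fields (url, gcs, filename, ...) carry over untouched.
  const merged = { ...existing };
  for (const field of CORTEX_FIELDS) {
    if (metadata[field] !== undefined) merged[field] = metadata[field];
  }
  return merged;
}
```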

`addFileToCollection(contextId, contextKey, url, gcs, filename, tags, notes, hash, fileUrl, pathwayResolver, permanent)`

Adds file to collection via atomic operation.

Parameters:
- contextId: Context ID (required)
- contextKey: Optional encryption key (unused, kept for compatibility)
- url: Azure URL (optional if fileUrl provided)
- gcs: GCS URL (optional)
- filename: User-friendly filename (required)
- tags: Array of tags (optional)
- notes: Notes string (optional)
- hash: File hash (optional, computed if not provided)
- fileUrl: URL to upload (optional, uploads if provided)
- pathwayResolver: Optional resolver for logging
- permanent: Whether file is permanent (default: false)
Returns: File entry object with id
Process:
1. If fileUrl provided, uploads file first via uploadFileToCloud()
2. If permanent=true, sets retention to permanent via setRetentionForHash()
3. Creates file entry with displayFilename (user-friendly name)
4. Writes to Redis hash map via atomic HSET
5. Merges with existing CFH data if hash already exists
Used by: WriteFile, Image tools, FileCollection tool
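With eleven positional parameters, call sites are easy to get wrong. One option is a small adapter that maps named options onto the documented argument order. `toAddFileArgs` is a hypothetical helper, not part of lib/fileUtils.js, and the default values other than `permanent` are assumptions.

```javascript
// Hypothetical helper: maps a named-options object onto the positional
// argument order of addFileToCollection() documented above.
function toAddFileArgs({
  contextId, contextKey = null, url = null, gcs = null, filename,
  tags = [], notes = '', hash = null, fileUrl = null,
  pathwayResolver = null, permanent = false,
}) {
  if (!contextId) throw new Error('contextId is required');
  if (!filename) throw new Error('filename is required');
  // url is optional only when fileUrl is provided.
  if (!url && !fileUrl) throw new Error('either url or fileUrl is required');
  return [contextId, contextKey, url, gcs, filename, tags, notes, hash, fileUrl, pathwayResolver, permanent];
}

// Usage, assuming addFileToCollection is imported from lib/fileUtils.js:
// const entry = await addFileToCollection(...toAddFileArgs({
//   contextId: 'ctx-1', filename: 'report.pdf', fileUrl: 'https://example.com/report.pdf',
// }));
```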

`syncAndStripFilesFromChatHistory(chatHistory, agentContext)`

Processes chat history files based on collection membership.

Parameters:
- chatHistory: Chat history array to process
- agentContext: Array of context objects, each with { contextId, contextKey, default } (required)
Returns: { chatHistory, availableFiles } - processed chat history and formatted file list
Process:
1. Loads merged file collection from all contexts in agentContext
2. For each file in chat history:
  - If in collection: strip from message, update lastAccessed and inCollection in owning context (using that context's key)
  - If not in collection: leave in message as-is
3. Returns processed history and available files string
4. Uses atomic operations per file, updating the context that owns each file (identified by _contextId tag)
Used by: sys_entity_agent to process incoming chat history
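The stripping rule in step 2 can be sketched as follows. The message shape (a `content` array whose file items look like `{type: 'image_url', url, gcs, hash}`) is an assumption based on the format described elsewhere in this reference, and the `lastAccessed` and `inCollection` updates are omitted.

```javascript
// Sketch only: removes file items that already belong to the collection and
// leaves everything else in the message untouched.
function stripCollectionFilesSketch(chatHistory, collection) {
  const known = new Set(collection.flatMap((f) => [f.hash, f.url, f.gcs]).filter(Boolean));
  const isFile = (item) =>
    item && typeof item === 'object' && (item.type === 'image_url' || item.type === 'file');
  const inCollection = (item) => [item.hash, item.url, item.gcs].some((k) => k && known.has(k));

  return chatHistory.map((message) => {
    if (!Array.isArray(message.content)) return message; // plain-text message
    return {
      ...message,
      content: message.content.filter((item) => !(isFile(item) && inCollection(item))),
    };
  });
}
```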

File Resolution

`resolveFileParameter(fileParam, agentContext, options)`

Resolves file parameter to file URL.

Parameters:
- fileParam: File ID, filename, URL, or hash
- agentContext: Array of context objects, each with { contextId, contextKey, default } (required)
- options: Optional options object:
  - preferGcs: Boolean - prefer GCS URL over Azure URL
  - useCache: Boolean - use cache (default: true)
Returns: File URL string (Azure or GCS) or null if not found
Matching (via findFileInCollection()):
- Exact ID match
- Exact hash match
- Exact URL match (Azure or GCS)
- Exact filename match (case-insensitive, basename comparison)
- Fuzzy filename match (contains, minimum 4 characters)
Process:
1. Loads merged file collection from all contexts in agentContext
2. Searches merged collection for matching file
3. Returns file URL if found
Used by: ReadFile, EditFile, and other tools that need file URLs

`findFileInCollection(fileParam, collection)`

Finds file in collection array.

Parameters:
- fileParam: File identifier
- collection: Collection array
Returns: File entry or null
Used by: resolveFileParameter
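The matching rules listed under `resolveFileParameter` can be sketched as a cascade of lookups. Treating the list order as a priority order, checking both `displayFilename` and `filename`, and reading "contains" as "the stored name contains the parameter" are all assumptions.

```javascript
// Sketch only: entry field names (id, hash, url, gcs, displayFilename, filename)
// are assumed from the surrounding reference.
function findFileInCollectionSketch(fileParam, collection) {
  if (typeof fileParam !== 'string' || !fileParam.trim() || !Array.isArray(collection)) return null;
  const param = fileParam.trim();
  const basename = (name) => String(name || '').split(/[\\/]/).pop().toLowerCase();
  const names = (f) => [f.displayFilename, f.filename].filter(Boolean).map(basename);

  // Exact ID, hash, then URL (Azure or GCS) matches.
  const byId = collection.find((f) => f.id === param);
  if (byId) return byId;
  const byHash = collection.find((f) => f.hash === param);
  if (byHash) return byHash;
  const byUrl = collection.find((f) => f.url === param || f.gcs === param);
  if (byUrl) return byUrl;

  // Exact filename match: case-insensitive, basename comparison.
  const wanted = basename(param);
  if (!wanted) return null;
  const byName = collection.find((f) => names(f).includes(wanted));
  if (byName) return byName;

  // Fuzzy filename match: contains, minimum 4 characters.
  if (wanted.length >= 4) {
    return collection.find((f) => names(f).some((n) => n.includes(wanted))) || null;
  }
  return null;
}
```

The 4-character minimum keeps very short parameters from matching unrelated files by accident.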

`generateFileMessageContent(fileParam, agentContext)`

Generates file content for LLM messages.

Parameters:
- fileParam: File identifier (ID, filename, URL, or hash)
- agentContext: Array of context objects, each with { contextId, contextKey, default } (required)
Returns: File content object with type, url, gcs, hash or null
Process:
1. Loads merged file collection from all contexts in agentContext
2. Finds file in merged collection via findFileInCollection()
3. Resolves to short-lived URL via ensureShortLivedUrl() using default context
4. Returns OpenAI-compatible format: {type: 'image_url', url, gcs, hash}
Used by: AnalyzeFile tool to inject files into chat history

`extractFilesFromChatHistory(chatHistory)`

Extracts file metadata from chat history messages.

Parameters:
- chatHistory: Chat history array to scan
Returns: Array of file metadata objects {url, gcs, hash, type}
Process:
1. Scans all messages for file content objects
2. Extracts from image_url, file, or direct URL objects
3. Returns normalized format
Used by: File extraction utilities

`getAvailableFiles(chatHistory, agentContext)`

Gets formatted list of available files from collection.

Parameters:
- chatHistory: Unused (kept for API compatibility)
- agentContext: Array of context objects, each with { contextId, contextKey, default } (required)
Returns: Formatted string of available files (the 10 most recent)
Process:
1. Loads merged file collection from all contexts in agentContext
2. Formats files via formatFilesForTemplate()
3. Returns compact one-line format per file
Used by: Template rendering to show available files

Utility Functions

`getDefaultContext(agentContext)`

Helper function to extract the default context from an agentContext array.

Parameters:
- agentContext: Array of context objects, each with { contextId, contextKey, default }
Returns: Context object with default: true, or first context if none marked as default, or null if array is empty
Used by: Functions that need to determine which context to use for write operations
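The selection rule above is small enough to sketch directly. The function name is suffixed to make clear this is an illustration rather than the library code.

```javascript
// Sketch of the documented rule: explicit default first, then the first
// context, then null for an empty or missing array.
function getDefaultContextSketch(agentContext) {
  if (!Array.isArray(agentContext) || agentContext.length === 0) return null;
  return agentContext.find((ctx) => ctx.default === true) || agentContext[0];
}
```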

`computeFileHash(filePath)`

Computes xxhash64 hash of file.

Returns: Hash string (hex)

`computeBufferHash(buffer)`

Computes xxhash64 hash of buffer.

Returns: Hash string (hex)

`extractFilenameFromUrl(url, gcs)`

Extracts filename from URL (prefers GCS).

Returns: Filename string

`ensureFilenameExtension(filename, mimeType)`

Ensures filename has correct extension based on MIME type.

Returns: Filename with correct extension

`determineMimeTypeFromUrl(url, gcs, filename)`

Determines MIME type from URL or filename.

Returns: MIME type string

`isTextMimeType(mimeType)`

Checks if MIME type is text-based.

Parameters:
- mimeType: MIME type string to check
Returns: Boolean (true if text-based)
Supports: All text/* types, plus application types like JSON, JavaScript, XML, YAML, Python, etc.
Used by: ReadFile, EditFile to validate file types
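A minimal sketch of such a check. The set of `application/*` types below is illustrative only and is not the list used by lib/fileUtils.js; stripping parameters such as `charset` is also an assumption.

```javascript
// Illustrative subset of text-like application types (assumed, not exhaustive).
const TEXT_APPLICATION_TYPES = new Set([
  'application/json',
  'application/javascript',
  'application/xml',
  'application/yaml',
  'application/x-yaml',
  'application/x-python',
]);

function isTextMimeTypeSketch(mimeType) {
  if (typeof mimeType !== 'string') return false;
  // Drop parameters such as "; charset=utf-8" before comparing.
  const base = mimeType.split(';')[0].trim().toLowerCase();
  return base.startsWith('text/') || TEXT_APPLICATION_TYPES.has(base);
}
```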

`getMimeTypeFromFilename(filenameOrPath, defaultMimeType)`

Gets MIME type from filename or path.

Parameters:
- filenameOrPath: Filename or full file path
- defaultMimeType: Optional default (default: 'application/octet-stream')
Returns: MIME type string
Used by: File upload, file type detection

`getMimeTypeFromExtension(extension, defaultMimeType)`

Gets MIME type from file extension.

Parameters:
- extension: File extension (with or without leading dot)
- defaultMimeType: Optional default (default: 'application/octet-stream')
Returns: MIME type string
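The lookup can be sketched as a normalised table lookup. The table below is a small illustrative subset; the real mapping in lib/fileUtils.js is not shown in this reference.

```javascript
// Illustrative subset of an extension-to-MIME table (assumed entries).
const MIME_BY_EXTENSION = new Map([
  ['txt', 'text/plain'],
  ['json', 'application/json'],
  ['png', 'image/png'],
  ['jpg', 'image/jpeg'],
  ['pdf', 'application/pdf'],
]);

function getMimeTypeFromExtensionSketch(extension, defaultMimeType = 'application/octet-stream') {
  if (typeof extension !== 'string') return defaultMimeType;
  // Accept the extension with or without a leading dot, in any case.
  const key = extension.replace(/^\./, '').toLowerCase();
  return MIME_BY_EXTENSION.get(key) || defaultMimeType;
}
```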

Error Handling

File Handler Errors

Network Errors:

Handled gracefully in all functions
Logged via pathwayResolver or logger
Non-critical operations return null instead of throwing

404 Errors:

Treated as "file not found" (not an error)
deleteFileByHash returns false on 404
checkHashExists returns null on 404
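One way to express the 404 rule is a helper used inside a `catch` block. The sketch assumes errors carry the HTTP status at `error.response.status`, as axios errors do; `notFoundOrRethrow` is a hypothetical name.

```javascript
// Sketch only: converts a 404 into a "not found" value and rethrows anything else.
function notFoundOrRethrow(error, notFoundValue) {
  if (error && error.response && error.response.status === 404) return notFoundValue;
  throw error;
}

// Usage inside a hypothetical wrapper:
// try { return await request(); } catch (err) { return notFoundOrRethrow(err, null); }
```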

Timeout Errors:

Upload: 30 seconds
Check hash: 10 seconds
Fetch file: 60 seconds
Set retention: 15 seconds
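The timeouts above can be captured in one table. The wrapper below is a generic `Promise.race` sketch for illustration; the real code may instead rely on the HTTP client's own timeout option, and both names are hypothetical.

```javascript
// Timeouts from the list above, in milliseconds.
const FILE_HANDLER_TIMEOUTS_MS = {
  upload: 30000,
  checkHash: 10000,
  fetchFile: 60000,
  setRetention: 15000,
};

// Sketch only: rejects if `promise` does not settle within `ms` milliseconds.
function withTimeoutSketch(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Always clear the timer so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```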

File Collection Errors

Missing ContextId:

File collection operations require contextId
Depending on the function, a missing contextId results in a null return value or a thrown error

Concurrent Modifications:

Prevented by atomic Redis operations (each HSET or HDEL command executes atomically on the Redis server)
No version conflicts (each file updated independently)

Invalid File Data:

Invalid JSON entries are skipped during load
Missing required fields are handled gracefully

Best Practices

Always pass contextId when available (strongly recommended for multi-tenant deployments)
Use atomic operations - addFileToCollection(), updateFileMetadata() are thread-safe
Check permanent flag before deleting files from cloud storage
Handle errors gracefully - don't throw on non-critical failures
Use short-lived URLs for LLM file access (via ensureShortLivedUrl())
Check for existing files before uploading (automatic in uploadFileToCloud)
Preserve CFH fields - when updating metadata, preserve url, gcs, filename from file handler
Use displayFilename for user-facing displays (fallback to filename if not set)
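Two of these practices reduce to one-line checks. The helper names below are hypothetical, and the field names follow the file entry fields used throughout this reference.

```javascript
// Prefer the user-friendly name, falling back to the file handler's filename.
function getDisplayNameSketch(file) {
  return file.displayFilename || file.filename || '';
}

// Only non-permanent files should be deleted from cloud storage.
function canDeleteFromCloudSketch(file) {
  return !file.permanent;
}
```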

Summary

The Cortex file system provides:

✅ Encapsulated file handler interactions - No direct axios calls
✅ Hash-based deduplication - Avoids duplicate storage
✅ Context scoping - Per-user file isolation via FileStoreMap:ctx:<contextId>
✅ Permanent file support - Indefinite retention
✅ Atomic operations - Thread-safe collection modifications via Redis hash maps
✅ Short-lived URLs - Secure file access (5-minute expiration)
✅ Comprehensive error handling - Graceful failure handling
✅ Single API call optimization - Efficient file resolution
✅ Field ownership separation - CFH-managed vs Cortex-managed fields
✅ Chat history integration - Automatic file syncing from conversations

All file operations flow through lib/fileUtils.js, ensuring consistency, maintainability, and proper error handling throughout the system.

Architecture Highlights

File Handler Service: External Azure Function managing cloud storage
File Utilities Layer: Abstraction over file handler (no direct API calls)
File Collection System: Redis hash maps for user file metadata
Atomic Operations: Thread-safe via Redis HSET/HDEL/HGET operations
Context Isolation: Per-context hash maps for multi-tenant support
