- Architecture Overview
- File Handler Service
- Cortex File Utilities Layer
- File Collection System
- Tools Integration
- Data Flow Diagrams
- Storage Layers
- Key Concepts
- Complete Function Reference
- Error Handling
The Cortex file system is a multi-layered architecture that handles file uploads, storage, retrieval, and management:
┌─────────────────────────────────────────────────────────────┐
│ Cortex Application │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ System Tools & Plugins │ │
│ │ (WriteFile, EditFile, Image, FileCollection, etc.) │ │
│ └──────────────────┬─────────────────────────────────┘ │
│ │ │
│ ┌───────────────────▼─────────────────────────────────┐ │
│ │ lib/fileUtils.js │ │
│ │ (Encapsulated file handler interactions) │ │
│ └───────────────────┬─────────────────────────────────┘ │
│ │ │
│ ┌───────────────────▼─────────────────────────────────┐ │
│ │ File Collection System │ │
│ │ (Redis hash maps: FileStoreMap:ctx:<contextId>) │ │
│ └───────────────────┬─────────────────────────────────┘ │
└───────────────────────┼───────────────────────────────────────┘
│
│ HTTP/HTTPS
│
┌───────────────────────▼───────────────────────────────────────┐
│ Cortex File Handler Service │
│ (External Azure Function - cortex-file-handler) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Azure Blob │ │ GCS │ │ Redis │ │
│ │ Storage │ │ Storage │ │ Metadata │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└────────────────────────────────────────────────────────────────┘
- File Handler Service (`cortex-file-handler`): External Azure Function that handles actual file storage
- File Utilities (`lib/fileUtils.js`): Cortex's abstraction layer over the file handler
- File Collection System: Redis-based metadata storage for user file collections
- System Tools: Pathways that use files (WriteFile, EditFile, Image, etc.)
The file handler is an external Azure Function service that manages file storage and processing.
- URL: Configured via the `WHISPER_MEDIA_API_URL` environment variable
- Storage Backends: Azure Blob Storage (primary), Google Cloud Storage (optional), Local (fallback)
- All files stored in a single Azure Blob Storage container
- Files distinguished by blob index tags, not separate containers
- No `container` parameter is supported; the configured container is always used
- Temporary (default): Files tagged with `retention=temporary`, auto-deleted after 30 days
- Permanent: Files tagged with `retention=permanent`, retained indefinitely
- Retention changed via the `setRetention` operation (updates the blob tag; no file copying)
- `contextId`: Optional parameter for per-user/per-context file isolation
- Redis keys: `<hash>:ctx:<contextId>` for context-scoped files
- Falls back to unscoped keys if a context-scoped key is not found
- Strongly recommended for multi-tenant applications
- Files identified by xxhash64 hash
- Duplicate uploads return existing file URLs
- Hash stored in Redis for fast lookups
- All operations return `shortLivedUrl` (5-minute expiration, configurable)
- Provides secure, time-limited access
- Preferred for LLM file access
// FormData:
{
file: <FileStream>,
hash: "abc123", // Optional: for deduplication
contextId: "user-456", // Optional: for scoping
requestId: "req-789" // Optional: for tracking
}
// Response:
{
url: "https://storage.../file.pdf?long-lived-sas",
shortLivedUrl: "https://storage.../file.pdf?short-lived-sas",
gcs: "gs://bucket/file.pdf", // If GCS configured
hash: "abc123",
filename: "file.pdf"
}

// Query Parameters:
{
hash: "abc123", // Check if file exists
checkHash: true, // Enable hash check
contextId: "user-456", // Optional: for scoping
shortLivedMinutes: 5, // Optional: URL expiration
fetch: "https://example.com/file", // Download from URL
save: true // Save converted document
}
// Response (checkHash):
{
url: "https://storage.../file.pdf",
shortLivedUrl: "https://storage.../file.pdf?short-lived",
gcs: "gs://bucket/file.pdf",
hash: "abc123",
filename: "file.pdf",
converted: { // If file was converted
url: "https://storage.../converted.csv",
gcs: "gs://bucket/converted.csv"
}
}

// Query Parameters:
{
hash: "abc123", // Delete by hash
contextId: "user-456", // Optional: for scoping
requestId: "req-789" // Or delete all files for requestId
}

// Body:
{
hash: "abc123",
retention: "permanent", // or "temporary"
contextId: "user-456", // Optional: for scoping
setRetention: true
}
// Response:
{
hash: "abc123",
filename: "file.pdf",
retention: "permanent",
url: "https://storage.../file.pdf", // Same URL (tag updated)
shortLivedUrl: "https://storage.../file.pdf?new-sas",
gcs: "gs://bucket/file.pdf"
}

Location: `lib/fileUtils.js`
This is Cortex's abstraction layer that encapsulates all file handler interactions. No direct axios calls to the file handler should exist - all go through these functions.
`buildFileHandlerUrl(baseUrl, params)`
- Handles separator detection (`?` vs `&`)
- Properly encodes all parameters
- Skips null/undefined/empty values
- Used by all file handler operations
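The behavior described above can be sketched roughly as follows. This is an illustrative reimplementation, not the actual `lib/fileUtils.js` source:

```javascript
// Sketch of buildFileHandlerUrl: detect the query separator, encode
// parameters, and skip null/undefined/empty values (illustrative only).
function buildFileHandlerUrl(baseUrl, params) {
  // If the base URL already has a query string, append with '&'
  const separator = baseUrl.includes('?') ? '&' : '?';
  const encoded = Object.entries(params)
    // Skip null/undefined/empty values so they never reach the service
    .filter(([, v]) => v !== null && v !== undefined && v !== '')
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join('&');
  return encoded ? `${baseUrl}${separator}${encoded}` : baseUrl;
}
```

For example, `buildFileHandlerUrl('https://fh.example.com/file-handler', { hash: 'abc123', contextId: null })` yields a URL with only the `hash` parameter, since the null `contextId` is dropped.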
`uploadFileToCloud(fileInput, mimeType, filename, pathwayResolver, contextId)`
- Input Types: URL string, base64 string, or Buffer
- Process:
- Converts input to Buffer
- Computes xxhash64 hash
- Checks if the file exists via `checkHashExists` (deduplication)
- If it exists, returns the existing URLs
- If not, uploads via file handler POST
- Returns: `{url, gcs, hash}`
- ContextId: Passed in the formData body (not the URL)
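The dedup-or-upload decision at the heart of this process can be sketched as below; `hashFn`, `checkHash`, and `upload` are hypothetical stand-ins for the real xxhash64 hashing and file handler HTTP calls:

```javascript
// Illustrative sketch of uploadFileToCloud's deduplication flow
// (not the actual implementation; dependencies are injected stand-ins).
async function uploadWithDedup(buffer, { hashFn, checkHash, upload }) {
  const hash = hashFn(buffer);            // xxhash64 in the real code
  const existing = await checkHash(hash); // GET /file-handler?checkHash=true
  if (existing) return existing;          // duplicate: reuse stored URLs, no upload
  return upload(buffer, hash);            // POST /file-handler with the file
}
```

Uploading the same content twice therefore returns the same URLs, with only one actual upload.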
`checkHashExists(hash, fileHandlerUrl, pathwayResolver, contextId, shortLivedMinutes)`
- Checks if a file exists by hash
- Returns short-lived URL (prefers converted version)
- Returns: `{url, gcs, hash, filename}` or `null`
- Makes a single API call (optimized)
`fetchFileFromUrl(fileUrl, requestId, contextId, save)`
- Downloads a file from a URL via the file handler
- Processes based on file type
- Used by: `azureVideoTranslatePlugin`, `azureCognitivePlugin`
`deleteFileByHash(hash, pathwayResolver, contextId)`
- Deletes a file from cloud storage
- Handles 404 gracefully (file already deleted)
- Returns: `true` if deleted, `false` if not found
`setRetentionForHash(hash, retention, contextId, pathwayResolver)`
- Sets file retention to `'temporary'` or `'permanent'`
- Best-effort operation (logs warnings on failure)
- Used by: `addFileToCollection` when `permanent=true`
`ensureShortLivedUrl(fileObject, fileHandlerUrl, contextId, shortLivedMinutes)`
- Resolves a file object to use a short-lived URL
- Updates GCS URL if converted version exists
- Used by: Tools that send files to LLMs
`getMediaChunks(file, requestId, contextId)`
- Gets chunked media file URLs
- Used by: Media processing workflows
`markCompletedForCleanUp(requestId, contextId)`
- Marks a request as completed for cleanup
- Used by: `azureCognitivePlugin`
Location: `lib/fileUtils.js` + `pathways/system/entity/tools/sys_tool_file_collection.js`
The file collection system stores file metadata in Redis hash maps using atomic operations for concurrent safety. Files are stored directly in Redis hash maps keyed by hash, with context-scoped isolation.
Redis Hash Maps
└── FileStoreMap:ctx:<contextId>
└── Hash Map (hash → fileData JSON)
└── File Entry (JSON):
{
// CFH-managed fields (preserved from file handler)
url: "https://storage.../file.pdf",
gcs: "gs://bucket/file.pdf",
filename: "uuid-based-filename.pdf", // CFH-managed
// Cortex-managed fields (user metadata)
id: "timestamp-random",
displayFilename: "user-friendly-name.pdf", // User-provided name
mimeType: "application/pdf",
tags: ["pdf", "report"],
notes: "Quarterly report",
hash: "abc123",
permanent: true,
addedDate: "2024-01-15T10:00:00.000Z",
lastAccessed: "2024-01-15T10:00:00.000Z"
}
- Uses Redis hash map operations (HSET, HGET, HDEL) which are atomic
- No version-based locking needed - Redis operations are thread-safe
- Direct hash map access: `FileStoreMap:ctx:<contextId>` → `{hash: fileData}`
- In-memory cache with 5-second TTL
- Reduces Redis load for read operations
- Cache invalidated on writes
- CFH-managed fields: `url`, `gcs`, `filename` (UUID-based, managed by the file handler)
- Cortex-managed fields: `id`, `displayFilename`, `tags`, `notes`, `mimeType`, `permanent`, `addedDate`, `lastAccessed`
- When merging data, CFH fields are preserved and Cortex fields are updated
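The merge rule can be illustrated with a small sketch, using the field names from the entry structure above (assumed logic, not the actual implementation):

```javascript
// CFH-managed fields always come from the stored entry; Cortex-managed
// fields come from the update (illustrative sketch of the merge rule).
const CFH_FIELDS = ['url', 'gcs', 'filename'];

function mergeFileEntry(existing, update) {
  const merged = { ...existing, ...update };
  // Restore CFH-managed fields from the existing entry so updates
  // cannot overwrite storage-owned values
  for (const field of CFH_FIELDS) {
    if (existing[field] !== undefined) merged[field] = existing[field];
  }
  return merged;
}
```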
`loadFileCollection(contextId, contextKey, useCache)`
- Loads the collection from the Redis hash map `FileStoreMap:ctx:<contextId>`
- Returns an array of file entries (sorted by lastAccessed, most recent first)
- Uses cache if available and fresh (5-second TTL)
- Converts hash map entries to array format
`saveFileCollection(contextId, contextKey, collection)`
- Saves the collection to the Redis hash map (only updates changed entries)
- Uses atomic HSET operations per file
- Optimized to only write files that actually changed
- Returns `true` if successful, `false` on error
`updateFileMetadata(contextId, hash, metadata)`
- Updates Cortex-managed metadata fields atomically
- Preserves all CFH-managed fields
- Updates only specified fields (displayFilename, tags, notes, mimeType, dates, permanent)
- Used for: Updating lastAccessed, modifying tags/notes without full reload
`addFileToCollection(contextId, contextKey, url, gcs, filename, tags, notes, hash, fileUrl, pathwayResolver, permanent)`
- Adds a file entry to the collection via an atomic HSET operation
- If `fileUrl` is provided, uploads the file first via `uploadFileToCloud()`
- If `permanent=true`, sets retention to permanent via `setRetentionForHash()`
- Merges with existing CFH data if a file with the same hash already exists
- Returns the file entry object with `id`
`syncAndStripFilesFromChatHistory(chatHistory, contextId, contextKey)`
- Files IN the collection: stripped from the message (replaced with a placeholder); tools can access them
- Files NOT in collection: left in message as-is (model sees them directly)
- Updates lastAccessed for collection files
- Used by: `sys_entity_agent` to process incoming chat history
{
id: string, // Unique ID: "timestamp-random" (Cortex-managed)
url: string, // Azure Blob Storage URL (CFH-managed)
gcs: string | null, // Google Cloud Storage URL (CFH-managed)
filename: string | null, // CFH-managed filename (UUID-based) (CFH-managed)
displayFilename: string | null, // User-friendly filename (Cortex-managed)
mimeType: string | null, // MIME type (Cortex-managed)
tags: string[], // Searchable tags (Cortex-managed)
notes: string, // User notes/description (Cortex-managed)
hash: string, // File hash for deduplication (used as Redis key)
permanent: boolean, // Whether file is permanent (Cortex-managed)
addedDate: string, // ISO timestamp when added (Cortex-managed)
lastAccessed: string // ISO timestamp of last access (Cortex-managed)
}

Field Ownership Notes:
- `filename`: Managed by CFH; UUID-based storage filename
- `displayFilename`: Managed by Cortex; user-provided friendly name
- When displaying files, prefer `displayFilename` with fallback to `filename`
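That display preference can be expressed as a one-line helper (hypothetical, for illustration):

```javascript
// Prefer the user-provided name; fall back to the CFH UUID-based filename.
function displayNameFor(fileEntry) {
  return fileEntry.displayFilename || fileEntry.filename;
}
```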
Flow:
- User provides content and filename
- Creates Buffer from content
- Calls `uploadFileToCloud()` with `contextId`
- Calls `addFileToCollection()` with `permanent=true`
- Returns file info with `fileId`
Key Code:
const uploadResult = await uploadFileToCloud(
fileBuffer, mimeType, filename, resolver, contextId
);
const fileEntry = await addFileToCollection(
contextId, contextKey, uploadResult.url, uploadResult.gcs,
filename, tags, notes, uploadResult.hash, null, resolver, true
);

Flow:
- User provides file identifier and modification
- Resolves the file via `resolveFileParameter()` → finds it in the collection
- Downloads the file content via `axios.get(file.url)`
- Modifies the content (line replacement or search/replace)
- Uploads the modified file via `uploadFileToCloud()` (creates a new hash)
- Updates the collection entry atomically via `updateFileMetadata()` with the new URL/hash
- Deletes the old file version (if not permanent) via `deleteFileByHash()`
Key Code:
const foundFile = await resolveFileParameter(fileParam, contextId, contextKey);
const oldHash = foundFile.hash;
const uploadResult = await uploadFileToCloud(
fileBuffer, mimeType, filename, resolver, contextId
);
// Update file entry atomically (preserves CFH data, updates Cortex metadata)
await updateFileMetadata(contextId, foundFile.hash, {
url: uploadResult.url,
gcs: uploadResult.gcs,
hash: uploadResult.hash
});
if (!foundFile.permanent) {
await deleteFileByHash(oldHash, resolver, contextId);
}

Tools:
- `AddFileToCollection`: Adds a file to the collection (with optional upload)
- `SearchFileCollection`: Searches files by filename, tags, notes
- `ListFileCollection`: Lists all files with filtering/sorting
- `RemoveFileFromCollection`: Removes files (deletes from cloud if not permanent)
Key Code:
// Add file
await addFileToCollection(contextId, contextKey, url, gcs, filename, tags, notes, hash, fileUrl, resolver, permanent);
// Remove file (with permanent check)
if (!fileInfo.permanent) {
await deleteFileByHash(fileInfo.hash, resolver, contextId);
}

Flow:
- Generates/modifies image
- Gets image URL
- Uploads via `uploadFileToCloud()`
- Adds to the collection with `permanent=true`
Flow:
- Resolves the file via `resolveFileParameter()` → finds it in the collection
- Downloads the file content via `axios.get(file.url)`
- Validates that the file is text-based via `isTextMimeType()`
- Returns content with line/character range support
Flow:
- Finds file in collection
- Resolves to a short-lived URL via `ensureShortLivedUrl()`
- Returns the image URL for display
Flow:
- Extracts files from chat history via `extractFilesFromChatHistory()`
- Generates file message content via `generateFileMessageContent()`
- Injects files into chat history via `injectFileIntoChatHistory()`
- Uses a Gemini Vision model to analyze files
Flow:
- Receives video URL
- If not from Azure storage, uploads via `fetchFileFromUrl()`
- Uses the uploaded URL for video translation
Key Code:
const response = await fetchFileFromUrl(videoUrl, this.requestId, contextId, false);
const resultUrl = Array.isArray(response) ? response[0] : response.url;

Flow:
- Receives file for indexing
- If not a text file, converts via `fetchFileFromUrl()` with `save=true`
- Uses the converted text file for indexing
- Marks the request completed via `markCompletedForCleanUp()`
Key Code:
const data = await fetchFileFromUrl(file, requestId, contextId, true);
url = Array.isArray(data) ? data[0] : data.url;

User/LLM Request
│
▼
System Tool (WriteFile, Image, etc.)
│
▼
uploadFileToCloud()
│
├─► Convert input to Buffer
├─► Compute xxhash64 hash
├─► checkHashExists() ──► File Handler GET /file-handler?checkHash=true
│ │
│ ├─► File exists? ──► Return existing URLs
│ │
│ └─► File not found ──► Continue
│
└─► Upload via POST ──► File Handler POST /file-handler
│ │
│ ├─► Store in Azure Blob Storage
│ ├─► Store in GCS (if configured)
│ ├─► Store metadata in Redis
│ └─► Return {url, gcs, hash, shortLivedUrl}
│
└─► addFileToCollection()
│
├─► If permanent=true ──► setRetentionForHash() ──► File Handler POST /file-handler?setRetention=true
│
└─► Save to Redis hash map (atomic operation)
│
└─► Redis HSET FileStoreMap:ctx:<contextId> <hash> <fileData>
│
├─► Merge with existing CFH data (if hash exists)
├─► Preserve CFH fields (url, gcs, filename)
└─► Update Cortex fields (displayFilename, tags, notes, etc.)
User/LLM Request (e.g., "view file.pdf")
│
▼
System Tool (ViewImage, ReadFile, etc.)
│
▼
resolveFileParameter()
│
├─► Find in collection via findFileInCollection()
│ │
│ └─► Matches by: ID, filename, hash, URL, or fuzzy filename
│
└─► ensureShortLivedUrl()
│
└─► checkHashExists() ──► File Handler GET /file-handler?checkHash=true&shortLivedMinutes=5
│ │
│ ├─► Check Redis for hash metadata
│ ├─► Generate short-lived SAS token
│ └─► Return {url, gcs, hash, filename, shortLivedUrl}
│
└─► Return file object with shortLivedUrl
User/LLM Request (e.g., "edit file.txt, replace line 5")
│
▼
EditFile Tool
│
├─► resolveFileParameter() ──► Find file in collection
│
├─► Download file content ──► axios.get(file.url)
│
├─► Modify content (line replacement or search/replace)
│
├─► uploadFileToCloud() ──► Upload modified file
│ │
│ └─► Returns new {url, gcs, hash}
│
└─► updateFileMetadata() ──► Redis HSET (atomic update)
│
├─► Preserve CFH fields (url, gcs, filename)
├─► Update Cortex fields (url, gcs, hash)
└─► If update succeeds:
└─► Delete old file (if not permanent)
└─► deleteFileByHash() ──► File Handler DELETE /file-handler?hash=oldHash
User/LLM Request (e.g., "remove file.pdf from collection")
│
▼
RemoveFileFromCollection Tool
│
├─► Load collection ──► findFileInCollection() for each fileId
│
├─► Capture file info (hash, permanent) from collection
│
└─► Redis HDEL FileStoreMap:ctx:<contextId> <hash> (atomic deletion)
│
└─► Async deletion (fire and forget)
│
├─► For each file:
│ │
│ ├─► If permanent=true ──► Skip deletion (keep in cloud)
│ │
│ └─► If permanent=false ──► deleteFileByHash()
│ │
│ └─► File Handler DELETE /file-handler?hash=hash&contextId=contextId
│ │
│ ├─► Delete from Azure Blob Storage
│ ├─► Delete from GCS (if configured)
│ └─► Remove from Redis metadata
- Container: Single container (configured via `AZURE_STORAGE_CONTAINER_NAME`)
- Naming: UUID-based filenames
- Organization: By `requestId` folders
- Access: SAS tokens (long-lived and short-lived)
- Tags: Blob index tags for retention (`retention=temporary` or `retention=permanent`)
- Lifecycle: Azure automatically deletes `retention=temporary` files after 30 days
- Enabled: If `GCP_SERVICE_ACCOUNT_KEY` is configured
- URL Format: `gs://bucket/path`
- Usage: Media file chunks, converted files
- No short-lived URLs: GCS URLs are permanent (no SAS equivalent)
- Used: If Azure not configured
- Served: Via HTTP on configured port
Purpose: Fast hash lookups, file metadata caching
Key Format:
- Unscoped: `<hash>`
- Context-scoped: `<hash>:ctx:<contextId>`
- Legacy (migrated): `<hash>:<containerName>` (auto-migrated on read)
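The key scheme above can be sketched as a small helper (hypothetical, not part of the codebase):

```javascript
// Build the metadata key: context-scoped when a contextId is given,
// unscoped otherwise (illustrative sketch of the key format above).
function metadataKey(hash, contextId) {
  return contextId ? `${hash}:ctx:${contextId}` : hash;
}
```

Lookups would try the context-scoped key first and fall back to the unscoped key, matching the fallback behavior described earlier.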
Data Stored:
{
url: "https://storage.../file.pdf?long-lived-sas",
shortLivedUrl: "https://storage.../file.pdf?short-lived-sas",
gcs: "gs://bucket/file.pdf",
hash: "abc123",
filename: "file.pdf",
timestamp: "2024-01-15T10:00:00.000Z",
converted: {
url: "https://storage.../converted.csv",
gcs: "gs://bucket/converted.csv"
}
}

Purpose: User-facing file collections with metadata
Storage: Redis hash maps (`FileStoreMap:ctx:<contextId>`)
Format:
// Redis Hash Map Structure:
// Key: FileStoreMap:ctx:<contextId>
// Value: Hash map where each entry is {hash: fileDataJSON}
// Example hash map entry:
{
"abc123": JSON.stringify({
// CFH-managed fields
url: "https://storage.../file.pdf",
gcs: "gs://bucket/file.pdf",
filename: "uuid-based-name.pdf",
// Cortex-managed fields
id: "1736966400000-abc123",
displayFilename: "user-friendly-name.pdf",
mimeType: "application/pdf",
tags: ["pdf", "report"],
notes: "Quarterly report",
hash: "abc123",
permanent: true,
addedDate: "2024-01-15T10:00:00.000Z",
lastAccessed: "2024-01-15T10:00:00.000Z"
})
}

Features:
- Atomic operations (Redis HSET/HDEL/HGET are thread-safe)
- In-memory caching (5-second TTL)
- Direct hash map access (no versioning needed)
- Context-scoped isolation (`FileStoreMap:ctx:<contextId>`)
Purpose: Per-user/per-context file isolation with optional cross-context reading
Usage:
- `agentContext`: Array of context objects, each with:
  - `contextId`: Context identifier (required)
  - `contextKey`: Encryption key for this context (optional, `null` for unencrypted)
  - `default`: Boolean indicating the default context for write operations (required)
- Stored in Redis with scoped keys: `FileStoreMap:ctx:<contextId>`
Benefits:
- Prevents hash collisions between users
- Enables per-user file management
- Supports multi-tenant applications
- Multiple contexts allow reading files from secondary contexts (e.g., workspace files)
- Separate encryption keys allow user-encrypted files alongside unencrypted shared workspace files
- Centralized context management (single parameter instead of multiple)
Example:
// Upload with contextId (from default context)
const agentContext = [
{ contextId: "user-123", contextKey: userContextKey, default: true }
];
await uploadFileToCloud(fileBuffer, mimeType, filename, resolver, agentContext[0].contextId);
// Check hash with contextId
await checkHashExists(hash, fileHandlerUrl, null, agentContext[0].contextId);
// Delete with contextId
await deleteFileByHash(hash, resolver, agentContext[0].contextId);
// Load merged collection (reads from both contexts)
// User context is encrypted (userContextKey), workspace is not (null)
const agentContext = [
{ contextId: "user-123", contextKey: userContextKey, default: true },
{ contextId: "workspace-456", contextKey: null, default: false } // Shared workspace, unencrypted
];
const collection = await loadMergedFileCollection(agentContext);
// Resolve file from any context in agentContext
const url = await resolveFileParameter("file.pdf", agentContext);

agentContext Behavior:
- Files are read from all contexts in the array (union)
- Each context uses its own encryption key (`contextKey`)
- Shared workspaces typically use `contextKey: null` (unencrypted) since they are shared between users
- Writes/updates only go to the context marked `default: true`, using its `contextKey`
- Deduplication: if a file exists in multiple contexts (same hash), the first context takes precedence
- Files from non-default contexts bypass `inCollection` filtering (all files accessible)
- The default context is used for all write operations (uploads, updates, deletions)
agentContext Security Note:
- `agentContext` allows reading files from multiple contexts, including files that bypass `inCollection` filtering
- Important: `agentContext` should be treated as a privileged, server-derived value
- Server-side authorization MUST verify that any contexts in `agentContext` are restricted to trusted, same-tenant contexts (e.g., derived from workspace membership) before use
- Never accept `agentContext` directly from untrusted client inputs without validation
- Only the default context should be used for write operations; non-default contexts are read-only
Purpose: Indicate files that should be kept indefinitely
Storage:
- Stored in the file collection entry: `permanent: true`
- Sets blob index tag: `retention=permanent`
- Prevents deletion from cloud storage
Usage:
// Add permanent file
await addFileToCollection(
contextId, contextKey, url, gcs, filename, tags, notes, hash,
null, resolver, true // permanent=true
);
// Check before deletion
if (!file.permanent) {
await deleteFileByHash(file.hash, resolver, contextId);
}

Behavior:
- Permanent files are not deleted from cloud storage when removed from collection
- Retention set via `setRetentionForHash()` (best-effort)
- Default: `permanent=false` (temporary, 30-day retention)
Purpose: Avoid storing duplicate files
Process:
- Compute xxhash64 hash of file content
- Check if the hash exists via `checkHashExists()`
- If it exists, return the existing URLs (no upload)
- If not, upload the file and store the hash
Benefits:
- Saves storage space
- Faster uploads (skip if duplicate)
- Consistent file references
Purpose: Secure, time-limited file access
Features:
- 5-minute expiration (configurable)
- Always included in file handler responses
- Preferred for LLM file access
- Automatically generated on `checkHash` operations
Usage:
// Resolve to short-lived URL
const fileWithShortLivedUrl = await ensureShortLivedUrl(
fileObject, fileHandlerUrl, contextId, 5 // 5 minutes
);
// fileWithShortLivedUrl.url is now a short-lived URL

Purpose: Ensure thread-safe collection modifications
Process:
- Redis hash map operations (HSET, HDEL, HGET) are atomic
- No version-based locking needed
- Direct hash map updates per file (not full collection replacement)
Functions:
- `addFileToCollection()`: Atomic HSET operation
- `updateFileMetadata()`: Atomic HSET operation (updates a single file)
- `loadFileCollection()`: Atomic HGETALL operation
- File removal: Atomic HDEL operation
Benefits:
- No version conflicts (each file updated independently)
- Faster operations (no retry loops)
- Simpler code (no locking logic needed)
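The per-file hash map pattern can be illustrated with an in-memory stand-in; the real code issues the equivalent HSET/HGET/HDEL commands against Redis:

```javascript
// In-memory stand-in for the Redis hash map pattern described above.
// Each write touches one entry (one hash), never the whole collection.
class FileStoreMapStub {
  constructor() { this.maps = new Map(); } // key -> Map(hash -> JSON string)
  key(contextId) { return `FileStoreMap:ctx:${contextId}`; }
  hset(contextId, hash, fileData) {
    const k = this.key(contextId);
    if (!this.maps.has(k)) this.maps.set(k, new Map());
    this.maps.get(k).set(hash, JSON.stringify(fileData)); // single-entry write
  }
  hget(contextId, hash) {
    const entry = this.maps.get(this.key(contextId))?.get(hash);
    return entry ? JSON.parse(entry) : null;
  }
  hdel(contextId, hash) {
    this.maps.get(this.key(contextId))?.delete(hash); // single-entry delete
  }
}
```

Because each Redis command operates on one field of the hash, concurrent writers updating different files never conflict, which is why no version-based locking is needed.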
Builds file handler URL with query parameters.
- Parameters:
  - `baseUrl`: File handler service URL
  - `params`: Object with query parameters (null/undefined skipped)
- Returns: Complete URL with encoded parameters
- Used by: All file handler operations
Downloads and processes file from URL.
- Parameters:
  - `fileUrl`: URL to fetch
  - `requestId`: Request ID for tracking
  - `contextId`: Optional context ID
  - `save`: Whether to save the converted file (default: false)
- Returns: Response data (object or array)
- Used by: `azureVideoTranslatePlugin`, `azureCognitivePlugin`
Uploads file to cloud storage with deduplication.
- Parameters:
  - `fileInput`: URL string, base64 string, or Buffer
  - `mimeType`: MIME type (optional)
  - `filename`: Filename (optional, inferred if not provided)
  - `pathwayResolver`: Optional resolver for logging
  - `contextId`: Optional context ID for scoping
- Returns: `{url, gcs, hash}`
- Process:
- Converts input to Buffer
- Computes hash
- Checks if exists (deduplication)
- Uploads if not exists
- Used by: All tools that upload files
Checks if file exists by hash.
- Parameters:
  - `hash`: File hash
  - `fileHandlerUrl`: File handler URL
  - `pathwayResolver`: Optional resolver for logging
  - `contextId`: Optional context ID
  - `shortLivedMinutes`: URL expiration (default: 5)
- Returns: `{url, gcs, hash, filename}` or `null`
- Used by: Upload deduplication, file resolution
Deletes file from cloud storage.
- Parameters:
  - `hash`: File hash
  - `pathwayResolver`: Optional resolver for logging
  - `contextId`: Optional context ID
- Returns: `true` if deleted, `false` if not found
- Handles: 404 gracefully (file already deleted)
Sets file retention (temporary or permanent).
- Parameters:
  - `hash`: File hash
  - `retention`: `'temporary'` or `'permanent'`
  - `contextId`: Optional context ID
  - `pathwayResolver`: Optional resolver for logging
- Returns: Response data or `null`
- Used by: `addFileToCollection` when `permanent=true`
Resolves file to use short-lived URL.
- Parameters:
  - `fileObject`: File object with `hash` and `url`
  - `fileHandlerUrl`: File handler URL
  - `contextId`: Optional context ID
  - `shortLivedMinutes`: URL expiration (default: 5)
- Returns: File object with `url` updated to a short-lived URL
- Used by: Tools that send files to LLMs
Gets chunked media file URLs.
- Parameters:
  - `file`: File URL
  - `requestId`: Request ID
  - `contextId`: Optional context ID
- Returns: Array of chunk URLs
Marks request as completed for cleanup.
- Parameters:
  - `requestId`: Request ID
  - `contextId`: Optional context ID
- Returns: Response data or `null`
Loads file collection from Redis hash map.
- Parameters:
  - `contextId`: Context ID (required)
  - `contextKey`: Optional encryption key
  - `useCache`: Whether to use the cache (default: true)
- Returns: Array of file entries (sorted by lastAccessed, most recent first)
- Process:
- Checks in-memory cache (5-second TTL)
- Loads from the Redis hash map `FileStoreMap:ctx:<contextId>`
- Filters by `inCollection` (only returns global files or chat-specific files)
- Converts hash map entries to array format
- Updates cache
- Used by: Primary file collection operations
Loads ALL files from a context, bypassing inCollection filtering.
- Parameters:
  - `contextId`: Context ID (required)
  - `contextKey`: Optional encryption key
- Returns: Array of all file entries (no filtering)
- Used by: `loadMergedFileCollection` when loading files from all contexts
Loads merged file collection from one or more contexts.
- Parameters:
  - `agentContext`: Array of context objects, each with `{ contextId, contextKey, default }` (required)
- Returns: Array of file entries from all contexts (deduplicated by hash/url/gcs)
- Process:
  - Loads the first context's collection via `loadFileCollectionAll()` with its `contextKey`
  - Tags each file with `_contextId` (internal, stripped before returning to callers)
  - For each additional context, loads its collection via `loadFileCollectionAll()` with its `contextKey`
  - Deduplicates: earlier contexts take precedence if the same file exists in multiple
  - Returns the merged collection (with `_contextId` stripped before returning)
- Used by: `syncAndStripFilesFromChatHistory`, `getAvailableFiles`, `resolveFileParameter`, file tools
Saves file collection to Redis hash map (optimized - only updates changed entries).
- Parameters:
  - `contextId`: Context ID
  - `contextKey`: Optional encryption key (unused, kept for compatibility)
  - `collection`: Array of file entries
- Returns: `true` if successful, `false` on error
- Process:
- Compares each file with current state
- Only updates files that changed (optimized)
- Uses atomic HSET operations per file
- Preserves CFH-managed fields, updates Cortex-managed fields
- Used by: Tools that need to save multiple file changes
Updates Cortex-managed metadata fields atomically.
- Parameters:
  - `contextId`: Context ID (required)
  - `hash`: File hash (used as the Redis key)
  - `metadata`: Object with fields to update (displayFilename, tags, notes, mimeType, addedDate, lastAccessed, permanent)
- Returns: `true` if successful, `false` on error
- Process:
- Loads existing file data from Redis
- Merges metadata (preserves CFH fields, updates Cortex fields)
- Writes back via atomic HSET
- Invalidates cache
- Used by: Search operations (updates lastAccessed), EditFile (updates URL/hash)
`addFileToCollection(contextId, contextKey, url, gcs, filename, tags, notes, hash, fileUrl, pathwayResolver, permanent)`
Adds file to collection via atomic operation.
- Parameters:
  - `contextId`: Context ID (required)
  - `contextKey`: Optional encryption key (unused, kept for compatibility)
  - `url`: Azure URL (optional if `fileUrl` provided)
  - `gcs`: GCS URL (optional)
  - `filename`: User-friendly filename (required)
  - `tags`: Array of tags (optional)
  - `notes`: Notes string (optional)
  - `hash`: File hash (optional, computed if not provided)
  - `fileUrl`: URL to upload (optional; uploads if provided)
  - `pathwayResolver`: Optional resolver for logging
  - `permanent`: Whether the file is permanent (default: false)
- Returns: File entry object with `id`
- Process:
  - If `fileUrl` is provided, uploads the file first via `uploadFileToCloud()`
  - If `permanent=true`, sets retention to permanent via `setRetentionForHash()`
  - Creates the file entry with `displayFilename` (user-friendly name)
  - Writes to the Redis hash map via atomic HSET
  - Merges with existing CFH data if the hash already exists
- Used by: WriteFile, Image tools, FileCollection tool
Processes chat history files based on collection membership.
- Parameters:
  - `chatHistory`: Chat history array to process
  - `agentContext`: Array of context objects, each with `{ contextId, contextKey, default }` (required)
- Returns: `{ chatHistory, availableFiles }` (processed chat history and a formatted file list)
- Process:
  - Loads the merged file collection from all contexts in `agentContext`
  - For each file in the chat history:
    - If in the collection: strips it from the message, updates lastAccessed and inCollection in the owning context (using that context's key)
    - If not in the collection: leaves it in the message as-is
  - Returns the processed history and the available files string
  - Uses atomic operations per file, updating the context that owns each file (identified by the `_contextId` tag)
- Used by: `sys_entity_agent` to process incoming chat history
Resolves file parameter to file URL.
- Parameters:
  - `fileParam`: File ID, filename, URL, or hash
  - `agentContext`: Array of context objects, each with `{ contextId, contextKey, default }` (required)
  - `options`: Optional options object:
    - `preferGcs`: Boolean; prefer the GCS URL over the Azure URL
    - `useCache`: Boolean; use the cache (default: true)
- Returns: File URL string (Azure or GCS) or `null` if not found
- Matching (via `findFileInCollection()`):
- Exact ID match
- Exact hash match
- Exact URL match (Azure or GCS)
- Exact filename match (case-insensitive, basename comparison)
- Fuzzy filename match (contains, minimum 4 characters)
- Process:
  - Loads the merged file collection from all contexts in `agentContext`
  - Searches the merged collection for a matching file
  - Returns the file URL if found
- Used by: ReadFile, EditFile, and other tools that need file URLs
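The matching order can be sketched as follows; this is assumed logic based on the list above, not the actual `findFileInCollection()` source:

```javascript
// Match a file parameter against a collection, in precedence order:
// exact ID, exact hash, exact URL, exact filename (case-insensitive,
// basename), then fuzzy filename (contains, minimum 4 characters).
function matchFile(fileParam, collection) {
  const p = String(fileParam);
  const base = (s) => (s || '').split('/').pop().toLowerCase();
  return (
    collection.find((f) => f.id === p) ||
    collection.find((f) => f.hash === p) ||
    collection.find((f) => f.url === p || f.gcs === p) ||
    collection.find((f) => base(f.displayFilename || f.filename) === base(p)) ||
    (p.length >= 4
      ? collection.find((f) => base(f.displayFilename || f.filename).includes(p.toLowerCase()))
      : undefined) ||
    null
  );
}
```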
Finds file in collection array.
- Parameters:
  - `fileParam`: File identifier
  - `collection`: Collection array
- Returns: File entry or `null`
- Used by: `resolveFileParameter`
Generates file content for LLM messages.
- Parameters:
  - `fileParam`: File identifier (ID, filename, URL, or hash)
  - `agentContext`: Array of context objects, each with `{ contextId, contextKey, default }` (required)
- Returns: File content object with `type`, `url`, `gcs`, `hash`, or `null`
- Process:
  - Loads the merged file collection from all contexts in `agentContext`
  - Finds the file in the merged collection via `findFileInCollection()`
  - Resolves to a short-lived URL via `ensureShortLivedUrl()` using the default context
  - Returns the OpenAI-compatible format: `{type: 'image_url', url, gcs, hash}`
- Used by: AnalyzeFile tool to inject files into chat history
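The final step can be sketched as below. The function name `buildFileContentPart` is hypothetical, and `ensureShortLivedUrl` is injected as a parameter here (stubbed in the sketch) so the example is self-contained; the real code calls it with the default context.

```javascript
// Hypothetical sketch of building the OpenAI-compatible content part
// from a matched collection entry.
async function buildFileContentPart(entry, ensureShortLivedUrl) {
  if (!entry) return null; // file not found in merged collection
  const url = await ensureShortLivedUrl(entry.url); // short-lived, ~5 min
  return { type: 'image_url', url, gcs: entry.gcs, hash: entry.hash };
}
```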
Extracts file metadata from chat history messages.
- Parameters:
  - `chatHistory`: Chat history array to scan
- Returns: Array of file metadata objects `{url, gcs, hash, type}`
- Process:
  - Scans all messages for file content objects
  - Extracts from `image_url`, `file`, or direct URL objects
  - Returns normalized format
- Used by: File extraction utilities
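The scan can be sketched as follows; the function name and the exact detection conditions are assumptions based on the content-part shapes named above.

```javascript
// Illustrative scan of chat history for file content parts, returning
// the normalized {url, gcs, hash, type} shape described above.
function extractFilesFromHistory(chatHistory) {
  const found = [];
  for (const message of chatHistory) {
    const parts = Array.isArray(message.content) ? message.content : [];
    for (const part of parts) {
      // Match image_url parts, file parts, or objects carrying a direct URL.
      if (part.type === 'image_url' || part.type === 'file' || part.url) {
        found.push({
          url: part.url ?? null,
          gcs: part.gcs ?? null,
          hash: part.hash ?? null,
          type: part.type ?? 'file',
        });
      }
    }
  }
  return found;
}
```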
Gets formatted list of available files from collection.
- Parameters:
  - `chatHistory`: Unused (kept for API compatibility)
  - `agentContext`: Array of context objects, each with `{ contextId, contextKey, default }` (required)
- Returns: Formatted string of available files (last 10 most recent)
- Process:
  - Loads merged file collection from all contexts in `agentContext`
  - Formats files via `formatFilesForTemplate()`
  - Returns compact one-line format per file
- Used by: Template rendering to show available files
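The "last 10 most recent" formatting step can be sketched as below; the actual `formatFilesForTemplate()` output format is not specified here, so the line format is an assumption.

```javascript
// Hypothetical sketch of formatFilesForTemplate: sort by recency,
// keep the 10 most recent, one compact line per file.
function formatFilesForTemplate(collection) {
  return [...collection]
    .sort((a, b) => (b.lastAccessed || 0) - (a.lastAccessed || 0))
    .slice(0, 10)
    // Prefer displayFilename for user-facing output, per the best practices.
    .map((f) => `${f.displayFilename || f.filename} [${f.hash}]`)
    .join('\n');
}
```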
Helper function to extract the default context from an agentContext array.
- Parameters:
  - `agentContext`: Array of context objects, each with `{ contextId, contextKey, default }`
- Returns: Context object with `default: true`, or first context if none marked as default, or `null` if array is empty
- Used by: Functions that need to determine which context to use for write operations
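The selection rule is simple enough to state directly in code (the function name is assumed):

```javascript
// Sketch of the default-context helper: prefer the context flagged
// default: true, fall back to the first entry, null for empty input.
function getDefaultContext(agentContext) {
  if (!Array.isArray(agentContext) || agentContext.length === 0) return null;
  return agentContext.find((c) => c.default === true) || agentContext[0];
}
```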
Computes xxhash64 hash of file.
- Returns: Hash string (hex)
Computes xxhash64 hash of buffer.
- Returns: Hash string (hex)
Extracts filename from URL (prefers GCS).
- Returns: Filename string
Ensures filename has correct extension based on MIME type.
- Returns: Filename with correct extension
Determines MIME type from URL or filename.
- Returns: MIME type string
Checks if MIME type is text-based.
- Parameters:
  - `mimeType`: MIME type string to check
- Returns: Boolean (true if text-based)
- Supports: All `text/*` types, plus application types like JSON, JavaScript, XML, YAML, Python, etc.
- Used by: ReadFile, EditFile to validate file types
Gets MIME type from filename or path.
- Parameters:
  - `filenameOrPath`: Filename or full file path
  - `defaultMimeType`: Optional default (default: `'application/octet-stream'`)
- Returns: MIME type string
- Used by: File upload, file type detection
Gets MIME type from file extension.
- Parameters:
  - `extension`: File extension (with or without leading dot)
  - `defaultMimeType`: Optional default (default: `'application/octet-stream'`)
- Returns: MIME type string
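The two MIME helpers above compose naturally: the filename variant strips the path and delegates to the extension variant. This sketch uses a deliberately small lookup table; the real table in `lib/fileUtils.js` is larger.

```javascript
// Illustrative MIME lookup (tiny subset of the real table).
const MIME_BY_EXTENSION = {
  json: 'application/json',
  js: 'application/javascript',
  txt: 'text/plain',
  png: 'image/png',
};

// Accepts the extension with or without a leading dot, case-insensitively.
function getMimeTypeFromExtension(extension, defaultMimeType = 'application/octet-stream') {
  const ext = String(extension).replace(/^\./, '').toLowerCase();
  return MIME_BY_EXTENSION[ext] || defaultMimeType;
}

// Strips any path components, then looks up by the last extension.
function getMimeTypeFromFilename(filenameOrPath, defaultMimeType = 'application/octet-stream') {
  const name = String(filenameOrPath).split('/').pop();
  const dot = name.lastIndexOf('.');
  if (dot === -1) return defaultMimeType; // no extension: use the default
  return getMimeTypeFromExtension(name.slice(dot + 1), defaultMimeType);
}
```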
Network Errors:
- Handled gracefully in all functions
- Logged via `pathwayResolver` or `logger`
- Non-critical operations return `null` instead of throwing
404 Errors:
- Treated as "file not found" (not an error)
- `deleteFileByHash` returns `false` on 404
- `checkHashExists` returns `null` on 404
Timeout Errors:
- Upload: 30 seconds
- Check hash: 10 seconds
- Fetch file: 60 seconds
- Set retention: 15 seconds
Missing ContextId:
- File collection operations require `contextId`
- Returns `null` or throws an error if missing
Concurrent Modifications:
- Prevented by atomic Redis operations (HSET, HDEL are thread-safe)
- No version conflicts (each file updated independently)
Invalid File Data:
- Invalid JSON entries are skipped during load
- Missing required fields are handled gracefully
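The 404-as-"not found" convention and the per-call timeouts can be sketched together. This is a hedged sketch: the request path is made up, and the HTTP client is injected as a `fetchFn` parameter to keep the example self-contained, whereas the real code calls the file handler service directly.

```javascript
// Illustrative checkHashExists-style call: 10 s timeout, 404 means
// "file absent" (returns null), and other failures are non-critical.
async function checkHashExists(hash, fetchFn, timeoutMs = 10000) {
  try {
    const res = await fetchFn(`/api/files?hash=${hash}`, {
      signal: AbortSignal.timeout(timeoutMs), // per-call timeout
    });
    if (res.status === 404) return null; // not an error: file not found
    if (!res.ok) throw new Error(`file handler returned ${res.status}`);
    return await res.json();
  } catch (err) {
    // Network/timeout failures are logged and return null rather than throwing.
    return null;
  }
}
```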
- Always pass `contextId` when available (strongly recommended for multi-tenant)
- Use atomic operations - `addFileToCollection()`, `updateFileMetadata()` are thread-safe
- Check `permanent` flag before deleting files from cloud storage
- Handle errors gracefully - don't throw on non-critical failures
- Use short-lived URLs for LLM file access (via `ensureShortLivedUrl()`)
- Check for existing files before uploading (automatic in `uploadFileToCloud`)
- Preserve CFH fields - when updating metadata, preserve `url`, `gcs`, `filename` from file handler
- Use `displayFilename` for user-facing displays (fall back to `filename` if not set)
The Cortex file system provides:
✅ Encapsulated file handler interactions - No direct axios calls
✅ Hash-based deduplication - Avoids duplicate storage
✅ Context scoping - Per-user file isolation via `FileStoreMap:ctx:<contextId>`
✅ Permanent file support - Indefinite retention
✅ Atomic operations - Thread-safe collection modifications via Redis hash maps
✅ Short-lived URLs - Secure file access (5-minute expiration)
✅ Comprehensive error handling - Graceful failure handling
✅ Single API call optimization - Efficient file resolution
✅ Field ownership separation - CFH-managed vs Cortex-managed fields
✅ Chat history integration - Automatic file syncing from conversations
All file operations flow through `lib/fileUtils.js`, ensuring consistency, maintainability, and proper error handling throughout the system.
- File Handler Service: External Azure Function managing cloud storage
- File Utilities Layer: Abstraction over file handler (no direct API calls)
- File Collection System: Redis hash maps for user file metadata
- Atomic Operations: Thread-safe via Redis HSET/HDEL/HGET operations
- Context Isolation: Per-context hash maps for multi-tenant support