PR #1341 enabled storing logs in a JSON structure. One advantage of this is that the different log types are now properly separated:
- logs from the container
- logs from setup commands
- logs from flow commands
- system related logs
While logs from the setup commands and flow commands occur during a specific phase (setup commands in the [BOOT] phase, flow commands in the [RUNTIME] phase), logs from the container usually span multiple phases ([BOOT], [IDLE], [RUNTIME]). Therefore the phase of the log type `container_execution` is stored as [MULTIPLE].
@ArneTR proposed in the comment #1341 (comment) to time-split the container logs so that boot, idle, runtime and sub-runtime logs can be separated. For this, the logs need to be time-keyed. At least for the container logs that should be no problem ... not so much for the run/exec logs.
## Current Implementation

The container logs are collected using:

```python
log = subprocess.run(
    ['docker', 'logs', container_id],
    check=True,
    encoding='UTF-8',
    errors='replace',
    stdout=stdout_behaviour,
    stderr=stderr_behaviour,
)
```
Docker's `logs` command provides built-in timestamp support that could be used:
- Flag: `--timestamps` or `-t`
- Format: RFC3339Nano timestamp prefix
- Example output:

```
2025-09-17T06:46:17.013138795Z first log
2025-09-17T06:46:19.014901570Z second log
2025-09-17T06:46:20.016411547Z third log
```
Logs are currently stored as JSON objects with the following structure:

```python
log_entry = {
    'type': log_type.value,
    'id': str(log_id),
    'cmd': command_string,
    'phase': phase,
    'stdout': stdout,            # optional
    'stderr': stderr,            # optional
    'flow': flow,                # optional
    'class': exception_class,    # optional
}
```
## Implementation Options

### Option 1: Integrated Timestamp Storage

Approach: Add timestamps to the existing log structure

Changes required:

1. Modify the docker logs command in `_read_container_logs()`:

```python
['docker', 'logs', '--timestamps', container_id]
```
2. Parse timestamps in `_handle_process_output()`:

```python
for line in log_output.split('\n'):
    if not line.strip():
        continue
    # Parse the RFC3339Nano timestamp prefix added by --timestamps
    timestamp_match = re.match(r'^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\s+(.*)$', line)
    if timestamp_match:
        timestamp, content = timestamp_match.groups()
        entries.append({
            'timestamp': timestamp,
            'content': content
        })
    else:
        # Fallback for lines without timestamps
        entries.append({
            'timestamp': None,
            'content': line
        })
```
3. Update the log entry structure for container execution only:

```python
# For LogType.CONTAINER_EXECUTION only
if log_type == LogType.CONTAINER_EXECUTION:
    log_entry = {
        'type': log_type.value,
        'id': str(log_id),
        'cmd': command_string,
        'phase': phase,
        'stdout_entries': parsed_stdout_entries,  # list of {timestamp, content}
        'stderr_entries': parsed_stderr_entries,  # list of {timestamp, content}
        # ... other fields
    }
else:
    # Keep the existing structure for other types such as setup_command and flow_command
    log_entry = {
        'type': log_type.value,
        'id': str(log_id),
        'cmd': command_string,
        'phase': phase,
        'stdout': stdout,  # plain string
        'stderr': stderr,  # plain string
        # ... other fields
    }
```
### Option 2: Separate Timestamp-Aware Logs

Approach: Maintain the existing logs and add parallel timestamp-aware storage
Pros:
- Full backward compatibility
- No impact on existing log consumers
Cons:
- Increased storage overhead
- Duplicate data maintenance
## Technical Considerations
1. Log Type Differentiation
- Container logs: Will have per-line timestamps with nanosecond precision
- Setup/Flow logs: Will maintain current plain string format
- Mixed handling: Code must handle both timestamped and non-timestamped logs
2. Timezone Handling
- Docker timestamps are in UTC (Z suffix)
- Consider if local timezone conversion is needed for display
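A small sketch of one way to handle this (the helper name `parse_docker_timestamp` is hypothetical): Python's `datetime` only stores microseconds, so the nanosecond digits from the RFC3339Nano prefix are truncated to six before parsing, and the `Z` suffix is normalized so the result is timezone-aware UTC:

```python
import re
from datetime import datetime

def parse_docker_timestamp(ts):
    """Turn an RFC3339Nano timestamp from `docker logs --timestamps` into a
    timezone-aware datetime. Python datetimes only store microseconds, so
    the nanosecond digits are truncated to six before parsing."""
    # Keep at most 6 fractional digits, then normalize 'Z' to an explicit offset
    ts = re.sub(r'\.(\d{6})\d*', r'.\1', ts).replace('Z', '+00:00')
    return datetime.fromisoformat(ts)
```

Calling `.astimezone()` on the result (with no argument) would convert it to the local timezone, which could serve for display.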
3. Performance Impact
- Timestamp parsing adds processing overhead
- Consider lazy parsing if logs are large
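If lazy parsing is pursued, a generator keeps memory flat for large logs. A minimal sketch, reusing the regex from Option 1 (the function name is an assumption, not existing code):

```python
import re

# Matches the RFC3339Nano prefix emitted by `docker logs --timestamps`
TS_RE = re.compile(r'^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\s+(.*)$')

def iter_log_entries(log_output):
    """Lazily yield {timestamp, content} dicts instead of building the
    full entry list up front, so large logs are parsed on demand."""
    for line in log_output.splitlines():
        if not line.strip():
            continue
        match = TS_RE.match(line)
        if match:
            timestamp, content = match.groups()
            yield {'timestamp': timestamp, 'content': content}
        else:
            # Fallback for lines without a timestamp prefix
            yield {'timestamp': None, 'content': line}
```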
4. Backward Compatibility
- If the JSON structure is changed (Option 1), a migration script is needed
- Only affects container execution logs structure
- Setup/Flow command logs remain unchanged
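Such a migration could be fairly mechanical: since existing lines carry no timestamps, each one is wrapped with `timestamp: None`. A hypothetical sketch, not the actual migration script:

```python
def migrate_container_entry(log_entry):
    """Convert an old-style container_execution entry (plain 'stdout'/'stderr'
    strings) to the new per-line structure. Existing lines carry no
    timestamps, so every migrated line gets timestamp=None."""
    migrated = dict(log_entry)
    for old_key, new_key in (('stdout', 'stdout_entries'), ('stderr', 'stderr_entries')):
        text = migrated.pop(old_key, None)
        if text is not None:
            migrated[new_key] = [
                {'timestamp': None, 'content': line}
                for line in text.splitlines()
            ]
    return migrated
```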
5. Phase Detection
- The timestamps could be used to determine the correct phase for each log message
- Further analysis is needed to decide where and how this should be done
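As one possible direction for that analysis: assuming the runner records a start and end time per phase (the `phases` list below is hypothetical), each timestamped line could be bucketed into the phase whose window contains it:

```python
import re
from datetime import datetime, timezone

def assign_phase(timestamp, phases):
    """Return the name of the phase whose [start, end) window contains the
    log line's timestamp. `phases` is a hypothetical list of
    (name, start, end) tuples the runner would record per phase."""
    # Truncate nanoseconds to microseconds and normalize the trailing 'Z'
    ts = datetime.fromisoformat(
        re.sub(r'\.(\d{6})\d*', r'.\1', timestamp).replace('Z', '+00:00')
    )
    for name, start, end in phases:
        if start <= ts < end:
            return name
    return '[MULTIPLE]'  # fallback: no recorded window matched

# Hypothetical phase windows for illustration
phases = [
    ('[BOOT]', datetime(2025, 9, 17, 6, 46, 0, tzinfo=timezone.utc),
               datetime(2025, 9, 17, 6, 46, 18, tzinfo=timezone.utc)),
    ('[RUNTIME]', datetime(2025, 9, 17, 6, 46, 18, tzinfo=timezone.utc),
                  datetime(2025, 9, 17, 6, 47, 0, tzinfo=timezone.utc)),
]
```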
6. Frontend
- How to display the timestamps in the frontend is an open question for me