PR #1341 enabled storing logs in a JSON structure. One advantage of this is that the different log types are now properly separated:
- logs from the container
- logs from setup commands
- logs from flow commands
- system related logs
While logs from the setup commands and flow commands occur during a specific phase (setup commands in the [BOOT] phase, flow commands in the [RUNTIME] phase), logs from the container usually span multiple phases ([BOOT], [IDLE], [RUNTIME]). Therefore the phase of the log type `container_execution` is stored as [MULTIPLE].
@ArneTR proposed in the comment #1341 (comment) to time-split the container logs so that boot, idle, runtime and sub-runtime logs can be separated. For this, the logs need to be time-keyed. At least for the container logs that should be no problem ... not so much for the run/exec logs.
## Current Implementation

The container logs are collected using:

```python
log = subprocess.run(
    ['docker', 'logs', container_id],
    check=True,
    encoding='UTF-8',
    errors='replace',
    stdout=stdout_behaviour,
    stderr=stderr_behaviour,
)
```
Docker's `logs` command provides built-in timestamp support that could be used:
- Flag: `--timestamps` or `-t`
- Format: RFC3339Nano timestamp prefix
- Example output:

```
2025-09-17T06:46:17.013138795Z first log
2025-09-17T06:46:19.014901570Z second log
2025-09-17T06:46:20.016411547Z third log
```
Logs are currently stored as JSON objects with the following structure:

```python
log_entry = {
    'type': log_type.value,
    'id': str(log_id),
    'cmd': command_string,
    'phase': phase,
    'stdout': stdout,            # optional
    'stderr': stderr,            # optional
    'flow': flow,                # optional
    'class': exception_class,    # optional
}
```
## Implementation Options

### Option 1: Integrated Timestamp Storage

Approach: Add timestamps to the existing log structure

Changes required:

1. Modify the docker logs command in `_read_container_logs()`:

```python
['docker', 'logs', '--timestamps', container_id]
```
2. Parse timestamps in `_handle_process_output()`:

```python
for line in log_output.split('\n'):
    if not line.strip():
        continue
    # Parse the RFC3339Nano timestamp prefix added by --timestamps
    timestamp_match = re.match(r'^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\s+(.*)$', line)
    if timestamp_match:
        timestamp, content = timestamp_match.groups()
        entries.append({
            'timestamp': timestamp,
            'content': content
        })
    else:
        # Fallback for lines without timestamps
        entries.append({
            'timestamp': None,
            'content': line
        })
```
3. Update the log entry structure for container execution only:

```python
# For LogType.CONTAINER_EXECUTION only
if log_type == LogType.CONTAINER_EXECUTION:
    log_entry = {
        'type': log_type.value,
        'id': str(log_id),
        'cmd': command_string,
        'phase': phase,
        'stdout_entries': parsed_stdout_entries,  # list of {timestamp, content}
        'stderr_entries': parsed_stderr_entries,  # list of {timestamp, content}
        # ... other fields
    }
else:
    # Keep the existing structure for other types such as setup_command and flow_command
    log_entry = {
        'type': log_type.value,
        'id': str(log_id),
        'cmd': command_string,
        'phase': phase,
        'stdout': stdout,  # plain string
        'stderr': stderr,  # plain string
        # ... other fields
    }
```
### Option 2: Separate Timestamp-Aware Logs

Approach: Maintain the existing logs and add parallel timestamp-aware storage
Pros:
- Full backward compatibility
- No impact on existing log consumers
Cons:
- Increased storage overhead
- Duplicate data maintenance
## Technical Considerations
1. Log Type Differentiation
- Container logs: Will have per-line timestamps with nanosecond precision
- Setup/Flow logs: Will maintain current plain string format
- Mixed handling: Code must handle both timestamped and non-timestamped logs
2. Timezone Handling
- Docker timestamps are in UTC (Z suffix)
- Consider if local timezone conversion is needed for display
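A small sketch of one way to handle this (the helper name `parse_docker_timestamp` is hypothetical): Python's `datetime` only stores microseconds, so the nanosecond digits from the RFC3339Nano prefix are truncated to six before parsing, and the `Z` suffix is normalized so the result is timezone-aware UTC:

```python
import re
from datetime import datetime

def parse_docker_timestamp(ts):
    """Turn an RFC3339Nano timestamp from `docker logs --timestamps` into a
    timezone-aware datetime. Python datetimes only store microseconds, so
    the nanosecond digits are truncated to six before parsing."""
    # Keep at most 6 fractional digits, then normalize 'Z' to an explicit offset
    ts = re.sub(r'\.(\d{6})\d*', r'.\1', ts).replace('Z', '+00:00')
    return datetime.fromisoformat(ts)
```

Calling `.astimezone()` on the result (with no argument) would convert it to the local timezone, which could serve for display.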
3. Performance Impact
- Timestamp parsing adds processing overhead
- Consider lazy parsing if logs are large
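If lazy parsing is pursued, a generator keeps memory flat for large logs. A minimal sketch, reusing the regex from Option 1 (the function name is an assumption, not existing code):

```python
import re

# Matches the RFC3339Nano prefix emitted by `docker logs --timestamps`
TS_RE = re.compile(r'^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\s+(.*)$')

def iter_log_entries(log_output):
    """Lazily yield {timestamp, content} dicts instead of building the
    full entry list up front, so large logs are parsed on demand."""
    for line in log_output.splitlines():
        if not line.strip():
            continue
        match = TS_RE.match(line)
        if match:
            timestamp, content = match.groups()
            yield {'timestamp': timestamp, 'content': content}
        else:
            # Fallback for lines without a timestamp prefix
            yield {'timestamp': None, 'content': line}
```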
4. Backward Compatibility
- If the JSON structure is changed (Option 1), a migration script is needed
- Only affects container execution logs structure
- Setup/Flow command logs remain unchanged
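Such a migration could be fairly mechanical: since existing lines carry no timestamps, each one is wrapped with `timestamp: None`. A hypothetical sketch, not the actual migration script:

```python
def migrate_container_entry(log_entry):
    """Convert an old-style container_execution entry (plain 'stdout'/'stderr'
    strings) to the new per-line structure. Existing lines carry no
    timestamps, so every migrated line gets timestamp=None."""
    migrated = dict(log_entry)
    for old_key, new_key in (('stdout', 'stdout_entries'), ('stderr', 'stderr_entries')):
        text = migrated.pop(old_key, None)
        if text is not None:
            migrated[new_key] = [
                {'timestamp': None, 'content': line}
                for line in text.splitlines()
            ]
    return migrated
```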
5. Phase Detection
- The timestamps could be used to determine the correct phase for each log message
- Further analysis is needed to decide where and how this should be done
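As one possible direction for that analysis: assuming the runner records a start and end time per phase (the `phases` list below is hypothetical), each timestamped line could be bucketed into the phase whose window contains it:

```python
import re
from datetime import datetime, timezone

def assign_phase(timestamp, phases):
    """Return the name of the phase whose [start, end) window contains the
    log line's timestamp. `phases` is a hypothetical list of
    (name, start, end) tuples the runner would record per phase."""
    # Truncate nanoseconds to microseconds and normalize the trailing 'Z'
    ts = datetime.fromisoformat(
        re.sub(r'\.(\d{6})\d*', r'.\1', timestamp).replace('Z', '+00:00')
    )
    for name, start, end in phases:
        if start <= ts < end:
            return name
    return '[MULTIPLE]'  # fallback: no recorded window matched

# Hypothetical phase windows for illustration
phases = [
    ('[BOOT]', datetime(2025, 9, 17, 6, 46, 0, tzinfo=timezone.utc),
               datetime(2025, 9, 17, 6, 46, 18, tzinfo=timezone.utc)),
    ('[RUNTIME]', datetime(2025, 9, 17, 6, 46, 18, tzinfo=timezone.utc),
                  datetime(2025, 9, 17, 6, 47, 0, tzinfo=timezone.utc)),
]
```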
6. Frontend
- How to display the timestamps in the frontend is an open question for me