Better error message: --resume crashes when download.state2 file is missing #1383

@minguyen9988

Problem

When using --resume on a backup directory that exists but has no download.state2 file, clickhouse-backup crashes with:

Error: backup is already exists

Root cause

The download logic checks if the backup already exists locally:

for i := range localBackups {
    if backupName == localBackups[i].BackupName {
        if !b.resume {
            return ErrBackupIsAlreadyExists
        } else {
            // resume path requires download.state2 to exist
        }
    }
}

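When --resume is set but download.state2 doesn't exist (e.g., the previous download crashed before creating the state file, or the state file was manually deleted), the code falls through to an error path, and the download aborts with the "backup is already exists" error shown above even though --resume was requested.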

When this happens

  1. Previous download was killed before writing any state
  2. User manually deleted the state file but kept the partial data
  3. Previous download used a different version that didn't create state files
  4. Backup directory was created by a failed download that crashed on metadata

Proposed Fix

Make the state file optional for resume. If it doesn't exist, resume from scratch — the conservative validation (see #1376) will detect which parts are complete and which need re-downloading:

if !b.resume {
    return ErrBackupIsAlreadyExists
} else {
    isResumeExists = true
    _, checkDownloadErr := os.Stat(path.Join(b.DefaultDataPath, "backup", backupName, "download.state2"))
    if errors.Is(checkDownloadErr, os.ErrNotExist) {
        // No state file from a previous download — this is OK.
        // Resume will re-validate and re-download any incomplete parts.
        // The state file is an optimization, not a requirement.
        log.Warn().Msgf("%s already exists but no download.state2 found, will resume download from scratch", backupName)
    } else {
        log.Warn().Msgf("%s already exists will try to resume download", backupName)
    }
}

Behavior after fix

| State file exists? | Behavior |
| --- | --- |
| Yes | Normal resume: bbolt state tracks completed parts |
| No | Resume from scratch: validates each part locally, re-downloads incomplete ones |
| No + manifest | Fast resume: manifest enables local-only validation (zero remote calls) |

The key insight is that the state file is an optimization for skipping validation, not a requirement for correctness. The conservative resume validation (checking actual file existence and sizes) is the real safety mechanism.
