
Receive: Analyze overlap between volumes and automatically use them as snapshot sources #247

@tasket

Option to find and use related volumes during restore.

When Wyng has a set of (large) volumes to restore, it may make sense to first scan through the manifests to detect volumes that are related by cloning, then create snapshots of the related volumes as the starting points for --sparse receive.

This can save time and bandwidth when restoring large related volumes.

The idea could be taken further: look simply for a high proportion of shared chunks, which need not be at the same offsets within the volumes, and copy the chunks from the source snapshots (or even use fs dedup commands), thus avoiding transmission from a remote archive location.
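A minimal sketch of the offset-independent check, assuming each manifest line starts with the chunk hash as its first whitespace-separated field. The function name `shared_chunk_ratio` and the sample manifests are hypothetical and not part of Wyng:

```python
def shared_chunk_ratio(manifest_a, manifest_b):
    """Fraction of B's distinct chunk hashes that also occur anywhere in A."""
    # Assumption: the hash is the first whitespace-separated field of a line.
    hashes_a = {ln.split()[0] for ln in manifest_a if ln.strip()}
    hashes_b = {ln.split()[0] for ln in manifest_b if ln.strip()}
    if not hashes_b:
        return 0.0
    return len(hashes_a & hashes_b) / len(hashes_b)


# Hypothetical manifests: B shares three chunks with A, at shifted offsets.
vol_a = ["aa11 x0000", "bb22 x0001", "cc33 x0002", "dd44 x0003"]
vol_b = ["ee55 x0000", "aa11 x0001", "bb22 x0002", "cc33 x0003"]
ratio = shared_chunk_ratio(vol_a, vol_b)
```

Using sets ignores position entirely, so this only says how much of B could be sourced locally; actually copying the chunks would still need a hash-to-offset map for A.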

One potential pitfall is that if volume B was automatically pre-sourced from volume A and the receive is interrupted, the user could have a volume that is inconsistent or, worse, contains data from A that the user would not expect or want in B. (Restoring to a 'tmp' volume name is advised.)

A p-code example of what this might look like between only two volumes:

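# merge_manifests(), storage.create() and restore_volume() are placeholders
with open(merge_manifests(vol_a)) as af, \
     open(merge_manifests(vol_b)) as bf:

    lines = count = 0
    # Walk both manifests in step; zip() stops at the shorter one
    for lna, lnb in zip(af, bf):
        lines += 1
        # The first field of each manifest line is the chunk hash
        if lna.split()[0] == lnb.split()[0]:
            count += 1

# Pre-source B from A when more than 1 in 5 compared chunks match
if count and count / lines > 0.2:
    storage.create(vol_b, snapshot_from=vol_a)

restore_volume(vol_b, sparse=True)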

This compares the hashes for both volumes' data chunks, side-by-side. For each hash that matches (at their common offset) the 'count' is incremented. Then we check if 'count' is a significant proportion of the total 'lines' (here, more than one fifth).
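For experimenting without an archive, the comparison can be tried on in-memory lists. This is only a sketch: `positional_match_ratio`, `THRESHOLD` and the sample manifest lines are hypothetical, and the hash is again assumed to be the first field of each line.

```python
def positional_match_ratio(manifest_a, manifest_b):
    """Share of compared lines whose chunk hashes match at the same position."""
    compared = matched = 0
    # zip() stops at the shorter manifest, so only the common range is compared.
    for lna, lnb in zip(manifest_a, manifest_b):
        compared += 1
        if lna.split()[0] == lnb.split()[0]:
            matched += 1
    return matched / compared if compared else 0.0


THRESHOLD = 0.2  # the same one-in-five cutoff as the p-code above

# Hypothetical manifests: B differs from A in one chunk out of four.
vol_a = ["aa11 x0000", "bb22 x0001", "cc33 x0002", "dd44 x0003"]
vol_b = ["aa11 x0000", "bb22 x0001", "ff66 x0002", "dd44 x0003"]
ratio = positional_match_ratio(vol_a, vol_b)
use_snapshot = ratio > THRESHOLD
```

Returning the ratio rather than a yes/no answer would let a restore of many volumes pick the best-matching source for each one instead of the first that clears the cutoff.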
