Receive: Analyze overlap between volumes and automatically use them as snapshot sources

Option to find and use related volumes during restore.

When Wyng has a set of (large) volumes to restore, it may make sense to first scan the manifests to detect volumes that are related by cloning, then create snapshots of the related volumes as the starting points for `--sparse` receive.

This can save time and bandwidth when restoring large related volumes.

The idea could be taken further by looking simply for a high proportion of shared chunks, which need not be at the same offsets within the volumes, and copying those chunks from the source snapshots (or even using fs dedup commands), thus avoiding transmission from a remote archive location.
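A minimal sketch of the offset-independent comparison, assuming each manifest line begins with the chunk hash followed by whitespace (the same assumption the `split()[0]` in the pseudocode below makes). The function names and the sample manifest lines are hypothetical, not part of Wyng:

```python
from typing import Iterable, List


def chunk_hashes(manifest_lines: Iterable[str]) -> List[str]:
    """Return the first field (assumed to be the chunk hash) of each non-blank line."""
    return [ln.split()[0] for ln in manifest_lines if ln.strip()]


def shared_chunk_ratio(lines_a: Iterable[str], lines_b: Iterable[str]) -> float:
    """Fraction of B's chunks whose hash appears anywhere in A, at any offset."""
    hashes_a = set(chunk_hashes(lines_a))
    hashes_b = chunk_hashes(lines_b)
    if not hashes_b:
        return 0.0
    shared = sum(1 for h in hashes_b if h in hashes_a)
    return shared / len(hashes_b)


# Hypothetical manifests: B holds three of A's chunks, all at different offsets.
manifest_a = ["aa11 x0000", "bb22 x0001", "cc33 x0002", "dd44 x0003"]
manifest_b = ["cc33 x0000", "ee55 x0001", "aa11 x0002", "bb22 x0003"]

print(shared_chunk_ratio(manifest_a, manifest_b))  # 0.75
```

A same-offset comparison would find no matches in this sample, while the set-based comparison finds three of four. Holding A's hashes in a set costs memory proportional to the number of distinct chunks in A.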

One potential pitfall is that if volume B was automatically pre-sourced from volume A and the receive is interrupted, the user could be left with a volume that is inconsistent or, worse, contains data from A that the user would not expect or want in B. (Restoring to a 'tmp' volume name is advised.)
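The 'tmp' name advice could be sketched roughly as follows. This is only an illustration: storage is modeled as a plain dict of chunk lists, and `restore_via_tmp` and the `receive` callback are hypothetical names, not Wyng functions.

```python
def restore_via_tmp(storage, name, source, receive):
    """Pre-source '<name>.tmp' from `source`, receive into it, then rename.

    The final name only ever appears once receive() has completed.
    """
    tmp = name + ".tmp"
    # Stand-in for creating a snapshot of the related volume.
    storage[tmp] = list(storage[source])
    try:
        receive(storage[tmp])  # may raise partway through
    except Exception:
        del storage[tmp]  # discard the mixed A/B state
        raise
    storage[name] = storage.pop(tmp)  # publish only a complete restore


storage = {"vol_a": ["a0", "a1", "a2"]}


def good_receive(chunks):
    chunks[1] = "b1"
    chunks[2] = "b2"


def interrupted_receive(chunks):
    chunks[1] = "b1"
    raise ConnectionError("link dropped")


try:
    restore_via_tmp(storage, "vol_b", "vol_a", interrupted_receive)
except ConnectionError:
    pass
print(sorted(storage))  # ['vol_a']

restore_via_tmp(storage, "vol_b", "vol_a", good_receive)
print(storage["vol_b"])  # ['a0', 'b1', 'b2']
```

Deleting the temporary volume on failure is one possible choice; keeping it under its 'tmp' name so the receive can be resumed would also avoid exposing a half-restored `vol_b`.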

A pseudocode example of what this might look like for just two volumes:

```
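# merge_manifests(), storage.create() and restore_volume() are placeholders
with open(merge_manifests(vol_a)) as af, \
     open(merge_manifests(vol_b)) as bf:

    lines = count = 0
    # Walk both manifests in step; the first field of each line is the chunk hash
    for lna, lnb in zip(af, bf):
        lines += 1
        if lna.split()[0] == lnb.split()[0]:
            count += 1

# Pre-source B from A only if more than 1/5 of the compared chunks match
if lines and count / lines > 0.2:
    storage.create(vol_b, snapshot_from=vol_a)

restore_volume(vol_b, sparse=True)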
```

This compares the hashes of both volumes' data chunks side by side. For each pair of hashes that match at their common offset, 'count' is incremented. We then check whether 'count' is a significant proportion of the total 'lines' (here, more than one fifth).
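As a worked example, the same counting can be run on in-memory line lists. The helper name and the sample manifest lines are made up for illustration; the only assumption about the format is that the first field of each line is the chunk hash.

```python
def same_offset_matches(lines_a, lines_b):
    """Count line pairs whose first field (the assumed chunk hash) is equal.

    Returns (count, lines), where lines is the number of pairs compared;
    zip() stops at the end of the shorter manifest.
    """
    lines = count = 0
    for lna, lnb in zip(lines_a, lines_b):
        lines += 1
        if lna.split()[0] == lnb.split()[0]:
            count += 1
    return count, lines


# Hypothetical manifests that agree at offsets 0 and 2 only.
vol_a = ["aa11 x0000", "bb22 x0001", "cc33 x0002", "dd44 x0003", "ee55 x0004"]
vol_b = ["aa11 x0000", "ff66 x0001", "cc33 x0002", "0a0a x0003", "1b1b x0004"]

count, lines = same_offset_matches(vol_a, vol_b)
related = lines > 0 and count / lines > 0.2
print(count, lines, related)  # 2 5 True
```

Because `zip()` stops at the shorter manifest, 'lines' counts only the offsets both volumes have in common, so a small volume that matches the start of a much larger one would still pass the check.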

Receive: Analyze overlap between volumes and automatically use them as snapshot sources #247
