Option to find and use related volumes during restore.
When Wyng has a set of (large) volumes to restore, it may make sense to first scan through the manifests to detect volumes that are related by cloning, and to create snapshots of the related volumes as the starting points for a --sparse receive.
This can save time and bandwidth when restoring large related volumes.
The idea could be taken further by looking simply for a high proportion of shared chunks, without requiring them to be at the same offsets in both volumes, and then copying (or even using fs dedup commands on) the chunks from the source snapshots, thus avoiding transmission from a remote archive location.
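The offset-independent variant could be sketched by comparing sets of chunk hashes rather than positions. A minimal sketch, assuming the chunk hashes for each volume have already been collected into lists (the helper name and toy hash strings below are hypothetical):

```python
def shared_chunk_ratio(hashes_a, hashes_b):
    """Fraction of B's chunks whose hashes occur anywhere in A,
    regardless of offset."""
    if not hashes_b:
        return 0.0
    set_a = set(hashes_a)
    shared = sum(1 for h in hashes_b if h in set_a)
    return shared / len(hashes_b)

# Toy example: same chunks, mostly at different offsets.
a = ["h1", "h2", "h3", "h4"]
b = ["h3", "h1", "h9", "h2"]
print(shared_chunk_ratio(a, b))  # 0.75
```

Because matching is by membership rather than position, this would catch volumes that diverged by insertion or reordering, at the cost of building a full hash set in memory.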
One potential pitfall is that if volume B was automatically pre-sourced from volume A and the receive is interrupted, the user could have a volume that is inconsistent, or worse, containing data from A that the user would not expect or want in B. (Restoring to a 'tmp' volume name is advised.)
A pseudocode example of what this might look like between just two volumes:
    with open(merge_manifests(vol_a)) as af, \
         open(merge_manifests(vol_b)) as bf:
        lines = count = 0
        for lna, lnb in zip(af, bf):   # manifest lines: "<hash> <offset>"
            lines += 1
            if lna.split()[0] == lnb.split()[0]:
                count += 1

    # Treat B as clone-related to A if more than 20% of chunks match
    # at their common offsets.
    if lines and count / lines > 0.2:
        storage.create(vol_b, snapshot_from=vol_a)
        restore_volume(vol_b, sparse=True)
This compares the hashes of both volumes' data chunks, side by side. For each hash that matches at their common offset, 'count' is incremented. We then check whether 'count' is a significant proportion of the total 'lines'.
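As a self-contained illustration of the side-by-side comparison, with hypothetical toy manifest lines of the form "hash offset":

```python
manifest_a = ["aaa 0", "bbb 1", "ccc 2", "ddd 3", "eee 4"]
manifest_b = ["aaa 0", "bbb 1", "xxx 2", "ddd 3", "yyy 4"]

lines = count = 0
for lna, lnb in zip(manifest_a, manifest_b):
    lines += 1
    # Compare only the hash field; offsets are implicitly aligned by zip().
    if lna.split()[0] == lnb.split()[0]:
        count += 1

print(count, lines)          # 3 5
print(count / lines)         # 0.6 -- well above a 20% threshold
```

Here 3 of 5 chunks match at their common offsets, so volume B would qualify as a sparse-receive candidate sourced from a snapshot of volume A.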