Wyng send currently examines all unallocated portions of a volume under certain conditions, such as during the volume's initial send or any other full scan. It also examines and compares all portions that have been de-allocated since the previous send, so incremental backups are affected as well. As a result, these sends run slower than necessary.
Cases where this has an impact:
- Initial add of large volumes to an archive
- Doing a full scan, such as when snapshots or delta maps were lost or --remap is needed
- Deleting large amounts of data from a volume
- Increasing a volume's size
Optimization could be achieved by creating a twin of the delta map, a zero map, during one of the early stages of the send process, such as get_delta_digest(). The zero mapping code would have to conform to each storage type, and the reflink version may be able to consume a 'tee' of the fiemap data. (An alternative would be to use SEEK_HOLE and SEEK_DATA, although they're unlikely to work with tlvm.)
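As a rough illustration of the SEEK_HOLE / SEEK_DATA alternative for file-backed (reflink) volumes, here is a minimal sketch. The function name zero_map_from_holes, the 64KB chunk size and the LSB-first bit order are assumptions made for the example, not Wyng's actual code or map layout. On filesystems that don't report holes, the whole file simply shows up as data.

```python
import errno
import os

def zero_map_from_holes(path, chunk_size=64 * 1024):
    """Return (bitmap, nchunks); bit i is set when chunk i may hold data.

    Hypothetical helper: chunk_size and bit order are assumptions.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        nchunks = -(-size // chunk_size)          # ceiling division
        bitmap = bytearray((nchunks + 7) // 8)
        pos = 0
        while pos < size:
            try:
                data = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as err:
                if err.errno == errno.ENXIO:
                    break                         # only a hole remains until EOF
                raise
            hole = os.lseek(fd, data, os.SEEK_HOLE)
            # Mark every chunk touched by the data extent [data, hole).
            first = data // chunk_size
            last = (hole - 1) // chunk_size
            for chunk in range(first, last + 1):
                bitmap[chunk // 8] |= 1 << (chunk % 8)
            pos = hole
        return bitmap, nchunks
    finally:
        os.close(fd)
```

Chunks whose bit stays clear lie entirely inside holes, so a send could treat them as zeros without reading them.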
The tlvm version might collect any "left-only" references in the case of an incremental send, or else do an extra metadata extraction step using a tlvm command other than thin_delta. The rule with thin_delta output appears to be that if no tag is produced for a given chunk range then it is unallocated on both sides, while left-only and right-only tags show the range is unallocated only on the old or new side. This should be enough to produce a zero map, even when there is no prior snapshot available.
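The tag rule described above could be sketched roughly as follows. This assumes thin_delta's XML output with same, different, left_only and right_only elements carrying begin and length attributes (in pool data blocks), and it assumes that a range with no tag is unallocated on both sides, as observed above. The function names and the sample XML are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

def allocated_ranges(delta_xml, side="right"):
    """Return sorted (begin, length) ranges allocated on the given side.

    Assumption: 'same' and 'different' are allocated on both sides, while
    'left_only' / 'right_only' are allocated on just that one side.
    """
    only_tag = "right_only" if side == "right" else "left_only"
    wanted = {"same", "different", only_tag}
    ranges = []
    for elem in ET.fromstring(delta_xml).iter():
        if elem.tag in wanted:
            ranges.append((int(elem.get("begin")), int(elem.get("length"))))
    return sorted(ranges)

def zero_ranges(ranges, total_blocks):
    """Invert sorted allocated ranges into unallocated (zero) ranges."""
    zeros, pos = [], 0
    for begin, length in ranges:
        if begin > pos:
            zeros.append((pos, begin - pos))
        pos = max(pos, begin + length)
    if pos < total_blocks:
        zeros.append((pos, total_blocks - pos))
    return zeros

# Made-up sample in the shape of thin_delta output, not real data.
SAMPLE = """<superblock uuid="" time="1" transaction="2" data_block_size="128" nr_data_blocks="0">
  <diff left="1" right="2">
    <same begin="0" length="4"/>
    <left_only begin="4" length="2"/>
    <different begin="10" length="1"/>
    <right_only begin="20" length="5"/>
  </diff>
</superblock>"""
```

With the sample above and a 32-block volume, the new (right) side is allocated at blocks 0-3, 10 and 20-24, and everything else would go into the zero map. The block ranges would still need converting from pool data blocks to archive chunks.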
Assuming the result of zero mapping is a per-chunk bitmap like the delta map, the send_volume() function could attempt to skip through 8-bit or larger segments similar to how it handles the delta bmap_list.
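A minimal sketch of that kind of skipping, assuming the zero map is a bytearray with one bit per chunk (LSB first, bit set meaning "allocated"). The name allocated_chunks and the bit layout are assumptions for the example and don't reflect how send_volume() actually walks bmap_list.

```python
def allocated_chunks(bitmap, nchunks):
    """Yield indexes of chunks whose bit is set, skipping empty bytes whole."""
    for byte_index, byte in enumerate(bitmap):
        if byte == 0:
            continue          # eight unallocated chunks skipped in one test
        base = byte_index * 8
        for bit in range(8):
            chunk = base + bit
            if chunk < nchunks and byte & (1 << bit):
                yield chunk
```

For larger strides, the same idea could compare a slice of the bitmap against an all-zero bytes object of the same length before falling back to per-byte checks.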
One desired result would be the ability to add a mostly empty, terabyte-sized volume to an archive in a matter of seconds or a few minutes. Another result would be incremental send for a volume that had a vast amount of data deleted taking only a fraction of the time it would in the current worst-case scenario.
To illustrate how large a difference delta mapping makes compared with the current lack of unallocated mapping:
Adding a new 1TB mostly-empty (1.5MB) volume to an archive took over 14 minutes.
Adding 48MB to that volume and doing an incremental (mapped) send took 9 seconds. So a backup of 32X the data finished in 1/93 the time. (The incremental send didn't have to compare large amounts of zeros because data had not been deleted from the volume, only added.)