ZFS

Prerequisites

Install ZFS-on-Linux

For Debian 10 (Buster), ZFS packages are included in the contrib repo.

  1. Add Debian contrib repository to the apt sources.
    cp /etc/apt/sources.list sources.list.orig
    sudo sed -i 's/buster main$/buster main contrib/g' /etc/apt/sources.list
  2. Update, install kernel headers and dependencies.
    sudo apt-get update
    sudo apt-get install dpkg-dev linux-headers-$(uname -r) linux-image-amd64
  3. Install the ZFS packages; the DKMS module build may take a few minutes.
    sudo apt-get install zfs-dkms
    sudo modprobe zfs
    sudo apt-get install zfsutils-linux
  4. Set ZFS module to load on boot.
    echo zfs | sudo tee /etc/modules-load.d/zfs.conf
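
To confirm the module built and loaded, a quick sanity check (output varies by version):

    lsmod | grep zfs                # module is loaded
    modinfo -F version zfs          # version of the DKMS-built module
    sudo zpool status               # should print "no pools available"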

Source: zfsonlinux instructions

Create ZFS Pool

Tune ashift Parameter

Benchmark the drives with different ashift values to find the best-performing setting. Source: louwrentius.com

  1. Benchmark using ashift=12

    zpool create test -o ashift=12 mirror /dev/disk/by-id/... /dev/disk/by-id/...
    dd if=/dev/zero of=/test/ashift12.bin bs=1M count=100000
    # (e.g. I got 197 MB/s)
    zpool destroy test
  2. Benchmark using ashift=9

    zpool create test -o ashift=9 mirror /dev/disk/by-id/... /dev/disk/by-id/...
    dd if=/dev/zero of=/test/ashift9.bin bs=1M count=100000
    # (e.g. I got 192 MB/s, slightly slower than above)
    zpool destroy test
  3. Record the optimal ashift value for each vdev.

NOTE: Additional vdevs added to the pool can have their own ashift value.
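
As a starting point before benchmarking: ashift=9 corresponds to 512-byte sectors and ashift=12 to 4 KiB sectors, and the drives' reported sector sizes can be listed (beware that some 4K-sector drives emulate 512-byte logical sectors):

    # physical vs logical sector size per drive
    lsblk -o NAME,PHY-SEC,LOG-SEC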

Create Pool

  1. ZFS vdevs cannot be shrunk, so a replacement disk even a few bytes smaller than the original cannot be swapped into a mirror. To mitigate small size differences between drive models, partition each disk with a tiny amount of trailing space and give ZFS the partition.

    sudo parted /dev/disk/by-id/TARGET_DEVICE
    (parted) print
    # Verify this is the CORRECT DRIVE!!!
    (parted) mklabel gpt
    (parted) unit GB
    # this example: 4001 GB drive; leave 2 GB free at the end.
    (parted) mkpart primary 0 3999
    (parted) print free
    # Figure how much free space is left/available
    (parted) mkpart primary ext4 3999 4001
    (parted) print free
    # Double-check everything looks good
    (parted) quit
  2. Repeat for all data drives.

  3. Create the pool using the ashift value found in previous subsection. NOTE: For simplicity, use pool name "${HOSTNAME}".

    sudo zpool create ${HOSTNAME} -o ashift=12 \
    mirror /dev/disk/by-id/.._1 /dev/disk/by-id/.._1
  4. Verify the ashift value and drive paths look correct. Prefer the stable /dev/disk/by-id paths rather than the volatile /dev/sdX names.

    sudo zdb
  5. Set properties on the base dataset: default to lz4 compression and disable access-time (atime) tracking.

    sudo zfs set compression=lz4 ${HOSTNAME}
    sudo zfs set atime=off ${HOSTNAME}
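
A quick check that the properties took effect:

    sudo zfs get compression,atime ${HOSTNAME}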

Create ZFS Datasets

  1. Create shared datasets.

    echo "movie music picture tv" > public_categories.txt
    sudo zfs create ${HOSTNAME}/public
    for category in `cat public_categories.txt`; do
        sudo zfs create ${HOSTNAME}/public/${category}
    done
    • Optionally, disallow creating uncategorized files: sudo chmod 550 /${HOSTNAME}/public
  2. Create datasets for each user with sane parameters.

    for user in $USER `cat users.txt`; do
        sudo zfs create ${HOSTNAME}/${user}
        sudo zfs create ${HOSTNAME}/${user}/safe
        sudo zfs set reservation=100G ${HOSTNAME}/${user}
        sudo zfs set refquota=100G ${HOSTNAME}/${user}/safe
        sudo zfs set quota=500G ${HOSTNAME}/${user}/safe
    done
    • Reserve 100G minimum for each user.
    • Limit safe folder to 100G (500G including snapshots).
  3. Link to the datasets from user homes, and set permissions.

    for user in $USER `cat users.txt`; do
        # link to USER and PUBLIC
        sudo ln -sn /${HOSTNAME}/${user} /home/${user}/${user}
        sudo ln -sn /${HOSTNAME}/public /home/${user}/public
        sudo chown -R ${user}:${user} /${HOSTNAME}/${user}
        # reveal special directory: .zfs/snapshot
        sudo chown ${user}:${user} /${HOSTNAME}/${user}/{,safe/}.zfs{,/snapshot}
    done
    # setgid bit: new PUBLIC files inherit the `publisher` group
    sudo chown -R nobody:publisher /${HOSTNAME}/public
    sudo chmod -R g+ws /${HOSTNAME}/public
    # keep root level 'read-only'
    sudo chmod g-w /${HOSTNAME}/public
    for category in `cat public_categories.txt`; do
        # reveal special directory: .zfs/snapshot
        sudo chown nobody:publisher /${HOSTNAME}/public/${category}/.zfs{,/snapshot}
    done
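
To verify the datasets, quotas, and permissions created above:

    # datasets with space accounting and limits
    sudo zfs list -r -o name,used,avail,reservation,refquota ${HOSTNAME}
    # symlinks and ownership for one user (hypothetical user "alice")
    ls -la /home/alice
    ls -ld /${HOSTNAME}/public/*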

Enhancements

Auto-Snapshots

  1. Install zfs-auto-snapshot

    wget https://github.com/zfsonlinux/zfs-auto-snapshot/archive/upstream/1.2.4.tar.gz
    tar -xzf 1.2.4.tar.gz
    cd zfs-auto-snapshot-upstream-1.2.4
    sudo make install
  2. Configure the anacron entries as desired

    for f in /etc/cron.*/zfs-auto-snapshot; do
        echo -e "===========================\n$f\n==========================="
        cat $f
    done
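
Once the cron entries run, snapshots with the zfs-auto-snap prefix should start appearing; a quick way to confirm:

    sudo zfs list -t snapshot -o name,creation -s creation | grep zfs-auto-snap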

Email Health Report

  1. Setup the ZFS health script to run daily.
    wget https://gist.githubusercontent.com/petervanderdoes/bd6660302404ed5b094d/raw \
    -O - | sudo tee /etc/cron.daily/zfs_health
    # EDIT appropriately for Debian date params
    #  (line 111, choose Ubuntu option)
    sudo nano /etc/cron.daily/zfs_health
    sudo chmod +x /etc/cron.daily/zfs_health
    • NOTE: failure notifications rely on root email being set up (a working local MTA).
  2. Wait for the script to report that the pool needs a scrub, to verify it is working.
  3. Schedule a daily scrub of all pools
     printf '#!/bin/sh\n/sbin/zpool scrub %s\n' "${HOSTNAME}" | sudo tee /etc/cron.daily/zpool_scrub
     sudo chmod +x /etc/cron.daily/zpool_scrub
    • NOTE: a monthly scrub is already provided by /etc/cron.d/zfsutils-linux, but it fires at a fixed time and can skip a month if the system is powered down.
    • Better to comment it out and rely on anacron instead:
    # comment-out the scrub line
    sudo vim /etc/cron.d/zfsutils-linux
    (echo '#!/bin/sh'; for pool in rpool bpool ${HOSTNAME}; do echo "/sbin/zpool scrub ${pool}"; done) \
        | sudo tee /etc/cron.monthly/zpool_scrub
    sudo chmod +x /etc/cron.monthly/zpool_scrub
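    • The date and result of the last scrub appear in the pool status, a quick way to confirm the cron jobs fire:
    sudo zpool status ${HOSTNAME} | grep scan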

See zfs_health_ignore_error for an example of suppressing a known corruption error that you have decided to accept.

Recipes

Send and Receive Snapshots

Common commands to use for transferring datasets between two servers.

The example below uses server names SENDER/RECEIVER, pool names SENDER_POOL/RECEIVER_POOL, dataset DATA, and snapshot name SNAPPED.

  1. If not yet done, snapshot the source folder.
    # on SENDER
    sudo zfs snapshot SENDER_POOL/DATA@SNAPPED
  2. Prepare RECEIVER to accept the stream.
    # on RECEIVER
    mbuffer -s 128k -m 1G -I 9090 | sudo zfs recv -s -e RECEIVER_POOL
  3. Initiate transfer from SENDER.
    # on SENDER
    sudo zfs send SENDER_POOL/DATA@SNAPPED | mbuffer -s 128k -m 1G -O RECEIVER:9090
  4. If the transfer is interrupted, the -s flag on zfs recv saves the partially received state on RECEIVER so the stream can be resumed later, as sketched below.
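
A sketch of resuming an interrupted transfer, assuming the partial stream landed as RECEIVER_POOL/DATA (restart the mbuffer | zfs recv listener from step 2 on RECEIVER first):

    # on RECEIVER: read the resume token saved by `zfs recv -s`
    zfs get -H -o value receive_resume_token RECEIVER_POOL/DATA
    # on SENDER: resume the stream, where TOKEN is the value read above (copied over from RECEIVER)
    sudo zfs send -t TOKEN | mbuffer -s 128k -m 1G -O RECEIVER:9090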

Source: evercity.co.uk

Auto Mirror

Use the zfs-auto-mirror shell script to pull snapshots from the source host.

  1. Create users with limited rights: only zfs-send/-receive permissions for the datasets of interest.
    # on source host
    sudo useradd zfs-sender -m -s /bin/bash
    sudo passwd zfs-sender #create TEMPORARY password
    sudo zfs allow zfs-sender mount,snapshot,send,hold DATASET_PATHS
    sudo ln -s /sbin/zfs /usr/bin/zfs
    
    # on destination host
    sudo useradd zfs-receiver -m -s /bin/bash
    sudo zfs allow zfs-receiver mount,create,receive DATASET_PATHS
    sudo ln -s /sbin/zfs /usr/bin/zfs
    
    sudo -u zfs-receiver /bin/bash
    $ ssh-keygen -t rsa #create with no passphrase, in ~/.ssh/SOURCE_HOST.rsa
    $ echo "Host SOURCE_HOST
    	HostName SOURCE_HOST
    	IdentityFile ~/.ssh/SOURCE_HOST.rsa
    	User zfs-sender" >> ~/.ssh/config
    $ ssh-copy-id -i ~/.ssh/SOURCE_HOST.rsa.pub SOURCE_HOST
    $ exit
    
    # on source host
    sudo passwd -l zfs-sender
  2. Download script on client.
    # on destination host
    sudo -u zfs-receiver /bin/bash
    
    wget https://raw.githubusercontent.com/nadavgolden/zfs-auto-mirror/master/zfs-auto-mirror.sh
    chmod +x zfs-auto-mirror.sh
    
    # apply fix for local snapshot detection (likely only needed for older zfsonlinux versions ~0.7.12)
    echo '141c141
    <     LOCAL_SNAPSHOTS=$(zfs list -t snapshot -H -S creation -o name ${LOCAL_DATASET} | grep ${LABEL} | cut -d "@" -f2-)
    ---
    >     LOCAL_SNAPSHOTS=$(zfs list -r -t snapshot -H -S creation -o name ${LOCAL_DATASET} | grep ${LABEL} | cut -d "@" -f2-)' | patch zfs-auto-mirror.sh
  3. Set up mirroring as you wish, running commands like:
    sudo -u zfs-receiver /bin/bash
    $ ./zfs-auto-mirror.sh -p -d 1 SOURCE_HOST SOURCE_DATASET DESTINATION_DATASET
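
To run the pull on a schedule, a cron entry for zfs-receiver could look like this (hypothetical nightly 03:00 run; note this replaces any existing crontab for that user):

    echo '0 3 * * * /home/zfs-receiver/zfs-auto-mirror.sh -p -d 1 SOURCE_HOST SOURCE_DATASET DESTINATION_DATASET' \
        | sudo crontab -u zfs-receiver -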

Source: superuser.com

Expand a mirror by cloning to new drives (defrag)

Apparently ZFS has no built-in way to defragment the free space on its drives. This mainly matters for HDDs, where fragmentation forces new files to span a larger "seek distance" on the platters.

These steps assume expanding a mirror from 8 TB to 20 TB drives, where the original 8 TB drives will no longer be used. Adjust accordingly.

  1. Ensure the source pool is not being written to (e.g. unmount it, stop any services querying ZFS)
    sudo zfs set mountpoint=none mypool
  2. Create the new pool:
    • destroy the existing partition tables:
      export NEW1=/dev/disk/by-id/drive-identifier1
      export NEW2=/dev/disk/by-id/drive-identifier2
      sudo fdisk -l $NEW1
      sudo fdisk -l $NEW2
      sudo fdisk $NEW1
      > g # create new GPT partition table
      > w # write changes to disk
      sudo fdisk $NEW2
      > g
      > w
    • create the new pool
      sudo zpool create -m none mypool-20 mirror $NEW1 $NEW2
    • setup the new pool to match the old one
      diff <(sudo zfs get all mypool | cut -d ' ' -f 2-) <(sudo zfs get all mypool-20 | cut -d ' ' -f 2-) -y | less
      sudo zfs set compression=lz4 mypool-20
      sudo zfs set atime=off mypool-20
    • clone the old pool to the new pool, using an appropriate SNAPSHOT name (e.g. transfer_YYYYMMDDTHHMM for the current time)
       export SNAPSHOT="mypool@transfer_YYYYMMDDTHHMM"
       sudo zfs snapshot -r "$SNAPSHOT"
       # dry-run the send
       sudo zfs send -R "$SNAPSHOT" --dryrun --verbose
       # dry-run the receive
       sudo zfs send -R "$SNAPSHOT" | mbuffer -s 128k -m 1G | sudo zfs recv -Fdus mypool-20 -nv
       # execute for real
       sudo zfs send -R "$SNAPSHOT" | mbuffer -s 128k -m 1G | sudo zfs recv -Fdus mypool-20
      • ZFS Send flags
        • -R recursive (send all child datasets/properties as well)
      • mbuffer flags
        • -s use blocks of this size (bytes) for the buffer
        • -m total size (bytes) of the buffer
      • ZFS Receive flags
        • -F force a rollback of the receiving filesystem to the most recent snapshot
        • -d discard the first element from the received dataset name
        • -u do not mount the received datasets
        • -s if the stream is interrupted, print a "resume" token that can be used to resume the send
      • Example mbuffer output on completion:
       summary: 6178 GiByte in 11h 11min 29.2sec - average of  157 MiB/s
    • Rename the old pool
       # rename the old pool to mypool-8
       sudo zpool export mypool
       sudo zpool import -d /dev/disk/by-id/ mypool mypool-8
    • Rename the new pool into place
       # rename the new pool from mypool-20 to mypool
       sudo zpool export mypool-20
       sudo zpool import -d /dev/disk/by-id/ mypool-20 mypool
    • Verify the pool sizes look correct
      zpool list
  3. Shut down to physically disconnect the old mypool-8 drives
  4. Re-enable all known access to the zpool (restore the mountpoint, restart services); see the sketch below.
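
A minimal sketch of step 4, assuming the pool was unmounted in step 1 (service names are placeholders):

    # restore the mountpoint hidden in step 1, and remount
    sudo zfs set mountpoint=/mypool mypool
    sudo zfs mount -a
    # restart whatever was stopped before the transfer (hypothetical services)
    sudo systemctl start smbd nfs-server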

Next Steps

Homepage