Thursday, 27 September 2012

ZFS on Linux - replacing a failed drive in a RAIDZ pool

My server was built with four 250 GB hard drives, passed on to me by a friend who no longer needed them. One of them failed. No problem, I thought: I have a spare, and ZFS will take care of the resilvering.

Power down the server, swap out the drive, reboot.
# zpool replace tank <failed drive id> <replacement drive id>
gave a "Device not in pool" error.

Googling suggested exporting then reimporting the pool, but all
# zpool export -r tank
got me was "Pool busy", even though there were no processes accessing it according to fuser and lsof, and zpool iostat showed 0 reads or writes.

Eventually I hit upon this issue in the ZFS on Linux bug tracker: https://github.com/zfsonlinux/zfs/issues/976

Finally, a solution to my original problem: zpool replace needed the full paths to the devices, not just their names.
# zpool replace tank /dev/disk/by-id/<failed drive id> /dev/disk/by-id/<replacement drive id> 
did the trick:

# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
 continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Fri Sep 28 00:02:04 2012
    12.3G scanned out of 205G at 24.5M/s, 2h14m to go
    3.07G resilvered, 5.98% done
config:

 NAME                                           STATE     READ WRITE CKSUM
 tank                                           DEGRADED     0     0     0
   raidz1-0                                     DEGRADED     0     0     0
     ata-VB0250EAVER_Z2ATRS75                   ONLINE       0     0     0
     ata-WDC_WD2500AAJS-22RYA0_WD-WCAR00411237  ONLINE       0     0     0
     ata-WDC_WD2500JS-75NCB3_WD-WCANK8544801    ONLINE       0     0     0
     replacing-3                                UNAVAIL      0     0     0
       ata-WDC_WD2500JS-75NCB3_WD-WCANKC570943  UNAVAIL      0     0     0
       ata-SAMSUNG_SP2504C_S09QJ1SP156094       ONLINE       0     0     0  (resilvering)

errors: No known data errors
 
Success! Still don't know what was causing the "pool busy" error when trying to export.
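
To keep an eye on the resilver without reading the whole status screen, a small filter can pull out the percentage. This is only a sketch: resilver_percent is a made-up helper name, and it assumes the "resilvered, N% done" wording shown in the output above, which may differ in other ZFS versions.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the
# resilver completion percentage, if a "... resilvered, N% done" line exists.
# Prints nothing when no such line is found.
resilver_percent() {
    sed -n 's/.*resilvered, \([0-9.]*\)% done.*/\1/p'
}

# Usage (as root):
#   zpool status tank | resilver_percent
```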

I expect that giving full device paths will also work in ZFS on Linux for any other zpool operation that refers to individual vdevs or disks, such as zpool add and zpool remove.
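
To avoid retyping the prefix, the short name can be expanded to a full path with a small shell function. This is a rough sketch; by_id_path is a hypothetical name of my own, and it simply assumes the device lives under /dev/disk/by-id/.

```shell
# Hypothetical helper: print NAME unchanged if it is already an absolute
# path, otherwise prefix it with /dev/disk/by-id/.
by_id_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '/dev/disk/by-id/%s\n' "$1" ;;
    esac
}

# Usage (as root), with the same placeholders as above:
#   zpool replace tank "$(by_id_path <failed drive id>)" "$(by_id_path <replacement drive id>)"
```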

Tuesday, 25 September 2012

Automatically sharing ZFS filesystems via NFS on boot in Fedora Linux using systemd

My media server project had a minor stumble when I found that after rebooting my server the ZFS shares were not showing on NFS clients, even though nfs-server.service was active.

I realised that the ZFS shares were not being exported to NFS. I had assumed that ZFS on Linux did this automatically on boot.

I wrote a simple systemd unit file to get this working on startup. This is what the file looks like:

/etc/systemd/system/zfs-share.service

[Unit]
Description=Start ZFS share-nfs sharing
# After= and Requires= belong in [Unit]; unit names are separated by spaces, not commas
After=nfs-server.service zfs.service
Requires=nfs-server.service zfs.service

[Service]
Type=oneshot
ExecStartPre=/bin/sleep 30
ExecStart=/usr/sbin/zfs share -a
ExecStop=/usr/sbin/zfs unshare -a
RemainAfterExit=yes

[Install]
WantedBy=default.target
 

This oneshot service simply runs the command
zfs share -a
after the NFS server and its dependencies have started.
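
For the unit to run at boot it has to be enabled first. Something along these lines should do it; enable_zfs_share is just a hypothetical wrapper name, and the showmount check at the end assumes the NFS server is running on the same machine.

```shell
# Hypothetical helper: reload systemd so it sees the new unit file, enable
# the unit for future boots, start it now, then list the current NFS exports.
# Must be run as root.
enable_zfs_share() {
    systemctl daemon-reload &&
    systemctl enable zfs-share.service &&
    systemctl start zfs-share.service &&
    showmount -e localhost
}

# Usage (as root):
#   enable_zfs_share
```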

systemctl stop zfs-share.service
should unshare the ZFS exports. In another post I'll jot down how I got native ZFS working under Fedora 17, but it really was just a case of following the simple instructions at http://zfsonlinux.org/

Edited 29/9/2012: Added a 30-second sleep as an ExecStartPre line to give ZFS time to mount its filesystems properly, as I was getting errors due to a race condition.
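
A fixed sleep is crude. An alternative would be to poll until the filesystems are actually mounted. The following is an untested sketch of that idea, not what I am running: wait_for is a hypothetical helper, and using mountpoint -q on /tank as the readiness check is an assumption about where the pool is mounted.

```shell
# Hypothetical helper: wait_for TRIES COMMAND...
# Runs COMMAND up to TRIES times, one second apart, and returns 0 as soon
# as it succeeds. Returns 1 if every attempt fails.
wait_for() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage (as root), assuming the pool is mounted at /tank:
#   wait_for 30 mountpoint -q /tank && zfs share -a
```

Wrapped in a script, this could stand in for the sleep in the ExecStartPre line, so the unit waits only as long as it needs to.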