Thursday, 27 September 2012

ZFS on Linux - replacing a failed drive in a RAIDZ pool

My server was built using four 250 GB hard drives that were passed on to me by a friend who no longer needed them. One of them failed. No problem, I thought: I have a spare, and ZFS will take care of the resilvering.

I powered down the server, swapped out the drive and rebooted. But
# zpool replace tank <failed drive id> <replacement drive id>
gave a "Device not in pool" error.
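
The drive ids in commands like this are the symlink names under /dev/disk/by-id. As a rough sketch (the function name list_by_id is my own invention, not part of ZFS), something like this prints each id next to the device node it points at, which helps when working out which id belongs to the new drive:

```shell
# Hypothetical helper: list every drive id and the device node behind it.
# The directory argument defaults to /dev/disk/by-id.
list_by_id() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        # Print "<id> -> <resolved device node>".
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}
```

Running list_by_id with no arguments reads the real /dev/disk/by-id directory.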

Googling suggested exporting and then reimporting the pool, but all
# zpool export -r tank
got me was "Pool busy", even though there were no processes accessing it according to fuser and lsof, and zpool iostat showed 0 reads or writes.

Eventually I hit upon this issue in the ZFS on Linux bug tracker: https://github.com/zfsonlinux/zfs/issues/976

Finally, a solution to my original problem: I needed to give the full paths to the devices, not just their ids.
# zpool replace tank /dev/disk/by-id/<failed drive id> /dev/disk/by-id/<replacement drive id>
did the trick:

# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
 continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Fri Sep 28 00:02:04 2012
    12.3G scanned out of 205G at 24.5M/s, 2h14m to go
    3.07G resilvered, 5.98% done
config:

 NAME                                           STATE     READ WRITE CKSUM
 tank                                           DEGRADED     0     0     0
   raidz1-0                                     DEGRADED     0     0     0
     ata-VB0250EAVER_Z2ATRS75                   ONLINE       0     0     0
     ata-WDC_WD2500AAJS-22RYA0_WD-WCAR00411237  ONLINE       0     0     0
     ata-WDC_WD2500JS-75NCB3_WD-WCANK8544801    ONLINE       0     0     0
     replacing-3                                UNAVAIL      0     0     0
       ata-WDC_WD2500JS-75NCB3_WD-WCANKC570943  UNAVAIL      0     0     0
       ata-SAMSUNG_SP2504C_S09QJ1SP156094       ONLINE       0     0     0  (resilvering)

errors: No known data errors
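
The status output tells you to wait for the resilver to complete. A minimal sketch of doing that from a script, assuming the "resilver in progress" text shown above is what zpool status prints on your version (the function names and the 60-second polling interval are my own choices):

```shell
# Reads `zpool status` text on stdin; succeeds while a resilver is running.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Poll the pool (default: tank) once a minute until the resilver is done.
wait_for_resilver() {
    pool="${1:-tank}"
    while zpool status "$pool" | resilver_in_progress; do
        sleep 60
    done
}
```

Calling wait_for_resilver tank returns once the status output no longer reports a resilver in progress.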
 
Success! I still don't know what was causing the "Pool busy" error when trying to export.

I expect the same applies in ZFS on Linux to every zpool operation that refers to individual vdevs or disks, such as zpool add and zpool remove: give the full path.
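
To avoid retyping the prefix, here is a small sketch of a helper that expands a bare id into its full path and refuses to continue if the path is missing. The name by_id_path and the BY_ID_DIR variable are hypothetical, not anything ZFS provides:

```shell
# Hypothetical helper: turn a bare drive id into a full device path.
# BY_ID_DIR can be overridden, e.g. to try the function without real disks.
BY_ID_DIR="${BY_ID_DIR:-/dev/disk/by-id}"

by_id_path() {
    # Accept either a bare id or an already-qualified path.
    case "$1" in
        /*) path="$1" ;;
        *)  path="$BY_ID_DIR/$1" ;;
    esac
    # Refuse to print a path that does not exist.
    if [ ! -e "$path" ]; then
        echo "no such device: $path" >&2
        return 1
    fi
    printf '%s\n' "$path"
}
```

This only suits drives that are still attached, so in a replace it fits the new drive: zpool replace tank /dev/disk/by-id/<failed drive id> "$(by_id_path <replacement drive id>)". The failed drive's path no longer exists once the drive has been pulled, so it has to be typed out in full.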

2 comments:

  1. ZFS doesn't care. The replacement drive will get resilvered anyway.

  2. ...you may however need to create a GPT label on the disk:

    # parted /dev/disk/path-to-disk
    mklabel GPT
    q
