ZFS scrub

Material from Eugene Paniot Wiki

ZFS scrub operates on some fairly brain-dead principles. Most notably, it only spends time scrubbing when there's nothing else going on. If you poke a pool with even a little data access on a fairly constant basis, scrub will effectively starve itself and do nearly nothing.
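The throttle behind that starvation can be sketched roughly like this - a simplified shell model, not the actual kernel code, with the default tick values being my assumption of the illumos defaults:

```shell
# Simplified model of the scrub throttle (an assumption, not kernel source):
# if user I/O happened within the last zfs_scan_idle clock ticks, each scrub
# I/O is pushed back by zfs_scrub_delay ticks; otherwise it issues at once.
zfs_scan_idle=50     # assumed default, in clock ticks
zfs_scrub_delay=4    # assumed default, in clock ticks

scrub_delay_ticks() {
  now=$1
  last_user_io=$2
  if [ $(( now - last_user_io )) -lt "$zfs_scan_idle" ]; then
    echo "$zfs_scrub_delay"   # pool looks busy: delay the scrub I/O
  else
    echo 0                    # pool looks idle: scrub at full speed
  fi
}

scrub_delay_ticks 100 80    # user I/O 20 ticks ago -> 4
scrub_delay_ticks 200 100   # idle for 100 ticks   -> 0
```

With constant light user I/O, `now - last_user_io` almost never exceeds zfs_scan_idle, so every scrub I/O eats the delay - which is exactly the starvation described above.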

Tunables to explore, with my quick notes on what each does (I last looked into this a while ago, though):

  • zfs_scan_idle - if user I/O occurs within this many clock ticks, delay scrub I/O by zfs_scrub_delay clock ticks
  • zfs_scrub_delay - how many clock ticks to delay scrub operation if triggered by zfs_scan_idle
  • zfs_top_maxinflight - maximum number of scrub I/O per top-level vdev
  • zfs_scrub_limit - maximum number of scrub I/O per leaf vdev
  • zfs_scan_min_time_ms - minimum ms to spend per txg on scrub operations
  • zfs_no_scrub_io - no notes
  • zfs_no_scrub_prefetch - no notes, name seems to imply not causing prefetch on scrub ops
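If your build's mdb provides the ::zfs_params dcmd (it does on the illumos kernels I've seen, but check yours), you can dump all of these at once rather than reading them one by one:

```shell
# Print the current values of the ZFS tunables in one shot (run as root):
echo "::zfs_params" | mdb -k
```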

All of these are changeable on the fly with mdb: pipe "echo [tunable]/W0t[number]" into mdb -kw to change a value, and "echo [tunable]/D" into mdb -k to view the current setting (which I recommend doing before changing anything).
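Concretely, for zfs_scan_idle as an example (run as root; the 0t prefix marks the number as decimal):

```shell
# View the current value first - /D prints it in decimal:
echo "zfs_scan_idle/D" | mdb -k

# Then write a new decimal value - /W0t10 writes the value 10:
echo "zfs_scan_idle/W0t10" | mdb -kw
```

Note that changes made this way do not survive a reboot; persistent settings go in /etc/system.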

So in theory, and in general practice: lower zfs_scan_idle to 10 (or 1 - or 0, if the code supports that; you'd need to check) and zfs_scrub_delay to 1 (or 0, if it supports that). If your zfs_txg_synctime_ms setting is 5000 or more, maybe raise zfs_scan_min_time_ms a bit as well. Scrub should then become a lot more aggressive about actually doing work even with some level of user I/O occurring.
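Put together as mdb one-liners, that recipe (using the example values from the paragraph above; run as root) would look like:

```shell
# Shrink the idle window and the per-I/O delay so user I/O no longer
# pushes scrub aside so easily:
echo "zfs_scan_idle/W0t10" | mdb -kw
echo "zfs_scrub_delay/W0t1" | mdb -kw

# Optionally - only if zfs_txg_synctime_ms is 5000 or more - give scrub
# a bigger slice of each txg:
echo "zfs_scan_min_time_ms/W0t3000" | mdb -kw
```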

In your specific case, the %b and asvc_t reported imply a very, very random read workload (spinning disks should do better than that if it were truly sequential), and you've already done the "easy" stuff explained above. So first I'd turn on zfs_no_scrub_prefetch, to disable prefetch on scrub operations, just to see if that helps. If no joy: depending on the version of Nexenta you're on, you may be running 30/5, 5/1 or 10/5 (that's shorthand we use for zfs_txg_timeout and zfs_txg_synctime_ms/1000). Change zfs_txg_timeout to 10 and zfs_txg_synctime_ms to 5000, then try upping zfs_scan_min_time_ms to 3000 or 4000. This tells ZFS it can spend a lot longer per txg on scrubs than the defaults on older NexentaStor installs that ship with 5/1 - but careful, this may starve normal I/O if the delay settings have also been set to basically 0!
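As mdb one-liners, that sequence (values straight from the text above; run as root) would be:

```shell
# Step 1: disable prefetch on scrub I/O and see if that alone helps:
echo "zfs_no_scrub_prefetch/W0t1" | mdb -kw

# Step 2: if no joy, move to 10/5 txg settings...
echo "zfs_txg_timeout/W0t10" | mdb -kw
echo "zfs_txg_synctime_ms/W0t5000" | mdb -kw

# ...and let scrub spend more of each txg doing actual work:
echo "zfs_scan_min_time_ms/W0t3000" | mdb -kw
```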

Available symbols

To see which ZFS-related symbols (and thus which tunables) actually exist in your running kernel, dump them from /dev/ksyms:

nm /dev/ksyms | nawk 'BEGIN{FS="|"}$NF~/zfs/{print $3, $NF}' > zfs.symbols