
zfs interview questions

Top zfs frequently asked interview questions

ZFS on Linux, getting slow writes? [closed]

I decided to give ZFS on Linux (the native Ubuntu port, pool version 28) a go on my 5x 2TB WD Green Advanced Format drives. I created the pool using "zpool create -o ashift=12 raidz1 ...".

zpool status doesn't show anything unusual.

I did a dd if=/dev/zero into the mounted pool, and I can never get past 20 MB/s of writes. I tried rsyncing a couple of hundred gigabytes of files, but even then 'zpool iostat' shows a maximum of 20 MB/s of writes. CPU usage isn't very high, and my 8GB of RAM is 90% utilised, which I believe is normal.
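
A more representative test than dd with its default 512-byte blocks might look like this (a sketch; /myData is the mountpoint shown below, while the file name, block size and count are arbitrary, and with compression off the zeros really do hit the disks):

dd if=/dev/zero of=/myData/ddtest bs=1M count=8192 conv=fdatasync
# conv=fdatasync flushes the data out of the ARC before dd reports a rate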

Read performance seems fine to me.

I did play around with zfs_vdev_max_pending/zfs_vdev_min_pending. Since I have AHCI enabled, I tried setting these values to 1, but that dropped my writes to 10 MB/s. Bringing them back up to a min/max of 4/8 restored the 20 MB/s write speed.
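
For reference, these tunables can be changed at runtime like this (a sketch; it assumes this ZFS on Linux build still exposes the old zfs_vdev_*_pending parameters under /sys/module/zfs/parameters, and the values are only examples):

echo 4 > /sys/module/zfs/parameters/zfs_vdev_min_pending
echo 8 > /sys/module/zfs/parameters/zfs_vdev_max_pending
# takes effect immediately, but does not survive a reboot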

I'm doing a scrub now, and that's going at 170 MB/s.

I'm thinking there must be something I missed. Or is this normal?

Attached are my settings. Ignore the sparse file; it is a placeholder I plan to replace with a real disk later.

zdb:

myData:
version: 28
name: 'myData'
state: 0
txg: 12
pool_guid: 14947267682211456191
hostname: 'microserver'
vdev_children: 1
vdev_tree:
    type: 'root'
    id: 0
    guid: 14947267682211456191
    create_txg: 4
    children[0]:
        type: 'raidz'
        id: 0
        guid: 361537219350560701
        nparity: 1
        metaslab_array: 31
        metaslab_shift: 36
        ashift: 12
        asize: 10001923440640
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'file'
            id: 0
            guid: 18296057043113196254
            path: '/tmp/sparse2'
            DTL: 35
            create_txg: 4
            offline: 1
        children[1]:
            type: 'disk'
            id: 1
            guid: 13192250717230911873
            path: '/dev/disk/by-id/wwn-0x50014ee2062a07cd-part1'
            whole_disk: 1
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 7673445363652446830
            path: '/dev/disk/by-id/wwn-0x50014ee25bd8fbcc-part1'
            whole_disk: 1
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 1997560602751946723
            path: '/dev/disk/by-id/wwn-0x50014ee25b1edbc8-part1'
            whole_disk: 1
            create_txg: 4
        children[4]:
            type: 'disk'
            id: 4
            guid: 16890030280879643154
            path: '/dev/disk/by-id/wwn-0x50014ee25b7f2562-part1'
            whole_disk: 1
            create_txg: 4

zfs get all myData:

NAME    PROPERTY              VALUE                  SOURCE
myData  type                  filesystem             -
myData  creation              Tue Apr 24  8:15 2012  -
myData  used                  2.05T                  -
myData  available             5.07T                  -
myData  referenced            2.05T                  -
myData  compressratio         1.00x                  -
myData  mounted               yes                    -
myData  quota                 none                   default
myData  reservation           none                   default
myData  recordsize            128K                   default
myData  mountpoint            /myData                default
myData  sharenfs              off                    default
myData  checksum              on                     default
myData  compression           off                    default
myData  atime                 on                     default
myData  devices               on                     default
myData  exec                  on                     default
myData  setuid                on                     default
myData  readonly              off                    default
myData  zoned                 off                    default
myData  snapdir               hidden                 default
myData  aclinherit            restricted             default
myData  canmount              on                     default
myData  xattr                 on                     default
myData  copies                1                      default
myData  version               5                      -
myData  utf8only              off                    -
myData  normalization         none                   -
myData  casesensitivity       sensitive              -
myData  vscan                 off                    default
myData  nbmand                off                    default
myData  sharesmb              off                    default
myData  refquota              none                   default
myData  refreservation        none                   default
myData  primarycache          all                    default
myData  secondarycache        all                    default
myData  usedbysnapshots       0                      -
myData  usedbydataset         2.05T                  -
myData  usedbychildren        9.68M                  -
myData  usedbyrefreservation  0                      -
myData  logbias               latency                default
myData  dedup                 off                    default
myData  mlslabel              none                   default
myData  sync                  standard               default
myData  refcompressratio      1.00x                  -

zpool get all myData:

NAME    PROPERTY       VALUE       SOURCE
myData  size           9.06T       -
myData  capacity       28%         -
myData  altroot        -           default
myData  health         DEGRADED    -
myData  guid           14947267682211456191  default
myData  version        28          default
myData  bootfs         -           default
myData  delegation     on          default
myData  autoreplace    off         default
myData  cachefile      -           default
myData  failmode       wait        default
myData  listsnapshots  off         default
myData  autoexpand     off         default
myData  dedupditto     0           default
myData  dedupratio     1.00x       -
myData  free           6.49T       -
myData  allocated      2.57T       -
myData  readonly       off         -
myData  ashift         12          local

Source: (StackOverflow)

ZFS vdev naming?

I have no idea what the rationale is behind the naming of the vdevs (virtual devices) used when creating ZFS pools in Solaris. Suppose I have a disk c4d0: what do c4d0p0 and c4d0s0 mean, and how would I know which one to use with ZFS commands? I am terribly confused, since I keep getting "invalid vdev specified". Any pointers?
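
For context, a minimal sketch of the two invocations that usually avoid that error (the pool name tank is illustrative; on x86 Solaris, c4d0p0 is the whole disk at the fdisk level, c4d0s0 is a slice inside the Solaris partition, and plain c4d0 lets ZFS label the whole disk itself):

zpool create tank c4d0      # whole disk: ZFS writes an EFI label and manages it
zpool create tank c4d0s0    # or a specific slice of an existing label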


Source: (StackOverflow)


How to make Cassandra use two disks on ZFS in SmartOS?

I heard that there's a big improvement when Cassandra can write its commit log to one disk and its SSTables to another. I have two disks, and if I were running Linux I would mount each at a different path and configure Cassandra to write to them.

What I would like to know is how to do that in ZFS and SmartOS.

I'm a complete newbie with SmartOS, and from what I understand I just add the disks to the storage pool; are they then managed as one?
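
One possible layout, as a rough sketch rather than SmartOS best practice (the disk name c1t1d0, the pool name commitlog and the mountpoint are illustrative, and inside a zone you would normally work with a dataset delegated to that zone):

zpool create commitlog c1t1d0
zfs create -o mountpoint=/var/cassandra/commitlog commitlog/cassandra
# then set commitlog_directory: /var/cassandra/commitlog in cassandra.yaml,
# leaving data_file_directories on the main pool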


Source: (StackOverflow)

Is there an API to access the ZFS filesystem?

I would like to know if there is a way to access the ZFS API (preferably from Python, but C is fine too). My goal is to write some tools to monitor my pools, but I would definitely prefer not to have to parse the output of the zpool command.
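
As a stopgap while looking for a proper binding, zpool and zfs can at least emit script-friendly, tab-separated output that is much easier to parse than the default listing (a sketch; the property lists are just examples):

zpool list -H -o name,size,allocated,free,capacity,health
zfs list -H -o name,used,avail,mountpoint -t filesystem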


Source: (StackOverflow)

Is there some distributed storage like Hadoop but with the advantages of ZFS?

Is there some distributed storage like Hadoop but with the advantages of ZFS?


Source: (StackOverflow)

Does anyone have experience with ZFS? [closed]

I am considering adopting ZFS, and I would be happy to hear about your experience with it in both production and testing environments.


Source: (StackOverflow)

What does 128-bit file system mean?

In an introduction to the ZFS file system, I saw this statement:

ZFS file system is quite scalable, 128 bit filesystem

What does a 128-bit filesystem mean? What makes it scalable?


Source: (StackOverflow)

Can ZFS mirror an entire hard drive in OpenSolaris?

I am upgrading an OpenSolaris development workstation and recently purchased two 500GB SATA hard drives, expecting to use the motherboard's features to mirror the drives. OpenSolaris doesn't recognize the drives when they are configured as a mirror through the BIOS, but it sees them just fine otherwise. Can ZFS mirror an entire drive, and will the mirror be bootable if the primary fails?
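
For reference, root-pool mirroring under ZFS usually looks roughly like this (a sketch; it assumes an x86 install with the root pool named rpool on c1t0d0s0 and the new disk at c1t1d0, so all device names are illustrative):

prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2          # copy the label to the new disk
zpool attach rpool c1t0d0s0 c1t1d0s0                                  # attach it as a mirror and resilver
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0    # make the second disk bootable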


Source: (StackOverflow)

How to read a block in a storage pool (zpool) using dd?

I want to read a block in a ZFS storage pool (zpool) using the dd command. Since zpool doesn't create a device file the way other volume managers such as VxVM do, I don't know which block device to read from. Is there any way to read data block by block from a zpool?
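
Two rough options, as a sketch only (the device path, pool name, vdev index, offset and size below are all placeholders):

# read a 128K block at byte offset 4M straight from one of the pool's member devices
dd if=/dev/dsk/c4d0s0 of=/tmp/block.bin bs=128k skip=32 count=1
# or let zdb fetch a block by vdev:offset:size (values are typically given in hex)
zdb -R mypool 0:400000:20000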


Source: (StackOverflow)

Lost memory on Linux - not cached, not buffers

My Ubuntu 12 server is mysteriously losing/wasting memory. It has 64GB of RAM. About 46GB are shown as used even when I shut down all my applications. This memory is not reported as used for buffers or caching.

The result of top (while my apps are running; the apps use about 9G):

top - 21:22:48 up 46 days, 10:12,  1 user,  load average: 0.01, 0.09, 0.12
Tasks: 635 total,   1 running, 633 sleeping,   1 stopped,   0 zombie
Cpu(s):  0.2%us,  0.2%sy,  0.0%ni, 99.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  65960100k total, 55038076k used, 10922024k free,   271700k buffers
Swap:        0k total,        0k used,        0k free,  4860768k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                         
  5303 1002      20   0 26.2g 1.2g  12m S    0  1.8   2:08.21 java                                                                                                                                             
  5263 1003      20   0  9.8g 995m 4544 S    0  1.5   0:19.82 mysqld                                                                                                                                           
  7021 www-data  20   0 3780m  18m 2460 S    0  0.0   8:37.50 apache2                                                                                                                                          
  7022 www-data  20   0 3780m  18m 2540 S    0  0.0   8:38.28 apache2      
  .... (smaller processes)

Note that top reports 4.8G for cached, not 48G, and it's 55G that are used. The result of free -m:

             total       used       free     shared    buffers     cached
Mem:         64414      53747      10666          0        265       4746
-/+ buffers/cache:      48735      15678
Swap:            0          0          0

What is using my memory? I've tried every diagnostic that I could come across. Forums are swamped with people asking the same question because Linux is using their ram for buffers/cache. This doesn't seem to be what is going on here.

It might be relevant that the system is a host for LXC containers. The top and free results reported above are from the host, but similar memory usage is reported within the containers. Stopping all containers does not free the memory; some 46G remain in use. However, if I restart the host the memory is freed, and it takes a while before usage climbs back to 46G. (I don't know whether that takes days or weeks, but it is more than a few hours.)

It might also be relevant that the system uses ZFS. ZFS has a reputation for being memory-hungry, but not this hungry. The system has two ZFS filesystems on two raidz pools, one of 1.5T and one of 200G. I have another server that exhibits exactly the same problem (46G used by nothing) and is configured pretty much identically, but with a 3T array instead of 1.5T. I have lots of snapshots (100 or so) of each ZFS filesystem, and I normally have one snapshot of each filesystem mounted at any time. Unmounting them does not give me back my memory.

I can see that the VIRT numbers in the output above roughly coincide with the memory used, but the memory remains used even after I shut down these apps, and even after I shut down the container that runs them.

EDIT: I tried adding some swap, and something interesting happened. I added 30G of swap, and moments later the amount of memory marked as cached in top had increased from 5G to 25G, while free -m showed about 20G more usable memory. I added another 10G of swap and the cached figure rose to 33G; if I add yet another 10G of swap, I get 6G more recognized as cached. All this time, only a few kilobytes of swap are reported as used. It's as if the kernel needed matching swap for every bit it recognizes or reports as cached. Here is the output of top with 40G of swap:

top - 23:06:45 up 46 days, 11:56,  2 users,  load average: 0.01, 0.12, 0.13
Tasks: 586 total,   1 running, 585 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  65960100k total, 64356228k used,  1603872k free,   197800k buffers
Swap: 39062488k total,     3128k used, 39059360k free, 33101572k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                         
 6440 1002      20   0 26.3g 1.5g  11m S    0  2.4   2:02.87 java                                                                                                                                             
 6538 1003      20   0  9.8g 994m 4564 S    0  1.5   0:17.70 mysqld                                                                                                                                           
 4707 dbourget  20   0 27472 8728 1692 S    0  0.0   0:00.38 bash      

Any suggestions highly appreciated.

EDIT 2: Here are the arc* values from /proc/spl/kstat/zfs/arcstats

arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    1531800648
arc_meta_limit                  4    8654946304
arc_meta_max                    4    8661962768

There is no L2ARC configured for ZFS.
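
For what it's worth, the total ARC footprint (which free and /proc/meminfo do not count as cache) can be checked and capped like this; this is a sketch for ZFS on Linux, and the 16 GiB limit is only an example:

grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats        # current ARC size and target maximum, in bytes
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max   # cap the ARC at 16 GiB until reboot
# to persist: options zfs zfs_arc_max=17179869184 in /etc/modprobe.d/zfs.conf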


Source: (StackOverflow)

How do BTRFS and ZFS snapshots work?

More specifically, how do they manage to look at an entire subvolume and remember everything about it (the files, their sizes, the folder structure) while fitting that into such a small amount of data?
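
The copy-on-write behaviour is easy to see in practice (a sketch using ZFS; the dataset name tank/data is illustrative): a fresh snapshot references all of the data yet uses almost no space of its own, and only grows as the live filesystem diverges from it.

zfs snapshot tank/data@before
zfs list -t snapshot -o name,used,referenced tank/data@before
# USED stays near zero until blocks in tank/data are overwritten or deleted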


Source: (StackOverflow)

What is the status of ZFS on Mac OS X?

In 10.5, Apple released a read-only kernel extension/filesystem to allow ZFS pools to be mounted. A later open-source project was hosted at http://zfs.macosforge.org and was announced to be part of 10.6, but shortly before the public release ZFS was yanked, and recently, the Apple-hosted MacOSForge site has closed down the ZFS project.

So, what's the status of ZFS on Mac OS X? Is it worth using? Why would anyone want to use ZFS anyway?


Source: (StackOverflow)

Symfony filesystem ACLs on ZFS

I'm writing Symfony2-based sites on an Ubuntu 12.04 server, with the code hosted on a ZFS filesystem partition/zpool. However, the instructions on the Symfony2 installation page for setting ACLs on the app/logs and app/cache directories do not apply, because ZFS does not support the chmod +a or setfacl commands.

Is there a ZFS-compatible version of the commands below?

sudo setfacl -Rn -m u:"$APACHEUSER":rwX -m u:`whoami`:rwX app/cache app/logs
sudo setfacl -dRn -m u:"$APACHEUSER":rwX -m u:`whoami`:rwX app/cache app/logs
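
One hedged possibility, assuming a ZFS on Linux release recent enough to have the acltype and xattr properties (tank/web is an illustrative dataset name): enabling POSIX ACLs on the dataset may let the stock setfacl commands above work unchanged.

sudo zfs set acltype=posixacl tank/web
sudo zfs set xattr=sa tank/web    # store the ACLs as system attributes
# then re-run the two setfacl commands above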

Source: (StackOverflow)

zpool replace leaves pool degraded

I've been having issues with my FreeNAS 9.10 ZFS pool.

One of the drives was being flagged as having many bad sectors, so I decided to replace it. I had tremendous problems getting 'zpool replace' to recognise the new drive (it was at ada3, but the command wouldn't accept that as a parameter). The only thing I could figure out was to add the new drive as a 'spare' to the zpool and then use zpool replace [poolname] [old device id] [spare device id].

This worked and resilvered onto the new drive. However, once the resilver completed, the pool remained in a degraded state and seems to want the old drive back.

How do I convince it to 'forget' the old drive and accept the new one permanently?

Many thanks.

  pool: ZFS_NAS
state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
  see: http://illumos.org/msg/ZFS-8000-2Q
scan: resilvered 989G in 4h42m with 0 errors on Mon May  2 19:45:33 2016
config:

NAME                                              STATE     READ WRITE CKSUM
ZFS_NAS                                           DEGRADED     0     0     0
  raidz1-0                                        DEGRADED     0     0     0
    spare-0                                       DEGRADED     0     0     0
      12082773611957310038                        UNAVAIL      0     0     0  was /dev/gptid/1418d56c-431b-11e4-b9f7-28924a2f106f
      gptid/503d6d1c-106e-11e6-a169-28924a2f106f  ONLINE       0     0     0
    gptid/1608e28a-431b-11e4-b9f7-28924a2f106f    ONLINE       0     0     0
    gptid/1699dab6-431b-11e4-b9f7-28924a2f106f    ONLINE       0     0     0
spares
  16673430511205791764                            INUSE     was /dev/gptid/503d6d1c-106e-11e6-a169-28924a2f106f

errors: No known data errors
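
The usual way out of this state, as a sketch using the GUIDs from the output above, is to detach the old UNAVAIL member so the spare is promoted to a permanent member of raidz1-0:

zpool detach ZFS_NAS 12082773611957310038
zpool status ZFS_NAS    # spare-0 should collapse and the pool should return to ONLINE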

Source: (StackOverflow)

Btrfs Snapshot WITH Backup

Is there a way to back up a btrfs file system by copying the entire disk over for the first backup, but then copying over snapshot files instead of using rsync (or is this a bad idea)?
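
This is roughly what incremental btrfs send/receive provides (a sketch; it assumes /data is the btrfs filesystem, /data/.snap already exists, and /mnt/backup is another btrfs filesystem, so all of those paths are illustrative):

btrfs subvolume snapshot -r /data /data/.snap/base
btrfs send /data/.snap/base | btrfs receive /mnt/backup                          # initial full copy
btrfs subvolume snapshot -r /data /data/.snap/today
btrfs send -p /data/.snap/base /data/.snap/today | btrfs receive /mnt/backup     # only the changes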


Source: (StackOverflow)