ZFS interview questions
Top frequently asked ZFS interview questions
Decided to give ZFS on Linux v28 a go (the native port, on Ubuntu) with my 5x 2TB WD Green advanced-format drives.
Created the pool using "zpool create -o ashift=12 raidz1 "
zpool status doesn't show anything unusual.
I did a dd if=/dev/zero into the mounted pool, and I can never get past 20M/s write. I tried rsyncing a couple of hundred gigs of files, but even then 'zpool iostat' shows a maximum of 20M write. CPU usage isn't very high, and my 8GB of RAM is 90% utilised, which I believe is normal.
Read performance seems fine to me.
I did play around with zfs_vdev_max/min_pending. Since I have AHCI enabled, I tried setting these values to 1, but that reduced my writes to 10M. Bringing them up to a min/max of 4/8 brought writes back to 20M.
I'm doing a scrub now, and that's going at 170M/s.
I'm thinking there must be something I missed, or is this normal?
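A write test that forces data to disk before reporting a speed, plus the module-parameter tweaks mentioned above, might look roughly like the following; the file size and queue-depth values are only illustrative, and the zfs_vdev_*_pending tunables only exist on older ZFS on Linux releases.
# Sequential write test that flushes to disk before reporting throughput
dd if=/dev/zero of=/myData/ddtest bs=1M count=8192 conv=fdatasync
# Watch per-device throughput while the test runs
zpool iostat -v myData 5
# Temporarily raise the per-vdev queue depths (old ZFS on Linux tunables)
echo 4 > /sys/module/zfs/parameters/zfs_vdev_min_pending
echo 16 > /sys/module/zfs/parameters/zfs_vdev_max_pending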
Attached are my settings. Ignore the sparse file; it's a placeholder I'll replace with a real disk later.
zdb:
myData:
    version: 28
    name: 'myData'
    state: 0
    txg: 12
    pool_guid: 14947267682211456191
    hostname: 'microserver'
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 14947267682211456191
        create_txg: 4
        children[0]:
            type: 'raidz'
            id: 0
            guid: 361537219350560701
            nparity: 1
            metaslab_array: 31
            metaslab_shift: 36
            ashift: 12
            asize: 10001923440640
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'file'
                id: 0
                guid: 18296057043113196254
                path: '/tmp/sparse2'
                DTL: 35
                create_txg: 4
                offline: 1
            children[1]:
                type: 'disk'
                id: 1
                guid: 13192250717230911873
                path: '/dev/disk/by-id/wwn-0x50014ee2062a07cd-part1'
                whole_disk: 1
                create_txg: 4
            children[2]:
                type: 'disk'
                id: 2
                guid: 7673445363652446830
                path: '/dev/disk/by-id/wwn-0x50014ee25bd8fbcc-part1'
                whole_disk: 1
                create_txg: 4
            children[3]:
                type: 'disk'
                id: 3
                guid: 1997560602751946723
                path: '/dev/disk/by-id/wwn-0x50014ee25b1edbc8-part1'
                whole_disk: 1
                create_txg: 4
            children[4]:
                type: 'disk'
                id: 4
                guid: 16890030280879643154
                path: '/dev/disk/by-id/wwn-0x50014ee25b7f2562-part1'
                whole_disk: 1
                create_txg: 4
zfs get all myData:
NAME    PROPERTY              VALUE                  SOURCE
myData  type                  filesystem             -
myData  creation              Tue Apr 24 8:15 2012   -
myData  used                  2.05T                  -
myData  available             5.07T                  -
myData  referenced            2.05T                  -
myData  compressratio         1.00x                  -
myData  mounted               yes                    -
myData  quota                 none                   default
myData  reservation           none                   default
myData  recordsize            128K                   default
myData  mountpoint            /myData                default
myData  sharenfs              off                    default
myData  checksum              on                     default
myData  compression           off                    default
myData  atime                 on                     default
myData  devices               on                     default
myData  exec                  on                     default
myData  setuid                on                     default
myData  readonly              off                    default
myData  zoned                 off                    default
myData  snapdir               hidden                 default
myData  aclinherit            restricted             default
myData  canmount              on                     default
myData  xattr                 on                     default
myData  copies                1                      default
myData  version               5                      -
myData  utf8only              off                    -
myData  normalization         none                   -
myData  casesensitivity       sensitive              -
myData  vscan                 off                    default
myData  nbmand                off                    default
myData  sharesmb              off                    default
myData  refquota              none                   default
myData  refreservation        none                   default
myData  primarycache          all                    default
myData  secondarycache        all                    default
myData  usedbysnapshots       0                      -
myData  usedbydataset         2.05T                  -
myData  usedbychildren        9.68M                  -
myData  usedbyrefreservation  0                      -
myData  logbias               latency                default
myData  dedup                 off                    default
myData  mlslabel              none                   default
myData  sync                  standard               default
myData  refcompressratio      1.00x                  -
zpool get all myData:
NAME    PROPERTY       VALUE                 SOURCE
myData  size           9.06T                 -
myData  capacity       28%                   -
myData  altroot        -                     default
myData  health         DEGRADED              -
myData  guid           14947267682211456191  default
myData  version        28                    default
myData  bootfs         -                     default
myData  delegation     on                    default
myData  autoreplace    off                   default
myData  cachefile      -                     default
myData  failmode       wait                  default
myData  listsnapshots  off                   default
myData  autoexpand     off                   default
myData  dedupditto     0                     default
myData  dedupratio     1.00x                 -
myData  free           6.49T                 -
myData  allocated      2.57T                 -
myData  readonly       off                   -
myData  ashift         12                    local
Source: (StackOverflow)
I have no idea what the rationale is behind the naming of the vdevs (virtual devices) used when creating ZFS pools on Solaris. Suppose I have a disk c4d0: what do c4d0p0 and c4d0s0 mean? And how would I know which to use with ZFS commands? I'm terribly confused, since I keep getting "invalid vdev specified". Any pointers?
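For anyone hitting the same error, the short version (as a hedged sketch; the pool and device names below are only examples): c4d0 is the whole disk, c4d0sN is a Solaris slice from the VTOC label, and c4d0pN is an x86 fdisk partition. ZFS is normally given either the whole disk or a slice, for example:
# Give ZFS the whole disk (it relabels the disk itself)
zpool create mypool c4d0
# Or build the pool on an existing slice
zpool create mypool c4d0s0
# 'format' lists the disk names the system knows about
format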
Source: (StackOverflow)
I heard that there's a huge improvement when Cassandra can write its log files to one disk and the SSTables to another. I have two disks, and if I were running Linux I would mount each on a different path and configure Cassandra to write to those.
What I would like to know is how to do that with ZFS on SmartOS.
I'm a complete newbie to SmartOS, and from what I understand I add the disks to the storage pool; are they then managed as being one?
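One hedged way to keep the two workloads on separate spindles (the pool, dataset and device names below are assumptions, not SmartOS defaults) is to give each disk its own single-disk pool and point Cassandra at the two mountpoints, rather than putting both disks into one pool:
# One single-disk pool per physical disk (device names are illustrative)
zpool create commitlog c1t1d0
zpool create sstables c1t2d0
# Datasets with explicit mountpoints for Cassandra to use
zfs create -o mountpoint=/var/cassandra/commitlog commitlog/cassandra
zfs create -o mountpoint=/var/cassandra/data sstables/cassandra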
Source: (StackOverflow)
I would like to know if there is a way to access the ZFS API (preferably from Python, but C is fine too). My goal is to write some tools that will monitor my pools, but I would definitely prefer not to have to parse the output of the zpool command.
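Short of linking against libzfs directly (its interface is not a stable public API), one hedged workaround is to lean on the tab-separated, script-friendly output the tools already provide; -H suppresses headers and separates fields with tabs, which is far less fragile than scraping the human-readable listing. The field lists below are just examples.
# Tab-separated, header-less output intended for scripts
zpool list -H -o name,size,allocated,free,capacity,health
zfs get -H -o name,property,value used,available,referenced myData
# Quick health check: prints only pools that are not healthy
zpool status -x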
Source: (StackOverflow)
I am considering adopting ZFS and would be happy to hear about your experience with it in both production and testing environments.
Source: (StackOverflow)
In the introduction to the ZFS file system, I saw this statement:
The ZFS file system is quite scalable, a 128-bit filesystem.
What does "128-bit filesystem" mean? What makes it scalable?
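To put rough numbers on it: a 64-bit filesystem can address on the order of 2^64 bytes, which is 16 EiB, while 128-bit addressing raises that ceiling to 2^128 bytes, roughly 3.4 x 10^38 bytes, i.e. about 1.8 x 10^19 times larger. The scalability claim is essentially that pool, file and directory limits sit so far beyond any real hardware that you never have to re-architect for size. The arithmetic, if you want the full figures:
# bc handles the big integers that shell arithmetic overflows on
echo '2^64' | bc
echo '2^128' | bc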
Source: (StackOverflow)
I am upgrading an OpenSolaris development workstation and recently purchased two 500GB SATA hard drives, expecting to use the motherboard's features to mirror the drives. OpenSolaris doesn't recognize the drives when they are configured to be mirrored through the BIOS, but it sees them just fine otherwise. Can ZFS mirror an entire drive, and will the mirror be bootable if the primary fails?
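As a hedged sketch of how a ZFS root mirror is usually set up on OpenSolaris (device names are illustrative, and root pools generally have to live on slices rather than whole disks): attach the second disk to the root pool, wait for the resilver, then install the boot blocks on it so either disk can boot on its own.
# Mirror the existing root pool onto the second disk (slice names are examples)
zpool attach rpool c0t0d0s0 c0t1d0s0
# Wait for 'zpool status rpool' to show the resilver has finished
zpool status rpool
# Put GRUB on the new half of the mirror so it is bootable by itself
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0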
Source: (StackOverflow)
I want to read a block from a zpool storage pool using the dd command. Since zpool doesn't create a device file the way other volume managers such as VxVM do, I don't know which block device to read from. Is there any way to read data block by block from a zpool?
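One hedged option is zdb, which can read and decode individual blocks by pool, vdev and offset, so no traditional volume device node is needed; the pool name, vdev index, offset and size below are purely illustrative, and the exact syntax can vary between ZFS versions. Reading the raw sectors of an underlying disk with dd also works, since each leaf vdev is just a labelled disk or slice.
# Dump a block from vdev 0 of pool 'tank' at a given offset and size
zdb -R tank 0:400000:20000
# Or read raw sectors straight from the device backing a vdev
dd if=/dev/dsk/c4t0d0s0 bs=512 skip=8192 count=16 | od -x | head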
Source: (StackOverflow)
My Ubuntu 12 server is mysteriously losing/wasting memory. It has 64GB of RAM. About 46GB is shown as used even when I shut down all my applications. This memory is not reported as used for buffers or caching.
The result of top (while my apps are running; the apps use about 9G):
top - 21:22:48 up 46 days, 10:12, 1 user, load average: 0.01, 0.09, 0.12
Tasks: 635 total, 1 running, 633 sleeping, 1 stopped, 0 zombie
Cpu(s): 0.2%us, 0.2%sy, 0.0%ni, 99.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65960100k total, 55038076k used, 10922024k free, 271700k buffers
Swap: 0k total, 0k used, 0k free, 4860768k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5303 1002 20 0 26.2g 1.2g 12m S 0 1.8 2:08.21 java
5263 1003 20 0 9.8g 995m 4544 S 0 1.5 0:19.82 mysqld
7021 www-data 20 0 3780m 18m 2460 S 0 0.0 8:37.50 apache2
7022 www-data 20 0 3780m 18m 2540 S 0 0.0 8:38.28 apache2
.... (smaller processes)
Note that top reports 4.8G cached, not 48G, and that 55G is used. The result of free -m:
             total       used       free     shared    buffers     cached
Mem:         64414      53747      10666          0        265       4746
-/+ buffers/cache:      48735      15678
Swap:            0          0          0
What is using my memory? I've tried every diagnostic I could come across. Forums are swamped with people asking the same question because Linux is using their RAM for buffers/cache, but that doesn't seem to be what is going on here.
It might be relevant that the system is a host for LXC containers. The top and free results reported above are from the host, but similar memory usage is reported within the containers. Stopping all containers does not free the memory; some 46G remain in use. However, if I restart the host the memory is freed. It takes a while before usage climbs back to 46G (I don't know whether that takes days or weeks, but it's more than a few hours).
It might also be relevant that the system is using ZFS. ZFS is reputedly memory-hungry, but not that much. This system has two ZFS filesystems on two raidz pools, one of 1.5T and one of 200G. I have another server that exhibits exactly the same problem (46G used by nothing) and is configured pretty much identically, but with a 3T array instead of 1.5T. I have lots of snapshots (100 or so) of each ZFS filesystem. I normally keep one snapshot of each filesystem mounted at any time; unmounting those does not give me back my memory.
I can see that the VIRT numbers in the listing above roughly coincide with the memory used, but the memory remains used even after I shut down these apps, and even after I shut down the container that's running them.
EDIT: I tried adding some swap, and something interesting happened. I added 30G of swap; moments later, the amount of memory marked as cached in top had increased from 5G to 25G, and free -m indicated about 20G more usable memory. I added another 10G of swap, and cached memory rose to 33G. If I add another 10G of swap, I get 6G more recognized as cached. All this time, only a few kilobytes of swap are reported as used. It's as if the kernel needed matching swap for every bit it recognizes or reports as cached. Here is the output of top with 40G of swap:
top - 23:06:45 up 46 days, 11:56, 2 users, load average: 0.01, 0.12, 0.13
Tasks: 586 total, 1 running, 585 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65960100k total, 64356228k used, 1603872k free, 197800k buffers
Swap: 39062488k total, 3128k used, 39059360k free, 33101572k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6440 1002 20 0 26.3g 1.5g 11m S 0 2.4 2:02.87 java
6538 1003 20 0 9.8g 994m 4564 S 0 1.5 0:17.70 mysqld
4707 dbourget 20 0 27472 8728 1692 S 0 0.0 0:00.38 bash
Any suggestions highly appreciated.
EDIT 2: Here are the arc* values from /proc/spl/kstat/zfs/arcstats:
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 0
arc_meta_used 4 1531800648
arc_meta_limit 4 8654946304
arc_meta_max 4 8661962768
There is no L2ARC activated for ZFS.
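If the missing memory turns out to be the ZFS ARC (which top and free on Linux count as neither free nor cache), a hedged way to check and cap it looks like this; the 16G cap is just an example value.
# Current ARC size and ceiling, in bytes
awk '/^size|^c_max/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats
# Cap the ARC at 16G for the running module (value in bytes)
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max
# Persist the cap across reboots
echo "options zfs zfs_arc_max=17179869184" >> /etc/modprobe.d/zfs.conf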
Source: (StackOverflow)
More specifically, how do they manage to look at the entire subvolume and remember everything about it (files, file sizes, folder structure) while fitting it all into such a small amount of data?
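The short answer is copy-on-write: a snapshot doesn't copy or catalogue files at all, it simply pins the existing block tree at a point in time, so the only new data is a small amount of metadata; blocks that later change are written elsewhere while the snapshot keeps referencing the old ones. Both btrfs and ZFS work this way. A small hedged illustration using ZFS (pool and dataset names are examples):
# A fresh snapshot consumes almost nothing, however large the dataset is
zfs snapshot tank/data@before
zfs list -o name,used,referenced tank/data@before
# After overwriting some files, 'used' grows only by the blocks that diverged
zfs list -o name,used,referenced tank/data@before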
Source: (StackOverflow)
In 10.5, Apple released a read-only kernel extension/filesystem to allow ZFS pools to be mounted. A later open-source project was hosted at http://zfs.macosforge.org and was announced to be part of 10.6, but shortly before the public release ZFS was yanked, and recently, the Apple-hosted MacOSForge site has closed down the ZFS project.
So, what's the status of ZFS on Mac OS X? Is it worth using? Why would anyone be wanting to use ZFS anyway?
Source: (StackOverflow)
I'm writing Symfony2-based sites on an Ubuntu 12.04 server, with the code hosted on a ZFS filesystem partition/zpool. However, the instructions on the Symfony2 installation page for setting ACLs on the app/logs and app/cache directories do not apply, because ZFS does not support the chmod +a or setfacl commands.
Is there a ZFS-compatible version of the below commands?
sudo setfacl -Rn -m u:"$APACHEUSER":rwX -m u:`whoami`:rwX app/cache app/logs
sudo setfacl -dRn -m u:"$APACHEUSER":rwX -m u:`whoami`:rwX app/cache app/logs
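One hedged possibility on ZFS on Linux, assuming a release new enough to carry the acltype property (packages from the 12.04 era may not have it): switch the dataset holding the code to POSIX ACLs, after which the setfacl commands from the Symfony docs should work as written. Otherwise, group ownership plus the setgid bit and a suitable umask is a common fallback. The dataset name below is an example.
# Enable POSIX ACL support on the dataset holding the code
sudo zfs set acltype=posixacl tank/www
# Store the ACL xattrs in the dnode for speed
sudo zfs set xattr=sa tank/www
# Then the standard Symfony ACL commands apply
sudo setfacl -Rn -m u:"$APACHEUSER":rwX -m u:`whoami`:rwX app/cache app/logs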
Source: (StackOverflow)
I've been having issues with my FreeNAS 9.10 ZFS pool.
One of the drives was being flagged as having many bad sectors, so I decided to replace it. I had tremendous problems getting 'zpool replace' to recognise the new drive (it was at ada3, but it wouldn't accept that as a parameter). The only thing I could figure out was to add the new drive to the zpool as a 'spare' and then run zpool replace [poolname] [old device id] [spare device id].
This worked and resilvered onto the new drive. However, once the resilver completed, the pool remained in a degraded state and seems to want the old drive back.
How do I convince it to 'forget' the old drive and accept the new one permanently?
Many thanks.
  pool: ZFS_NAS
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 989G in 4h42m with 0 errors on Mon May 2 19:45:33 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        ZFS_NAS                                         DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            spare-0                                     DEGRADED     0     0     0
              12082773611957310038                      UNAVAIL      0     0     0  was /dev/gptid/1418d56c-431b-11e4-b9f7-28924a2f106f
              gptid/503d6d1c-106e-11e6-a169-28924a2f106f  ONLINE     0     0     0
            gptid/1608e28a-431b-11e4-b9f7-28924a2f106f  ONLINE       0     0     0
            gptid/1699dab6-431b-11e4-b9f7-28924a2f106f  ONLINE       0     0     0
        spares
          16673430511205791764                          INUSE     was /dev/gptid/503d6d1c-106e-11e6-a169-28924a2f106f

errors: No known data errors
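The documented way to make a hot spare permanent is to detach the original failed device rather than the spare: once the old disk is detached, the in-use spare is promoted to a regular member of the raidz. A hedged sketch using the GUID from the status output above:
# Detach the missing original disk by its GUID; the spare takes its place
zpool detach ZFS_NAS 12082773611957310038
# If the new drive still appears under 'spares' afterwards, drop that entry too
zpool remove ZFS_NAS gptid/503d6d1c-106e-11e6-a169-28924a2f106f
zpool status ZFS_NAS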
Source: (StackOverflow)
Is there a way to back up a btrfs file system by copying the entire disk over for the first backup, but then copying over snapshots instead of using rsync (or is this a bad idea)?
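What this describes sounds like btrfs send/receive: the first run ships a full read-only snapshot, and each later run sends only the delta against the previous snapshot, so there is no rsync-style rescan of the whole tree. A rough sketch, with paths and snapshot names as examples only:
# First backup: full copy of a read-only snapshot
btrfs subvolume snapshot -r /data /data/.snap-1
btrfs send /data/.snap-1 | btrfs receive /mnt/backup
# Later backups: send only the changes since the previous snapshot
btrfs subvolume snapshot -r /data /data/.snap-2
btrfs send -p /data/.snap-1 /data/.snap-2 | btrfs receive /mnt/backup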
Source: (StackOverflow)