Top FreeBSD frequently asked interview questions

FreeBSD performance tuning: sysctl parameters, loader.conf, kernel config

I wanted to share knowledge of tuning FreeBSD via sysctl.conf / loader.conf / KERNCONF / etc. It was initially based on Igor Sysoev's (the author of nginx) presentation about tuning FreeBSD for 100,000-200,000 active connections. Newer versions of FreeBSD can handle much, much more.

These tunings are for FreeBSD 7 through FreeBSD-CURRENT. Since 7.2 on amd64, some of them are already well tuned by default. Prior to 7.0, some of them are boot-time only (set via /boot/loader.conf) or do not exist at all.

sysctl.conf:

# No zero mapping feature
# May break wine
# (There are also reports about broken samba3)
#security.bsd.map_at_zero=0

# Servers running threaded software (apache2, Pound) may want to raise the following sysctl
#kern.threads.max_threads_per_proc=4096

# Max listen queue (backlog) size
# Note: the application can still limit it by passing the second argument to the listen(2) syscall
# Note: the listen queue can be monitored via `netstat -Lan`
kern.ipc.somaxconn=4096
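# (Sketch, not from the original post: listen queue overflows also show up in the
#  protocol counters, e.g. `netstat -s -p tcp | grep -i "listen queue"`)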

# Shared memory
# Note: Only FreeBSD 7.2+ can use shared memory > 2Gb
#kern.ipc.shmmax=2147483648

# Sockets
kern.ipc.maxsockets=204800

# Mbuf 2k clusters (on amd64 7.2+ 25600 is default)
# Note: defaults for other variables depend on this variable, for example `tcpreass`
# Note: FreeBSD-7 and older: for such a high value, vm.kmem_size must be increased to 3G
kern.ipc.nmbclusters=262144

# Jumbo pagesize(_SC_PAGESIZE)/9k/16k clusters
# Used as general packet storage for jumbo frames on some network cards
# Can be monitored via `netstat -m`
#kern.ipc.nmbjumbop=262144
#kern.ipc.nmbjumbo9=65536
#kern.ipc.nmbjumbo16=32768

# For lower latency you can decrease the scheduler's maximum time slice
# default: stathz/10 (~ 13)
kern.sched.slice=1

# Increase max command-line length shown in `ps` (e.g. for Tomcat/Java)
# Default is PAGE_SIZE / 16 or 256 on x86
# This avoids commands being shown as [executable] in `ps`
# For more info see: http://www.freebsd.org/cgi/query-pr.cgi?pr=120749
kern.ps_arg_cache_limit=4096

# Every socket is a file, so increase these limits as well
kern.maxfiles=204800
kern.maxfilesperproc=200000
kern.maxvnodes=200000

# On some systems HPET is almost 2 times faster than default ACPI-fast
# Useful on systems with lots of clock_gettime / gettimeofday calls
# See http://old.nabble.com/ACPI-fast-default-timecounter,-but-HPET-83--faster-td23248172.html
# After revision 222222 HPET became default: http://svnweb.freebsd.org/base?view=revision&revision=222222
#kern.timecounter.hardware=HPET


# Small receive space, only useful on HTTP servers
# Note: fileservers should increase it to 65535 or even more
#net.inet.tcp.recvspace=8192

# This is useful on Fat-Long-Pipes
#kern.ipc.maxsockbuf=10485760
#net.inet.tcp.recvbuf_max=10485760
#net.inet.tcp.recvbuf_inc=65535

# Small send space is useful for http servers that serve small files 
# Note: Autotuned since 7.x
#net.inet.tcp.sendspace=16384

# This is useful on Fat-Long-Pipes
#net.inet.tcp.sendbuf_max=10485760
#net.inet.tcp.sendbuf_inc=65535

# Turn off send/receive autotuning if you think you know better.
#net.inet.tcp.recvbuf_auto=0
#net.inet.tcp.sendbuf_auto=0

# This should be enabled if you are going to use big windows (>64k)
# The timestamp field is also useful when using syncookies
net.inet.tcp.rfc1323=1
# Turn this off on high-speed, lossless connections (LAN 1Gbit+)
#net.inet.tcp.delayed_ack=0

# This feature is useful if you are serving data over modems, Gigabit Ethernet, 
# or even high speed WAN links (or any other link with a high bandwidth delay product), 
# especially if you are also using window scaling or have configured a large send window.
# Automatically disables on small RTT ( http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_subr.c?#rev1.237 )
# This sysctl was removed in 10-CURRENT:
# See: http://www.mail-archive.com/svn-src-head@freebsd.org/msg06178.html
#net.inet.tcp.inflight.enable=0

# TCP slowstart algorithm tunings
# Here we are assuming VERY uncongested network
# Note: Only takes effect if net.inet.tcp.rfc3390 is set to 0,
#       otherwise formula taken from http://tools.ietf.org/html/rfc3390
#net.inet.tcp.slowstart_flightsize=10
#net.inet.tcp.local_slowstart_flightsize=100

# Disable randomizing of ports to avoid false RST
# Before use check SA here www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf
# Note: Port randomization autodisables at high connection rates
#net.inet.ip.portrange.randomized=0

# Increase portrange
# For outgoing connections only. Good for seed-boxes and ftp servers.
net.inet.ip.portrange.first=1024
net.inet.ip.portrange.last=65535

# Stops route cache degradation during a DDoS.
# http://www.freebsd.org/doc/en/books/handbook/securing-freebsd.html
#net.inet.ip.rtexpire=2
net.inet.ip.rtminexpire=2
net.inet.ip.rtmaxcache=1024

# Security
net.inet.ip.redirect=0
net.inet.ip.sourceroute=0
net.inet.ip.accept_sourceroute=0
net.inet.icmp.maskrepl=0
net.inet.icmp.log_redirect=0
net.inet.icmp.drop_redirect=1
net.inet.tcp.drop_synfin=1
# 
# There is also a good example of sysctl.conf with comments:
# http://www.thern.org/projects/sysctl.conf
#
# ICMP may NOT reset connections; helpful against those pesky spoofed
# icmp/udp floods that end up taking up your outgoing
# bandwidth/ifqueue due to all that outgoing RST traffic.
#
#net.inet.tcp.icmp_may_rst=0

# Security
# Do not send responses on attempts to connect to the closed ports
#net.inet.udp.blackhole=1
#net.inet.tcp.blackhole=2

# IPv6 Security
# For more info see http://www.fosslc.org/drupal/content/security-implications-ipv6
# Disable Node info replies
# To see this vulnerability in action run `ping6 -a sglAac ::1` or `ping6 -w ::1` on an unprotected node
net.inet6.icmp6.nodeinfo=0
# Turn on IPv6 privacy extensions
# For more info see proposal http://unix.derkeiler.com/Mailing-Lists/FreeBSD/net/2008-06/msg00103.html
net.inet6.ip6.use_tempaddr=1
net.inet6.ip6.prefer_tempaddr=1
# Disable ICMP redirect
net.inet6.icmp6.rediraccept=0
# Disable acceptance of RAs and automatic link-local generation if you don't use them
#net.inet6.ip6.accept_rtadv=0
#net.inet6.ip6.auto_linklocal=0

# Increases default TTL
# Default is 64
#net.inet.ip.ttl=128

# Lessen max segment lifetime to conserve resources
# ACK waiting time in milliseconds
# (default: 30000; the TCP RFC (793) recommends 120000)
net.inet.tcp.msl=5000

# Max number of time-wait sockets
net.inet.tcp.maxtcptw=200000
# Don't use tw on local connections
# As of 15 Apr 2009, Igor Sysoev says that nolocaltimewait has a buggy implementation,
# so keep it disabled for now until it gets fixed.
#net.inet.tcp.nolocaltimewait=1

# FIN_WAIT_2 state fast recycle
net.inet.tcp.fast_finwait2_recycle=1

# Time before tcp keepalive probe is sent
# default is 2 hours (7200000)
#net.inet.tcp.keepidle=60000

# Use HTCP congestion control (don't forget to load cc_htcp kernel module)
net.inet.tcp.cc.algorithm=htcp
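# (Sketch, assuming the modular congestion control framework of FreeBSD 9+:
#  check availability with `sysctl net.inet.tcp.cc.available`, load the module
#  by hand with `kldload cc_htcp`, and verify with `kldstat | grep cc_htcp`)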

# Should be increased until net.inet.ip.intr_queue_drops is zero
net.inet.ip.intr_queue_maxlen=4096

# Protocol decoding in interrupt thread.
# If you have a NIC that automatically sets flow_id then it's better not to
# use direct_force, and to use the advantages of multithreaded netisr(9)
# If you are using the Yandex drivers you are better off with `net.isr.direct_force=1` and
# `net.inet.tcp.read_locking=0`, otherwise you may run into some TCP related
# problems.
# Note: If you have an old NIC that doesn't set flow_ids you may need to
# patch `ip_input` to manually set FLOW_ID via `nh_m2flow`.
#
# FreeBSD 8+
#net.isr.direct=1
#net.isr.direct_force=1
# In FreeBSD 9+ it was renamed to
#net.isr.dispatch=direct

# This is for routers only
#net.inet.ip.forwarding=1
#net.inet.ip.fastforwarding=1

# This speeds up dummynet when the channel isn't saturated
net.inet.ip.dummynet.io_fast=1
# Increase dummynet(4) hash
#net.inet.ip.dummynet.hash_size=65535
#net.inet.ip.dummynet.max_chain_len=8

# Should be increased when you have A LOT of files on the server
# (increase until vfs.ufs.dirhash_mem stays below this limit)
vfs.ufs.dirhash_maxmem=67108864
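# (Sketch: compare current usage against the limit with
#  `sysctl vfs.ufs.dirhash_mem vfs.ufs.dirhash_maxmem`)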

# Note from commit http://svn.freebsd.org/base/head@211031 :
# For systems with RAID volumes and/or virtualization environments, where
# read performance is very important, increasing this sysctl tunable to 32
# or even more will demonstratively yield additional performance benefits.
vfs.read_max=32

# Explicit Congestion Notification
# (See http://en.wikipedia.org/wiki/Explicit_Congestion_Notification)
net.inet.tcp.ecn.enable=1

# Flowtable - flow caching mechanism
# Useful for routers
#net.inet.flowtable.enable=1
#net.inet.flowtable.nmbflows=65535

# IPFW dynamic rules and timeouts tuning
# Increase dyn_buckets till net.inet.ip.fw.curr_dyn_buckets is lower
net.inet.ip.fw.dyn_buckets=65536
net.inet.ip.fw.dyn_max=65536
net.inet.ip.fw.dyn_ack_lifetime=120
net.inet.ip.fw.dyn_syn_lifetime=10
net.inet.ip.fw.dyn_fin_lifetime=2
net.inet.ip.fw.dyn_short_lifetime=10
# Make packets pass firewall only once when using dummynet
# i.e. packets going thru pipe are passing out from firewall with accept
#net.inet.ip.fw.one_pass=1

# shm_use_phys Wires all shared pages, making them unswappable
# Use this to lessen Virtual Memory Manager's work when using Shared Mem.
# Useful for databases
#kern.ipc.shm_use_phys=1

# ZFS
# Enable prefetch. Useful for sequential load type i.e fileserver.
# FreeBSD sets vfs.zfs.prefetch_disable to 1 on any i386 systems and 
# on any amd64 systems with less than 4GB of available memory
# See: http://old.nabble.com/Samba-read-speed-performance-tuning-td27964534.html
#vfs.zfs.prefetch_disable=0

# On highload servers you may notice following message in dmesg:
# "Approaching the limit on PV entries, consider increasing either the
# vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable"   
vm.pmap.shpgperproc=2048
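
Not from the original post, but a quick sketch of applying and verifying a single value at runtime; `service sysctl restart` assumes a release new enough to ship the service(8) wrapper:

# sysctl -d kern.ipc.somaxconn
# sysctl kern.ipc.somaxconn=4096
# sysctl kern.ipc.somaxconn
# service sysctl restart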

loader.conf:

# Accept filters for data, http and DNS requests
# Useful when your software creates process/thread on each request (i.e. apache)
# Note: DNS accf available on 8.0+
# Note: In case of badly written software this can increase performance, 
# but I still would recommend against using accept filters in production because of
# their opacity - they really break abstractions. Also it's not trivial to debug/monitor
# their state.
#accf_data_load="YES" 
#accf_http_load="YES"
#accf_dns_load="YES"

# Async IO system calls
aio_load="YES"

# Linux-specific devices in /dev
# As of 8.1 it is only /dev/full
#lindev_load="YES"

# Adds NCQ support in FreeBSD
# WARNING! all ad[0-9]+ devices will be renamed to ada[0-9]+
# 8.0+ only
#ahci_load="YES"
#siis_load="YES"

# FreeBSD 9+
# New Congestion Control for FreeBSD
cc_htcp_load="YES"
#cc_cubic_load="YES"

# Increase kernel memory size to 3G. 
#
# Use ONLY if you have KVA_PAGES in kernel configuration, and you have more than 3G RAM 
# Otherwise panic will happen on next reboot!
#
# It's required for high buffer sizes: kern.ipc.nmbjumbop, kern.ipc.nmbclusters, etc
# Useful on highload stateful firewalls, proxies or ZFS fileservers
# (FreeBSD 7.2+ amd64 users: Check that current value is lower!)
#vm.kmem_size="3G"

# If you have a really busy forking webserver (e.g. apache13) you may run out of processes
#kern.maxproc=10000

# If your server has lots of swap (>4Gb) you should increase the following value
# according to http://lists.freebsd.org/pipermail/freebsd-hackers/2009-October/029616.html
# Otherwise you'll be getting errors
# "kernel: swap zone exhausted, increase kern.maxswzone"
#kern.maxswzone="256M" 

# Older versions of FreeBSD can't tune maxfiles on the fly
#kern.maxfiles="200000"

# Useful for databases 
# Sets maximum data size to 1G
# (FreeBSD 7.2+ amd64 users: Check that current value is lower!)
#kern.maxdsiz="1G"

# Maximum buffer size(vfs.maxbufspace)
# You can check current one via vfs.bufspace
# Should be lowered/upped depending on server's load-type
# Usually decreased to preserve kmem
# (default is 10% of mem)
#kern.maxbcache="512M"

# Sendfile buffers
# Note: i386 only
#kern.ipc.nsfbufs=10240

# syncache tuning
net.inet.tcp.syncache.hashsize=32768
net.inet.tcp.syncache.bucketlimit=32
net.inet.tcp.syncache.cachelimit=1048576

# Send RST on listen queue overflow / memory shortage. 
# Hosts behind Load-Balancer should set it to 1 to fail fast.
# Hosts facing clients should set it to 0 for client to retry connection.
#net.inet.tcp.syncache.rst_on_sock_fail=0

# Increase hostcache
# The host cache can later be viewed via the hidden sysctl net.inet.tcp.hostcache.list
# Very useful for its RTT and RTTVAR values
# Must be a power of two
net.inet.tcp.hostcache.hashsize=65536
# hashsize * bucketlimit (which is 30 by default)
# This allocates about 255 MB (1966080 * 136 bytes) of RAM
net.inet.tcp.hostcache.cachelimit=1966080

# TCP control-block Hash table tuning
# See: http://serverfault.com/questions/372512/why-change-net-inet-tcp-tcbhashsize-in-freebsd
net.inet.tcp.tcbhashsize=524288

# Disable ipfw deny all
# Should be uncommented when there is a chance that
# kernel and ipfw binary may be out of sync on next reboot
#net.inet.ip.fw.default_to_accept=1

#
# SIFTR (Statistical Information For TCP Research) is a kernel module that
# logs a range of statistics on active TCP connections to a log file.
# See prerelease notes:
# http://groups.google.com/group/mailing.freebsd.current/browse_thread/thread/b4c18be6cdce76e4
# and man 4 siftr
#siftr_load="YES"

# Enable superpages, for 7.2+ only
# See: http://lists.freebsd.org/pipermail/freebsd-hackers/2009-November/030094.html
vm.pmap.pg_ps_enabled=1

# Useful if you are using an Intel Gigabit NIC
#hw.em.rxd=4096
#hw.em.txd=4096
#hw.em.rx_process_limit=-1
# Also if you have A LOT of interrupts on the NIC - play with the following parameters
# NOTE: You should set them for every NIC
#dev.em.0.rx_int_delay: 250
#dev.em.0.tx_int_delay: 250
#dev.em.0.rx_abs_int_delay: 250
#dev.em.0.tx_abs_int_delay: 250
# There is also a multithreaded version of the em/igb drivers that can be found here:
# http://people.yandex-team.ru/~wawa/
#
# for additional em monitoring and statistics use 
# sysctl dev.em.0.stats=1 ; dmesg
# sysctl dev.em.0.debug=1 ; dmesg
# Also after r209242 (-CURRENT) there is a separate sysctl for each stat variable;   
# Same tunings for igb
#hw.igb.rxd=4096
#hw.igb.txd=4096
#hw.igb.rx_process_limit=-1

# Some useful netisr tunables. See sysctl net.isr
#net.isr.maxthreads=4
#net.isr.defaultqlimit=10240
#net.isr.maxqlimit=10240
# Bind netisr threads to CPUs
#net.isr.bindthreads=1

#
# FreeBSD 9.x+
# Increase interface send queue length
# See commit message http://svn.freebsd.org/viewvc/base?view=revision&revision=207554
#net.link.ifqmaxlen=1024

# Nicer boot logo =)
loader_logo="beastie"
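
As a hedged aside (not in the original post): values from /boot/loader.conf end up in the kernel environment, so after a reboot you can confirm what the loader actually passed in, e.g.:

# kenv | grep net.inet.tcp.syncache
# sysctl net.inet.tcp.syncache.hashsize net.inet.tcp.tcbhashsize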

And finally here is KERNCONF:

# Just some of them, see also
# cat /sys/{i386,amd64,}/conf/NOTES

# This one is useful only on i386
#options         KVA_PAGES=512
# From UPDATING 20121223:
#    After switching to Clang as the default compiler some users of ZFS
#    on i386 systems started to experience stack overflow kernel panics.
#    Please consider using 'options KSTACK_PAGES=4' in such configurations.
#options         KSTACK_PAGES=4

# You can play with HZ in environments with high interrupt rate (default is 1000) 
# 100 is for my notebook, to prolong its battery life
#options         HZ=100

# Eliminate datacopy on socket read-write
# To take advantage of zero-copy sockets you should have an MTU >= 4k
# This requirement applies only to receiving data.
# Read more in man zero_copy_sockets
# Also this epic thread on kernel trap:
#    http://kerneltrap.org/node/6506
# In conclusion Linus says:
#    "anybody that does it that way (FreeBSD) is totally incompetent"
#
# Also see /usr/src/UPDATING 20121023 for notes about
# SOCKET_SEND_COW and SOCKET_RECV_PFLIP
#options         ZERO_COPY_SOCKETS

# Support TCP signatures. Used for IPsec
options         TCP_SIGNATURE
# There was a stack overflow found in the KAME IPsec stack:
# See http://secunia.com/advisories/43995/
# For quick workaround you can use `ipfw add deny proto ipcomp`
options         IPSEC

# These can be loaded as modules; they are described in the loader.conf section
#options         ACCEPT_FILTER_DATA
#options         ACCEPT_FILTER_HTTP

# Adding ipfw, also can be loaded as modules
options         IPFIREWALL
# On 8.1+ you can disable verbose logging and instead see blocked packets on the ipfw0 interface.
# Also there is no point in compiling verbose into the kernel, because
# there is now a net.inet.ip.fw.verbose tunable.
#options         IPFIREWALL_VERBOSE
#options         IPFIREWALL_VERBOSE_LIMIT=10
# The IPFIREWALL_FORWARD kernel option has been removed. Its
# functionality is now turned on by default.
#options         IPFIREWALL_FORWARD
# Adding kernel NAT
options         IPFIREWALL_NAT
options         LIBALIAS
# Traffic shaping
options         DUMMYNET          
# Divert, i.e. for userspace NAT
options         IPDIVERT

# This is for OpenBSD's pf firewall
device          pf
device          pflog
# pf's QoS - ALTQ
options         ALTQ
options         ALTQ_CBQ        # Class Based Queuing (CBQ)
options         ALTQ_RED        # Random Early Detection (RED)
options         ALTQ_RIO        # RED In/Out
options         ALTQ_HFSC       # Hierarchical Packet Scheduler (HFSC)
options         ALTQ_PRIQ       # Priority Queuing (PRIQ)
options         ALTQ_NOPCC      # Required for SMP build

# Pretty console 
# Manual can be found here http://forums.freebsd.org/showthread.php?t=6134
#options         VESA
#options         SC_PIXEL_MODE

# Disable reboot on Ctrl Alt Del
#options         SC_DISABLE_REBOOT
# Change normal|kernel messages color
options         SC_NORM_ATTR=(FG_GREEN|BG_BLACK)
options         SC_KERNEL_CONS_ATTR=(FG_YELLOW|BG_BLACK)
# More scroll space
options         SC_HISTORY_SIZE=8192

# Adding hardware crypto device
device          crypto
device          cryptodev

# Useful network interfaces
device          vlan
device          tap                     #Virtual Ethernet driver
device          gre                     #IP over IP tunneling
device          if_bridge               #Bridge interface
device          pfsync                  #synchronization interface for PF
device          carp                    #Common Address Redundancy Protocol
device          enc                     #IPsec interface
device          lagg                    #Link aggregation interface
device          stf                     #IPv6 over IPv4 tunneling (6to4)

# Also for my notebook, but may be used with Opteron
device         amdtemp
# Same for Intel processors
device         coretemp

# man 4 cpuctl
device         cpuctl                   # CPU control pseudo-device

# Support for ECMP: more than one route for a destination
# Works even with the default route, so one can use it as a load balancer for two ISPs
# For now the code is unstable and panics (panic: rtfree 2) on route deletions.
#options         RADIX_MPATH

# Multicast routing
#options         MROUTING
#options         PIM

# Debug & DTrace
options        KDB                     # Kernel debugger related code
options        KDB_TRACE               # Print a stack trace for a panic
options        KDTRACE_FRAME           # amd64-only(?)
options        KDTRACE_HOOKS           # all architectures - enable general DTrace hooks
#options        DDB
#options        DDB_CTF                 # all architectures - kernel ELF linker loads CTF data

# Adaptive spinning in lockmgr (8.x+)
# See http://www.mail-archive.com/svn-src-all@freebsd.org/msg10782.html
options         ADAPTIVE_LOCKMGRS

# UTF-8 in console (8.x+) 
#options         TEKEN_UTF8

# FreeBSD 8.1+
# Deadlock resolver thread 
# For additional information see http://www.mail-archive.com/svn-src-all@freebsd.org/msg18124.html 
# (FYI: "resolution" is panic so use with caution)
#options         DEADLKRES

# Increase maximum size of Raw I/O and sendfile(2) readahead
#options MAXPHYS=(1024*1024)
#options MAXBSIZE=(1024*1024)

# For scheduler debugging enable the following option.
# Debug info will be available via the `kern.sched.stats` sysctl
# For more information see http://svnweb.freebsd.org/base/head/sys/conf/NOTES?view=markup
#options SCHED_STATS

# A framework for very efficient packet I/O from userspace, capable of 
# line rate at 10G (FreeBSD10+)
# See http://svnweb.freebsd.org/base?view=revision&revision=227614
#device netmap

If you are tuning the network for maximum performance you may wish to play with ifconfig options like:

# You can list all capabilities via `ifconfig -m`
ifconfig <interface> [-]rxcsum [-]txcsum [-]tso [-]lro mtu <size>
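
For instance (a hypothetical example, not from the original post - the interface name, offload choices and MTU depend entirely on your NIC and network):

# ifconfig em0 rxcsum txcsum tso -lro mtu 9000
# ifconfig em0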

If you've enabled DDB in your kernel config, you should edit /etc/ddb.conf and add something like this to enable automatic reboot (and a textdump as a bonus):

script kdb.enter.panic=textdump set; capture on; show pcpu; bt; ps; alltrace; capture off; call doadump; reset
script kdb.enter.default=textdump set; capture on; bt; ps; capture off; call doadump; reset

And do not forget to add ddb_enable="YES" to /etc/rc.conf
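
(If sysrc(8) is available on your release it can do this from the command line; otherwise just edit /etc/rc.conf by hand:)

# sysrc ddb_enable="YES"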

Since FreeBSD 9 you can enable/disable flow control on your NIC:

# See http://en.wikipedia.org/wiki/Ethernet_flow_control and
# http://www.mail-archive.com/svn-src-head@freebsd.org/msg07927.html for additional info
ifconfig bge0 media auto mediaopt flowcontrol

Most of FreeBSD's limits can be monitored by:

# vmstat -z

and

# limits

A variety of network counters can be monitored via

# netstat -s

In FreeBSD 8+ netstat gained the -Q option; try the following command to display netisr stats

# netstat -Q
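
Not from the original post, but a small sketch tying the monitoring back to the tunables above (kern.ipc.numopensockets is an extra read-only counter not mentioned earlier):

# sysctl kern.ipc.numopensockets kern.ipc.maxsockets
# sysctl net.inet.ip.intr_queue_drops
# vmstat -z | egrep 'ITEM|socket|mbuf'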

For solving non-trivial TCP problems one can use net.inet.tcp.log_debug; it produces dmesg output similar to:

host kernel: TCP: [0.0.0.0]:0 to [1.1.1.1]:1; syncache_socket: Socket create failed due to limits or memory shortage
host kernel: TCP: [0.0.0.0]:0 to [1.1.1.1]:1 tcpflags 0x10<ACK>; tcp_input: Listen socket: Socket allocation failed due to limits or memory shortage, sending RST

NB!
Last but not least: if you are into network tuning, it's good practice to buy the best network card you can afford. I personally prefer Intel's igb(4); the list of supported models can be found in if_igb.c.

PS. also see

# man 7 tuning

And the FreeBSD Wiki page on network performance tuning, maintained by the developers themselves.

PPS. Calomel.org, the Open Source Research and Reference blog, has a nice write-up about network performance and a recent article about FreeBSD Tuning and Optimization.

Thanks
I want to thank the FreeBSD community, especially the author of nginx, Igor Sysoev, and the nginx-ru@ and FreeBSD-performance@ mailing lists, for providing useful information about FreeBSD tuning, as well as the Yandex BSD lovers from noc@ and search-admin@, especially melifaro@ and zont@.

Disclaimer
This is definitely not something that you should copy/paste into your production configs! Some of the provided "tunings" can even be harmful. Use the provided data as a reference for further investigation or A/B testing. I'll say it again just to be explicit: DO NOT BLINDLY APPLY "TUNINGS" YOU'VE FOUND ON THE INTERNET!
Before applying any sysctl on a production system you should investigate its impact (it's essential to look at the kernel's source code) and measure its performance benefit (if any) in a testing environment.
Use this post at your own risk.

FreeBSD WIP
* What's cooking for FreeBSD 7?
* What's cooking for FreeBSD 8?
* What's cooking for FreeBSD 9?
* What's new for FreeBSD 10?
* What's new for FreeBSD 11?

Question to viewers
What tunings are you using on your FreeBSD servers?

You can also post your /etc/sysctl.conf, /boot/loader.conf, kernel options, etc., with a description of their meaning (do not copy-paste from sysctl -d). Don't forget to specify the server type (frontend, backend, cache, db, storage, gateway, etc.)

Let's share experience!


Source: (StackOverflow)

ZFS on FreeBSD: recovery from data corruption

I have several TBs of very valuable personal data in a zpool which I can not access due to data corruption. The pool was originally set up back in 2009 or so on a FreeBSD 7.2 system running inside a VMWare virtual machine on top of a Ubuntu 8.04 system. The FreeBSD VM is still available and running fine, only the host OS has now changed to Debian 6. The hard drives are made accessible to the guest VM by means of VMWare generic SCSI devices, 12 in total.

There are 2 pools:

  • zpool01: 2x 4x 500GB
  • zpool02: 1x 4x 160GB

The one that works is empty, the broken one holds all the important data:

[user@host~]$ uname -a
FreeBSD host.domain 7.2-RELEASE FreeBSD 7.2-RELEASE #0: \
  Fri May  1 07:18:07 UTC 2009                          \
  root@driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

[user@host ~]$ dmesg | grep ZFS
WARNING: ZFS is considered to be an experimental feature in FreeBSD.
ZFS filesystem version 6
ZFS storage pool version 6

[user@host ~]$ sudo zpool status
  pool: zpool01
 state: UNAVAIL
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    zpool01     UNAVAIL      0     0     0  insufficient replicas
      raidz1    UNAVAIL      0     0     0  corrupted data
        da5     ONLINE       0     0     0
        da6     ONLINE       0     0     0
        da7     ONLINE       0     0     0
        da8     ONLINE       0     0     0
      raidz1    ONLINE       0     0     0
        da1     ONLINE       0     0     0
        da2     ONLINE       0     0     0
        da3     ONLINE       0     0     0
        da4     ONLINE       0     0     0

  pool: zpool02
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    zpool02     ONLINE       0     0     0
      raidz1    ONLINE       0     0     0
        da9     ONLINE       0     0     0
        da10    ONLINE       0     0     0
        da11    ONLINE       0     0     0
        da12    ONLINE       0     0     0

errors: No known data errors

I was able to access the pool a couple of weeks ago. Since then, I had to replace pretty much all of the hardware of the host machine and install several host operating systems.

My suspicion is that one of these OS installations wrote a bootloader (or whatever) to one (the first ?) of the 500GB drives and destroyed some zpool metadata (or whatever) - 'or whatever' meaning that this is just a very vague idea and that subject is not exactly my strong side...


There are plenty of websites, blogs, mailing lists, etc. about ZFS. I post this question here in the hope that it helps me gather enough information for a sane, structured, controlled, informed, knowledgeable approach to get my data back - and hopefully help someone else out there in the same situation.


The first search result when googling for 'zfs recover' is the ZFS Troubleshooting and Data Recovery chapter from Solaris ZFS Administration Guide. In the first ZFS Failure Modes section, it says in the 'Corrupted ZFS Data' paragraph:

Data corruption is always permanent and requires special consideration during repair. Even if the underlying devices are repaired or replaced, the original data is lost forever.

Somewhat disheartening.

However, the second google search result is Max Bruning's weblog and in there, I read

Recently, I was sent an email from someone who had 15 years of video and music stored in a 10TB ZFS pool that, after a power failure, became defective. He unfortunately did not have a backup. He was using ZFS version 6 on FreeBSD 7 [...] After spending about 1 week examining the data on the disk, I was able to restore basically all of it.

and

As for ZFS losing your data, I doubt it. I suspect your data is there, but you need to find the right way to get at it.

(that sounds so much more like something I wanna hear...)

First step: What exactly is the problem ?

How can I diagnose why exactly the zpool is reported as corrupted ? I see there is zdb which doesn't seem to be officially documented by Sun or Oracle anywhere on the web. From its man page:

NAME
       zdb - ZFS debugger

SYNOPSIS
       zdb pool

DESCRIPTION
       The  zdb  command is used by support engineers to diagnose failures and
       gather statistics. Since the ZFS file system is  always  consistent  on
       disk  and is self-repairing, zdb should only be run under the direction
       by a support engineer.

       If no arguments are specified, zdb, performs basic  consistency  checks
       on  the pool and associated datasets, and report any problems detected.

       Any options supported by this command are internal to Sun  and  subject
       to change at any time.

Further, Ben Rockwood has posted a detailed article and there is a video of Max Bruning talking about it (and mdb) at the Open Solaris Developer Conference in Prague on June 28, 2008.

Running zdb as root on the broken zpool gives the following output:

[user@host ~]$ sudo zdb zpool01
    version=6
    name='zpool01'
    state=0
    txg=83216
    pool_guid=16471197341102820829
    hostid=3885370542
    hostname='host.domain'
    vdev_tree
        type='root'
        id=0
        guid=16471197341102820829
        children[0]
                type='raidz'
                id=0
                guid=48739167677596410
                nparity=1
                metaslab_array=14
                metaslab_shift=34
                ashift=9
                asize=2000412475392
                children[0]
                        type='disk'
                        id=0
                        guid=4795262086800816238
                        path='/dev/da5'
                        whole_disk=0
                        DTL=202
                children[1]
                        type='disk'
                        id=1
                        guid=16218262712375173260
                        path='/dev/da6'
                        whole_disk=0
                        DTL=201
                children[2]
                        type='disk'
                        id=2
                        guid=15597847700365748450
                        path='/dev/da7'
                        whole_disk=0
                        DTL=200
                children[3]
                        type='disk'
                        id=3
                        guid=9839399967725049819
                        path='/dev/da8'
                        whole_disk=0
                        DTL=199
        children[1]
                type='raidz'
                id=1
                guid=8910308849729789724
                nparity=1
                metaslab_array=119
                metaslab_shift=34
                ashift=9
                asize=2000412475392
                children[0]
                        type='disk'
                        id=0
                        guid=5438331695267373463
                        path='/dev/da1'
                        whole_disk=0
                        DTL=198
                children[1]
                        type='disk'
                        id=1
                        guid=2722163893739409369
                        path='/dev/da2'
                        whole_disk=0
                        DTL=197
                children[2]
                        type='disk'
                        id=2
                        guid=11729319950433483953
                        path='/dev/da3'
                        whole_disk=0
                        DTL=196
                children[3]
                        type='disk'
                        id=3
                        guid=7885201945644860203
                        path='/dev/da4'
                        whole_disk=0
                        DTL=195
zdb: can't open zpool01: Invalid argument

I suppose the 'invalid argument' error at the end occurs because the zpool01 does not actually exist: It doesn't occur on the working zpool02, but there doesn't seem to be any further output either...

OK, at this stage, it is probably better to post this before the article gets too long.

Maybe someone can give me some advice on how to move forward from here and while I'm waiting for a response, I'll watch the video, go through the details of the zdb output above, read Bens article and try to figure out what's what...


20110806-1600+1000

Update 01:

I think I have found the root cause: Max Bruning was kind enough to respond to an email of mine very quickly, asking for the output of zdb -lll. On any of the 4 hard drives in the 'good' raidz1 half of the pool, the output is similar to what I posted above. However, on the first 3 of the 4 drives in the 'broken' half, zdb reports `failed to unpack label` for labels 2 and 3. The fourth drive in the pool seems OK; zdb shows all labels.
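
(As an illustration only - the exact invocation is not quoted in the post, but zdb can read the labels straight off a device, so the check presumably looked something like this for each disk:)

$ sudo zdb -lll /dev/da5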

Googling that error message brings up this post. From the first response to that post:

With ZFS, there are 4 identical labels on each physical vdev, in this case a single hard drive. L0/L1 at the start of the vdev, and L2/L3 at the end of the vdev.

All 8 drives in the pool are of the same model, Seagate Barracuda 500GB. However, I do remember I started the pool with 4 drives, then one of them died and was replaced under warranty by Seagate. Later on, I added another 4 drives. For that reason, the drive and firmware identifiers are different:

[user@host ~]$ dmesg | egrep '^da.*?: <'
da0:  <VMware, VMware Virtual S 1.0> Fixed Direct Access SCSI-2 device 
da1:  <ATA ST3500418AS CC37> Fixed Direct Access SCSI-5 device 
da2:  <ATA ST3500418AS CC37> Fixed Direct Access SCSI-5 device 
da3:  <ATA ST3500418AS CC37> Fixed Direct Access SCSI-5 device 
da4:  <ATA ST3500418AS CC37> Fixed Direct Access SCSI-5 device 
da5:  <ATA ST3500320AS SD15> Fixed Direct Access SCSI-5 device 
da6:  <ATA ST3500320AS SD15> Fixed Direct Access SCSI-5 device 
da7:  <ATA ST3500320AS SD15> Fixed Direct Access SCSI-5 device 
da8:  <ATA ST3500418AS CC35> Fixed Direct Access SCSI-5 device 
da9:  <ATA SAMSUNG HM160JC AP10> Fixed Direct Access SCSI-5 device 
da10: <ATA SAMSUNG HM160JC AP10> Fixed Direct Access SCSI-5 device 
da11: <ATA SAMSUNG HM160JC AP10> Fixed Direct Access SCSI-5 device 
da12: <ATA SAMSUNG HM160JC AP10> Fixed Direct Access SCSI-5 device 

I do remember though that all drives had the same size. Looking at the drives now, it shows that the size has changed for three of them, they have shrunk by 2 MB:

[user@host ~]$ dmesg | egrep '^da.*?: .*?MB '
da0:   10240MB (20971520  512 byte sectors: 255H 63S/T 1305C)
da1:  476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)
da2:  476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)
da3:  476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)
da4:  476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)
da5:  476938MB (976771055 512 byte sectors: 255H 63S/T 60801C) <--
da6:  476938MB (976771055 512 byte sectors: 255H 63S/T 60801C) <--
da7:  476938MB (976771055 512 byte sectors: 255H 63S/T 60801C) <--
da8:  476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)
da9:  152627MB (312581808 512 byte sectors: 255H 63S/T 19457C)
da10: 152627MB (312581808 512 byte sectors: 255H 63S/T 19457C)
da11: 152627MB (312581808 512 byte sectors: 255H 63S/T 19457C)
da12: 152627MB (312581808 512 byte sectors: 255H 63S/T 19457C)

So by the looks of it, it was not one of the OS installations that 'wrote a bootloader to one the drives' (as I had assumed before), it was actually the new motherboard (an ASUS P8P67 LE) creating a 2 MB host protected area at the end of three of the drives which messed up my ZFS metadata.

Why did it not create an HPA on all drives? I believe this is because HPA creation is only done on older drives with a bug that was later fixed by a Seagate hard drive BIOS update: when this entire incident began a couple of weeks ago, I ran Seagate's SeaTools to check if there was anything physically wrong with the drives (still on the old hardware) and I got a message telling me that some of my drives needed a BIOS update. As I am now trying to reproduce the exact details of that message and the link to the firmware update download, it seems that since the motherboard created the HPA, both SeaTools DOS versions fail to detect the hard drives in question - a quick 'invalid partition' message or something similar flashes by when they start, and that's it. Ironically, they do find a set of Samsung drives, though.

(I've skipped the painful, time-consuming and ultimately fruitless details of screwing around in a FreeDOS shell on a non-networked system.) In the end, I installed Windows 7 on a separate machine in order to run SeaTools Windows version 1.2.0.5. Just a last remark about DOS SeaTools: don't bother trying to boot them standalone - instead, invest a couple of minutes and make a bootable USB stick with the awesome Ultimate Boot CD - which apart from DOS SeaTools also gets you many many other really useful tools.

When started, SeaTools for Windows brings up this dialog:

[Screenshot: SeaTools firmware update dialog]

The links lead to the Serial Number Checker (which for some reason is protected by a captcha - mine was 'Invasive users') and a knowledge base article about the firmware update. There are probably further links specific to the hard drive model and some downloads and whatnot, but I won't follow that path for the moment:

I won't rush into updating the firmware of three drives at a time that have truncated partitions and are part of a broken storage pool. That's asking for trouble. For starters, the firmware update most likely can not be undone - and that might irrevocably ruin my chances to get my data back.

Therefore, the very first thing I'm going to do next is image the drives and work with the copies, so there's an original to go back to if anything goes wrong. This might introduce an additional complexity, as ZFS will probably notice that drives were swapped (by means of the drive serial number or yet another UUID or whatever), even though it's bit-exact dd copies onto the same hard drive model. Moreover, the zpool is not even live. Boy, this might get tricky.

The other option however would be to work with the originals and keep the mirrored drives as backup, but then I'll probably run into above complexity when something went wrong with the originals. Naa, not good.

In order to clear out the three hard drives that will serve as imaged replacements for the three drives with the buggy BIOS in the broken pool, I need to create some storage space for the stuff that's on there now, so I'll dig deep in the hardware box and assemble a temporary zpool from some old drives - which I can also use to test how ZFS deals with swapping dd'd drives.

This might take a while...


20111213-1930+1100

Update 02:

This did take a while indeed. I've spent months with several open computer cases on my desk with various amounts of harddrive stacks hanging out and also slept a few nights with earplugs, because I could not shut down the machine before going to bed as it was running some lengthy critical operation. However, I prevailed at last! :-) I've also learned a lot in the process and I would like to share that knowledge here for anyone in a similar situation.

This article is already much longer than anyone with a ZFS file server out of action has the time to read, so I will go into details here and create an answer with the essential findings further below.

I dug deep in the obsolete hardware box to assemble enough storage space to move the stuff off the single 500GB drives to which the defective drives were mirrored. I also had to rip a few hard drives out of their USB cases, so I could connect them over SATA directly. There were some more, unrelated issues involved and some of the old drives started to fail when I put them back into action, requiring a zpool replace, but I'll skip that.

Tip: At some stage, there was a total of about 30 hard drives involved in this. With that much hardware, it is an enormous help to have them stacked properly; cables coming loose or a hard drive falling off your desk surely won't help in the process and might cause further damage to your data integrity.

I spent a couple of minutes creating some make-shift cardboard hard drive fixtures which really helped to keep things sorted:

[Photos: some of the make-shift storage space - just a bunch of screws plus some cardboard; the fan is not exactly required, the stack is from an earlier project; the distance pieces aren't required either...]

Ironically, when I connected the old drives the first time, I realized there was an old zpool on there that I must have created for testing, containing an older version of some, but not all, of the personal data that had gone missing - so while the data loss was somewhat reduced, this meant additional shifting back and forth of files.

Finally, I mirrored the problematic drives to backup drives, used those for the zpool and left the original ones disconnected. The backup drives have a newer firmware, at least SeaTools does not report any required firmware updates. I did the mirroring with a simple dd from one device to the other, e.g.

sudo dd if=/dev/sda of=/dev/sde
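
(Not in the original post: with GNU dd a commonly used variant raises the block size for speed and keeps going past read errors while padding them, preserving offsets; ddrescue is usually a better fit for genuinely failing drives.)

sudo dd if=/dev/sda of=/dev/sde bs=1M conv=noerror,sync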

I believe ZFS does notice the hardware change (by some hard drive UUID or whatever), but doesn't seem to care.

The zpool however was still in the same state, insufficient replicas / corrupted data.

As mentioned in the HPA Wikipedia article referenced earlier, the presence of a host protected area is reported when Linux boots and can be investigated using hdparm. As far as I know, there is no hdparm tool available on FreeBSD, but by this time I had FreeBSD 8.2 and Debian 6.0 installed as a dual-boot system anyway, so I booted into Linux:

user@host:~$ for i in {a..l}; do sudo hdparm -N /dev/sd$i; done

   ...
/dev/sdd:
 max sectors   = 976773168/976773168, HPA is disabled
/dev/sde:
 max sectors   = 976771055/976773168, HPA is enabled
/dev/sdf:
 max sectors   = 976771055/976773168, HPA is enabled
/dev/sdg:
 max sectors   = 976771055/976773168, HPA is enabled
/dev/sdh:
 max sectors   = 976773168/976773168, HPA is disabled
   ...

So the problem obviously was that the new motherboard created an HPA of a couple of megabytes at the end of the drive which 'hid' the upper two ZFS labels, i.e. prevented ZFS from seeing them.


Dabbling with the HPA seems a dangerous business. From the hdparm man page, parameter -N:

Get/set max visible number of sectors, also known as the Host Protected Area setting.
  ...
To change the current max (VERY DANGEROUS, DATA LOSS IS EXTREMELY LIKELY), a new value
should be provided (in base10) immediately following the -N option.
This value is specified as a count of sectors, rather than the "max sector address"
of the drive. Drives have the concept of a temporary (volatile) setting which is lost on
the next hardware reset, as well as a more permanent (non-volatile) value which survives
resets and power cycles.  By default, -N affects only the temporary (volatile) setting.
To change the permanent (non-volatile) value, prepend a leading p character immediately
before the first digit of the value. Drives are supposed to allow only a single permanent
change per session. A hardware reset (or power cycle) is required before another
permanent -N operation can succeed.
  ...

In my case, the HPA is removed like this:

user@host:~$ sudo hdparm -Np976773168 /dev/sde

/dev/sde:
 setting max visible sectors to 976773168 (permanent)
 max sectors   = 976773168/976773168, HPA is disabled

and in the same way for the other drives with an HPA. If you get the wrong drive, or something about the size parameter you specify is not plausible, hdparm is smart enough to figure it out:

user@host:~$ sudo hdparm -Np976773168 /dev/sdx

/dev/sdx:
 setting max visible sectors to 976773168 (permanent)
Use of -Nnnnnn is VERY DANGEROUS.
You have requested reducing the apparent size of the drive.
This is a BAD idea, and can easily destroy all of the drive's contents.
Please supply the --yes-i-know-what-i-am-doing flag if you really want this.
Program aborted.

After that, I restarted the FreeBSD 7.2 virtual machine on which the zpool had been originally created and zpool status reported a working pool again. YAY! :-)

I exported the pool on the virtual system and re-imported it on the host FreeBSD 8.2 system.

Some more major hardware upgrades, another motherboard swap, a ZFS pool update to ZFS 4 / 15, a thorough scrubbing and now my zpool consists of 8x1TB plus 8x500GB raidz2 parts:

[user@host ~]$ sudo zpool status
  pool: zpool
 state: ONLINE
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
zpool       ONLINE       0     0     0
  raidz2    ONLINE       0     0     0
    ad0     ONLINE       0     0     0
    ad1     ONLINE       0     0     0
    ad2     ONLINE       0     0     0
    ad3     ONLINE       0     0     0
    ad8     ONLINE       0     0     0
    ad10    ONLINE       0     0     0
    ad14    ONLINE       0     0     0
    ad16    ONLINE       0     0     0
  raidz2    ONLINE       0     0     0
    da0     ONLINE       0     0     0
    da1     ONLINE       0     0     0
    da2     ONLINE       0     0     0
    da3     ONLINE       0     0     0
    da4     ONLINE       0     0     0
    da5     ONLINE       0     0     0
    da6     ONLINE       0     0     0
    da7     ONLINE       0     0     0

errors: No known data errors

[user@host ~]$ df -h
Filesystem         Size    Used   Avail Capacity  Mounted on
/dev/label/root     29G     13G     14G    49%    /
devfs              1.0K    1.0K      0B   100%    /dev
zpool              8.0T    3.6T    4.5T    44%    /mnt/zpool

As a last word, it seems to me ZFS pools are very, very hard to kill. The guys from Sun who created that system have every reason to call it the last word in filesystems. Respect!


Source: (StackOverflow)


Why is Linux more popular than BSD? [closed]

Some would argue that BSD/Unix has always been more reliable and stable than Linux (not me, of course, don't hurt me!). Why does Linux always seem to beat BSD? Is it the romance of the Linux story? I don't intend to offend anyone, please don't take offense. Also, please be thoughtful and polite in your response.


Source: (StackOverflow)

OpenBSD, FreeBSD: your update philosophy?

I've used FreeBSD for about 5 years - server/Desktop - and I've tended to take my apt-get/yum upgrade everything habits along with me ( I admin Debian/RHEL/Cent boxes as well -- I know, I know ...should be more discerning regardless of platform ). So it's usually a:

portsnap fetch
portsnap update
portmanager -u

For the ports

Sometimes followed by a:

freebsd-update fetch
freebsd-update install

For the system ...etc. Then just clean up any messes afterwards ...if they occur.

This, I realize, is a fairly excessive un-BSD way to do things. What is your philosophy for your BSD boxes? Do you run a portaudit/portversion -- check output then update (make deinstall ...etc) after careful consideration?

I'm fairly new to OpenBSD, I confess. I see myself cvsupping the ports tree, running the "out of date" script, then just upgrading critical ports --- but leaving the kernel/binaries alone and just upgrading every six months. Do you patch/recompile/rebuild kernel, binaries --- why?

What's a conservative approach for critical services ( reasonably critical -- this ain't no bank or hospital ) on BSD boxes? Are you using a similar approach on your Linux boxes? I generally don't touch the kernel on any servers unless a security alert has stricken terror into my soul.

Yeah, there's docs and books galore -- what do you people actually do? Assuming we know the basics -- what's the wisdom? Use cases/environments and scenarios vary, as do the stakes/stakeholders/users. Books and man pages cover tools and uses, but lack practical application. Recommend a book if you know of one that covers it!

Thanks for reading!

Bubnoff

Conclusions ~ Thanks to everyone who took the time to answer this post. My strategy overall is now to follow the mailing lists for both BSDs and be more selective/discerning with updating than I have been in the past.

FreeBSD ~ Portaudit is a good answer. With the mailing lists and diligent audits, I think this will serve well here. The different emphasis on ports between OpenBSD and FreeBSD is interesting.

OpenBSD ~ Will follow the mailing list and use the package tools ( pkg_info and pkg_add -u ) where deemed critical. Upgrades: Looks like you need to upgrade at least once a year. They support the newest release plus one back - so right now it's 4.8 and 4.7.

Thanks again.


Source: (StackOverflow)

FreeBSD jails vs virtualization performance

I'll soon buy myself a nice server (something like the quad-cpu HP DL585) for personal use (home related stuff, data-mining projects, web server and some cron jobs), and I wonder how to separate all those tasks into dedicated environments.

I'm used to FreeBSD & Linux server administration and virtualization setups (Xen & VmWare), so virtualization came quite naturally. But as I plan to use FreeBSD, I can also use jails (I'm not interested in running any other OS than FreeBSD).

But I lack experience with FreeBSD jails and never planned to use them before. So I'm looking for some feedback from experience, and perhaps for someone with experience in both domains who could advise me on which one would be best suited to my humble needs.


Source: (StackOverflow)

Freebsd change default Internet channel route

I have two Internet channels and gateways on FreeBSD. When I switch channels with the command route change default chan2, the command netstat -nr shows the changed default route. But traceroute shows that the packets still go through the old route, chan1.

Example:

$ netstat -nr
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            xxx.xxx.183.54     US          0     8432    em3

$ sudo route change default xxx.xxx.144.125
change net default: gateway xxx.xxx.144.125

$ netstat -nr
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            xxx.xxx.144.125    US          2    16450    em3

BUT

$ traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 64 hops max, 52 byte packets
 1  xxx.xxx.183.53 (xxx.xxx.183.53)  0.527 ms  0.415 ms  0.483ms

Everything works if I run the following combination:

$sudo route del default

$sleep 10

$sudo route add default xxx.xxx.144.125


Source: (StackOverflow)

Why can't I ping an address on the loopback device under FreeBSD?

From Wikipedia:

The most commonly used IP address on the loopback device is 127.0.0.1 for IPv4, although any address in the range 127.0.0.0 to 127.255.255.255 is mapped to it.

This is not true, at least on FreeBSD:

$ ping 127.1.1.1
PING 127.1.1.1 (127.1.1.1): 56 data bytes
ping: sendto: Can't assign requested address

Is this correct behaviour?


Source: (StackOverflow)

Dovecot user lookup fails when using username@domain format

I have Dovecot v2.0.11 installed on a FreeBSD server and user lookups for incoming email addresses are failing, but lookups for system users are successful.

Dovecot is setup to use system users, so my dovecot.conf has

 userdb {
  driver = passwd
}

and

passdb {
  driver = passwd
}

I have auth debug enabled.

For example, I have a user called webmaster, and using doveadm user for "webmaster" works as follows:

#doveadm user webmaster
userdb: webmaster
  system_groups_user: webmaster
  uid       : 1020
  gid       : 1020
  home      : /home/webmaster

However using doveadm user to lookup webmaster@myregisteredname.com fails as follows:

# doveadm user webmaster@myregisteredname.com
userdb lookup: user webmaster@myregisteredname.com doesn't exist

This is resulting in incoming mail for webmaster@myregisteredname.com to bounce with an "unknown user" error.

Here's the failure logged in /var/log/maillog:

Apr 16 20:13:35 www dovecot: auth: passwd(webmaster@myregisteredname.com): unknown user

Here's the failure logged in /var/log/debug.log:

Apr 16 20:13:35 www dovecot: auth: Debug: master in: USER       1       webmaster@myregisteredname.com  service=doveadm
Apr 16 20:13:35 www dovecot: auth: Debug: passwd(webmaster@myregisteredname.com): lookup
Apr 16 20:13:35 www dovecot: auth: Debug: master out: NOTFOUND  1

The users and their home directories were imported from another server and the users were set up using the vipw tool. I'm sure there's something I missed on the import that's not "linking" the system user with the Dovecot lookup.

Any ideas about what that something may be?

EDIT: Using BillThor's advice, I updated dovecot.conf as follows:

#doveconf -n passdb userdb
passdb {
  args = username_format=%n
  driver = passwd
}
userdb {
  args = username_format=%n
  driver = passwd
}

However, now, doveadm user fails in a different fashion:

#doveadm user webmaster@pantronx.com
doveadm(root): Error: userdb lookup(webmaster@myregisteredname.com): Disconnected unexpectedly
doveadm(root): Fatal: userdb lookup failed for webmaster@myregisteredname.com

And, it no longer works for users without a domain:

#doveadm user webmaster
doveadm(root): Error: userdb lookup(webmaster): Disconnected unexpectedly
doveadm(root): Fatal: userdb lookup failed for webmaster

When I get the above messages, the following is in /var/log/maillog:

Apr 17 17:30:02 www dovecot: auth: Fatal: passdb passwd: Unknown setting: username_format=%u
Apr 17 17:30:02 www dovecot: master: Error: service(auth): command startup failed, throttling

Source: (StackOverflow)

FreeBSD: How to know real file size on zfs with compression on?

I'm using ZFS on my FreeBSD 9.0 x64 system and I'm pretty happy with it, but I find it hard to determine a directory's real (uncompressed) size.

Surely I can walk over the directory and add up every file's size with ls, but I'd expect some extra flag for du for that purpose.

So, how can I tell the size of a directory placed on ZFS with compression on?

Thank you in advance for the advice; I simply can't believe there is no 'simple' way, without 'find ./ -type d -exec ls -l '{}' \; | awk ...'!


Source: (StackOverflow)

Need to replace disk in zpool ... confused

I need to replace a bad disk in a zpool on FreeNAS.

zpool status shows

  pool: raid-5x3
 state: ONLINE
 scrub: scrub completed after 15h52m with 0 errors on Sun Mar 30 13:52:46 2014
config:

    NAME                                            STATE     READ WRITE CKSUM
    raid-5x3                                        ONLINE       0     0     0
      raidz1                                        ONLINE       0     0     0
        ada5p2                                      ONLINE       0     0     0
        gptid/a767b8ef-1c95-11e2-af4c-f46d049aaeca  ONLINE       0     0     0
        ada8p2                                      ONLINE       0     0     0
        ada10p2                                     ONLINE       0     0     0
        ada7p2                                      ONLINE       0     0     0

errors: No known data errors

  pool: raid2
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

    NAME                                            STATE     READ WRITE CKSUM
    raid2                                           DEGRADED     0     0     0
      raidz1                                        DEGRADED     0     0     0
        gptid/5f3c0517-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0
        gptid/5fe33556-3ff2-11e2-9437-f46d049aaeca  UNAVAIL      0     0     0  cannot open
        gptid/60570005-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0
        gptid/60ebeaa5-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0
        gptid/61925b86-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0

errors: No known data errors

glabel status shows

                                      Name  Status  Components
                             ufs/FreeNASs3     N/A  da0s3
                             ufs/FreeNASs4     N/A  da0s4
                    ufsid/4fa405ab96518680     N/A  da0s1a
                            ufs/FreeNASs1a     N/A  da0s1a
                            ufs/FreeNASs2a     N/A  da0s2a
gptid/5f3c0517-3ff2-11e2-9437-f46d049aaeca     N/A  ada1p2
gptid/60570005-3ff2-11e2-9437-f46d049aaeca     N/A  ada3p2
gptid/60ebeaa5-3ff2-11e2-9437-f46d049aaeca     N/A  ada4p2
gptid/a767b8ef-1c95-11e2-af4c-f46d049aaeca     N/A  ada6p2
gptid/61925b86-3ff2-11e2-9437-f46d049aaeca     N/A  ada9p2
gptid/4599731b-8f15-11e1-a14c-f46d049aaeca     N/A  ada10p2

camcontrol devlist shows

<Hitachi HDS723030BLE640 MX6OAAB0>  at scbus0 target 0 lun 0 (pass0,ada0)
<ST3000VX000-9YW166 CV13>          at scbus4 target 0 lun 0 (pass1,ada1)
<ST3000VX000-9YW166 CV13>          at scbus6 target 0 lun 0 (pass3,ada3)
<Hitachi HDS723030BLE640 MX6OAAB0>  at scbus7 target 0 lun 0 (pass4,ada4)
<ST3000DM001-9YN166 CC4C>          at scbus8 target 0 lun 0 (pass5,ada5)
<WDC WD30EZRX-00MMMB0 80.00A80>    at scbus8 target 1 lun 0 (pass6,ada6)
<WDC WD30EZRX-00MMMB0 80.00A80>    at scbus9 target 0 lun 0 (pass7,ada7)
<ST3000DM001-9YN166 CC4C>          at scbus9 target 1 lun 0 (pass8,ada8)
<Hitachi HDS723030BLE640 MX6OAAB0>  at scbus10 target 0 lun 0 (pass9,ada9)
<Hitachi HDS5C3030ALA630 MEAOA580>  at scbus11 target 0 lun 0 (pass10,ada10)
< USB Flash Memory 1.00>           at scbus12 target 0 lun 0 (pass11,da0)

I'm pretty sure that ada2 is the bad disk.

It appears that I left a spare in there - ada0 - the last time I was in the box. Can I replace ada2 with ada0 remotely, before anyone gets to the office, and if so, with what commands?

Here's what I don't understand:

  1. Why don't ada0, ada2, ada5, ada7, and ada8 appear in glabel status?
  2. Why does zpool status show those long gptids for some disks, and "ada" names for others?
  3. If I want to run zpool replace on raid2, what do I use for the device and new-device names? (A rough sketch follows after this list.)
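
Not part of the original question, but for orientation, a minimal sketch of what the replacement typically looks like from the command line, assuming ada0 really is an unused spare and reusing the UNAVAIL gptid from the zpool status output above. FreeNAS normally drives this from its GUI, and the 2 GB swap partition size is only an assumption mirroring the usual FreeNAS disk layout.

# Partition the spare the way FreeNAS lays out data disks (swap + ZFS)
gpart create -s gpt ada0
gpart add -t freebsd-swap -s 2G ada0
gpart add -t freebsd-zfs ada0

# Replace the missing member, identified by the gptid shown in zpool status
zpool replace raid2 gptid/5fe33556-3ff2-11e2-9437-f46d049aaeca ada0p2

# Watch the resilver progress
zpool status raid2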

Source: (StackOverflow)

Recommendations on ZFS on FreeBSD as a NAS box?

Please note that the answers on this page are from 2009 and should not be relied on as gospel. If you have a specific question, please ask it as a new, specific question.

I was thinking of building a home backup system using FreeBSD 7.2 and the ZFS file system. Has anyone had any experience with that file system?

Specifically:

  • Is it possible to boot from ZFS? (Would I want to?)
  • How easy is it to add a drive? (See the sketch below.)
  • How well does it handle drives of different sizes?
  • Can you add new drives on the fly (or at least with just a reboot)?
  • Would I be better served by something off the shelf?

Any other thoughts and suggestions would be welcome.
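
On the "add a drive" items above, a minimal sketch with assumed pool and device names (tank, ada1, ada4-ada6): vdevs can be added to a live pool without a reboot, and a single disk can later be turned into a mirror, but mixing sizes inside one vdev limits it to the smallest member.

# Add a new mirrored vdev to a running pool, no reboot needed
zpool add tank mirror ada4 ada5

# Or attach a disk to an existing single-disk vdev (ada1) to make it a mirror
zpool attach tank ada1 ada6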

Edit:

Just to be clear: I have read the FreeBSD page on ZFS. I am looking for suggestions from people with practical experience of a setup similar to the one I want.


Source: (StackOverflow)

FreeBSD link aggregation no faster than single link

We put a 4 port Intel I340-T4 NIC in a FreeBSD 9.3 server [1] and configured it for link aggregation in LACP mode in an attempt to decrease the time it takes to mirror 8 to 16 TiB of data from a master file server to 2-4 clones in parallel. We were expecting to get up to 4 Gbit/sec aggregate bandwidth, but no matter what we've tried, it never comes out faster than 1 Gbit/sec aggregate. [2]
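
The question doesn't show the lagg configuration itself; on FreeBSD 9.x it would typically be something like the following in /etc/rc.conf (the interface names and the 10.0.0.2 address are taken from the ifconfig output further down; the netmask is spelled out from the 0xffffff00 shown there):

ifconfig_igb0="up"
ifconfig_igb1="up"
ifconfig_igb2="up"
ifconfig_igb3="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 laggport igb2 laggport igb3 10.0.0.2 netmask 255.255.255.0"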

We're using iperf3 to test this on a quiescent LAN. [3] The first instance nearly hits a gigabit, as expected, but when we start a second one in parallel, the two clients drop in speed to roughly ½ Gbit/sec. Adding a third client drops all three clients' speeds to ~⅓ Gbit/sec, and so on.
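
The exact iperf3 invocations aren't given in the question; a typical run for this kind of test would look roughly like this (the address matches lagg0 below, the duration is arbitrary):

# On the FreeBSD server behind lagg0
iperf3 -s

# On each test client, started in parallel
iperf3 -c 10.0.0.2 -t 30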

We've taken care to set up the iperf3 tests so that traffic from all four test clients comes into the central switch on different ports:

[Diagram: LACP test setup]

By breaking up the lagg0 group and assigning a separate IP address to each of the four interfaces on this Intel network card, we've verified that each test machine has an independent path back to the rack switch, and that the file server, its NIC, and the switch all have the bandwidth to pull this off. In that configuration, we did achieve ~4 Gbit/sec aggregate bandwidth.

When we started down this path, we were doing this with an old SMC8024L2 managed switch. (PDF datasheet, 1.3 MB.) It wasn't the highest-end switch of its day, but it's supposed to be able to do this. We thought the switch might be at fault, due to its age, but upgrading to a much more capable HP 2530-24G did not change the symptom.

The HP 2530-24G switch claims the four ports in question are indeed configured as a dynamic LACP trunk:

# show trunks
Load Balancing Method:  L3-based (default)

  Port | Name                             Type      | Group Type    
  ---- + -------------------------------- --------- + ----- --------
  1    | Bart trunk 1                     100/1000T | Dyn1  LACP    
  3    | Bart trunk 2                     100/1000T | Dyn1  LACP    
  5    | Bart trunk 3                     100/1000T | Dyn1  LACP    
  7    | Bart trunk 4                     100/1000T | Dyn1  LACP    

We've tried both passive and active LACP.

We've verified that all four NIC ports are getting traffic on the FreeBSD side with:

$ sudo tshark -n -i igb$n

Oddly, tshark shows that in the case of just one client, the switch splits the 1 Gbit/sec stream over two ports, apparently ping-ponging between them. (Both the SMC and HP switches showed this behavior.)

Since the clients' aggregate bandwidth only comes together in a single place — at the switch in the server's rack — only that switch is configured for LACP.

It doesn't matter which client we start first, or which order we start them in.

ifconfig lagg0 on the FreeBSD side says:

lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
    ether 90:e2:ba:7b:0b:38
    inet 10.0.0.2 netmask 0xffffff00 broadcast 10.0.0.255
    inet6 fe80::92e2:baff:fe7b:b38%lagg0 prefixlen 64 scopeid 0xa 
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    media: Ethernet autoselect
    status: active
    laggproto lacp lagghash l2,l3,l4
    laggport: igb3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: igb2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

We've applied as much of the advice in the FreeBSD network tuning guide as makes sense to our situation. (Much of it is irrelevant, such as the stuff about increasing max FDs.)

We've tried turning off TCP segmentation offloading, with no change in the results.
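
For reference, turning TSO off on FreeBSD is a per-interface ifconfig flag; a sketch for the four lagg members:

# Disable TCP segmentation offload on each lagg member
ifconfig igb0 -tso
ifconfig igb1 -tso
ifconfig igb2 -tso
ifconfig igb3 -tso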

We do not have a second 4-port server NIC to set up a second test. Because of the successful test with 4 separate interfaces, we're going on the assumption that none of the hardware is damaged. [4]

We see these paths forward, none of them appealing:

  1. Buy a bigger, badder switch, hoping that SMC's LACP implementation just sucks, and that the new switch will be better. (Upgrading to an HP 2530-24G didn't help.)

  2. Stare at the FreeBSD lagg configuration some more, hoping that we missed something. [5]

  3. Forget link aggregation and use round-robin DNS to effect the load balancing instead.

  4. Replace the server NIC and switch again, this time with 10 GigE stuff, at about 4× the hardware cost of this LACP experiment.


Footnotes

  1. Why not move to FreeBSD 10, you ask? Because FreeBSD 10.0-RELEASE still uses ZFS pool version 28, and this server's been upgraded to ZFS pool 5000, a new feature in FreeBSD 9.3. The 10.x line won't get that until FreeBSD 10.1 ships about a month hence. And no, rebuilding from source to get onto the 10.0-STABLE bleeding edge isn't an option, since this is a production server.

  2. Please don't jump to conclusions. Our test results later in the question tell you why this is not a duplicate of this question.

  3. iperf3 is a pure network test. While the eventual goal is to try and fill that 4 Gbit/sec aggregate pipe from disk, we are not yet involving the disk subsystem.

  4. Buggy or poorly designed, maybe, but no more broken than when it left the factory.

  5. I've already gone cross-eyed from doing that.


Source: (StackOverflow)

How can I get FreeNAS and pfSense to report more using SNMP?

I have two servers which run variants of FreeBSD: One is a pfSense router and one is a FreeNAS 8 server. Both these servers run SNMP, and I am collecting and graphing their information using a third Cacti server.

The SNMP data from both the pfSense server and the FreeNAS server does not include memory usage, CPU usage, or load average.

The traffic graphs for the pfSense server look fine. The disk usage reports from the FreeNAS server look beautiful. I just don't get any data for memory usage, CPU usage, or load average. I know both these servers should be capable of providing this data, because I can view such graphs in the pfSense and FreeNAS web admin interfaces. But I would prefer to have all the graphs in Cacti for ease of management.

How can I get my pfSense server to provide memory usage, CPU usage, and load average data via SNMP? How can I get my FreeNAS server to do the same? I assume the same procedure will work for both servers.
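
A hedged sketch of the usual direction: the stock bsnmpd on FreeBSD-based systems does not expose the UCD-SNMP (net-snmp style) memory/CPU/load tables that Cacti's default host templates poll, so people typically add net-snmp or the bsnmp-ucd module. Whether that is the right fix here is an assumption, but it is easy to check with snmpwalk (the community string and address are placeholders):

# Does the UCD-SNMP load-average table answer at all?
snmpwalk -v 2c -c public 192.168.1.1 .1.3.6.1.4.1.2021.10

# With the bsnmp-ucd port installed, its module is loaded from /etc/snmpd.config, e.g.:
# begemotSnmpdModulePath."ucd" = "/usr/local/lib/snmp_ucd.so"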


Source: (StackOverflow)

Can I trust that ZFS is consistent between Linux and FreeBSD?

I'm planning to build a FreeNAS box sometime soon, but if ZFS on Linux eventually proves to be reliable, I might want to switch, just to have a more familiar OS.

So I'm wondering if I can trust that the different implementations of ZFS are compatible. In other words, if I just swap out the boot disk from FreeNAS to Linux or OpenIndiana, can I trust that nothing bad will happen to my data?

This may seem like a stupid question--obviously it ought to be compatible--but I'm guessing that ZFS isn't commonly used in cases where drives are moved between computers, so I'm hoping someone can provide a better answer than just "it ought to be".
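
Whatever the final verdict on compatibility, the mechanical part of the move is just export/import, and the pool version (or feature flags) is what decides whether the other implementation can read it. A sketch with an assumed pool name of tank:

# Check what the pool requires vs. what this host supports
zpool get version tank     # '-' means a feature-flag pool rather than a legacy version
zpool upgrade -v           # versions/features this implementation understands

# On the old system, before moving the disks
zpool export tank

# On the new system
zpool import tank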


Source: (StackOverflow)

Best practices to block social sites

In our company we have around 100 workstations with Internet access, and day by day the situation gets worse: people use the connection for personal business and waste time on social sites.

Open-hearted as I am, I don't like blocking sites like Facebook, YouTube, and the like, but day by day my colleagues fail to finish their tasks, and any time I look at their monitors they have Internet Explorer or Mozilla Firefox open with chat and similar things. On top of that, I would like to block YouTube when our Internet access speed is very low.

Here are my questions:

  • Do other companies block social sites?
  • Do I need a dedicated device for that, like a hardware firewall or a super expensive router? Or can I do it with my existing self-built FreeBSD 6.1 router, which has two LAN cards and NAT configured so it acts as a router?

I was trying to do that using ipfw as a router firewall, but without success. My rules look like this:

ipfw add 25 deny tcp from 192.168.0.0/20 to www.facebook.com
ipfw add 25 deny udp from 192.168.0.0/20 to www.facebook.com
ipfw add 25 deny tcp from 192.168.0.0/20 to www.dernek.
ipfw add 25 deny udp from 192.168.0.0/20 to www.dernek.
ipfw add 25 deny tcp from 192.168.0.0/20 to www.youtube.com
ipfw add 25 deny udp from 192.168.0.0/20 to www.youtube.com

What can I do to fix this problem?
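
Not from the source, but worth noting why rules like the above tend to fail: ipfw resolves a hostname to a single address only when the rule is loaded, while sites like Facebook and YouTube answer from many, constantly changing addresses. A table-based sketch follows (the table number and the address ranges are only illustrative); a filtering proxy such as Squid with a blacklist is usually the more robust fix.

# Collect the networks to block in a table, then deny the whole table
ipfw table 1 add 31.13.64.0/18
ipfw table 1 add 173.252.64.0/18
ipfw add 25 deny ip from 192.168.0.0/20 to "table(1)"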


Source: (StackOverflow)