Linux networking interview questions
Top frequently asked Linux networking interview questions
I create an open ad-hoc WLAN using iwconfig (I have the same issue with wpa_supplicant as well). There are 4 nodes on the network, as shown in the figure below. The nodes run Ubuntu 12.04 and Debian Squeeze, with 3.7.1, 3.5 and 3.2 kernels. I use two different USB dongle brands (TP-Link and ZCN) that all have the AR9271 chipset and the ath9k_htc driver (here is the lsusb output and the ethtool output).
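For reference, an open ad-hoc setup with iwconfig typically looks roughly like the following; the interface name, ESSID, channel and address are illustrative assumptions, not my exact configuration:
$ sudo ip link set wlan0 down
$ sudo iwconfig wlan0 mode ad-hoc essid mymesh channel 1   # join/create the ad-hoc cell
$ sudo ip link set wlan0 up
$ sudo ip addr add 10.0.0.2/24 dev wlan0                   # static address, one per node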
The problem I am experiencing is that the two nodes (10.0.0.2 and 10.0.0.5) which have TP-Link USB Wi-Fi dongles can ping any node on the network, and vice versa. However, the other nodes (10.0.0.6 and 10.0.0.7), which have the ZCN Wi-Fi dongles, cannot ping each other, although they have no problem communicating with the TP-Link Wi-Fi modules. tcpdump shows that 10.0.0.6 and 10.0.0.7 cannot see each other's ARP requests, e.g.
20:37:52.470305 ARP, Request who-has 10.0.0.7 tell 10.0.0.6, length 28
20:37:53.463713 ARP, Request who-has 10.0.0.7 tell 10.0.0.6, length 28
20:37:54.463622 ARP, Request who-has 10.0.0.7 tell 10.0.0.6, length 28
20:37:55.472868 ARP, Request who-has 10.0.0.7 tell 10.0.0.6, length 28
20:37:56.463439 ARP, Request who-has 10.0.0.7 tell 10.0.0.6, length 28
20:37:57.463469 ARP, Request who-has 10.0.0.7 tell 10.0.0.6, length 28
but they are able to see and get replies from the TP-Link modules:
20:39:23.634459 ARP, Request who-has 10.0.0.2 tell 10.0.0.6, length 28
20:39:23.634551 ARP, Reply 10.0.0.2 is-at 64:70:02:18:d4:6a (oui Unknown), length 28
20:39:23.636687 IP 10.0.0.6 > 10.0.0.2: ICMP echo request, id 572, seq 1, length 64
20:39:23.636809 IP 10.0.0.2 > 10.0.0.6: ICMP echo reply, id 572, seq 1, length 64
20:39:24.635497 IP 10.0.0.6 > 10.0.0.2: ICMP echo request, id 572, seq 2, length 64
20:39:24.635558 IP 10.0.0.2 > 10.0.0.6: ICMP echo reply, id 572, seq 2, length 64
20:39:28.651946 ARP, Request who-has 10.0.0.6 tell 10.0.0.2, length 28
20:39:28.654021 ARP, Reply 10.0.0.6 is-at 00:19:70:94:7c:8b (oui Unknown), length 28
My question is: what could be the reason that 10.0.0.6 and 10.0.0.7 cannot see the ARP requests they send to each other? How can I track down the problem?
If I add a couple more nodes with ZCN Wi-Fi dongles to the network, those nodes are also unable to talk to each other, but they are fine with the TP-Link ones. Likewise, if I swap the Wi-Fi modules around, the nodes with ZCN dongles always have the problem while the TP-Link modules are fine.
Here are the /etc/network/interfaces, ifconfig, iwconfig, ip a, ip r and route outputs.
EDIT: I suspected the problem might be arp_filter related, but /proc/sys/net/ipv4/conf/*/arp_filter is 0 for all interfaces (*). If I add the ARP info of 10.0.0.6 and 10.0.0.7 manually on those nodes, tcpdump and wireshark do not show them sending pings to each other. If I ping the broadcast address (10.0.0.255 in my case), 10.0.0.6 and 10.0.0.7 are able to hear it.
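For reference, those checks can be done roughly as follows (the MAC address below is a placeholder, not the real one from my nodes):
$ grep . /proc/sys/net/ipv4/conf/*/arp_filter    # prints filename:value for every interface
$ sudo arp -s 10.0.0.7 00:19:70:xx:xx:xx         # add a static ARP entry by hand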
EDIT2: Here are the pcap files http://filebin.net/6cle9a5iae from 10.0.0.6 (ZCN module), 10.0.0.7 (ZCN module) and 10.0.0.5 (TP-Link module, which does not have the problem). Here is the ping output from 10.0.0.6: http://pastebin.com/swFP2CJ9 I captured the packets simultaneously. The link also includes the ifconfig, iwconfig and uname -a outputs for each node.
Source: (StackOverflow)
I'm currently having a major problem with e1000e (not working at all) in Ubuntu Maverick (driver 1.0.2-k4); after resume I'm getting a lot of messages in dmesg:
[ 9085.820197] e1000e 0000:02:00.0: PCI INT A disabled
[ 9089.907756] e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k4
[ 9089.907762] e1000e: Copyright (c) 1999 - 2009 Intel Corporation.
[ 9089.907797] e1000e 0000:02:00.0: Disabling ASPM L1
[ 9089.907827] e1000e 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 9089.907857] e1000e 0000:02:00.0: setting latency timer to 64
[ 9089.908529] e1000e 0000:02:00.0: irq 44 for MSI/MSI-X
[ 9089.908922] e1000e 0000:02:00.0: Disabling ASPM L0s
[ 9089.908954] e1000e 0000:02:00.0: (unregistered net_device): PHY reset is blocked due to SOL/IDER session.
[ 9090.024625] e1000e 0000:02:00.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:0a:e4:3e:ce:74
[ 9090.024630] e1000e 0000:02:00.0: eth0: Intel(R) PRO/1000 Network Connection
[ 9090.024712] e1000e 0000:02:00.0: eth0: MAC: 2, PHY: 2, PBA No: 005302-003
[ 9090.109492] e1000e 0000:02:00.0: irq 44 for MSI/MSI-X
[ 9090.164219] e1000e 0000:02:00.0: irq 44 for MSI/MSI-X
and a bunch of these:
[ 2128.005447] e1000e 0000:02:00.0: eth0: Detected Hardware Unit Hang:
[ 2128.005452] TDH <89>
[ 2128.005454] TDT <27>
[ 2128.005456] next_to_use <27>
[ 2128.005458] next_to_clean <88>
[ 2128.005460] buffer_info[next_to_clean]:
[ 2128.005463] time_stamp <6e608>
[ 2128.005465] next_to_watch <8a>
[ 2128.005467] jiffies <6f929>
[ 2128.005469] next_to_watch.status <0>
[ 2128.005471] MAC Status <80080703>
[ 2128.005473] PHY Status <796d>
[ 2128.005475] PHY 1000BASE-T Status <4000>
[ 2128.005477] PHY Extended Status <3000>
[ 2128.005480] PCI Status <10>
I decided to compile the latest stable e1000e, 1.2.17, and now I'm getting:
[ 9895.678050] e1000e: Intel(R) PRO/1000 Network Driver - 1.2.17-NAPI
[ 9895.678055] e1000e: Copyright(c) 1999 - 2010 Intel Corporation.
[ 9895.678098] e1000e 0000:02:00.0: Disabling ASPM L1
[ 9895.678129] e1000e 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 9895.678162] e1000e 0000:02:00.0: setting latency timer to 64
[ 9895.679136] e1000e 0000:02:00.0: irq 44 for MSI/MSI-X
[ 9895.679160] e1000e 0000:02:00.0: Disabling ASPM L0s
[ 9895.679192] e1000e 0000:02:00.0: (unregistered net_device): PHY reset is blocked due to SOL/IDER session.
[ 9895.791758] e1000e 0000:02:00.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:0a:e4:3e:ce:74
[ 9895.791766] e1000e 0000:02:00.0: eth0: Intel(R) PRO/1000 Network Connection
[ 9895.791850] e1000e 0000:02:00.0: eth0: MAC: 3, PHY: 2, PBA No: 005302-003
[ 9895.892464] e1000e 0000:02:00.0: irq 44 for MSI/MSI-X
[ 9895.948175] e1000e 0000:02:00.0: irq 44 for MSI/MSI-X
[ 9895.949111] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 9895.954694] e1000e: eth0 NIC Link is Up 10 Mbps Full Duplex, Flow Control: RX/TX
[ 9895.954703] e1000e 0000:02:00.0: eth0: 10/100 speed: disabling TSO
[ 9895.955157] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 9906.832056] eth0: no IPv6 routers present
With 1.2.20 I get:
[ 9711.525465] e1000e: Intel(R) PRO/1000 Network Driver - 1.2.20-NAPI
[ 9711.525472] e1000e: Copyright(c) 1999 - 2010 Intel Corporation.
[ 9711.525521] e1000e 0000:02:00.0: Disabling ASPM L1
[ 9711.525554] e1000e 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 9711.525586] e1000e 0000:02:00.0: setting latency timer to 64
[ 9711.526460] e1000e 0000:02:00.0: irq 45 for MSI/MSI-X
[ 9711.526487] e1000e 0000:02:00.0: Disabling ASPM L0s
[ 9711.526523] e1000e 0000:02:00.0: (unregistered net_device): PHY reset is blocked due to SOL/IDER session.
[ 9711.639763] e1000e 0000:02:00.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:0a:e4:3e:ce:74
[ 9711.639771] e1000e 0000:02:00.0: eth0: Intel(R) PRO/1000 Network Connection
[ 9711.639854] e1000e 0000:02:00.0: eth0: MAC: 3, PHY: 2, PBA No: 005302-003
[ 9712.060770] e1000e 0000:02:00.0: irq 45 for MSI/MSI-X
[ 9712.116195] e1000e 0000:02:00.0: irq 45 for MSI/MSI-X
[ 9712.117098] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 9712.122684] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[ 9712.122693] e1000e 0000:02:00.0: eth0: 10/100 speed: disabling TSO
[ 9712.123142] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 9722.920014] eth0: no IPv6 routers present
But I'm still getting these hangs:
[ 9982.992851] PCI Status <10>
[ 9984.993602] e1000e 0000:02:00.0: eth0: Detected Hardware Unit Hang:
[ 9984.993606] TDH <5d>
[ 9984.993608] TDT <6b>
[ 9984.993611] next_to_use <6b>
[ 9984.993613] next_to_clean <5b>
[ 9984.993615] buffer_info[next_to_clean]:
[ 9984.993617] time_stamp <24da80>
[ 9984.993619] next_to_watch <5d>
[ 9984.993621] jiffies <24f200>
[ 9984.993624] next_to_watch.status <0>
[ 9984.993626] MAC Status <80080703>
[ 9984.993628] PHY Status <796d>
[ 9984.993630] PHY 1000BASE-T Status <4000>
[ 9984.993632] PHY Extended Status <3000>
[ 9984.993635] PCI Status <10>
[ 9986.001047] e1000e 0000:02:00.0: eth0: Reset adapter
[ 9986.176202] e1000e: eth0 NIC Link is Up 10 Mbps Full Duplex, Flow Control: RX/TX
[ 9986.176211] e1000e 0000:02:00.0: eth0: 10/100 speed: disabling TSO
I'm not sure where to start troubleshooting this. Any ideas?
Here is the result of ethtool -d eth0
MAC Registers
-------------
0x00000: CTRL (Device control register) 0x18100248
Endian mode (buffers): little
Link reset: reset
Set link up: 1
Invert Loss-Of-Signal: no
Receive flow control: enabled
Transmit flow control: enabled
VLAN mode: disabled
Auto speed detect: disabled
Speed select: 1000Mb/s
Force speed: no
Force duplex: no
0x00008: STATUS (Device status register) 0x80080703
Duplex: full
Link up: link config
TBI mode: disabled
Link speed: 10Mb/s
Bus type: PCI Express
Port number: 0
0x00100: RCTL (Receive control register) 0x04048002
Receiver: enabled
Store bad packets: disabled
Unicast promiscuous: disabled
Multicast promiscuous: disabled
Long packet: disabled
Descriptor minimum threshold size: 1/2
Broadcast accept mode: accept
VLAN filter: enabled
Canonical form indicator: disabled
Discard pause frames: filtered
Pass MAC control frames: don't pass
Receive buffer size: 2048
0x02808: RDLEN (Receive desc length) 0x00001000
0x02810: RDH (Receive desc head) 0x00000001
0x02818: RDT (Receive desc tail) 0x000000F0
0x02820: RDTR (Receive delay timer) 0x00000000
0x00400: TCTL (Transmit ctrl register) 0x3103F0FA
Transmitter: enabled
Pad short packets: enabled
Software XOFF Transmission: disabled
Re-transmit on late collision: enabled
0x03808: TDLEN (Transmit desc length) 0x00001000
0x03810: TDH (Transmit desc head) 0x00000000
0x03818: TDT (Transmit desc tail) 0x00000000
0x03820: TIDV (Transmit delay timer) 0x00000008
PHY type: IGP2
and ethtool -c eth0
Coalesce parameters for eth0:
Adaptive RX: off TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 3
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0
tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
Here is also the lspci -vvv for this controller:
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
Subsystem: Lenovo ThinkPad X60s
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 45
Region 0: Memory at ee000000 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at 2000 [size=32]
Capabilities: [c8] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee0300c Data: 415a
Capabilities: [e0] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
Capabilities: [140 v1] Device Serial Number 00-0a-e4-ff-ff-3e-ce-74
Kernel driver in use: e1000e
Kernel modules: e1000e
I filed a bug on this upstream, but still have no idea how to get more useful information.
EEPROM FIX UPDATE: here is the result of running that script:
$ sudo bash fixeep-82573-dspd.sh eth0
eth0: is a "82573L Gigabit Ethernet Controller"
This fixup is applicable to your hardware
Your eeprom is up to date, no changes were made
Do I still need to do anything? Also, here is my EEPROM dump:
$ sudo ethtool -e eth0
Offset Values
------ ------
0x0000 00 0a e4 3e ce 74 30 0b b2 ff 51 00 ff ff ff ff
0x0010 53 00 03 02 6b 02 7e 20 aa 17 9a 10 86 80 df 80
0x0020 00 00 00 20 54 7e 00 00 14 00 da 00 04 00 00 27
0x0030 c9 6c 50 31 3e 07 0b 04 8b 29 00 00 00 f0 02 0f
0x0040 08 10 00 00 04 0f ff 7f 01 4d ff ff ff ff ff ff
0x0050 14 00 1d 00 14 00 1d 00 af aa 1e 00 00 00 1d 00
0x0060 00 01 00 40 1f 12 07 40 ff ff ff ff ff ff ff ff
0x0070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff 4a e0
I'd also like to note that I used eth0 every day for years and until recently never had an issue.
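For completeness, one thing that is often toggled when chasing "Detected Hardware Unit Hang" reports is the NIC offload configuration; this is only a diagnostic sketch, not a confirmed fix for this adapter:
$ sudo ethtool -k eth0                           # show current offload settings
$ sudo ethtool -K eth0 tso off gso off sg off    # temporarily disable segmentation offloads as a test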
Source: (StackOverflow)
Simple question:
How can I set up multiple MAC addresses on one physical network interface (Linux)?
Why?
My ISP is checking the IP<->MAC pairing on the gateway, and I'd like to route traffic through my "linuxbox" and then forward it with different source IPs.
Without the IP<->MAC check I would just use eth0, eth0:0, and so on, but in this situation I need a unique MAC address for every IP.
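For what it's worth, a minimal sketch of one way to get an extra MAC address on a single physical NIC is a macvlan sub-interface; the interface name, MAC and IP below are made-up examples:
$ sudo ip link add link eth0 name mac0 address 02:01:02:03:04:08 type macvlan   # virtual NIC with its own locally administered MAC
$ sudo ip addr add 192.168.1.50/24 dev mac0
$ sudo ip link set mac0 up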
Source: (StackOverflow)
I've been handed 3 Linux boxes: one front-facing box with Apache on it, and another two which, as far as I can tell, don't do an awful lot. All are running Red Hat.
The question is simple: how can I tell what each server is actually doing? Zero documentation is available from the creator.
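A rough first pass (generic commands, nothing specific to these boxes) is to look at what is listening, what is running and what is scheduled:
$ sudo netstat -tulpn                  # listening TCP/UDP sockets and the processes that own them
$ ps aux --sort=-%cpu | head -n 20     # running processes, busiest first
$ sudo crontab -l; ls /etc/cron.d/     # scheduled jobs
$ chkconfig --list | grep ':on'        # services enabled at boot (RHEL-style init)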
Source: (StackOverflow)
I'm archiving data from one server to another. Initially I started an rsync job. It took 2 weeks just to build the file list for 5 TB of data, and another week to transfer 1 TB of data.
Then I had to kill the job, as we needed some downtime on the new server.
It's been agreed that we will tar it up, since we probably won't need to access it again. I was thinking of breaking it into 500 GB chunks; after I tar each chunk I was going to copy it across over ssh. I was using tar and pigz, but it is still too slow.
Is there a better way to do it? I think both servers are on Red Hat. The old server is ext4 and the new one is XFS.
File sizes range from a few KB to a few MB, and there are 24 million JPEGs in 5 TB. So I'm guessing around 60-80 million files for 15 TB.
edit: After playing with rsync, nc, tar, mbuffer and pigz for a couple of days, the bottleneck is going to be the disk IO, as the data is striped across 500 SAS disks and there are around 250 million JPEGs. However, I have now learnt about all these nice tools that I can use in the future.
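For reference, the tar/pigz/nc combination mentioned above looks roughly like this; the host name, port and paths are made-up examples, and some nc variants need -l -p for the listening port:
# on the receiving (new) server
$ nc -l 5000 | pigz -d | tar -xf - -C /archive
# on the sending (old) server
$ tar -cf - /data/chunk01 | pigz | nc newserver 5000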
Source: (StackOverflow)
On the server node, it is possible to access an exported folder. However, after reboots (both server and client), the folder is no longer accessible from the clients.
On server
# ls /data
Folder1
Folder2
and the /etc/exports file contains
/data 192.168.1.0/24(rw,no_subtree_check,async,no_root_squash)
On client
# ls /data
ls: cannot access /data: Stale NFS file handle
I should note that there was no problem with the shared folder from the client side; however, after the reboots (server and client), I see this message.
Any way to fix that?
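For reference, the usual manual recovery is to re-export on the server and force a remount on the client; the server address below is a placeholder:
# on the server
$ sudo exportfs -ra
# on the client
$ sudo umount -f -l /data
$ sudo mount -t nfs <server>:/data /data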
Source: (StackOverflow)
I'm using names like a.alpha as the hostname of my Linux boxes, but it seems that these names are not completely usable. The output of the hostname shell command is correct (a.alpha), but the name printed after my user account in the shell prompt is "user@a" instead of "user@a.alpha". When I use Avahi, I can reach a.alpha (by hostname), but not b.alpha. Is that normal?
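For reference, the short name and the FQDN can be checked separately; the bash prompt normally uses \h, which is only the part before the first dot:
$ hostname        # name as configured on the system (here a.alpha)
$ hostname -s     # short name, i.e. everything before the first dot (here a)
$ hostname -f     # FQDN, as resolved through /etc/hosts or DNS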
Source: (StackOverflow)
In RHEL, instead of using the service network restart command, how can I restart a particular network interface, let's say eth1, with only one command?
"Only one command" because that is also the only interface my ssh session is running over. So if I were to use ifdown and then ifup, I would never get to run the ifup command, because my ssh session is terminated as soon as I run ifdown eth1.
So there should be a single command which allows me to bring down and then bring back up the interface that is serving my current ssh connection, so I don't have to worry about completely losing the connection to my server.
Any ideas, please?
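For reference, a commonly used workaround is to chain both steps into one command line, or to detach them from the ssh session entirely, so that the ifup still runs after the session drops; the interface name is the one from the question:
$ ifdown eth1; ifup eth1
# or, fully detached from the terminal:
$ nohup sh -c 'ifdown eth1; sleep 2; ifup eth1' >/dev/null 2>&1 &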
Source: (StackOverflow)
On my local network there are (among others) 5 machines (running Debian Jessie or Arch) wirelessly connected to a Netgear WNDR4000 router. Below is a graph of the ping times to the router from each of the machines, collected over a period of around half an hour.
Observations:
When things are going well, the ping times are all below 3ms (under 1ms for two of the machines, including the problem machine, purple).
At irregular intervals (of the order of 100s), three of these machines (red, green, purple) suffer degradation of ping times, while the other two appear unaffected.
The degradation periods coincide for all 3 machines.
The degradation for purple is two orders of magnitude more severe than for green and red, with ping times typically reaching over 20000ms for purple and 200ms for red and green.
If purple is physically moved nearer the router, the degradation completely disappears for purple while continuing as before for both red and green.
Red is 3m away and in direct line of sight from the base station; purple's usual location is about 10m away without direct line of sight.
This makes network access on purple intolerably slow (when it is in its normal location).
Can you suggest how to go about diagnosing and fixing the problem?
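As a starting point, the link quality and retry counters on the affected machine can be watched while a degradation episode is happening; the interface name below is an example:
$ watch -n 1 'iwconfig wlan0 | grep -iE "quality|rate"'   # signal quality and bit rate over time
$ iw dev wlan0 station dump                               # per-link stats, including tx retries/failures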
Source: (StackOverflow)
When looking at a variety of Linux and FreeBSD systems, I've noticed that on some systems /etc/hosts contains an entry for the public hostname of the host, but not on other systems.
What is the best practice here? Should my /etc/hosts file contain an entry for the host's FQDN (e.g. myhost.example.org) and for the short hostname (e.g. myhost)? Should the record for the FQDN point to localhost, or should it point to the external IP of the box?
For example, the default configuration on many RHEL/EL boxes doesn't put the public hostname into /etc/hosts:
myhost # cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
myhost #
The other variant is that the host's short hostname and FQDN also point to 127.0.0.1. I've been told that this is an older practice which is frowned upon these days, but plenty of admins still do this.
myhost # cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 myhost myhost.example.org
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
myhost #
The third variant is that the host's FQDN and short hostname are given the external IP address of the host. This third variant seems optimal to me because it reduces lookups against the DNS servers.
myhost # cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
74.125.239.xxx myhost myhost.example.org
myhost #
What is the best practice here?
Source: (StackOverflow)
I have a chain that many rules jump to, like:
> :i_XXXXX_i - [0:0]
> -A INPUT -s 282.202.203.83/32 -j i_XXXXX_i
> -A INPUT -s 222.202.62.253/32 -j i_XXXXX_i
> -A INPUT -s 222.202.60.62/32 -j i_XXXXX_i
> -A INPUT -s 224.93.27.235/32 -j i_XXXXX_i
> -A OUTPUT -d 282.202.203.83/32 -j i_XXXXX_i
> -A OUTPUT -d 222.202.62.253/32 -j i_XXXXX_i
> -A OUTPUT -d 222.202.60.62/32 -j i_XXXXX_i
> -A OUTPUT -d 224.93.27.235/32 -j i_XXXXX_i
When I try to delete this chain with:
iptables -X XXXX
I get an error like this (I tried iptables -F XXXXX beforehand):
iptables: Too many links.
Is there an easy way to delete the chain with one command?
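For reference, the usual sequence is to first remove every rule that jumps to the chain, and only then delete the chain itself; one way to do that in bulk (back up first; chain name as in the question):
$ sudo iptables-save > /root/iptables.backup                                # keep a copy in case something goes wrong
$ sudo iptables-save | grep -v -- '-j i_XXXXX_i' | sudo iptables-restore    # reload the ruleset without the jump rules
$ sudo iptables -F i_XXXXX_i                                                # flush any rules inside the chain
$ sudo iptables -X i_XXXXX_i                                                # the chain is now unreferenced and can be deleted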
Source: (StackOverflow)
I am not sure if this is specific to my distro's packages or a VirtualBox limitation, so any help would be appreciated.
OK, so I have a network of VMs, with one VM acting as a NAT gateway for the other VMs on the "internal" network. One of those VMs is running a DHCP and TFTP server, and I just need to boot the other VMs off of this server, but all I get when starting the VMs and booting from the network is "Fatal: Could not read from the boot medium! System halted".
This is a bit of a killjoy, since VirtualBox is otherwise easy to use and experiment with, but I can't get around this error. Help?
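For what it's worth, the boot order and NIC settings involved in PXE booting can also be driven from the command line; the VM name below is a made-up example, and on many VirtualBox versions network boot on an internal network only works with one of the Intel PRO/1000 adapter types:
$ VBoxManage modifyvm "pxe-client" --boot1 net --boot2 disk    # try the network first
$ VBoxManage modifyvm "pxe-client" --nic1 intnet --intnet1 "internal" --nictype1 82540EM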
Source: (StackOverflow)
Does anyone have some data or basic calculations that can answer when frame coalescing (NAPI) is required and when a single interrupt per frame is sufficient?
My hardware: IBM BladeServer HS22, Broadcom 5709 Gigabit NIC hardware (MSI-X), with dual Xeon E5530 quad-core processors. Main purpose is Squid proxy server. Switch is a nice Cisco 6500 series.
Our basic problem is that during peak times (100 Mbps traffic, only 10,000 pps) that latency and packet loss increases. I have done a lot of tuning and kernel upgrade to 2.6.38 and it has improved the packet loss but latency is still poor. Pings are sporadic; jumping even to 200ms on local Gbps LAN. Squid average response jumps from 30ms to 500+ms even though CPU/memory load is fine.
The interrupts climb to about 15,000/second during the peak. Ksoftirqd isn't using much CPU; I have installed irqbalance to balance the IRQs (8 each for eth0 and eth1) across all the cores but that hasn't helped much.
Intel NICs never seem to have these kinds of problems, but given that this is a blade system with fixed-configuration hardware, we are kind of stuck with the Broadcoms.
Everything is pointing at the NIC as being the main culprit. The best idea I have right now is to try to decrease the interrupt rate while keeping both latency low and throughput high.
The bnx2 unfortunately doesn't support adaptive-rx or tx.
The NAPI vs Adaptive Interrupts thread answer provides a great overview of interrupt moderation, but no concrete information on how to calculate optimal ethtool coalescing settings for a given workload. Is there a better approach than just trial and error?
Does the above-mentioned workload and hardware configuration even need NAPI, or should it be able to live with a single interrupt per packet?
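Whatever values turn out to be right, the coalescing knobs themselves are set per interface with ethtool; the numbers below are purely illustrative, not a recommendation:
$ ethtool -c eth0                                  # show current coalescing settings
$ sudo ethtool -C eth0 rx-usecs 50 rx-frames 32    # fire an RX interrupt after at most 50 us or 32 frames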
Source: (StackOverflow)
Is there an easy way to find out the MAC addresses of all the machines on my network, rather than SSHing into each one and running ifconfig | grep HWaddr? With 300 machines on the network, I really need an easy solution.
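For reference, this can usually be done from a single host on the same subnet; the subnet below is an example, and arp-scan and nmap are separate packages:
$ sudo arp-scan --localnet                                 # ARP-scan the local subnet, listing IP, MAC and vendor
$ sudo nmap -sn 192.168.1.0/24 | grep -B 2 'MAC Address'   # alternative using an nmap ping scan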
Source: (StackOverflow)
Ubuntu Server 10.04.1 x86
I've got a machine with an FCGI HTTP service behind nginx that serves a lot of small HTTP requests to a lot of different clients. (About 230 requests per second in the peak hours, average response size with headers is 650 bytes, several million different clients per day.)
As a result, I have a lot of sockets hanging in TIME_WAIT (the graph was captured with the TCP settings below):
I'd like to reduce the number of sockets.
What can I do besides this?
$ cat /proc/sys/net/ipv4/tcp_fin_timeout
1
$ cat /proc/sys/net/ipv4/tcp_tw_recycle
1
$ cat /proc/sys/net/ipv4/tcp_tw_reuse
1
Update: some details on the actual service layout on the machine:
client -----TCP-socket--> nginx (load balancer reverse proxy)
-----TCP-socket--> nginx (worker)
--domain-socket--> fcgi-software
--single-persistent-TCP-socket--> Redis
--single-persistent-TCP-socket--> MySQL (other machine)
I probably should switch the load-balancer --> worker connection to domain sockets as well, but the issue with TIME_WAIT sockets would remain, since I plan to add a second worker on a separate machine soon and won't be able to use domain sockets in that case.
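For the record, the current number of TIME_WAIT sockets can be watched like this while experimenting (purely a measuring aid):
$ ss -tan state time-wait | wc -l    # rough count of sockets currently in TIME_WAIT
$ ss -s                              # socket summary, including the timewait totals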
Source: (StackOverflow)