
high-load interview questions

Top high-load frequently asked interview questions

What caused socket connections to be slow after Full GC?

We have a client-server app: one server and about 10 clients. They communicate over TCP sockets using custom queries.

The system had been running smoothly for many months, but one day, after the daily scheduled full GC on the server (which took about 50 s), we noticed that the time between a query sent by a client and the response received from the server had become large: more than 10-20 s. After about 3 hours the system recovered and everything ran fine again.

While investigating the issue, we found:

  1. No garbage collection problems on either the clients or the server.
  2. Query processing time on the server was small.
  3. Load on the server was high.
  4. The network bandwidth was not saturated.
  5. Connections were not reset during the full GC (the daily full GC had been a normal event until then).
  6. The machine and OS had recently changed from CentOS 6 (kernel 2.6.32) to CentOS 7 (kernel 3.10.0), but the new configuration was extensively tested. The Oracle JDK version also changed, from 1.7.65 to 1.7.75.

We took a thread dump on the server:

java.lang.Thread.State: RUNNABLE
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at util.network.BytesBasedSocketConnection$ReadConnectionRunnable.run(BytesBasedSocketConnection.java:293)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

The FilterInputStream.read() is the following:

public int read() throws IOException {
    return in.read();
}

In our code, in is a BufferedInputStream.

The questions are: Why did most connections slow down after the full GC pause? Why does the stack trace end in FilterInputStream.read()? Shouldn't it end somewhere in BufferedInputStream or in the socket input stream? Can this read lead to high load on the server?

The code we use for reading:

int constructLength = _socketDIS.readInt();
ByteArrayOutputStream constructBOAS = new ByteArrayOutputStream(constructLength);
for (int i = 0; i != constructLength; i++)
      constructBOAS.write(_socketDIS.read());
constructBOAS.close();
byte[] bytes = constructBOAS.toByteArray();

where:

_socketDIS = new DataInputStream(new BufferedInputStream(_socket.getInputStream()));
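
Independent of the root cause, the byte-at-a-time loop above is expensive: every _socketDIS.read() call goes through FilterInputStream.read() and takes the BufferedInputStream monitor once per byte. A minimal sketch of the bulk alternative (the improvement acknowledged as valid in the update below), using DataInputStream.readFully():

int constructLength = _socketDIS.readInt();
byte[] bytes = new byte[constructLength];
// One bulk call replaces constructLength single-byte reads (and the
// intermediate ByteArrayOutputStream copy); it throws EOFException if the
// stream ends before the full payload arrives.
_socketDIS.readFully(bytes);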

Here is the stack trace from the client connections that were working well:

java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    - locked <0x00007f522cbebca8> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at util.network.BytesBasedSocketConnection$ReadConnectionRunnable.run(BytesBasedSocketConnection.java:287)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

UPDATE:

Regarding the EJP answer:

  1. There was no EOS involved; the connections were up, but they were very slow.

  2. Even if there were an EOS, I can't see how the code could spin at it: the for loop is bounded by the constructLength value. Still, the suggested improvement is valid.

  3. The stack trace with the problem ends in a read done on a DataInputStream (_socketDIS.read()), which is inherited from FilterInputStream.read(); see the code above. It is DataInputStream, not BufferedInputStream, that lacks its own read(). Inside FilterInputStream.read() there is an in.read() call on a BufferedInputStream, which does define its own read() method. But the stack trace stops in the middle and never reaches BufferedInputStream.read(). Why?


Source: (StackOverflow)

C: epoll and multithreading

I need to create a specialized HTTP server; for this I plan to use the epoll syscall, but I want to utilize multiple processors/cores and I can't come up with an architecture. At the moment my idea is the following: create multiple threads, each with its own epoll descriptor; the main thread accepts connections and distributes them among the threads' epoll descriptors. But are there any better solutions? Which books/articles/guides can I read on high-load architectures? I've seen only the C10K article, but most of the links to examples are dead :( and there are still no in-depth books on this subject :(.

Thank you for your answers.

UPD: Please be more specific. I need materials and examples (nginx is not an example, because it's too complex and has multiple abstraction layers to support multiple systems).


Source: (StackOverflow)


Java High-load NIO TCP server

As a part of my research I'm writing a high-load TCP/IP echo server in Java. I want to serve about 3-4k clients and see the maximum possible number of messages per second that I can squeeze out of it. Message size is quite small: up to 100 bytes. This work doesn't have any practical purpose, only research.

According to numerous presentations that I've seen (HornetQ benchmarks, LMAX Disruptor talks, etc.), real-world high-load systems tend to serve millions of transactions per second (I believe Disruptor mentioned about 6 million and HornetQ about 8.5 million). For example, this post states that it is possible to achieve up to 40M MPS. So I took that as a rough estimate of what modern hardware should be capable of.

I wrote the simplest single-threaded NIO server and launched a load test. I was a little surprised that I could get only about 100k MPS on localhost and 25k with actual networking. The numbers look quite small. I was testing on Win7 x64, Core i7. Looking at CPU load, only one core is busy (which is expected for a single-threaded app), while the rest sit idle. However, even if I loaded all 8 cores (including virtual ones) I would have no more than 800k MPS: not even close to 40 million :)

My question is: what is a typical pattern for serving massive numbers of messages to clients? Should I distribute the networking load over several different sockets inside a single JVM and use some sort of load balancer like HAProxy to spread the load over multiple cores? Or should I look toward using multiple Selectors in my NIO code? Or maybe even distribute the load between multiple JVMs and use Chronicle to build inter-process communication between them? Will testing on a proper server-side OS like CentOS make a big difference (maybe it is Windows that slows things down)?

Below is the sample code of my server. It always answers "ok" to any incoming data. I know that in the real world I'd need to track the size of the message and be prepared for one message to be split between multiple reads; however, I'd like to keep things super-simple for now.

import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.channels.spi.SelectorProvider;
import java.util.Iterator;

public class EchoServer {

    private static final int BUFFER_SIZE = 1024;
    private final static int DEFAULT_PORT = 9090;

    // The buffer into which we'll read data when it's available
    private ByteBuffer readBuffer = ByteBuffer.allocate(BUFFER_SIZE);

    private InetAddress hostAddress = null;

    private int port;
    private Selector selector;

    private long loopTime;
    private long numMessages = 0;

    public EchoServer() throws IOException {
        this(DEFAULT_PORT);
    }

    public EchoServer(int port) throws IOException {
        this.port = port;
        selector = initSelector();
        loop();
    }

    private void loop() {
        while (true) {
            try {
                selector.select();
                Iterator<SelectionKey> selectedKeys = selector.selectedKeys().iterator();
                while (selectedKeys.hasNext()) {
                    SelectionKey key = selectedKeys.next();
                    selectedKeys.remove();

                    if (!key.isValid()) {
                        continue;
                    }

                    // Check what event is available and deal with it
                    if (key.isAcceptable()) {
                        accept(key);
                    } else if (key.isReadable()) {
                        read(key);
                    } else if (key.isWritable()) {
                        write(key);
                    }
                }

            } catch (Exception e) {
                e.printStackTrace();
                System.exit(1);
            }
        }
    }

    private void accept(SelectionKey key) throws IOException {
        ServerSocketChannel serverSocketChannel = (ServerSocketChannel) key.channel();

        SocketChannel socketChannel = serverSocketChannel.accept();
        socketChannel.configureBlocking(false);
        socketChannel.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
        socketChannel.setOption(StandardSocketOptions.TCP_NODELAY, true);
        socketChannel.register(selector, SelectionKey.OP_READ);

        System.out.println("Client is connected");
    }

    private void read(SelectionKey key) throws IOException {
        SocketChannel socketChannel = (SocketChannel) key.channel();

        // Clear out our read buffer so it's ready for new data
        readBuffer.clear();

        // Attempt to read off the channel
        int numRead;
        try {
            numRead = socketChannel.read(readBuffer);
        } catch (IOException e) {
            key.cancel();
            socketChannel.close();

            System.out.println("Forceful shutdown");
            return;
        }

        if (numRead == -1) {
            System.out.println("Graceful shutdown");
            key.channel().close();
            key.cancel();

            return;
        }

        socketChannel.register(selector, SelectionKey.OP_WRITE);

        numMessages++;
        if (numMessages % 100000 == 0) {
            long elapsed = System.currentTimeMillis() - loopTime;
            loopTime = System.currentTimeMillis();
            System.out.println(elapsed);
        }
    }

    private void write(SelectionKey key) throws IOException {
        SocketChannel socketChannel = (SocketChannel) key.channel();
        ByteBuffer dummyResponse = ByteBuffer.wrap("ok".getBytes("UTF-8"));

        socketChannel.write(dummyResponse);
        if (dummyResponse.remaining() > 0) {
            System.err.print("Filled UP");
        }

        key.interestOps(SelectionKey.OP_READ);
    }

    private Selector initSelector() throws IOException {
        Selector socketSelector = SelectorProvider.provider().openSelector();

        ServerSocketChannel serverChannel = ServerSocketChannel.open();
        serverChannel.configureBlocking(false);

        InetSocketAddress isa = new InetSocketAddress(hostAddress, port);
        serverChannel.socket().bind(isa);
        serverChannel.register(socketSelector, SelectionKey.OP_ACCEPT);
        return socketSelector;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Starting echo server");
        new EchoServer();
    }
}
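
For the "multiple Selectors" direction asked about above, a common shape (the pattern frameworks such as Netty use) is one acceptor thread plus N worker reactors, each with its own Selector on its own thread. Below is a rough sketch under the same simplifications as the server above (message framing and partial writes ignored); the class name, port, and round-robin rule are illustrative only:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;

public class MultiReactorSketch {

    static class Worker implements Runnable {
        final Selector selector;
        final ConcurrentLinkedQueue<SocketChannel> pending =
                new ConcurrentLinkedQueue<SocketChannel>();

        Worker() throws IOException { selector = Selector.open(); }

        void register(SocketChannel ch) {
            pending.add(ch);
            selector.wakeup();   // break out of select() so the channel gets registered
        }

        public void run() {
            ByteBuffer buf = ByteBuffer.allocate(1024);
            try {
                while (true) {
                    selector.select();
                    SocketChannel ch;
                    while ((ch = pending.poll()) != null) {   // register hand-offs on this thread
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ);
                    }
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (!key.isValid()) continue;
                        SocketChannel sc = (SocketChannel) key.channel();
                        try {
                            buf.clear();
                            if (sc.read(buf) == -1) { key.cancel(); sc.close(); continue; }
                            // echo-style reply; a real server must also handle partial writes
                            sc.write(ByteBuffer.wrap("ok".getBytes(StandardCharsets.UTF_8)));
                        } catch (IOException e) {
                            key.cancel();   // drop this connection only, keep the reactor alive
                            try { sc.close(); } catch (IOException ignored) {}
                        }
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();        // selector failure: this reactor dies
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int n = Runtime.getRuntime().availableProcessors();
        Worker[] workers = new Worker[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Worker();
            new Thread(workers[i], "reactor-" + i).start();
        }

        ServerSocketChannel server = ServerSocketChannel.open();
        server.socket().bind(new InetSocketAddress(9090));  // same port as the example above
        for (long i = 0; ; i++) {                           // blocking accept on the main thread
            SocketChannel ch = server.accept();
            workers[(int) (i % n)].register(ch);            // round-robin hand-off
        }
    }
}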

Source: (StackOverflow)

Redis periodically stops responding under high load

I am using a simple Redis server setup to store some values in my PHP application. Yesterday I installed the phpredis module to use Redis as the PHP session backend, which increased the request rate on the Redis DB from 100 to 2000, and the DB size from 60 MB to 200 MB.

After this, Redis is unavailable on roughly every 10th request: it just does not respond. The log file does not show anything that could explain it.

I have more than 50% of memory free. Here are the resources used by Redis:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                            
31075 root      20   0  170m 161m  936 S   41  2.0  11:10.52 redis-server

What could be the cause of this? Maybe I should tweak some Redis settings for the higher load?

Here is my redis.conf:

# Redis configuration file example

# Note on units: when memory size is needed, it is possible to specify
# it in the usual form of 1k 5GB 4M and so forth:
#
# 1k => 1000 bytes
# 1kb => 1024 bytes
# 1m => 1000000 bytes
# 1mb => 1024*1024 bytes
# 1g => 1000000000 bytes
# 1gb => 1024*1024*1024 bytes
#
# units are case insensitive so 1GB 1Gb 1gB are all the same.

# By default Redis does not run as a daemon. Use 'yes' if you need it.
# Note that Redis will write a pid file in /var/run/redis.pid when daemonized.
daemonize no

# When running daemonized, Redis writes a pid file in /var/run/redis.pid by
# default. You can specify a custom pid file location here.
pidfile /var/run/redis.pid

# Accept connections on the specified port, default is 6379.
# If port 0 is specified Redis will not listen on a TCP socket.
port 6379

# If you want you can bind a single interface, if the bind option is not
# specified all the interfaces will listen for incoming connections.
#
# bind 127.0.0.1

# Specify the path for the unix socket that will be used to listen for
# incoming connections. There is no default, so Redis will not listen
# on a unix socket when not specified.
#
# unixsocket /tmp/redis.sock

# Close the connection after a client is idle for N seconds (0 to disable)
timeout 300

# Set server verbosity to 'debug'
# it can be one of:
# debug (a lot of information, useful for development/testing)
# verbose (many rarely useful info, but not a mess like the debug level)
# notice (moderately verbose, what you want in production probably)
# warning (only very important / critical messages are logged)
loglevel debug

# Specify the log file name. Also 'stdout' can be used to force
# Redis to log on the standard output. Note that if you use standard
# output for logging but daemonize, logs will be sent to /dev/null
logfile /var/log/redis/redis.log

# To enable logging to the system logger, just set 'syslog-enabled' to yes,
# and optionally update the other syslog parameters to suit your needs.
# syslog-enabled no

# Specify the syslog identity.
# syslog-ident redis

# Specify the syslog facility.  Must be USER or between LOCAL0-LOCAL7.
# syslog-facility local0

# Set the number of databases. The default database is DB 0, you can select
# a different one on a per-connection basis using SELECT <dbid> where
# dbid is a number between 0 and 'databases'-1
databases 16

################################ SNAPSHOTTING  #################################
#
# Save the DB on disk:
#
#   save <seconds> <changes>
#
#   Will save the DB if both the given number of seconds and the given
#   number of write operations against the DB occurred.
#
#   In the example below the behaviour will be to save:
#   after 900 sec (15 min) if at least 1 key changed
#   after 300 sec (5 min) if at least 10 keys changed
#   after 60 sec if at least 10000 keys changed
#
#   Note: you can disable saving entirely by commenting out all the "save" lines.

save 900 1
save 300 10
save 60 10000

# Compress string objects using LZF when dumping .rdb databases?
# By default that's set to 'yes' as it's almost always a win.
# If you want to save some CPU in the saving child set it to 'no' but
# the dataset will likely be bigger if you have compressible values or keys.
rdbcompression yes

# The filename where to dump the DB
dbfilename dump.rdb

# The working directory.
#
# The DB will be written inside this directory, with the filename specified
# above using the 'dbfilename' configuration directive.
# 
# Also the Append Only File will be created inside this directory.
# 
# Note that you must specify a directory here, not a file name.
dir /backups/redisdumps

################################# REPLICATION #################################

# Master-Slave replication. Use slaveof to make a Redis instance a copy of
# another Redis server. Note that the configuration is local to the slave
# so for example it is possible to configure the slave to save the DB with a
# different interval, or to listen to another port, and so on.
#
# slaveof <masterip> <masterport>

# If the master is password protected (using the "requirepass" configuration
# directive below) it is possible to tell the slave to authenticate before
# starting the replication synchronization process, otherwise the master will
# refuse the slave request.
#
# masterauth <master-password>

# When a slave lost the connection with the master, or when the replication
# is still in progress, the slave can act in two different ways:
#
# 1) if slave-serve-stale-data is set to 'yes' (the default) the slave will
#    still reply to client requests, possibly with out-of-date data, or the
#    data set may just be empty if this is the first synchronization.
#
# 2) if slave-serve-stale-data is set to 'no' the slave will reply with
#    an error "SYNC with master in progress" to all kinds of commands
#    but to INFO and SLAVEOF.
#
slave-serve-stale-data yes

################################## SECURITY ###################################

# Require clients to issue AUTH <PASSWORD> before processing any other
# commands.  This might be useful in environments in which you do not trust
# others with access to the host running redis-server.
#
# This should stay commented out for backward compatibility and because most
# people do not need auth (e.g. they run their own servers).
# 
# Warning: since Redis is pretty fast an outside user can try up to
# 150k passwords per second against a good box. This means that you should
# use a very strong password otherwise it will be very easy to break.
#
# requirepass foobared

# Command renaming.
#
# It is possible to change the name of dangerous commands in a shared
# environment. For instance the CONFIG command may be renamed into something
# hard to guess so that it will still be available for internal-use
# tools but not available for general clients.
#
# Example:
#
# rename-command CONFIG b840fc02d524045429941cc15f59e41cb7be6c52
#
# It is also possible to completely kill a command by renaming it into
# an empty string:
#
# rename-command CONFIG ""

################################### LIMITS ####################################

# Set the max number of connected clients at the same time. By default there
# is no limit, and it's up to the number of file descriptors the Redis process
# is able to open. The special value '0' means no limits.
# Once the limit is reached Redis will close all the new connections sending
# an error 'max number of clients reached'.
#
# maxclients 128

# Don't use more memory than the specified amount of bytes.
# When the memory limit is reached Redis will try to remove keys with an
# EXPIRE set. It will try to start freeing keys that are going to expire
# in little time and preserve keys with a longer time to live.
# Redis will also try to remove objects from free lists if possible.
#
# If all this fails, Redis will start to reply with errors to commands
# that will use more memory, like SET, LPUSH, and so on, and will continue
# to reply to most read-only commands like GET.
#
# WARNING: maxmemory can be a good idea mainly if you want to use Redis as a
# 'state' server or cache, not as a real DB. When Redis is used as a real
# database the memory usage will grow over the weeks, it will be obvious if
# it is going to use too much memory in the long run, and you'll have the time
# to upgrade. With maxmemory after the limit is reached you'll start to get
# errors for write operations, and this may even lead to DB inconsistency.
#
# maxmemory <bytes>

# MAXMEMORY POLICY: how Redis will select what to remove when maxmemory
# is reached. You can select among the following behaviors:
# 
# volatile-lru -> remove the key with an expire set using an LRU algorithm
# allkeys-lru -> remove any key accordingly to the LRU algorithm
# volatile-random -> remove a random key with an expire set
# allkeys->random -> remove a random key, any key
# volatile-ttl -> remove the key with the nearest expire time (minor TTL)
# noeviction -> don't expire at all, just return an error on write operations
# 
# Note: with all of these policies, Redis will return an error on write
#       operations when there are no suitable keys for eviction.
#
#       At the date of writing this commands are: set setnx setex append
#       incr decr rpush lpush rpushx lpushx linsert lset rpoplpush sadd
#       sinter sinterstore sunion sunionstore sdiff sdiffstore zadd zincrby
#       zunionstore zinterstore hset hsetnx hmset hincrby incrby decrby
#       getset mset msetnx exec sort
#
# The default is:
#
# maxmemory-policy volatile-lru

# LRU and minimal TTL algorithms are not precise algorithms but approximated
# algorithms (in order to save memory), so you can select as well the sample
# size to check. For instance by default Redis will check three keys and
# pick the one that was used least recently; you can change the sample size
# using the following configuration directive.
#
# maxmemory-samples 3

############################## APPEND ONLY MODE ###############################

# By default Redis asynchronously dumps the dataset on disk. If you can live
# with the idea that the latest records will be lost if something like a crash
# happens this is the preferred way to run Redis. If instead you care a lot
# about your data and don't want a single record to get lost you should
# enable the append only mode: when this mode is enabled Redis will append
# every write operation received in the file appendonly.aof. This file will
# be read on startup in order to rebuild the full dataset in memory.
#
# Note that you can have both the async dumps and the append only file if you
# like (you have to comment the "save" statements above to disable the dumps).
# Still if append only mode is enabled Redis will load the data from the
# log file at startup ignoring the dump.rdb file.
#
# IMPORTANT: Check the BGREWRITEAOF to check how to rewrite the append
# log file in background when it gets too big.

appendonly no

# The name of the append only file (default: "appendonly.aof")
# appendfilename appendonly.aof

# The fsync() call tells the Operating System to actually write data on disk
# instead of waiting for more data in the output buffer. Some OSes will really
# flush data on disk, while others will just try to do it ASAP.
#
# Redis supports three different modes:
#
# no: don't fsync, just let the OS flush the data when it wants. Faster.
# always: fsync after every write to the append only log . Slow, Safest.
# everysec: fsync only if one second passed since the last fsync. Compromise.
#
# The default is "everysec" that's usually the right compromise between
# speed and data safety. It's up to you to understand if you can relax this to
# "no" that will will let the operating system flush the output buffer when
# it wants, for better performances (but if you can live with the idea of
# some data loss consider the default persistence mode that's snapshotting),
# or on the contrary, use "always" that's very slow but a bit safer than
# everysec.
#
# If unsure, use "everysec".

# appendfsync always
appendfsync everysec
# appendfsync no

# When the AOF fsync policy is set to always or everysec, and a background
# saving process (a background save or AOF log background rewriting) is
# performing a lot of I/O against the disk, in some Linux configurations
# Redis may block too long on the fsync() call. Note that there is no fix for
# this currently, as even performing fsync in a different thread will block
# our synchronous write(2) call.
#
# In order to mitigate this problem it's possible to use the following option
# that will prevent fsync() from being called in the main process while a
# BGSAVE or BGREWRITEAOF is in progress.
#
# This means that while another child is saving the durability of Redis is
# the same as "appendfsync none", that in pratical terms means that it is
# possible to lost up to 30 seconds of log in the worst scenario (with the
# default Linux settings).
# 
# If you have latency problems turn this to "yes". Otherwise leave it as
# "no" that is the safest pick from the point of view of durability.
no-appendfsync-on-rewrite no

################################## SLOW LOG ###################################

# The Redis Slow Log is a system to log queries that exceeded a specified
# execution time. The execution time does not include the I/O operations
# like talking with the client, sending the reply and so forth,
# but just the time needed to actually execute the command (this is the only
# stage of command execution where the thread is blocked and can not serve
# other requests in the meantime).
# 
# You can configure the slow log with two parameters: one tells Redis
# what is the execution time, in microseconds, to exceed in order for the
# command to get logged, and the other parameter is the length of the
# slow log. When a new command is logged the oldest one is removed from the
# queue of logged commands.

# The following time is expressed in microseconds, so 1000000 is equivalent
# to one second. Note that a negative number disables the slow log, while
# a value of zero forces the logging of every command.
slowlog-log-slower-than 10000

# There is no limit to this length. Just be aware that it will consume memory.
# You can reclaim memory used by the slow log with SLOWLOG RESET.
slowlog-max-len 1024

################################ VIRTUAL MEMORY ###############################

### WARNING! Virtual Memory is deprecated in Redis 2.4
### The use of Virtual Memory is strongly discouraged.

# Virtual Memory allows Redis to work with datasets bigger than the actual
# amount of RAM needed to hold the whole dataset in memory.
# In order to do so heavily used keys are kept in memory while the other keys
# are swapped into a swap file, similarly to what operating systems do
# with memory pages.
#
# To enable VM just set 'vm-enabled' to yes, and set the following three
# VM parameters accordingly to your needs.

vm-enabled no
# vm-enabled yes

# This is the path of the Redis swap file. As you can guess, swap files
# can't be shared by different Redis instances, so make sure to use a swap
# file for every redis process you are running. Redis will complain if the
# swap file is already in use.
#
# The best kind of storage for the Redis swap file (that's accessed at random) 
# is a Solid State Disk (SSD).
#
# *** WARNING *** if you are using a shared hosting the default of putting
# the swap file under /tmp is not secure. Create a dir with access granted
# only to Redis user and configure Redis to create the swap file there.
vm-swap-file /tmp/redis.swap

# vm-max-memory configures the VM to use at max the specified amount of
# RAM. Everything that does not fit will be swapped on disk *if* possible, that
# is, if there is still enough contiguous space in the swap file.
#
# With vm-max-memory 0 the system will swap everything it can. Not a good
# default, just specify the max amount of RAM you can in bytes, but it's
# better to leave some margin. For instance specify an amount of RAM
# that's more or less between 60 and 80% of your free RAM.
vm-max-memory 0

# The Redis swap file is split into pages. An object can be saved using multiple
# contiguous pages, but pages can't be shared between different objects.
# So if your page is too big, small objects swapped out on disk will waste
# a lot of space. If your page is too small, there is less space in the swap
# file (assuming you configured the same number of total swap file pages).
#
# If you use a lot of small objects, use a page size of 64 or 32 bytes.
# If you use a lot of big objects, use a bigger page size.
# If unsure, use the default :)
vm-page-size 32

# Number of total memory pages in the swap file.
# Given that the page table (a bitmap of free/used pages) is taken in memory,
# every 8 pages on disk will consume 1 byte of RAM.
#
# The total swap size is vm-page-size * vm-pages
#
# With the default of 32-bytes memory pages and 134217728 pages Redis will
# use a 4 GB swap file, that will use 16 MB of RAM for the page table.
#
# It's better to use the smallest acceptable value for your application,
# but the default is large in order to work in most conditions.
vm-pages 134217728

# Max number of VM I/O threads running at the same time.
# These threads are used to read/write data from/to the swap file; since they
# also encode and decode objects from disk to memory or the reverse, a bigger
# number of threads can help with big objects even if they can't help with
# I/O itself, as the physical device may not be able to cope with many
# read/write operations at the same time.
#
# The special value of 0 turns off threaded I/O and enables the blocking
# Virtual Memory implementation.
vm-max-threads 4

############################### ADVANCED CONFIG ###############################

# Hashes are encoded in a special way (much more memory efficient) when they
# have at most a given number of elements, and the biggest element does not
# exceed a given threshold. You can configure these limits with the following
# configuration directives.
hash-max-zipmap-entries 512
hash-max-zipmap-value 64

# Similarly to hashes, small lists are also encoded in a special way in order
# to save a lot of space. The special representation is only used when
# you are under the following limits:
list-max-ziplist-entries 512
list-max-ziplist-value 64

# Sets have a special encoding in just one case: when a set is composed
# of just strings that happens to be integers in radix 10 in the range
# of 64 bit signed integers.
# The following configuration setting sets the limit in the size of the
# set in order to use this special memory saving encoding.
set-max-intset-entries 512

# Active rehashing uses 1 millisecond every 100 milliseconds of CPU time in
# order to help rehashing the main Redis hash table (the one mapping top-level
# keys to values). The hash table implementation redis uses (see dict.c)
# performs a lazy rehashing: the more operations you run against a hash table
# that is rehashing, the more rehashing "steps" are performed, so if the
# server is idle the rehashing is never complete and some more memory is used
# by the hash table.
# 
# The default is to use this millisecond 10 times every second in order to
# actively rehash the main dictionaries, freeing memory when possible.
#
# If unsure:
# use "activerehashing no" if you have hard latency requirements and it is
# not a good thing in your environment that Redis can reply from time to time
# to queries with 2 milliseconds delay.
#
# use "activerehashing yes" if you don't have such hard requirements but
# want to free memory asap when possible.
activerehashing yes

################################## INCLUDES ###################################

# Include one or more other config files here.  This is useful if you
# have a standard template that goes to all Redis servers but also need
# to customize a few per-server settings.  Include files can include
# other files, so use this wisely.
#
# include /path/to/local.conf
# include /path/to/other.conf

Source: (StackOverflow)

What would you recommend for a high traffic ajax intensive website?

For a website like reddit, with lots of up/down votes and lots of comments per topic, what should I go with?

Lighttpd/Php or Lighttpd/CherryPy/Genshi/SQLAlchemy?

And for the database, what would scale better / be fastest: MySQL (4.1 or 5?) or PostgreSQL?


Source: (StackOverflow)

How can I implement a simple WHOIS proxy in Perl?

I have several WHOIS servers for which I want to have a single proxy. The proxy should forward requests to the appropriate servers based on the data in the query. How should I approach this problem?
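
One way to approach it, sketched here in Java for concreteness (the same structure maps directly onto Perl's IO::Socket): read the single-line WHOIS query, pick an upstream server from the query text, and stream the upstream's reply back. The routing rule, upstream hostnames, and listening port below are hypothetical placeholders:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;

public class WhoisProxy {
    public static void main(String[] args) throws IOException {
        ServerSocket listener = new ServerSocket(4343); // WHOIS is port 43; unprivileged port used here
        while (true) {
            try (Socket client = listener.accept()) {
                BufferedReader fromClient = new BufferedReader(
                        new InputStreamReader(client.getInputStream(), "US-ASCII"));
                String query = fromClient.readLine();   // a WHOIS query is one CRLF-terminated line
                if (query == null) continue;

                // Hypothetical routing rule: pick the upstream registry by suffix.
                String upstream = query.trim().endsWith(".org")
                        ? "whois.pir.org" : "whois.verisign-grs.com";

                try (Socket server = new Socket(upstream, 43)) {
                    server.getOutputStream().write((query + "\r\n").getBytes("US-ASCII"));
                    server.getOutputStream().flush();
                    // Stream the registry's answer back until it closes the connection.
                    byte[] buf = new byte[4096];
                    int n;
                    InputStream resp = server.getInputStream();
                    while ((n = resp.read(buf)) != -1) client.getOutputStream().write(buf, 0, n);
                }
            } catch (IOException e) {
                e.printStackTrace();                    // keep serving other clients
            }
        }
    }
}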


Source: (StackOverflow)

High CPU Usage in php-fpm

We have a very strange problem on our web project.

We use:

  • 2 × Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
  • 12 GB memory
  • About 20 hits per second, of which 4-5 requests per second are heavy search requests
  • nginx + php-fpm (5.3.22)
  • MySQL server installed on another machine

Most of the time we have a load average below 10 and CPU usage of about 50%.
Sometimes we get CPU usage of about 95%, and after that the load average grows to 50 and more!!!
You can see Load Average and CPU Usage here (my reputation is too low to post images here):
Load Average
CPU Usage

We have to reload php-fpm (/etc/init.d/php-fpm reload) to normalize the situation.
This can happen 4-5 times per day.
I tried using strace to examine the situation.
Sorry for the long logs! Below is the output of the command strace -cp PID,
where PID is a random php-fpm process id (we start 100 php-fpm processes).
These two results were captured at a moment of high CPU usage.

Process 17272 attached - interrupt to quit
Process 17272 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.56    0.008817         267        33           munmap
 13.38    0.001799         900         2           clone
  9.66    0.001299           2       589           read
  7.43    0.000999         125         8           mremap
  2.84    0.000382           1       559        96 access
  0.59    0.000080          40         2           waitpid
  0.29    0.000039           0       627           gettimeofday
  0.16    0.000022           0       346           write
  0.04    0.000006           0        56           getcwd
  0.04    0.000005           0       348           poll
  0.00    0.000000           0        55           open
  0.00    0.000000           0        69           close
  0.00    0.000000           0        17           chdir
  0.00    0.000000           0       189           time
  0.00    0.000000           0        28           lseek
  0.00    0.000000           0         2           pipe
  0.00    0.000000           0        17           times
  0.00    0.000000           0         8           brk
  0.00    0.000000           0         8           getrusage
  0.00    0.000000           0        18           setitimer
  0.00    0.000000           0         8           flock
  0.00    0.000000           0         1           nanosleep
  0.00    0.000000           0        11           rt_sigaction
  0.00    0.000000           0        13           rt_sigprocmask
  0.00    0.000000           0         6           pread64
  0.00    0.000000           0         7           pwrite64
  0.00    0.000000           0        33           mmap2
  0.00    0.000000           0        18         4 stat64
  0.00    0.000000           0        34           lstat64
  0.00    0.000000           0        92           fstat64
  0.00    0.000000           0        63           fcntl64
  0.00    0.000000           0        53           clock_gettime
  0.00    0.000000           0         1           socket
  0.00    0.000000           0         1         1 connect
  0.00    0.000000           0         9           accept
  0.00    0.000000           0         1           send
  0.00    0.000000           0        21           recv
  0.00    0.000000           0         9         1 shutdown
  0.00    0.000000           0         1           getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    0.013448                  3363       102 total



[root@hp-php ~]# strace -cp 30767
Process 30767 attached - interrupt to quit
Process 30767 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.88    0.016926         220        77           munmap
 29.06    0.009301           2      4343           read
  8.73    0.002794         466         6           clone
  3.59    0.001149           0      5598           time
  3.18    0.001017           0      3745           write
  1.12    0.000358           0      7316           gettimeofday
  0.64    0.000205           1       164           fcntl64
  0.39    0.000124          21         6           waitpid
  0.22    0.000070           0      1496       326 access
  0.13    0.000041           0      3769           poll
  0.03    0.000009           0       151           close
  0.02    0.000008           0       114           clock_gettime
  0.02    0.000007           0       110           getcwd
  0.00    0.000000           0       112           open
  0.00    0.000000           0        38           chdir
  0.00    0.000000           0        47           lseek
  0.00    0.000000           0         6           pipe
  0.00    0.000000           0        38           times
  0.00    0.000000           0       135           brk
  0.00    0.000000           0         3           ioctl
  0.00    0.000000           0        14           getrusage
  0.00    0.000000           0        38           setitimer
  0.00    0.000000           0        19           flock
  0.00    0.000000           0        40           mlock
  0.00    0.000000           0        40           munlock
  0.00    0.000000           0         6           nanosleep
  0.00    0.000000           0        27           rt_sigaction
  0.00    0.000000           0        31           rt_sigprocmask
  0.00    0.000000           0        13           pread64
  0.00    0.000000           0        18           pwrite64
  0.00    0.000000           0        78           mmap2
  0.00    0.000000           0       111        10 stat64
  0.00    0.000000           0        49           lstat64
  0.00    0.000000           0       182           fstat64
  0.00    0.000000           0         8           socket
  0.00    0.000000           0         8         5 connect
  0.00    0.000000           0        19           accept
  0.00    0.000000           0         7           send
  0.00    0.000000           0        66           recv
  0.00    0.000000           0         3           recvfrom
  0.00    0.000000           0        20         1 shutdown
  0.00    0.000000           0         5           setsockopt
  0.00    0.000000           0         4           getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    0.032009                 28080       342 total

Yes, our scripts read a lot of data; this is normal.
But why does munmap take so long??!! And when we have the problem, munmap is ALWAYS at the top!
For comparison, here is the result of tracing a random php-fpm process in a regular situation:

Process 28606 attached - interrupt to quit
Process 28606 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.72    0.001816           1      2601           read
 32.88    0.001306         435         3           clone
  9.19    0.000365           0      2175           write
  6.95    0.000276           0      7521           time
  2.24    0.000089           0      4158           gettimeofday
  2.01    0.000080           1       114           brk
  0.28    0.000011           0      2166           poll
  0.20    0.000008           0       833       155 access
  0.20    0.000008           0        53           recv
  0.18    0.000007           2         3           waitpid
  0.15    0.000006           0        18           munlock
  0.00    0.000000           0        69           open
  0.00    0.000000           0        96           close
  0.00    0.000000           0        29           chdir
  0.00    0.000000           0        36           lseek
  0.00    0.000000           0         3           pipe
  0.00    0.000000           0        29           times
  0.00    0.000000           0        10           getrusage
  0.00    0.000000           0         5           munmap
  0.00    0.000000           0         1           ftruncate
  0.00    0.000000           0        29           setitimer
  0.00    0.000000           0         1           sigreturn
  0.00    0.000000           0        11           flock
  0.00    0.000000           0        18           mlock
  0.00    0.000000           0         5           nanosleep
  0.00    0.000000           0        19           rt_sigaction
  0.00    0.000000           0        24           rt_sigprocmask
  0.00    0.000000           0         6           pread64
  0.00    0.000000           0        12           pwrite64
  0.00    0.000000           0        69           getcwd
  0.00    0.000000           0         5           mmap2
  0.00    0.000000           0        35         7 stat64
  0.00    0.000000           0        41           lstat64
  0.00    0.000000           0        96           fstat64
  0.00    0.000000           0       108           fcntl64
  0.00    0.000000           0        87           clock_gettime
  0.00    0.000000           0         5           socket
  0.00    0.000000           0         4         4 connect
  0.00    0.000000           0        16         2 accept
  0.00    0.000000           0         8           send
  0.00    0.000000           0        15           shutdown
  0.00    0.000000           0         4           getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    0.003972                 20541       168 total

Process 29168 attached - interrupt to quit
Process 29168 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.81    0.002366           1      1717           read
 26.41    0.001140           1      1696           poll
  8.29    0.000358           0      1662           write
  7.37    0.000318           2       131       121 stat64
  1.53    0.000066           0      3249           gettimeofday
  1.18    0.000051           0       746       525 access
  0.23    0.000010           0        27           fcntl64
  0.19    0.000008           0        62           brk
  0.00    0.000000           0         1           restart_syscall
  0.00    0.000000           0         7           open
  0.00    0.000000           0        16           close
  0.00    0.000000           0         3           chdir
  0.00    0.000000           0      1039           time
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0         3           times
  0.00    0.000000           0         3           ioctl
  0.00    0.000000           0         1           getrusage
  0.00    0.000000           0         4           munmap
  0.00    0.000000           0         3           setitimer
  0.00    0.000000           0         1           sigreturn
  0.00    0.000000           0         1           flock
  0.00    0.000000           0         1           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           pwrite64
  0.00    0.000000           0         3           getcwd
  0.00    0.000000           0         4           mmap2
  0.00    0.000000           0         7           fstat64
  0.00    0.000000           0         9           clock_gettime
  0.00    0.000000           0         6           socket
  0.00    0.000000           0         5         1 connect
  0.00    0.000000           0         3         2 accept
  0.00    0.000000           0         5           send
  0.00    0.000000           0        64           recv
  0.00    0.000000           0         3           recvfrom
  0.00    0.000000           0         2           shutdown
  0.00    0.000000           0         1           getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    0.004317                 10489       649 total

And you can see that munmap is not at the top.
Now we have no ideas left for how to solve this problem :(
We examined the following potential causes, and the answer to each is "NO":

  • additional user activity
  • long script execution (several seconds)
  • swap usage

Can you help us?


Source: (StackOverflow)

Check whether an mmap'ed address is correct

I'm writing a high-load daemon that should run on FreeBSD 8.0 as well as on Linux. The daemon's main purpose is to serve files that are requested by an identifier. The identifier is converted into a local filename/file size via a request to a DB, and then I use sequential mmap() calls to send file blocks with send().

However, sometimes there is a mismatch between the file size in the DB and the file size on the filesystem (real size < size in the DB). In this situation I have already sent all the real data blocks, and when the next data block is mapped, mmap returns no error, just a usual address (I also checked the errno variable; it is zero after mmap). But when the daemon tries to send this block, it gets a segmentation fault. (This behaviour is reliably reproduced on FreeBSD 8.0 amd64.)

I was doing a safety check before open, verifying the size with a stat() call. However, real life shows that the segfault can still be raised in rare situations.

So, my question is: is there a way to check whether a pointer is accessible before dereferencing it? When I opened the core in gdb, gdb said that the given address is out of bounds. Perhaps there is another solution somebody can propose.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define FILENAME        "./datafile"

int main()
{
    unsigned long i, j;

    srand(time(NULL));
    unsigned long pagesize = sysconf(_SC_PAGESIZE);

    unsigned long basesize = 4 * pagesize;
    unsigned long cropsize = 2 * pagesize;

    // create 4*pagesize sized file
    int f = creat(FILENAME, 0644);
    for (i = 0; i < basesize; i++) {
        unsigned char c = (unsigned char)rand();
        if (write(f, &c, 1) < 1) { perror("write"); break; }
    }
    close(f);

    f = open(FILENAME, O_RDONLY);

    // walk through the file
    unsigned char xor = 0;
    unsigned long offset = 0;
    for (j = 0; j < 4; j++) {
        // trunc file to 2*pagesize
        if (j == 2) truncate(FILENAME, cropsize);

        char *data = mmap(NULL, pagesize, PROT_READ, MAP_PRIVATE, f, offset);
        if (data == MAP_FAILED) { perror("mmap"); break; }
        printf("mmap: %lu@%lu for %i\n", pagesize, offset, f);

        for (i = 0; i < pagesize; i++) xor ^= data[i];

        offset += pagesize;
    }

    printf("xor: %02x\n", (unsigned) xor);  // use the accumulated value so the reads aren't optimized away

    close(f);

    return 0;
}

Source: (StackOverflow)

Hosting a high-traffic Facebook app (game)

We are currently developing a high-traffic Facebook application. All the traffic will come within one month, in which 500,000 to 1,000,000 users are expected. After that month the game is over and we have a winner, so the app will be archived.

We are currently planning to develop the application with Ruby on Rails, and we are searching for hosting options that can deal with the traffic. The problem is not so much the number of users as the peak values: we will have around 500,000 requests coming in daily within a short timeframe (let's say within 3 minutes in the worst case).

We are expecting 500,000 to 1,000,000 users of the application, with a peak at 1:00 pm (timezone GMT+1), when most of the users (up to 80%) will send most of the requests. The requests run from the 11th of June to the 11th of July; after that, the app/game is closed/over.

We are currently developing an aggressive caching mechanism; at the moment we are thinking about 2 or 3 small apps/webservices that will handle the load.

The load is distributed as follows: a) main application, cached data (11 screens, 200k each); b) voting, every day until 1:00 pm (timezone GMT+1), where every user votes with about 10k of data sent and high concurrent peak values!

Questions:

  • is there any specific application setup that is recommendable?
  • are there any hosting partners that can be recommended?

Thanks!


Source: (StackOverflow)

Highly scalable tags on Google App Engine (Python)

I have a lot of (for example) posts that are marked with one or more tags. Posts can be created or deleted, and a user can also make a search request for one or more tags (combined with logical AND). The first idea that came to my mind was a simple model:

class Post(db.Model):
  #blahblah
  tags = db.StringListProperty()

Implementing the create and delete operations is obvious. Search is more complex: to search for N tags it would do N GQL queries like "SELECT * FROM Post WHERE tags = :1" and merge the results using cursors, which has terrible performance.

The second idea is to separate the tags into different entities:

class Post(db.Model):
    #blahblah
    tags = db.ListProperty(db.Key) # For fast access

class Tag(db.Model):
    name = db.StringProperty(name="key")
    posts = db.ListProperty(db.Key) # List of posts that marked with tag

This takes Tags from the DB by key (much faster than fetching them with GQL) and merges them in memory. I think this implementation performs better than the first one, but a very frequently used tag can exceed the maximum size allowed for a single datastore object. There is another problem as well: the datastore can modify a single object only about once per second, so for frequently used tags we also have a bottleneck in modification latency.

Any suggestions?


Source: (StackOverflow)

Patterns and technologies for a system capable of processing 40,000 messages per second

We need to build a system capable of processing 40,000 messages per second. No messages can be lost in case of any software or hardware failures.

Each message size is about 2-4Kb.

Processing of a message consists of validating the message, doing some simple arithmetical calculations, saving result to database and (sometimes) sending notifications to other systems.

Preferable software technology is .Net.

What software and hardware patterns are the most suitable for such a task?

How much hardware will it require?
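
One widely used shape for this class of requirement, sketched below in Java (the structure maps directly onto .NET's BlockingCollection): acknowledge a message only once it is enqueued, then drain the queue in batches so the database sees a few large transactions instead of 40,000 small ones. The in-memory queue and commit method here are stand-ins; a real deployment would use a durable queue (e.g. MSMQ or a replicated log) to satisfy the no-loss requirement.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BatchingPipeline {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(100000);

    // Called by receiver threads after validation and the arithmetic step.
    public void accept(String message) throws InterruptedException {
        queue.put(message);            // blocks when full: back-pressure instead of message loss
    }

    // Run on a dedicated writer thread.
    public void runWriter() throws InterruptedException {
        List<String> batch = new ArrayList<String>(1000);
        while (true) {
            batch.add(queue.take());   // wait for at least one message
            queue.drainTo(batch, 999); // then grab up to a batch's worth without blocking
            commit(batch);             // one DB transaction for the whole batch
            batch.clear();
        }
    }

    private void commit(List<String> batch) {
        // Stand-in for a single transaction inserting all rows and firing notifications.
        System.out.println("committed " + batch.size() + " messages");
    }
}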


Source: (StackOverflow)

How can I make nginx return a static response and send request headers to app?

I am making a high-load web statistics system by embedding an <img> tag in sites. What I want to happen is:

  1. nginx gets a request for an image from some host
  2. it answers the host with a little 1px static image from the filesystem
  3. at the same time, it somehow passes the request's headers to the application and closes the connection to the host

I am working with Ruby and I'm going to make a pure-Rack app to get the headers and put them into a queue for further calculations.

The problem I can't solve is: how can I configure nginx to hand the headers to the Rack app, and return the static image as the reply without waiting for a response from the Rack application?
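
One direction worth exploring (a sketch, not verified against any particular nginx version): the stock ngx_http_empty_gif_module can serve a 1x1 transparent GIF straight from memory, and the undocumented post_action directive can fire a subrequest to the backend after the main response has been sent. The backend address and header names below are placeholders:

location /track.gif {
    empty_gif;              # serve a 1x1 transparent GIF straight from memory
    post_action @notify;    # fire a subrequest after the main response completes
}

location @notify {
    proxy_pass http://127.0.0.1:9292;  # placeholder: the Rack app
    proxy_pass_request_headers on;     # forward the original request headers
    proxy_set_header X-Original-URI $request_uri;
    proxy_set_header X-Real-IP $remote_addr;
}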

Also, Rack is not required if there is a more common Ruby solution.


Source: (StackOverflow)

Is Google App Engine a good platform for a high-traffic chat website?

I'm looking to create a high-traffic chat website, possibly with video streaming and some image manipulation happening on the server.

Scanning over the Channel API (http://code.google.com/appengine/docs/python/channel/overview.html) has made me hopeful this can be done without AJAX polling, and the general opinion is that GAE is very scalable.

I still have a few concerns:

1) Can it support tens of thousands of simultaneous users that interact with each other in real time without lagging? Is there a cap on CPU usage?

2) I'll (probably) be writing it on top of the J2EE framework. Does GAE guarantee that each new request will have access to a global in-memory datastore that will be available as long as the application is running on the server ("ServletContext" in Java-speak) and will be storing possibly gigabytes of data? Is there a memory cap?

3) Will the full J2SE and J2EE stack be available? Will I be able to include just any library I wish?

4) Are there better solutions for this kind of issue than GAE? I've been thinking about renting several dedicated servers, but this will go into the thousands/month...

Thanks in advance!


Source: (StackOverflow)

Better Practice: Placing the load on SQL or Web server?

I'm the webmaster for a major US university. We have a great deal of requests on our website, which I've built and been in charge of for the last 7 years or so. I've been building ever-more-complex features into our website, and it's always been my practice to put as much of the programming burden as possible on our multi-processor Microsoft SQL server, using stored procedures, views, etc., and to fill in what can't be done there with PHP, ASP, or Perl on the IIS web server. Both servers are very powerful and capable machines. Since I've been doing this alone for so long, without anyone else to brainstorm with, I'm curious whether my approach is ideal for the even higher load situations we'll have in the future.

My question is: is it better practice to place more of the load on the SQL server, using nested SELECT statements, views, stored procedures, and aggregate functions, or should I be pulling multiple simpler queries and processing them in server-side scripts like PHP? Keep on keepin' on, or come up with a better way?

I've recently become more interested in performance after I ran some load traces and learned just how much I've been putting on the shoulders of the SQL server. Both the web server and the SQL server are fast and responsive throughout the day, almost without regard for how much I put on them, but I'd like to be ready, with myself trained and my existing code upgraded with best practices in mind, by the time it becomes important.

Thanks for your advice and input.


Source: (StackOverflow)

How do digg (or other high-load websites) store user sessions?

How does digg or any other high-traffic website store user sessions? What do they use for session storage: the file system, a DB (which one?), memcache, or both?

Let's imagine a simple situation: a logged-in user has set the "Remember me" flag during login, and we've set a session cookie with an expiration date of 1 year. Say we keep the session in memcache, but (in my version) we should also keep a record of this session in the DB; only users with the "Remember me" flag are stored in the DB. Is this a right way of storing sessions? I mean for high-traffic websites, of course (with 2 or more application servers, 2 or more databases, memcache servers, etc.). For small websites, storing sessions the default way (in the file system) is OK.
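
A minimal sketch of that look-aside scheme (in Java; plain maps stand in for the memcache client and the sessions table, and the string payload is a placeholder for real session data):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Look-aside session lookup: try the cache first, fall back to the durable
// store only for "remember me" sessions, and repopulate the cache on a hit.
public class SessionStore {
    // Stand-in for a memcache client; a real deployment would use one.
    private final Map<String, String> cache = new ConcurrentHashMap<String, String>();
    // Stand-in for the DB table holding only "remember me" sessions.
    private final Map<String, String> db = new ConcurrentHashMap<String, String>();

    public String find(String sessionId) {
        String session = cache.get(sessionId);   // fast path: memcache
        if (session == null) {
            session = db.get(sessionId);          // slow path: only long-lived sessions live here
            if (session != null) {
                cache.put(sessionId, session);    // repopulate the cache for later requests
            }
        }
        return session;                           // null means: force a fresh login
    }

    public void save(String sessionId, String data, boolean rememberMe) {
        cache.put(sessionId, data);
        if (rememberMe) {
            db.put(sessionId, data);              // survives cache eviction and restarts
        }
    }
}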

I've tried searching Google, but failed to find any information about it. I've read some solutions from the "Advanced PHP Programming" book, but the main emphasis was on customizing the session storage handler.

Really hope to hear good ideas or links!

Thank you.


Source: (StackOverflow)