EzDevInfo.com

varnish interview questions

Top frequently asked Varnish interview questions

List contents of varnish cache?

Is there a way to list the contents of the varnish cache storage? Also, it would be nice to somehow list the most frequent cache hits.

I found a way to see the most frequent cache misses by listing what is being sent to the backend with:

varnishtop -b -i TxURL

It would be very useful to see which URLs are my top cache hits.

Edit: I am using version: varnish-3.0.3 revision 9e6a70f

Source: (StackOverflow)

Stripping out select querystring attribute/value pairs so varnish will not vary cache by them

My goal is to "whitelist" certain querystring attributes and their values so varnish will not vary cache between the urls.

Example:

Url 1: http://foo.com/someproduct.html?utm_code=google&type=hello  
Url 2: http://foo.com/someproduct.html?utm_code=yahoo&type=hello  
Url 3: http://foo.com/someproduct.html?utm_code=yahoo&type=goodbye

In the above example I want to whitelist "utm_code" but not "type". So after the first URL is hit, I want Varnish to serve that cached content for the second URL.

However, in the case of the third url, the attribute "type" value is different so that should be a varnish cache miss.

I have tried the two methods below (found in a Drupal help article I can't locate right now), but they did not seem to work. That might be because I have the regex wrong.

# 1. strip out certain querystring values so varnish does not vary cache.
set req.url = regsuball(req.url, "([\?|&])utm_(campaign|content|medium|source|term)=[^&\s]*&?", "\1");
# get rid of trailing & or ?
set req.url = regsuball(req.url, "[\?|&]+$", "");

# 2. strip out certain querystring values so varnish does not vary cache.
set req.url = regsuball(req.url, "([\?|&])utm_campaign=[^&\s]*&?", "\1");
set req.url = regsuball(req.url, "([\?|&])foo_bar=[^&\s]*&?", "\1");
set req.url = regsuball(req.url, "([\?|&])bar_baz=[^&\s]*&?", "\1");
# get rid of trailing & or ?
set req.url = regsuball(req.url, "[\?|&]+$", "");
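The effect of the two-step substitution can be checked outside Varnish. The sketch below re-implements the same pattern shape with Python's `re` module, on the assumption that Python and Varnish agree on the syntax for these particular patterns; `strip_params` is a hypothetical helper name. Note that the alternation in method 1 lists campaign, content, medium, source and term, so it does not match the `utm_code` attribute used in the example URLs.

```python
import re


def strip_params(url, names):
    """Remove the listed query-string attributes (hypothetical helper)."""
    alternation = "|".join(re.escape(name) for name in names)
    # Same shape as the VCL pattern: keep the leading ? or &, drop name=value.
    pattern = r"([?&])(?:" + alternation + r")=[^&\s]*&?"
    url = re.sub(pattern, r"\1", url)
    # Get rid of a trailing & or ?
    return re.sub(r"[?&]+$", "", url)


urls = [
    "/someproduct.html?utm_code=google&type=hello",
    "/someproduct.html?utm_code=yahoo&type=hello",
    "/someproduct.html?utm_code=yahoo&type=goodbye",
]
# URLs 1 and 2 collapse to the same string; URL 3 stays distinct.
normalized = [strip_params(u, ["utm_code"]) for u in urls]
# A name list without utm_code leaves the URL untouched.
unchanged = strip_params(urls[0], ["utm_campaign"])
```

If the first two normalized URLs are equal and the third differs, the cache key would behave as the question asks, provided the rewritten `req.url` is what gets hashed.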

Source: (StackOverflow)

Is Varnishd the right caching solution to use with Rails?

I want to cache full pages on our web application (thousands of pages) that are rendered by the Rails stack, but don't change very often. Each render is quite expensive in terms of resources.

My understanding of how Varnishd works is that when an initial call is made to a URL, Varnishd will check its cache store, a miss will take place and so the request will be passed through to Rails and resulting page which gets generated is then added to the Varnishd cache.

Any subsequent calls to that URL are then served from the Varnishd cache, and the Rails stack is not involved.

Is this correct or am I way off?

How can I have my app tell Varnishd when a specific page has been updated, so that the change is reflected in its cache store?

Is Varnishd a good choice for this purpose?

Thanks for your help - I know these are very basic questions, but docs just don't make this clear (to me at least).
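The flow described in the question (a miss goes to the backend, the result is stored, later requests are served without touching the backend) can be modelled in a few lines. This is a toy model for illustration, not Varnish's implementation; `PageCache`, `render_page` and `purge` are hypothetical names.

```python
class PageCache:
    """Toy model of a caching proxy in front of an expensive backend."""

    def __init__(self, backend):
        self.backend = backend      # callable: url -> page body
        self.store = {}             # url -> cached page body
        self.backend_calls = 0

    def get(self, url):
        if url not in self.store:   # miss: ask the backend and remember the result
            self.backend_calls += 1
            self.store[url] = self.backend(url)
        return self.store[url]      # hit: the backend is not involved

    def purge(self, url):
        """Drop one page so the next request is rendered again."""
        self.store.pop(url, None)


def render_page(url):
    # Stand-in for an expensive Rails render (assumption for this sketch).
    return "<html>rendered " + url + "</html>"


cache = PageCache(render_page)
first = cache.get("/products/1")     # miss
second = cache.get("/products/1")    # hit
calls_before_purge = cache.backend_calls
cache.purge("/products/1")           # the app signals that the page changed
cache.get("/products/1")             # miss again
calls_after_purge = cache.backend_calls
```

In this model the backend is called once for the first two requests and a second time only after the purge.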

Source: (StackOverflow)

Best way to cache RESTful API results of GET calls

I'm thinking about the best way to create a cache layer in front of, or as the first layer of, my RESTful API (written in Ruby) for GET requests.

Not every request can be cached, because even for some GET requests the API has to validate the requesting user or application. That means I need to configure which requests are cacheable and how long each cached response is valid. In a few cases I need a very short expiration time, e.g. 15s or less. The API application should also be able to expire cache entries before their expiration date is reached.

I already thought about many possible solutions, my two best ideas:

1. As the first layer of the API (even before routing), cache logic that I write myself (so all configuration options are in my hands), with responses and expiration dates stored in Memcached.
2. A highly configurable web server proxy, perhaps something like Squid; but I have never used a proxy for a case like this before and I'm not at all sure about it.
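The requirements above (a per-entry lifetime, very short lifetimes such as 15s, and expiry on demand by the application) can be sketched as a small in-process cache. This is only an illustration of the first idea; `TTLCache` and its methods are hypothetical, and a real deployment would keep the entries in Memcached rather than in a dict.

```python
import time


class TTLCache:
    """Per-entry expiry with an injectable clock (hypothetical sketch)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}           # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self.entries[key] = (self.clock() + ttl_seconds, value)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:   # lifetime is over: drop and report a miss
            del self.entries[key]
            return None
        return value

    def expire(self, key):
        """Let the application invalidate an entry before its TTL is reached."""
        self.entries.pop(key, None)


now = [0.0]                               # fake clock so the example is deterministic
cache = TTLCache(clock=lambda: now[0])
cache.set("GET /items", "[1, 2, 3]", ttl_seconds=15)
fresh = cache.get("GET /items")           # still within 15s
now[0] = 15.0
stale = cache.get("GET /items")           # lifetime reached
cache.set("GET /users/7", '{"id": 7}', ttl_seconds=3600)
cache.expire("GET /users/7")              # expired by the application
expired_early = cache.get("GET /users/7")
```

Injecting the clock keeps the expiry logic testable without sleeping.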

I also thought about a caching solution like Varnish. I have used Varnish for "usual" web applications and it's impressive, but the configuration is kind of special. I would still use it if it's the fastest solution.

Another thought was to cache in the Solr index, which I'm already using in the data layer so that most requests don't query the database.

If someone has a hint or good sources to read about this topic, let me know.

Source: (StackOverflow)

Why isn't Varnish sending 304 unmodified when If-Modified-Since header is sent?

When sending a GET request directly to the backend with If-Modified-Since: Wed, 15 Feb 2012 07:25:00 CET set, Apache correctly returns a 304 with no content.

When I send the same request through Varnish 3.0.2, it responds with a 200 and resends all the content even though the client already has it. Obviously, this isn't a good use of bandwidth. My understanding is that Varnish supports intelligent handling of this header and should be sending a 304, so I figure I've done something wrong in my .vcl file.

Varnishlog gives this:

   16 SessionOpen  c 84.97.17.233 64416 :80
   16 ReqStart     c 84.97.17.233 64416 1597323690
   16 RxRequest    c GET
   16 RxURL        c /fr/CS/CS_AU-Maboreke-6-6-2004.pdf
   16 RxProtocol   c HTTP/1.0
   16 RxHeader     c Host: www.quotaproject.org
   16 RxHeader     c User-Agent: Sprawk/1.3 (http://www.sprawk.com/)
   16 RxHeader     c Accept: */*
   16 RxHeader     c Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
   16 RxHeader     c Connection: close
   16 RxHeader     c If-Modified-Since: Wed, 15 Feb 2012 07:25:00 CET
   16 VCL_call     c recv lookup
   16 VCL_call     c hash
   16 Hash         c /fr/CS/CS_AU-Maboreke-6-6-2004.pdf
   16 Hash         c www.quotaproject.org
   16 VCL_return   c hash
   16 Hit          c 1597322756
   16 VCL_call     c hit
   16 VCL_acl      c NO_MATCH CTRLF5
   16 VCL_return   c deliver
   16 VCL_call     c deliver deliver
   16 TxProtocol   c HTTP/1.1
   16 TxStatus     c 200
   16 TxResponse   c OK
   16 TxHeader     c Server: Apache
   16 TxHeader     c Last-Modified: Wed, 09 Jun 2004 16:07:50 GMT
   16 TxHeader     c Vary: Accept-Encoding
   16 TxHeader     c Content-Type: application/pdf
   16 TxHeader     c Date: Wed, 22 Feb 2012 18:25:05 GMT
   16 TxHeader     c Age: 12432
   16 TxHeader     c Connection: close
   16 Gzip         c U D - 107685 115763 80 796748 861415
   16 Length       c 98304
   16 ReqEnd       c 1597323690 1329935105.713264704 1329935106.208528996 0.000071526 0.000068426 0.495195866
   16 SessionClose c EOF mode
   16 StatSess     c 84.97.17.233 64416 0 1 1 0 0 0 203 98304

If I understand this correctly, the object is already in Varnish's cache so it doesn't need to contact the backend, but it already knows the Last-Modified so why would it not respond with 304?

And here's my VCL file:

backend idea {
  # .host = "www.idea.int";
  .host = "83.145.60.235"; # IDEA's public website IP
  .port = "80";
}
backend qp {
  # .host = "www.quotaproject.org";
  .host = "83.145.60.235"; # IDEA's public website IP
  .port = "80";
}
#
#Below is a commented-out copy of the default VCL logic.  If you
#redefine any of these subroutines, the built-in logic will be
#appended to your code.
#
sub vcl_recv {
  # force domain so that Apache handles the VH correctly
  if (req.http.host ~ "^qp" || req.http.host ~ "quotaproject.org$") {
    set req.http.Host = "www.quotaproject.org";
    set req.backend = qp;
  } else {
    # default to idea.int
     set req.http.Host = "www.idea.int";
     set req.backend = idea;
  }
  # Before anything else we need to fix gzip compression 
  if (req.http.Accept-Encoding) {
      if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
          # No point in compressing these
          remove req.http.Accept-Encoding;
      } else if (req.http.Accept-Encoding ~ "gzip") {
          set req.http.Accept-Encoding = "gzip";
      } else if (req.http.Accept-Encoding ~ "deflate") {
          set req.http.Accept-Encoding = "deflate";
      } else {
          # unknown algorithm
          remove req.http.Accept-Encoding;
      }
  }
  # AJAX requests bypass the cache. TODO: make sure your JavaScript implementation for AJAX actually sets XMLHttpRequest
  if (req.http.X-Requested-With == "XMLHttpRequest") {
        return(pass);
   }
  if (req.request != "GET" &&
     req.request != "HEAD" &&
     req.request != "PUT" &&
     req.request != "POST" &&
     req.request != "TRACE" &&
     req.request != "OPTIONS" &&
     req.request != "DELETE") {
     /* Non-RFC2616 or CONNECT which is weird. */
     return (pipe);
   }
   # Purge everything url - this isn't the squid way, but works
    if (req.url ~ "^/varnishpurge") {
       if (!client.ip ~ purge) {
            error 405 "Not allowed.";
       }
       if (req.url == "/varnishpurge") {
            ban("req.http.host == " + req.http.host + " && req.url ~ ^/");
            error 841 "Purged site.";
       }
       else {
            ban("req.http.host == " + req.http.host + " && req.url ~ ^" + regsub( req.url, "^/varnishpurge(.*)$", "\1" ) + "$");
            error 842 "Purged page.";
       }
    }
  # spoof the client IP (taken from http://utvbloggen.se/snabb-guide-till-varnish/)
  remove req.http.X-Forwarded-For;
  set req.http.X-Forwarded-For = client.ip;
  # Force delivery from cache even if other things indicate otherwise
  if (req.url ~ "\.(flv)") {
    # pipe flash start away
    return(pipe);
  }
  if (req.url ~ "\.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$") {
    # cookies are irrelevant here
    unset req.http.Cookie;
    unset req.http.Authorization; 
  }
  # Force short-circuit to the real site for these dynamic pages
  if (req.url ~ "/customcf/" || req.url ~ "/uid/editData.cfm" || req.url ~ "^/private/") {
    return(pass);
  }
  # Remove user agent, since Apache will serve these resources the same way
  if (req.http.User-Agent) {
    set req.http.User-Agent = "";
  }
  if (req.http.Cookie) {
    # removes all cookies named __utm? (utma, utmb...) - tracking thing
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|; ) *__utm.=[^;]+;? *", "\1"); 
    # remove cStates for RHM boxes (the server doesn't need to know these, JS will handle this client-side)
    set req.http.cookie = regsub(req.http.cookie, "(; )?cStates=[^;]*", ""); #cStates might sometimes have a blank value
    # remove ColdFusion session cookie stuff
    if (!req.url ~ "^/publications/" && !req.url ~ "^/uid/admin/") {
      set req.http.cookie = regsub(req.http.cookie, "(; )?CFID=[^;]+", "");
      set req.http.cookie = regsub(req.http.cookie, "(; )?CFTOKEN=[^;]+", "");
    }
    # Remove the cookie header if it's empty after cleanup
    if (req.http.cookie ~ "^;? *$") {
      # The only cookie data left is a semicolon or spaces
      remove req.http.cookie;
    }
  }
}
#
# Called when the requested object was found in the cache
#
sub vcl_hit {
  # Allow administrators to easily flush the cache from their browser
  if (client.ip ~ CTRLF5) {
    if (req.http.pragma ~ "no-cache" || req.http.Cache-Control ~ "no-cache") {
      set obj.ttl = 0s;
      return(pass);
    }
  }
}
#
# Called when the requested object has been retrieved from the
# backend, or the request to the backend has failed
#
sub vcl_fetch {
  set beresp.grace = 1h;
  # strip the cookie before the image is inserted into cache.
  if (req.url ~ "\.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$") {
    remove beresp.http.set-cookie;
    set beresp.ttl = 100w;
  }
  # Remove CF session cookies for everything but the publications subsite
  if (!req.url ~ "^/publications/" && !req.url ~ "/customcf/" && !req.url ~ "^/uid/admin/" && !req.url ~ "^/uid/editData.cfm") {
    remove beresp.http.set-cookie;
  }
  if (beresp.ttl < 48h) {
    set beresp.ttl = 48h;
  }
}
#
# Called before a cached object is delivered to the client
#
sub vcl_deliver {
  # We'll be hiding some headers added by Varnish. We want to make sure people are not seeing we're using Varnish.
  remove resp.http.X-Varnish;
  remove resp.http.Via;
  # We'd like to hide the X-Powered-By headers. Nobody has to know we can run PHP and have version xyz of it.
  remove resp.http.X-Powered-By;
}

Can anyone see the problem or problems?

Update: According to http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3

Note: When handling an If-Modified-Since header field, some
      servers will use an exact date comparison function, rather than a
      less-than function, for deciding whether to send a 304 (Not
      Modified) response.

It seems this may be Varnish's behaviour. I'm sending a date that differs from the file's real Last-Modified date (in the log above it is later), so it does not exactly match what is cached in Varnish.
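The comparison under discussion can be made concrete with a small sketch. It uses a less-than-or-equal comparison rather than an exact one, and a deliberately strict date parser. RFC 2616 section 3.3.1 requires HTTP date/time stamps to be expressed in GMT, so a strict parser would reject the `CET` value from the log and fall back to a full 200 response. Whether Varnish 3.0.2 treats the header this way is an assumption, not something the log confirms; `parse_http_date` and `status_for` are hypothetical names.

```python
from datetime import datetime

# RFC 1123 format, the preferred HTTP-date form; the zone must be GMT.
HTTP_DATE = "%a, %d %b %Y %H:%M:%S GMT"


def parse_http_date(value):
    """Return a datetime, or None if the value is not a GMT HTTP-date."""
    try:
        return datetime.strptime(value, HTTP_DATE)
    except ValueError:
        return None


def status_for(last_modified, if_modified_since):
    """Pick 304 or 200 for a cached object (sketch, not Varnish's code)."""
    modified = parse_http_date(last_modified)
    since = parse_http_date(if_modified_since)
    if modified is None or since is None:
        return 200                   # unusable header: ignore the condition
    return 304 if modified <= since else 200


cached = "Wed, 09 Jun 2004 16:07:50 GMT"
with_cet = status_for(cached, "Wed, 15 Feb 2012 07:25:00 CET")
with_gmt = status_for(cached, "Wed, 15 Feb 2012 07:25:00 GMT")
older = status_for(cached, "Tue, 01 Jun 2004 00:00:00 GMT")
```

Repeating the request with the same timestamp expressed in GMT would be one way to tell a date-format problem apart from an exact-match comparison.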

Source: (StackOverflow)

Setting up varnish on same server as webserver

Our company recently decided to start working with the Varnish HTTP accelerator. The main reason we chose this solution is that we specialize in building web shops (Magento Enterprise), and Magento has a commercial plugin that works together with Varnish.

The varnish configuration is already present on our testing environment, which contains 1 (software) load balancer running a varnish instance, 2 apache webservers and 1 storage + 1 mysql server.

However, the time has now come to add Varnish to our development environment (a VirtualBox VM with 1 GB of RAM running Debian, with the database, web server, and files all on the same machine).

Could anyone post a default.vcl configuration file for this setup?

Apache2 runs on port 80.

Thanks in advance, Kenny

EDIT: I found and posted the solution below.

Source: (StackOverflow)

How to send a purge request in varnish

I can't see a similar question, but apologies if I'm duping.

We're running a varnish cache on our system, but want to install a system where we can purge individual pages when they are edited (fairly normal). We've been trying to get it to work by using an HTTP header. So, our VCL is set up like:

acl purge {
      "localhost";
#### Our server IP #####
}

sub vcl_recv {
    if (req.request == "PURGE") {
            if (!client.ip ~ purge) {
                    error 405 "Not allowed.";
            }
            return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
            purge;
    }
 }

sub vcl_miss {
        if (req.request == "PURGE") {
                 purge;
        }
}

However, I'm stuck on how to actually SEND the http purge request. We're using PHP for the website, so I've tried using:

header("PL: PURGE / HTTP/1.0");
header("Host: url to purge");

But this doesn't seem to do anything (and varnishlog doesn't seem to show anything purging).

I've also experimented with cURL but, again, it doesn't seem to be working. Am I missing something really basic here, or is the basis sound, meaning my implementation is bugged?
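PHP's `header()` sets headers on the response sent back to the browser; it does not issue a request to Varnish. The VCL above reacts to an incoming request whose method is PURGE, so the application has to make a separate outgoing HTTP request. A minimal sketch follows, written in Python for illustration even though the site is PHP; the address `127.0.0.1`, the host `www.example.com` and both function names are assumptions.

```python
import urllib.request


def build_purge_request(varnish_url, host):
    """Build (but do not send) an HTTP request that uses the PURGE method."""
    return urllib.request.Request(
        varnish_url,
        method="PURGE",
        headers={"Host": host},   # the site whose page should be purged
    )


def send_purge(varnish_url, host, timeout=5):
    """Send the request; urlopen raises HTTPError for 4xx/5xx such as 405."""
    request = build_purge_request(varnish_url, host)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Built only, not sent, so the example runs without a Varnish instance.
request = build_purge_request("http://127.0.0.1/some/page", "www.example.com")
```

The request must come from an address listed in the `purge` ACL, otherwise the VCL above answers with 405.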

Many thanks,

Source: (StackOverflow)

Flask 301 Response

My flask app is doing a 301 redirect for one of the urls.

The traceback in New Relic is:

Traceback (most recent call last):
  File "/var/www/app/env/local/lib/python2.7/site-packages/flask/app.py", line 1358, in full_dispatch_request
    rv = self.dispatch_request()
  File "/var/www/app/env/local/lib/python2.7/site-packages/flask/app.py", line 1336, in dispatch_request
    self.raise_routing_exception(req)
  File "/var/www/app/env/local/lib/python2.7/site-packages/flask/app.py", line 1319, in raise_routing_exception
    raise request.routing_exception
RequestRedirect: 301: Moved Permanently

It doesn't look like it is even hitting my code; at least, the traceback isn't showing any of my files. At one point I did have Nginx redirect all non-SSL requests to HTTPS, but I had to disable that because Varnish was not able to make the request to port 443 without an error, probably because of some configuration that I did or didn't make.

It doesn't always return a 301, though: I can request the URL and get it without any trouble. But someone out in the world requesting the URL is getting a 301 response.

It is a GET request with some custom headers to link it to the account.

At no point in my code is there a 301 redirect.

Source: (StackOverflow)

How do I set HTTP Headers in Ruby/Sinatra app, hosted on Heroku?

I've got a working app based in Ruby and Sinatra that is deployed on Heroku.

I want to take advantage of the HTTP caching available on Heroku, which uses Varnish.

I'm not sure what the best way to set the headers is, and the correct syntax.

Any thoughts on the best approach and syntax?

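before do
    # Default content type for every response
    headers "Content-Type" => "text/html; charset=utf-8"
end

get '/' do
    # Let shared caches such as Varnish keep this response for 10 minutes
    headers['Cache-Control'] = 'public, max-age=600'

    # SOME STUFF HERE

    haml :home, {:layout => :layout_minfooter}
end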

Source: (StackOverflow)

User-friendly error pages from Varnish

We are using Varnish in front of Plone. If Plone goes down or serves an internal error, we'd like to show a user-friendly static HTML page with some CSS styling and images (a "The server is being updated" page).

How to configure Varnish to do this?

Source: (StackOverflow)

Varnish: cache only specific domain

I have been Googling aggressively, but without luck.

I'm using Varnish with great results, but I would like to host multiple websites on a single server (Apache), without Varnish caching all of them.

Can I specify, by URL, which websites to cache?

Thanks

Source: (StackOverflow)

How to make Varnish ignore, not delete cookies [closed]

I want to use Varnish to cache certain pages even in the presence of cookies. There are 3 possibilities that I need to take care of:

1. An anonymous user is viewing some page.
2. A logged-in user is viewing some page with light customization. These customizations are all stored in a signed cookie and are dynamically populated by JavaScript. The vary-cookie HTTP header is not set.
3. A logged-in user is viewing some page with customized data from the database. The vary-cookie HTTP header is set.

The expected behaviors would be:

1. Cache the page. This is the most basic scenario for Varnish to handle.
2. Cache the page and do not delete the cookie, because some JavaScript logic needs it.
3. Never cache this page, because vary-cookie signals that the cookie contents will affect the output of this page.

I have read some docs on Varnish and I cannot tell if this is the default behavior or if there is some setup I have to do in VCL to make it happen.

Source: (StackOverflow)

Getting Varnish To Work on Magento

First please forgive me for total lack of understanding of Varnish. This is my first go at doing anything with Varnish.

I am following the example at: http://www.kalenyuk.com.ua/magento-performance-optimization-with-varnish-cache-47.html

However when I install and run this, Varnish does not seem to cache. I do get the X-Varnish header with a single number and a Via header that has a value of 1.1 varnish

I have been told (by my ISP) it is because of the following cookie that Magento sets:

Set-Cookie: frontend=6t2d2q73rv9s1kddu8ehh8hvl6; expires=Thu, 17-Feb-2011 14:29:19 GMT; path=/; domain=XX.X.XX.XX; httponly

They said that I either have to change Magento to handle this or configure Varnish to handle this. Since changing Magento is out of the question, I was wondering if someone can give me a clue as to how I would configure Varnish to handle this cookie?

Source: (StackOverflow)

Regex Syntax changes between POSIX and PCRE

We are currently in the process of upgrading our Varnish Cache servers. As part of the process, we upgraded only one of them to see how it behaves compared to the older versions.

One of the major changes in this new version is the switch of the regex engine from POSIX to PCRE. That means that some of our purges (regex purges) have stopped working on the newer server.

I was wondering if anyone can list/point me to a list of actual syntax differences between POSIX and PCRE. Or maybe a function that converts a POSIX regex to PCRE regex.

This is so that I can convert only the purges going to the newer server - without affecting the current regex syntax that is implemented in the system for the other servers.

Source: (StackOverflow)

How to control Varnish and a browser using the Cache-Control: max-age header in a Rails environment?

Recently I added a Varnish instance to a Rails application stack. Varnish in its default configuration can be convinced to cache a certain resource using the Cache-Control header like so:

Cache-Control: max-age=86400, public=true

I achieved that one using the expires_in statement in my controllers:

def index
  expires_in 24.hours, public: true
  respond_with 'some content'
end

That worked well. What I did not expect is that the Cache-Control header ALSO affects the browser. That leads to the problem that both Varnish and my users' browsers cache the resource. The resource is purged from Varnish correctly, but the browser does not attempt to request it again until max-age is reached.

So I wonder whether I should use 'expires_in' in combination with Varnish at all. I could filter the Cache-Control header in an Nginx or Apache instance in front of Varnish, but that seems odd.
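HTTP itself distinguishes the two audiences: the `s-maxage` directive applies only to shared caches (such as a proxy) and overrides `max-age` there, while browsers use `max-age`. The sketch below shows that selection rule. It is not Varnish's or Rails' code, the function names are hypothetical, and whether the Varnish and Rails versions in use handle `s-maxage` this way should be checked rather than assumed.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, argument = part.partition("=")
        directives[name.strip().lower()] = argument.strip().strip('"')
    return directives


def freshness_lifetime(value, shared):
    """Seconds a cache may reuse the response, or None if unspecified."""
    directives = parse_cache_control(value)
    if shared and "s-maxage" in directives:   # shared caches prefer s-maxage
        return int(directives["s-maxage"])
    if "max-age" in directives:
        return int(directives["max-age"])
    return None


header = "public, max-age=0, s-maxage=86400"
proxy_ttl = freshness_lifetime(header, shared=True)      # what a proxy would use
browser_ttl = freshness_lifetime(header, shared=False)   # what a browser would use
```

With a header of this shape the proxy could keep the page for a day while the browser revalidates on every request, so a purge in the proxy becomes visible to users immediately.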

Can anyone enlighten me?

Regards Felix

Source: (StackOverflow)