Varnish interview questions
Top frequently asked Varnish interview questions
Is there a way to list the contents of the varnish cache storage? Also, it would be nice to somehow list the most frequent cache hits.
I found a way to see the most frequent cache misses by listing what is being sent to the backend with:
varnishtop -b -i TxURL
It would be very useful to see which URLs are my top cache hits.
Edit: I am using version: varnish-3.0.3 revision 9e6a70f
Source: (StackOverflow)
My goal is to "whitelist" certain querystring attributes and their values so varnish will not vary cache between the urls.
Example:
Url 1: http://foo.com/someproduct.html?utm_code=google&type=hello
Url 2: http://foo.com/someproduct.html?utm_code=yahoo&type=hello
Url 3: http://foo.com/someproduct.html?utm_code=yahoo&type=goodbye
In the above example I want to whitelist "utm_code" but not "type". So after the first url is hit, I want varnish to serve that cached content to the second url.
However, in the case of the third url, the attribute "type" value is different so that should be a varnish cache miss.
I have tried the 2 methods below (found in a Drupal help article I can't locate right now), but they did not seem to work. It might be because I have the regex wrong.
# 1. strip out certain querystring values so varnish does not vary cache.
set req.url = regsuball(req.url, "([\?|&])utm_(campaign|content|medium|source|term)=[^&\s]*&?", "\1");
# get rid of trailing & or ?
set req.url = regsuball(req.url, "[\?|&]+$", "");
# 2. strip out certain querystring values so varnish does not vary cache.
set req.url = regsuball(req.url, "([\?|&])utm_campaign=[^&\s]*&?", "\1");
set req.url = regsuball(req.url, "([\?|&])foo_bar=[^&\s]*&?", "\1");
set req.url = regsuball(req.url, "([\?|&])bar_baz=[^&\s]*&?", "\1");
# get rid of trailing & or ?
set req.url = regsuball(req.url, "[\?|&]+$", "");
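One thing that stands out in the rules above: the example URLs use `utm_code`, but neither pattern lists that parameter name, so nothing in those URLs would be stripped. The sketch below mirrors the two `regsuball` steps in Python so the pattern can be tried offline against the three example URLs. It is not VCL, and Python's `re` syntax is close to, but not identical to, the PCRE engine Varnish 3 uses. The function name `strip_params` and its `names` argument are hypothetical.

```python
import re

def strip_params(url, names=("utm_code",)):
    """Drop the listed query-string parameters so equivalent URLs share one cache key."""
    alternation = "|".join(re.escape(name) for name in names)
    # Same shape as the regsuball pattern: keep the leading ? or &,
    # drop name=value and at most one trailing &.
    url = re.sub(r"([?&])(?:%s)=[^&\s]*&?" % alternation, r"\1", url)
    # Get rid of a trailing & or ? left behind by the first step.
    return re.sub(r"[?&]+$", "", url)

urls = [
    "http://foo.com/someproduct.html?utm_code=google&type=hello",
    "http://foo.com/someproduct.html?utm_code=yahoo&type=hello",
    "http://foo.com/someproduct.html?utm_code=yahoo&type=goodbye",
]
for url in urls:
    print(strip_params(url))
```

With `utm_code` in the list, the first two URLs normalise to the same string while the third keeps its different `type` value.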
Source: (StackOverflow)
I want to cache full pages on our web application (thousands of pages) that are rendered by the Rails stack, but don't change very often. Each render is quite expensive in terms of resources.
My understanding of how Varnishd works is that when an initial call is made to a URL, Varnishd will check its cache store, a miss will take place, and so the request will be passed through to Rails. The resulting page is then added to the Varnishd cache.
Any subsequent calls made to that URL are then served from the Varnishd cache, and the Rails stack is not involved.
Is this correct or am I way off?
How can I have my app tell Varnishd when a specific page has been updated, so that the change is reflected in its cache store?
Is Varnishd a good choice for this purpose?
Thanks for your help - I know these are very basic questions, but docs just don't make this clear (to me at least).
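The flow described above (a miss on the first request, hits afterwards, and an explicit invalidation when a page changes) can be sketched as a toy model. This is only an illustration of the idea in Python, not how Varnish is implemented; `PageCache`, `backend` and `purge` are hypothetical names, and `backend` stands in for the Rails stack.

```python
class PageCache:
    """Toy model of a caching proxy sitting in front of an expensive backend."""

    def __init__(self, backend):
        self.backend = backend      # callable: url -> page body (stands in for Rails)
        self.store = {}             # url -> cached page body
        self.backend_calls = 0

    def get(self, url):
        if url in self.store:       # hit: the backend is not involved
            return self.store[url]
        self.backend_calls += 1     # miss: render once, then remember the result
        body = self.backend(url)
        self.store[url] = body
        return body

    def purge(self, url):
        """Called by the application when a specific page has been updated."""
        self.store.pop(url, None)


pages = {"/about": "v1"}
cache = PageCache(lambda url: pages[url])
cache.get("/about")          # miss, rendered by the backend
cache.get("/about")          # hit, served from the cache
pages["/about"] = "v2"       # the page is edited...
cache.purge("/about")        # ...so the app invalidates it
print(cache.get("/about"), cache.backend_calls)
```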
Source: (StackOverflow)
I'm thinking about the best way to create a cache layer in front of, or as the first layer of, my RESTful API (written in Ruby) for GET requests.
Not every request can be cached, because even for some GET requests the API has to validate the requesting user / application. That means I need to configure which requests are cacheable and how long each cached answer is valid. For a few cases I need a very short expiration time, e.g. 15s or less. The API application should also be able to expire cache entries before their expiration date is reached.
I have already thought about many possible solutions. My two best ideas:
- A first layer in the API (even before the routing) with cache logic I write myself (to keep all configuration options in my hands), with answers and expiration dates stored in Memcached.
- A (highly configurable) web server proxy, perhaps something like Squid, but I have never used a proxy for a case like this before and I'm not at all sure about it.
I also thought about a cache solution like Varnish. I have used Varnish for "usual" web applications and it's impressive, but the configuration is kind of special. I would use it if it's the fastest solution.
Another thought was to cache in the Solr index, which I'm already using in the data layer to avoid querying the database for most requests.
If someone has a hint or good sources to read about this topic, let me know.
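The requirements listed above (per-request cacheability, short per-route expiry times, and early expiry triggered by the application) can be sketched independently of the storage backend. This is a minimal in-memory illustration in Python, assuming prefix-based rules; `TtlCache`, the rule format and the example paths are all hypothetical, and a real setup would keep the entries in Memcached or delegate the whole job to a proxy.

```python
import time

class TtlCache:
    """In-memory cache with per-prefix TTL rules and explicit early expiry."""

    def __init__(self, rules, clock=time.monotonic):
        self.rules = rules      # list of (path_prefix, ttl_seconds); first match wins
        self.clock = clock      # injectable so expiry can be tested without sleeping
        self.entries = {}       # path -> (expires_at, value)

    def ttl_for(self, path):
        for prefix, ttl in self.rules:
            if path.startswith(prefix):
                return ttl
        return None             # no rule matched: the request is not cacheable

    def fetch(self, path, compute):
        now = self.clock()
        entry = self.entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]     # still fresh
        value = compute(path)
        ttl = self.ttl_for(path)
        if ttl is not None:
            self.entries[path] = (now + ttl, value)
        return value

    def expire(self, path):
        """Let the application drop an entry before its TTL has run out."""
        self.entries.pop(path, None)


cache = TtlCache([("/status", 15), ("/catalog", 3600)])
print(cache.fetch("/status", lambda path: "computed " + path))
```

Paths that match no rule (for example, requests that must validate the user) are computed on every call and never stored.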
Source: (StackOverflow)
When sending a GET request directly to the backend with If-Modified-Since: Wed, 15 Feb 2012 07:25:00 CET set, Apache correctly returns a 304 with no content.
When I send the same request through Varnish 3.0.2, it responds with a 200 and resends all the content even though the client already has it. Obviously, this isn't a good use of bandwidth. My understanding is that Varnish supports intelligent handling of this header and should be sending a 304, so I figure I'd done something wrong with my .vcl file.
Varnishlog gives this:
16 SessionOpen c 84.97.17.233 64416 :80
16 ReqStart c 84.97.17.233 64416 1597323690
16 RxRequest c GET
16 RxURL c /fr/CS/CS_AU-Maboreke-6-6-2004.pdf
16 RxProtocol c HTTP/1.0
16 RxHeader c Host: www.quotaproject.org
16 RxHeader c User-Agent: Sprawk/1.3 (http://www.sprawk.com/)
16 RxHeader c Accept: */*
16 RxHeader c Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
16 RxHeader c Connection: close
16 RxHeader c If-Modified-Since: Wed, 15 Feb 2012 07:25:00 CET
16 VCL_call c recv lookup
16 VCL_call c hash
16 Hash c /fr/CS/CS_AU-Maboreke-6-6-2004.pdf
16 Hash c www.quotaproject.org
16 VCL_return c hash
16 Hit c 1597322756
16 VCL_call c hit
16 VCL_acl c NO_MATCH CTRLF5
16 VCL_return c deliver
16 VCL_call c deliver deliver
16 TxProtocol c HTTP/1.1
16 TxStatus c 200
16 TxResponse c OK
16 TxHeader c Server: Apache
16 TxHeader c Last-Modified: Wed, 09 Jun 2004 16:07:50 GMT
16 TxHeader c Vary: Accept-Encoding
16 TxHeader c Content-Type: application/pdf
16 TxHeader c Date: Wed, 22 Feb 2012 18:25:05 GMT
16 TxHeader c Age: 12432
16 TxHeader c Connection: close
16 Gzip c U D - 107685 115763 80 796748 861415
16 Length c 98304
16 ReqEnd c 1597323690 1329935105.713264704 1329935106.208528996 0.000071526 0.000068426 0.495195866
16 SessionClose c EOF mode
16 StatSess c 84.97.17.233 64416 0 1 1 0 0 0 203 98304
If I understand this correctly, the object is already in Varnish's cache so it doesn't need to contact the backend, but it already knows the Last-Modified, so why would it not respond with 304?
And here's my VCL file:
backend idea {
    # .host = "www.idea.int";
    .host = "83.145.60.235"; # IDEA's public website IP
    .port = "80";
}
backend qp {
    # .host = "www.quotaproject.org";
    .host = "83.145.60.235"; # IDEA's public website IP
    .port = "80";
}
#
#Below is a commented-out copy of the default VCL logic. If you
#redefine any of these subroutines, the built-in logic will be
#appended to your code.
#
sub vcl_recv {
    # force domain so that Apache handles the VH correctly
    if (req.http.host ~ "^qp" || req.http.host ~ "quotaproject.org$") {
        set req.http.Host = "www.quotaproject.org";
        set req.backend = qp;
    } else {
        # default to idea.int
        set req.http.Host = "www.idea.int";
        set req.backend = idea;
    }
    # Before anything else we need to fix gzip compression
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } else if (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } else if (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }
    # ajax requests bypass cache. TODO: Make sure your Javascript implementation for AJAX actually sets XMLHttpRequest
    if (req.http.X-Requested-With == "XMLHttpRequest") {
        return(pass);
    }
    if (req.request != "GET" &&
        req.request != "HEAD" &&
        req.request != "PUT" &&
        req.request != "POST" &&
        req.request != "TRACE" &&
        req.request != "OPTIONS" &&
        req.request != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        return (pipe);
    }
    # Purge everything url - this isn't the squid way, but works
    if (req.url ~ "^/varnishpurge") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        if (req.url == "/varnishpurge") {
            ban("req.http.host == " + req.http.host + " && req.url ~ ^/");
            error 841 "Purged site.";
        }
        else {
            ban("req.http.host == " + req.http.host + " && req.url ~ ^" + regsub( req.url, "^/varnishpurge(.*)$", "\1" ) + "$");
            error 842 "Purged page.";
        }
    }
    # spoof the client IP (taken from http://utvbloggen.se/snabb-guide-till-varnish/)
    remove req.http.X-Forwarded-For;
    set req.http.X-Forwarded-For = client.ip;
    # Force delivery from cache even if other things indicate otherwise
    if (req.url ~ "\.(flv)") {
        # pipe flash start away
        return(pipe);
    }
    if (req.url ~ "\.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$") {
        # cookies are irrelevant here
        unset req.http.Cookie;
        unset req.http.Authorization;
    }
    # Force short-circuit to the real site for these dynamic pages
    if (req.url ~ "/customcf/" || req.url ~ "/uid/editData.cfm" || req.url ~ "^/private/") {
        return(pass);
    }
    # Remove user agent, since Apache will serve these resources the same way
    if (req.http.User-Agent) {
        set req.http.User-Agent = "";
    }
    if (req.http.Cookie) {
        # removes all cookies named __utm? (utma, utmb...) - tracking thing
        set req.http.Cookie = regsuball(req.http.Cookie, "(^|; ) *__utm.=[^;]+;? *", "\1");
        # remove cStates for RHM boxes (the server doesn't need to know these, JS will handle this client-side)
        set req.http.cookie = regsub(req.http.cookie, "(; )?cStates=[^;]*", ""); #cStates might sometimes have a blank value
        # remove ColdFusion session cookie stuff
        if (!req.url ~ "^/publications/" && !req.url ~ "^/uid/admin/") {
            set req.http.cookie = regsub(req.http.cookie, "(; )?CFID=[^;]+", "");
            set req.http.cookie = regsub(req.http.cookie, "(; )?CFTOKEN=[^;]+", "");
        }
        # Remove the cookie header if it's empty after cleanup
        if (req.http.cookie ~ "^;? *$") {
            # The only cookie data left is a semicolon or spaces
            remove req.http.cookie;
        }
    }
}
#
# Called when the requested object was found in the cache
#
sub vcl_hit {
    # Allow administrators to easily flush the cache from their browser
    if (client.ip ~ CTRLF5) {
        if (req.http.pragma ~ "no-cache" || req.http.Cache-Control ~ "no-cache") {
            set obj.ttl = 0s;
            return(pass);
        }
    }
}
#
# Called when the requested object has been retrieved from the
# backend, or the request to the backend has failed
#
sub vcl_fetch {
    set beresp.grace = 1h;
    # strip the cookie before the image is inserted into cache.
    if (req.url ~ "\.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$") {
        remove beresp.http.set-cookie;
        set beresp.ttl = 100w;
    }
    # Remove CF session cookies for everything but the publications subsite
    if (!req.url ~ "^/publications/" && !req.url ~ "/customcf/" && !req.url ~ "^/uid/admin/" && !req.url ~ "^/uid/editData.cfm") {
        remove beresp.http.set-cookie;
    }
    if (beresp.ttl < 48h) {
        set beresp.ttl = 48h;
    }
}
#
# Called before a cached object is delivered to the client
#
sub vcl_deliver {
    # We'll be hiding some headers added by Varnish. We want to make sure people are not seeing we're using Varnish.
    remove resp.http.X-Varnish;
    remove resp.http.Via;
    # We'd like to hide the X-Powered-By headers. Nobody has to know we can run PHP and have version xyz of it.
    remove resp.http.X-Powered-By;
}
Can anyone see the problem or problems?
Update: According to http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3
Note: When handling an If-Modified-Since header field, some
servers will use an exact date comparison function, rather than a
less-than function, for deciding whether to send a 304 (Not
Modified) response.
It seems this may be Varnish's behaviour. I'm sending another date which is previous to the real file's last modified date, but not exactly what is cached in Varnish.
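One detail in the log above may be worth checking: the If-Modified-Since value ends in CET, whereas RFC 2616 (section 3.3.1) requires HTTP date/time stamps to be expressed in GMT. Whether that is what makes Varnish 3.0.2 ignore the conditional is an assumption here, not something the log proves. The Python sketch below only shows how a client could produce a GMT HTTP-date and what the usual "not modified since" comparison looks like. The helper names `http_date` and `is_not_modified` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

def http_date(moment):
    """Format an aware datetime as an HTTP-date, which must be expressed in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)

def is_not_modified(if_modified_since, last_modified):
    """True when the object has not changed since the client's copy (a 304 candidate)."""
    client_copy = parsedate_to_datetime(if_modified_since)
    object_changed = parsedate_to_datetime(last_modified)
    return object_changed <= client_copy

# The timestamp from the request above, converted from CET (UTC+1) to GMT.
cet = timezone(timedelta(hours=1))
header_value = http_date(datetime(2012, 2, 15, 7, 25, tzinfo=cet))
print(header_value)
print(is_not_modified(header_value, "Wed, 09 Jun 2004 16:07:50 GMT"))
```

Resending the request with the GMT form of the header would show whether the time zone is the cause.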
Source: (StackOverflow)
Our company recently decided to start working with the Varnish HTTP accelerator. The main reason we chose this solution is that we specialize in building web shops (Magento Enterprise), and Magento has a commercial plugin that works together with Varnish.
The Varnish configuration is already present on our testing environment, which contains 1 (software) load balancer running a Varnish instance, 2 Apache web servers, 1 storage server and 1 MySQL server.
However, the time has now come to add Varnish to our development environment (a VirtualBox VM with 1GB of RAM running Debian, with the database, web server and files all on the same machine).
Could anyone post a default.vcl configuration file for this setup?
Apache2 runs on port 80.
Thanks in advance,
Kenny
EDIT: I found and posted the solution below.
Source: (StackOverflow)
I can't see a similar question, but apologies if I'm duping.
We're running a varnish cache on our system, but want to install a system where we can purge individual pages when they are edited (fairly normal). We've been trying to get it to work by using an HTTP header. So, our VCL is set up like:
acl purge {
    "localhost";
    #### Our server IP #####
}
sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        return (lookup);
    }
}
sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
    }
}
sub vcl_miss {
    if (req.request == "PURGE") {
        purge;
    }
}
However, I'm stuck on how to actually SEND the http purge request. We're using PHP for the website, so I've tried using:
header("PL: PURGE / HTTP/1.0");
header("Host: url to purge");
But this doesn't seem to do anything (and varnishlog doesn't seem to show anything purging).
I've also experimented with cURL but, again, it doesn't seem to be working. Am I missing something really basic here, or is the basis sound, meaning my implementation is bugged?
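A point that may help here: PHP's header() sets headers on the response going back to the browser, so it cannot issue a request to Varnish. The purge has to be a separate HTTP request, made by the application, whose method is PURGE and whose Host header names the site being purged. The sketch below shows the shape of such a request in Python with the standard library. The function name `send_purge` and all of its arguments are hypothetical, and the same idea should carry over to PHP's cURL extension with a custom request method.

```python
import http.client

def send_purge(varnish_host, varnish_port, path, site_host):
    """Send 'PURGE <path>' to Varnish and return the HTTP status it answers with."""
    conn = http.client.HTTPConnection(varnish_host, varnish_port, timeout=5)
    try:
        # The Host header selects which site's copy of the path is purged.
        conn.request("PURGE", path, headers={"Host": site_host})
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status
    finally:
        conn.close()

# Hypothetical usage, run on a machine that the purge ACL allows:
# status = send_purge("127.0.0.1", 80, "/some/page.html", "www.example.com")
```

With the VCL above, a client outside the purge ACL would receive the 405 response, which makes the returned status a quick check that the request reached Varnish at all.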
Many thanks,
Source: (StackOverflow)
My Flask app is doing a 301 redirect for one of the urls.
The traceback in New Relic is:
Traceback (most recent call last):
File "/var/www/app/env/local/lib/python2.7/site-packages/flask/app.py", line 1358, in full_dispatch_request
rv = self.dispatch_request()
File "/var/www/app/env/local/lib/python2.7/site-packages/flask/app.py", line 1336, in dispatch_request
self.raise_routing_exception(req)
File "/var/www/app/env/local/lib/python2.7/site-packages/flask/app.py", line 1319, in raise_routing_exception
raise request.routing_exception
RequestRedirect: 301: Moved Permanently
It doesn't look like it is even hitting my code, or rather the traceback isn't showing any of my files in it. At one point I did have Nginx redirect all non-SSL requests to HTTPS but had to disable that, as Varnish was not able to make the request to port 443 without an error... probably some configuration that I did or didn't make.
It doesn't always return a 301, though: I can request the URL and get it without any trouble. But someone out in the world requesting the URL is getting a 301 response.
It is a GET request with some custom headers to link it to the account.
At no point in my code is there a 301 redirect.
Source: (StackOverflow)
I've got a working app based in Ruby and Sinatra that is deployed on Heroku.
I want to take advantage of the HTTP caching available on Heroku, which uses Varnish.
I'm not sure what the best way to set the headers is, and the correct syntax.
Any thoughts on the best approach and syntax?
before do
  headers "Content-Type" => "text/html; charset=utf-8"
end
get '/' do
  headers['Cache-Control'] = 'public, max-age=600'
  # SOME STUFF HERE
  haml :home, {:layout => :layout_minfooter}
end
Source: (StackOverflow)
We are using Varnish in front of Plone. If Plone goes down or serves an internal error, we'd like to show a user-friendly static HTML page with some CSS styling + images (a "The server is being updated" page).
How to configure Varnish to do this?
Source: (StackOverflow)
I have been Googling aggressively, but without luck.
I'm using Varnish with great results, but I would like to host multiple websites on a single server (Apache), without Varnish caching all of them.
Can I specify which websites to cache by URL?
Thanks
Source: (StackOverflow)
I want to use Varnish to cache certain pages even in the presence of cookies. There are 3 possibilities that I need to take care of:
- An anonymous user is viewing some page
- A logged in user is viewing some page with light customization. These customizations are all stored in a signed-cookie and are dynamically populated by Javascript. The vary-cookie http header is not set.
- A logged in user is viewing some page with customized data from the database. The vary-cookie http header is set.
The expected behaviors would be:
- Cache the page. This is the most basic scenario for Varnish to handle.
- Cache the page and do not delete the cookie because some Javascript logic needs it.
- Never cache this page because vary-cookie is signalling the cookie contents will affect the output of this page.
I have read some docs on Varnish and I cannot tell if this is the default behavior or if there is some setup I have to do in VCL to make it happen.
Source: (StackOverflow)
First please forgive me for total lack of understanding of Varnish. This is my first go at doing anything with Varnish.
I am following the example at: http://www.kalenyuk.com.ua/magento-performance-optimization-with-varnish-cache-47.html
However when I install and run this, Varnish does not seem to cache. I do get the X-Varnish header with a single number and a Via header that has a value of 1.1 varnish
I have been told (by my ISP) it is because of the following cookie that Magento sets:
Set-Cookie: frontend=6t2d2q73rv9s1kddu8ehh8hvl6; expires=Thu, 17-Feb-2011 14:29:19 GMT; path=/; domain=XX.X.XX.XX; httponly
They said that I either have to change Magento to handle this or configure Varnish to handle this. Since changing Magento is out of the question, I was wondering if someone can give me a clue as to how I would configure Varnish to handle this cookie?
Source: (StackOverflow)
We are currently in the process of upgrading our Varnish Cache servers.
As part of the process, we upgraded only one of them to see how it behaves compared to the older versions.
One of the major changes in this new version is the switch of the regex engine from POSIX to PCRE. That means that some of our purges (regex purges) have stopped working on the newer server.
I was wondering if anyone can list/point me to a list of actual syntax differences between POSIX and PCRE. Or maybe a function that converts a POSIX regex to PCRE regex.
This is so that I can convert only the purges going to the newer server - without affecting the current regex syntax that is implemented in the system for the other servers.
Source: (StackOverflow)
Recently I added a Varnish instance to a Rails application stack. Varnish in its default configuration can be convinced to cache a certain resource using the Cache-Control header like so:
Cache-Control: max-age=86400, public=true
I achieved that one using the expires_in statement in my controllers:
def index
  expires_in 24.hours, public: true
  respond_with 'some content'
end
That worked well. What I did not expect is that the Cache-Control header ALSO affects the browser. That leads to the problem that both Varnish and my users' browsers cache a given resource. The resource is purged from Varnish correctly, but the browser does not attempt to request it again until max-age is reached.
So I wonder whether I should use 'expires_in' in combination with Varnish at all. I could filter the Cache-Control header in an Nginx or Apache instance in front of Varnish, but that seems odd.
Can anyone enlighten me?
Regards
Felix
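The behaviour described above follows from how Cache-Control is defined: max-age applies to every cache, including the browser, while s-maxage (RFC 2616, section 14.9.3) applies only to shared caches and overrides max-age there. As far as I know, Varnish's default TTL calculation honours s-maxage, but treat that as something to verify for your version. The Python sketch below only illustrates how the two kinds of cache would read the same header. The function names are hypothetical.

```python
def parse_cache_control(header):
    """Split a Cache-Control header into a {directive: value-or-None} dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives

def freshness_lifetime(header, shared_cache):
    """Seconds a response may be served from cache, or None if the header sets no lifetime."""
    directives = parse_cache_control(header)
    # s-maxage is only meaningful to shared caches (a proxy), not to browsers.
    if shared_cache and directives.get("s-maxage") is not None:
        return int(directives["s-maxage"])
    if directives.get("max-age") is not None:
        return int(directives["max-age"])
    return None

header = "public, max-age=0, s-maxage=86400"
print(freshness_lifetime(header, shared_cache=True))    # how a proxy reads it
print(freshness_lifetime(header, shared_cache=False))   # how a browser reads it
```

Under that reading, a header like the one in the example lets the proxy keep the page for a day while browsers revalidate on every request, so a purge takes effect for users immediately.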
Source: (StackOverflow)