Submitted by Michael Sherron on Fri, 11/02/2012 - 12:05

What is Varnish?
Varnish is a web application accelerator. More specifically, it's a 'caching HTTP reverse proxy'. Let's take a second and unpack what that means and why it's awesome.
Your typical LAMP-driven dynamic webpage is relatively expensive to produce. A request comes into Apache, is run through PHP, data is pulled out of MySQL, manipulated by PHP, data may be written back to MySQL, more PHP, then finally given back to Apache to return to a user's browser. Whew! All of this processing takes time and memory. As your application gets more popular (a nice problem to have!), you'll notice page speeds slow down and your site may even crash under heavy load.
Varnish will sit in front of your LAMP stack, take page requests and then ask itself - have I seen this page before? If yes, then let's serve the cached version of that page instead of going through the LAMP stack. Varnish is very, very good at this. It's also much better at serving static files, such as images and javascript, than Apache.
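To make the "have I seen this page before?" idea concrete, here's a minimal sketch in Python. It is a toy model of the concept only, not how Varnish is actually implemented; the names TinyCache and slow_backend are made up for illustration, and the "backend" is just a function standing in for the LAMP stack.

```python
from typing import Callable, Dict, List


class TinyCache:
    """Toy stand-in for a caching reverse proxy (illustrative only)."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self.backend = backend            # the expensive stack behind the cache
        self.store: Dict[str, str] = {}   # URL -> cached page body
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        # Seen this URL before? Serve the stored copy and skip the backend.
        if url in self.store:
            self.hits += 1
            return self.store[url]
        # Otherwise build the page once and remember it for next time.
        self.misses += 1
        body = self.backend(url)
        self.store[url] = body
        return body


backend_calls: List[str] = []


def slow_backend(url: str) -> str:
    """Hypothetical backend; records each call so we can count them."""
    backend_calls.append(url)
    return f"<html>page for {url}</html>"


cache = TinyCache(slow_backend)
first = cache.get("/node/1")    # miss: goes to the backend
second = cache.get("/node/1")   # hit: served from the cache
```

After the two requests, the backend has only been called once. A real cache also has to worry about expiry and invalidation, which this sketch leaves out on purpose.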
Varnish will dramatically reduce the number of pages your LAMP stack needs to process. A key concept to keep in mind though is that Varnish does its best to only cache pages it believes are not unique - e.g., Anonymous traffic on your Drupal site. Authenticated pages are almost always going to be sent to Apache. For this reason, Varnish should be high on your performance improvements hit list, but only if you see a large amount of Anonymous traffic. If most of your traffic is Authenticated, your efforts may be better spent elsewhere. Also, keep in mind that in order to integrate Varnish with Drupal, you'll need reverse proxy support, which is only included with Pressflow or Drupal 7. Sorry, vanilla Drupal 6.

A case study
As a comparison, let's look at the improvements we saw after implementing Varnish. First, let's talk about how one measures server load.
There's a statistic available when running 'top' or 'htop' called load average. This is a measure of how hard your server CPUs are working. If you have one core and your load average is 1, then that means that your CPU is currently maxed out at 100% of its processing power. Any load average higher than the number of CPU cores on that machine means that processes are going to start piling up, and in the case of a web server, page load times will increase until the server runs out of resources and crashes.
We went from having two 4-core web servers that were regularly well above a load average of 4 (over 100%), to around 0.5 after we deployed Varnish. We saw page speeds increase dramatically, and organic traffic increase by a huge margin over the following couple of months. While we worked on a lot of performance and usability improvements during that time frame, we attribute much of that success to deploying Varnish.
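The arithmetic above is easy to script. Here's a rough sketch in Python that divides load average by core count; the function names are my own, and current_load_per_core relies on os.getloadavg(), which is only available on Unix-like systems.

```python
import os


def load_per_core(load_average: float, cores: int) -> float:
    """Load average divided by core count; 1.0 means roughly 100% busy."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


def is_overloaded(load_average: float, cores: int) -> bool:
    """True when there is more runnable work than cores to run it."""
    return load_per_core(load_average, cores) > 1.0


def current_load_per_core() -> float:
    """Read the 1-minute load average of this machine (Unix only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    return load_per_core(one_minute, os.cpu_count() or 1)
```

On a 4-core box, a load average of 4 comes out to exactly 1.0 per core, and anything above that is where requests start to queue.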

Where to install Varnish
Varnish is a comparatively lightweight Linux system process. You can dedicate one or more servers to just being your Varnish cache, but in my case (only two web nodes) the benefits of doing so aren't worth the overhead. It's a much easier configuration to simply install Varnish on the same server as Apache. If I were to add another web node to our setup, I'd definitely reevaluate putting Varnish onto its own box.

How to install Varnish
Here are general instructions on how to install Varnish for most Linux OSs. Here are specific instructions on how to install Varnish on Ubuntu that worked for me, as of July 2012. A couple of quick notes though:

  • The Varnish documentation says that Varnish should come bundled with your Ubuntu distribution, but in my case, this wasn't true. YMMV.
  • I ran into errors when trying to add the varnish repository to apt. To fix the issue, I just needed to update apt-get by running 'sudo apt-get upgrade'.

How to integrate Varnish with Apache
I used Lullabot's guide for getting Varnish up and running, so I'll just link to it from here. A few notes from my experience though:

  • Since Varnish lives on the same server as Apache in my case, I changed the port that Apache listens on to 8080 (by default Varnish takes up port 80), and set my backend (Apache) port to 8080 in the default.vcl file.
  • Make sure to pay attention when the Lullabot article mentions changing the default server configuration file to use 'malloc'. The default stores the Varnish cache on disk, which is really not what you want unless memory is at a huge premium in your environment.
  • This is the default.vcl file I used as a template for our site. The first ones you'll see in that guide are intended for a standalone Varnish server which multiple web nodes connect to. The high-availability example configs are at the bottom of the article.
  • I recommend building this out on a non-public testing server first. This is going to take a lot of trial and error and you WILL break your website many, many times while getting things tweaked just right.
  • In my configuration, if I make changes to the default.vcl file, I have to restart Varnish first (/etc/init.d/varnish restart), THEN Apache. If I only restart Varnish without restarting Apache, my site will never load. Same thing if I restart Apache without restarting Varnish. This may be a problem with my configuration, YMMV.

How to integrate Varnish with Drupal
The community has a contrib module for Varnish. Since Varnish sits in front of Drupal, caching requests after the pages have been built, there's not much work here for the Varnish module to do. It primarily adds a reporting interface under /admin/reports/varnish.
You do need to add 'reverse_proxy_addresses' to the conf array in settings.php. Note that those instructions worked fine for me on Pressflow.

A note on Cookies - or, 'what's the catch?'
Varnish is particularly grumpy about cookies. Remember that cookies are a way for a web application to maintain state from one dynamic page to the next. Varnish assumes that since many pages and static assets on sites rarely change, it's good to cache completed pages and return them to all users regardless of who they are. Varnish also assumes that a cookie indicates that the page being requested should be unique, so it won't cache that URL or return a cached version of it.
There are two things to consider here:

  • Does the cookie's presence indicate this page should be unique? E.g., is it going to be read by PHP and used to dynamically select which content to render on the page? If so, then Varnish is correct in not wanting to cache this page and you should generally let it do its thing.
  • If it's a cookie that isn't used by your server-side processor, then you need to write an exception for it in your Varnish configuration file. An example of this would be cookies that are used to track user behavior, such as analytics or A/B testing.
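The real exception lives in your VCL file, but the logic is easier to see in plain code. Here's a minimal Python sketch of the idea: drop the cookies that only client-side scripts care about, then treat the request as cacheable if nothing is left. The prefix list is an assumption for illustration (Google Analytics style names); your own list will depend on which tracking tools you run.

```python
from typing import Tuple

# Assumed, illustrative list: cookie-name prefixes that only JavaScript reads.
IGNORED_PREFIXES: Tuple[str, ...] = ("__utm", "_ga")


def strip_client_side_cookies(
    cookie_header: str,
    ignored_prefixes: Tuple[str, ...] = IGNORED_PREFIXES,
) -> str:
    """Return the Cookie header with client-side-only cookies removed."""
    kept = []
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name = pair.split("=", 1)[0]
        if name.startswith(ignored_prefixes):
            continue  # the server never reads this one, so ignore it
        kept.append(pair)
    return "; ".join(kept)


def is_cacheable(cookie_header: str) -> bool:
    """Cacheable when no server-relevant cookie remains after stripping."""
    return strip_client_side_cookies(cookie_header) == ""
```

A request carrying only analytics cookies ends up with an empty header and can be served from cache, while one carrying a session cookie still gets passed through to the backend.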

How to Monitor Varnish
In addition to the admin report I mentioned earlier under /admin/reports/varnish, there are also command-line tools to monitor Varnish: varnishtop and varnishstat.
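One number worth watching is the cache hit ratio, which you can work out from the hit and miss counters that varnishstat reports. Here's a small Python sketch of the calculation; it takes the two counts as plain integers, so how you read them off varnishstat is up to you, and the function name is my own.

```python
def hit_ratio(cache_hit: int, cache_miss: int) -> float:
    """Fraction of cache lookups that were served from the cache."""
    total = cache_hit + cache_miss
    if total == 0:
        return 0.0  # no lookups yet, so report zero rather than divide by zero
    return cache_hit / total
```

For example, 900 hits and 100 misses is a ratio of 0.9, meaning only one lookup in ten had to go back to Apache.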