I use Varnish on my production server without any issues – it works quite well, and I have come to consider it an essential component in my server stack. I have recently been having a bit of trouble with a new project of mine that I currently believe a misconfigured Varnish instance to be responsible for. As such, I wanted to reproduce the problem in my test environment, but ran into a completely different problem attempting to do so.
In its most basic form, I have Nginx listening on port 81 (all interfaces), and varnish listening on port 80 (all interfaces). Essentially, Varnish receives a request – checks its cache, and either fetches the request from cache if it is a hit, or from the backend if it is a miss.
The problem was that Varnish simply did not acknowledge the existence of the backend, continually returning ‘503 Service Unavailable’. The following is an account of my diagnostics and resolution to this problem. (Please note: this is a different problem than an intermittent 503 error – I was never once able to successfully load a page through Varnish before I fixed this problem).
Environment:
- Virtualbox 4.1.8r75647
- Host operating system: Windows 7 SP1 (64-bit)
- Guest operating system: CentOS 6.2 (32-bit) (minimal install)
- Networking: bridged adapter
As a basic setup, I used CentOS 6.2 – since this most closely resembles the Amazon Linux that I use on my production box. The official Nginx and Varnish repositories were added and nginx and varnish were installed:
yum localinstall http://nginx.org/packages/centos/6/noarch/RPMS/nginx-release-centos-6-0.el6.ngx.noarch.rpm yum localinstall --nogpgcheck http://repo.varnish-cache.org/redhat/varnish-3.0/el5/noarch/varnish-release-3.0-1.noarch.rpm yum install nginx varnish
Configure Nginx
I created a basic test site – with a single static file, and overrode the default Nginx config:
mkdir -p /var/www/example.com/web echo "Testing 1, 2, 3..." > /var/www/example.com/web/test.txt > /etc/nginx/conf.d/default.conf
/etc/nginx/conf.d/default.conf:
server { server_name example.com www.example.com; listen 81 default; root /var/www/example.com/web; }
You’ll note here that I am listening on port 81 instead of port 80 – this is because I intend for Varnish to be the front end server. After these changes, I restarted Nginx:
service nginx start
Verify that Nginx is running and listening on the correct port:
ps -ef | grep nginx root 1502 1 0 07:09 ? 00:00:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf nginx 1504 1502 0 07:09 ? 00:00:00 nginx: worker process root 1670 1203 0 07:15 pts/0 00:00:00 grep nginx
netstat -anp | grep nginx tcp 0 0 0.0.0.0:81 0.0.0.0:* LISTEN 1502/nginx unix 3 [ ] STREAM CONNECTED 15289 1502/nginx unix 3 [ ] STREAM CONNECTED 15288 1502/nginx
Finally see if I can retrive the page:
curl --header "Host: example.com" 127.0.0.1:81/test.txt Testing 1, 2, 3...
So far, everything looks good – Nginx is working without issue, as expected.
Configure Varnish
I am going for a really minimal setup here – simply trying to illustrate the problem at hand – no optimizations at all in place. Essentially, I just change the Varnish port to 80 and use malloc instead of file backed storage:
Edit /etc/sysconfig/varnish, and change the following:
VARNISH_LISTEN_PORT=80 VARNISH_STORAGE_SIZE=50M VARNISH_STORAGE="malloc,${VARNISH_STORAGE_SIZE}"
Now, we need to point Varnish to the backend that is running on port 81:
Edit /etc/varnish/default.vcl:
backend default { .host = "127.0.0.1"; .port = "81"; }
Finally, we start Varnish:
service varnish start
Confirm that it is in fact running:
ps -ef | grep varnish root 1571 1 0 07:11 ? 00:00:00 /usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 -w 1,1000,120 -u varnish -g varnish -S /etc/varnish/secret -s malloc,50M varnish 1572 1571 0 07:11 ? 00:00:00 /usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 120 -w 1,1000,120 -u varnish -g varnish -S /etc/varnish/secret -s malloc,50M root 1668 1203 0 07:15 pts/0 00:00:00 grep varnish
Verify that it is listening on the correct port:
netstat -anp | grep varnish tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 1572/varnishd tcp 0 0 127.0.0.1:6082 0.0.0.0:* LISTEN 1571/varnishd tcp 0 0 :::80 :::* LISTEN 1572/varnishd unix 2 [ ] DGRAM 15619 1571/varnishd
Trying to retrieve the same file from port 80 (Varnish) results in the following:
curl --header "Host: example.com" 127.0.0.1:80/test.txt <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html> <head> <title>503 Service Unavailable</title> </head> <body> <h1>Error 503 Service Unavailable</h1> <p>Service Unavailable</p> <h3>Guru Meditation:</h3> <p>XID: 1041692010</p> <hr> <p>Varnish cache server</p> </body> </html>
At the same time, looking at the output of varnishlog displays:
11 SessionOpen c 127.0.0.1 41777 :80 11 ReqStart c 127.0.0.1 41777 1041692011 11 RxRequest c GET 11 RxURL c /test.txt 11 RxProtocol c HTTP/1.1 11 RxHeader c User-Agent: curl/7.19.7 (i686-pc-linux-gnu) libcurl/7.19.7 NSS/3.12.7.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2 11 RxHeader c Accept: */* 11 RxHeader c Host: example.com 11 VCL_call c recv lookup 11 VCL_call c hash 11 Hash c /test.txt 11 Hash c example.com 11 VCL_return c hash 11 VCL_call c miss fetch 11 FetchError c no backend connection 11 VCL_call c error deliver 11 VCL_call c deliver deliver 11 TxProtocol c HTTP/1.1 11 TxStatus c 503 11 TxResponse c Service Unavailable 11 TxHeader c Server: Varnish 11 TxHeader c Content-Type: text/html; charset=utf-8 11 TxHeader c Retry-After: 5 11 TxHeader c Content-Length: 419 11 TxHeader c Accept-Ranges: bytes 11 TxHeader c Date: Sun, 11 Mar 2012 11:16:54 GMT 11 TxHeader c X-Varnish: 1041692011 11 TxHeader c Age: 0 11 TxHeader c Via: 1.1 varnish 11 TxHeader c Connection: close 11 Length c 419 11 ReqEnd c 1041692011 1331464614.671160936 1331464614.671626806 0.000131845 0.000425816 0.000040054 11 SessionClose c error 11 StatSess c 127.0.0.1 41777 0 1 1 0 0 0 257 419
Note the line:
11 FetchError c no backend connection
Obviously though, I have a valid backend – I can reach it, and it returns a valid page. As part of my diagostics, I tried the following:
- Disabling iptables (service iptables stop)
- Passing the backend directly to Varnish using the -b parameter (and removing the config file).
- Trying a different backend
- Restarting the VM and all servers more times than I can count
- Rebuilding the entire VM from scratch
- Using the external IP address for the backend instead of the loopback
None of these had any effect or revealed anything useful – and Varnish didn’t log any errors either.
SELinux
For some odd reason though, I thought to try setting SELinux to permissive – and that was a success.
If you are running auditd
, the actions SELinux denies will be logged to /var/log/audit/audit.log
Running grep varnish /var/log/audit/audit.log
, shows the blocks that occured:
type=AVC msg=audit(1331500393.450:25): avc: denied { name_connect } for pid=1276 comm="varnishd" dest=81 scontext=unconfined_u:system_r:varnishd_t:s0 tcontext=system_u:object_r:reserved_port_t:s0 tclass=tcp_socket
The current state of SELinux enforcement can be determined with:
cat /selinux/enforce 1
Setting SELinux to temporarily permissive and retesting shows that it was, indeed, the source of the problem:
echo 0 >/selinux/enforce
curl --header "Host: example.com" 127.0.0.1:80/test.txt Testing 1, 2, 3...
Of course, this is only a temporary fix – as soon as the server is restarted, SELinux will go back to enforcing mode.
You can setup SELinux as permanently permissive, or generate a policy from the logged denial using audit2allow
.
Install audit2allow
:
yum install policycoreutils-python
Running audit2allow -a -w
gives the following:
type=AVC msg=audit(1331500393.450:25): avc: denied { name_connect } for pid=1276 comm="varnishd" dest=81 scontext=unconfined_u:system_r:varnishd_t:s0 tcontext=system_u:object_r:reserved_port_t:s0 tclass=tcp_socket Was caused by: One of the following booleans was set incorrectly. Description: Allow varnishd to connect to all ports, not just HTTP. Allow access by executing: # setsebool -P varnishd_connect_any 1 Description: Allow system to run with NIS Allow access by executing: # setsebool -P allow_ypbind 1
Running setsebool -P varnishd_connect_any 1
, and trying to fetch the page through Varnish, shows that the problem has indeed been fixed, although SELinux is still in enforcing mode (I didn’t need to run the second command). The change persists through a server restart.
Hi there,
thanks for the hint. I am experiencing the same odd thing. At some time varnish just rejects to open backend connections even if they’re alive. Nothing obvious in the logs.
Any other idea?
greetz
Robert
The thing with the SELinux cause is that it is constant – the response is always 503. Unfortunately, I don’t have a lot to go by, so a few guesses – if it happens during times of load, check and increase your timeouts (.connect_timeout; .first_byte_timeout; .between_bytes_timeout); Give saint mode a try if you find that the timeouts improve things. One other possibility that comes to mind is that a related set of requests all result in 50x errors on the backend, causing varnish to mark the backend as invalid for a short time.
Hi, I had got this the same error and I found your site.
So, I don’t read all contents. But my site had been fixed by check error code 500 from backend web-service nginx/apache. You can try it.
The problem you describe is a different one than is outlined here. Thanks for commenting, good to know you got your site working.
I’m just stopping by to say thank you for this post. I’ve experienced exactly the same problem on a server, I’ve gone through all the phases of despair you went through (iptables, another ip, pulling my hair 🙂 – and it turned out to be selinux. Thank you, it’s still relevant as of 2013-04 despite the warning on the top.
You’re most welcome – glad you found it useful. I automatically add the notice at the top of posts on tech subjects, since I can’t always revisit them and update their content.
Nice one, bruvaaaaa!
God Send, Thanks!
Great to know you found it useful. Thanks for commenting.
Awesome, thanks.
Glad you found it useful. Thanks for reading and commenting.
Thanks for the article, I was also stuck with everything but haven’t thought of SELinux. nice catch!
Though this post is posted 2 years, it still works fine, it helped me to solve my problem.
Thank you very much !! Nice post