
nginx + memcached + apache = light and fast web server setup

October 3, 2008 · 849 views

[Reposter's note: well written.]


For a fair while now I’ve been hearing about people using nginx as a reverse proxy in front of apache or mongrel instances.  Why would you do that?  Primarily because nginx is a small and very efficient web server.

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      8590  0.0  0.0   4564   676 ?        Ss   18:06   0:00 nginx: master process /usr/sbin/nginx
www-data  8591  0.0  0.0   4872  1484 ?        S    18:06   0:00 nginx: worker process
www-data  8592  0.0  0.0   4768  1080 ?        S    18:06   0:00 nginx: worker process
Compare that to:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
pgraydon  4241  0.0  0.0   3336   800 pts/2    S+   18:37   0:00 grep apache
root      7988  0.0  0.2  21508  6092 ?        Ss   16:56   0:00 /usr/sbin/apache2 -k start
www-data  7991  0.0  0.2  21588  4820 ?        S    16:56   0:00 /usr/sbin/apache2 -k start
www-data  7992  0.0  0.2  21564  4800 ?        S    16:56   0:00 /usr/sbin/apache2 -k start
www-data  7996  0.0  0.2  21564  4800 ?        S    16:56   0:00 /usr/sbin/apache2 -k start
www-data  7997  0.0  0.2  21588  4820 ?        S    16:56   0:00 /usr/sbin/apache2 -k start
www-data  7998  0.0  0.2  21588  4820 ?        S    16:56   0:00 /usr/sbin/apache2 -k start

In its still admittedly pretty default Ubuntu install state, Apache is reserving just over 21MB of memory and actually using around 4MB.  Contrast that with nginx's 4.5MB of reserved memory and just 1MB of actual usage.  If you consider new instances spawning as the site scales up in traffic, it's fairly clear that apache2 is going to tax your server more than nginx will.

If you want to serve dynamic content, though, that's where things tend to fall down a little with nginx.  It's generally slower, and your only option (currently) is to use FastCGI as an interface, a module I know a good number of devs swear at vociferously, and which, contrary to its name, isn't necessarily fast.

Apache on the other hand can interact nicely with any number of languages and handles them very well, much better than nginx.

So we've got two programs, each with their own respective strengths and weaknesses.  Luckily for us it's a fairly simple matter to leverage both programs together to negate those weaknesses and take advantage of their respective strengths.


But what if we could take it one step further and start caching content in RAM too?  You can buy extremely fast disks these days if you're prepared to fork out a fair bit of money, but none will be faster than accessing content from local memory.  If you're prepared to put a bit of effort in you can alter your scripts to use memcached too, checking against it first before going to the database, for example; but even if you don't want the effort of doing that, you can use memcached to cache static content in memory, like some images, your CSS files or similar.  Instead of having to go to disk all the time for content, nginx can do a quick check against memcached and fetch from the appropriate source.
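
The "check memcached first, then the database" idea is the classic cache-aside pattern. This is a minimal sketch, not the author's code: a plain dict stands in for a real memcached client, and fetch_from_db is a hypothetical placeholder for your actual database query.

```python
cache = {}  # stand-in for a real memcached client

def fetch_from_db(key):
    # Placeholder: pretend this is an expensive database lookup.
    return "value-for-" + key

def get_content(key):
    value = cache.get(key)          # 1. try the cache first
    if value is None:
        value = fetch_from_db(key)  # 2. fall back to the database
        cache[key] = value          # 3. store it for next time
    return value

print(get_content("page:home"))  # first call misses and hits the "database"
print(get_content("page:home"))  # second call is served from cache
```

The only change to an existing script is wrapping the database call; everything else stays the same.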

So how do we set about doing this?  The outline is that you set up nginx to listen on port 80, the standard port for all web traffic, and port 443 if you’re doing https (SSL) work, so that nginx is the first thing to see web traffic.

Then you set up memcached and apache, telling apache to listen to a different port and only to local requests, e.g. bind it to 127.0.0.1:8080.  Then all you have to do is tell nginx to pass certain content to 8080, and certain content to memcached first.  Here’s a rough ASCII art diagram of what we’re doing:

                          ---------
                         |         |
                         |memcached|
                         |         |
                        / ---------
                -------
               |       |  ---------
Web Traffic -->| nginx |—| content |
               |       |  ---------
                -------       |
                        \ ---------   ------------
                         |         | |            |
                         | Apache2 |—|  Database  |
                         |         | |            |
                          ---------   ------------

What is really cool about all this is that nginx, Apache and the database can all be running on different servers, and multiples thereof.  In fact, you could have any web server or DB running behind nginx, such as IIS on Windows, or DB2, or whatever.  This way you only have to scale up what needs to be scaled up.  It also presents strong arguments for making your site as efficient as possible, encouraging the developer to think seriously about what content actually needs to be dynamically generated and what can be static.  Not only does this setup make your current server usage more efficient, it makes it extremely easy to scale without having to re-engineer your whole solution.

As a side tip: it's worth remembering that you can always use a scheduled script to generate static content. Unless your page really needs to be 100% up to date, to the microsecond, or provides user-specific content, you can reduce processor load by generating static content.  If you do need to generate the page dynamically every time, think about all the sections of the page.  If any are constant for all users, consider generating that code on a schedule and doing an "include" instead of querying the database every time someone accesses the page.
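
As a sketch of that scheduled-generation idea (this is not the author's code; the query function and the output path are made up for illustration), a small script run from cron could pre-render a fragment for the page to include:

```python
import os
import tempfile

def query_latest_headlines():
    # Placeholder for the real database query.
    return ["First headline", "Second headline"]

def generate_fragment(path):
    items = "".join("<li>%s</li>" % h for h in query_latest_headlines())
    html = "<ul>%s</ul>" % items
    # Write to a temp file and rename, so a request never sees a
    # half-written include file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        f.write(html)
    os.rename(tmp, path)
    return html

generate_fragment("/tmp/headlines.html")
```

A crontab entry running this every few minutes gives you near-fresh content at the cost of zero database queries per page view.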

Remember, your page can be as fancy as you like, but if it is too slow to load you’ll lose people.  One of the reasons I’m convinced Google is winning the search engine war is their minimalist and extremely fast approach.  When all the other search engines were filling pages with news, Google kept it simple with a logo, a text box and two buttons.

Okay, reeling it all back in, how easy is it to set up such a solution?  Very!

There are packages available for most major distributions that will save you compiling from source.  I did the install on my Ubuntu box, but if you're aiming for a production server I'd be inclined to suggest Debian instead.

First install the required packages:

sudo aptitude install nginx apache2 memcached libapache2-mod-php5 libapache2-mod-rpaf

 

rpaf is there to ensure that apache logs the IP of the remote client rather than the proxy.

Once they’re installed it’s time to configure.

Nginx

First we alter nginx’s main config /etc/nginx/nginx.conf:

user www-data;
worker_processes  2;

error_log  /var/log/nginx/error.log;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    access_log  /var/log/nginx/access.log;

    sendfile       on;
    tcp_nopush     on;

    keepalive_timeout  65;
    tcp_nodelay        on;

    gzip  on;
    upstream apache  {
        server 127.0.0.1:8080;
    }

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
Something to note that may need tweaking depending on your purpose is the worker_connections value.  According to the nginx documentation, when you're running as a reverse proxy:

    In a reverse proxy situation, max_clients becomes

    max_clients = worker_processes * worker_connections / 4

    since a browser opens 2 connections by default to a server and nginx uses file descriptors from the same pool to connect to the upstream backend.

With the config above (2 worker processes and 1024 connections each), that works out to 2 * 1024 / 4 = 512 concurrent clients.

The other lines worth noting are these:

upstream apache  {
        server 127.0.0.1:8080;
}

You can call the upstream whatever you like, and add as many server lines as you fancy.  In this case our Apache instance exists locally and we're only using one, but if you have a cluster of web servers you can add them into this section of the file.  You can also weight servers: just tack weight=2 (or whatever value you like) onto the end of the server line, like this:

server 127.0.0.1:8080 weight=2;

Remember the name apache, or whatever you chose; we'll be using it later. As you can see from that upstream section, if you want to add more dynamic-script processing power, you set up the new server, add a single extra line to nginx, and restart.  It's that easy.
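
For instance, a larger cluster's upstream block might look something like this (web1.internal and web2.internal are hypothetical hostnames, and the weights are purely illustrative):

```nginx
upstream apache {
    server 127.0.0.1:8080 weight=2;   # heavier box gets twice the requests
    server web1.internal:8080;
    server web2.internal:8080;
}
```

nginx round-robins requests across the servers, honouring the weights.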

Inside /etc/nginx/sites-available there is a default file; you can create one for each virtual host you need, or tack them all into a single file. I must admit I'm a fan of the split config file idea (except when it comes to Debian/Ubuntu's atrocious exim4 config). If you keep your sites in separate files in sites-available, disabling and enabling a site is as easy as creating a symlink in sites-enabled and then restarting nginx.  That's saved my bacon on a few occasions when I've made mistakes in new config files! So my configured default looks like this:

server {
        listen   80;
        server_name  localhost;
        access_log  /var/log/nginx/localhost.access.log;

        location / {
                root   /var/www/;
                index  index.html index.htm;
        }

        location ~* \.(jpg|png|gif|css)$ {
                access_log   off;
                expires      max;
                set $memcached_key $uri;
                memcached_pass     127.0.0.1:11211;
                error_page         404 = /fetch;
        }

        location /fetch {
                internal;
                access_log   off;
                expires      max;
                proxy_pass http://apache;
                break;
        }

       # proxy the PHP scripts to the predefined upstream "apache"
        location ~ \.php$ {
                proxy_pass   http://apache;
        }
}

 

 

It should hopefully be fairly self-explanatory where to make changes to suit your domain.  The main interesting bits are the entries that start with "location".

The first location (location /) tells it where to find the root of the website.

The second entry is where we bring memcached into the fold.  For my purposes I decided to use memcached to store images and css files.

nginx uses basic regular expressions to filter content, so with this line we're looking for any URI containing a literal period followed by jpg, png, gif or css, occurring at the end of the string.

Should the request match, it first tells nginx not to bother logging the request, then it passes the request off to a local memcached instance.  If memcached has the content it will return it to nginx and nginx will be nice and happy.  If the memcached lookup fails, nginx sees that as a 404.  Now for a cool bit: we can tell nginx that if it gets a 404 it should go to location /fetch instead of failing and serving a 404 to the end user.
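
As a quick sanity check of that filter, and purely as a standalone illustration (this is not part of the nginx config), here is the same pattern exercised with Python's re module; note the dot is escaped so it matches a literal period:

```python
import re

# Case-insensitive match on URIs ending in .jpg/.png/.gif/.css,
# mirroring nginx's `location ~* \.(jpg|png|gif|css)$`.
pattern = re.compile(r"\.(jpg|png|gif|css)$", re.IGNORECASE)

for uri in ["/images/logo.PNG", "/style/main.css",
            "/index.php", "/download/archive.gifts"]:
    print(uri, bool(pattern.search(uri)))
```

The first two match (the `~*` modifier makes it case-insensitive); the last two don't, because the extension must sit at the very end of the URI.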

The third section is that /fetch.  internal tells nginx that this location can only be reached via internal redirects (such as our error_page directive), never directly from a client request.  Again, we're telling it not to log accesses, and then we're proxying the request off to be handled.

So with those two sections of the config file, all we’re doing to nginx is saying “If it’s one of these files, check memcache, and if it’s not in there, pass on the request to be dealt with.”

The overhead from such is negligible, and far outweighed by the gains.

The final location entry is for handling php requests.  As mentioned before, nginx isn’t particularly good at dynamic scripts, or at least certainly not as good as Apache is at it.  Some day, maybe, nginx will be able to compete, but for now it’s best to palm off the php handling to apache.

As before, it’s just doing a simple regexp check, then on matches the proxy_pass directive tells it to use the upstream called “apache” that we defined earlier.

Apache2

This one is nice, quick and simple.  Be sure to enable whatever modules you need for Apache to process your scripting language of choice.  In my case I want PHP5, and Ubuntu comes with an application to help, a2enmod (and its partner a2dismod):

sudo a2enmod php5 rpaf

Define your sites as you would normally, and then all we need to do with Apache is tell it to bind to 127.0.0.1 on port 8080.  That means apache will only listen to internal requests rather than on an internet accessible port.

To do this you need to change the Listen directive.  Under Ubuntu this is in the file /etc/apache2/ports.conf, but that may differ on your distro; you can find it by doing a recursive grep like this: grep -r "Listen" *

Alter the line so that it reads:

Listen 127.0.0.1:8080

 

Finally we need to tell rpaf what the proxy addresses are so it knows to deal with them.  Edit the file /etc/apache2/mods-available/rpaf.conf, altering the following line to include all relevant addresses (in this case just 127.0.0.1):

RPAFproxy_ips 127.0.0.1

Then restart apache and you’re almost ready.  All that’s left to do is set up memcached and create a method for populating the cache with content.

Memcached

There really isn't much to do with memcached after the initial install; its default setup is pretty good.  About all you might want to do is turn on a verbose flag for logging or alter how much memory it can use, and make sure you specify that it binds only to the local interface:

-l 127.0.0.1
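
On Debian/Ubuntu these flags live in /etc/memcached.conf, which the init script reads on startup; the relevant lines look something like this (the values here are illustrative):

```
# memory cap in megabytes
-m 64
# port that nginx's memcached_pass points at
-p 11211
# listen on localhost only
-l 127.0.0.1
# verbose logging (optional)
-v
```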

Restart memcached and all that's left is to populate it with content.  To the best of my knowledge there is no way to get nginx to do this for you, which is a shame; however, there are plenty of ways to put data into the cache, so use whichever suits you best.  My method is to knock up a quick Perl script, use it to hunt down the content and store it in memory, and have it scheduled in cron.  For this I'm using the core module File::Find and, from CPAN, Cache::Memcached:

#!/usr/bin/perl
# Usual suspects
use strict;
use warnings;
# Additional modules needed
use File::Find;
use Cache::Memcached;

### Create Memcache connection
my $cache = Cache::Memcached->new({
        'servers' => [
                'localhost:11211',
        ],
        'compress_threshold' => 10_000,
}) or die("Failed to connect to memcache server");

my @content;

### Define root location to search, alter as appropriate
my $dir = "/var/www";

### Go find those files
find(\&Wanted, $dir);

### Process the found files
foreach my $file (@content){
        open (my $source, '<', $file) or die "Cannot open $file: $!";
        binmode($source);   # images are binary data
        read $source, my $contents, (stat($file))[7];
        close($source);
        $file =~ s/^$dir//;
        if ($cache->get($file)){
                my $compare = $cache->get($file);
                if ($compare ne $contents){
                        print "Cached file $file is not up to date, updating cache\n";
                        $cache->delete($file);
                        $cache->set($file,$contents);
                }
        } else {
                $cache->set($file,$contents);
        }
}

sub Wanted
{
        # only operate on image and css files
        /\.jpg$/ or /\.gif$/ or /\.png$/ or /\.css$/ or return;
        push (@content,$File::Find::name);
}

You can drop that print statement if you like.  If you leave it in there you’ll get an e-mail sent to the cron user every time content is updated.

Restart nginx, apache and memcached, trigger the memcache script, and (hopefully) everything should be working fine in your new nice and efficient web server.
