Archive for the ‘nginx’ Category

Rate limiting your WordPress login from wannabe hackers

Posted on May 1st, 2013 in nginx | No Comments »

As your blog gets popular, you get a lot of people trying to hack it, especially if it’s on Amazon Cloud. If you’re running WordPress and not already running Nginx as a reverse proxy, you should. It makes it hella fast and a lot more scalable, especially with Nginx Proxy Cache Integrator. With it, a small Amazon EC2 instance can withstand Techcrunch and Mashable hits. I know because we do it all the time on our corporate blog.

Security-wise, you can move your SSH port, rely on key-based login only, etc., but nothing prevents script kiddies from running a brute-force dictionary attack on your WordPress login page. Even if the attempt is fruitless, it can create unnecessary load. Rate-limit just the login page with Nginx to solve the issue:

http {
    limit_req_zone  $binary_remote_addr  zone=one:10m  rate=5r/m;

    server {
        listen       80;
        server_name  site.com www.site.com;
        proxy_cache_valid 200 20m;

        location ~* wp\-login\.php {
            limit_req  zone=one  burst=1 nodelay;
            proxy_pass http://127.0.0.1:8080;
        }
    }
}

The above limits each IP to one login request every 12 seconds (that’s the rate=5r/m). Note that the 10m in zone=one:10m is the size of the shared memory zone that tracks client state, not a time window. This does not affect any other website calls. Be sure to use the nodelay flag so requests over the limit get an immediate 503 “Service Temporarily Unavailable” response instead of being queued and slowed down.
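If you want to be more explicit about what’s happening, newer Nginx versions (1.3.15+) let you change the rejection status code; the snippet below is a sketch of the same location block with that added (the zone name matches the config above):

```nginx
location ~* wp\-login\.php {
    limit_req           zone=one burst=1 nodelay;
    # On Nginx 1.3.15+ you can return 429 Too Many Requests
    # instead of the default 503 when the limit is hit.
    limit_req_status    429;
    # Log rejected attempts at a quieter level (default is "error").
    limit_req_log_level warn;
    proxy_pass          http://127.0.0.1:8080;
}
```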

Nginx, the non-blocking model, and why Apache sucks

Posted on July 8th, 2012 in Apache, General Web Dev, nginx | 9 Comments »

Note, this blog entry and the link I share at the end are kind of long but highly worthwhile, especially if you want to become more familiar with the concept of an event-based processing model versus a prefork-based one and its benefits. This is applicable to both web servers and programming languages (Apache versus Nginx, PHP versus Node, etc.).

A recent personal story

Why high-concurrency is important

Within the span of two weeks, our little application was featured on both Techcrunch and Mashable. Our traffic ballooned by about 700%, and though our highly optimized main application server didn’t even break a sweat, our corporate blog server went down for the count. A basic WordPress setup on a micro Amazon EC2 instance (613 MB of virtual RAM), it was running Apache with the standard optimizations of caching headers and gzip.

By the time I noticed, our blog server had already been running at 100% CPU for a while (yep, should have set an alert) and was very sluggish to browse. Restarting httpd and mysqld only momentarily alleviated the problem, as CPU instantly shot back up to 100% under the load of a few hundred visitors banging down the door. My first split-second thought was to quickly migrate to a more powerful EC2 instance, one with more memory and CPU. It would be relatively easy and take only ten minutes to execute. After all, that’s what AWS was meant for, right? But I knew that wasn’t the best answer: AWS can become pricey, and one shouldn’t just throw money at a problem. Our blog server should be able to withstand a spike of a few hundred requests on its current stack; otherwise, we’d be paying for an unnecessarily large EC2 instance at the end of every month.

The core problem stemmed from Apache being machine-gunned with HTTP requests, five to ten from each user, for things like the PHP page, images, JS, and CSS. These files use correct expires headers, so at most, users would request each file only once. However, a steady flow of new visitors kept the load high. I had been planning to move those assets to a CDN, but being a small company, we were all busy working on the actual application.

So what did I do? In short, I alleviated the issue in less than five minutes by installing and setting up Nginx on port 80. It now intercepts and serves all static content and reverse-proxies to Apache on a different port only for the actual PHP pages. As an extra step, I even disabled gzip on Apache and let Nginx do all that work. Even though the onslaught was still happening, load decreased to almost pre-onslaught levels! That’s how amazing Nginx is.
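The five-minute fix can be sketched as a single Nginx server block (hostnames and paths here are illustrative, not our actual config):

```nginx
server {
    listen      80;
    server_name blog.example.com;

    # Serve static assets directly from disk, bypassing Apache entirely.
    location ~* \.(jpg|jpeg|gif|png|css|js|ico)$ {
        root    /var/www/blog;
        expires 30d;
    }

    # Everything else (the PHP pages) goes to Apache on another port.
    location / {
        proxy_pass       http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    # Let Nginx handle compression instead of Apache.
    gzip on;
}
```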

The event-based model and why it is better than the traditional thread-based model

Before we dive into what makes the event-based model preferable, we need to talk about the problem with the traditional thread-based model used in most web servers and programming languages. The amazing writeup on Nginx internals that sparked the idea for this post explains it well:

…imagine a simple Apache-based web server which produces a relatively short 100 KB response—a web page with text or an image. It can be merely a fraction of a second to generate or retrieve this page, but it takes 10 seconds to transmit it to a client with a bandwidth of 80 kbps (10 KB/s). Essentially, the web server would relatively quickly pull 100 KB of content, and then it would be busy for 10 seconds slowly sending this content to the client before freeing its connection. Now imagine that you have 1,000 simultaneously connected clients who have requested similar content. If only 1 MB of additional memory is allocated per client, it would result in 1000 MB (about 1 GB) of extra memory devoted to serving just 1000 clients 100 KB of content. In reality, a typical web server based on Apache commonly allocates more than 1 MB of additional memory per connection, and regrettably tens of kbps is still often the effective speed of mobile communications.

Apache forking 240 MB processes under load.

This is just one common scenario, low-bandwidth devices, where Apache or traditional programming becomes the bottleneck. Other scenarios are threads waiting on a DB query or accepting a file upload from a user. Forget even trying to run things like WebSockets with persistent connections in this model! In most of these situations, processes or threads are spun up only to spend most of their time waiting for something else to finish. They are essentially blocked. (This is where the term “non-blocking” comes from when referencing Nginx or Node.)

Martin Fjordvald goes into better detail on how Apache works in his blog entry, a section of which I display here:

Apache Prefork Processes:

  1. Receive PHP request, send it to a process.
  2. Process receives the request and passes it to PHP.
  3. Receive an image request, see the process is busy.
  4. Process finishes the PHP request, returns output.
  5. Process gets the image request and returns the image.

While handling the request, the process is not capable of serving another request. This means the number of requests you can serve simultaneously is directly proportional to the number of processes you have running. Now, if a process took up just a small bit of memory, that would not be too big of an issue, as you could run a lot of processes. However, the typical Apache + PHP setup has the PHP binary embedded directly into the Apache processes. This means Apache can talk to PHP incredibly quickly and without much overhead, but it also means that the Apache process is going to be 25-50MB in size. Not just for PHP requests but also for all static file requests. This is because the processes keep PHP embedded at all times due to the cost of spawning new processes. This effectively means you will be limited by the amount of memory you have, as you can only run a small number of processes, and a lot of image requests can quickly make you hit your process quota.
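To put those numbers in perspective, here is a back-of-the-envelope estimate of how few concurrent requests a prefork setup can actually handle. All figures are assumed, loosely matching the 613 MB micro instance and the 25-50 MB process size quoted above:

```python
# Rough capacity estimate for a prefork Apache + embedded PHP server.
# All numbers are hypothetical illustrations, not measurements.
ram_mb = 613            # total RAM on the micro instance
os_overhead_mb = 150    # assumed: memory reserved for the OS and MySQL
process_size_mb = 30    # low end of the 25-50 MB Apache+PHP process size

max_processes = (ram_mb - os_overhead_mb) // process_size_mb
print(max_processes)  # 15 simultaneous requests before the box starts swapping
```

Fifteen-odd workers is nothing against a few hundred visitors each firing five to ten asset requests, which is exactly the quota problem described in the quote.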

That last part is exactly what caused our WordPress site to go down. It wasn’t an overabundance of MySQL requests but all those static asset requests.

Nginx, on the other hand, was built from the ground up in C to be non-blocking, using the Reactor pattern. Martin describes Nginx’s event-based processing as such:

Nginx Event Based Processing:

  1. Receive request, trigger events in a process.
  2. The process handles all the events and returns the output

On the surface it seems fairly similar, except there’s no blocking. This is because the process handles events in parallel. One connection is not allowed to affect another connection even if run simultaneously. This adds some limitations to how you can program the web server, but it makes for far faster processing as one process can now handle tons of simultaneous requests.
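The same idea can be sketched in a few lines using Python’s asyncio as a stand-in for the event loop. This is purely an illustration of non-blocking concurrency, not how Nginx is implemented:

```python
import asyncio

async def handle(client_id):
    # Simulate a slow client (e.g. a 10 KB/s mobile download) without
    # tying up a thread or process while we wait for the socket.
    await asyncio.sleep(0.01)
    return f"served {client_id}"

async def main():
    # One single-threaded event loop serves 100 slow clients at once;
    # a prefork server would need 100 processes to do the same.
    return await asyncio.gather(*(handle(i) for i in range(100)))

results = asyncio.run(main())
print(len(results))  # 100 responses from a single process
```

The waiting happens inside the event loop rather than inside a blocked worker, which is the whole trick behind both Nginx and Node.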

So there you have it. I’d been meaning to write this post for a while, and I’m glad to be done! Though Node is the hottest thing on the block right now, its concept of building on the event loop is not new. Nginx has been doing it since the early 2000s, when its creator, Igor Sysoev, set out to solve the C10k problem: how to support 10,000 concurrent connections on one server. He knew the threaded model wasn’t it.

I know that at least one person will comment that I should solve my WordPress issue by just running PHP through Nginx. That’s possible, but it wouldn’t solve the problem, as PHP itself isn’t written in a non-blocking manner. If I switched the blog to something like Node, I wouldn’t even need Nginx, but WordPress is so damn easy to install and use for our less technical business and marketing folks. Hence our situation is solved for now.


Nginx and Apache rewrites to support HTML5 Pushstate in pure Backbone.js (or other JS MV*) Application.

Posted on May 17th, 2012 in Apache, backbone, nginx | 4 Comments »

HTML5 pushState is awesome. It enables you to change the URL of your site dynamically without refreshing the page (goodbye hashes!). Libraries like Backbone have great support for it. Unfortunately, if a user bookmarks or refreshes a page on an app that’s using HTML5 pushState, the browser makes a request to the server for that deep-linked content. Here are the rewrites for Nginx and Apache to internally redirect that call to the same HTML file. The browser thinks it’s a unique page, but it’s the same file.

Apache

In your vhost :

<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.html$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.html [L]
</IfModule>

Nginx

    rewrite ^(.+)$ /index.html last;
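Note that this catch-all rewrite sends every request, including CSS, JS, and images, to index.html. A common alternative (available since Nginx 0.7.27) is try_files, which serves real files directly and falls back to the single-page app only when nothing on disk matches:

```nginx
location / {
    # Serve the file or directory if it exists on disk;
    # otherwise hand the URL to the single-page app.
    try_files $uri $uri/ /index.html;
}
```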

Note that once you have this in place, your server no longer reports 404 errors, as all requests pull up the index page. To work around this, you can create a 404 handler in a Backbone route:

  routes: {
    // Other routes
    "*path"  : "notFound"
  },
 
  notFound: function(path) {
    // Load 404 template, probably of a cute animal.
  }

405 Not Allowed on Facebook Canvas Page

Posted on May 11th, 2012 in facebook, nginx | 2 Comments »

I enabled Facebook Canvas for a responsive web app we’re building and noticed that, despite meeting the SSL requirement and hitting the correct page, Nginx was returning a 405 Not Allowed.

Turns out Facebook makes a POST request to your HTML page, which Nginx rejects with a 405 by default for static files. The error_page line below remaps that 405 to a 200 and serves the page anyway. Here’s the Nginx code:

location / {
    error_page 405 =200 $uri;
    root /var/www/html/yoursite.com;
}

Nginx, Apache, and Node all living in harmony.

Posted on July 15th, 2011 in Apache, nginx, node | 3 Comments »

So here are two problems you want to solve:

You want to optimize static content

You have an Apache install that’s hosting a bunch of sites and friends’ sites through vhosts. One of your blogs is getting a lot of hits, and you want to optimize its static content, or even the static content of all sites. You’re not quite ready for a CDN-type deal; you just want to move that content outside of Apache and into something more lightweight so it’s not running through WordPress / Apache and unnecessarily using up threads (at roughly 2 MB a thread).

You want all your web apps on port 80!

You started really getting into Node (or Ruby on Rails, or Django), but every web app needs to be bound to its own port, and port 80 is taken by your Apache, which is hosting a lot. You don’t want to be giving out the URL http://mycoolnewapp.com:81.

Sure, you can use Apache’s ProxyPass, but you’re adding overhead and, in the case of Node (or Ruby’s EventMachine or Python’s Twisted), you’re losing the whole point of having an optimized, non-blocking / non-threaded web app.

The solution

Nginx! Nginx is a web server, reverse proxy, load balancer, and mail proxy all in one. Like Node, it was built around the concept of the event loop rather than threads, so it’s highly optimized for high concurrency. The idea is to set up Nginx as a reverse proxy in front of all your other services.

I’ll skip over how to install Nginx, as it’s pretty straightforward and you can google it. I’ll go over the main steps to getting Apache to work through Nginx. It’s truly easy; I did it on my first try. The only problem I encountered is that since the user only interfaces with Nginx, it’s Nginx that makes the requests to Apache / Node. So from Apache’s perspective, all requests come from 127.0.0.1. We also fix this in the steps below.

  1. Once Nginx is installed, edit your /etc/httpd/conf/httpd.conf so that it listens on another port. Say 127.0.0.1:8080.
  2. Switch all your vhosts to also listen on this port.
  3. Edit your /etc/nginx/nginx.conf file. Make sure it’s set to listen on port 80 and add a server entry for each Apache vhost. Here’s an example:

server {
    listen       80;
    server_name  mysite.com;

    location / {
        proxy_pass http://127.0.0.1:8080;
        include    /etc/nginx/conf.d/proxy.conf;
    }
}
  4. Next we’ll add that proxy.conf reference by creating /etc/nginx/conf.d/proxy.conf:
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
client_max_body_size 8m;
client_body_buffer_size 256k;
proxy_connect_timeout 60;
proxy_send_timeout 60;
proxy_read_timeout 60;
proxy_buffer_size 4k;
proxy_buffers 32 256k;
proxy_busy_buffers_size 512k;
proxy_temp_file_write_size 256k;
  5. At this point you need to install mod_rpaf for Apache. This enables Apache to use the extra headers Nginx is passing in the request, so your logs and apps see the real client IP instead of 127.0.0.1. If you’re using a flavor of Linux that uses apt-get, you’re in luck: just run sudo apt-get install libapache2-mod-rpaf. If you’re on a system that uses yum, you’ll have to compile it yourself. Just follow the steps here.
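The Apache side of steps 1 and 2 can be sketched like this (file locations and the vhost are illustrative; adjust to your distro’s layout):

```apache
# /etc/httpd/conf/httpd.conf
# Apache now listens only on localhost:8080; Nginx owns port 80.
Listen 127.0.0.1:8080

# Each existing vhost switches to the new port as well.
<VirtualHost 127.0.0.1:8080>
    ServerName   mysite.com
    DocumentRoot /var/www/mysite
</VirtualHost>
```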

As for serving static content: inside each server declaration, you can add the following (modified to your taste):

location ~* ^.+\.(jpg|jpeg|gif|png|ico|tgz|gz|pdf|rar|bz2|exe|ppt|txt|tar|mid|midi|wav|bmp|rtf)$ {
    root    /folder/to/static/files;
    expires 90d;
}

location ~* ^.+\.(css|js)$ {
    root    /folder/to/static/files;
    expires 30d;
}

And that’s it, you’re done!

Additional reading

Nginx Primer

Nginx Primer 2: From Apache to Nginx

Apache with Nginx

Really solid config samples