Archive for the ‘General Web Dev’ Category

Getting rid of 300ms click delay on iOS web views.

Posted on October 12th, 2012 in General Web Dev, mobile apps | No Comments »

While this technique shouldn’t be news to anybody who’s been doing mobile web development in some capacity, I still find myself periodically searching for an article (and this is the best one) as a refresher when recreating a solution for whatever web application I am working on. One can use existing libraries that implement this concept, like Fastclick, but I find that JavaScript frameworks like Backbone require something more custom:

Creating Fast Buttons for Mobile Web Applications

Great post on Imgur’s stack.

Posted on August 14th, 2012 in General Web Dev | No Comments »

Imgur pretty much powers all the images linked on Reddit. Its creator did a fascinating Q&A on Reddit that gave its current stats and server stack. It’s surprisingly similar to ours, and confirms a few things we already knew were good ideas and should eventually move to: for example, using php-fpm with Nginx and dropping bloated Apache altogether, clustering across more than one availability zone, and using Haproxy to load balance. Good stuff, read on!

Site stats in the past 30 days according to Google Analytics

Visits: 205,670,059
Unique Visitors: 45,046,495
Pageviews: 2,313,286,251
Pages / Visit: 11.25
Avg. Visit Duration: 00:11:14
Bounce Rate: 35.31%
% New Visits: 17.05%

Infrastructure stats over the past 30 days according to our own data and our CDN:

Data Transferred: 4.10 PB
Uploaded Images: 20,518,559
Image Views: 33,333,452,172
Average Image Size: 198.84 KB

Server stack

It’s actually fairly complex now, but I will attempt to do it all from memory.
Background info: Imgur is on Amazon AWS and we use Edgecast as a CDN.
Everything is grouped into clusters depending on the job. There are load balancing, uploading, www, api, image serving, searching, memcached, redis, mysql, map reduce, and cron clusters. Each one of these clusters has at least two instances, each one in its own availability zone. However, most have more than two instances because of the load.
A typical request goes to a load balancer which runs nginx and haproxy. The request first hits nginx, and if there’s a cached version of the page (each page is cached for 5 seconds unless you’re logged in) then it will serve that out. If not then the request goes over to haproxy and it will determine which cluster to send it to, in this case, the www cluster. This cluster runs nginx and php-fpm, and is hooked up to the memcached, redis, and mysql clusters. Php-fpm will handle it if it’s a php page. If the request needs info from mysql, then it will check if the query exists in memcached. If not, then mysql will send the data back and immediately cache it into memcached. If the request is for an image page, and we need the number of times the image was viewed, then it grabs that info from redis. The request then goes back out of php-fpm, through nginx on the www server, and back into the load balancer where it will most likely be cached by nginx, and then out to the user.
Most of the clusters use c1.xlarge instances. The upload cluster handles all uploads and image processing requests, like thumbnails and resizing, and each instance is a huge cluster instance, cc1.4xlarge.
All image requests go through the CDN, and if they’re cached, then they just go right back out of the CDN to the user. If it’s not cached then the CDN gets the image from the image serving cluster and caches it for all additional requests.
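
The memcached step in the typical request above (check the cache for the query, fall back to mysql on a miss, then store the result) is the usual cache-aside pattern. Here is a minimal sketch of that idea, not Imgur’s actual code: the QueryCache class, its dict-backed store, and the run_query callable are all hypothetical stand-ins for memcached and mysql.

```python
from typing import Callable, Dict


class QueryCache:
    """Cache-aside sketch: a plain dict stands in for memcached (assumption)."""

    def __init__(self, run_query: Callable[[str], object]) -> None:
        self._store: Dict[str, object] = {}
        self._run_query = run_query  # hypothetical stand-in for a mysql call
        self.db_hits = 0

    def fetch(self, sql: str) -> object:
        # Serve from the cache when the query has been seen before.
        if sql in self._store:
            return self._store[sql]
        # Cache miss: ask the database, then cache the result immediately.
        self.db_hits += 1
        result = self._run_query(sql)
        self._store[sql] = result
        return result
```

With a fake database such as `QueryCache(lambda sql: sql.upper())`, calling `fetch` twice with the same query only reaches the "database" once. A real setup would also need expiry and invalidation, which this sketch leaves out.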

Stuff they’re dealing with right now:

Scaling the site has always been a challenge, but we’re starting to get really good at it. There’s layers and layers of caching and failover servers, and the site has been really stable and fast the past few weeks. Maintenance and running around with our hair on fire is quickly becoming a thing of the past. I used to get alerts randomly in the middle of the night about a database crash or something, which made night life extremely difficult, but this hasn’t happened in a long time and I sleep much better now.
Matt has been really awesome at getting quality advertisers, but since Imgur is a user generated content site, advertisers are always a little hesitant to work with us because their ad could theoretically turn up next to porn. In order to help with this we’re working with some companies to help sort the content into categories and only advertise on images that are brand safe. That’s why you’ve probably been seeing a lot of Imgur ads for pro accounts next to NSFW content.
For some reason Facebook likes matter to people. With all of our pageviews and unique visitors, we only have 35k “likes”, and people don’t take Imgur seriously because of it. It’s ridiculous, but that’s the world we live in now. I hate shoving likes down people’s throats, so Imgur will remain very non-obtrusive with stuff like this, even if it hurts us a little. However, it would be pretty awesome if you could help.

Full Q&A here.

Update: And here’s a description of Reddit’s stack: one and two. Surprisingly also on EC2!

Amazon S3 vs Amazon Cloudfront

Posted on July 9th, 2012 in General Web Dev, performance | 3 Comments »

Amazon’s S3 (Simple Storage Service) is decently fast, cheap, and reliable storage “in the cloud”. We host gigs and gigs there for practically a few dollars a month. Downloads are fairly speedy, too, though I’ve noticed the latency (time to first byte) of a file can vary from 120ms to 400ms, and even upwards of 800ms on our occasionally slow wireless connection at work. I decided to give Amazon’s Cloudfront a try, and initial tests show it’s crazy fast:

[Screenshot: Charles 3.6.5 session]

This is at home on my fast connection, but even at work I’ve seen Cloudfront’s latency as low as 18ms! It’s a bit pricier than S3 (at about $0.12 per downloaded gigabyte compared to S3’s $0.125 per stored GB; all other S3 charges are practically insignificant) but I will give it a try.

HTTP latency is the great killer of mobile apps, where it is much higher than on desktops, but even on the latter a few hundred milliseconds of delay from a handful of static assets can seriously affect the usability of your application.

Nginx, the non-blocking model, and why Apache sucks

Posted on July 8th, 2012 in Apache, General Web Dev, nginx | 9 Comments »

Note, this blog entry and the link I share at the end are kind of long but highly worthwhile, especially if you want to become more familiar with the concept of an event-based processing model versus a prefork-based one and its benefits. This is applicable to both web servers and programming languages (Apache versus Nginx, PHP versus Node, etc.).

A recent personal story

Why high-concurrency is important

Within the span of two weeks, our little application was featured on both Techcrunch and Mashable. Our traffic ballooned to about 700%, and though our highly-optimized main application server didn’t even break a sweat, our corporate blog server went down for the count. With a basic WordPress setup on a micro EC2 Amazon instance (613 MB of virtual RAM), it was running Apache with standard optimizations of caching headers and gzip.

By the time I had noticed, our blog server had already been running at 100% CPU for a while–yep, should have set an alert–and was very sluggish to browse. Restarting httpd and mysqld only momentarily alleviated the problem, as it instantly shot back up to 100% CPU with the load of a few hundred visitors banging down the door. My first split-second thought was to quickly migrate to a more powerful EC2 instance, one with more memory and CPU. It would be relatively easy and only take ten minutes to execute. After all, that’s what AWS was meant for, right? I knew that wasn’t the best answer, as AWS can become pricey, and one shouldn’t just throw more money at a problem. Our blog server should be able to withstand a spike of a few hundred requests on its current stack; otherwise, we’d pay for an unnecessarily large EC2 instance at the end of every month.

The core problem stemmed from Apache being machine-gunned with HTTP requests, five to ten from each user for things like the PHP page, images, JS, and CSS. These files use correct expires headers, so at most, users would request each file only once. However, there was a consistent flow of new visitors to keep the load high. I had been planning to move those assets to the CDN, but being a small company, we were all busy working on the actual application.

So what did I do? In short, I alleviated the issue in less than five minutes by installing and setting up Nginx on Port 80. It now intercepts and serves all static content and reverse proxies to Apache on a different port, for only the actual PHP page. As an extra step, I even disabled gzip on Apache and have Nginx do all the work. Despite the continued traffic, load decreased to almost pre-onslaught level even though the onslaught was still happening! That’s how amazing Nginx is.

The event-based model and why it is better than the traditional thread-based model

Before we dive into what makes the event-based model preferable, we need to talk about the problem of the traditional thread-based model used in most web servers and programming languages. This amazing writeup on Nginx internals that started the idea for this post explains it well:

…imagine a simple Apache-based web server which produces a relatively short 100 KB response—a web page with text or an image. It can be merely a fraction of a second to generate or retrieve this page, but it takes 10 seconds to transmit it to a client with a bandwidth of 80 kbps (10 KB/s). Essentially, the web server would relatively quickly pull 100 KB of content, and then it would be busy for 10 seconds slowly sending this content to the client before freeing its connection. Now imagine that you have 1,000 simultaneously connected clients who have requested similar content. If only 1 MB of additional memory is allocated per client, it would result in 1000 MB (about 1 GB) of extra memory devoted to serving just 1000 clients 100 KB of content. In reality, a typical web server based on Apache commonly allocates more than 1 MB of additional memory per connection, and regrettably tens of kbps is still often the effective speed of mobile communications.
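
The numbers in that excerpt are easy to check by hand, but here is the arithmetic spelled out as a small sketch. The function names are mine, not from the writeup, and the inputs are just the figures quoted above.

```python
def transmit_seconds(response_kb: float, bandwidth_kb_per_s: float) -> float:
    """Time a connection stays busy sending the response to a slow client."""
    return response_kb / bandwidth_kb_per_s


def extra_memory_mb(clients: int, mb_per_connection: float) -> float:
    """Extra memory held while every client is still connected."""
    return clients * mb_per_connection


# Figures quoted in the excerpt: 100 KB response, 10 KB/s client,
# 1,000 simultaneous clients, 1 MB of additional memory per connection.
busy_for = transmit_seconds(100.0, 10.0)
held_mb = extra_memory_mb(1000, 1.0)
```

Here `busy_for` works out to 10 seconds and `held_mb` to 1000 MB, matching the excerpt. Raising the per-connection figure to what a real Apache process uses makes the total grow linearly.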

Apache forking 240mb processes under load.

This is just one common scenario of low-bandwidth devices where Apache or traditional programming becomes the bottleneck. Other scenarios are when threads are waiting for a DB query or accepting a file upload from a user. Forget even trying to run things like web sockets with persistent connections in this scenario! In most of these situations, processes or threads are spun up where they spend most of their time just waiting for something else to finish. They are essentially blocked. (This is where the term “non-blocking” comes from when referencing Nginx or Node.)

Martin Fjordvald goes into better detail on how Apache works in his blog entry, a section of which I quote here:

Apache Prefork Processes:

  1. Receive PHP request, send it to a process.
  2. Process receives the request and passes it to PHP.
  3. Receive an image request, see process is busy.
  4. Process finishes PHP request, returns output.
  5. Process gets image requests and returns the image.

While handling the request, the process is not capable of serving another request. This means the number of requests you can serve simultaneously is directly proportional to the number of processes you have running. Now, if a process took up just a small bit of memory, that would not be too big of an issue, as you could run a lot of processes. However, the typical Apache + PHP setup has the PHP binary embedded directly into the Apache processes. This means Apache can talk to PHP incredibly quickly and without much overhead, but it also means that the Apache process is going to be 25-50MB in size. Not just for PHP requests but also for all static file requests. This is because the processes keep PHP embedded at all times due to the cost of spawning new processes. This effectively means you will be limited by the amount of memory you have, as you can only run a small number of processes, and a lot of image requests can quickly make you hit your process quota.
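
To put rough numbers on that memory limit, here is a back-of-the-envelope sketch using the 25-50MB process size quoted above and the 613mb of our micro instance. The max_processes helper is hypothetical, and it optimistically assumes all of the RAM is available to Apache, which it never is with MySQL and the OS on the same box.

```python
def max_processes(available_mb: float, process_mb: float) -> int:
    """Upper bound on prefork processes that fit in memory (optimistic)."""
    return int(available_mb // process_mb)


# Assumption: the whole 613 MB instance is free for Apache processes.
best_case = max_processes(613, 25)   # smaller 25 MB processes
worst_case = max_processes(613, 50)  # larger 50 MB processes
```

That gives somewhere between 12 and 24 simultaneous requests at the very best, with every image, JS, and CSS request tying up one of those slots.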

Note that the bolded words above are exactly what caused our WordPress to go down. It wasn’t an overabundance of MySQL requests but all those assets.

Nginx, on the other hand, was built from the ground up in C to be non-blocking with the Reactor pattern. Martin describes Nginx’s event-based processing as such:

Nginx Event Based Processing:

  1. Receive request, trigger events in a process.
  2. The process handles all the events and returns the output.

On the surface it seems fairly similar, except there’s no blocking. This is because the process handles events in parallel. One connection is not allowed to affect another connection even if run simultaneously. This adds some limitations to how you can program the web server, but it makes for far faster processing as one process can now handle tons of simultaneous requests.
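
To make the idea concrete, here is a toy sketch of a single process juggling many slow clients on one event loop, using Python’s asyncio. It is only an illustration of the event-based model, not how Nginx is implemented (Nginx is written in C), and the handle_client, serve_all, and run_demo names are made up for this example. The sleep stands in for slowly sending a response to a low-bandwidth client.

```python
import asyncio
import time


async def handle_client(client_id: int, send_seconds: float) -> int:
    # Awaiting yields control to the event loop, so other clients
    # are served while this one is "slowly receiving" its response.
    await asyncio.sleep(send_seconds)
    return client_id


async def serve_all(num_clients: int, send_seconds: float) -> list:
    # One process, one thread: all clients are handled concurrently.
    tasks = [handle_client(i, send_seconds) for i in range(num_clients)]
    return await asyncio.gather(*tasks)


def run_demo(num_clients: int = 1000, send_seconds: float = 0.05):
    """Return (clients served, wall-clock seconds) for one event loop."""
    start = time.perf_counter()
    served = asyncio.run(serve_all(num_clients, send_seconds))
    elapsed = time.perf_counter() - start
    return len(served), elapsed
```

If each client were handled one after another by a blocked process, total time would be roughly num_clients times send_seconds. On the event loop the waits overlap, so the total should stay far below that, and no extra process is needed per connection.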

So there you have it. I’d been meaning to write this post for a while, and I’m glad to be done! Though Node is the hottest thing on the block right now, its concept of building on the event loop is not new. Nginx has been doing this since the early 2000s. Back then, the Nginx creator was trying to solve the C10k problem: how to support 10,000 concurrent connections on one server. He knew the threaded model wasn’t it.

I know that at least one person will comment to solve my WordPress issue with “why not just run PHP through Nginx”. It’s possible but wouldn’t solve the problem, as PHP wasn’t written in a non-blocking manner. Should I switch to something like Node for the blog, I wouldn’t even need Nginx, but WordPress is so damn easy to install and use for the less technical business and marketing people who use it. Hence our situation is solved for now.

Response Times: The 3 Important Limits

Posted on October 22nd, 2011 in General Web Dev | No Comments »

Regardless of new technology and devices, the basic advice regarding (perceived) response time has been the same for forty years and counting. I find that I’m always having to google around to find this link when the subject pops up, so I’ll post the basics here and link to the rest if you’re interested in reading more.

  • 0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.
  • 1.0 second is about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data.
  • 10 seconds is about the limit for keeping the user’s attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.

From the book Usability Engineering (1993). More info here.
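
If you want these three limits handy in code, for example to decide what kind of feedback a slow operation should show, here is a small sketch. The classify_delay function and its bucket names are my own labels, not terms from the book; only the 0.1, 1.0, and 10 second thresholds come from the list above.

```python
def classify_delay(delay_seconds: float) -> str:
    """Map a response time onto the three limits (bucket names are made up)."""
    if delay_seconds <= 0.1:
        return "instant"           # feels instantaneous, just show the result
    if delay_seconds <= 1.0:
        return "flow-preserved"    # noticed, but train of thought survives
    if delay_seconds <= 10.0:
        return "attention-held"    # user is still focused on the dialogue
    return "needs-progress"        # show when the system expects to be done
```

For instance, `classify_delay(0.3)` lands in the "flow-preserved" bucket, which is exactly where a 300ms click delay sits.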