Imgur powers pretty much all the images linked on Reddit. Its creator did a fascinating Q&A on Reddit that covered its current stats and server stack. The stack is surprisingly similar to ours, and it confirms a few changes we already knew were good ideas and should eventually make: using php-fpm with Nginx and dropping bloated Apache altogether, clustering across more than one availability zone, and using HAProxy for load balancing. Good stuff, read on!
Site stats over the past 30 days, according to Google Analytics:
Unique Visitors: 45,046,495
Pages / Visit: 11.25
Avg. Visit Duration: 00:11:14
Bounce Rate: 35.31%
% New Visits: 17.05%
Infrastructure stats over the past 30 days according to our own data and our CDN:
Data Transferred: 4.10 PB
Uploaded Images: 20,518,559
Image Views: 33,333,452,172
Average Image Size: 198.84 KB
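To put the 30-day infrastructure figures on a per-second scale, here is a quick back-of-envelope calculation. It assumes (the post does not say) that the window is exactly 30 × 86,400 seconds and that PB means decimal petabytes (10^15 bytes). These are averages only, so real peaks will be higher.

```python
# Back-of-envelope averages derived from the 30-day infrastructure stats above.
# Assumptions (not stated in the post): the window is exactly 30 * 86,400 s,
# and PB is a decimal unit (10**15 bytes).

SECONDS_IN_WINDOW = 30 * 86_400

DATA_TRANSFERRED_BYTES = 4.10 * 10**15
UPLOADED_IMAGES = 20_518_559
IMAGE_VIEWS = 33_333_452_172


def per_second(total: float) -> float:
    """Average rate over the whole window; daily peaks will be higher."""
    return total / SECONDS_IN_WINDOW


bandwidth_gbit_s = per_second(DATA_TRANSFERRED_BYTES) * 8 / 10**9
views_per_second = per_second(IMAGE_VIEWS)
uploads_per_second = per_second(UPLOADED_IMAGES)

if __name__ == "__main__":
    print(f"Average transfer:   {bandwidth_gbit_s:.2f} Gbit/s")
    print(f"Image views/second: {views_per_second:,.0f}")
    print(f"Uploads/second:     {uploads_per_second:.2f}")
```

Under those assumptions this works out to an average of roughly 12.65 Gbit/s, about 12,860 image views per second, and just under 8 uploads per second.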
It’s actually fairly complex now, but I will attempt to do it all from memory.
Background info: Imgur is on Amazon AWS, and we use Edgecast as a CDN.
Everything is grouped into clusters depending on the job. There are load balancing, uploading, www, api, image serving, searching, memcached, redis, mysql, map reduce, and cron clusters. Each of these clusters has at least two instances, each in its own availability zone, and most have more than two because of the load.
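The “at least two instances, each in its own availability zone” rule is easy to check mechanically. Here is a minimal sketch, assuming a hypothetical inventory of (cluster, instance, zone) tuples. The instance IDs and zone names are made up, and nothing here calls the real AWS API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical inventory: (cluster, instance_id, availability_zone).
# Cluster names follow the post; instance IDs and zones are invented.
INVENTORY: List[Tuple[str, str, str]] = [
    ("www", "i-01", "us-east-1a"),
    ("www", "i-02", "us-east-1b"),
    ("www", "i-03", "us-east-1c"),
    ("mysql", "i-10", "us-east-1a"),
    ("mysql", "i-11", "us-east-1b"),
    ("redis", "i-20", "us-east-1a"),
    ("redis", "i-21", "us-east-1a"),  # both redis nodes in one zone: should be flagged
]

MIN_ZONES = 2


def zones_by_cluster(inventory: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Collect the distinct availability zones each cluster's instances live in."""
    zones: Dict[str, Set[str]] = defaultdict(set)
    for cluster, _instance, zone in inventory:
        zones[cluster].add(zone)
    return zones


def clusters_violating_spread(inventory, min_zones: int = MIN_ZONES) -> List[str]:
    """Clusters whose instances cover fewer than `min_zones` distinct zones."""
    return sorted(
        cluster
        for cluster, zones in zones_by_cluster(inventory).items()
        if len(zones) < min_zones
    )


if __name__ == "__main__":
    print(clusters_violating_spread(INVENTORY))
```

Counting distinct zones, rather than instances, catches the case where a cluster has two instances but both sit in the same zone.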
A typical imgur.com request goes to a load balancer, which runs nginx and haproxy. The request first hits nginx, and if there’s a cached version of the page (each page is cached for 5 seconds unless you’re logged in), nginx serves that. If not, the request goes to haproxy, which determines which cluster to send it to, in this case the www cluster. This cluster runs nginx and php-fpm and is hooked up to the memcached, redis, and mysql clusters. Php-fpm handles the request if it’s a php page. If the request needs data from mysql, it first checks whether the query result is already in memcached. If not, the query goes to mysql, and the result is immediately cached in memcached. If the request is for an image page and we need the number of times the image was viewed, it grabs that from redis. The response then goes back out of php-fpm, through nginx on the www server, and back to the load balancer, where nginx will most likely cache it, and then out to the user.
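To make that request path concrete, here is a toy single-process model of it. It is only an illustration of the pattern, not Imgur’s code. Plain dicts stand in for nginx’s page cache, memcached, mysql and redis, and the class names are invented. The model shows a 5-second page cache that logged-in users bypass, a router that picks the cluster, cache-aside reads in front of mysql, and view counts read from redis.

```python
import time
from typing import Callable, Dict, Tuple

# Toy model of the request path described above. All names are hypothetical
# stand-ins; dicts play nginx's page cache, memcached, mysql and redis.

PAGE_TTL_SECONDS = 5.0  # "each page is cached for 5 seconds unless you're logged in"


class WwwCluster:
    """nginx + php-fpm box: memcached in front of mysql, view counts from redis."""

    def __init__(self, mysql: Dict[str, str], redis_views: Dict[str, int]):
        self.mysql = mysql
        self.redis_views = redis_views
        self.memcached: Dict[str, str] = {}
        self.renders = 0
        self.mysql_queries = 0

    def query(self, key: str) -> str:
        # Cache-aside: try memcached first; on a miss, ask mysql and cache the result.
        if key in self.memcached:
            return self.memcached[key]
        self.mysql_queries += 1
        row = self.mysql[key]
        self.memcached[key] = row
        return row

    def render(self, path: str) -> str:
        self.renders += 1
        image_id = path.rsplit("/", 1)[-1]
        title = self.query(image_id)
        views = self.redis_views.get(image_id, 0)  # view counts live in redis
        return f"{title} ({views} views)"


class LoadBalancer:
    """nginx micro-cache in front of an haproxy-style router."""

    def __init__(self, routes: Dict[str, Callable[[str], str]],
                 clock: Callable[[], float] = time.monotonic):
        self.routes = routes  # path prefix -> cluster handler
        self.clock = clock
        self.page_cache: Dict[str, Tuple[float, str]] = {}

    def pick_cluster(self, path: str) -> Callable[[str], str]:
        # Longest matching prefix wins, e.g. "/api/" before "/".
        for prefix in sorted(self.routes, key=len, reverse=True):
            if path.startswith(prefix):
                return self.routes[prefix]
        raise LookupError(f"no cluster for {path}")

    def handle(self, path: str, logged_in: bool) -> str:
        now = self.clock()
        if not logged_in:
            cached = self.page_cache.get(path)
            if cached is not None and now - cached[0] < PAGE_TTL_SECONDS:
                return cached[1]  # served by nginx; the backend never sees it
        body = self.pick_cluster(path)(path)
        if not logged_in:
            self.page_cache[path] = (now, body)  # anonymous pages get cached on the way out
        return body


if __name__ == "__main__":
    fake_now = [0.0]
    www = WwwCluster(mysql={"abc123": "Cat picture"}, redis_views={"abc123": 42})
    lb = LoadBalancer({"/": www.render}, clock=lambda: fake_now[0])

    lb.handle("/gallery/abc123", logged_in=False)  # miss: www renders, mysql queried
    lb.handle("/gallery/abc123", logged_in=False)  # hit: served from the page cache
    fake_now[0] = 6.0                              # cached page is now older than 5 s
    lb.handle("/gallery/abc123", logged_in=False)  # www renders again, memcached hit
    lb.handle("/gallery/abc123", logged_in=True)   # logged in: always rendered
    print(www.renders, www.mysql_queries)
```

The injectable clock lets the 5-second expiry be exercised without sleeping. In a real deployment the page cache would live in nginx configuration and the routing in haproxy rules, not in application code.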
Most of the clusters use c1.xlarge instances. The upload cluster handles all uploads and image processing requests, such as thumbnails and resizing, and each of its instances is a huge cluster compute instance, a cc1.4xlarge.
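The post doesn’t say which thumbnail sizes Imgur generates or how. As an illustration only, here is the core arithmetic of a “fit within a box” resize, which keeps the aspect ratio and never upscales. The box names and sizes are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical thumbnail boxes; Imgur's real sizes are not given in the post.
THUMBNAIL_BOXES: Dict[str, Tuple[int, int]] = {
    "small": (160, 160),
    "medium": (320, 320),
    "large": (640, 640),
}


def fit_within(width: int, height: int, max_w: int, max_h: int) -> Tuple[int, int]:
    """Scale (width, height) to fit inside max_w x max_h, keeping the aspect ratio.

    Images that already fit are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def thumbnail_sizes(width: int, height: int) -> Dict[str, Tuple[int, int]]:
    """Target dimensions for every thumbnail box."""
    return {name: fit_within(width, height, *box) for name, box in THUMBNAIL_BOXES.items()}


if __name__ == "__main__":
    print(thumbnail_sizes(1920, 1080))
```

A real pipeline would then pass these dimensions to an image library for the actual resampling.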
All image requests go through the CDN, and if an image is cached, it goes right back out of the CDN to the user. If it’s not cached, the CDN fetches the image from the image serving cluster and caches it for all subsequent requests.
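The CDN behaviour described above is a pull-through (origin-fill) cache. Here is a minimal sketch with a hypothetical class name and a dict standing in for the image serving cluster. It ignores TTLs, eviction and the fact that a real CDN has many edge locations.

```python
from typing import Callable, Dict


class PullThroughCache:
    """Minimal model of CDN edge caching: serve hits locally, fill misses from origin.

    Real CDNs also apply TTLs, eviction and multiple edge locations; all omitted here.
    """

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self.fetch_from_origin = fetch_from_origin  # stands in for the image serving cluster
        self.store: Dict[str, bytes] = {}
        self.requests = 0
        self.origin_fetches = 0

    def get(self, path: str) -> bytes:
        self.requests += 1
        if path in self.store:
            return self.store[path]          # cached: straight back to the user
        self.origin_fetches += 1
        data = self.fetch_from_origin(path)  # not cached: ask the origin once...
        self.store[path] = data              # ...and keep it for later requests
        return data

    def hit_ratio(self) -> float:
        """Fraction of requests served without touching the origin."""
        return 1 - self.origin_fetches / self.requests if self.requests else 0.0


if __name__ == "__main__":
    origin = {"/abc123.jpg": b"\xff\xd8...jpeg bytes..."}
    cdn = PullThroughCache(origin.__getitem__)
    for _ in range(1_000):
        cdn.get("/abc123.jpg")
    print(cdn.origin_fetches, cdn.hit_ratio())
```

Repeated requests for a popular image reach the origin only once per cache fill, which is the point of putting the CDN in front of the image serving cluster.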
What they’re dealing with right now:
Scaling the site has always been a challenge, but we’re starting to get really good at it. There are layers and layers of caching and failover servers, and the site has been really stable and fast for the past few weeks. Maintenance and running around with our hair on fire are quickly becoming a thing of the past. I used to get random alerts in the middle of the night about a database crash or something, which made nightlife extremely difficult, but that hasn’t happened in a long time and I sleep much better now.
Matt has been really awesome at getting quality advertisers, but since Imgur is a user-generated content site, advertisers are always a little hesitant to work with us because their ad could theoretically turn up next to porn. To help with this, we’re working with some companies to sort the content into categories and only advertise on images that are brand safe. That’s why you’ve probably been seeing a lot of Imgur ads for pro accounts next to NSFW content.
For some reason, Facebook likes matter to people. With all of our pageviews and unique visitors, we only have 35k “likes”, and people don’t take Imgur seriously because of it. It’s ridiculous, but that’s the world we live in now. I hate shoving likes down people’s throats, so Imgur will remain very unobtrusive about stuff like this, even if it hurts us a little. However, it would be pretty awesome if you could help: https://www.facebook.com/pages/Imgur/67691197470
Full Q&A here.