Tumblr Architecture

An interview with Blake Matheny on High Scalability:

Tumblr operates at surprisingly huge scales: 500 million page views a day, a peak rate of ~40k requests per second, ~3TB of new data to store a day, all running on 1000+ servers.

It's pretty interesting to see their architecture, since I've been thinking about things like this a bit recently (though at much smaller scales). I was surprised to learn that the data set of "which posts should be on a user's dashboard" (which just stores the post_ids) is 5x the size of the actual post contents. I would have believed that for something with very small posts (such as Twitter), but I always viewed Tumblr as having longer posts. Then again, I suppose there are lots of posts (the majority, even?) that are simply a photo, or a reblog of a photo with a one- or two-word comment…
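A quick back-of-the-envelope sketch shows why short posts could tip that ratio. It rests on my own assumptions, not on anything from the interview: each new post's id is copied once into the dashboard list of every follower, and "post contents" excludes the photo files themselves. The function name and every number below are hypothetical, chosen only for illustration.

```python
def dashboard_to_content_ratio(avg_followers: float, id_bytes: int, avg_post_bytes: float) -> float:
    """Ratio of dashboard-index size to post-content size.

    Assumes (hypothetically) that each post's id is stored once per follower
    in their dashboard list, while the post content itself is stored once.
    """
    # Per post: the id is duplicated avg_followers times; the content is stored once.
    return (avg_followers * id_bytes) / avg_post_bytes


# Made-up inputs, not Tumblr's real figures: 100 followers, 8-byte ids.
print(dashboard_to_content_ratio(avg_followers=100, id_bytes=8, avg_post_bytes=2000))  # 0.4
print(dashboard_to_content_ratio(avg_followers=100, id_bytes=8, avg_post_bytes=160))   # 5.0
```

With the same follower count, shrinking the average stored post from 2000 bytes to 160 bytes moves the dashboard data from smaller than the content to 5x larger, which fits the guess that many posts are just a photo plus a word or two.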