Today, the vast majority of web services and sites are hosted in the cloud. By this I mean that, instead of companies (such as Ziff Davis/ExtremeTech) managing their own hardware, third-party cloud storage and computing services are used. Amazon Web Services (AWS), Microsoft Azure, and Google are three prominent examples of huge cloud clusters, but there are hundreds of smaller operations that range in size from a whole data center down to a few racks.
The power of the cloud lies in the fact that it can be coerced and shoehorned into tasks as disparate as cloud-based supercomputing, webmail, and simple document storage. On a single cloud cluster, Google can host and serve petabytes of YouTube videos and store all of your email and documents. Of all the facets of the cloud, though, today we’re going to focus on cloud storage.
While storage might not be as sexy as terabytes of RAM and thousands of CPU cores, it is the most reliable way of measuring the size of the cloud, especially when we factor in bandwidth usage. From the total amount of storage we can also work out the cost of cloud storage — and from there, we can finally work out why the likes of Google, Microsoft, and Dropbox are falling over themselves to provide cloud storage services.
As with the porn story, we’ll start with some theoretical numbers, and then move on to some real-world figures (and hardware) from Backblaze, a cloud backup provider.
Petabytes
For the most part, real numbers from the big companies, such as Google, Facebook, Amazon, and Microsoft, are few and far between. If you scour the web, though, some rough ballpark figures emerge:
- Facebook, in its IPO filing, said it stores over 100 petabytes (PB) of media (photos and videos). It’s not unrealistic to say that Facebook probably has a total storage capacity well beyond that, once you factor in backups and other data (status updates, likes, and so on), possibly in the 300PB range.
- Microsoft recently admitted that Hotmail stores over 100 petabytes, and that SkyDrive, with “17 million customers,” stores 10PB of data. Like Facebook, Microsoft’s total capacity, once we factor in the rest of Azure and its web properties, is probably well over 300 petabytes.
- Megaupload is relatively tiny in comparison, apparently storing just 25 petabytes.
- Amazon, rather than giving us a nice, easy number of petabytes, instead announces the total number of objects stored by its S3 cloud storage service. As of April 2012, Amazon S3 stored 905 billion objects. If we assume an average size of 100KB, that’s around 90 petabytes; if the average size is 1MB, that’s 900 petabytes — almost an exabyte!
- Dropbox, a year ago, stored “10+ petabytes” of data. It had 25 million users then, and 100 million users today, so all things being equal the company now stores around 40PB of data.
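The S3 and Dropbox estimates above are simple multiplications, and a short sketch makes the assumptions explicit. The function names here are illustrative only, the average object sizes are the guesses used in the list (not published figures), and decimal units (1PB = 10^15 bytes) are assumed throughout.

```python
# Back-of-the-envelope storage estimates in decimal units (1 PB = 10**15 bytes).
KB = 10**3
MB = 10**6
PB = 10**15


def total_petabytes(object_count, avg_object_bytes):
    """Total storage in PB for an object count and an assumed average size."""
    return object_count * avg_object_bytes / PB


def scale_by_users(known_pb, users_then, users_now):
    """Linear extrapolation: assumes storage per user stays constant."""
    return known_pb * users_now / users_then


s3_objects = 905 * 10**9  # 905 billion objects, as of April 2012

s3_low_pb = total_petabytes(s3_objects, 100 * KB)  # assumed 100KB average
s3_high_pb = total_petabytes(s3_objects, 1 * MB)   # assumed 1MB average
dropbox_pb = scale_by_users(10, 25_000_000, 100_000_000)  # "10+ PB" a year ago

print(f"S3 at 100KB per object: {s3_low_pb:.1f} PB")
print(f"S3 at 1MB per object:   {s3_high_pb:.1f} PB")
print(f"Dropbox, scaled by users: {dropbox_pb:.0f} PB")
```

The tenfold spread between the two S3 results shows how sensitive the estimate is to the assumed average object size, which is the one number Amazon doesn't give us.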
Bandwidth
Bandwidth-wise, we have even less data from the big boys. We know that, as of last year, one million files were being saved to Dropbox every five minutes (200,000 per minute), so today, with four times as many users, that’s around 800,000 files per minute. Amazon S3, which is significantly larger than Dropbox, handles “650,000 requests per second.”
If we assume that the average file stored on Dropbox is 500KB (a mix of photos, videos, and documents) then Dropbox takes in a total of 400,000 megabytes (0.4TB) per minute, or 6.7GB per second (54Gbps). We don’t have any figures on how much data Dropbox sends per minute (i.e. people downloading files from their Dropbox), but it’s probably in the region of 10 to 20Gbps.
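For anyone who wants to check the Dropbox arithmetic, here is a minimal sketch. The 500KB average file size and the fourfold scaling by user count are the assumptions stated above, not measured values, and `ingest_rate` is just an illustrative helper name.

```python
def ingest_rate(files_per_minute, avg_file_bytes):
    """Return (bytes per minute, bytes per second, bits per second)."""
    bytes_per_minute = files_per_minute * avg_file_bytes
    bytes_per_second = bytes_per_minute / 60
    bits_per_second = bytes_per_second * 8
    return bytes_per_minute, bytes_per_second, bits_per_second


# One million files every five minutes last year, scaled by 4x user growth.
files_per_minute_last_year = 1_000_000 / 5
files_per_minute_now = files_per_minute_last_year * 4

# Assumed average file size: 500KB (decimal units).
per_minute, per_second, bps = ingest_rate(files_per_minute_now, 500 * 10**3)

print(f"{files_per_minute_now:,.0f} files per minute")
print(f"{per_minute / 10**12:.1f} TB per minute")
print(f"{per_second / 10**9:.2f} GB per second")
print(f"{bps / 10**9:.1f} Gbps")
```

Computed without intermediate rounding, the result is a little over 53Gbps. Rounding the per-second figure up to 6.7GB before multiplying by eight gives the 54Gbps quoted above.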
Amazon S3, which is mainly used to store static files for websites (images, style sheets, videos), probably has a lower average file size than Dropbox. If we assume an average size of 100KB per file, then 650,000 requests per second comes to a grand total of 61 gigabytes of data transferred per second, or 488Gbps. This is in the same ballpark as the 800Gbps figure that we estimated for a large porn site, which equates to around 2% of total internet traffic. Amazon is pretty darn big!
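The S3 figure depends on whether a "gigabyte" is counted in decimal or binary units, so a quick sketch of both is useful. It assumes, as the text does, that every request transfers one full 100KB object, which is a simplification; the `throughput` helper is hypothetical.

```python
REQUESTS_PER_SECOND = 650_000  # Amazon's quoted S3 request rate


def throughput(requests_per_second, bytes_per_request, bytes_per_gigabyte):
    """Return (gigabytes per second, gigabits per second) in the given unit system."""
    gigabytes = requests_per_second * bytes_per_request / bytes_per_gigabyte
    return gigabytes, gigabytes * 8


# Decimal units: 100 KB = 100,000 bytes, 1 GB = 10**9 bytes.
decimal_gb, decimal_gbps = throughput(REQUESTS_PER_SECOND, 100 * 1000, 1000**3)

# Binary units: 100 KiB = 102,400 bytes, 1 GiB = 2**30 bytes.
binary_gb, binary_gbps = throughput(REQUESTS_PER_SECOND, 100 * 1024, 1024**3)

print(f"Decimal: {decimal_gb:.1f} GB/s, {decimal_gbps:.0f} Gbps")
print(f"Binary:  {binary_gb:.2f} GiB/s, {binary_gbps:.1f} Gibps")
```

The 61 gigabytes and 488Gbps quoted above match the binary calculation with the fraction dropped (61 times 8 is 488). In decimal units the same assumptions give 65GB per second, or 520Gbps.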
Facebook and Microsoft, with between 100 and 300PB of storage each, probably fall somewhere between Dropbox and Amazon in terms of bandwidth usage, maybe 200Gbps apiece.