When Amazon released Amazon S3, I argued that it was priced below cost at $1.80/GB/year. At that time, my estimate of their cost was $2.50/GB/year. The Amazon charge of $1.80/GB/year for data to be stored twice in each of two data centers is impressive. It was amazing when it was released and it remains an impressive value today.
Even though the storage price was originally below cost by my measure, Amazon could still make money if they were running a super-efficient operation (likely the case). How could they make money charging less than cost for storage? Customers are charged for ingress/egress on all data entering or leaving the AWS cloud. The AWS ingress/egress charges are reasonable, but telecom pricing strongly rewards volume purchases, so what Amazon pays is likely much less than what AWS charges. The margin on bandwidth can potentially make the storage business profitable even when the storage itself is sold at a loss.
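To make the arithmetic concrete, here is a minimal sketch of the break-even point. The storage price and cost are the figures quoted above; the per-GB transfer margin is a purely hypothetical number chosen for illustration, since I don't know what Amazon actually pays for bandwidth.

```python
# Figures from the text, in $/GB/year.
STORAGE_PRICE = 1.80   # what S3 charged at launch
STORAGE_COST = 2.50    # my estimate of the cost at the time

# HYPOTHETICAL: margin earned on each GB transferred (charge minus what the
# provider pays its carriers). Not a real figure; illustration only.
TRANSFER_MARGIN_PER_GB = 0.10


def storage_shortfall(price, cost):
    """Loss per GB stored per year when storage is priced below cost."""
    return cost - price


def breakeven_transfer_gb(price, cost, margin_per_gb):
    """GB that must be transferred per GB stored per year to cover the loss."""
    if margin_per_gb <= 0:
        raise ValueError("transfer margin must be positive to break even")
    return storage_shortfall(price, cost) / margin_per_gb


shortfall = storage_shortfall(STORAGE_PRICE, STORAGE_COST)
breakeven = breakeven_transfer_gb(STORAGE_PRICE, STORAGE_COST, TRANSFER_MARGIN_PER_GB)

print(f"Storage shortfall: ${shortfall:.2f}/GB/year")
print(f"Break-even transfer: {breakeven:.1f} GB moved per GB stored per year")
```

Under that assumed margin, each stored GB has to generate several GB of billable transfer a year before the service as a whole breaks even. A larger margin lowers the bar proportionally.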
One concern I’ve often heard is the need to model the networking costs between the data centers, since there are actually two redundant copies stored in two independent data centers. Networking, like power, is usually billed at the 95th percentile over a given period. The period is usually a month, but more complex billing systems exist. The constant across most of these high-scale billing systems is that the charge is based upon peaks. That means adding ingress or egress at an off-peak time is essentially free. Assuming peaks are short-lived, the sync to the other data center can be delayed until the peak has passed. If the SLA doesn’t have a hard deadline on when the sync will complete (it doesn’t), then the inter-DC bandwidth is effectively without cost. I call this technique Resource Consumption Shaping and it’s one of my favorite high-scale service cost savers.
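A small simulation shows why deferred sync traffic doesn't move the bill. This is a simplified sketch under stated assumptions, not a description of any carrier's actual billing system: the load profile is invented, the 95th percentile uses the nearest-rank method, and each sample stands for the average rate (in Mbps) over one billing interval.

```python
def percentile_95(samples):
    """95th-percentile sample by the nearest-rank method (an assumption;
    real billing systems may interpolate or sample differently)."""
    ordered = sorted(samples)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def shape_sync(base_load, sync_total):
    """Place sync traffic only in intervals with headroom below the existing
    95th-percentile level. Returns (shaped_load, unplaced_sync)."""
    ceiling = percentile_95(base_load)
    shaped = list(base_load)
    remaining = sync_total
    for i, load in enumerate(shaped):
        if remaining == 0:
            break
        headroom = max(0, ceiling - load)
        moved = min(headroom, remaining)
        shaped[i] += moved
        remaining -= moved
    return shaped, remaining


# HYPOTHETICAL load profile: 100 intervals of customer traffic, in Mbps.
base = [100] * 20 + [400] * 60 + [900] * 20

# HYPOTHETICAL inter-DC sync volume, in Mbps-intervals.
SYNC_TOTAL = 30000

# Naive: replicate immediately, so sync rides on top of the 20 peak intervals.
naive = [100] * 20 + [400] * 60 + [900 + SYNC_TOTAL // 20] * 20

# Shaped: defer the sync into intervals that sit below the billed level.
shaped, unplaced = shape_sync(base, SYNC_TOTAL)

print("billed rate, customer traffic only:", percentile_95(base))
print("billed rate, naive sync:           ", percentile_95(naive))
print("billed rate, shaped sync:          ", percentile_95(shaped))
print("sync traffic left unplaced:        ", unplaced)
```

With this profile the shaped schedule carries all of the sync traffic without raising the billed rate, while the naive schedule raises it substantially. If the off-peak headroom were too small, `shape_sync` would return a nonzero remainder, which is the part of the sync that would either wait for the next period or push the peak up.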
What is the cost of storage today in an efficient, commodity bulk-storage service? Building upon the models in the cost of power in large-scale data centers and the annual fully burdened cost of power, here’s the model I use for cold storage, with current data points.