Facebook prepares to use 'cold storage' to deal with vast amounts of data
Facebook is rethinking the way it stores data to cope with the 7 petabytes of new photos the social network's users upload every month. As the number of photos grows, Facebook needs to find cheaper, less power-hungry ways to store them all, according to the company's vice-president of infrastructure engineering.
Users upload about 300 million photos a day, more on special occasions, Facebook's Jay Parikh told the Structure Europe conference in Amsterdam Wednesday. "Halloween is one of our biggest photo upload days of the year. We will get somewhere between probably 1 and 2 billion photos uploaded just in a single day," he said.
Photos like the ones taken at Halloween soon lose their appeal, with no one looking at them after a few days or weeks. But "our contract with our users is that we can't delete the data when it is not accessed, we have to keep it," he said. That led to the idea of putting the photos into a sort of "cold storage," Parikh said.
To do that, Facebook plans to build a new data center with different types of storage, server hardware, and network equipment that consume less power and cost less than those in existing data centers, all without changing server response times, he said.
But how efficient can Facebook make its cold storage? Lowering costs and power consumption in data centers usually comes at the expense of access speed.
Storing data on tapes, for instance, lowers power consumption but severely slows down data access.
Amazon Web Services is following a middle path with its Glacier cloud storage service, which it pitches as an alternative to tape. The service is optimized for data that is infrequently accessed and for which retrieval times of several hours are acceptable.
That's much too slow for Facebook, according to Parikh. "I can't have a photo that you go access from five or ten years ago, and for me to show up a banner to the user that says: 'Hey, why don't you try again in 24 hours?' It's still got to be relatively real time," he said.
Lower power needs
Most data centers in use today are designed to draw a lot of power to handle compute-intensive tasks. The "cold storage" technology Facebook is thinking of is at the other extreme, said Parikh. "You need lots and lots of space but you don't need as much power," he said, adding that everything about the data center needs to be rethought to handle the problem at the scale Facebook faces.
At a high level, Facebook is working on software that will figure out how and where to store a piece of content in the infrastructure as it ages, said Parikh. "That will mean that the copies of the data will move around over time and utilize the different pieces of infrastructure that we will have optimized for the age of content." Some of the inventions in the software layer will allow Facebook to still respond quickly but to store data more cost-effectively, he said.
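Parikh did not describe how that software works. As a rough illustration only, an age-based placement policy could look something like the Python sketch below. The tier names ("hot", "warm", "cold"), the age thresholds, and the `Photo`, `choose_tier`, and `plan_moves` names are all hypothetical and are not taken from Facebook.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tiers and age limits, chosen only for illustration.
# Content younger than the limit stays in that tier; anything older
# than every limit falls through to the cold tier.
TIERS = [
    (timedelta(days=30), "hot"),
    (timedelta(days=365), "warm"),
]
COLD_TIER = "cold"


@dataclass
class Photo:
    photo_id: str
    uploaded_at: datetime


def choose_tier(photo, now):
    """Pick a storage tier for a photo based solely on its age."""
    age = now - photo.uploaded_at
    for max_age, tier in TIERS:
        if age < max_age:
            return tier
    return COLD_TIER


def plan_moves(photos, current_tier, now):
    """Return {photo_id: (from_tier, to_tier)} for photos that should move."""
    moves = {}
    for photo in photos:
        source = current_tier.get(photo.photo_id)
        target = choose_tier(photo, now)
        if source != target:
            moves[photo.photo_id] = (source, target)
    return moves


if __name__ == "__main__":
    now = datetime(2012, 10, 31)
    photos = [
        Photo("a", datetime(2012, 10, 25)),  # a few days old
        Photo("b", datetime(2012, 1, 1)),    # several months old
        Photo("c", datetime(2007, 10, 31)),  # five years old
    ]
    placement = {"a": "hot", "b": "hot", "c": "hot"}
    print(plan_moves(photos, placement, now))
```

A real system would presumably weigh access patterns as well as age, and would have to keep older photos retrievable quickly, as Parikh stressed. This sketch only shows the idea of content migrating between tiers over time.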
Cold storage will be part of Facebook's infrastructure in the next year or two, he said. Facebook plans to disclose and share the parts that it thinks are relevant through the Open Compute Project, an initiative started by Facebook to apply the open-source software collaboration model to the world of data center hardware.