Facebook designing network fabric to meet massive performance needs
With more than a billion monthly active users, it’s easy to imagine that most of the data traveling over Facebook’s networks is delivering photos, status updates and “likes” to its end users, but that’s far from the case.
The social network moves about 1,000 times as much data between the servers inside its data centers as it does from its servers out to end users, company executives said Wednesday. They talked about the challenges that this creates for Facebook and the network technologies it’s developing to overcome them.
“Our traffic going from machine to machine far exceeds the traffic going from the machines out to our end users,” said Jay Parikh, vice president of Infrastructure Engineering at Facebook, in an on-stage interview at the GigaOm Structure conference in San Francisco.
That’s because of all the processing work Facebook does on the back end to figure out what information it needs to send to end users. The systems analyze data, rank results, and perform a myriad of other tasks to generate the pages Facebook delivers to users’ smartphones and Web browsers.
The problem, as Parikh described it, is that Facebook is using network equipment and protocols that were designed for a different era: for the ISPs (Internet service providers) whose chief role was to provide connectivity for end users, rather than companies like Facebook that are delivering applications and content at massive scale.
One result is data bottlenecks. Facebook sees “flash floods, choke points, all types of scenarios,” said Najam Ahmad, the director responsible for Facebook’s network operations, who also spoke at the Structure conference.
“One of our services handles 2.5 billion operations per second; that’s a lot of packets being sent over the wire,” he said.
Another issue is that the network technologies aren’t smart or flexible enough for the types of applications Facebook is running. “There’s really no way to communicate between an application and the network today. The app just puts the packet on the [network] and hopes it gets to the other end,” Ahmad said.
“We want to come up with technologies that give the app a better feel for what the network is doing—where it has capacity, where there are problems ... where you can find a better path.”
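The feedback loop Ahmad describes, an application asking the network where it has headroom and choosing a path accordingly, can be sketched roughly as follows. All class and method names here are invented for illustration; Facebook has not published such an API, and the "widest path" heuristic is just one plausible way to pick a route with the most spare capacity:

```python
# Hypothetical sketch of app-to-network feedback: the app queries a view
# of per-link spare capacity and picks the path with the most headroom.
import heapq

class NetworkView:
    """Tracks spare capacity (in Gbps) on each link and answers path queries."""
    def __init__(self, links):
        # links: {(a, b): spare_gbps}; treated as bidirectional
        self.graph = {}
        for (a, b), spare in links.items():
            self.graph.setdefault(a, {})[b] = spare
            self.graph.setdefault(b, {})[a] = spare

    def best_path(self, src, dst):
        """Widest path: maximize the minimum spare capacity along the route
        (a modified Dijkstra that tracks bottleneck capacity, not distance)."""
        best = {src: float("inf")}
        heap = [(-float("inf"), src, [src])]
        while heap:
            neg_cap, node, path = heapq.heappop(heap)
            if node == dst:
                return path, -neg_cap
            for nxt, spare in self.graph.get(node, {}).items():
                bottleneck = min(-neg_cap, spare)
                if bottleneck > best.get(nxt, 0):
                    best[nxt] = bottleneck
                    heapq.heappush(heap, (-bottleneck, nxt, path + [nxt]))
        return None, 0

# Two racks reachable via two spine switches with different spare capacity.
view = NetworkView({("rackA", "spine1"): 4, ("spine1", "rackB"): 10,
                    ("rackA", "spine2"): 8, ("spine2", "rackB"): 6})
path, headroom = view.best_path("rackA", "rackB")
print(path, headroom)  # prefers the route through spine2 (6 Gbps headroom)
```

The point of the sketch is the contrast with today's model: instead of "putting the packet on the network and hoping," the application can see that the spine1 route has only 4 Gbps of headroom and steer traffic through spine2 instead.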
Facebook is addressing the network challenges in a couple of ways. One is by designing new switching hardware through the Open Compute Project, a multi-company effort to build new types of equipment for Internet-scale data centers. OCP’s initial goal for networks, announced in May, is to design a top-of-rack switch that’s not tied to any one operating system.
Aside from that, Parikh said Wednesday, Facebook is developing a network “fabric” that is, in effect, the company’s own take on software-defined networking. SDN refers to a set of technologies that, among other things, could shift control in networks out of specialized switches and routers and into software that can run on a variety of standard hardware.
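The split that defines SDN, control logic in general-purpose software, forwarding in simple hardware, can be illustrated with a toy model. This is a generic sketch of the concept, not Facebook's fabric or any real controller's API:

```python
# Minimal sketch of the SDN idea: a central controller computes routes in
# software and pushes simple match -> action rules into "dumb" switches.

class Switch:
    """A forwarding element that only applies rules it has been given."""
    def __init__(self, name):
        self.name = name
        self.rules = {}          # destination address -> output port

    def forward(self, dst):
        # No local routing intelligence: unknown destinations are dropped.
        return self.rules.get(dst, "drop")

class Controller:
    """The central software brain: it knows the topology; switches do not."""
    def __init__(self, switches):
        self.switches = {s.name: s for s in switches}

    def install_route(self, dst, hops):
        # hops: [(switch_name, out_port), ...] along the path it has chosen
        for name, port in hops:
            self.switches[name].rules[dst] = port

s1, s2 = Switch("s1"), Switch("s2")
ctrl = Controller([s1, s2])
ctrl.install_route("10.0.0.7", [("s1", "port2"), ("s2", "port5")])
print(s1.forward("10.0.0.7"))   # follows the controller-installed rule
print(s2.forward("10.0.0.9"))   # no rule installed -> dropped
```

Because the routing decision lives in `Controller`, changing network behavior means changing software, which is the property Facebook wants: new functionality ships at the pace of a code deploy rather than a hardware refresh.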
For Facebook, the new fabric should mean lower operating costs and the ability to deliver new services more quickly. For end users, it should mean better services that operate more quickly, Parikh said.
Networks traditionally use a three-layer hierarchy that was designed primarily to pull data from storage systems, up through servers and out to end users. Facebook wants a “flatter” architecture that’s better suited to moving data from machine to machine, Ahmad said.
“If you can build a fabric where every cabinet, every rack, is at the same level, where the connectivity between any two racks is uniform, then you have a better chance of managing the apps, and the construct becomes more logical than the physical, hierarchical-based systems that we have today,” he said.
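Ahmad's "uniform connectivity" property can be checked concretely on a simple two-tier leaf-spine topology, an illustrative model of a flat fabric, not a description of Facebook's actual network: when every rack switch connects to every spine switch, any two racks are the same number of hops apart, no matter which pair you pick.

```python
# Sketch: in a two-tier leaf-spine fabric, rack-to-rack distance is uniform.
from collections import deque
from itertools import combinations

def build_leaf_spine(num_racks, num_spines):
    """Every rack switch links to every spine switch (full bipartite mesh)."""
    graph = {}
    for r in range(num_racks):
        for s in range(num_spines):
            rack, spine = f"rack{r}", f"spine{s}"
            graph.setdefault(rack, set()).add(spine)
            graph.setdefault(spine, set()).add(rack)
    return graph

def hops(graph, src, dst):
    """Breadth-first search hop count between two switches."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))

fabric = build_leaf_spine(num_racks=6, num_spines=4)
distances = {hops(fabric, f"rack{a}", f"rack{b}")
             for a, b in combinations(range(6), 2)}
print(distances)  # a single value: every rack pair is equally far apart
```

Contrast this with the traditional three-layer hierarchy, where two racks under the same aggregation switch are closer than two racks in different parts of the tree; that asymmetry is what makes application placement hard to manage and what the flat fabric removes.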
The software layer will allow Facebook to add more functionality to the network more quickly. Today, innovation in Facebook’s applications is happening much more quickly than innovation in the network, Ahmad said. “There’s a huge mismatch there. How can we make the network development happen much faster? That’s what SDN brings.”
Like Google, Facebook is also working to improve its connectivity outside the data center, by investing in the massive fiber-optic cables that carry Internet traffic between countries. It’s invested in the Asia Pacific Gateway undersea cable and in a “fiber loop” that connects a data center in Sweden with other parts of Europe, Parikh said.
More than four out of five Facebook users are outside the U.S., and operating its own network can reduce costs for Facebook and allow it to increase network capacity in markets much more quickly when demand there increases. “We’re talking about bringing up capacity in seconds or minutes rather than weeks or months it takes today,” Ahmad said.
Facebook isn’t the only online company developing new technologies to support the unique types of services it delivers. Google is well known for designing its own server hardware, to make its data centers more energy-efficient, and it developed the MapReduce software for analyzing massive data sets that eventually formed the basis for Hadoop.