Introducing data center fabric, the next-generation Facebook data center network

All our new data centers are based on our next-generation fabric design, a high-density Clos network deployed to meet the unprecedented growth challenges we face when building infrastructure at scale. At the heart of this design is a three-tier architecture.
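
To make the Clos idea concrete, here is a minimal, illustrative sketch of a two-tier leaf-spine fabric. The switch names, counts, and port figures are assumptions for illustration, not our actual topology; the point is simply that every leaf connects to every spine, so cross-fabric capacity grows by adding more small, identical switches in parallel rather than a single bigger one.

```python
# Minimal illustration of a folded Clos (leaf-spine) topology: every leaf connects
# to every spine, so capacity between any two leaves grows with the number of spines.
# Switch counts below are assumed values for illustration, not an actual deployment.
from itertools import product

def build_leaf_spine(num_leaves: int, num_spines: int) -> list[tuple[str, str]]:
    """Return the full mesh of leaf-to-spine links in a two-tier Clos fabric."""
    leaves = [f"leaf{i}" for i in range(num_leaves)]
    spines = [f"spine{j}" for j in range(num_spines)]
    return [(leaf, spine) for leaf, spine in product(leaves, spines)]

links = build_leaf_spine(num_leaves=4, num_spines=2)
print(len(links), "links:", links)
# Adding one more spine adds one more parallel path between every pair of leaves:
# the fabric scales out with small identical switches instead of a larger chassis.
```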

Facebook’s production network by itself is a large distributed system with specialized tiers and technologies for different tasks: edge, backbone, and data centers.

Our previous data center networks were built using clusters. A cluster is a large unit of deployment, involving hundreds of server cabinets with top-of-rack (TOR) switches aggregated on a set of large, high-radix cluster switches. More than three years ago, we developed a reliable layer-3 “four-post” architecture, offering 3+1 cluster switch redundancy and 10x the capacity of our previous cluster designs. But as effective as it was in our early data center builds, the cluster-focused architecture has its limitations. First, the size of a cluster is limited by the port density of the cluster switch.

Even more difficult is maintaining an optimal long-term balance between cluster size, rack bandwidth, and bandwidth out of the cluster. The whole concept of a “cluster” was born from a networking limitation – it was dictated by the need to position a large amount of compute resources (server racks) within an area of high network performance supported by the internal capacity of the large cluster switches.
Allocating more ports to accommodate inter-cluster traffic takes away from cluster size. With rapid and dynamic growth, this balancing act never ends – unless you change the rules.
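
That balancing act comes down to simple port arithmetic. The sketch below uses a hypothetical switch radix and link speeds, not our actual numbers, to show that for a fixed-radix cluster switch, every port assigned to inter-cluster uplinks is a port no longer available to a rack.

```python
# Hypothetical port-budget model for a fixed-radix cluster switch.
# The radix, link speed, and split values below are assumptions chosen for illustration.

def cluster_tradeoff(switch_radix: int, uplink_ports: int) -> dict:
    """Split a cluster switch's ports between rack-facing links and inter-cluster uplinks."""
    rack_ports = switch_radix - uplink_ports
    return {
        "racks_supported": rack_ports,          # one ToR link per rack, per cluster switch
        "intra_cluster_gbps": rack_ports * 10,  # assumed 10 Gbps per rack-facing port
        "inter_cluster_gbps": uplink_ports * 10,
    }

# Growing inter-cluster bandwidth shrinks the cluster, and vice versa.
for uplinks in (32, 96, 192):
    print(uplinks, cluster_tradeoff(switch_radix=384, uplink_ports=uplinks))
```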

Introducing “6-pack”: the first open hardware modular switch

Over the last few years we’ve been building our own network, breaking down traditional network components and rebuilding them into modular disaggregated systems that provide us with the flexibility, efficiency, and scale we need.

We started by designing a new top-of-rack network switch (code-named “Wedge”) and a Linux-based operating system for that switch (code-named “FBOSS”). Next, we built a data center fabric, a modular network architecture that allows us to scale faster and more easily. For both of these projects, we broke apart the hardware and software layers of the stack and opened up greater visibility, automation, and control in the operation of our network.

But even with all that progress, we still had one more step to take. We had a TOR, a fabric, and the software to make it run, but we still lacked a scalable solution for all the modular switches in our fabric. So we built the first open modular switch platform. We call it “6-pack.”

Unlike traditional closed-hardware switches, Wedge allows anyone to modify or replace any of the components in our design to better meet their needs.

Disaggregation offers the option to select an OS from one vendor and run it on compatible hardware from a different manufacturer.

There are many supposed benefits of network disaggregation, but they don’t really apply in the enterprise.

  • Lower costs: Forrester's report “The Myth of White Box Network Switches” estimated that the cost of a disaggregated switch platform, calculated over a 6.6-year period, was not significantly less than buying a vendor product once support and operational costs are included.
  • Run one OS across every platform: Yes, but most likely only if those platforms are certified by your OS vendor, which may also limit hardware options to a particular brand of merchant silicon.
  • Deploy switches using Puppet/REST API: This sounds great, but is the business ready to fund development of the solutions necessary to automate the management of the network this way? How often are you deploying new switches? (A minimal sketch of this approach follows the list.)
  • Reduced vendor lock-in: Certified hardware can be swapped in and out as desired, but does this simply shift the effective lock-in to the OS?
  • You can run other software on the switch: Sounds good, but do you have a production use case in mind? Additionally, traditional vendors are starting to offer this option too, so it may stop being a differentiator.
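
For the REST API point above, here is a minimal sketch of what pushing configuration to a switch over HTTP might look like. The management address, endpoint paths, payload schema, and token are hypothetical placeholders; a real deployment would use whatever API the chosen network OS actually exposes.

```python
# Hypothetical example of pushing a base configuration to a switch over a REST API.
# The URL, endpoint paths, and payload schema are placeholders, not a real vendor API.
import json
import urllib.request

SWITCH_API = "https://switch01.example.net/api/v1"   # placeholder management address

def push_config(endpoint: str, payload: dict, token: str) -> int:
    """POST a JSON configuration fragment to the switch and return the HTTP status."""
    req = urllib.request.Request(
        f"{SWITCH_API}/{endpoint}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

if __name__ == "__main__":
    # Example: configure a VLAN and an uplink interface as part of provisioning.
    push_config("vlans", {"id": 100, "name": "servers"}, token="EXAMPLE-TOKEN")
    push_config("interfaces/ethernet1", {"mode": "trunk", "vlans": [100]}, token="EXAMPLE-TOKEN")
```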

Another consideration is that the capacity of a typical white-box switch (48 x 10 Gbps Ethernet ports plus some 40 Gbps uplinks) means it is aimed at the data center environment and likely has little use outside the DC or the largest of user sites.
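
For perspective, the arithmetic on such a switch looks roughly like this (the 48 x 10 Gbps figure is from above; the six 40 Gbps uplinks are an assumed typical configuration):

```python
# Back-of-the-envelope capacity of a typical white-box ToR switch.
# 48 x 10 Gbps downlinks are given in the text; 6 x 40 Gbps uplinks are an assumed build.
downlink_gbps = 48 * 10   # 480 Gbps of server-facing capacity
uplink_gbps = 6 * 40      # 240 Gbps toward the rest of the network
print(f"{downlink_gbps} Gbps down, {uplink_gbps} Gbps up, "
      f"{downlink_gbps / uplink_gbps:.0f}:1 oversubscription")
```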
