- Ease of deployment of a 48-node pod
- Scalability is simplified and unlimited
- Each pod is identical with equal connectivity
The next step is to connect all the fabric switches -- the slide in Figure C depicts how that is accomplished. Andreyev says this is simpler than before (it is hard to imagine what it used to be like).
Facebook engineers stayed with the 48-node theme when adding the spine switches. Andreyev explains, "To implement building-wide connectivity, we created four independent 'planes' of spine switches, each scalable up to 48 independent devices within a plane. Each fabric switch of each pod connects to each spine switch within its local plane."
What Andreyev mentions next is mind-boggling: "Together, pods and planes form a modular network topology capable of accommodating hundreds of thousands of 10G-connected servers, scaling to multi-petabit bisection bandwidth, and covering our data-center buildings with non-oversubscribed rack-to-rack performance."
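To make the pod-and-plane wiring rule concrete, here is a minimal Python sketch that enumerates the fabric-to-spine links it implies. The four planes and the limit of 48 spine switches per plane come from Andreyev's description. The assumption that each pod has exactly one fabric switch per plane, along with every name in the code, is illustrative and not taken from Facebook's tooling.

```python
PLANES = 4             # from the article: four independent spine planes
SPINES_PER_PLANE = 48  # from the article: up to 48 spine switches per plane


def fabric_to_spine_links(num_pods):
    """List (fabric switch, spine switch) pairs for a given number of pods.

    Assumption for illustration: each pod has one fabric switch per plane,
    and that switch connects to every spine switch in its own plane only.
    """
    links = []
    for pod in range(num_pods):
        for plane in range(PLANES):
            fabric = ("pod", pod, "fabric", plane)
            for spine in range(SPINES_PER_PLANE):
                links.append((fabric, ("plane", plane, "spine", spine)))
    return links


if __name__ == "__main__":
    links = fabric_to_spine_links(num_pods=2)
    # 2 pods * 4 planes * 48 spines per plane
    print(len(links))
```

Under these assumptions, adding a pod adds the same fixed set of links every time, which is one way to read the claim that each pod is identical with equal connectivity.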
Network operations
The Fabric network design standardizes on "layer 3" from the TOR switches to the network's edge, supports IPv4 and IPv6, and uses Equal-Cost Multi-Path (ECMP) routing. "To prevent occasional 'elephant flows' from taking over and degrading an end-to-end path, we've made the network multi-speed -- with 40G links between all switches, while connecting the servers on 10G ports on the TORs," adds Andreyev. "We also have server-side means to 'hash away' and route around trouble spots if they occur."
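As a rough illustration of how ECMP spreads traffic, the sketch below hashes a flow's 5-tuple and uses the result to pick one of several equal-cost next hops. This is a generic, simplified model, not Facebook's implementation: real switches compute the hash in hardware with vendor-specific functions, and the function name, the use of SHA-256, and the sample addresses are all assumptions made for the example.

```python
import hashlib


def ecmp_next_hop(flow, next_hops):
    """Pick a next hop for a flow by hashing its 5-tuple.

    flow: (src_ip, dst_ip, protocol, src_port, dst_port)
    next_hops: list of equal-cost next hops (hypothetical switch names here)
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes of the digest as an integer, then map it
    # onto the available next hops.
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    spines = ["spine-%d" % i for i in range(4)]  # hypothetical names
    flow = ("10.0.0.1", "10.0.1.9", "tcp", 40312, 443)
    # Every packet of the same flow hashes to the same next hop.
    print(ecmp_next_hop(flow, spines) == ecmp_next_hop(flow, spines))
```

Because all packets of one flow follow the same path in this model, a single very large flow can load one link heavily. That is the "elephant flow" concern the quote describes, and it is one reason to give switch-to-switch links more capacity than server ports.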
Physical layout
Andreyev writes that the new building layout, shown in Figure D, is not that much different from earlier Facebook designs. One difference is locating Fabric's new spine and edge switches on the first level, between data hall X and data hall Y, and moving network connections to the outside world (MPOE) above the spine and edge switch area.
Figure D
Overcame the challenges
Facebook engineers appear to have surmounted their challenges. Hardware limitations are no longer an issue. The number of different components is reduced, as is complexity. Andreyev says the team embraced the "KISS (Keep It Simple, Stupid) Principle," adding in the paper's conclusion, "Our new fabric was not an exception to this approach. Despite the large scale and complex-looking topology, it is a very modular system, with lots of repetitive elements. It's easy to automate and deploy, and it's simpler to operate than a smaller collection of customized clusters."