Applied Infrastructure

High Availability Networking (>5-nines)

One of the best sessions I attended at CiscoLive this year was titled "BRKRST-3365, Unified HA Network Design: The Evolution of the Next Generation Network" by John Cavanaugh, Chris Cornwall, and a whole team of contributors.  They talked about the High Availability (HA) network designs that they have done over the past ten years.  Some of their network designs have had no application-affecting downtime over a ten-year period.  They identified several key factors that influence high availability.

The first important factor was cross-connected dual-core networks.  They labeled the two cores as Red and Blue with cross-connections so that a single failure would not cause packets to take a much longer path around the failure, potentially impacting application performance.  Why two core networks?  Full redundancy allows one core to be taken out of service for maintenance while production continues on the other core.
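To put rough numbers on why full redundancy matters, here is a minimal sketch of the standard availability arithmetic.  It assumes the two cores fail independently, which real networks only approximate, and the 99.9% single-core figure is a hypothetical value chosen for illustration, not a number from the session.

```python
# Availability arithmetic for redundant cores (illustrative sketch).
# Assumption: the cores fail independently of each other.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year for a given availability (0.0 to 1.0)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(*availabilities: float) -> float:
    """Availability of redundant paths: service is down only if all are down."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


if __name__ == "__main__":
    five_nines = 0.99999
    single_core = 0.999  # hypothetical availability of one core
    dual_core = parallel_availability(single_core, single_core)

    print(f"5-nines budget: {downtime_minutes_per_year(five_nines):.2f} min/year")
    print(f"one core:       {downtime_minutes_per_year(single_core):.1f} min/year")
    print(f"two cores:      {downtime_minutes_per_year(dual_core):.2f} min/year")
```

Under these assumptions, two three-nines cores in parallel reach roughly six nines.  Shared failure sources (power, software bugs, operator error) break the independence assumption, so real results will be worse than this arithmetic suggests.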

Dual-core redundancy is important for companies that can no longer afford maintenance windows for performing network upgrades.  One VP of network engineering at a financial firm told me that he has two maintenance windows: July 4 and Christmas.  Global companies may find that even those days are unavailable because significant parts of the world economy run year-round.  Being able to take out half of the network for software and hardware maintenance while the business runs on the other half allows prompt resolution of relatively minor network problems and timely patching of security vulnerabilities in the network infrastructure.

The other major factor that I liked was their recommendation to reduce the size of failure domains.  A simple example is to design relatively small Layer 2 domains so that when a spanning tree loop occurs, it has a smaller range of impact.  I've heard of a 900-server data center outage that was caused by the insertion of an old switch into a data center-wide spanning tree domain.  The switch was old enough and slow enough that it couldn't perform the task of the root bridge.  The entire data center's operation was affected.  A smaller Layer 2 domain would have reduced the negative impact.

Another HA recommendation that I like is putting redundant servers on different subnets.  Equipment on the subnets should not share common failure sources like routers, switches, power feeds, and cooling.  Geographically diverse data centers help, but watch out for latency between them.  Terrestrial latency is roughly 10ms per 1000 miles, and high-latency paths between data centers may negatively affect applications whose protocols send only one packet per round trip time.
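The latency rule of thumb is easy to turn into a back-of-the-envelope calculation.  The sketch below assumes the 10ms per 1000 miles figure is one-way delay, so a round trip is twice that, and it models a protocol that sends exactly one packet per round trip.  The function names and the 1460-byte payload are illustrative choices, not anything from the session.

```python
# Back-of-the-envelope latency impact on one-packet-per-round-trip protocols.
# Assumption: the 10 ms per 1000 miles rule of thumb is ONE-WAY delay.

MS_PER_1000_MILES = 10.0


def round_trip_ms(miles: float, ms_per_1000_miles: float = MS_PER_1000_MILES) -> float:
    """Estimated round trip time between two sites, in milliseconds."""
    return 2.0 * (miles / 1000.0) * ms_per_1000_miles


def max_exchanges_per_second(rtt_ms: float) -> float:
    """Upper bound on request/response exchanges when each waits one RTT."""
    return 1000.0 / rtt_ms


def max_throughput_bps(rtt_ms: float, payload_bytes: int) -> float:
    """Upper bound on throughput (bits/s) at one payload per round trip."""
    return payload_bytes * 8 * max_exchanges_per_second(rtt_ms)


if __name__ == "__main__":
    payload = 1460  # hypothetical payload size in bytes
    for miles in (100, 1000, 3000):
        rtt = round_trip_ms(miles)
        rate = max_exchanges_per_second(rtt)
        kbps = max_throughput_bps(rtt, payload) / 1000.0
        print(f"{miles:>5} miles: RTT {rtt:5.1f} ms, "
              f"{rate:6.1f} exchanges/s, {kbps:8.1f} kbit/s")
```

Under these assumptions the link bandwidth never enters the calculation: a chatty protocol is limited by distance alone, which is why latency between data centers deserves attention early in the design.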

I highly recommend that you take the time to look up the recording for this session.  It was definitely one of the best I attended.

  -Terry

Comments

 

Terry's Blog said:

Scott Hogg of Global Technology Resources (GTRI) did a nice blog post for Network World way back in April

December 5, 2009 2:48 PM

About tslattery

Terry Slattery, CCIE #1026, is a senior network engineer with decades of experience in the internetworking industry. Prior to joining Chesapeake NetCraftsmen as a full-time consultant, Terry was the founder and CTO of Netcordia, and inventor of NetMRI, a suite of network management products. Terry started Netcordia as a consulting company in 2000 and transitioned it to a network management product company in 2003. During the consulting days, he used his network design and implementation skills to lead a team in the design and implementation of a high availability network at a brokerage clearing house. Terry is the former President and founder of Chesapeake Computer Consultants, Inc., a networking and computer systems training and consulting company. He co-invented and patented the vLab(tm) internet-based remote lab system. He is co-author of the McGraw Hill text Advanced IP Routing in Cisco Networks. Terry led the team that developed the current Cisco IOS user interface under contract to Cisco Systems. Terry is experienced in the design and installation of large TCP/IP based networks and is a successful network protocol instructor. He is the second Cisco Certified Internetwork Expert (CCIE #1026) and the first outside of Cisco. He enjoys membership on the Vanderbilt University Engineering School’s Industrial Advisory Board and the IEEE.
