The Network Monitor

Hot Networks!

Cisco Hot Standby Routing Protocol Solves Problems Before They Start


Network administrators who design, build, or run fault-tolerant networks are often in the hot seat. How do we design the network to fail over to backup routers and paths as quickly as possible?

A fault-tolerant network will likely have two routers servicing each LAN segment (see Figure 1). If one router fails, we want the end systems to automatically switch to the alternate router. There are several mechanisms for implementing this functionality. Each has its own set of strengths and weaknesses. The mechanisms include:

Multiple Default Routers

Each end system has to support the configuration of multiple default routers and a method of switching between them when the failure of the primary router is detected. Few end systems support this mechanism (Unix typically does not), making this a poor general solution.

Wire-tap The Network Routing Protocol

The end systems must determine (or be configured for) the routing protocol in use on the attached LAN segment and participate in the protocol to the extent necessary to learn routes to external networks. Most typically RIP is used, because participation is limited to listening to the updates sent by all routers on the LAN. However, many sites now use more modern routing protocols (OSPF or EIGRP), which means RIP must also be configured on every router on each LAN segment just to let the end systems know that those routers exist. With the default RIP update timer of 30 seconds and three missed updates needed to detect a failed router, it takes 90 seconds to switch to the backup router. This approach also assumes that every workstation can run a RIP listening process, something that many PC-based TCP/IP implementations do not supply. Again, this is not a very general solution.
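The 90-second figure is simply the update interval multiplied by the number of missed updates. A minimal sketch of that arithmetic, in which the function name and parameters are illustrative and not part of any RIP implementation:

```python
def rip_failover_seconds(update_interval: int = 30, missed_updates: int = 3) -> int:
    """Seconds a RIP listener waits before giving up on a silent router.

    Defaults follow the figures quoted in the text: a 30-second update
    timer and three missed updates.
    """
    return update_interval * missed_updates


print(rip_failover_seconds())  # 90
```

A listener that waits for more missed updates before declaring a router dead takes proportionally longer to switch.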

Proxy-ARP

By default, Cisco routers will reply to IP ARP requests made for off-net destinations for which they have routes. For this mechanism to be effective, the end systems must have their IP subnet mask configured to treat all off-net destinations as if they were attached to the local network. For example, to reach all subnets of 128.56.0.0, one would have to use a subnet mask of 255.255.0.0. On some systems (typically Unix), a default route to the local interface will suffice to tell the system that all destinations should be treated as if they were on the attached LAN segment. The problem with this approach is that once the ARP request has been satisfied for a given destination, it is not repeated until the ARP cache entry is cleared (on Unix this is typically after 20 minutes), so the end system keeps sending traffic to a failed router until the entry expires.
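The effect of the wider subnet mask can be checked with a few lines of Python. This is a sketch only: the host address 128.56.1.10 and destination 128.56.200.7 are made-up examples, and is_on_link is a hypothetical helper that models the host's "is this destination local?" decision.

```python
import ipaddress


def is_on_link(host_ip: str, mask: str, destination: str) -> bool:
    """Return True if the host would ARP for the destination directly
    instead of forwarding the packet to a configured gateway."""
    local_net = ipaddress.ip_network(f"{host_ip}/{mask}", strict=False)
    return ipaddress.ip_address(destination) in local_net


# Hypothetical host on subnet 128.56.1.0 talking to a host on another subnet.
print(is_on_link("128.56.1.10", "255.255.255.0", "128.56.200.7"))  # False
print(is_on_link("128.56.1.10", "255.255.0.0", "128.56.200.7"))    # True
```

With the 255.255.0.0 mask the host sends an ARP request for the remote address itself, and a router with a route to that subnet can answer on its behalf.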

ICMP Router Discovery Protocol (IRDP)

IRDP (described in RFC 1256) is based on ICMP messages being multicast from all routers on a LAN segment. The end systems hear these multicasts and learn the presence of each router and its priority relative to the priorities of the other routers on the same segment. When an end system boots, a separate ICMP message is used to request that all routers announce themselves to the LAN. End systems which implement IRDP also conform to the Host Requirements standards, which require them to look for alternate routes if TCP connections become stalled. While this is an Internet Standard, it is only implemented in relatively new and featureful IP implementations. Check with your operating system or TCP/IP implementation vendor for support of this feature.
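The host side of this selection can be sketched roughly as follows. The router addresses and preference values are invented for illustration, pick_default_router is a hypothetical helper, and the sketch ignores advertisement lifetimes, which a real implementation would also track.

```python
from typing import Optional


def pick_default_router(advertised: dict[str, int], unreachable: set[str]) -> Optional[str]:
    """Choose the advertised router with the highest preference,
    skipping any router the host currently believes is down."""
    candidates = {addr: pref for addr, pref in advertised.items()
                  if addr not in unreachable}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Illustrative advertisements heard on the segment: address -> preference.
routers = {"131.108.1.1": 10, "131.108.1.2": 5}
print(pick_default_router(routers, set()))            # 131.108.1.1
print(pick_default_router(routers, {"131.108.1.1"}))  # 131.108.1.2
```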

Cisco's Hot Standby Routing Protocol (HSRP)

HSRP is a nifty protocol that shifts the function of selecting a backup router out of the end systems and into the network (well, into the routers). The idea is to create a "Phantom" router to service the high availability LAN segment.

In this example, both London1 and London2 are located at a London office and are configured to be part of a single HSRP group -- thus creating the "Phantom" router. The configuration of these two routers, with London1 as the primary and London2 as the standby router, is shown below:

London1 Configuration

hostname London1
interface ethernet 0
 ip address 131.108.1.1 255.255.255.0
 ! "Phantom" is at 131.108.1.3
 standby 2 ip 131.108.1.3
 ! Make this the primary router
 standby 2 priority 110
 ! Preempt the backup if we come back
 standby 2 preempt
 ! Reduce our priority if our serial goes down
 standby 2 track serial 0 110

London2 Configuration

hostname London2
interface ethernet 0
 ip address 131.108.1.2 255.255.255.0
 ! "Phantom" is at 131.108.1.3
 standby 2 ip 131.108.1.3
 ! Make this the standby router
 standby 2 priority 90
 ! Reduce our priority if our serial goes down
 standby 2 track serial 0 90

These configurations create the "Phantom" router with an IP address of 131.108.1.3 and a MAC address of 0000.0c07.ac02 (the MAC address is formed from 0000.0c07.ac**, where "**" is replaced with the HSRP group number in hexadecimal).
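The address derivation is easy to reproduce. The sketch below assumes the group number is written as two hexadecimal digits in the last byte; hsrp_virtual_mac is a hypothetical helper name, not a Cisco tool.

```python
def hsrp_virtual_mac(group: int) -> str:
    """Build the "Phantom" router's MAC address for an HSRP group.

    Assumes the group number occupies the final byte of 0000.0c07.acXX,
    written in hexadecimal.
    """
    if not 0 <= group <= 255:
        raise ValueError("group number must fit in one byte (0-255)")
    return f"0000.0c07.ac{group:02x}"


print(hsrp_virtual_mac(2))   # 0000.0c07.ac02
print(hsrp_virtual_mac(10))  # 0000.0c07.ac0a
```

Group 2, as used in the London configurations, gives the address quoted above.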

By default the two routers exchange HSRP hello messages every three seconds, and the standby router takes over the IP and MAC address of the "Phantom" if it hears no hello from the primary router for ten seconds; both timers can be shortened with the standby timers command. One note of caution: on access routers (4000 and 2500), the backup router's own interface MAC address changes to that of the "Phantom" if the primary router fails. This does not adversely affect most network protocols and provides the rapid failover that is desired.

Reviewing our example, London1 is configured as the primary router and will preempt London2 if London1 fails and is then returned to service. Each router will have its priority significantly decreased if its Serial 0 interface fails. This will allow the standby router to immediately assume the role of the "Phantom" router if the primary router's serial link fails.
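The priority arithmetic behind interface tracking can be sketched as below. It assumes the trailing number on each "standby 2 track serial 0" line is the amount subtracted from the router's priority while the serial interface is down, and that the result never drops below zero. The function name is illustrative, and the sketch compares priorities only; it does not model the HSRP election itself.

```python
def effective_priority(base: int, decrement: int, tracked_up: bool) -> int:
    """Priority a router advertises, given the state of its tracked interface.

    Assumption: the priority is reduced by `decrement` while the tracked
    interface is down and is never negative.
    """
    return base if tracked_up else max(0, base - decrement)


# Both serial links up: London1 (110) outranks London2 (90).
print(effective_priority(110, 110, tracked_up=True))   # 110
print(effective_priority(90, 90, tracked_up=True))     # 90

# London1's Serial 0 fails: its priority falls below London2's.
print(effective_priority(110, 110, tracked_up=False))  # 0
```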

HSRP is currently supported on Ethernet, FDDI, and Token Ring networks. Later IOS releases support multiple HSRP groups on the high-end routers (7000 series). More detailed information can be found on UniverCD and by searching for "HSRP" on CIO. The Hot Standby Routing Protocol is a very useful addition to the network administrator's toolbox for designing and building fault-tolerant networks.


Volume 1, Number 3 Table Of Contents