By Peter J. Welcher, PhD, CCIE, Senior Consultant
Last issue we looked at how easy it is to configure Frame Relay. Now let's talk about design and scaling issues.
First, a disclaimer: there's more to be said on this subject than I can possibly cover in this article, so you should look at the Cisco Internetworking Design Guide on UniverCD. The Cisco Design course (CID), which I teach, also has sections on Frame Relay and ATM design.
Small Frame Relay networks of 10-20 sites are often built with a hub-and-spoke or star topology.
In such networks, Permanent Virtual Circuits (PVCs) are provisioned to a central site. This doesn't provide redundancy, but access lines are considered to be the component most likely to fail. And redundant access lines are costly, particularly if diversity is desired. Within the provider network, circuits are becoming fairly robust.
An alternative topology that does not seem to be getting much use is the double star, with PVCs to two hubs for redundancy or load balancing.
With small networks, the main design issue is what approach to use in configuration. I recommend using subinterfaces; otherwise, split horizon issues may become a problem. If you are using IP with the IGRP routing protocol, split horizon is disabled on Frame Relay interfaces in recent IOS releases, so IP routing with IGRP works correctly whether or not you use subinterfaces. Even so, creating the subinterfaces is good discipline: it leaves an obvious place to add other features and allows for per-PVC access lists, bandwidth statements, descriptions, etc. It also ensures that the IP addressing scheme won't have to change at some future date, for example, to match subinterfaces added in support of features such as bridging, IPX, or AppleTalk. So, use subinterfaces.
With larger Frame Relay networks, the major design constraint is bandwidth. First, you have to ensure access lines have enough bandwidth to handle data being sent from other sites. Using the star topology, if 10 sites have a CIR (Committed Information Rate) of 28Kbps, the central hub site may have 280Kbps arriving on its access line. But if the sites can burst to 56Kbps, the central site might burst to 560Kbps. Usually we assume bursts won't come from all remote sites at the same time and provide the central site with an intermediate capacity. Depending on the local rate structure, access lines may be either 56K or T1, with no point to anything in between. If so, we might go with a T1 from the Frame Relay service provider to the central site, figuring there will be traffic growth anyway. But this is not the only way bandwidth enters into the discussion.
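The hub sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the T1 figure is the nominal 1.544 Mbps line rate rather than anything taken from a tariff.

```python
# Nominal access line rates in kbps (assumption: only these two are on offer).
ACCESS_LINES_KBPS = {"56K": 56, "T1": 1544}


def hub_load_kbps(sites, per_site_kbps):
    """Worst-case load arriving at the hub if every site sends at the given rate."""
    return sites * per_site_kbps


def smallest_access_line(required_kbps, lines=ACCESS_LINES_KBPS):
    """Return the name of the slowest line that still carries required_kbps."""
    fitting = [(rate, name) for name, rate in lines.items() if rate >= required_kbps]
    return min(fitting)[1] if fitting else None


committed = hub_load_kbps(10, 28)   # all 10 sites sending at CIR
burst = hub_load_kbps(10, 56)       # all 10 sites bursting at once
print(committed, burst, smallest_access_line(committed))
```

With only 56K and T1 available, even the committed load of 280Kbps already forces the T1, which is the conclusion reached above.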
Frame Relay is not a broadcast medium. But a router participating in dynamic routing needs to send updates to its peers at the far ends of its PVCs. There are really only two ways to do this. One way is with switch multicast, where the router sends on a special DLCI and the switch simulates a broadcast. The other way is what's currently used, often called pseudo-broadcast: the router duplicates the broadcast and sends a copy directly to each intended recipient.
One effect is on the CPU. Duplicating the broadcast packet(s) takes some CPU time, since the router makes one copy per peer router. A 10-packet routing update going to 50 peer routers results in 500 outgoing packets. These packets might hit the outbound serial interface queue at one time. This can demonstrably tie up the queue (and interface) for a while, obstructing normal traffic.
Another effect is serialization delay: on a slow link, the packets take time to feed out the serial interface, bit by bit. Outgoing copies may also consume a fair amount of bandwidth. Various Cisco documents suggest the latter is an important sizing criterion: total bandwidth consumed by routing updates and other overhead activity should be less than 20% of the link bandwidth. Otherwise, resources and time are wasted outputting these overhead packets.
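To put rough numbers on both effects, here is a small Python sketch. The 500-byte packet size is purely an assumption for illustration, the 90-second interval is the IGRP default update timer, and the function names are invented for this example.

```python
def replicated_packets(update_packets, peers):
    """Packets queued when the router copies each update packet to every peer."""
    return update_packets * peers


def serialization_seconds(packets, bytes_per_packet, link_bps):
    """Time to clock the whole burst out of the serial interface."""
    return packets * bytes_per_packet * 8 / link_bps


def overhead_fraction(packets, bytes_per_packet, interval_s, link_bps):
    """Average share of the link used by one burst every interval_s seconds."""
    return (packets * bytes_per_packet * 8 / interval_s) / link_bps


burst = replicated_packets(10, 50)  # the 10-packet update to 50 peers from the text
PACKET_BYTES = 500                  # assumed average size, not a measured value
UPDATE_INTERVAL_S = 90              # IGRP default update interval

for name, bps in (("56K", 56_000), ("T1", 1_544_000)):
    delay = serialization_seconds(burst, PACKET_BYTES, bps)
    share = overhead_fraction(burst, PACKET_BYTES, UPDATE_INTERVAL_S, bps)
    verdict = "ok" if share < 0.20 else "over the 20% guideline"
    print(f"{name}: {burst} packets, {delay:.1f} s to send, {share:.1%} of link ({verdict})")
```

Under these assumptions the burst occupies a T1 for a little over a second, but it would occupy a 56K line for more than half a minute and blow well past the 20% guideline.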
Some estimation is thus in order when designing a Frame Relay network of any size. How chatty is the routing protocol? How often do updates go out? How big are they? Are there hellos? Keepalives?
One way to estimate overhead bandwidth is with a spreadsheet. (See Figure 2.) Perhaps the greediest consumers of bandwidth are Novell IPX SAP updates, which carry at most 7 services per packet.
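The same kind of estimate can be sketched in Python instead of a spreadsheet. This is not a reproduction of the figure. The service count and PVC count are made-up inputs, and the 480-byte packet size assumes a full SAP packet of 7 entries at 64 bytes each plus a 30-byte IPX header and 2-byte operation field, before any Frame Relay encapsulation. The 60-second interval is the standard SAP broadcast period.

```python
import math

SERVICES_PER_SAP_PACKET = 7
SAP_INTERVAL_S = 60          # standard periodic SAP broadcast interval
SAP_PACKET_BYTES = 480       # assumed size of a full 7-entry SAP packet


def sap_packets_per_update(services):
    """Packets needed to advertise every service once."""
    return math.ceil(services / SERVICES_PER_SAP_PACKET)


def sap_overhead_bps(services, pvcs):
    """Average bits per second of SAP traffic replicated onto every PVC."""
    packets = sap_packets_per_update(services) * pvcs
    return packets * SAP_PACKET_BYTES * 8 / SAP_INTERVAL_S


# Hypothetical network: 100 advertised services, 50 PVCs at the hub.
bps = sap_overhead_bps(services=100, pvcs=50)
print(sap_packets_per_update(100), bps, f"{bps / 1_544_000:.1%} of a T1")
```

With these inputs the hub sends 15 packets per PVC every minute, an average of about 48Kbps. That is small on a T1 but close to the entire capacity of a 56K line.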
Suppose you add up all the numbers and too much bandwidth is being consumed by overhead activities. What are the alternatives? Well, adding more bandwidth is the easy answer. Another choice is to use less chatty protocols: IGRP, EIGRP, or OSPF instead of RIP; NLSP or EIGRP for IPX instead of Novell RIP/SAP. Static routing is the ultimate in low bandwidth, but high in administrative hassle. Cisco's snapshot routing (fairly new) is a possible alternative. When reviewing protocol and chattiness, don't make assumptions. The topology may alter things.
A less obvious choice is fewer peers per router. This introduces more routers, perhaps on a common LAN backbone. Or use a hierarchical structure, with access routers tied via X.25 or Frame Relay to a regional hub router, and Frame Relay tying these back to backbone routers at HQ. There is additional latency from passing packets through multiple routers and across multiple serial/Frame Relay hops. But the resulting design is scalable and manageable. The Cisco UniverCD Design text discusses alternative topologies, as does the CID course.
Recently I saw a question on the Internet, asking whether a Cisco router could handle 500 or 1000 PVCs at the hub of a Frame Relay star. Interesting question! One reply was that the access line had better be a T3. If the remote sites have 56K access lines, T3 is in the right ballpark. However, some other questions come to mind: What is the overhead bandwidth? Assume 1 LAN outboard of each remote router. What if there are another 100 LAN segments somewhere? Static routing in the hub, default routes at the remote sites?
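A quick check of the "T3 ballpark" claim, as a Python sketch. It uses the nominal T3 rate of 44.736 Mbps, and the function name is made up for this example. It deliberately ignores routing overhead and CIR versus burst, which are exactly the open questions listed above.

```python
T3_KBPS = 44_736  # nominal T3 (DS3) line rate


def oversubscription(sites, access_kbps, hub_kbps=T3_KBPS):
    """Ratio of the summed remote access rates to the hub access rate."""
    return sites * access_kbps / hub_kbps


for sites in (500, 1000):
    ratio = oversubscription(sites, 56)
    print(f"{sites} sites at 56K: {sites * 56 / 1000:.0f} Mbps offered, {ratio:.2f} x T3")
```

At 500 sites the remote lines add up to less than a T3. At 1000 sites they exceed it, so the hub would depend on not every site sending at full rate at once.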
The CPU impact of pseudo-broadcast is a small amount per PVC. Multiply by roughly 1000 PVCs and you might have serious impact. Before attempting this, I'd want to talk to some Cisco engineers who'd tried something similar before. Maybe the CPU is OK but packet buffers or RAM are a concern?
But more important, is this good, modular, scalable design? It certainly has a single point of failure! If the remote sites go to 128K, how do you scale up the central site? How do you upgrade technology? How do you provide redundancy, another jumbo star? Clearly, a design needs to be analyzed in terms of how it will be used, both now and in the future.
Redundancy is a design concern for Frame Relay users. Dial backup, either over modems or ISDN, is a common solution. ISDN has the advantage of using a single 25xx BRI port at a remote site, and MBRI on a 4000 or PRI on a 7000. It also saves playing around with modems, which is nobody's idea of fun. Dial backup is now available on a per-PVC basis with subinterfaces (IOS release 10.3).
Using older releases or the NBMA model (non-broadcast multi-access) may require floating static routes, that is, static routes with an administrative distance higher than that of the dynamic routing protocol. Triggering ISDN dialing on a native BRI interface with an older IOS release may also require this. The idea behind a floating static route is that it remains inactive as long as dynamic routing provides a route. When a PVC goes down, the dynamic route eventually goes away, and then the floating static route kicks in, sending traffic out the backup interface. This traffic can then be used with dial-on-demand routing (DDR) to trigger modem or ISDN dialing. A neat trick!
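The selection logic behind a floating static route can be modeled in a few lines of Python. This is a toy model, not router code: the prefix, next-hop labels, and the static route's distance of 150 are hypothetical, chosen only so that the static route sits above IGRP's default administrative distance of 100.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    admin_distance: int
    source: str


def best_route(routes, prefix):
    """Pick the route with the lowest administrative distance for a prefix."""
    candidates = [r for r in routes if r.prefix == prefix]
    return min(candidates, key=lambda r: r.admin_distance) if candidates else None


PREFIX = "10.1.0.0/16"  # hypothetical remote LAN
table = [
    Route(PREFIX, "Frame Relay PVC", 100, "igrp"),                  # learned dynamically
    Route(PREFIX, "ISDN backup interface", 150, "floating static"), # hypothetical distance
]

normal = best_route(table, PREFIX)

# The PVC fails and the dynamic route eventually ages out of the table.
table_after_failure = [r for r in table if r.source != "igrp"]
backup = best_route(table_after_failure, PREFIX)

print(normal.next_hop, "->", backup.next_hop)
```

While the IGRP route is present it always wins, so no traffic is sent toward the backup interface and nothing dials. Once the route is gone, the static route becomes the best path, and traffic forwarded to it is what triggers DDR.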
A final part of design is planning for network management. The Cisco routers support the Frame Relay DTE MIB (RFC 1315). This allows for a good amount of per-PVC information to be collected. Do allow bandwidth to cover your polling!
Good design is the hardest part of Frame Relay, but will ensure that your networks work as intended. The above suggests some of the issues to consider.
Dr. Peter J. Welcher, CCIE #1773 and CCCI senior consultant, teaches the Cisco ICRC, ACRC, ICWC, and CID courses. He also consults on network design and management.