The Network Monitor, Volume 7 Number 1
In this Issue:
Knowing Your Network - The Business Impact of Network Problems
A review of how network problems affect business operations can aid in communications with business executives.
Industry Focus: Healthcare
Learn how the healthcare industry is using networks to improve their operations and reduce costs while keeping patient data secure.
Customer Spotlight: Northwest Community Hospital
Northwest Community Hospital is an example of a healthcare provider that is achieving important gains in productivity using NetMRI.
Network Analysis Tip #1: VLAN with No Active Ports
Network Analysis Tip #2: QoS Queue Drops
Knowing Your Network - The Business Impact of Network Problems
Network problems impact more than just the network and applications staff. They ultimately impact the business and its profitability. It is often difficult to tie network problems to the business, but that process is necessary to show corporate executives the importance of the network to the business. Network staff members who ignore the tie between the network and the business will invariably have problems getting adequate budget and staff.
The company that treats its network as a strategic asset and that understands how to use the network to provide more efficient business services will have a lower cost of running its business and higher profitability. Look to the top financial firms to see examples of how they use their networks to move ahead of their competition. In transportation, FedEx uses its IT infrastructure, based on a highly reliable network, to track millions of packages and get them to the right destinations on time. In both examples, the network is the foundation upon which the business processes operate.
The pyramid and network diagram shows the link between the business processes and the network. At the top are the business processes, such as entering a customer order. This process depends on the business applications for order entry, tracking the customer data (CRM), and processing the details of assembling the order and delivery to the customer (ERP). These applications are supported by the network protocols and infrastructure. Communication with business executives about this linkage helps them understand the benefit of allocating network budget funding and the benefit to the business (see graphic).
Let's examine the tie between network problems and their impact on the business as a way to communicate the importance of the network. Using this information, you can approach the corporate executives with facts on how improvements to the network and its management can yield improvements in business operations.
Configuration Changes
Industry analysts report that 40%-60% of network problems are due to improper configuration changes. That's a big number, even at the low end of its range. The correct set of people, processes, and tools can significantly reduce these figures.
Validating proposed and installed configurations as well as monitoring configuration change deployment has been demonstrated to significantly reduce configuration mistakes as a source of network problems. The Information Technology Infrastructure Library (ITIL) provides guidance in how to implement these processes, resulting in significant reductions in network problems due to configuration changes.
An item that is often overlooked is that operational data is required to adequately validate many parts of configurations. For example, spanning tree data is used to determine the set of switches in a given VLAN, because the VLAN ID is often reused in different parts of the network.
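The point about reused VLAN IDs can be illustrated with a minimal sketch. It assumes that each switch reports, per VLAN, the spanning tree root bridge it sees; switches that share both the VLAN ID and the root bridge are treated as members of the same VLAN. The switch names, bridge IDs, and the `vlan_domains` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical operational data: (switch, VLAN ID, spanning tree root bridge
# that the switch reports for that VLAN).
observations = [
    ("sw-a1", 10, "8192.0011.2233.4455"),
    ("sw-a2", 10, "8192.0011.2233.4455"),
    ("sw-b1", 10, "8192.00aa.bbcc.ddee"),
    ("sw-b2", 20, "8192.00aa.bbcc.ddee"),
]

def vlan_domains(records):
    """Group switches by (VLAN ID, root bridge); each group is one real VLAN."""
    domains = defaultdict(set)
    for switch, vlan_id, root in records:
        domains[(vlan_id, root)].add(switch)
    return dict(domains)

domains = vlan_domains(observations)
for (vlan_id, root), switches in sorted(domains.items()):
    print(f"VLAN {vlan_id} rooted at {root}: {sorted(switches)}")
```

In this made-up data, VLAN ID 10 appears twice with different root bridges, so it represents two separate VLANs that happen to share a number.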
Physical Characteristics
Early alerts from physical measurements are essential to get a head start on avoiding a network problem or reducing its impact. Temperature measurements, power supply loading, power supply failure, and fan operation fall into this category. The business impact is clear: a network device will shut down and impact any business process that depends upon it. Knowing the applications that are running on the network, the business processes that they support, and the network infrastructure over which they are running is key to understanding the business impact. Some of this information can come from the network, but a Configuration Management Database (CMDB) is needed to provide the additional information that is required to make informed decisions.
Routing
Basic routing technology doesn't seem cutting-edge these days. But when you add QoS, policy routing, MPLS, and the new virtualization services to the routing domain, you have a complex system that must be correct in many aspects in order to provide the desired level of service to business processes. An unstable routing protocol or a long delay path from a call center to a server can result in lower productivity, which translates into higher cost for the business. A call center screen that refreshes in under one second instead of five seconds will be more efficient and can handle higher call volumes and generate more business for the company.
QoS works in concert with routing on converged networks to prioritize multiple applications that are competing for the same bandwidth. When the utilization approaches the available bandwidth, and the router begins to queue packets, QoS becomes important. You need to know the most important business process and the applications it uses in order to create a configuration that prioritizes the important traffic over less important traffic. Once you have the right configuration, you need to monitor queue drops to make sure that QoS continues to operate correctly even when a change occurs in the business application mix.
Switching
Switching technology has its own unique set of potential problems, mostly originating with improper configuration settings. Spanning trees can be unstable due to under-powered and overloaded root bridges. Each transition of a spanning tree results in a surge of unicast flooding as the switches refresh their forwarding table cache. Depending on the architecture and configuration, links may go through the spanning tree states of blocking, listening, learning, and forwarding. During the reconvergence time, the link is not forwarding traffic, impacting any applications and the business processes that need to use that path.
A latent configuration error in one organization brought down a 900-server data center. The data center was a large switched network, and the switches used the default bridge priority setting of 32768. When all bridge priority values are the same, the spanning tree root bridge is selected by lowest MAC address. Someone needed another switch port and installed an old, small, low-powered switch to provide the additional port. This old switch had the lowest MAC address and became the root bridge of the spanning tree. In the major data center environment, with a lot of traffic flowing through the network, the old, low-powered switch couldn't handle the load and the data center spanning tree became unstable, switching between the original root bridge and the low-powered switch. The data center network was unusable until the offending switch was identified and removed. Having the proper configuration deployed, with a way to periodically validate it, would have prevented the loss of several hours of business services.
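The election rule in this story can be sketched in a few lines. This is a simplified model, not a spanning tree implementation: it compares bridges by priority first and MAC address second, which is how the lowest bridge ID wins. The switch names and MAC addresses are invented for illustration.

```python
def bridge_id(priority, mac):
    """Return a sortable bridge ID: priority first, then MAC as an integer."""
    return (priority, int(mac.replace(":", "").replace(".", ""), 16))

def elect_root(bridges):
    """Pick the bridge with the numerically lowest (priority, MAC) pair."""
    return min(bridges, key=lambda b: bridge_id(b["priority"], b["mac"]))

# Hypothetical switches, all left at the default priority of 32768.
bridges = [
    {"name": "core-1", "priority": 32768, "mac": "00:1b:54:aa:10:00"},
    {"name": "core-2", "priority": 32768, "mac": "00:1b:54:aa:20:00"},
    {"name": "old-closet-switch", "priority": 32768, "mac": "00:00:0c:12:34:56"},
]
print(elect_root(bridges)["name"])  # old-closet-switch

# Lowering the priority on the intended root overrides the MAC tie-break.
bridges[0]["priority"] = 4096
print(elect_root(bridges)["name"])  # core-1
```

With every priority at the default, the oldest hardware wins simply because it has the lowest MAC address. Setting an explicit lower priority on the intended root removes that dependency.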
In a similar vein, duplex mismatch is the most prevalent problem in switched networks and is exacerbated by conflicting configuration recommendations from various equipment vendors. When a duplex mismatch occurs on a link to a major server, it seems to work well when the demand is light. But as the use of the server, and the resulting traffic load, increases, link errors increase. NetMRI has reported error rates as high as 34% on a busy link with duplex mismatch. At this error rate, any network connections used by business applications will have very poor operating characteristics. The network staff will typically not see anything wrong unless they look at the specific link's error counters.
A major financial firm had a specific example of duplex mismatch occurring between a router and a switch, affecting a financial processing application. It took a week to convince the operational group that a duplex mismatch could have a serious negative impact on the applications. It was a basic configuration error that created an operational problem that exhibited symptoms of a performance problem. It was an expensive lesson to learn, and unfortunately, it happens too often.
Summary
There are many more network problems like those described above, many of them the result of incorrect configurations or incorrect deployment. The examples above demonstrate the value of incorporating both configuration data and operational data in a single analysis to identify network problems. The network staff needs to identify how network problems impact the business and communicate the impact to the executive staff with an emphasis on how to reduce the impact. The result, when properly applied, is increased profitability, and that is what is important.
Industry Focus: Healthcare
To the healthcare industry, networks and privacy regulations sit at opposite ends of the spectrum. Networks are used for key functions such as accessing patient records, test ordering, transporting test results, and the back-end financial and business functions. On the other end of the spectrum are laws such as the Health Insurance Portability and Accountability Act (HIPAA) that require protection of health records from unauthorized access and disclosure. Increasing use of wireless, connectivity to neighboring healthcare organizations, and connectivity to home offices of doctors are all adding to network complexity and increasing compliance and security concerns. It is a challenge to take advantage of networks while still protecting patient information.
The Network Environment and Challenges
Healthcare networks are much like other large enterprises, with some variations. Accepted best practices, such as the core, distribution, and access layer building blocks, are used in the design, implementation, and administration. Depending on the history and age of the network, it may be transitioning from a switch-based design to a routed design. The transition also has to incorporate financial factors, so some parts of the network may be using older equipment and scheduled for re-design with newer gear.
Wireless is a key technology. The emergency room of a hospital I visited recently used wireless computers on carts to handle admissions. Patient data was entered at bedside. Wireless telephones are popular because cellular phones often don't work deep in the building or are restricted. Some hospitals are now using voice-activated lapel phones that allow the wearer to call for assistance while both hands are busy handling an emergency. Wireless is also used at bedside to allow easy updates of patient records. Keeping patient data safe in a wireless environment requires consistent network configuration. Similar to other enterprises, non-technical people often have flawed perceptions of wireless. At one site, the facilities people mentioned that they liked wireless because "we don't have to install any wires," not understanding the need to run wires to access points.
Link speeds are constantly increasing. Radiology has a reputation for big images and the doctors like to be able to move through a series of images quickly, typically no slower than one every few seconds. While each image may only be a megabyte or two in size, there may be hundreds or thousands of images in a single series. Being able to play it back like a movie requires much more bandwidth than the size of one image would imply.
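A rough calculation shows why movie-style playback is so demanding. The sketch below assumes 2 MB images (the upper end of the size mentioned above, with 1 MB taken as one million bytes) and a few illustrative playback rates; the rates are assumptions, not measurements from any radiology system.

```python
def playback_mbps(image_megabytes, images_per_second):
    """Sustained bandwidth, in megabits per second, to stream an image series."""
    return image_megabytes * 8 * images_per_second

# Assumed values: 2 MB per image; the playback rates are illustrative only.
for rate in (0.5, 10, 30):
    print(f"{rate} images/s -> {playback_mbps(2, rate):.0f} Mbps")
```

Stepping through one image every two seconds needs only a few megabits per second, while smooth playback at tens of images per second moves into the hundreds of megabits per second for a single viewer.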
Redundancy is important. The cost of a doctor's time and the timeliness of the data for patient care mean that redundant data pathways are incorporated into the design. Technologies like EtherChannel, dual chassis, and backup system controllers are in widespread use. The challenge is to monitor and identify failures in redundant configurations before a combination of failures occurs.
Clusters of healthcare providers use metro-area and wide-area networks to share and update patient records. The expectation is that the access will be similar to what exists within the hospital. The combination of LAN, MAN, and WAN technologies increases the complexity of the network. The connectivity and application access to consistent patient data between healthcare groups translates into improved healthcare.
Of course, back office processing of bills, lab results, patient record archiving, and myriad other things are necessary to keep the hospital competitive and efficient. In this respect, it is much like most any other large enterprise.
Finally, QoS configuration is an absolute requirement. With large images and video traversing the network, queueing and transmission latency can have a big impact on the quality of time sensitive applications like VoIP. How many queues are needed to support the important applications? Who determines which applications are important? Layers 8 and 9 (political and financial) of the network protocol stack are especially important in resolving these questions.
Budget
Speaking of layers 8 and 9, there are continual pressures on the fiscal performance of healthcare organizations, especially as the cost of healthcare continues to rise rapidly. Network audits must accurately determine what equipment is in service as well as the utilization of that equipment. Audits are used to select the equipment that must be upgraded in the next phase of the technology refresh cycle. An "age of the fleet" report is essential to planning each year's equipment upgrades. Similarly, an End-of-Life report showing which devices a vendor has declared at end of life can identify where maintenance of equipment is no longer possible or where it costs more to maintain it than to replace.
Customer Spotlight: Northwest Community Hospital
A world-class medical provider, Northwest Community Hospital provides a full range of medically advanced inpatient and outpatient services. Located in the northwest suburbs of Chicago, Illinois, Northwest Community Hospital boasts over 900 physicians and 3,700 employees including nurses, allied health professionals, administrative and support personnel. The hospital cares for over 500,000 patients annually and is well-recognized for its advanced medical technology, caring culture, and clinical expertise.
Northwest Community Hospital's IT department is tasked with monitoring a very extensive network, including the main hospital, three external treatment centers, five medical office locations, an outpatient surgery center, home healthcare services, a wellness center, a fitness center, an imaging center and a youth center.
With such a widespread network, and with plans for major expansion within the next year, it was critical for Northwest Community Hospital's IT department to maximize network effectiveness and efficiency without hiring additional IT staff. An automated solution was needed that would take the manual labor out of administering the network.
"We needed a solution that could be used by our entire IT department, whether they are new to the industry or have years of management experience. Furthermore, we wanted a solution that would automate much of our daily management tasks," said Tyrone Mitchell, project manager at Northwest Community Hospital.
Northwest Community Hospital chose NetMRI because of its ability to simplify the management of the network. This single solution takes an integrated network approach with its ability to audit, analyze, and automate the day-to-day tasks of running their network.
NetMRI proactively detects performance, configuration and policy issues compared against corporate and built-in industry best practices. It automates data collection and performs deep network analysis, prioritizing potential network issues for the IT department. NetMRI automates the network management for the hospital, allowing the IT staff to focus on the pending expansion.
By providing the IT staff with a high-level scorecard and a list of real-time issues showing the network's overall performance, NetMRI enables them to quickly and easily view any issues before they become problems. When an issue is identified, the staff can drill down to in-depth detail and analysis in order to rectify it. Especially helpful is the automatic detection of incorrect configuration changes and configuration policy exceptions. Because the hospital's IT staff is quickly informed, they are more proactive, solving problems before the hospital's applications and processes are significantly affected.
Since implementing NetMRI, Northwest Community Hospital's IT staff has taken a more proactive approach to managing its network. Now, NetMRI acts as a true extension of the hospital's IT department, automating its network management processes and, as a result, significantly reducing its troubleshooting time. "NetMRI has taken the grunt work out of network management," said Mitchell. "It steers us in the right direction by pointing to specific issues in the network, and prevents harmful problems from occurring."
Network Analysis Tip #1: VLAN with No Active Ports
Why is this important?
A device configured with a VLAN that has no active ports may be the result of a misconfiguration. Another source might be the re-deployment of a switch with a previous configuration to a different part of the network, resulting in VLANs that appear where they shouldn't.
Manual determination:
Finding VLANs with no active ports is very easy using manual methods. The command in Cisco IOS:
If there are no ports in the VLAN on this switch, then the Ports list will be empty, such as VLAN 3, named Remote, above. A single switch with no ports in the spanning tree for a VLAN may be properly configured and waiting for a port to become active in that VLAN, either due to manual configuration or dynamic VLAN operation. In the case of manual configuration, any switch that has a VLAN with no ports may be configured to prune the VLAN on the trunk in order to minimize trunk utilization. (Remember, broadcasts are flooded to the entire VLAN, which includes switches that have no ports in the VLAN, because the originating switch doesn't know there are no ports in the VLAN on the destination switch.)
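The manual check can also be scripted against captured command output. The following is a minimal sketch that assumes text in the general layout of the Cisco IOS `show vlan brief` command. The sample below is illustrative and is not the output from the original article; it reuses VLAN 3, named Remote, from the description above. Column layout can vary by platform and software version, so treat the parsing as an assumption to verify against your own devices.

```python
# Illustrative sample in the style of "show vlan brief"; not real device output.
SAMPLE = """\
VLAN Name                             Status    Ports
---- -------------------------------- --------- -------------------------------
1    default                          active    Fa0/1, Fa0/2
2    Engineering                      active    Fa0/3
3    Remote                           active
1002 fddi-default                     act/unsup
"""

def empty_vlans(show_vlan_text):
    """Return (vlan_id, name) for active VLANs whose Ports column is empty."""
    result = []
    for line in show_vlan_text.splitlines():
        fields = line.split(None, 3)
        # Skip headers, separators, and wrapped port-list continuation lines.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        vlan_id, name, status = fields[:3]
        ports = fields[3] if len(fields) == 4 else ""
        if status == "active" and not ports.strip():
            result.append((int(vlan_id), name))
    return result

print(empty_vlans(SAMPLE))  # [(3, 'Remote')]
```

An empty result does not prove the configuration is correct, and a non-empty one does not prove it is wrong; as noted above, a VLAN may legitimately be waiting for a port to become active.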
Automatic determination:
The same data that is obtained via manual methods can be obtained with SNMP. A bit of analysis is used to determine that there are no ports in a defined VLAN on a switch. The screen shot to the right shows a switch with several empty VLANs. Further investigation showed that this switch was the only one in each of these VLANs. These VLANs existed on other switches, but this switch was isolated from them by a router. The configuration had been copied from the other switches without thought about which VLANs would actually be needed. It would also make sense to give the VLANs names that aided in future maintenance and troubleshooting.
Further reference:
Cisco Whitepaper: Best Practices for Catalyst 6500/6000 Series and Catalyst 4500/4000 Series Switches Running Cisco IOS Software at http://www.cisco.com/en/US/products/hw/switches/ps700/products_white_paper09186a00801b49a4.shtml.
Network Analysis Tip #2: QoS Queue Drops
Why is this important?
QoS is used in networks to prioritize important data over less important data when links become congested. You typically select time sensitive and business-critical applications to be given priority service over bulk data applications. There are many documents and best practice guides available on the net to help you create good configurations.
Why use QoS? Why not just use really high bandwidth links? Those high bandwidth links certainly help, but they don't eliminate queueing. Let's say that you have a design with access-layer switches with fifty 100Mbps ports and the uplink to the distribution switches is 1Gbps. Ten file transfers or large web page loads will fill the uplink because TCP tries to use as much bandwidth as it can. Add some voice calls to the mix and you have a situation where the call gets garbled for a few seconds due to high jitter. As the applications continue to move large amounts of data, the voice calls will experience intermittent jitter problems. WAN links with less bandwidth exacerbate this problem. QoS allows you to prioritize the small voice packets when the interface queue begins to fill.
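The arithmetic behind this example is worth making explicit. The sketch below uses the port count and link speeds from the paragraph above; the function names are illustrative, and the calculation assumes each flow can run at the full speed of its access port.

```python
def oversubscription(access_ports, port_mbps, uplink_mbps):
    """Ratio of total access bandwidth to uplink bandwidth."""
    return (access_ports * port_mbps) / uplink_mbps

def flows_to_fill(uplink_mbps, port_mbps):
    """Number of line-rate access flows needed to saturate the uplink."""
    return uplink_mbps // port_mbps

print(oversubscription(50, 100, 1000))  # 5.0
print(flows_to_fill(1000, 100))         # 10
```

The uplink is oversubscribed five to one, and only ten of the fifty ports need to run at line rate to fill it, which is why queueing still occurs on a gigabit uplink.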
Once you have QoS configured, you should monitor its performance so that you know that it is performing the way it should. As the relative importance of applications changes, you may need to revisit the QoS configuration and only by observing its operation will you be able to make an informed decision on how to modify it. You may also use the information to decide if you need to add bandwidth to a particular path.
Looking for drops in QoS queues lets you know how much traffic is being dropped and in which queues. Expect to see drops in the scavenger and bulk data queues.
If you see drops in the high priority queues, it may be because the application is now using more bandwidth than you initially configured. Let's say you designed QoS on a link to provide service to 5 concurrent voice calls and over time the number of concurrent calls grew to 10. You'll see drops in the voice queue increase as the number of calls increases. Reallocating bandwidth may be the correct answer, or you may need a higher bandwidth link due to other critical business applications running on the same path. Having the data on queue drops allows you to make an informed decision instead of guessing that more bandwidth is the solution.
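The voice example can be put into numbers. This sketch assumes G.711 calls with 20 ms packetization, which works out to about 80 kbps per call at the IP layer (160 bytes of payload plus 40 bytes of IP, UDP, and RTP headers, sent 50 times per second). Layer 2 overhead is ignored, and other codecs will give different figures.

```python
# Assumption: G.711, 20 ms packets, IP/UDP/RTP headers, no layer 2 overhead.
CALL_KBPS = 80

def voice_queue_load(provisioned_calls, active_calls, call_kbps=CALL_KBPS):
    """Return (provisioned, offered, excess) bandwidth in kbps for a voice queue."""
    provisioned = provisioned_calls * call_kbps
    offered = active_calls * call_kbps
    excess = max(0, offered - provisioned)
    return provisioned, offered, excess

for calls in (5, 8, 10):
    provisioned, offered, excess = voice_queue_load(5, calls)
    print(f"{calls} calls: offered {offered} kbps, "
          f"provisioned {provisioned} kbps, excess {excess} kbps")
```

Under these assumptions, a queue sized for 5 calls carries 400 kbps, and 10 concurrent calls offer twice that, so the excess shows up as drops in the voice queue.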
Manual Determination
Every vendor has different ways of examining the QoS queues. I'm going to use Cisco's commands here. If you don't have QoS configured, check to see if output drops are occurring:
Router# show interfaces serial 5/0
Serial5/0 is up, line protocol is up
...
Input queue: 0/75/0 (size/max/drops); Total output drops: 1036
This is an interface that could potentially use QoS since it is oversubscribed and is dropping packets. Then, with QoS configured, check the queues for drops (the important output is in bold):
ts1#show policy-map interface fa1/0
 FastEthernet1/0
  Service-policy output: example
    Class-map: class-default (match-any)
      115404 packets, 9726599 bytes
      5 minute offered rate 29000 bps, drop rate 2000 bps
      Match: any
      police:
        15000 bps, 20000 limit, 20000 extended limit
        conformed 4320 packets, 479920 bytes; action: transmit
        exceeded 472 packets, 72276 bytes; action: drop
        violated 3720 packets, 296768 bytes; action: drop
        conformed 14000 bps, exceed 2000 bps violate 8000 bps
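If you collect this output regularly, the policer counters can be pulled out with a small script. The sketch below is one possible approach using a regular expression over the counter lines from the example above. The output format differs between IOS versions and policy types, so the pattern is an assumption that should be checked against your own devices.

```python
import re

# Counter lines copied from the example output above.
OUTPUT = """\
Class-map: class-default (match-any)
  115404 packets, 9726599 bytes
  5 minute offered rate 29000 bps, drop rate 2000 bps
  police:
    conformed 4320 packets, 479920 bytes; action: transmit
    exceeded 472 packets, 72276 bytes; action: drop
    violated 3720 packets, 296768 bytes; action: drop
"""

ACTION_RE = re.compile(
    r"(conformed|exceeded|violated) (\d+) packets, (\d+) bytes; action: (\S+)"
)

def policer_counters(text):
    """Map each policer result to its packet count, byte count, and action."""
    return {
        m.group(1): {
            "packets": int(m.group(2)),
            "bytes": int(m.group(3)),
            "action": m.group(4),
        }
        for m in ACTION_RE.finditer(text)
    }

counters = policer_counters(OUTPUT)
dropped = sum(c["packets"] for c in counters.values() if c["action"] == "drop")
total = sum(c["packets"] for c in counters.values())
print(f"policer dropped {dropped} of {total} packets")
```

For this example the policer dropped 4192 of the 8512 packets it evaluated, which is the kind of figure worth tracking over time rather than reading once.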
Automatic Determination
For interfaces without QoS configured, NetMRI collects interface discard counts and reports them in the Interface Congested issue.
Drilling down to the details on this interface, we find a traffic burst that resulted in congestion.
But when QoS is implemented, there will be no discards, because the queueing mechanism is dropping the packets before they get to the interface. In this case, NetMRI monitors the Cisco CBQoS MIB to detect queues that are dropping packets, as shown below. This example is taken from a test configuration in our development lab.
Some Cisco devices have hardware queueing support, and if you don't define a queue, the hardware does CoS-based queueing. The hardware won't report queue drops, so the best thing to do is to implement a no-op policy and then the CBQoS MIB will be populated by the device.
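Whatever the source of the counters, detecting a queue that is dropping packets comes down to comparing successive samples. The following sketch shows that comparison in a generic form. It is not how NetMRI is implemented, it does not query SNMP, and the queue names and values are invented. It assumes 32-bit counters that wrap to zero; for 64-bit counters the modulus would be 2**64.

```python
COUNTER32_MODULUS = 2 ** 32  # assumption: 32-bit wrapping drop counters

def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Increase between two samples of a counter that wraps at `modulus`."""
    return (current - previous) % modulus

def dropping_queues(previous_sample, current_sample):
    """Return {queue: new drops} for queues whose drop counter increased."""
    increases = {}
    for queue, current in current_sample.items():
        if queue not in previous_sample:
            continue
        delta = counter_delta(previous_sample[queue], current)
        if delta > 0:
            increases[queue] = delta
    return increases

# Hypothetical samples taken one polling interval apart.
earlier = {"voice": 0, "bulk-data": 1200, "scavenger": 4294967290}
later = {"voice": 0, "bulk-data": 1350, "scavenger": 4}
print(dropping_queues(earlier, later))  # {'bulk-data': 150, 'scavenger': 10}
```

Here the scavenger counter wrapped between polls, and the modulo arithmetic still reports the correct increase of 10 drops.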
Further Reference
Cisco QoS Output Scheduling on Catalyst 6500/6000 Series Switches Running CatOS System Software tells how packet drops are done in Cisco 6500 CatOS: http://www.cisco.com/en/US/products/hw/switches/ps700/products_tech_note09186a00801091a5.shtml
Cisco Understanding Packet Counters in show policy-map interface Output shows drop counters for each policy: http://www.cisco.com/en/US/tech/tk543/tk760/technologies_tech_note09186a0080108e2d.shtml
Copyright © 2008 Netcordia, Inc. All rights reserved.
