Applied Infrastructure

Syslog, SNMP Traps, and UDP Packet Loss

I was recently checking out a product that does syslog correlation and noticed that it had not reported a couple of events that I could see in syslog-ng's log. I use syslog-ng because it is free, easy to install and configure, performs filtering, and forwards to other destinations. I normally have it configured to log everything to the local filesystem and to filter and forward specific events to other systems. It provides a good de-coupling mechanism between the network devices that are sending syslog messages and the systems that must process syslog. For example, NetMRI needs to receive Cisco CONFIG_I events indicating that a configuration change has been made.

The product that I was configuring was running on a separate server. It needed to receive syslog events, and its display wasn't showing me all the events that syslog-ng was recording. At first I blamed the product, but I then decided to replace it with another copy of syslog-ng to simplify the test. The test setup was syslog-ng running on Server A, a Red Hat EL5 server, receiving syslog events from all the network equipment. Server B, a CentOS 5.3 server, was configured with a second copy of syslog-ng, also logging to the filesystem. Server A was forwarding all Cisco syslog events to Server B. The rate of syslog events was on the order of 10 packets per second during peaks. Each packet was small, because Cisco syslog messages tend to be short. I was very surprised to find that a measurable percentage of the syslog messages were being dropped on Server B, even with syslog-ng. So the problem wasn't in the product that I was trying to install.

The next step was to verify that the UDP packets were making it from Server A to Server B. I ran tcpdump on both systems and verified that Server A was sending the forwarded packets and that Server B was receiving them. But syslog-ng was still not logging all the events. Comparing Server B's syslog-ng log with the tcpdump capture, I could see that the packets were arriving at the system but were not being delivered to syslog-ng.
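The comparison between what tcpdump saw and what syslog-ng logged can be sketched in a few lines of Python. This is only an illustration, not the procedure I used by hand: it assumes the message text has already been extracted from the capture and from the log into two lists, and the function name is hypothetical.

```python
from collections import Counter


def missing_messages(seen_on_wire, seen_in_log):
    """Return messages captured on the wire that never reached the log.

    Both arguments are lists of message strings. A Counter is used so that
    a message sent three times but logged once is reported as missing twice.
    """
    missing = Counter(seen_on_wire) - Counter(seen_in_log)
    return sorted(missing.elements())
```

If the result is non-empty while the capture was taken on the receiving host itself, the loss is happening between the network interface and the application, which is what I was seeing.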

There are a number of web sites that discuss UDP packet loss. A good one is 29West.com's UDP Buffer Sizing page, which includes commands for reporting the number of dropped UDP packets on several operating systems. On my system, netstat showed a lot of UDP packet receive errors:

$ netstat -us

Udp:
    29582255 packets received
    6898 packets to unknown port received.
    15597 packet receive errors
    29934317 packets sent

That definitely looked like the problem. So I worked through a number of recommendations for adjusting the UDP packet buffers. Some of the recommendations consume a lot of buffer space, as described in the 29West.com article above. I still had packet drops. I then switched Server B to a Red Hat release and the packet errors dropped significantly. It turns out that the CentOS 5.3 release drops many UDP packets, even at relatively low packet rates.
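For readers who want to see what a buffer adjustment looks like from the application side, here is a minimal Python sketch that requests a larger receive buffer on a UDP socket and reports what the kernel actually granted. It is not how syslog-ng is configured; the function name and the sizes are placeholders chosen for illustration. On Linux the request is capped by the net.core.rmem_max sysctl, and the value read back is double the amount set, because the kernel reserves space for its own bookkeeping.

```python
import socket


def open_udp_receiver(port, rcvbuf_bytes, host="0.0.0.0"):
    """Open a UDP socket and request a receive buffer of rcvbuf_bytes.

    Returns the socket and the buffer size the kernel reports afterwards,
    which may differ from the request (capped by the system-wide maximum).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for the buffer before binding so it is in place for the first packet.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.bind((host, port))
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted
```

Comparing the granted size with the requested size is a quick way to tell whether the system-wide limit needs to be raised before a per-socket setting can have any effect.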

I would have expected any modern Linux kernel to be able to handle a load of hundreds of UDP packets per second on a 1-core server where there is no other competing process. But for some reason CentOS has a problem handling even modest UDP packet loads. Switching to Red Hat EL5 eliminated most (but not all) of the packet loss.

This brings me to another point that I often find myself making to network management vendors: syslog and traps are inherently unreliable because their transport protocol, UDP, does not guarantee delivery. My recommendation to vendors: don't write your network management application as if UDP were a reliable protocol. Use multiple mechanisms or multiple requests to get the data that's needed to create informative answers to common questions. My recommendation to users: verify that the syslog and trap receivers are not dropping packets.
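One way to do that verification on a Linux receiver is to watch the kernel's UDP counters over time. On Linux, netstat -us takes its numbers from /proc/net/snmp, where "packet receive errors" corresponds to the InErrors field. The Python sketch below is a starting point under that assumption, not a finished monitoring tool; the function names are hypothetical, and the set of fields varies by kernel (some kernels also report RcvbufErrors).

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict.

    The file holds a header line and a value line for each protocol,
    both starting with the protocol name, e.g. "Udp:".
    """
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value pair found")
    names, values = udp_lines[0][1:], udp_lines[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def counter_delta(before, after):
    """Per-counter growth between two samples taken some time apart."""
    return {name: after[name] - before[name]
            for name in after if name in before}


def read_udp_counters(path="/proc/net/snmp"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as handle:
        return parse_udp_counters(handle.read())
```

Call read_udp_counters() twice, a few minutes apart, and pass the two results to counter_delta(). If InErrors grows while syslog or trap traffic is arriving, the receiver is dropping packets.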

  -Terry

About tslattery

Terry Slattery, CCIE #1026, is a senior network engineer with decades of experience in the internetworking industry. Prior to joining Chesapeake NetCraftsmen as a full-time consultant, Terry was the founder and CTO of Netcordia, and inventor of NetMRI, a suite of network management products. Terry started Netcordia as a consulting company in 2000 and transitioned to a network management product company in 2003. During the consulting days, he used his network design and implementation skills to lead a team in the design and implementation of a high-availability network at a brokerage clearing house. Terry is the former President and founder of Chesapeake Computer Consultants, Inc., a networking and computer systems training and consulting company. He co-invented and patented the vLab(tm) internet-based remote lab system. He is co-author of the McGraw Hill text Advanced IP Routing in Cisco Networks. Terry led the team that developed the current Cisco IOS user interface under contract to Cisco Systems. Terry is experienced in the design and installation of large TCP/IP-based networks and is a successful network protocol instructor. He is the second Cisco Certified Internetwork Expert (CCIE #1026) and the first outside of Cisco. He enjoys membership on the Vanderbilt University Engineering School’s Industrial Advisory Board and the IEEE.
