Connections +
Feature

A Troubleshooting Manifesto

The arrival of the switched environment has permanently altered how network performance errors are found and fixed. To tame this complex beast, frontline technicians need to take advantage of the new tools and techniques now available.


January 1, 2005  



In the days when ‘dial-up’ ruled, when music was played on cassettes, and when 48 Hours was both the name of a popular movie and the amount of time it took to download a 1 MB file, the network environment was a relatively simple one.

There were hubs, bridges and routers, each being a discrete box readily identifiable from the others.

Troubleshooting the network was also simple. If you were attached to a hub, then the rules for troubleshooting a collision domain applied. At the point where the hub attached to a bridge, all errors stopped.

Troubleshooting using a protocol analyzer was the best available option and it was very effective once the user knew the basics of the network and the protocols in use. Then switches appeared on the scene, and all of the rules changed.

As network managers continued to upgrade their infrastructure to a ‘switched-to-the-desktop’ environment, they began to realize the benefits of a fully switched architecture — segmenting traffic and preventing the propagation of Ethernet errors within their network.

However, a switched environment hides lower-layer problems that affect individual link performance, often leaving the frontline technician having to blindly guess about the status of the connection.

Unlike those ‘old’ days when networks would routinely break and fail, today’s networks break, recover and eventually slow down. According to Tony Fortunato, senior network specialist and founder of The Technology Firm in Georgetown, Ont., “today’s network problems are complex and can’t be solved using tools and methodologies that were designed for yesterday’s network problems.”

It is a new paradigm and network technicians must evolve and utilize the new tools and techniques that allow them to see into the network, beyond the physical layer — to prevent performance problems before they occur.

Seeing inside the Switch: In a switched network, the troubleshooting challenge originates from a basic inability to see inside a switch.

Installing a switch creates a separate collision domain on each port. If shared media hubs are attached to the port, then the collision domain may grow to the maximum size allowed for that Ethernet implementation and port.

Most new networks have a single station per port, so in the case of switched connections, the collision domain is only a single cable link.

The problem really begins with the OSI Layer 2 bridging performed by a switch, and is exacerbated by enabling VLANs and other OSI Layer 3 (and higher) features and forwarding rules.

Advanced switching features such as OSI Layer 4 and higher forwarding and load balancing require a strong knowledge of the switch configuration options to troubleshoot.

A five-step guide

Techniques for Troubleshooting a Switch: When hubs rather than switches were common, a monitoring tool could see all of the traffic on its segment. Today it is nearly impossible to see all of the traffic flowing through a switch.

Most troubleshooting assumes the traffic will pass between the station and an attached server or through the uplink.

If two stations were passing information directly between themselves, the traffic would not pass through the uplink or to any other port on the switch. And, unless they knew to look for it, technicians would not be able to detect this.

The following five techniques provide technicians with visibility into a switch, a critical first step in troubleshooting.

In determining the best approach, technicians must weigh the tools available to them against the service interruption each technique may cause, along with the resulting network downtime and lost business productivity:

Access the switch console via TELNET or the serial port: The switch configuration can be reviewed by logging in to its command line interface, either through a TELNET session or by attaching to the serial port of the switch. While this configuration data won’t easily reveal a misbehaving switch, it helps guide troubleshooting efforts by showing whether the switch is operating as expected and within the manufacturer’s specifications.

Connect to an unused port: The simplest approach to troubleshooting a switch is to gain access to the broadcast domain by attaching a monitoring tool to an unused port on the switch.

But the switch will only forward a very small amount of traffic to the monitored port — appropriate behavior on the part of a bridging device because it is designed to prevent unnecessary traffic from reaching ports where it does not belong.

Since this limited view of the network is only effective for broadcast traffic, it is necessary to have traffic from a suspected port (or ports) with errors copied to the monitoring port. This technique is usually referred to as port aliasing, port mirroring, conversational steering, or port spanning.
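The forwarding behaviour described above can be illustrated with a toy model. The sketch below is a simplified learning switch written for this article's scenario, not any vendor's implementation; the `LearningSwitch` class, its `mirror` table and the MAC addresses are all hypothetical names chosen for illustration. It shows why an analyzer on an unused port sees broadcasts but not station-to-station traffic, and how mirroring changes that.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class LearningSwitch:
    """Toy model of Layer 2 forwarding (illustrative only)."""
    ports: list
    mac_table: dict = field(default_factory=dict)  # MAC address -> port
    mirror: dict = field(default_factory=dict)     # watched port -> monitor port

    def forward(self, in_port, src, dst):
        """Return the sorted list of ports that receive a copy of the frame."""
        self.mac_table[src] = in_port              # learn where the sender lives
        if dst != BROADCAST and dst in self.mac_table:
            out = {self.mac_table[dst]}            # known unicast: one port only
        else:
            out = set(self.ports)                  # broadcast or unknown: flood
        out.discard(in_port)                       # never echo out the ingress port
        # Mirroring copies frames entering or leaving a watched port.
        for watched, monitor in self.mirror.items():
            if watched == in_port or watched in out:
                out.add(monitor)
        return sorted(out)


sw = LearningSwitch(ports=[1, 2, 3, 4])            # analyzer sits on unused port 4
A, B = "00:00:00:00:00:0a", "00:00:00:00:00:0b"

print(sw.forward(1, A, BROADCAST))  # [2, 3, 4]  broadcast reaches the analyzer
print(sw.forward(2, B, A))          # [1]        unicast reply does not
print(sw.forward(1, A, B))          # [2]        nor does the rest of the conversation

sw.mirror[1] = 4                    # copy port 1 traffic to the analyzer
print(sw.forward(1, A, B))          # [2, 4]
print(sw.forward(2, B, A))          # [1, 4]
```

Real switches also age out table entries and handle multicast and VLANs, all of which this sketch leaves out.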

Insert a hub into the link: In many networks, most traffic will be received or transmitted by a shared resource such as a file server. Adding a shared-media hub between the switch port and the file server allows an analyzer to be connected to the same collision domain as the file server.

This technique enables the analyzer to monitor all traffic and errors to and from the file server, which assists the network technician in diagnosing a wide range of problems, including user login failures, poor performance, and dropped connections.

There are two major drawbacks to this method. First, the server link cannot be a full duplex connection, because a shared-media hub operates only at half duplex and the resulting duplex mismatch will introduce more errors than it is likely to reveal.

Second, an actual shared-media hub is necessary, and many of the newer “hubs” are actually bridging devices masquerading as hubs.

Use a tap or splitter: This is similar to adding a shared-media hub, except that the tap is intended for full duplex links and does not allow the monitoring tool to transmit. This technique can be used for both fibre and copper links using the appropriate tap.

Tapping the line is an excellent way to see what is passing through a link and is commonly used with protocol analyzers. Once installed, the tap is invisible to the attached devices and may be utilized at any time without further disruption. However, the link must be broken to insert the tap.

Furthermore, the transmit path will be offered on one connection and the receive path on another.

To monitor both a request and its response passing through the tapped link, it is necessary to have a monitoring tool with two input ports, so that both directions can be captured simultaneously.
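Because a tap delivers each direction on its own connection, the two captures have to be recombined before a request can be read alongside its response. The following is a minimal sketch of that step, assuming both inputs are timestamped from a shared clock and each capture is already in time order. The `Frame` record and `merge_tap_streams` function are hypothetical names used for illustration.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Frame(NamedTuple):
    timestamp: float  # seconds, from a clock shared by both inputs (assumed)
    direction: str    # "tx" or "rx", relative to the tapped device
    summary: str


def merge_tap_streams(tx: Iterable[Frame], rx: Iterable[Frame]) -> Iterator[Frame]:
    """Interleave two time-ordered captures into one conversation."""
    return heapq.merge(tx, rx, key=lambda frame: frame.timestamp)


tx_capture = [Frame(0.001, "tx", "SYN"), Frame(0.003, "tx", "ACK")]
rx_capture = [Frame(0.002, "rx", "SYN-ACK")]

for frame in merge_tap_streams(tx_capture, rx_capture):
    print(f"{frame.timestamp:.3f} {frame.direction} {frame.summary}")
# 0.001 tx SYN
# 0.002 rx SYN-ACK
# 0.003 tx ACK
```

If the two inputs do not share a clock, the merged order cannot be trusted, which is one reason a single tool with two input ports is preferable to two separate tools.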

Query the switch using SNMP: The most effective and least intrusive method of troubleshooting a switched network is to ask the switch itself how the network is behaving.

This is done with Simple Network Management Protocol (SNMP) or by connecting to the console port of the switch. The SNMP console does not have to be anywhere near the monitored device as long as there is a routed path to the target, and security configurations permit the console to communicate with the agent in the switch.
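One common way to use SNMP data for this purpose is to poll each port's error counters twice and flag the ports whose counts have grown. The sketch below assumes the counters have already been retrieved by some polling layer, which is not shown. The function names and the sample values are hypothetical; the two OIDs are the standard IF-MIB `ifInErrors` and `ifOutErrors` columns.

```python
# Standard IF-MIB column OIDs for per-interface error counters.
IF_IN_ERRORS = "1.3.6.1.2.1.2.2.1.14"
IF_OUT_ERRORS = "1.3.6.1.2.1.2.2.1.20"
COUNTER32_MOD = 2 ** 32


def counter_delta(old: int, new: int) -> int:
    """Difference between two Counter32 samples, allowing for one wrap."""
    return (new - old) % COUNTER32_MOD


def ports_with_new_errors(before: dict, after: dict, threshold: int = 1) -> dict:
    """Return {ifIndex: error_increase} for ports whose error counts grew.

    `before` and `after` map ifIndex -> (ifInErrors, ifOutErrors), in the
    shape a hypothetical polling layer might return them.
    """
    flagged = {}
    for if_index, (in_new, out_new) in after.items():
        if if_index not in before:
            continue  # port appeared between polls; no baseline to compare
        in_old, out_old = before[if_index]
        growth = counter_delta(in_old, in_new) + counter_delta(out_old, out_new)
        if growth >= threshold:
            flagged[if_index] = growth
    return flagged


# Illustrative sample values, not measurements from a real switch.
first_poll = {1: (0, 0), 2: (120, 4), 3: (4294967290, 0)}
second_poll = {1: (0, 0), 2: (180, 4), 3: (5, 0)}

print(ports_with_new_errors(first_poll, second_poll))  # {2: 60, 3: 11}
```

Comparing two samples rather than reading a single value matters because the counters are cumulative: a large total may reflect an old, resolved problem, while a growing one points to a port that is failing now.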

Because switches do not routinely forward errors, using SNMP is perhaps the best method of locating ports that are experiencing errors. Finally, the difficulties of troubleshooting switched environments can be overcome with a tool such as one from Fluke Networks, which provides discovery and diagnostics for quick problem solving.

The handheld analyzer eliminates navigation obstacles when connecting into the network to diagnose user issues and automatically determines the nearest switch, interface and VLAN for each discovered device.

Unfortunately, today’s typical troubleshooting scenario starts with user complaints that the application is slow.

Once this occurs, the impact on productivity is likely already being felt by the organization and is adding to overall costs.

By adopting a more proactive approach to managing switched network environments, one that focuses on preventing problems before they occur, organizations can realize significant time and productivity improvements.

Brad Masterson is Canadian Product Manager for Fluke Networks and a member of CNS Magazine’s editorial advisory board. He has been involved in the field of networking and network testing since 1995.