Connections +

Troubleshooting 101

Given the ever-increasing complexity of switched environments, the list of potential problems continues to grow.

June 1, 2003  


We’ve come a long way since the early days of networking, when hubs, bridges and routers were discrete boxes with their own look and functions, and troubleshooting could be performed using a single diagnostic tool.

The parameters were simple: For anything attached to a hub, the rules for troubleshooting a collision domain applied. At the point where the collision domain attached to a bridge, the errors stopped. At the time, a protocol analyzer was the best available option for troubleshooting, and, once the user knew the basics of the network and the protocols in use, it was very effective.

All that changed when switches appeared on the scene. The scope of troubleshooting requirements expanded and became more complicated, which made monitoring and troubleshooting with traditional protocol analyzers alone much less effective.

Along with the evolution of switched networks came a new generation of network analyzers that can be combined with traditional protocol analyzers to deliver unprecedented views into the network and provide substantially faster solutions to networking problems.

These include a diverse range of active discovery tools with SNMP device analysis, RMON2 traffic analyzers, and a hybrid of network and cable analyzers. Each serves a specific function depending on troubleshooting needs.

While there is much to consider when selecting and using analyzer equipment, there are four basic approaches that any technician can apply when troubleshooting a switched environment, regardless of the tool used.

These are:

Use SNMP to query the switch: In most cases this is done from a network management platform;

Install a Tap or Splitter: This is usually done on an uplink. This allows for monitoring the output from the Tap or Splitter with a protocol analyzer or other diagnostic tool;

Configure a Mirror or Span port to receive a copy of the traffic for analysis: The mirror can be configured to monitor the activity on one or more ports simultaneously, and

Install a shared media hub between the problem device and the switch and then monitor the collision domain with a protocol analyzer or other diagnostic tool.

Unfortunately, given today’s complex switched network designs, none of these approaches can cover all troubleshooting requirements on its own.

In most cases, several of them will need to be employed in a typical troubleshooting scenario. Following is an outline of the challenges associated with each approach and some of the considerations to keep in mind.

SNMP: Security can be a challenge in this area. Security enabled at the switch or anywhere along the path may make it impossible to talk to the switch in order to obtain statistics.

While most useful information may be obtained from standard MIBs (versus private MIBs), not all switches support standard MIBs. (MIB or Management Information Base is a database that holds network information collected by that particular device.) Also, not all standard MIBs are well implemented by all switches.

Private MIBs, on the other hand, do provide a view into new features and functionality that the standard MIBs don’t, such as rate limiting. Unfortunately, private MIB support is not always simple.

The analysis tool requires knowledge of what is potentially captured and where it is stored in the private MIB. Without this understanding it cannot properly display the data or interpret the results.

Keeping the tools updated with the most recent private MIB info can become a significant task in larger management systems and may not even be possible in others.
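As an illustration of what standard MIB data can yield, the sketch below turns two readings of an interface octet counter (such as ifInOctets from the standard interfaces MIB, a 32-bit counter that wraps) into a utilization figure. It assumes the readings have already been collected by whatever SNMP tool is in use; the function names and sample values are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit SNMP counters wrap back to zero


def octet_delta(earlier: int, later: int) -> int:
    """Octets counted between two readings, allowing for a single counter wrap."""
    return (later - earlier) % COUNTER32_MODULUS


def utilization_percent(earlier: int, later: int,
                        interval_s: float, link_speed_bps: int) -> float:
    """Average utilization of one direction of a link over the polling interval."""
    bits = octet_delta(earlier, later) * 8
    return 100.0 * bits / (interval_s * link_speed_bps)


# Hypothetical readings taken 60 seconds apart on a 100 Mb/s port.
print(utilization_percent(1_000_000, 76_000_000, 60, 100_000_000))  # 10.0
```

On fast links a 32-bit counter can wrap more than once between polls, in which case this calculation under-reports; shorter polling intervals reduce that risk.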

Tap or splitter: Taps provide a view into both half and full duplex links. However, monitoring a full duplex link requires an expensive two-port analyzer — unless one is willing to see only one side of a conversation at a time.
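To see both sides of a full duplex conversation, the two directions captured from the Tap have to be interleaved in time order. The sketch below shows one way to do that, assuming each direction has already been captured as a time-sorted list and that both capture ports share a common clock. The Frame type and its fields are hypothetical and not tied to any particular analyzer.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Frame(NamedTuple):
    timestamp: float  # seconds, from a clock shared by both capture ports
    direction: str    # which side of the link the frame came from
    summary: str      # short description of the frame


def merge_directions(a_to_b: Iterable[Frame],
                     b_to_a: Iterable[Frame]) -> Iterator[Frame]:
    """Interleave two time-sorted captures into one conversation view."""
    return heapq.merge(a_to_b, b_to_a, key=lambda frame: frame.timestamp)


a_to_b = [Frame(0.001, "A->B", "SYN"), Frame(0.003, "A->B", "ACK")]
b_to_a = [Frame(0.002, "B->A", "SYN-ACK")]
for frame in merge_directions(a_to_b, b_to_a):
    print(frame.timestamp, frame.direction, frame.summary)
```

If the two ports are not synchronized, the merged order can be misleading, which is one reason purpose-built two-port analyzers are used for this job.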

Not all traffic on a switch passes through the uplink (which is usually where the tap is installed). In addition, installation of a Tap requires that the link be disconnected for a short period of time, which further impacts performance on the network.

When installing a Tap or splitter to analyze the link traffic, it is also important to make sure the Tap does not disconnect the link under test if power is lost to the Tap. If this fault tolerance is not built into the Tap, the Tap becomes another potential point of network failure.

Shared media hub: As with the Tap, installing a hub only gives a view of the traffic passing through a single port — not the whole switch. Also, shared media instantly means half duplex. So placing a hub on a full duplex link is likely to result in worse performance than the problem one is already troubleshooting.

Not all hubs are OSI Layer 1 repeaters. As a result, a troubleshooter may not see what they are expecting, especially if the “hub” is really a small and cheap switch itself.

Another thing to consider is that installing the hub means adding two additional points of failure: The hub and another cable. However, once a shared media hub is installed, almost any monitoring tool can be used to troubleshoot the problem, including protocol analyzers.

Mirror or span: In this approach, only traffic from the ports being mirrored will appear on the configured mirror port. Many switches do not permit traffic to be transmitted into the output mirror port, resulting in a listen-only situation.

Operating a mirror will usually reduce the performance of the switch by some amount, which can add to the troubleshooting challenge.

Even if the problem has been identified correctly, switches do not generally forward errors. This means that the forwarding technique of the switch might prevent the troubleshooter from actually seeing the errors that are causing the problem.

While there are both positive and negative aspects to these four switch troubleshooting techniques, there are no simple “one-stop” alternatives.

The upside is that the same issues that complicate the troubleshooting process also help to keep the rest of the network running, even when one or more users are experiencing problems. The downside is that technology keeps changing and adding fresh challenges to the mix.

There are some new developments in switching technology that are making the troubleshooting process even more complex (but not necessarily insurmountable).

Here are some examples:

Load balancing: Some switches are designed to perform load balancing. This has an impact on troubleshooting because although it may be possible to know where the traffic entered the switch, it is difficult to predict where it should come out. This can be difficult to troubleshoot unless one can check the switch configuration.
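One reason the exit point is hard to predict is that the choice is often computed from the frame's addresses. The sketch below illustrates the general idea with a simple XOR of source and destination MAC addresses. This is an assumed scheme for illustration only; actual algorithms are vendor-specific and configurable, and the function names here are hypothetical.

```python
def mac_to_int(mac: str) -> int:
    """Convert a MAC address such as '00:11:22:33:44:55' to an integer."""
    return int(mac.replace(":", "").replace("-", ""), 16)


def pick_link(src_mac: str, dst_mac: str, link_count: int) -> int:
    """Choose an outgoing link index from the XOR of the two addresses."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % link_count


# Hypothetical stations sharing four parallel links.
print(pick_link("00:11:22:33:44:55", "00:11:22:33:44:58", 4))  # 1
```

With a scheme like this, every conversation between the same pair of stations uses the same link, so a tool monitoring a different link will never see that traffic.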

VLANs: A simple VLAN configuration assigns a set of ports to be a broadcast domain. To pass traffic between two VLANs on the same switch usually requires a trip to a router, another blade in the switch or an entirely separate device. More complex configurations can assign the VLAN dynamically, based on port, address, or other criteria. To troubleshoot such a configuration effectively, the troubleshooting tool can be used to assume the identity of the problem station, so that the switch places the tool in the same VLAN as that station.
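The simple port-based case can be modeled as a table mapping each port to a VLAN. The sketch below uses a hypothetical port assignment to answer the two questions a troubleshooter usually asks: which ports share a broadcast domain with the problem port, and whether two ports can talk without a router.

```python
from typing import Dict, List


def broadcast_domain(port_vlan: Dict[int, int], port: int) -> List[int]:
    """Ports that share a broadcast domain (VLAN) with the given port."""
    vlan = port_vlan[port]
    return sorted(p for p, v in port_vlan.items() if v == vlan)


def needs_routing(port_vlan: Dict[int, int], port_a: int, port_b: int) -> bool:
    """True when traffic between the two ports must cross a router."""
    return port_vlan[port_a] != port_vlan[port_b]


# Hypothetical assignment: ports 1, 2 and 5 in VLAN 10; ports 3 and 4 in VLAN 20.
port_vlan = {1: 10, 2: 10, 3: 20, 4: 20, 5: 10}
print(broadcast_domain(port_vlan, 1))   # [1, 2, 5]
print(needs_routing(port_vlan, 1, 3))   # True
```

A monitoring tool plugged into port 3 in this example would see none of the broadcasts from the station on port 1.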

Virtual high-speed ports: Many switches permit combining several lower speed ports to form a single logical higher speed port. This is sometimes known as an EtherChannel although it is not limited to a single vendor, or a single Ethernet speed. Taps are available to monitor multiple physical links joined into a logical port, but this does require special software and hardware.

Redundancy: Switches inherently offer Spanning Tree as an OSI Layer 2 means of maintaining and managing parallel network paths. If anything goes wrong with how the parallel paths are handled, a broadcast storm often results, bringing network performance on the broadcast domain to a halt.

While there are a variety of troubleshooting techniques for Spanning Tree problems, the solution is simply to disconnect one of the parallel connections. Finding the problem parallel connection, however, is the challenge.
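When the network's links are documented, the candidate parallel connections can at least be listed ahead of time. The sketch below is not the Spanning Tree Protocol itself. It is a minimal illustration, assuming the topology is available as a list of switch-to-switch links with hypothetical names, that flags every link which closes a loop.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def find_redundant_links(links: List[Link]) -> List[Link]:
    """Return the links that create a loop, in the order they are listed."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        # Follow parent pointers to the representative of the node's group.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # shorten the path as we go
            node = parent[node]
        return node

    redundant: List[Link] = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))  # both ends already connected: a loop
        else:
            parent[root_a] = root_b   # join the two groups
    return redundant


links = [("sw1", "sw2"), ("sw2", "sw3"), ("sw3", "sw1"), ("sw3", "sw4")]
print(find_redundant_links(links))  # [('sw3', 'sw1')]
```

Which link in a loop gets flagged depends on the order of the list, so the output is a set of places to start looking rather than a verdict on which connection is at fault.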


As a result, there are a considerable number of challenges that network support staff must overcome. A good start is gaining an awareness of the potential problems from the outset. This is paramount to a successful troubleshooting episode.

Each additional feature that is introduced into switching technology creates a new troubleshooting challenge. None of these challenges are insurmountable, but continued education is critical.

Half the battle is becoming aware of the challenges. Then it becomes a relatively simple matter of selecting the right tool or tools for the problem at hand.

The days of using a single tool to solve most problems are long gone. However, the range of available tools and the suite of features available with those tools are growing in parallel with the increasing catalog of switch features — and problems.

Brad Masterson is Canadian Product Manager for Fluke Networks. Involved in the field of networking and network testing since 1995, he is a Certified Engineering Technologist registered with OACETT and is a member of BICSI.
