While troubleshooting can be a complex chore, the key lies in understanding the root of a problem.
April 1, 2004
The key to successful troubleshooting is knowing how the network functions under normal conditions, since it enables a technician to quickly recognize abnormal operation.
Any other approach is little better than a shot in the dark.
While the foundation of good troubleshooting is based on insight, formal training and practical experience, the following information can help shorten the learning curve on isolating and solving network problems.
Before the on-site visit: Technicians can save considerable time and resources by determining ahead of time whether an on-site visit is required. Even with continuous improvements in operating system reliability, “reboot your PC” is still the first step.
Information can also be gathered over the phone with the help of the user. Most users can open a command prompt and report back to the technician the result of an IPCONFIG command.
This tells the technician whether the PC has an appropriate address for the subnet to which it is physically connected.
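The address check described above can be sketched in a few lines of Python. This is only an illustration: the function name, the sample addresses and the subnet are assumptions made for the example, and the "self-assigned" case reflects the 169.254.x.x address a Windows PC typically gives itself when it cannot reach a DHCP server.

```python
import ipaddress

def check_reported_address(reported_ip, expected_subnet):
    """Classify the address a user reads back from IPCONFIG.

    reported_ip and expected_subnet are illustrative inputs, e.g.
    "192.168.10.57" and "192.168.10.0/24".
    """
    addr = ipaddress.ip_address(reported_ip)
    net = ipaddress.ip_network(expected_subnet)
    # 169.254.0.0/16 is the link-local range a PC typically assigns
    # itself when no DHCP server answered.
    if addr.is_link_local:
        return "self-assigned"
    if addr in net:
        return "ok"
    return "wrong subnet"

print(check_reported_address("192.168.10.57", "192.168.10.0/24"))
```

A result other than "ok" suggests the PC did not receive a usable address for the segment it is plugged into.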
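If the address is not appropriate, have the user release and renew the DHCP lease (IPCONFIG /RELEASE followed by IPCONFIG /RENEW), then attempt to use the network following receipt of a fresh IP address. If the IPCONFIG command reports that the DHCP operation cannot be performed, then the user is probably using a static IP configuration.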
If the user has reported a valid IP address, try pinging that address from your desk. If the user’s PC responds, then have the user attempt some other network activity, such as opening a Web page or pinging the local router to verify basic connectivity.
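For technicians who prefer to script the ping from their desk, the following is a minimal sketch. The function names are hypothetical. It assumes the standard ping utility is on the path and uses its count option (-n on Windows, -c on most other systems). The exit code is only a rough signal of success, so reading the actual output remains the more reliable check.

```python
import platform
import subprocess

def build_ping_command(host, count=4, system=None):
    """Return the ping command line for the given operating system."""
    system = system or platform.system()
    # Windows ping takes -n for the number of echo requests;
    # most other systems take -c.
    flag = "-n" if system == "Windows" else "-c"
    return ["ping", flag, str(count), host]

def host_responds(host, count=4):
    """Run ping and report whether it exited successfully.

    The exit code is treated as a rough indicator only.
    """
    result = subprocess.run(
        build_ping_command(host, count),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Calling host_responds with the address the user reported gives a quick yes-or-no answer before the conversation moves on to Web pages or the local router.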
Verifying the problem on site: If a site visit is necessary, it is important to question the user about any action or activity that may have affected network performance, including any recent changes (e.g., moving office furniture or installing a new screen saver).
The next step is to repeat the tests the user performed previously over the telephone.
A successful ping to a network server or off-net device immediately confirms that the workstation has Layer 3 connectivity to the network, which means that lower-layer tests are not needed. If Layer 3 connectivity cannot be validated, then troubleshooting must start at the physical layer (Layer 1).
Extended troubleshooting: Once the inability to log into the network has been verified, the next step is to determine whether the issue relates to the network or the user’s PC. The first check is whether the cable connecting the client to the network is in place and functioning properly.
Solving network problems in a timely, cost-effective manner at this point requires a tool that can quickly verify the status of critical network functionality. Handheld devices such as Fluke’s Network Multimeter can be used to find basic connection problems and confirm critical network operational parameters, eliminating physical-layer issues before the trouble ticket is escalated to a more senior technician.
In a shared Ethernet environment, when too many stations attempt to transmit simultaneously, performance may suffer dramatically due to collisions.
While collisions are a normal part of half-duplex Ethernet operation, the number of collisions rises as traffic increases, and the re-transmissions they require add still more traffic, so the total load grows at an accelerating rate.
The network will display a performance curve that suddenly “falls off a cliff” as the number of frames sent, collisions, and re-transmitted packets spirals upward at a rapidly increasing rate.
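A rough way to see why the curve bends so sharply is to estimate how many times an average frame must be sent before it gets through. The sketch below is a simplified model, not a description of real Ethernet backoff behaviour: it assumes every transmission attempt collides independently with the same probability, in which case the expected number of attempts per frame is 1 / (1 - p).

```python
def expected_transmissions(collision_probability):
    """Expected sends per delivered frame under the simplified model.

    Assumes each attempt collides independently with the same
    probability, so the attempts follow a geometric distribution.
    """
    if not 0 <= collision_probability < 1:
        raise ValueError("collision_probability must be in [0, 1)")
    return 1.0 / (1.0 - collision_probability)

# Illustrative collision probabilities, not measured values.
for p in (0.05, 0.25, 0.50, 0.75, 0.90):
    sends = expected_transmissions(p)
    print(f"collision probability {p:.2f}: {sends:.1f} sends per frame")
```

Under this model, a segment where half of all attempts collide carries twice the traffic that the same workload would need on a quiet segment, and the multiplier climbs steeply from there.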
Keep in mind, however, that a tester connected to a single switch port (not shared media) may see only broadcast frames, which can be very intermittent on low-traffic networks.
A switch may operate in full duplex mode, essentially eliminating the shared Ethernet performance drops caused by multiple collisions.
If a link can be established and utilization is reasonable, the technician may then press the button on the test tool corresponding to the ping test to obtain an IP address from the network’s DHCP server.
The failure of either a client’s or the troubleshooting tool’s automatic DHCP configuration could point to a problem with the DHCP relay system.
The process of obtaining a DHCP address demonstrates the viability of the local cable, the local hub or switch port, and the network infrastructure all the way back to the DHCP server. In one simple operation, therefore, most of the nearby network infrastructure has been validated up through Layer 3.
The simple success of a ping indicates that end-to-end Layer 3 connectivity exists between the two devices. The total roundtrip travel time for the request is easily compared to known values to provide a helpful diagnostic for more detailed analysis, if deeper analysis is required.
It is useful to send a series of pings to give the destination multiple opportunities to respond.
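The comparison against known values can be sketched as follows. The function name, the baseline figure and the tolerance multiplier are assumptions made for the example. In practice the baseline would come from measurements taken while the network was behaving normally.

```python
from statistics import mean

def summarize_pings(rtts_ms, baseline_ms, tolerance=2.0):
    """Summarize a series of pings against a known-good baseline.

    rtts_ms     -- round-trip times in milliseconds, None for a lost reply
    baseline_ms -- typical round-trip time under normal conditions
    tolerance   -- hypothetical multiplier above which the path is "slow"
    """
    if not rtts_ms:
        raise ValueError("at least one ping is required")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    if not replies:
        return {"loss_pct": loss_pct, "avg_ms": None, "verdict": "no response"}
    avg = mean(replies)
    verdict = "slow" if avg > baseline_ms * tolerance else "normal"
    return {"loss_pct": loss_pct, "avg_ms": avg, "verdict": verdict}

# Four pings, one lost, against an assumed 10 ms baseline.
print(summarize_pings([10, 12, None, 14], baseline_ms=10))
```

Sending several pings matters here because a single lost reply says little, while a consistent pattern of loss or slow replies is worth recording before escalation.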
Servers outside the enterprise network may also be used as the target for pinging to verify WAN interconnectivity from the client station and local site to a remote site.
If servers within the firewall respond to ping, but those outside the firewall do not, then the source of the problem may be with routers or other aspects of the network boundary infrastructure.
If pings are successful to both external and internal servers, but the client is not receiving those services, it indicates that the problem lies at a level beyond the physical transport.
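The reasoning in the last few paragraphs amounts to a short decision sequence, sketched below. The function name and the wording of the results are illustrative only, and the three inputs are simply the outcomes the technician has already observed.

```python
def locate_fault(internal_ping_ok, external_ping_ok, service_ok):
    """Suggest where to look next, based on observed test results."""
    if not internal_ping_ok:
        # No Layer 3 connectivity even inside the firewall.
        return "local: check cabling, hub or switch port, and workstation"
    if not external_ping_ok:
        # Inside works, outside does not.
        return "boundary: check routers and the network boundary infrastructure"
    if not service_ok:
        # Transport is fine but the user still cannot get the service.
        return "above transport: check the service or application itself"
    return "no fault found"

print(locate_fault(internal_ping_ok=True, external_ping_ok=False, service_ok=False))
```

The order of the checks mirrors the text: connectivity close to the client is confirmed first, then the boundary, and only then anything above the transport.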
If these instant tests are unsuccessful or inconclusive, then it is time to look at the network cabling. If the cable tests are successful but the problem continues, then the call should be escalated to a senior-level network technician for resolution.
The next step is to trace the cable into the wiring closet and the local hub or switch. This can be simplified by using a tone probe feature for audible tracing, as well as a flash function for locating port links.
If the hub or switch port test is good, then the workstation might be the source of the problem. This can be verified by testing for the presence of link and the speed and duplex settings offered by the NIC.
Remediation procedures at this point can include rebooting and retesting of the link, network and protocol reconfiguration, and address verification.
If all components are in place and properly configured, and the workstation still does not show proper network and application connectivity, it is time to escalate the problem beyond the field technician level.
While troubleshooting can be a complex chore, understanding the root of a problem before escalating it to a more senior level can be instrumental in reducing workload and saving costs.
If a technician can quickly isolate the problem, he or she can then determine next steps and make the decision as to whether it can be resolved at the department or group level. All it really takes is solid groundwork.
Ron Groulx is a product specialist with Fluke Networks Canada. A member of the IEEE, he has been involved in the field of networking since 1997.