How to Troubleshoot Spanning-Tree Protocol

In most networks, the optimal Spanning-Tree Protocol topology is determined as part of the network design and then implemented by configuring Spanning-Tree Protocol priority and cost values. Even so, several things can go wrong.

The topology you expect and the topology your switches actually build can differ. Situations also occur in which the Spanning-Tree Protocol was not considered during network planning and implementation, or was considered and implemented before the network grew and changed.

In these situations, it is important to analyze the actual Spanning-Tree Protocol topology in the operational network in order to troubleshoot it. The steps for analyzing a spanning tree are the following:

  1. First, discover the Layer 2 topology. If the topology was documented previously, consult the network documentation.
  2. Use the “show cdp neighbors” command to help discover the Layer 2 topology.
  3. Once the Layer 2 topology is known, use your knowledge of the Spanning-Tree Protocol to determine the expected Layer 2 path.
  4. It is also important to know the root bridge. Use the “show spanning-tree vlan <vlan_id>” command to determine which switch is the root bridge.
  5. Use the “show spanning-tree vlan <vlan_id>” command on all switches to find the port states and confirm your expected Layer 2 path.
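Step 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the switch names, priorities, MAC addresses, and link costs below are invented, and the sketch ignores the tie-breakers (sender bridge ID and port ID) that real switches apply when two paths have equal cost.

```python
import heapq

# Hypothetical three-switch triangle; every value here is made up for illustration.
switches = {
    "SW1": (24576, "0000.0000.0001"),  # (priority, MAC) together form the bridge ID
    "SW2": (32768, "0000.0000.0002"),
    "SW3": (32768, "0000.0000.0003"),
}
links = [("SW1", "SW2", 4), ("SW1", "SW3", 4), ("SW2", "SW3", 19)]


def expected_root(switches):
    """The switch with the lowest bridge ID (priority first, then MAC) wins."""
    return min(switches, key=lambda name: switches[name])


def root_path_costs(root, links):
    """Dijkstra over link costs: lowest cumulative cost from each switch to the root."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    best = {root: (0, None)}  # switch -> (cost to root, upstream neighbor)
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best[node][0]:
            continue  # stale queue entry
        for neighbor, link_cost in graph.get(node, []):
            new_cost = cost + link_cost
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, node)
                heapq.heappush(queue, (new_cost, neighbor))
    return best


root = expected_root(switches)
costs = root_path_costs(root, links)
print(root, costs)
```

With these sample values, SW1 is the expected root, and SW2 and SW3 each reach it directly at cost 4, so the SW2 to SW3 link is the one you would expect to see blocked at one end.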

Expected Topology vs Actual Topology

Comparing the network’s actual state against its expected state and spotting the differences helps in troubleshooting the problem. A network administrator can examine the switches, determine the actual topology, and compare it with the intended spanning-tree topology.
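As a rough sketch of this comparison, the expected and actual port roles can be held in two dictionaries and diffed. The function name, switch names, interface names, and role abbreviations below are assumptions chosen for the example, not output taken from a real device.

```python
def diff_port_roles(expected, actual):
    """Return {(switch, port): (expected_role, actual_role)} for every mismatch."""
    mismatches = {}
    for key in sorted(set(expected) | set(actual)):
        want = expected.get(key)  # None if the port was not in the plan
        got = actual.get(key)     # None if the port was not observed
        if want != got:
            mismatches[key] = (want, got)
    return mismatches


# Hypothetical data: SW2 was expected to block Fa0/2 but is forwarding on it.
expected = {("SW2", "Fa0/1"): "Root", ("SW2", "Fa0/2"): "Altn"}
actual = {("SW2", "Fa0/1"): "Root", ("SW2", "Fa0/2"): "Desg"}

print(diff_port_roles(expected, actual))
```

Any entry in the result is a port worth examining first, because it is where the operational network departs from the plan.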

Overview of Spanning Tree Status

An overview of the spanning-tree status plays an important role in troubleshooting. Using the “show spanning-tree” command without any additional options provides a quick overview of the status of STP for all VLANs.

We can limit the command’s output by specifying a particular VLAN. The command syntax for specifying a VLAN is “show spanning-tree vlan <vlan_id>”. The command output displays the role and state of each port on the switch.

The port role can be Root, Designated, Alternate, and so on, and the port state can be forwarding, blocking, and so on. The command’s output also shows the bridge ID of the local switch and the bridge ID of the root bridge.
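When this output has to be checked on many switches, it can help to parse it with a script. The sketch below assumes a simplified sample of the output. The SAMPLE text is illustrative rather than captured from a device, and the exact layout varies by platform and software version, so the regular expressions would need adjusting for real output.

```python
import re

# Illustrative, simplified sample; not captured from a real switch.
SAMPLE = """\
VLAN0010
  Spanning tree enabled protocol ieee
  Root ID    Priority    24586
             Address     0000.0000.0001
             Cost        4
             Port        1 (GigabitEthernet0/1)

  Bridge ID  Priority    32778  (priority 32768 sys-id-ext 10)
             Address     0000.0000.0002

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ----
Gi0/1               Root FWD 4         128.1    P2p
Gi0/2               Altn BLK 4         128.2    P2p
"""

# Interface rows: name, role, state, cost, priority.number, type.
PORT_LINE = re.compile(r"^(\S+)\s+(Root|Desg|Altn|Back)\s+(\w+)\s+(\d+)\s+\d+\.\d+\s+")


def parse_ports(text):
    """Map each interface to its role, state, and cost."""
    ports = {}
    for line in text.splitlines():
        match = PORT_LINE.match(line)
        if match:
            name, role, state, cost = match.groups()
            ports[name] = {"role": role, "state": state, "cost": int(cost)}
    return ports


def bridge_ids(text):
    """Return ((root_priority, root_mac), (local_priority, local_mac))."""
    priorities = re.findall(r"(?:Root ID|Bridge ID)\s+Priority\s+(\d+)", text)
    addresses = re.findall(r"Address\s+(\S+)", text)
    return (int(priorities[0]), addresses[0]), (int(priorities[1]), addresses[1])


root_id, local_id = bridge_ids(SAMPLE)
print(parse_ports(SAMPLE))
print("local switch is root:", root_id == local_id)
```

If the root bridge ID and the local bridge ID match, the switch being examined is itself the root bridge.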

Spanning Tree Failure Consequences

Two types of failure can occur with STP. In the first, STP erroneously blocks a port that was planned to be in the forwarding state. Traffic that would normally pass through this switch may be lost, but the rest of the network remains unaffected.

The second type of failure is much more disruptive, as shown in the Figure below. It happens when the Spanning Tree Protocol erroneously moves one or more ports into the forwarding state.

Figure: Troubleshooting Spanning-Tree Protocol

Recall that an Ethernet frame header does not contain a TTL field, so any frame that enters a bridging loop keeps being forwarded from switch to switch indefinitely.

Frames whose destination address is recorded in the switches’ MAC address tables are simply forwarded to the port associated with that MAC address and do not enter a loop. However, any frame that a switch floods enters the loop. Flooded traffic may include broadcasts, multicasts, and unicasts with an unknown destination MAC address.

What are the signs of STP failure? The load on all links increases as more and more frames enter the loop. The looping frames also affect other links in the switched network because they are flooded on those links. If the failure is confined to a single VLAN, only that VLAN is affected. Switches and trunks that do not carry this VLAN operate normally.

A spanning-tree failure can create bridging loops. In that case, traffic can increase exponentially, because the switches flood broadcasts out of multiple ports and create new copies of the frames each time they forward them.
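The growth can be illustrated with a small simulation. It is a sketch under simplifying assumptions: every inter-switch port is forwarding (STP has failed completely), host-facing ports are ignored, and all copies move one hop per round. The switch names and the function name are invented for the example.

```python
from collections import Counter


def simulate_flood(neighbors, entry_switch, rounds):
    """Count broadcast copies in flight after each forwarding round.

    A switch floods a broadcast out of every inter-switch port except the
    one the frame arrived on, and nothing ever removes a copy (no TTL).
    """
    # Each in-flight copy is tracked as (sender, receiver).
    in_flight = Counter((entry_switch, n) for n in neighbors[entry_switch])
    history = [sum(in_flight.values())]
    for _ in range(rounds):
        next_round = Counter()
        for (sender, receiver), copies in in_flight.items():
            for n in neighbors[receiver]:
                if n != sender:
                    next_round[(receiver, n)] += copies
        in_flight = next_round
        history.append(sum(in_flight.values()))
    return history


# Four switches, fully meshed: each switch has three inter-switch ports.
full_mesh = {s: [t for t in "ABCD" if t != s] for s in "ABCD"}
print(simulate_flood(full_mesh, "A", 5))  # [3, 6, 12, 24, 48, 96]

# Three switches in a simple ring: the copies circulate but do not multiply.
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(simulate_flood(triangle, "A", 3))  # [2, 2, 2, 2]
```

In the full mesh a single broadcast doubles every round, while in the simple ring the two copies circle forever without multiplying. In both cases every new broadcast adds to the load, and none of the copies ever expires.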

When traffic such as OSPF or EIGRP hello packets enters the loop, the devices running these protocols quickly become overloaded, and their CPUs approach 100 percent utilization.

The switches also change their MAC address tables frequently. If a loop exists, a switch may receive a frame with a particular source MAC address on one port and then another frame with the same source MAC address on a different port.

So, the switch updates its MAC address table twice for the same MAC address. Because of the high load and maximum CPU utilization, these devices become unreachable, which makes troubleshooting very difficult.
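The MAC table instability can be modeled with a few lines of Python. This is a simplified sketch of source-address learning, not how any particular switch implements it. The function name, MAC addresses, and port names are invented for the example.

```python
from collections import Counter


def count_mac_moves(observations):
    """Learn source MACs in arrival order and count how often each one moves.

    observations: iterable of (source_mac, ingress_port) tuples.
    Returns (final MAC table, {mac: number of port changes}).
    """
    table = {}
    moves = Counter()
    for mac, port in observations:
        if mac in table and table[mac] != port:
            moves[mac] += 1  # same source seen on a different port
        table[mac] = port    # the latest observation always wins
    return table, dict(moves)


# Hypothetical arrivals: one host's frames loop back in on a second port.
frames = [
    ("aaaa.bbbb.cccc", "Gi0/1"),
    ("aaaa.bbbb.cccc", "Gi0/2"),
    ("aaaa.bbbb.cccc", "Gi0/1"),
    ("1111.2222.3333", "Gi0/3"),
]
table, moves = count_mac_moves(frames)
print(table)
print(moves)
```

A MAC address that keeps moving between ports, as aaaa.bbbb.cccc does here, is a strong hint that the same frames are arriving over more than one path.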

Repairing a Spanning Tree Problem

The first method of resolving the problem is to remove the redundant links in the switched network. A redundant link can be removed either physically or through configuration.
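To decide which links are redundant, one option is to walk the link list and flag every link that closes a loop. The sketch below does this with a union-find structure. The function name and switch names are assumptions, and which link of a loop gets flagged depends only on the order of the list, so in practice you would list the links you want to keep first.

```python
def redundant_links(links):
    """Return the links that would close a loop, in list order."""
    parent = {}

    def find(node):
        # Follow parent pointers to the group representative (with path halving).
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))  # both ends already connected: a loop
        else:
            parent[root_a] = root_b   # merge the two groups
    return redundant


# Hypothetical triangle of switches: the third link closes the loop.
links = [("SW1", "SW2"), ("SW1", "SW3"), ("SW2", "SW3")]
print(redundant_links(links))  # [('SW2', 'SW3')]
```

Disconnecting or shutting down the flagged links leaves a loop-free topology that still connects every switch.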

Once the loops are broken, traffic and CPU loads should quickly return to normal levels, and connectivity to devices should also be restored.

This restores network connectivity, but it is not the end of the troubleshooting process. Because all redundant paths have been removed from the network, the redundant links now need to be restored.

If the underlying cause of the spanning-tree failure has not been fixed, restoring the redundant links may trigger a new broadcast storm. So, before restoring the redundant links, find and correct the original fault.