Troubleshooting EIGRP
Troubleshooting the exchange of IGRP or RIP route information is a reasonably simple procedure. Routing updates are either propagated or they are not, and they either contain accurate information or they do not. The added complexity of EIGRP means an added complexity to the troubleshooting procedure. Neighbor tables and adjacencies must be verified, the query/response procedure of DUAL must be followed, and the influences of VLSM on automatic summarization must be considered.
This section's case study describes a sequence of events that typically can be used when pursuing an EIGRP problem. Following the case study is a discussion of an occasional cause of instabilities in larger EIGRP internets.
Case Study: A Missing Neighbor
Figure 8.45 shows a small EIGRP internetwork. Users are complaining that subnet 192.168.16.224/28 is unreachable. An examination of the route tables reveals that something is wrong at router Grissom (Figure 8.46).
The following observations are made from the two route tables of Figure 8.46:
Shepard does not have subnets 192.168.16.40/30 and 192.168.16.224/28 in its route table, although Grissom does.
Grissom's route table does not contain any of the subnets that should be advertised by Glenn or Shepard.
Shepard's route table contains the subnets advertised by Glenn (and Glenn's table contains the subnets advertised by Shepard, although its route table is not included in the figure).
The conclusion to be drawn from these observations is that Grissom is not advertising or receiving routes correctly over subnet 192.168.16.16/28.
Among the possible causes, the simplest causes should be examined first. These are:
An incorrect interface address or mask
An incorrect EIGRP process ID
A missing or incorrect network statement
In this case, there are no EIGRP or address configuration errors.
Next, the neighbor tables should be examined. Looking at the neighbor tables at Grissom, Shepard, and Glenn (Figure 8.47), two facts stand out:
Grissom (192.168.16.19) is in its neighbors' tables, but its neighbors are not in Grissom's neighbor table.
The entire internetwork has been up for more than five hours; this information is reflected in the uptime statistic for all neighbors except Grissom. However, Grissom's uptime shows approximately one minute.
If Grissom is in Shepard's neighbor table, Shepard must be receiving Hellos from it. Grissom, however, is apparently not receiving Hellos from Shepard. Without this two-way exchange of Hello packets, an adjacency will not be established and route information will not be exchanged.
A closer examination of Shepard's and Glenn's neighbor tables reinforces this hypothesis:
The SRTT for Grissom is 0, indicating that a packet has never made the round-trip.
The RTO for Grissom has increased to five and eight seconds, respectively.
There is a packet enqueued for Grissom (Q Cnt).
The sequence number recorded for Grissom is 0, indicating that no reliable packets have ever been received from it.
These factors indicate that the two routers are trying to send a packet reliably to Grissom, but are not receiving an ACK.
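The symptoms listed above lend themselves to a mechanical check. The following Python sketch is an illustration only: the `NeighborEntry` fields mirror the neighbor-table columns discussed in the text (SRTT, RTO, Q Cnt, sequence number), but the class, the function name, and every sample value other than Grissom's address and the five-second RTO are hypothetical and are not taken from Figure 8.47.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NeighborEntry:
    """One row of an EIGRP neighbor table (field names are illustrative)."""
    address: str
    srtt_ms: int   # smooth round-trip time; 0 means no round trip has completed
    rto_ms: int    # retransmission timeout
    q_cnt: int     # packets enqueued and awaiting acknowledgment
    seq_num: int   # sequence number of the last reliable packet received


def one_way_symptoms(entry: NeighborEntry) -> List[str]:
    """Return the symptoms suggesting that reliable packets are not being acknowledged."""
    symptoms = []
    if entry.srtt_ms == 0:
        symptoms.append("SRTT is 0: no packet has made the round trip")
    if entry.q_cnt > 0:
        symptoms.append("Q Cnt is nonzero: a packet is waiting to be acknowledged")
    if entry.seq_num == 0:
        symptoms.append("Seq is 0: no reliable packet has been received")
    return symptoms


# Hypothetical rows: the first resembles Grissom as seen from Shepard,
# the second a healthy neighbor with a made-up address and made-up counters.
suspect = NeighborEntry("192.168.16.19", srtt_ms=0, rto_ms=5000, q_cnt=1, seq_num=0)
healthy = NeighborEntry("192.168.16.18", srtt_ms=12, rto_ms=200, q_cnt=0, seq_num=42)

for row in (suspect, healthy):
    print(row.address, one_way_symptoms(row))
```

A neighbor that trips all three checks at once, as the suspect row does, is a strong hint that Hellos are arriving from it while reliable packets sent to it go unacknowledged.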
In Figure 8.48, debug eigrp packets is used at Shepard to get a better look at what is happening. All EIGRP packet types will be displayed, but a second debug command is used with it: debug ip eigrp neighbor 75 192.168.16.19. This command adds a filter to the first command. It tells debug eigrp packets to display only IP packets of EIGRP 75 (the process ID of the routers in Figure 8.45) and only those packets that concern neighbor 192.168.16.19 (Grissom).
Figure 8.48 shows that Hello packets are being received from Grissom. It also shows that Shepard is attempting to send updates to Grissom; Grissom is not acknowledging them. After the 16th retry, the message "Retransmission retry limit exceeded" is displayed. This exceeded limit accounts for the low uptime shown for Grissom in the neighbor tables—when the retransmission retry limit is exceeded, Grissom is removed from the neighbor table. But because Hellos are still being received from Grissom, it quickly reappears in the table and the process begins again.
Figure 8.49 shows the output from debug eigrp neighbors at Shepard. This command is not IP specific, but instead shows EIGRP neighbor events. Here, two instances of the events described in the previous paragraph are displayed: Grissom is declared dead as the retransmission limit is exceeded but is immediately "revived" when its next Hello is received.
Although Figure 8.48 shows that update packets are being sent to Grissom, observation of EIGRP packets at that router shows that they are not being received (Figure 8.50). Because Grissom is successfully exchanging Hellos with Cooper, Grissom's EIGRP process must be working. Suspicion therefore falls on Grissom's Ethernet interface. An inspection of the configuration file shows that an access list is configured as an incoming filter on E0:
interface Ethernet0
 ip address 192.168.16.19 255.255.255.240
 ip access-group 150 in
!
!
access-list 150 permit tcp any any established
access-list 150 permit tcp any host 192.168.16.238 eq ftp
access-list 150 permit tcp host 192.168.16.201 any eq telnet
access-list 150 permit tcp any host 192.168.16.230 eq pop3
access-list 150 permit udp any any eq snmp
access-list 150 permit icmp any 192.168.16.224 0.0.0.15
When EIGRP packets are received at Grissom's E0 interface, they are first filtered through access list 150. They do not match any entry on the list and are therefore dropped by the implicit deny at the end of the list. The problem is resolved (Figure 8.51) by adding the following entry to the access list:
access-list 150 permit eigrp 192.168.16.16 0.0.0.15 any
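The effect of the added entry can be illustrated with a small first-match model of access list 150. This Python sketch is a simplification under stated assumptions: it compares only the protocol and the source and destination addresses, ignoring port numbers and the established keyword (neither matters for an EIGRP packet, which already fails the protocol comparison). The source address 192.168.16.17 is a hypothetical neighbor address on subnet 192.168.16.16/28, and 224.0.0.10 is the EIGRP multicast address.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class AclEntry(NamedTuple):
    protocol: str                        # "tcp", "udp", "icmp", "eigrp"
    source: ipaddress.IPv4Network
    destination: ipaddress.IPv4Network


def net(text: str) -> ipaddress.IPv4Network:
    return ipaddress.ip_network(text)


ANY = net("0.0.0.0/0")

# Access list 150, reduced to protocol and addresses (ports are ignored here).
ACL_150 = [
    AclEntry("tcp", ANY, ANY),                        # ... established
    AclEntry("tcp", ANY, net("192.168.16.238/32")),   # ... eq ftp
    AclEntry("tcp", net("192.168.16.201/32"), ANY),   # ... eq telnet
    AclEntry("tcp", ANY, net("192.168.16.230/32")),   # ... eq pop3
    AclEntry("udp", ANY, ANY),                        # ... eq snmp
    AclEntry("icmp", ANY, net("192.168.16.224/28")),  # wildcard 0.0.0.15
]


def first_match(acl: List[AclEntry], protocol: str, src: str, dst: str) -> Optional[int]:
    """Return the index of the first permitting entry, or None (implicit deny)."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for index, entry in enumerate(acl):
        if (entry.protocol == protocol
                and src_ip in entry.source
                and dst_ip in entry.destination):
            return index
    return None


# The corrected list: the original entries plus the new EIGRP permit.
FIXED_ACL = ACL_150 + [AclEntry("eigrp", net("192.168.16.16/28"), ANY)]

# Hypothetical EIGRP packet from a neighbor on 192.168.16.16/28.
print(first_match(ACL_150, "eigrp", "192.168.16.17", "224.0.0.10"))
print(first_match(FIXED_ACL, "eigrp", "192.168.16.17", "224.0.0.10"))
```

Against the original list the EIGRP packet matches nothing and falls through to the implicit deny; against the corrected list it is permitted by the new final entry.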
Stuck-in-Active Neighbors
When a route goes active and queries are sent to neighbors, the route will remain active until a reply is received for every query. But what happens if a neighbor is dead or otherwise incapacitated and cannot reply? The route would stay permanently active. The active timer is designed to prevent this situation. The timer is set when a query is sent. If the timer expires before a reply to the query is received, the route is declared stuck-in-active, the neighbor is presumed dead, and it is flushed from the neighbor table. The SIA route and any other routes via that neighbor are eliminated from the route table. DUAL will be satisfied by considering the neighbor to have replied with an infinite metric.
In reality, this sequence of events should never happen. The loss of Hellos should identify a disabled neighbor long before the active timer expires.
But what happens in large EIGRP networks where a query might, like the bunny in the battery advertisement, keep going and going? Remember that queries cause the diffusing calculation to grow larger, whereas replies cause it to grow smaller (refer to Figure 8.10). Queries must eventually reach the edge of the internetwork, and replies must eventually begin coming back, but if the diameter of the diffusing calculation grows large enough, an active timer may expire before all replies are received. The result, flushing a legitimate neighbor from the neighbor table, is obviously destabilizing.
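The active-timer behavior described above can be sketched as a small state check. This Python fragment is a minimal model, not the DUAL implementation: the function name, the state labels, and the neighbor names are hypothetical, and the 180-second timer is an assumed value chosen for illustration, since the text does not state the timer's setting.

```python
from typing import Iterable, List, Tuple

ACTIVE_TIMER_S = 180.0  # assumed value for illustration only


def check_active_route(queried: Iterable[str],
                       replied: Iterable[str],
                       query_sent_at: float,
                       now: float,
                       active_timer: float = ACTIVE_TIMER_S) -> Tuple[str, List[str]]:
    """Classify a route and list the neighbors whose replies are outstanding.

    "passive"          every queried neighbor has replied
    "active"           replies are outstanding and the timer is still running
    "stuck-in-active"  replies are outstanding and the timer has expired; the
                       listed neighbors would be flushed and treated as having
                       replied with an infinite metric
    """
    outstanding = sorted(set(queried) - set(replied))
    if not outstanding:
        return "passive", []
    if now - query_sent_at >= active_timer:
        return "stuck-in-active", outstanding
    return "active", outstanding


# Hypothetical neighbors A and B; the query was sent at time 0.
print(check_active_route(["A", "B"], ["A", "B"], 0.0, 10.0))
print(check_active_route(["A", "B"], ["A"], 0.0, 41.0))
print(check_active_route(["A", "B"], ["A"], 0.0, 180.0))
```

In the third call neighbor B may be perfectly healthy and merely waiting on replies from its own neighbors farther out, which is the large-diameter case the text warns about.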
When neighbors mysteriously disappear from neighbor tables and then reappear, or users complain of intermittently unreachable destinations, SIA routes may be the culprit. Checking the error logs of routers is a good way to find out whether SIAs have occurred (Figure 8.52).
When chasing the cause of SIAs, close attention should be paid to the topology table in routers. If routes can be "caught" in the active state, the neighbors from whom replies have not yet been received should be noted. For example, Figure 8.53 shows a topology table in which several routes are active. Notice that most of them have been active for 15 seconds and that one (10.6.1.0) has been active for 41 seconds.
Notice also that in each case, the neighbor 10.1.2.1 has its reply status flag (r) set. That is the neighbor from which replies have not yet been received. There may be no problem with the neighbor itself or with the link to the neighbor, but this information points to the direction within the internetwork topology in which the investigation should proceed.
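When many routes are active at once, tallying the neighbors with outstanding replies makes the common direction easier to see. The Python sketch below assumes the topology table has already been transcribed by hand into a dictionary. Only route 10.6.1.0, the 15- and 41-second active times, and neighbor 10.1.2.1 come from the text; the other routes and the second neighbor are hypothetical, and the dictionary layout is not an IOS output format.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each active route: seconds spent active and the neighbors still flagged "r".
active_routes = {
    "10.6.1.0": {"active_s": 41, "awaiting": ["10.1.2.1"]},
    "10.6.2.0": {"active_s": 15, "awaiting": ["10.1.2.1"]},              # hypothetical
    "10.6.3.0": {"active_s": 15, "awaiting": ["10.1.2.1", "10.1.3.1"]},  # hypothetical
}


def outstanding_by_neighbor(routes: Dict[str, dict]) -> List[Tuple[str, int]]:
    """Count, per neighbor, how many active routes still await its reply."""
    tally = Counter()
    for info in routes.values():
        tally.update(info["awaiting"])
    return tally.most_common()  # most frequently awaited neighbor first


print(outstanding_by_neighbor(active_routes))
```

The neighbor at the head of the list marks the direction in which to continue the investigation, not necessarily the device at fault.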
Common causes of SIAs in larger EIGRP internetworks are heavily congested, low-bandwidth data links and routers with low memory or overutilized CPUs. The problem will be exacerbated if these limited resources must handle very large numbers of queries.
The careless adjustment of the bandwidth parameter on interfaces may be another cause of SIAs. Recall that EIGRP is designed to use no more than 50% of the available bandwidth of a link. This restriction means that EIGRP's pacing is keyed to the configured bandwidth. If the bandwidth is set artificially low in an attempt to manipulate routing choices, the EIGRP process may be starved. If IOS 11.2 or later is being run, the command ip bandwidth-percent eigrp may be used to adjust the percentage of bandwidth used.
Note
Changing the percentage of bandwidth used by EIGRP
For example, suppose that an interface is connected to a 56K serial link, but the bandwidth is set to 14K. EIGRP would limit itself to 50% of this amount, or 7K. The following commands adjust the EIGRP bandwidth percent to 200%. That is 200% of 14K, or 28K, which is 50% of the actual bandwidth of the 56K link:
interface Serial 3
 ip address 172.18.107.210 255.255.255.240
 bandwidth 14
 ip bandwidth-percent eigrp 1 200
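The arithmetic behind the 200% figure can be checked with a few lines of Python. This is only a worked calculation using the numbers from the example; the function names are hypothetical and do not correspond to anything in IOS.

```python
def eigrp_share_kbps(configured_kbps: float, percent: float = 50.0) -> float:
    """Bandwidth EIGRP allows itself, given the configured bandwidth and percent."""
    return configured_kbps * percent / 100.0


def percent_for_target(configured_kbps: float, target_kbps: float) -> float:
    """Percent setting needed so that EIGRP may use target_kbps."""
    return target_kbps / configured_kbps * 100.0


default_share = eigrp_share_kbps(14)            # 50% of the configured 14K
half_of_real_link = eigrp_share_kbps(56)        # 50% of the actual 56K link
needed_percent = percent_for_target(14, half_of_real_link)
adjusted_share = eigrp_share_kbps(14, needed_percent)

print(default_share, half_of_real_link, needed_percent, adjusted_share)
```

The same calculation applies to any interface whose configured bandwidth differs from the true link speed: divide the share EIGRP should have by the configured bandwidth to obtain the percent setting.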
Increasing the active timer period with the timers active-time command may help avoid SIAs in some situations, but this step should not be taken without careful consideration of the effects it may have on reconvergence.
A good internetwork design is the best solution to instabilities such as SIA routes. By using a combination of intelligent address assignment, route filtering, default routes, and summarization, boundaries may be constructed in a large EIGRP internetwork to restrict the size and scope of diffusing computations. Chapter 13, "Route Filtering," includes an example of such a design.