You’re troubleshooting a routing problem; you check BGP and the neighbor shows as active. Great, let’s move on and look somewhere else.
Right now a good proportion of you should be shouting at your screen – and with good reason.
Cisco IOS Basics
When we are first taught about Cisco ACLs, we’re taught the dangers of assuming that adding “no” in front of the command will just remove a single line. For example, pretty much every Cisco engineer out there will know that the end result of these commands:
access-list 1 permit host 220.127.116.11
access-list 1 permit host 18.104.22.168
access-list 1 permit host 22.214.171.124
!
no access-list 1 permit host 126.96.36.199
…will be that access-list 1 is totally deleted. If you didn’t know that, then thank goodness you do now.
In a similar vein, I’ve recently seen a few people miss something really critical about BGP session status that I think we’re all taught pretty early on (probably in CCNA-level classes) that I just kind of assumed everybody knew about – just like the access-list problem above. However, since this was missed in the heat of troubleshooting – unfortunately probably the time it was most critical not to miss it – I thought I’d share a reminder here.
Active BGP Sessions
The command typically used to check BGP neighbor status is not, as you might expect, “show ip bgp neighbors”. That does work, but it’s a lot of output to slog through, especially if you have a lot of peers. It’s much quicker instead to use “show ip bgp summary” which abbreviates nicely to “sh ip bgp sum”.
So let’s say we’re troubleshooting a routing problem discovered a few minutes ago, we check BGP, and this is what we see:
R1#sh ip bgp sum
BGP router identifier 188.8.131.52, local AS number 1
BGP table version is 9, main routing table version 9

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
184.108.40.206  4     1       5       4        0    0    0 00:05:07 Active
No problem – the neighbor relationship has been active for the last 5 minutes, so everything should be cool.
NO NO NO NO NO!
Read the output again:
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
220.127.116.11  4     1       5       4        0    0    0 00:05:07 Active
On the right-hand side, the state is showing as ‘Active’. That means the neighbor is configured and the router has been trying to connect to it for the last 5 minutes and 7 seconds, but it has not yet managed to do so. When it does connect, the output is going to change so that rather than telling you the state of the connection, it’s going to tell you how many prefixes were received (and accepted):
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
18.104.22.168   4     1      11       9       13    0    0 00:01:03        4
You’ll also notice that the TblVer (table version) in the previous output was “0” – also a bad sign. This is simple stuff – if you don’t have a number on the right-hand side, the connection is DOWN. Not ‘admin down’ mind you – that looks like this:
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
22.214.171.124  4     1      11       9        0    0    0 00:00:13 Idle (Admin)
It’s a really simple thing, and I suspect most of us know this well. However, it’s such an easy thing to overlook when you’re stressed because a network is down, another mention can’t hurt, right? Don’t fall into the trap of thinking that “Active” = “Working fine”.
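If you'd rather not trust tired eyes at 3am, the "no number means DOWN" rule is easy to script. What follows is only a rough sketch in Python, not anything Cisco ships: the function name `down_neighbors` and the sample addresses (taken from the 192.0.2.0/24 documentation range) are made up for illustration, and it assumes each neighbor sits on a single line with the ten columns shown above. Long neighbor addresses can wrap onto a second line on a real router, and this sketch does not handle that.

```python
# Hypothetical sample text; the addresses are documentation-range placeholders.
SAMPLE = """\
BGP router identifier 192.0.2.254, local AS number 1
BGP table version is 9, main routing table version 9

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.0.2.1       4     1       5       4        0    0    0 00:05:07 Active
192.0.2.2       4     1      11       9       13    0    0 00:01:03        4
192.0.2.3       4     1      11       9        0    0    0 00:00:13 Idle (Admin)
"""


def down_neighbors(summary_output):
    """Return (neighbor, state) pairs for sessions that are not Established."""
    down = []
    in_table = False
    for line in summary_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Neighbor":
            in_table = True  # every row after this header is a peer
            continue
        if not in_table or len(fields) < 10:
            continue
        neighbor = fields[0]
        # The last column can span two tokens, e.g. "Idle (Admin)".
        state = " ".join(fields[9:])
        # A bare number is a prefix count, which means the session is up.
        if not state.isdigit():
            down.append((neighbor, state))
    return down


for neighbor, state in down_neighbors(SAMPLE):
    print(f"{neighbor}: session is DOWN (state: {state})")
```

Against the sample text this flags 192.0.2.1 (Active) and 192.0.2.3 (Idle (Admin)) and stays quiet about 192.0.2.2, which is the only peer showing a prefix count.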
RFC Mumbo Jumbo
You’re probably thinking “Well then, why didn’t Cisco change it to say ‘not working’ then?” And that’s a reasonable question, but the reality is that they are displaying the current State of the BGP Finite State Machine, as defined in RFC 4271. Those states are Idle, Connect, Active, OpenSent, OpenConfirm and Established.
We’ve seen Idle and Active in the command output above, and when a session is Established, Cisco shows the number of prefixes instead. It is possible – but unlikely – to see the other states showing in the command output simply because sessions tend to establish so quickly, you’d be unlikely to catch them at the right moment:
R1#sh ip bgp sum
BGP router identifier 126.96.36.199, local AS number 1
BGP table version is 9, main routing table version 9

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
188.8.131.52    4     1       5       4        0    0    0 00:00:35 OpenSent
So that’s why it says Active even though it isn’t active in the sense we would hope. Blame the BGP FSM, why don’t you.
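To make the mapping between the State/PfxRcd column and the FSM explicit, here is a small Python sketch. The six state names come from RFC 4271; everything else (the `BgpState` enum and the `state_from_summary` helper) is a hypothetical illustration, and it assumes the column holds either a prefix count or a state name optionally followed by a qualifier such as "(Admin)".

```python
from enum import Enum


class BgpState(Enum):
    """The six BGP finite state machine states named in RFC 4271."""
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


def state_from_summary(field):
    """Map a State/PfxRcd value to a BgpState (hypothetical helper)."""
    if field.isdigit():
        # A prefix count is only shown once the session is Established.
        return BgpState.ESTABLISHED
    # Drop any qualifier, so "Idle (Admin)" is looked up as "Idle".
    name = field.split()[0]
    return BgpState(name)  # raises ValueError for an unrecognised state


for value in ("Active", "OpenSent", "Idle (Admin)", "4"):
    state = state_from_summary(value)
    print(f"{value!r} -> {state.value}, up: {state is BgpState.ESTABLISHED}")
```

Only the numeric value comes back as Established; "Active" lands squarely in the "still trying" part of the state machine, which is the whole point of this post.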
Meanwhile, stay vigilant, and pay attention to that sneaky State column.