Thursday, August 11, 2016

New OSPF Authentication Warning Messages

Network Setup for OSPF Authentication Warning Messages

Note: The passwords for OSPF simple password authentication, configured with the ip ospf authentication-key {passwd} interface subcommand, are deliberately different on the two routers. We can see that the routers can still establish an OSPF adjacency.

Cisco IOS release 15.4(3)M introduces support for a new feature: OSPFv2 Cryptographic Authentication (RFC 5709, OSPFv2 HMAC-SHA Cryptographic Authentication).

Starting with Cisco IOS release 15.4(3)M, OSPF reports authentication misconfigurations with the %OSPF-4-INVALIDKEY and %OSPF-4-NOVALIDKEY warning messages.
RT1#sh ver | in IOS|Compiled
Cisco IOS Software, C1900 Software (C1900-UNIVERSALK9-M), Version 15.4(3)M, RELEASE SOFTWARE (fc1)
Compiled Mon 21-Jul-14 17:38 by prod_rel_team
RT1#
09:44:59: %LINK-3-UPDOWN: Interface GigabitEthernet0/0, changed state to up
09:45:00: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/0, changed state to up
09:45:08: %OSPF-4-INVALIDKEY: Key ID 0 received on interface GigabitEthernet0/0
09:45:37: %OSPF-4-NOVALIDKEY: No valid authentication send key is available on interface GigabitEthernet0/0
09:45:44: %OSPF-5-ADJCHG: Process 100, Nbr 10.10.10.2 on GigabitEthernet0/0 from LOADING to FULL, Loading Done
09:46:15: %OSPF-4-INVALIDKEY: Key ID 0 received on interface GigabitEthernet0/0
09:46:44: %OSPF-4-NOVALIDKEY: No valid authentication send key is available on interface GigabitEthernet0/0
09:47:21: %OSPF-4-INVALIDKEY: Key ID 0 received on interface GigabitEthernet0/0
09:47:51: %OSPF-4-NOVALIDKEY: No valid authentication send key is available on interface GigabitEthernet0/0
RT1#
RT1#sh ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
10.10.10.2        1   FULL/DR         00:00:39    10.10.10.2      GigabitEthernet0/0
RT1#
RT1#sh ip ospf int gi0/0
GigabitEthernet0/0 is up, line protocol is up
  Internet Address 10.10.10.1/24, Area 0, Attached via Network Statement
  Process ID 100, Router ID 10.10.10.1, Network Type BROADCAST, Cost: 1
  Topology-MTID    Cost    Disabled    Shutdown      Topology Name
        0           1         no          no            Base
  Transmit Delay is 1 sec, State BDR, Priority 1
  Designated Router (ID) 10.10.10.2, Interface address 10.10.10.2
  Backup Designated router (ID) 10.10.10.1, Interface address 10.10.10.1
  Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
    oob-resync timeout 40
    Hello due in 00:00:05
  Supports Link-local Signaling (LLS)
  Cisco NSF helper support enabled
  IETF NSF helper support enabled
  Index 1/1, flood queue length 0
  Next 0x0(0)/0x0(0)
  Last flood scan length is 1, maximum is 1
  Last flood scan time is 0 msec, maximum is 0 msec
  Neighbor Count is 1, Adjacent neighbor count is 1
    Adjacent with neighbor 10.10.10.2  (Designated Router)
  Suppress hello for 0 neighbor(s)
  Cryptographic authentication enabled
      No key configured, using default key id 0
RT1#


Basically, the warning messages are caused by a configuration error. The ip ospf authentication message-digest interface subcommand enables MD5 authentication; however, the ip ospf authentication-key {passwd} interface subcommand defines a key for simple password authentication, not for MD5 authentication.

As a result, MD5 authentication is enabled but no key is defined for it, so an implicit empty / null key with key ID 0 is used for the authentication. That is also what the log messages say. The OSPF adjacency still forms because both routers authenticate with the same empty / null key.
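
To make this kind of misconfiguration easier to spot, here is a minimal Python sketch that checks an interface configuration for MD5 authentication being enabled without an MD5 key. The two sample configurations and the md5_key_missing helper are hypothetical (the actual router configurations are not shown in this post). The corrected sample assumes the usual fix, the ip ospf message-digest-key {key-id} md5 {key} interface subcommand. The check only looks at interface-level commands and ignores area-level authentication configured under router ospf.

```python
def md5_key_missing(interface_config: str) -> bool:
    """Return True if MD5 authentication is enabled but no MD5 key is defined."""
    lines = [line.strip() for line in interface_config.splitlines()]
    md5_enabled = "ip ospf authentication message-digest" in lines
    md5_key_defined = any(
        line.startswith("ip ospf message-digest-key ") for line in lines
    )
    return md5_enabled and not md5_key_defined


# Hypothetical configuration resembling the misconfiguration described above:
# MD5 is enabled, but only a simple-password key is defined.
broken = """
interface GigabitEthernet0/0
 ip address 10.10.10.1 255.255.255.0
 ip ospf authentication message-digest
 ip ospf authentication-key passwd1
"""

# Hypothetical corrected configuration with an explicit MD5 key (key ID 1).
fixed = """
interface GigabitEthernet0/0
 ip address 10.10.10.1 255.255.255.0
 ip ospf authentication message-digest
 ip ospf message-digest-key 1 md5 passwd1
"""

print(md5_key_missing(broken))  # True
print(md5_key_missing(fixed))   # False
```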

Tuesday, August 4, 2015

Cisco Prime Infrastructure 2.x False Alarms on High Memory Utilization

Let's first look at the hardware specification of the Cisco Catalyst 3750-X series switches.
According to the data sheet below, the DRAM size of a WS-C3750X-48P-L is 256MB.

http://www.cisco.com/c/en/us/products/collateral/switches/catalyst-3750-x-series-switches/data_sheet_c78-584733.html

Now let's look at the output of the show version and show memory statistics commands from a WS-C3750X-48P-L with an uptime of 3 minutes.
Switch#show version | include IOS|Compiled|uptime|of memory
Cisco IOS Software, C3750E Software (C3750E-UNIVERSALK9-M), Version 12.2(55)SE10, RELEASE SOFTWARE (fc2)
Compiled Wed 11-Feb-15 11:17 by prod_rel_team
Switch uptime is 3 minutes
cisco WS-C3750X-48P (PowerPC405) processor (revision A0) with 262144K bytes of memory.
Switch#
Switch#show memory statistics
                Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
Processor    3EC3C38   187620168    44631960   142988208   142471592   126184352
      I/O    E000000    16777216    10787060     5990156     5990156     5979600
Driver te    2400000     4194304          44     4194260     4194260     4194260
Switch#

The output of the show version command confirms 256MB of DRAM (262144K bytes).
The output of the show memory statistics command shows that Processor memory totals about 188MB (187,620,168 bytes), which is the biggest portion of the switch's 256MB of memory.
The formula for calculating the utilization percentage is Used / Total x 100, so from the same output we can also see that:
  • The utilization of the Processor memory = 44631960 / 187620168 x 100 = 23.8%
  • The utilization of the I/O memory = 10787060 / 16777216 x 100 = 64.3%
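
The same calculation can be scripted. Below is a minimal Python sketch that applies Used / Total x 100 to each row of the show memory statistics output above, printing one decimal place. The memory_utilization function name and the column-counting approach are my own assumptions for illustration; it is not how any Cisco tool parses this output.

```python
SHOW_MEMORY_STATISTICS = """\
                Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
Processor    3EC3C38   187620168    44631960   142988208   142471592   126184352
      I/O    E000000    16777216    10787060     5990156     5990156     5979600
Driver te    2400000     4194304          44     4194260     4194260     4194260
"""


def memory_utilization(output: str) -> dict:
    """Map each memory pool name to its utilization percentage (Used / Total x 100)."""
    pools = {}
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 7:
            continue
        # A pool name may contain a space ("Driver te"), so count from the right:
        # Head, Total, Used, Free, Lowest, Largest are always the last six fields.
        name = " ".join(fields[:-6])
        total, used = int(fields[-5]), int(fields[-4])
        pools[name] = used / total * 100
    return pools


for name, percent in memory_utilization(SHOW_MEMORY_STATISTICS).items():
    print(f"{name}: {percent:.1f}%")
# Processor: 23.8%
# I/O: 64.3%
# Driver te: 0.0%
```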


According to the Cisco Live presentation BRKCRS-3141, Troubleshooting Cisco Catalyst 2960, 3560 and 3750 Series Switches, there are 2 types of memory:
  • Processor memory is the memory used by Cisco IOS (the operating system of Cisco routers and switches).
  • I/O memory is used for traffic sent to the CPU.
    I/O memory is not used for normal packet switching, that is, the forwarding of end-user traffic (a.k.a. the data plane).
    I/O memory is used for packets bound for the CPU of the Cisco device, e.g. CDP, STP, OSPF, and EIGRP packets (a.k.a. the control plane).
FYI, unlike on Cisco routers, we can't tune the I/O memory allocation on Cisco Catalyst switches: the memory-size iomem {i/o-memory-percentage} command is available on Cisco routers but not on Cisco Catalyst switches.


Now let's look at the Top N Memory Utilization graph from Cisco Prime Infrastructure 2.2.

It looks pretty scary and worrying, because the average memory utilization exceeds 75% and is shown in Alloy Orange.

Now let's look at a bug / caveat: CSCuo31707.

The bug description tells us that:
    Cisco Prime Infrastructure (PI) version 2.0 has a known issue in which the Top N Memory Utilization dashlet, which we just saw, raises false alarms by always showing 100% utilization for Cisco IOS-XR devices.
    The Top N Memory Utilization dashlet should show the actual memory utilization of the Cisco IOS-XR device, based on the Processor memory, which reflects the real status of the device's memory.
    For Cisco IOS devices, it is a known issue that Cisco Prime Infrastructure raises false alarms of high memory utilization by showing the utilization of I/O memory; it should instead show the actual memory utilization by referring to the utilization of Processor memory.

We can see that the status of the bug / caveat is still Open.
There are no known fixed software releases yet.
The severity of the bug / caveat is 6 Enhancement.


In my view, Cisco Prime Infrastructure showing high memory utilization for Cisco Catalyst switches based on the I/O memory, rather than the Processor memory, is actually a very serious false alarm problem.
It monitors the wrong thing and then reports problems for it, making people hoo-hah and making me explain to people again and again...

Imagine you bought a nice luxury car, and the temperature gauge shows red at 80% after you have driven for 10 minutes.
You quickly stop the car at a safe place and have it towed to the service center.

The mechanic tells you...
Mechanic: Hi Mr. Customer, that is nothing to worry about. It is a software problem in the dashboard system; the actual car temperature is only 40%, although the dashboard shows 80%.
You: When will the software fix be available?
Mechanic: Sorry Mr. Customer, because this is a Severity 6 cosmetic bug, which is classified as Enhancement, no date has been committed yet. Maybe our programmers will start to look into it after resolving all the other S1, S2, S3, S4, and S5 bugs.
You: Oh well, how can I know the actual car temperature for the time being?
Mechanic: Sorry Mr. Customer, no effective workaround is available at the moment. Perhaps when you see smoke coming out, that most probably means the car has overheated.
You: ...


I came across this bug / caveat back in Nov/2014; below is the status of the same bug / caveat as I saw it then.
We can see that Cisco has known about this problem since at least 20/Oct/2014.
How difficult is it to change the memory utilization monitoring code to refer to Processor memory instead of I/O memory?
It is now Aug/2015 and Cisco PI 2.2 is already available; how much longer do I have to wait?!?
Cisco, are you serious about getting network monitoring done right?

Saturday, January 17, 2015

Packet Loss and TCP Retransmissions

Seeing TCP retransmissions in network capture traces is very common.
TCP retransmissions are generally not considered a good sign, as they most probably happen because some packets were dropped; however, they are not always a bad sign, nor are they always the cause of an application slowness problem.

Packet drops can happen for many reasons.
Network protocol designers and implementers deal with packet drops by devising TCP recovery algorithms and mechanisms that quickly recover the lost packets, in order to improve the overall response time for upper-layer protocols and applications.
Effective and efficient TCP recovery algorithms and mechanisms are like good shock absorbers.
With good shock absorbers, the passengers in a car feel calm and steady when the car passes over bumpy roads.
In layman's terms, we know that the roads outside can never be 100% smooth and even (packet drops can easily happen), and therefore people make good shock absorbers (good TCP recovery algorithms and mechanisms) so that passengers feel comfortable inside their cars.

Some of the RFCs related to TCP recovery are as below:
  • RFC 2018 - TCP Selective Acknowledgment Options
  • RFC 2988 - Computing TCP's Retransmission Timer
  • RFC 2001 - TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms
The 1st screenshot below shows how a good TCP recovery mechanism ideally works in a packet drop scenario.
In this particular scenario, there are:
  • 8 x TCP Previous segment not captured are reported (which generally means 8 packets or more are dropped). Packet#2996, Packet#3001, Packet#3006, Packet#3010, Packet#3016, Packet#3022, Packet#3055, Packet#3061.
  • 13 x TCP Retransmission are reported. Packet#3072, Packet#3074, Packet#3077, Packet#3078, Packet#3079, Packet#3080, Packet#3085, Packet#3086, Packet#3087, Packet#3091, Packet#3092, Packet#3093, Packet#3094.
  • 36 x TCP Dup ACK are reported.
8 or more packets dropped.
Sounds like a bumpy road, doesn't it?
Did it cause slowness for the application? (TCP/1433 - MS SQL in this case)

The answer is no.
We can see that the packets started crossing the bumpy road at 19:35:27.405, and the road became smooth again at 19:35:27.411.
That is 6ms.
1 second consists of 1000ms.
The user would not feel any slowness from this.
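
For anyone who wants to double-check such timing from capture timestamps, here is a small Python sketch that subtracts the two timestamps quoted above. The variable names are mine; only the two timestamps come from the trace.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%H:%M:%S.%f"

# Timestamps of the first flagged packet and of the return to normal flow.
bumpy_start = datetime.strptime("19:35:27.405", TIME_FORMAT)
smooth_again = datetime.strptime("19:35:27.411", TIME_FORMAT)

# Integer division by a 1 ms timedelta gives whole milliseconds.
recovery_ms = (smooth_again - bumpy_start) // timedelta(milliseconds=1)
print(recovery_ms)  # 6
```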



Now let's look at a lousy TCP recovery implementation, one that is unable to recover lost packets quickly after packet drops, and that has caused slowness for the upper-layer application and eventually hurt the user experience.



In this particular scenario, there are:
  • 1 x TCP Previous segment not captured is reported. Packet#17655.
  • 1 x TCP Retransmission is reported. Packet#17999.
  • 3 x TCP Dup ACK are reported. Packet#17661, Packet#17662, Packet#18012.
That particular TCP retransmission was confirmed to be due to a packet dropped by a Juniper NetScreen firewall because of its TCP Sequence Number Checking security feature.

Whether TCP Previous segment not captured messages appear in Wireshark depends on whether the packet trace file was captured near the Sender or near the Receiver. Assuming a packet is dropped along the path from the Sender to the Receiver, the packet is seen in the trace captured near the Sender but not in the trace captured near the Receiver, where Wireshark flags the next packet that arrives at the Receiver with a TCP Previous segment not captured message.
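
The idea can be sketched in a few lines of Python. This is a deliberately simplified model, not Wireshark's actual TCP analysis code: it tracks the next expected sequence number and flags any segment that starts beyond it. The flag_gaps function and the sample segments are hypothetical.

```python
def flag_gaps(segments):
    """Return the sequence numbers of segments that arrive after a gap.

    segments is a list of (sequence_number, payload_length) tuples in
    capture order, for one direction of a single TCP connection.
    """
    flagged = []
    next_expected = None
    for seq, length in segments:
        if next_expected is not None and seq > next_expected:
            flagged.append(seq)  # something before this segment is missing
        segment_end = seq + length
        if next_expected is None or segment_end > next_expected:
            next_expected = segment_end
    return flagged


# Hypothetical traces of the same flow; the segment starting at 201 is
# dropped somewhere between the Sender and the Receiver.
sender_capture = [(1, 100), (101, 100), (201, 100), (301, 100)]
receiver_capture = [(1, 100), (101, 100), (301, 100)]

print(flag_gaps(sender_capture))    # []
print(flag_gaps(receiver_capture))  # [301]
```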

In this particular scenario, the TCP retransmission process, as part of the TCP recovery mechanism for a single packet drop, took 329ms.

The packet drops happened frequently and consistently because of the firewall security feature.
Assuming each TCP recovery takes 333ms, it can generally take up to 1 second to recover from 3 packet drops.
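
The arithmetic is simple enough to sketch in Python. The total_stall_ms helper is hypothetical, and it assumes the drops are recovered one after another rather than in parallel, which is a worst-case simplification.

```python
def total_stall_ms(packet_drops: int, recovery_ms_per_drop: int) -> int:
    """Added delay if each packet drop is recovered one after another."""
    return packet_drops * recovery_ms_per_drop


print(total_stall_ms(1, 329))  # 329
print(total_stall_ms(3, 333))  # 999
```

At roughly a third of a second per drop, it only takes a handful of drops in one transaction before the user starts to notice.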

Why can lousy TCP recovery implementations still be found on modern networks?
Most probably this particular application (MQ on IBM AS/400) was designed for LAN environments, where packet loss is assumed to be minimal, and the TCP/IP stack of the operating system was not stress-tested in high packet loss environments.

In this particular case, disabling the feature with the set flow no-tcp-seq-check command resolved the packet drops and, eventually, the application slowness, and the end users are happy. :-)

Take Home Lesson:
Packet loss always happens.
If TCP recovery after packet loss happens quickly enough, users will not feel anything.

Monday, December 15, 2014

Cisco Bug Toolkit Inconsistent Info

For CSCtr19078, the Cisco Bug Toolkit lists 15.0(1)M7.2 as the only known fixed release.


However, the Release Notes for Cisco IOS Release 15.4M&T (http://www.cisco.com/c/en/us/td/docs/ios/15_4m_and_t/release/notes/15_4m_and_t/154-3MCAVS.html) list CSCtr19078 as also resolved in Cisco IOS Release 15.4(3)M (released on 22/Jul/2014).

Tuesday, March 11, 2014

Cisco WCS7.0.240.0 httpd version

[root@localhost ~]# cd /opt/WCS7.0.240.0/webnms/apache/bin
[root@localhost bin]# pwd
/opt/WCS7.0.240.0/webnms/apache/bin
[root@localhost bin]# 
[root@localhost bin]# ./httpd -v
./httpd: error while loading shared libraries: libaprutil-1.so.0: cannot open shared object file: No such file or directory
[root@localhost bin]# 
[root@localhost bin]# ldd httpd
        linux-gate.so.1 =>  (0x00ba4000)
        libz.so.1 => /lib/libz.so.1 (0x00abe000)
        libm.so.6 => /lib/libm.so.6 (0x00a8c000)
        libaprutil-1.so.0 => not found
        libexpat.so.0 => /lib/libexpat.so.0 (0x00d81000)
        libapr-1.so.0 => not found
        libuuid.so.1 => /lib/libuuid.so.1 (0x03c59000)
        librt.so.1 => /lib/librt.so.1 (0x00aef000)
        libcrypt.so.1 => /lib/libcrypt.so.1 (0x042b0000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x00ad3000)
        libdl.so.2 => /lib/libdl.so.2 (0x00ab7000)
        libc.so.6 => /lib/libc.so.6 (0x00930000)
        /lib/ld-linux.so.2 (0x00911000)
[root@localhost bin]# 
[root@localhost bin]# find / -name libapr*
/opt/WCS7.0.240.0/webnms/apache/lib/libapr-1.so.0
/opt/WCS7.0.240.0/webnms/apache/lib/libapr-1.so.0.4.5
/opt/WCS7.0.240.0/webnms/apache/lib/libapr-1.a
/opt/WCS7.0.240.0/webnms/apache/lib/libapr-1.so
/opt/WCS7.0.240.0/webnms/apache/lib/libaprutil-1.so
/opt/WCS7.0.240.0/webnms/apache/lib/libapr-1.la
/opt/WCS7.0.240.0/webnms/apache/lib/libaprutil-1.la
/opt/WCS7.0.240.0/webnms/apache/lib/libaprutil-1.so.0
/opt/WCS7.0.240.0/webnms/apache/lib/libaprutil-1.so.0.3.12
/opt/WCS7.0.240.0/webnms/apache/lib/libaprutil-1.a
[root@localhost bin]# 
[root@localhost bin]# cd /etc/ld.so.conf.d/
[root@localhost ld.so.conf.d]# echo /opt/WCS7.0.240.0/webnms/apache/lib > httpd-lib.conf
[root@localhost ld.so.conf.d]# rm /etc/ld.so.cache
rm: remove regular file `/etc/ld.so.cache'? y
[root@localhost ld.so.conf.d]# 
[root@localhost ld.so.conf.d]# /sbin/ldconfig
/sbin/ldconfig: /opt/WCS7.0.240.0/webnms/apache/lib/libaprutil-1.so.0 is not a symbolic link

/sbin/ldconfig: /opt/WCS7.0.240.0/webnms/apache/lib/libapr-1.so.0 is not a symbolic link

[root@localhost ld.so.conf.d]# 
[root@localhost ld.so.conf.d]# /opt/WCS7.0.240.0/webnms/apache/bin/httpd -v
Server version: Apache/2.2.21 (Unix)
Server built:   Sep 20 2011 11:25:33
[root@localhost ld.so.conf.d]# 

Monday, July 22, 2013

ACS 5.x RADIUS External Identity Store Identity Caching + AAA Authorization

When a RADIUS identity server is configured as an external identity store, you may face AAA authorization problems on Cisco IOS and/or NX-OS AAA clients.

Enable Identity Caching for the external RADIUS identity server to solve the problem.


The following shows the impact of disabling the Identity Caching feature.

Friday, July 19, 2013

ACS 4.2 NAS-IP-Address + ACS 5.4 Client-IP-Address + FreeRADIUS huntgroups


Basic FreeRADIUS huntgroups configuration
Add the following configuration to the bottom of the corresponding configuration files.
/etc/raddb/clients.conf
client ACS4.2 {
   ipaddr = 192.168.18.51
   secret = rad456
}

client ACS5.4 {
   ipaddr = 192.168.18.61
   secret = rad456
}
/etc/raddb/users
raduser1 Cleartext-Password := "cisco123", Huntgroup-Name == "DEFAULT-HUNT"
/etc/raddb/huntgroups
DEFAULT-HUNT NAS-IP-Address == 192.168.18.51
DEFAULT-HUNT Client-IP-Address == 192.168.18.61


Cisco ACS 4.2 > FreeRADIUS RADIUS Access-Request Packet:





Cisco ACS 5.4 > FreeRADIUS RADIUS Access-Request Packet:

Monday, July 15, 2013

Installing FreeRADIUS 2.1.12 on Red Hat Enterprise Linux 5.8 (32-bit)

1. Insert the RHEL/5.8 i386 DVD.
2. Issue the following commands in sequence.
rpm -vhU /media/RHEL_5.8\ i386\ DVD/Server/libtool-ltdl-1.5.22-7.el5_4.i386.rpm
rpm -vhU /media/RHEL_5.8\ i386\ DVD/Server/freeradius2-2.1.12-3.el5.i386.rpm

3. Command outputs:
[root@localhost /]# rpm -vhU /media/RHEL_5.8\ i386\ DVD/Server/libtool-ltdl-1.5.22-7.el5_4.i386.rpm 
warning: /media/RHEL_5.8 i386 DVD/Server/libtool-ltdl-1.5.22-7.el5_4.i386.rpm: Header V3 DSA signature: NOKEY, key ID 37017186
Preparing...                ########################################### [100%]
   1:libtool-ltdl           ########################################### [100%]
[root@localhost /]# 
[root@localhost /]# rpm -vhU /media/RHEL_5.8\ i386\ DVD/Server/freeradius2-2.1.12-3.el5.i386.rpm 
warning: /media/RHEL_5.8 i386 DVD/Server/freeradius2-2.1.12-3.el5.i386.rpm: Header V3 DSA signature: NOKEY, key ID 37017186
Preparing...                ########################################### [100%]
   1:freeradius2            ########################################### [100%]
[root@localhost /]# 
[root@localhost ~]# service radiusd start
Starting RADIUS server:                                    [  OK  ]
[root@localhost ~]# 
[root@localhost /]# radiusd -v | grep Version
radiusd: FreeRADIUS Version 2.1.12, for host i386-redhat-linux-gnu, built on Jan  5 2012 at 18:30:57
[root@localhost /]#