Wednesday, October 26, 2011

BGP Route Dampening

Route dampening provides a mechanism to control route instability caused by route flapping, which continuously generates BGP UPDATE messages (advertisements and withdrawals) across the internetwork. The volume of routing updates can consume considerable network bandwidth and router resources. A route flaps when it transitions from the up state to the down state, that is, when the prefix is withdrawn.

Routes are categorized as either well-behaved or ill-behaved. A well-behaved route shows a high level of stability over an extended period of time, while an ill-behaved route experiences a high level of instability in a short period of time. An unstable, ill-behaved route is penalized and suppressed (not advertised) until there is some level of confidence that it has become stable. Note: Route dampening applies to EBGP routes only.

The recent history of a route is used to estimate its future stability. Dampening tracks the number of times a route has flapped over a period of time, and the route is assigned a penalty each time it flaps. When the penalty exceeds a predefined threshold, the suppress limit, the route is suppressed. A route can continue to accumulate penalties even after it is suppressed. The more frequently a route flaps in a short period of time, the sooner it is suppressed.
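To make the effect of flap frequency concrete, here is a minimal Python sketch. It assumes the Cisco defaults listed later in this post (1000 per flap, suppress limit 2000, 15-minute half-life) and a continuous exponential decay of the penalty between flaps; the function name flaps_until_suppressed is hypothetical, and IOS's internal arithmetic may differ.

```python
from typing import Optional

# Assumed values: Cisco defaults for penalty per flap, suppress limit and half-life.
PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000
HALF_LIFE_S = 15 * 60


def flaps_until_suppressed(interval_s: float, max_flaps: int = 100) -> Optional[int]:
    """Flap count at which the penalty first exceeds the suppress limit when the
    route flaps every interval_s seconds, or None if it never does."""
    penalty = 0.0
    for flap in range(1, max_flaps + 1):
        penalty += PENALTY_PER_FLAP  # each flap adds a fixed penalty
        if penalty > SUPPRESS_LIMIT:
            return flap
        # Exponential decay until the next flap (assumed continuous).
        penalty *= 2 ** (-interval_s / HALF_LIFE_S)
    return None


for interval in (30, 300, 720, 900):
    print(f"flap every {interval:>3} s -> suppressed at flap {flaps_until_suppressed(interval)}")
```

Under these assumptions, a route flapping every 30 seconds is suppressed at its 3rd flap, about one minute after the first, while a route flapping every 12 minutes needs 4 flaps spread over 36 minutes. A route that flaps exactly once per half-life is never suppressed, because its penalty only approaches 2000 from below.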

An algorithm decays the penalty exponentially over time so that a suppressed route can eventually be unsuppressed and readvertised. The algorithm is driven by the following user-defined parameters:
Half-life: the amount of time, in minutes, that must elapse for the penalty to be reduced by one half. A longer half-life might be desirable for a route that habitually oscillates. A larger half-life causes the penalty to decay more slowly, so the route stays suppressed longer.
Suppress limit: a numeric value that is compared with the penalty. If the penalty is greater than the suppress limit, the route is suppressed.
Reuse limit: a numeric value that is compared with the penalty. If the penalty is less than the reuse limit, a suppressed route that is up is no longer suppressed.
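The three parameters can be sketched in Python as below. This is an illustrative model, not Cisco's implementation: it assumes continuous exponential decay (the penalty halves once per half-life) and takes the default suppress and reuse limits of 2000 and 750 as assumed defaults; the function names are hypothetical.

```python
def decayed(penalty: float, elapsed_min: float, half_life_min: float) -> float:
    """Penalty left after elapsed_min minutes: it halves once per half-life."""
    return penalty * 2 ** (-elapsed_min / half_life_min)


def is_suppressed(currently_suppressed: bool, penalty: float,
                  suppress_limit: float = 2000, reuse_limit: float = 750) -> bool:
    """Suppress/reuse hysteresis: an unsuppressed route is suppressed once the
    penalty exceeds the suppress limit; a suppressed route is released only
    when the penalty drops below the reuse limit."""
    if currently_suppressed:
        return penalty >= reuse_limit
    return penalty > suppress_limit


# A longer half-life slows the decay, keeping a route suppressed longer.
print(decayed(3000, 30, 15))  # 750.0
print(decayed(3000, 30, 30))  # 1500.0
```

Because of the gap between the two limits, a penalty of 1500 leaves an advertised route advertised but keeps an already suppressed route suppressed.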

BGP Route Dampening Penalty Assessment

When BGP routes are redistributed into an IGP or injected into IBGP, it is important to ensure that EBGP instability does not affect internal routing and cause a meltdown inside the AS. Flapping routes are suppressed and prevented from being injected into the AS until they have shown some level of stability.
Note: Route dampening applies only to routes received from EBGP peers; it has no effect on external routes received from IBGP peers.

Route dampening is often implemented in ISP environments to prevent instabilities inside a customer network from burdening the provider network and the outside world, the Internet. This is not an issue when a provider advertises a customer network as part of an aggregate, which is stable and always advertised even if most of its more-specific component routes are not. When a customer network cannot be aggregated, because it is multihomed or is not part of the provider's address space, its instabilities are carried to the outside world.

A possible side effect of route dampening in ISP environments is that a customer may experience outages even after its routes have become stable. If administrators are unaware that their routes are being dampened, and that this has made some subnets unreachable from the outside world, they might try to resolve the problem by troubleshooting the IGP, resetting BGP sessions, and so on, which makes their routes flap even more and accumulate further penalties. A better approach is to ask the provider whether it is receiving the routes and, if it is, why they are not being advertised. Providers have strict policies and might not change their dampening behavior at a customer's request, but a provider can flush the dampening history of the affected routes so that they are advertised again.

The bgp dampening [half-life reuse suppress max-suppress-time | route-map map-name] BGP router subcommand enables BGP route dampening and/or modifies its parameters. The max-suppress-time is the maximum amount of time, in minutes, that a route can be suppressed; when it is configured, the accumulated penalty never exceeds a maximum penalty, regardless of how many times the route flaps. The maximum penalty is computed with the following formula:
maximum penalty = reuse limit × 2^(max-suppress-time / half-life)
With the default values this gives 750 × 2^(60/15) = 750 × 16 = 12000, which matches the Max suppress penalty in the show ip bgp dampening parameters output below.
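The max-suppress-time cap can be checked with a few lines of Python, assuming the commonly documented Cisco relationship maximum penalty = reuse limit × 2^(max-suppress-time / half-life): a route sitting at the maximum penalty decays to the reuse limit in exactly max-suppress-time minutes. The function names are hypothetical, and clamping each charge with min() is an assumed model of how the ceiling is enforced.

```python
def max_penalty(reuse_limit: float, half_life_min: float, max_suppress_min: float) -> float:
    """Largest penalty a route can accumulate: decaying from it for
    max_suppress_min minutes lands exactly on the reuse limit."""
    return reuse_limit * 2 ** (max_suppress_min / half_life_min)


def charge(penalty: float, amount: float, ceiling: float) -> float:
    """Add a flap penalty but never exceed the ceiling (assumed clamping)."""
    return min(penalty + amount, ceiling)


ceiling = max_penalty(750, 15, 60)
print(ceiling)                       # 12000.0 with the default parameters
print(charge(11500, 1000, ceiling))  # 12000.0, capped
```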

The Cisco defaults for the various route dampening variables are as below:
Penalty – 1000 per flap
Suppress limit – 2000
Reuse limit – 750
Half-life – 15 minutes
Maximum suppress time – 60 minutes (4 x half-life)
Router#sh ip bgp dampening parameters
 dampening 15 750 2000 60 (DEFAULT)
  Half-life time      : 15 mins       Decay Time       : 2320 secs
  Max suppress penalty: 12000         Max suppress time: 60 mins
  Suppress penalty    :  2000         Reuse penalty    : 750

Router#

The process of reducing the penalty happens every 5 seconds. The process of unsuppressing routes happens every 10 seconds.
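The two periodic processes can be simulated as a sketch under stated assumptions: the per-tick decay factor 2^(-5/900) for a 15-minute half-life and the default reuse limit of 750 are assumed, the function name is hypothetical, and the real IOS decay arithmetic may differ slightly. Starting from a suppressed penalty, the simulation decays the penalty every 5 seconds and checks for reuse every 10 seconds.

```python
HALF_LIFE_S = 15 * 60
DECAY_TICK_S = 5    # penalty decay interval
REUSE_TICK_S = 10   # unsuppress check interval
DECAY_PER_TICK = 2 ** (-DECAY_TICK_S / HALF_LIFE_S)


def seconds_until_unsuppressed(penalty: float, reuse_limit: float = 750) -> int:
    """Simulate 5-second decay steps and a 10-second reuse check; return the
    elapsed seconds at which a suppressed route would be released."""
    if reuse_limit <= 0:
        raise ValueError("reuse_limit must be positive")
    elapsed = 0
    while True:
        elapsed += DECAY_TICK_S
        penalty *= DECAY_PER_TICK
        # Reuse is only evaluated on the slower 10-second cycle.
        if elapsed % REUSE_TICK_S == 0 and penalty < reuse_limit:
            return elapsed


t = seconds_until_unsuppressed(2829)
print(f"released after {t // 60} min {t % 60} s")
```

Starting from a penalty of 2829 (the value logged later in this post when 172.16.2.0/24 is suppressed), the sketch reports a release after about 28 minutes 50 seconds, close to the lab timestamps.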

A route map can be associated with route dampening to apply the dampening parameters selectively to routes that meet certain criteria, e.g., a match on a specific IP prefix, AS_PATH, or community.



Network Setup for BGP Route Dampening

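The configuration below implements BGP route dampening with a route map called SELECTIVE_DAMPENING, which applies dampening to 172.16.2.0/24 only. All other routes are not dampened when they flap. The debug ip bgp dampening command is restricted with access list 1 so that only dampening events for 172.16.2.0/24 are displayed.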
RT2#sh ip bgp
BGP table version is 4, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 172.16.1.0/24    12.12.12.1               0             0 1 i
*> 172.16.2.0/24    12.12.12.1               0             0 1 i
*> 172.16.3.0/24    12.12.12.1               0             0 1 i
RT2#
RT2#debug ip bgp dampening ?
  <1-199>      Access list
  <1300-2699>  Access list (expanded range)
  <cr>

RT2#debug ip bgp dampening 1
BGP dampening debugging is on for access list 1
RT2#
RT2#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
RT2(config)#access-list 1 permit 172.16.2.0 0.0.0.255
RT2(config)#
RT2(config)#ip prefix-list dampening-route permit 172.16.2.0/24
RT2(config)#route-map SELECTIVE_DAMPENING permit 10
RT2(config-route-map)#match ip address prefix-list dampening-route
RT2(config-route-map)#set dampening 15 750 2000 60
RT2(config-route-map)#
RT2(config-route-map)#router bgp 2
RT2(config-router)#bgp dampening route-map SELECTIVE_DAMPENING
RT2(config-router)#end
RT2#
00:01:49: BGP(0): Created dampening structures with halflife time 15, reuse/suppress 750/2000
RT2#
Note: No additional route-map SELECTIVE_DAMPENING entry (such as permit 20) is required for this scenario, because routes that do not match the route map are simply not dampened.
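The matching logic of this configuration can be modeled in Python as a hypothetical sketch: the set and tuple below stand in for the prefix list dampening-route and the set dampening 15 750 2000 60 clause, and a prefix list entry without ge/le is treated as an exact match on both prefix and length. A route that does not match gets no dampening parameters, mirroring the statement that all other routes are not dampened.

```python
import ipaddress
from typing import NamedTuple, Optional


class DampeningParams(NamedTuple):
    half_life_min: int
    reuse: int
    suppress: int
    max_suppress_min: int


# Stand-in for "ip prefix-list dampening-route permit 172.16.2.0/24"
# (no ge/le, so only this exact prefix and length match).
DAMPENING_ROUTE = {ipaddress.ip_network("172.16.2.0/24")}

# Stand-in for "set dampening 15 750 2000 60" in SELECTIVE_DAMPENING permit 10.
SELECTIVE_DAMPENING_10 = DampeningParams(15, 750, 2000, 60)


def dampening_params_for(prefix: str) -> Optional[DampeningParams]:
    """Return the dampening parameters for a received prefix, or None when the
    prefix matches no route-map entry and is therefore not dampened."""
    if ipaddress.ip_network(prefix) in DAMPENING_ROUTE:
        return SELECTIVE_DAMPENING_10
    return None


for p in ("172.16.1.0/24", "172.16.2.0/24", "172.16.3.0/24"):
    print(p, dampening_params_for(p))
```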

When BGP receives a withdrawal for a prefix, it counts the withdrawal as a flap and increases the penalty by 1000; when it receives an attribute change, it increases the penalty by 500. BGP keeps the withdrawn prefix in the BGP table as a history entry. The output below shows the 1st flap.
RT2#! RT1 shuts Lo2
00:02:47: EvD: charge penalty 1000, new accum. penalty 1000, flap count 1
00:02:47: BGP(0): charge penalty for 172.16.2.0/24 path 1 with halflife-time 15 reuse/suppress 750/2000
00:02:47: BGP(0): flapped 1 times since 00:00:00. New penalty is 1000
RT2#sh ip bgp 172.16.2.0
BGP routing table entry for 172.16.2.0/24, version 5
Paths: (1 available, no best path)
  Not advertised to any peer
  1 (history entry)
    12.12.12.1 from 12.12.12.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, external
      Dampinfo: penalty 1000, flapped 1 times in 00:00:03
RT2#
00:02:49: EvD: accum. penalty decayed to 1000 after 2 second(s)
00:02:50: EvD: accum. penalty decayed to 1000 after 1 second(s)
RT2#

The output below shows the 2nd flap.
RT2#! RT1 no shuts Lo2
00:03:16: EvD: accum. penalty 980, not suppressed
RT2#sh ip bgp 172.16.2.0
BGP routing table entry for 172.16.2.0/24, version 6
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Flag: 0x820
  Not advertised to any peer
  1
    12.12.12.1 from 12.12.12.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, external, best
      Dampinfo: penalty 976, flapped 1 times in 00:00:34
RT2#
00:03:21: EvD: accum. penalty decayed to 976 after 5 second(s)
RT2#! RT1 shuts Lo2
00:03:44: EvD: accum. penalty decayed to 961 after 23 second(s)
00:03:44: EvD: charge penalty 1000, new accum. penalty 1961, flap count 2
00:03:44: BGP(0): charge penalty for 172.16.2.0/24 path 1 with halflife-time 15 reuse/suppress 750/2000
00:03:44: BGP(0): flapped 2 times since 00:00:57. New penalty is 1961
RT2#sh ip bgp 172.16.2.0
BGP routing table entry for 172.16.2.0/24, version 7
Paths: (1 available, no best path)
  Not advertised to any peer
  1 (history entry)
    12.12.12.1 from 12.12.12.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, external
      Dampinfo: penalty 1953, flapped 2 times in 00:01:02
RT2#
00:03:49: EvD: accum. penalty decayed to 1953 after 5 second(s)
00:03:49: EvD: accum. penalty decayed to 1953 after 0 second(s)
RT2#

The output below shows the 3rd flap.
RT2#! RT1 no shuts Lo2
00:04:12: EvD: accum. penalty 1923, not suppressed
RT2#sh ip bgp 172.16.2.0
BGP routing table entry for 172.16.2.0/24, version 8
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Flag: 0x820
  Not advertised to any peer
  1
    12.12.12.1 from 12.12.12.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, external, best
      Dampinfo: penalty 1923, flapped 2 times in 00:01:29
RT2#
00:04:16: EvD: accum. penalty decayed to 1923 after 4 second(s)
RT2#! RT1 shuts Lo2
00:04:41: EvD: accum. penalty decayed to 1886 after 25 second(s)
00:04:41: EvD: charge penalty 1000, new accum. penalty 2886, flap count 3
00:04:41: BGP(0): charge penalty for 172.16.2.0/24 path 1 with halflife-time 15 reuse/suppress 750/2000
00:04:41: BGP(0): flapped 3 times since 00:01:54. New penalty is 2886
RT2#sh ip bgp 172.16.2.0
BGP routing table entry for 172.16.2.0/24, version 9
Paths: (1 available, no best path)
Flag: 0x820
  Not advertised to any peer
  1 (history entry)
    12.12.12.1 from 12.12.12.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, external
      Dampinfo: penalty 2874, flapped 3 times in 00:01:59
RT2#
00:04:46: EvD: accum. penalty decayed to 2874 after 5 second(s)
00:04:49: EvD: accum. penalty decayed to 2874 after 3 second(s)
RT2#
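The penalty arithmetic of the three withdrawals above can be approximately reproduced in Python from the debug timestamps. This sketch assumes continuous exponential decay with a 15-minute half-life and a penalty of 1000 per withdrawal, so its numbers differ slightly from the IOS values; the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

HALF_LIFE_S = 15 * 60      # assumed default half-life
WITHDRAWAL_PENALTY = 1000  # penalty per withdrawal (flap)
SUPPRESS_LIMIT = 2000


def to_seconds(stamp: str) -> int:
    """Convert an IOS uptime stamp such as 00:02:47 to seconds."""
    h, m, s = (int(part) for part in stamp.split(":"))
    return h * 3600 + m * 60 + s


def replay(withdrawals: List[str], readvertised: str) -> Tuple[float, bool]:
    """Charge a penalty per withdrawal, decaying between events; return the
    penalty when the route is readvertised and whether it gets suppressed."""
    if not withdrawals:
        raise ValueError("need at least one withdrawal")
    penalty = 0.0
    last: Optional[int] = None
    for stamp in withdrawals:
        now = to_seconds(stamp)
        if last is not None:
            penalty *= 2 ** (-(now - last) / HALF_LIFE_S)
        penalty += WITHDRAWAL_PENALTY
        last = now
        print(f"{stamp}: penalty {penalty:.0f}")
    penalty *= 2 ** (-(to_seconds(readvertised) - last) / HALF_LIFE_S)
    return penalty, penalty > SUPPRESS_LIMIT


final, suppressed = replay(["00:02:47", "00:03:44", "00:04:41"], "00:05:11")
print(f"00:05:11: penalty {final:.0f}, suppressed={suppressed}")
```

The sketch yields penalties of about 1000, 1957 and 2873 after the three withdrawals and about 2807 at 00:05:11, compared with 1961, 2886 and 2829 logged by IOS. Either way the penalty is above the suppress limit of 2000 when the route comes back up.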

The output below shows that 172.16.2.0/24 is suppressed when it comes back up after the 3rd flap, because its penalty (2829) now exceeds the suppress limit of 2000.
Dampened prefixes are neither used in the BGP decision process nor installed in the routing table.
RT2#! RT1 no shuts Lo2
00:05:11: BGP(0): suppress 172.16.2.0/24 path 1 for 00:28:40 (penalty 2829)
00:05:11: halflife-time 15, reuse/suppress 750/2000
00:05:11: EvD: accum. penalty 2829, now suppressed with a reuse intervals of 172
RT2#sh ip bgp 172.16.2.0
BGP routing table entry for 172.16.2.0/24, version 9
Paths: (1 available, no best path)
  Not advertised to any peer
  1, (suppressed due to dampening)
    12.12.12.1 from 12.12.12.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, external
      Dampinfo: penalty 2818, flapped 3 times in 00:02:30, reuse in 00:06:49
RT2#
00:05:17: EvD: accum. penalty decayed to 2818 after 6 second(s)
RT2#sh ip bgp dampening dampened-paths
BGP table version is 9, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          From             Reuse    Path
*d 172.16.2.0/24    12.12.12.1       00:06:39 1 i
RT2#
00:05:27: EvD: accum. penalty decayed to 2796 after 10 second(s)
RT2#sh ip bgp dampening flap-statistics
BGP table version is 9, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          From            Flaps Duration Reuse    Path
*d 172.16.2.0/24    12.12.12.1      3     00:02:48 00:06:29 1
RT2#
00:05:35: EvD: accum. penalty decayed to 2785 after 8 second(s)
RT2#

The output below shows that the route is unsuppressed after roughly two half-lives (2 x 15 minutes, about 30 minutes), the time needed for the penalty to decay from roughly 3000 to 1500 and then from 1500 to below the reuse limit of 750.
RT2#
00:34:01: EvD: accum. penalty decayed to 749 after 98 second(s)
00:34:01: EvD: accum. penalty 749, now unsuppressed
00:34:01: BGP(0): Unsuppressed 172.16.2.0/24, path 1
RT2#sh ip bgp 172.16.2.0
BGP routing table entry for 172.16.2.0/24, version 10
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Flag: 0x820
  Not advertised to any peer
  1
    12.12.12.1 from 12.12.12.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, external, best
      Dampinfo: penalty 746, flapped 3 times in 00:31:20
RT2#
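Because the penalty halves every half-life, the wait before reuse can also be estimated in closed form as time = half-life × log2(penalty / reuse limit). The Python sketch below (function name hypothetical, continuous decay assumed) gives exactly two half-lives for a penalty of 3000 and about 28.7 minutes for the logged penalty of 2829, in line with the 00:28:40 suppression period in the debug output. The actual release at 00:34:01 came about 10 seconds after 00:05:11 + 28:40, plausibly because reuse is only checked every 10 seconds.

```python
import math


def minutes_until_reuse(penalty: float, reuse_limit: float = 750,
                        half_life_min: float = 15) -> float:
    """Minutes for an exponentially decaying penalty to fall to the reuse limit."""
    if penalty <= reuse_limit:
        return 0.0
    return half_life_min * math.log2(penalty / reuse_limit)


print(minutes_until_reuse(3000))            # 30.0 (two half-lives)
print(round(minutes_until_reuse(2829), 1))  # 28.7
```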

The history entry stores route flap information that is used to monitor a route and calculate its oscillation level. When the route stabilizes, the history entry is no longer useful and can be flushed from the router with the clear ip bgp dampening privileged command.
