Friday, March 30, 2012

Protocol Independent Multicast (PIM)

A router forwards a unicast packet by looking up the destination address in its routing table and sending the packet out the appropriate interface. A multicast packet, however, may have to be forwarded out multiple interfaces. PIM is commonly implemented on multicast routers to dynamically build the distribution trees that determine the paths to deliver the multicast traffic to all receivers.

Multicast routers consider both the source and destination addresses of a multicast packet and use the distribution tree to forward the packet away from the source toward the destination. There are 2 types of distribution trees:
  • Source Tree – created for each source that is sending traffic to each multicast group. It has its root at the source and has branches throughout the network to the receivers. Source trees are also known as source-rooted or shortest path trees (SPTs) as the tree takes a direct and shortest path from the source to its receivers.
  • Shared Tree – a single tree that is shared between all sources for each multicast group. A shared tree has a single common root known as the Rendezvous Point (RP). Sources initially send their multicast packets to the RP, which in turn forwards data through a shared tree to the members of the group.
Source Distribution Tree

The figure above shows 2 source trees between Source 1, Receiver 1, and Receiver 2; as well as between Source 2, Receiver 1, and Receiver 2. The path between the source and receivers is the path with the lowest cost. Packets are forwarded according to the source and group address pair along the tree. The forwarding state associated with the source tree is identified using the notation (S, G) (pronounced as “S comma G”), where S is the IP address of the source and G is the multicast group address. A separate unique tree is built for every source S sending to group G.

Shared Distribution Tree

The figure above shows a shared distribution tree. RT3, the RP, is the root of the shared tree. The tree is built from RT3 to RT5 and RT6 toward Receiver 1 and Receiver 2. Source 1 and Source 2 send multicast packets toward the RP via source distribution trees; the packets are then forwarded from the RP to the receivers according to the shared distribution tree. The default forwarding state for the shared tree is identified using the notation (*, G) (pronounced as “star comma G”); where * is a wildcard entry meaning any source, and G is the multicast group address. The root is not necessarily the multicast source – it is a router that is centrally located in the network that is called the Rendezvous Point (RP).
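The two kinds of forwarding state can be pictured as entries in one table keyed by source and group. The sketch below is only an illustration: the addresses, interface names, and the `lookup_oifs` helper are hypothetical, and the rule that a specific (S, G) entry is tried before the (*, G) wildcard is an assumption about typical router behavior rather than something stated above.

```python
from typing import Dict, Optional, Set, Tuple

# Multicast forwarding table keyed by (source, group); "*" is the wildcard source.
MRoute = Dict[Tuple[str, str], Set[str]]


def lookup_oifs(table: MRoute, source: str, group: str) -> Optional[Set[str]]:
    """Return the outgoing interfaces for a packet, or None if no state exists."""
    # A source-specific (S, G) entry is more specific, so try it first (assumption).
    if (source, group) in table:
        return table[(source, group)]
    # Otherwise fall back to the shared-tree (*, G) entry, which matches any source.
    return table.get(("*", group))


# Hypothetical state on one router.
table: MRoute = {
    ("10.1.1.1", "239.1.1.1"): {"Gi0/1"},        # source tree state (S, G)
    ("*", "239.1.1.1"): {"Gi0/2", "Gi0/3"},      # shared tree state (*, G)
}
```

A packet from 10.1.1.1 to 239.1.1.1 would use the (S, G) entry here, while a packet from any other source to the same group would fall through to the (*, G) entry.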

Think of the multicast forwarding paths as a tree structure. The source resides at the root of the tree and blindly sends IP packets to a multicast address. The source is never aware of the recipients that are members of a multicast group; it depends upon multicast routers and switches to deliver the multicast packets. The multicast routers or switches that reside at the branches of the tree replicate the multicast packets out the interfaces that have downstream recipients.

Reverse Path Forwarding (RPF) is the concept of forwarding multicast traffic away from the source, rather than toward the receiver – opposite of the normal unicast packet forwarding. RPF ensures that multicast packets are not being replicated back into the network in order to avoid routing loops. In multicasting, the source IP address indicates the known source, while the destination IP address indicates a group of unknown receivers.
“toward the destination” and “away from the source” sound like the same thing, but they are not!

Multicast routers use the unicast routing table to determine the upstream (toward the source) and downstream (away from the source) neighbors, and ensure that only one router interface is considered the incoming interface for a specific multicast source. For each packet, the router verifies that the packet was received upon the same interface that it would use to reach the source. If this is true, the packet can be forwarded or replicated toward the multicast recipients; if it is not true, the packet is discarded. A packet received on one interface and forwarded out another might be replicated around the network and arrive back at the same router on a different interface. RPF ensures that this packet is not forwarded again.
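The RPF check described above can be sketched in a few lines. This is a minimal illustration, assuming a toy unicast routing table (`UNICAST_ROUTES`) and hypothetical interface names; a real router consults its actual routing table and may apply additional tie-breaking rules.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical unicast routing table: prefix -> interface used to reach it.
UNICAST_ROUTES: Dict[str, str] = {
    "10.1.0.0/16": "Gi0/0",
    "10.1.1.0/24": "Gi0/1",
    "0.0.0.0/0": "Gi0/2",
}


def rpf_interface(source: str, routes: Dict[str, str]) -> Optional[str]:
    """Longest-prefix match on the SOURCE address gives the RPF interface."""
    addr = ipaddress.ip_address(source)
    best_len = -1
    best_iface = None
    for prefix, iface in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_iface = net.prefixlen, iface
    return best_iface


def rpf_check(source: str, arrival_iface: str, routes: Dict[str, str]) -> bool:
    """Accept the packet only if it arrived on the interface used to reach the source."""
    return rpf_interface(source, routes) == arrival_iface
```

With this table, a packet sourced from 10.1.1.5 passes the check only when it arrives on Gi0/1; the same packet arriving on Gi0/0 would be discarded.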

PIM Dense Mode (PIM-DM) is defined in RFC 3973 – Protocol Independent Multicast – Dense Mode (PIM-DM): Protocol Specification (Revised). PIM-DM uses a push approach that uses source trees to flood multicast traffic to the entire network. Routers that do not need the data (because they are not connected to receivers that want the data or to other routers that want it) request to prune the tree so that they do not receive the multicast packets.

PIM-DM Initial Flooding and Pruning

PIM-DM initially floods multicast traffic throughout the entire network. The traffic is sent out of all non-RPF interfaces where there is another PIM-DM neighbor or a directly connected member. As each router receives the multicast traffic via its RPF interface (the interface in the direction of the source), it forwards the multicast traffic to all its PIM-DM neighbors. The (S, G) state entry is created in every router in the network.

The flooding may result in some traffic arriving upon a non-RPF interface as with RT3 and RT6. Packets arriving via the non-RPF interfaces are discarded. PIM-DM Prune messages are sent to stop unwanted traffic when there is no host registered for the multicast group using IGMP. Prune messages are sent out of an RPF interface when the router has no downstream receivers for multicast traffic from the specific source. Prune messages are also sent out of non-RPF interfaces to terminate the multicast traffic flow as it is arriving via an interface that is not on the shortest path to the source. PIM-DM Prune messages are sent to 224.0.0.13, the All-PIM-Routers multicast address.

There is only one receiver in the scenario above, and therefore all other paths are pruned. Although the multicast traffic flow does not reach and pass through most routers in the network, the (S, G) state entry remains in all routers and will remain there until the source stops sending. In PIM-DM, all prune messages expire in 3 minutes. After that, the multicast traffic is flooded again to all routers. This periodic flood-then-prune operation or behavior is normal and must be taken into account when a network is intended to use PIM-DM.
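The periodic flood-then-prune cycle can be modeled as per-interface prune timers on each (S, G) entry. The sketch below is a simplification under stated assumptions: the `DenseModeEntry` class and interface names are hypothetical, the 180-second constant mirrors the 3-minute expiry mentioned above, and real implementations carry more state than this.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

PRUNE_HOLD_SECONDS = 180  # the 3-minute prune lifetime described above


@dataclass
class DenseModeEntry:
    """(S, G) state on one router: downstream interfaces and their prune timers."""
    downstream: Set[str]
    pruned_until: Dict[str, float] = field(default_factory=dict)

    def prune(self, iface: str, now: float) -> None:
        """Record a Prune received on a downstream interface."""
        if iface in self.downstream:
            self.pruned_until[iface] = now + PRUNE_HOLD_SECONDS

    def forwarding_interfaces(self, now: float) -> Set[str]:
        """Interfaces that currently receive traffic; expired prunes flood again."""
        return {i for i in self.downstream if self.pruned_until.get(i, 0.0) <= now}
```

The entry itself is never deleted by a prune, which matches the point that the (S, G) state remains in routers that no longer carry the traffic.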

PIM-DM routers assume that the recipients of a multicast group are located on every subnet – the multicast group is densely populated across the network; few senders, but many receivers; there will be a great amount of multicast traffic; and the multicast streams will be constant.

PIM-DM Multicast Traffic Flow after Pruning

PIM Sparse Mode (PIM-SM) is defined in RFC 2362 – Protocol Independent Multicast – Sparse Mode (PIM-SM): Protocol Specification (since obsoleted by RFC 4601 and later RFC 7761). PIM-SM uses a pull approach that forwards multicast traffic only to the portions of the network that need it. It uses a shared tree and therefore requires an RP to be defined. In sparse mode, sources register with the RP, and the multicast routers along the path from active receivers that have explicitly requested to join a specific multicast group join the tree – the multicast tree is not extended to a router unless a host has joined the group. The multicast tree is built in reverse, beginning with the group members at the leaves and extending back toward the central root. Multicast routers use the unicast routing table to calculate whether they have a better metric to the RP or to the source itself, and forward the join messages to the device toward which they have the better metric.

Sparse mode multicast flows are described as (*, G) as the multicast tree allows any source to send to a group. As a receiver joins a multicast group via IGMP, the local router receives the membership report and sends a PIM Join toward the RP at the root of the tree. Each router along the way adds that branch to the shared tree. Pruning is performed only when a member leaves the group.

PIM-SM Shared Tree Join

When a receiver attached to the leaf router RT6 joins the multicast group G, the last-hop router RT6, which knows the IP address of the RP for multicast group G, sends a (*, G) join for the group toward the RP. The PIM Join travels hop-by-hop toward the RP to build a branch of the shared tree that extends from the RP to the last-hop router directly connected to the receiver. The traffic of multicast group G may then flow down the shared tree to the receiver. The (*, G) state entry is created only along the shared tree.
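The hop-by-hop join can be sketched as a walk from the last-hop router to the RP that leaves (*, G) state on every router it crosses. The topology in `NEXT_HOP_TO_RP` is hypothetical (it is not taken from the figure), as are the `send_join` helper and the "receiver" placeholder for the locally attached host.

```python
from typing import Dict, Set, Tuple

# Hypothetical topology: each router's unicast next hop toward the RP (RT3).
NEXT_HOP_TO_RP: Dict[str, str] = {"RT6": "RT4", "RT5": "RT4", "RT4": "RT3", "RT1": "RT3"}

# Per-router multicast state: router -> {("*", group): downstream neighbors}.
State = Dict[str, Dict[Tuple[str, str], Set[str]]]


def send_join(last_hop: str, group: str, rp: str,
              next_hop: Dict[str, str], state: State) -> None:
    """Propagate a (*, G) join hop-by-hop from the last-hop router to the RP."""
    # The last-hop router forwards down to its directly attached receiver.
    state.setdefault(last_hop, {}).setdefault(("*", group), set()).add("receiver")
    router = last_hop
    while router != rp:
        upstream = next_hop[router]
        # The upstream router now forwards group traffic down toward `router`.
        state.setdefault(upstream, {}).setdefault(("*", group), set()).add(router)
        router = upstream
```

After `send_join("RT6", "239.1.1.1", "RT3", NEXT_HOP_TO_RP, state)` only RT6, RT4, and RT3 hold (*, G) state; routers off the path, such as RT1 and RT5 in this made-up topology, hold none.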

Both PIM-DM and PIM-SM modes construct identical tree structures and therefore result in the same multicast traffic flow patterns. PIM-SM is appropriate for wide-scale deployment for both densely and sparsely populated groups in an enterprise network. PIM-SM is preferred over PIM-DM for all production networks regardless of size and membership density.

PIM Sparse-Dense Mode allows a PIM router to operate in both sparse and dense modes on a per-group basis on the same router interface. Sparse mode is used if a group has an RP defined; otherwise, dense mode is used. PIM sparse-dense mode also supports automatic RP discovery. Multiple RPs can be implemented with each RP in an optimum location for maximum efficiency. Configuring, managing, and troubleshooting multiple RPs can be difficult if done manually. However, PIM sparse-dense mode supports automatic selection of RPs for each multicast source; for example, RT1 could be the RP for Source 1 and RT2 could be the RP for Source 2.
If no RP is discovered for the multicast group or none is manually configured, PIM sparse-dense mode will operate in dense mode. Therefore, automatic RP discovery should be implemented with PIM sparse-dense mode.
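The per-group decision can be sketched as a lookup against the known group-to-RP mappings. This is an illustration only: the `rp_mappings` dictionary, the addresses, and both helper functions are hypothetical, and the sketch simply takes the first mapping whose range contains the group.

```python
import ipaddress
from typing import Dict, Optional


def rp_for_group(group: str, rp_mappings: Dict[str, str]) -> Optional[str]:
    """Return the RP of the first mapping whose group range contains `group`."""
    addr = ipaddress.ip_address(group)
    for group_range, rp in rp_mappings.items():
        if addr in ipaddress.ip_network(group_range):
            return rp
    return None


def operating_mode(group: str, rp_mappings: Dict[str, str]) -> str:
    """Sparse mode if an RP is known for the group, otherwise dense mode."""
    return "sparse" if rp_for_group(group, rp_mappings) is not None else "dense"


# Hypothetical mappings learned statically or through automatic RP discovery.
mappings = {"239.1.0.0/16": "10.0.0.3"}
```

Here 239.1.1.1 would be handled in sparse mode with 10.0.0.3 as its RP, while 239.2.2.2 has no RP and falls back to dense-mode flooding, which is the fallback the paragraph above warns about.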

Cisco recommends PIM sparse-dense mode for IP multicast, as PIM-DM does not scale well and requires many router resources, and PIM-SM has limited RP configuration options. Additionally, it can use either statically defined RPs, Auto-RP, or BSR with the least configuration effort.

Below are some extensions, optimizations, and enhancements upon PIM:
  • Bidirectional PIM Mode, which is designed for many-to-many applications – many hosts multicasting to each other.
  • Source-Specific Multicast (SSM), which is a variant of PIM-SM that builds only source specific shortest path trees and does not need an active RP for source-specific groups in the address range 232.0.0.0/8.
Comparison of PIM modes:
  • Dense Mode – Multicast flows: (S, G). Tree construction: root to leaves; the source is the root and the receivers are the leaf nodes. Tree refinements: first flood, then prune.
  • Sparse Mode – Multicast flows: (*, G). Tree construction: leaves to root; the RP is the root, the source can be anywhere, and the receivers are the leaf nodes. Tree refinements: group extended from receivers toward the RP; pruning only when a member leaves the group.
  • Sparse-Dense Mode – Multicast flows: (S, G) or (*, G). Tree construction: hybrid on a per-group basis. Tree refinements: N/A.

The ip pim dense-mode interface subcommand configures PIM dense mode on an interface.
The ip pim sparse-mode interface subcommand configures PIM sparse mode on an interface.
The ip pim sparse-dense-mode interface subcommand configures PIM sparse-dense mode on an interface.

PIMv1 RPs can be configured manually or using the dynamic Auto-RP process. The ip pim rp-address {ip-addr} [access-list] [override] global configuration command manually identifies an RP. An access list limits the range of multicast groups supported by the RP. The override keyword causes the RP to be preferred over any automatically determined RP. Because the RP does not advertise itself, its address and function must be defined on every router in the PIM domain, including the RP itself. Future changes to the RP location are difficult, as every router must be reconfigured with the new RP address.
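How the access list and the override keyword interact with a dynamically learned RP can be sketched as follows. This is a simplified model, not Cisco's implementation: the `StaticRp` class, the `choose_rp` helper, and the addresses are hypothetical, and the access list is reduced to a list of permitted group ranges.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StaticRp:
    """Model of a manually configured RP (hypothetical, for illustration)."""
    address: str
    permitted_ranges: Sequence[str] = ("224.0.0.0/4",)  # no access list: all groups
    override: bool = False


def choose_rp(group: str, static: Optional[StaticRp],
              dynamic_rp: Optional[str]) -> Optional[str]:
    """Pick the RP for a group from a static entry and a dynamically learned one."""
    addr = ipaddress.ip_address(group)
    static_applies = static is not None and any(
        addr in ipaddress.ip_network(r) for r in static.permitted_ranges
    )
    # The static RP wins if it is the only candidate or if override is set.
    if static_applies and (static.override or dynamic_rp is None):
        return static.address
    return dynamic_rp
```

In this model a static RP without override yields to an automatically determined RP for the same group, and a group outside the permitted ranges never uses the static RP at all.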

Auto-RP is a Cisco-proprietary process that automatically informs PIM-SM routers about the appropriate RP for a group. A centrally located and well-connected router is identified to function as the mapping agent, which learns all the candidate RPs that are announced through the Cisco-RP-Announce multicast address 224.0.1.39. Mapping agents join this group in order to hear the announcements.

The ip pim send-rp-discovery [intf-type intf-num] scope {ttl} global configuration command configures a router as an RP mapping agent. The optional intf-type intf-num defines the interface type and number whose address is used as the source address of the RP mapping agent; the ttl parameter specifies the Time-to-Live (TTL) value that limits the scope of the Auto-RP discovery messages, that is, how many router hops away the information will reach and remain valid. The RP mapping agent sends Group-to-RP mapping information to all PIM routers over the Cisco-RP-Discovery multicast address 224.0.1.40.

Each candidate RP router must then be explicitly defined with the ip pim send-rp-announce {intf-type intf-num | ip-addr} scope {ttl} [group-list acl] global configuration command. A router begins sending announcements to the RP mapping agent when it knows it can be an RP. The interface (or IP address) must be specified, as it indicates the advertised RP address and identifies where to reach the mapping agent. TTL limits the scope of Auto-RP announcements by the number of router hops. The router can also advertise itself as a candidate RP for only the multicast groups permitted through the optional group-list access list. The default announcement interval is 60 seconds.

PIMv2 also includes an industry-standard dynamic Group-to-RP mapping advertisement mechanism that is known as Bootstrapping, which is similar to the Cisco Auto-RP method.

A bootstrap router (BSR), which learns about RP candidates for a group and advertises them to PIM routers, must first be identified using the ip pim bsr-candidate {intf-type intf-num[1]} [hash-mask-length] [priority] global configuration command. The candidate RP routers that advertise themselves to the BSR as PIMv2 candidate RPs are then defined using the ip pim rp-candidate {intf-type intf-num[2]} [ttl] [group-list acl] [priority priority] global configuration command. The priority value ranges from 0 to 255. Among candidate BSRs, the larger priority value is preferred, and the router with the higher IP address becomes the BSR if the priority values are the same. Among candidate RPs, by contrast, the BSR specification treats the lower priority value as preferred.
Note: The Cisco IOS implementation of PIM BSR which predates the draft-ietf-pim-sm-bsr IETF draft uses the value 0 as the default priority for candidate RPs and BSRs. Explicitly set the priority value to 192 to comply with the IETF draft that specifies 192 as the default priority value.
[1] – The IP address associated with this interface determines the candidate BSR address.
[2] – Advertises the IP address associated with this interface as the candidate RP address.
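The BSR election rule (larger priority first, then higher IP address) can be sketched as a single comparison. The `elect_bsr` helper and the candidate list are hypothetical; addresses are compared numerically rather than as strings so that 10.0.0.10 correctly ranks above 10.0.0.9.

```python
import ipaddress
from typing import List, Tuple


def elect_bsr(candidates: List[Tuple[str, int]]) -> str:
    """Pick the BSR from (address, priority) pairs.

    The larger priority wins; the numerically higher IP address breaks ties.
    """
    address, _priority = max(
        candidates,
        key=lambda c: (c[1], int(ipaddress.ip_address(c[0]))),
    )
    return address
```

With every candidate left at the Cisco IOS default priority of 0, the election reduces to the highest IP address, which is one reason to set priorities explicitly on the routers intended to act as BSR.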

Once the BSR and candidate RPs are configured, all other PIM routers will learn the appropriate RP from the BSR. The selection of RP for a group is based on a hashing function. The length of the hash mask controls the number of multicast groups that are being hashed to the same RP.
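The effect of the hash mask length can be sketched with the hash function as I understand it from the PIM-SM specification (RFC 7761, section 4.7.2): the group address is masked, mixed with each candidate RP address, and the candidate with the highest value wins. Treat this as an assumption-laden illustration: the helper names and addresses are hypothetical, candidate RP priorities are ignored, and a given Cisco IOS release may not select RPs in exactly this way.

```python
import ipaddress
from typing import List


def hash_value(group: str, mask_len: int, candidate_rp: str) -> int:
    """Hash of (masked group address, candidate RP address), in [0, 2**31)."""
    g = int(ipaddress.ip_address(group))
    # Hash mask: keep only the top `mask_len` bits of the group address.
    m = (0xFFFFFFFF << (32 - mask_len)) & 0xFFFFFFFF
    c = int(ipaddress.ip_address(candidate_rp))
    return (1103515245 * ((1103515245 * (g & m) + 12345) ^ c) + 12345) % (2 ** 31)


def select_rp(group: str, mask_len: int, candidates: List[str]) -> str:
    """Highest hash value wins; the higher IP address breaks ties."""
    return max(
        candidates,
        key=lambda rp: (hash_value(group, mask_len, rp), int(ipaddress.ip_address(rp))),
    )
```

Because only the masked bits of the group address enter the hash, a mask length of 30 sends each block of 4 consecutive group addresses to the same RP, while a shorter mask groups larger blocks together.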

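The bootstrap messages are propagated throughout the entire PIM domain by default. The scope of the advertisements can be limited by defining PIMv2 border routers using the ip pim bsr-border interface subcommand (ip pim border in older Cisco IOS releases), which stops bootstrap messages from being sent or received on that interface.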

A small network with only one or a few L2 or L3 switches and no multicast router can still support multicast. When a host sends an IGMP Membership Report to join a multicast group, it does not know about multicast routers at all; it just sends out a request to join and hopes that it will start receiving traffic destined for the multicast group. Even if a multicast router is present, it does not send a reply to a host upon joining a multicast group. A multicast router only sends out Membership Queries periodically, asking whether the hosts still want to remain members of a group. In such a small network, L2 switches simply flood the multicast traffic out all ports in a VLAN; CGMP is not available to prune the multicast traffic, as it depends on a multicast router. L3 switches can use IGMP snooping to constrain the flooding of multicast traffic.
