Chapter 5. Tuning BGP Capabilities

This chapter covers the following key topics:
Up to this point, this book has been concerned primarily with general definitions of interior and exterior gateway protocols and an overview of their respective and interconnected tasks. The Border Gateway Protocol was also presented from the technical perspective of its functional elements. With this chapter, you will begin to consider more practical implementation details for BGP as part of the overall design problem in building reliable Internet connectivity. This chapter examines specific attributes of BGP and how they are applied individually and together to address this design problem. Although the terminology, attributes, and details of this chapter are specific to BGP, the general concepts and problems raised are pertinent to routing architecture design, regardless of which protocols are used.

Building Peer Sessions

The previous chapter began examining the process of BGP neighbor negotiation at a fairly technical level, with an emphasis on the formats of messages exchanged during negotiation. This chapter expands the examination to consider additional subtleties of the negotiation process. In addition, distinctions between internal and external BGP, which have practical implications in building peer sessions, are introduced in this section.

Although BGP is meant to be used between autonomous systems to provide an interdomain loop-free topology, BGP can be used within an AS as a pipe between border routers running external BGP to other ASs. A neighbor connection, also called a peer connection, between two routers can be established within the same AS, in which case BGP is called internal BGP (IBGP). A peer connection can also be established between two routers in different ASs. BGP is then called external BGP (EBGP). Figure 5-1 contrasts these environments.
Upon neighbor session establishment and during the OPEN message exchange negotiation, peer routers compare AS numbers and determine whether they are peers in the same AS or in different ASs. The difference between EBGP and IBGP manifests itself in how each peer would process the routing updates coming from the other peer and in the way different BGP attributes are carried on external versus internal links. The neighbor negotiation process is mainly the same for internal and external neighbors as far as building the TCP connection at the transport level. It is essential to have IP connectivity between the two neighbors for the transport session to take place. IP connectivity has to be achieved via a protocol different from BGP; otherwise, the session will be in a race condition. An example of a race condition follows: neighbors can reach one another via some IGP, the BGP session gets established, and the BGP updates get exchanged. The IGP connection goes away for some reason, but still the BGP TCP session is up because neighbors can still reach each other via BGP. Eventually the session will go down because the BGP session cannot depend on BGP itself for neighbor reachability.
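The AS-number comparison made during the OPEN exchange can be sketched in a few lines of Python. This is only an illustration: the `OpenMessage` class, the `session_type` function, and the AS numbers 100 and 200 are hypothetical names and values, and a real OPEN message carries more fields than the one modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenMessage:
    """Only the OPEN field needed for this sketch: the sender's AS number."""
    my_as: int


def session_type(local_as: int, peer_open: OpenMessage) -> str:
    """Return "IBGP" when the peer reports our own AS number, else "EBGP"."""
    return "IBGP" if peer_open.my_as == local_as else "EBGP"


# A router in the hypothetical AS 100 hears from two different peers.
print(session_type(100, OpenMessage(my_as=100)))  # same AS: internal session
print(session_type(100, OpenMessage(my_as=200)))  # different AS: external session
```

The result of this comparison is what later determines how updates from the peer are processed and which attributes are carried on the link.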
An Interior Gateway Protocol (IGP) or static route can be configured to achieve IP connectivity. In essence, a ping packet, containing a source IP address (the IP address of one BGP peer) and a destination IP address (the IP address of the second peer), must succeed for a transport session to initiate.

Physical Versus Logical Connections

External BGP neighbors are, by default, required to be physically (directly) connected. BGP drops any updates from an external peer that is not directly connected. Some situations, however, arise where external neighbors cannot be on the same physical segment. Such neighbors are logically, but not physically, connected. An example would be running BGP between external neighbors across non-BGP routers. In this situation, Cisco offers an extra knob to override the restriction: BGP requires some extra configuration to indicate that its external peer is not physically attached.
In Cisco's implementation, nondirectly connected EBGP peers are referred to as EBGP multihop. In figure 5-2, RT2 is not able to run BGP, but RT1 and RT3 are. Thus, external neighbors RT1 and RT3 are logically connected and peer with one another via EBGP multihop.
On the other hand, neighbors within the same autonomous system (internal neighbors) have no restrictions whatsoever on whether the peer router is physically connected. As long as there is IP connectivity between the two neighbors, BGP requires no extra configuration. In figure 5-2, RT1 and RT4 are logically but not physically connected. Because both are in the same AS, no additional configuration is needed for them to run IBGP.

Obtaining an IP Address

The neighbor's IP address could be the address of any of the router's interfaces, such as Ethernet, Token Ring, or Serial. Keep in mind that the stability of the neighbor connection will rely on the stability of the IP address you choose.
If the IP address belongs to an Ethernet card that has some hardware problems and is shutting down every few minutes, the neighbor connection and the stability of the routing updates will suffer. Cisco has introduced a loopback interface; this is actually a virtual interface that is supposed to be up at all times. Tying the neighbor connection to a loopback interface will make sure that the session is not dependent on any hardware interface that might be problematic. Adding loopback interfaces is not necessary in every situation (it actually requires more configuration). If external BGP neighbors are directly connected and the IP addresses of the directly connected segment are used for the neighbor negotiation, a loopback address is of no added value. If the physical link between the two peers is problematic, then the session will break with or without loopback.
Authenticating the BGP Session

As you have already seen in Chapter 4, "Interdomain Routing Basics," the BGP message header allows authentication. Authentication is a measure of precaution against hackers who might present themselves as one of your BGP peers and feed you wrong routing information. Authentication between two BGP peers gives the capability to validate the session between you and your neighbor by using a combination of passwords and keys upon which you both agree. A neighbor that tries to establish a session without the use of these specific passwords and keys will not be permitted. The authentication feature uses the Message-Digest Algorithm version 5 (MD5) [1]. The discussion of the authentication algorithm itself is beyond the scope of this book.

BGP Continuity Inside an AS

To avoid creating routing loops inside the AS, BGP does not advertise to internal BGP peers routes that are learned via other IBGP peers. Thus, it is important to maintain a full IBGP mesh within the AS; that is, every BGP router in the AS has to build a BGP session with all other BGP routers inside the AS. Figure 5-3 illustrates one of the common mistakes administrators make when setting BGP routing inside the AS.
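The effect of the IBGP rule can be checked with a small Python sketch. It is a simplified model, not router behavior in full: the router names `r1` to `r3`, the prefixes, and both function names are hypothetical, and the model tracks only which routes each router ends up knowing.

```python
from itertools import combinations


def full_mesh_sessions(routers):
    """All IBGP sessions a full mesh needs: one per router pair, n*(n-1)/2."""
    return {frozenset(pair) for pair in combinations(routers, 2)}


def ibgp_visibility(routers, sessions, ebgp_routes):
    """Return, per router, the set of routes it ends up knowing.

    ebgp_routes maps a router to the routes it learned over EBGP. A router
    passes only those routes to its IBGP peers; routes it learned over IBGP
    are never passed on to another IBGP peer.
    """
    known = {r: set(ebgp_routes.get(r, ())) for r in routers}
    for session in sessions:
        a, b = tuple(session)
        known[a] |= set(ebgp_routes.get(b, ()))
        known[b] |= set(ebgp_routes.get(a, ()))
    return known


routers = ["r1", "r2", "r3"]
ebgp_routes = {"r1": {"10.1.0.0/16"}, "r2": {"10.2.0.0/16"}, "r3": {"10.3.0.0/16"}}

# A chain r1 - r2 - r3 leaves r1 and r3 without each other's routes.
chain = {frozenset(("r1", "r2")), frozenset(("r2", "r3"))}
print(sorted(ibgp_visibility(routers, chain, ebgp_routes)["r1"]))

# A full mesh gives every router every route.
mesh = full_mesh_sessions(routers)
print(sorted(ibgp_visibility(routers, mesh, ebgp_routes)["r1"]))
```

Because the number of sessions grows as n(n-1)/2, a full mesh becomes expensive to maintain as the number of IBGP routers increases.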
In the situation illustrated in figure 5-3, an ISP has three POPs (points of presence) in San Jose (SJ), San Francisco (SF), and Los Angeles (LA). Each POP has multiple non-BGP routers and a BGP border router running EBGP with other ASs. The administrator sets an IBGP connection between the SJ border router and the SF border router, and another IBGP connection between the SF border router and the LA border router. In this configuration, EBGP routes learned via SJ will be given to SF, EBGP routes learned via SF are given to SJ and LA, and EBGP routes learned via LA are given to SF. Routing in this picture is not complete; EBGP routes learned via SJ will not be given to LA, and EBGP routes learned via LA will not be given to SJ. This is because the SF router will not pass on IBGP routes between SJ and LA. What is needed is an additional IBGP connection between SJ and LA (shown via the dotted line). You will see in Chapter 8, "Controlling Large-Scale Autonomous Systems," how this situation could be handled by using the concept of route reflectors, an option that scales much better in cases where the AS has a large number of IBGP peers.

Synchronization Within an AS

BGP must be synchronized with the IGP in such a way that it waits until the IGP has propagated routing information across your autonomous system before advertising transit routes to other ASs. It is important that your AS be consistent about the routes it advertises. If, for example, your BGP were to advertise a route before all routers in your AS had learned about the route through the IGP, your AS could receive traffic that some routers cannot yet route. Whenever a router receives an update about a destination from an IBGP peer, the router tries to verify internal reachability for that destination before advertising it to other EBGP peers. The router would do so by checking for the existence of this destination in the IGP.
This would give an indication whether non-BGP routers can deliver traffic to that destination. Assuming that the IGP recognizes that destination, the router will announce it to other EBGP peers. Otherwise, the router will treat the route as not being synchronized with the IGP and would not advertise it. Consider the situation illustrated in figure 5-4; ISP1 and ISP2 are using ISP3 as a transit AS. ISP3 has multiple routers in its AS and is running BGP only on the border routers. (Even though RTB and RTD are carrying transit traffic, ISP3 has not configured BGP on these routers.) ISP3 is running some Interior Gateway Protocol inside the AS for internal connectivity.
Assume that ISP1 is advertising route 192.213.1.0/24 to ISP3. Because RTA and RTC are running IBGP, RTA will propagate the route to RTC. Note that other routers besides RTA and RTC are not running BGP and have no knowledge so far of the existence of route 192.213.1.0/24. In the situation illustrated in figure 5-4, if RTC advertises the route to ISP2, traffic toward the destination 192.213.1.0/24 will start flowing toward RTC. RTC will do a recursive lookup in its IP routing table and will direct the traffic toward the next hop RTB. RTB, having no visibility into the BGP routes, will drop the traffic because it has no knowledge of the destination. This has happened because there is no synchronization between BGP and the IGP. The BGP rule states that a BGP router should not advertise to external neighbors destinations learned from IBGP neighbors unless those destinations are also known via the IGP. If a router knows about these destinations via the IGP, it assumes that the route has already been propagated inside the AS, and internal reachability is guaranteed.
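The synchronization rule amounts to a single check before an IBGP-learned route is sent to an external peer. The Python sketch below is a minimal model under stated assumptions: the function name, the `learned_via` labels, and the exact-match lookup in a set of prefixes are all hypothetical simplifications, and the `synchronization` flag merely stands in for an implementation's ability to switch the check off.

```python
def may_advertise_to_ebgp(prefix, learned_via, igp_table, synchronization=True):
    """Decide whether a route may be advertised to external peers.

    prefix          -- the destination, e.g. "192.213.1.0/24"
    learned_via     -- "ibgp", "ebgp", or "local" (hypothetical labels)
    igp_table       -- set of prefixes currently known through the IGP
    synchronization -- when False, the IGP check is skipped entirely
    """
    if learned_via != "ibgp":
        return True  # the rule only restricts routes learned from IBGP peers
    if not synchronization:
        return True  # check disabled on this router
    return prefix in igp_table  # synchronized only if the IGP also knows it


# RTC learned 192.213.1.0/24 from RTA over IBGP; the IGP does not carry it.
print(may_advertise_to_ebgp("192.213.1.0/24", "ibgp", igp_table=set()))
# Once the IGP knows the prefix, the route is considered synchronized.
print(may_advertise_to_ebgp("192.213.1.0/24", "ibgp", igp_table={"192.213.1.0/24"}))
```

In the figure 5-4 scenario, the first call models RTC holding the route back, which is what prevents ISP2 from sending traffic that RTB would drop.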
Injecting BGP routes into an AS is costly. Redistributing routes from BGP into the IGP will result in major overhead on the internal routers, which might not be equipped to handle that many routes. Besides, carrying all external routes inside an AS is not really necessary. Routing can easily be accomplished by having internal non-BGP routers default to one of the BGP routers. Of course, this will result in routing suboptimality (there is no guarantee of the shortest path for each route), but this cost is minimal compared with maintaining thousands of routes inside the AS. Cisco offers a software knob called "no synchronization" that enables BGP to override the synchronization requirement and advertise routes learned via IBGP irrespective of the existence of an IGP route. In practice, a couple of situations exist where synchronization can be safely turned off on border routers:
Sources of Routing Updates

In networks as complex as today's Internet, route stability is a big issue. There is a close correspondence between route fluctuations and the stability of the Internet access links on one hand and how the routing information was injected into the Internet via BGP on the other hand. Information can be injected into BGP dynamically or statically. Dynamically injected routes come and go from the BGP routing table, depending on the status of the networks they identify. Statically injected routes are constantly maintained by the BGP routing tables, regardless of the status of the networks they identify. Thus, while a dynamic advertisement will cease if the network being advertised no longer exists, a static advertisement would not. Each method has its pros and cons, as you will see next.
Unstable Routes

Injecting IGP routes into BGP dynamically or semidynamically makes the BGP routes dependent on the IGP routes. Although you could argue that this is good because it reflects the actual status of networks, it can have drawbacks as well. Remember that with today's global network connectivity, route fluctuation within your AS will affect your provider if you are a customer, or other providers if you are a provider. The IGP route you advertise will translate into a BGP route. If that route goes down, an UPDATE message listing the route as withdrawn will be sent via BGP, requesting peers to remove that route from their tables. A route constantly going up and down in your AS is therefore constantly advertised and withdrawn across other ASs. The example of one fluctuating route is very simplistic; imagine having hundreds of these routes fluctuating in hundreds of ASs. Internet stability will be affected very negatively.

Strict measures are being put in place to try to mitigate the effect of route fluctuation on the Internet. As you will see in Chapter 9, "Designing Stable Internets," by a process called route dampening, fluctuating routes are penalized and stopped from being advertised depending on their degree of instability. Your routes might be "held hostage" for minutes or hours, until they stabilize, by an interconnected provider.

Controlling route instability is not an easy matter because it usually depends on factors that are beyond your control. Such factors could be unstable access links or faulty hardware. One way to minimize route instability is aggregation. When the aggregate represents more than one route, the fluctuation of any single route does not cause fluctuation in the aggregate itself. Aggregation could be done on the customer boundary or the provider boundary, depending on the level of information exchanged between the customer and the provider.
If done on the customer boundary, this would relieve the provider of seeing the fluctuations of individual customer routes. If aggregation is done on the provider boundary, then the customer fluctuation would leak to the provider but will not be propagated to the Internet. BGP4 aggregation is discussed at the end of this chapter after you have acquired enough techniques in BGP tuning. Another way of controlling route instability is to decouple route advertisement from the existence of the route itself. This is called static injection of routes into BGP, as described in the following section.

Injecting Information Statically into BGP

Today, injecting information statically into BGP has proven to be the most effective method of ensuring route stability. Of course, this method also has drawbacks. To statically inject information into BGP, IGP routes (or aggregates) that need to be advertised to other peers are manually defined as static routes. This ensures that these routes will never disappear from the IP routing table and hence will always be advertised. Because administrators are often uncomfortable advertising routes to networks that might be down or unreachable, the appropriateness of injecting information statically depends on the particular situation.
If, for example, the route is advertised to the Internet from a single point, then advertising a route that is actually down is not a big issue. Hosts trying to access that destination will fail irrespective of whether the route is advertised. On the other hand, if a route is advertised to the Internet from multiple points, then advertising the route statically at all times might end up black-holing the traffic. If problems inside the AS prevent the border router from being able to reach the network it is advertising, traffic to that destination will be dropped even though it could have been reached from some other entry point.

The actual advertisement of the static route can be done with either of the methods described in the "Injecting Information Dynamically into BGP" section. Advertisement can be done by redistributing all the static routes via the redistribute command or a subset of the static routes via the network command. The latter method enables a more controlled route injection because redistribution might cause some unwanted static routes to be sent via BGP.

ORIGIN of Routes

BGP considers the networks advertised via the network command or via aggregation as being internal to the AS and will include the ORIGIN attribute in each route as being IGP (i). On the other hand, whenever a route is injected into BGP via redistribution (whether statically or dynamically), the ORIGIN of the route will be INCOMPLETE because the redistributed routes could have come from anywhere. Figure 5-6 illustrates these issues.

In Scenario 1, all networks have been listed under the BGP process via the network command. Note that BGP has considered 10.0.0.0 and 11.0.0.0 as having a known origin of IGP. Network 12.0.0.0 is the only network that is not known to the router (does not exist in the IP routing table). As you can see, 12.0.0.0 is not being advertised via BGP, even though it has been listed via the network command.
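The ORIGIN rules and the Scenario 1 outcome can be expressed as a short Python sketch. The function name and the use of plain sets are hypothetical simplifications; aggregation is not modeled, and the choice to let the network command take precedence over redistribution follows what the chapter's Scenario 2 describes for a prefix injected both ways.

```python
def bgp_origin(prefix, routing_table, network_cmds, redistributed):
    """Return "IGP", "INCOMPLETE", or None when the prefix is not advertised.

    routing_table -- prefixes present in the IP routing table
    network_cmds  -- prefixes listed with the network command
    redistributed -- prefixes injected into BGP through redistribution
    """
    if prefix not in routing_table:
        return None  # BGP announces only routes that exist in the IP table
    if prefix in network_cmds:
        return "IGP"  # natively defined to BGP
    if prefix in redistributed:
        return "INCOMPLETE"  # could have come from anywhere
    return None


# Scenario 1: three network commands, but 12.0.0.0 is not in the IP table.
routing_table = {"10.0.0.0", "11.0.0.0"}
network_cmds = {"10.0.0.0", "11.0.0.0", "12.0.0.0"}
for prefix in sorted(network_cmds):
    print(prefix, bgp_origin(prefix, routing_table, network_cmds, set()))
```

Running the loop reports an ORIGIN of IGP for 10.0.0.0 and 11.0.0.0 and no advertisement for 12.0.0.0, matching the scenario.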
In Scenario 2, networks 10.0.0.0, 11.0.0.0, and 12.0.0.0 have been statically defined. Network 11.0.0.0 has also been defined via the network command. Finally, network 13.0.0.0 is learned dynamically by the router via IGP. All these networks have been injected into BGP via redistribution. As a result, networks 10.0.0.0, 12.0.0.0, and 13.0.0.0 have been advertised with an ORIGIN of INCOMPLETE because these networks have been injected into BGP via redistribution. Although network 11.0.0.0 has been injected via redistribution, it is also defined natively to BGP via the network command, which is why it will be sent out with an ORIGIN of IGP. If it had not been defined natively, it would have been sent out with an ORIGIN of INCOMPLETE. Actually, network 11.0.0.0 did not need to be redistributed because defining it statically and listing it via the network command would suffice to inject it into BGP. Although the ORIGIN is immaterial at this point, it is used later on to favor one route over the other by the BGP decision process.

An Example of Static Versus Dynamic Routing: Mobile Networks

It is common in the military for units to be mobile; this creates a problem for assigning IP addresses. Usually these mobile units would like to deploy their subnets and IP addresses wherever they go and operate as if they had never moved. If these networks are part of a global network and advertised via BGP, announcing them statically will not work easily. The static commands would have to be removed from the border router of one AS in one location and installed in the border router of another AS in the new location every time the unit moves. To avoid such complications, injecting these networks dynamically into BGP becomes mandatory. One solution is to inject the IGP into BGP in all locations. This way, whenever the IP addresses are moved from one location to the other, the announcements will disappear from one location and reappear in the new one.
In some cases, network administrators are not comfortable with this solution for reasons discussed earlier, such as mutual redistribution problems and the mandate for extensive filtering. Another possibility is to define these networks in all the border routers of all the locations via the network command. Because BGP checks for the existence of these routes in the IP routing table before announcing them, BGP will only announce the routes in the location of the mobile unit. All other locations will automatically refrain from announcing the routes because they are not part of the IGP of that particular AS.

Overlapping Protocols: Backdoors

With different IGPs and EGPs working together to achieve routing, networks can be learned via different protocols. Choosing one protocol over the other affects how the traffic flows. If, for example, traffic follows a RIP route, it might end up on one link; whereas if it follows an external BGP route, it might end up on another link. Backdoor links offer an alternate IGP path that can be used instead of the external BGP path. IGP routes reachable over the backdoor link are called backdoor routes. With the existence of such alternate routes, a mechanism that gives one protocol preference over other protocols is needed. Cisco Systems offers a preference parameter called the distance of a protocol. The lower a protocol's distance, the higher the preference for the protocol. Table 5-1 lists distances according to the Cisco implementation.
Table 5-1 indicates that a directly connected route is generally preferred over a static route, which in turn is preferred over an EBGP route, and so on. Note that EBGP routes, with a distance of 20, are preferred over all IGP routes. Figure 5-7 illustrates the use of backdoor routes. In the figure, AS1 is receiving updates about NetA from two different sources. AS1 is receiving routes via EBGP on the link to AS3 and via the backdoor link running RIP between AS1 and AS2. According to the distance table, the router will give a distance of 20 to the EBGP route and a distance of 120 to the RIP route. In AS1, the EBGP route with the lower distance will be installed in the routing table. Hence, traffic toward NetA will follow the BGP route via AS3 and then AS2, rather than the direct RIP route via AS2.
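The distance comparison for NetA can be sketched in Python as follows. The `DISTANCE` mapping and the `install` function are hypothetical names; the EBGP and RIP values come from the text, while the remaining entries are the commonly documented Cisco defaults and should be checked against table 5-1.

```python
# Administrative distances; lower is preferred.
DISTANCE = {
    "connected": 0,
    "static": 1,
    "ebgp": 20,   # value given in the text
    "ospf": 110,
    "rip": 120,   # value given in the text
    "ibgp": 200,
}


def install(candidates):
    """Pick the candidate whose protocol has the lowest distance.

    candidates is a list of (protocol, description) pairs for one destination.
    """
    return min(candidates, key=lambda candidate: DISTANCE[candidate[0]])


# NetA as seen from AS1 in figure 5-7: an EBGP route and a RIP backdoor route.
net_a = [("rip", "backdoor link to AS2"), ("ebgp", "link to AS3")]
print(install(net_a))
```

With default distances the EBGP route wins, which is why traffic takes the longer path through AS3 unless the preference is changed.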
Cisco provides a way to force IGP routes to take precedence over the EBGP routes. The concept is simple. EBGP routes can be tagged as backdoor routes. This would set the distance of these routes to be the same as the "BGP Local" route's distance (default is 200). According to table 5-1, this distance is higher than any IGP learned route, and the backdoor IGP route will be preferred.

The Routing Process Simplified

Up until now, this chapter has examined discrete aspects of routing, specifically peer negotiation and static versus dynamic routing. Before diving into details of routing configuration, it makes sense to pause here and briefly overview the BGP routing process in its entirety.

BGP is a fairly simple protocol, which is why it is so flexible. Routes are exchanged between BGP peers via UPDATE messages. BGP routers receive the UPDATE messages, run some policies or filters over the updates, and then pass on the routes to other BGP peers. Cisco's implementation of BGP keeps track of all BGP updates in a BGP routing table separate from the IP routing table. In case multiple routes to the same destination exist, BGP does not flood its peers with all those routes; rather, it picks the best route and sends it. In addition to passing along routes from peers, a BGP router may originate routing updates to advertise networks that belong to its own autonomous system. Valid local routes originated in the system, and the best routes learned from BGP peers are then installed in the IP routing table. The IP routing table is used for the final routing decision.

To model the BGP process, imagine each BGP speaker having different pools of routes and different policy engines applied to the routes. The model would involve the following components:
Figure 5-8 illustrates this model. The subsequent discussion provides more details about each component.
Routes Received from Peers

BGP receives routes from external or internal peers. Depending on what is configured in the Input Policy Engines, some or all of these routes will make it into the router's BGP table.

Input Policy Engine

This engine handles route filtering and attribute manipulation. Filtering is done based on different parameters such as IP prefixes, AS_path information, and attribute information. BGP also uses the Input Policy Engine to manipulate the path attributes to influence its own decision process and hence affect what routes it will actually use to reach a certain destination. If, for example, BGP chooses to filter a certain network number coming from a peer, it is an indication that BGP does not want to reach that network via that peer. Or, if BGP gives a certain route a better local preference (this attribute is discussed later on), it is an indication that BGP would like to prefer this route over other routes.

The Decision Process

BGP goes through a decision process to decide which routes it wants to use to reach a certain destination. The decision process is based on the routes that made it into the router after the Input Policy Engine was applied. The decision process is performed on the routes in the BGP routing table. The decision process looks at all the available routes for the same destination, compares the different attributes associated with each route, and chooses one best route. The decision process is discussed later in this chapter, after coverage of attributes.

Routes Used by the Router

The best routes, as identified by the decision process, are what the router itself uses and are candidates to be advertised to other peers and also to be placed in the IP routing table. In addition to routes passed on from other peers, the router (if configured to do so) originates updates about the networks inside its autonomous system. This is how an AS injects its routes into the outside world.
Output Policy Engine

This is the same engine as the Input Policy Engine, applied on the output side. Routes used by the router (the best routes) in addition to routes that the router generates locally are given to this engine for processing. The engine might apply filters and might change some of the attributes (such as AS_path or metric) before sending the update. The Output Policy Engine also differentiates between internal and external peers; for example, routes learned from internal peers cannot be passed on to internal peers.

Routes Advertised to Peers

This is the set of routes that made it through the Output Engine and are advertised to the BGP peers, internal or external.

Example Routing Environment

Figure 5-9 illustrates routing in an example environment. In the figure, AS5 is receiving routes from both AS1 and AS2 and is originating its own routes (172.16.10.0/24). To simplify, consider just the flow of updates in one direction, left to right. By applying the engine model to AS5, you will get the following.
Routes received from peers (these are the routes coming from AS1 and AS2):
Input Policy Engine:
The decision process:
Routes used by the router:
Output Policy Engine:
Routes advertised to peers:
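The engine model as a whole can be sketched as a small Python pipeline. Everything here is illustrative: the `Route` class, the function names, the prefix 192.0.2.0/24, and the AS paths are hypothetical, and the decision step is a deliberate simplification (highest local preference, then shortest AS path) rather than the full BGP decision process. Only AS5, its peers AS1 and AS2, and the local route 172.16.10.0/24 come from the example environment, and the output stage models advertisement to an external peer.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple
    local_pref: int = 100
    from_peer: str = ""


def input_policy(routes, deny_prefixes=(), pref_by_peer=None):
    """Filter received routes and optionally set local preference per peer."""
    pref_by_peer = pref_by_peer or {}
    accepted = []
    for r in routes:
        if r.prefix in deny_prefixes:
            continue  # filtered: do not reach this network via this peer
        accepted.append(replace(r, local_pref=pref_by_peer.get(r.from_peer, r.local_pref)))
    return accepted


def decision_process(bgp_table):
    """Keep one best route per prefix (simplified tie-breaking, see lead-in)."""
    best = {}
    for r in bgp_table:
        current = best.get(r.prefix)
        rank = (-r.local_pref, len(r.as_path))
        if current is None or rank < (-current.local_pref, len(current.as_path)):
            best[r.prefix] = r
    return best


def output_policy(best, local_routes, local_as, deny_prefixes=()):
    """Filter outgoing routes and prepend the local AS for an external peer."""
    advertised = []
    for r in list(best.values()) + list(local_routes):
        if r.prefix in deny_prefixes:
            continue
        advertised.append(replace(r, as_path=(local_as,) + r.as_path))
    return advertised


received = [
    Route("192.0.2.0/24", (1, 7), from_peer="AS1"),
    Route("192.0.2.0/24", (2,), from_peer="AS2"),
]
local = [Route("172.16.10.0/24", ())]

best = decision_process(input_policy(received, pref_by_peer={"AS1": 200}))
advertised = output_policy(best, local, local_as=5)
for r in advertised:
    print(r.prefix, r.as_path)
```

Raising the local preference for AS1 in the input stage is what makes the longer path through AS1 win; without that policy the shorter path through AS2 would be selected.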
Controlling BGP Routes

The preceding section discussed the existence of policy engines that provide attribute manipulation and route filtering. This section discusses attribute manipulation and route filtering, the keys to controlling routing information, in detail. Each BGP attribute is examined to determine what it does and how to use it.

Traffic inside and outside an AS always flows according to the road map laid out by routes. Altering the routes translates to changes in traffic behavior. Among the questions that organizations and service providers ask about controlling routes are: How do I prevent my private networks from being advertised? How do I filter routing updates coming from a particular neighbor? How do I make sure that I use this link or this provider rather than another one? BGP provides the necessary hooks and attributes to address all these questions and more.

BGP Attributes

The BGP attributes are a set of parameters that describe the characteristics of a prefix (route). The BGP decision process uses these attributes to select its best routes. Remember that attributes are part of each BGP UPDATE packet. The next few sections cover these attributes and how they can be manipulated to affect the routing behavior.
The NEXT_HOP Attribute

The NEXT_HOP attribute is a well-known mandatory attribute (type code 3). In an IGP, the next hop to reach a route is the IP address of the connected interface of the router that has announced the route.
The next hop concept with BGP is slightly more elaborate and takes one of the following three forms:
Figure 5-10 illustrates the BGP NEXT_HOP attribute environment.
The SF router is running an EBGP session with the LA router and an IBGP session with the SJ router. The SF router is learning route 128.213.1.0/24 from the LA router. In turn, the SF router is injecting the local route 192.212.1.0/24 into BGP. The SJ router learns route 192.212.1.0/24 via 2.2.2.2, the IP address of the IBGP peer announcing the route. Thus, 2.2.2.2 is the next hop, according to the definition, for SJ to reach 192.212.1.0/24. Similarly, the SF router sees 128.213.1.0/24 coming from the LA router via next hop 1.1.1.1. When it passes this route update to the SJ router via IBGP, SF includes the next hop information, unaltered. Thus, the SJ router would receive the BGP update about 128.213.1.0/24 with next hop 1.1.1.1. This is an example of the EBGP next hop being carried into IBGP. As you can see from the preceding example, the next hop is not necessarily reachable via a direct connection. SJ's next hop for 128.213.1.0/24, for example, is 1.1.1.1, but reaching it requires a pathway through 3.3.3.3. Thus, the next hop behavior mandates a recursive IP lookup for a router to know where to send the packet. To reach the next hop 1.1.1.1, the SJ router will recursively look into its IGP routing table to see if and how 1.1.1.1 is reachable. This recursive search continues until the router associates destination 1.1.1.1 with an outgoing interface. The same recursive behavior is performed to reach next hop 2.2.2.2. If a hop is not reachable, BGP would consider the route as being inaccessible. The following is a sample of how IP recursive lookup is used to direct the traffic toward the final destination. Table 5-2 and table 5-3 list the BGP and IP routing tables for the SJ router illustrated in figure 5-10.
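The recursive lookup can be sketched in Python using the standard `ipaddress` module. This is a simplified model: the table layout and the `resolve` function are hypothetical, and the table holds only the SJ entries that the walk-through cites (128.213.1.0/24 via next hop 1.1.1.1, 1.1.1.0/24 via 3.3.3.3, and 3.3.3.0/24 directly connected on Serial 0), so other destinations resolve to nothing.

```python
import ipaddress

# BGP table: prefix -> BGP next hop.
BGP_TABLE = {"128.213.1.0/24": "1.1.1.1"}

# IP routing table: (prefix, next hop, interface).
# A next hop of None marks a directly connected network.
IP_TABLE = [
    ("1.1.1.0/24", "3.3.3.3", None),
    ("3.3.3.0/24", None, "Serial 0"),
]


def resolve(address, ip_table, max_depth=8):
    """Follow next hops until a connected route yields an outgoing interface.

    Returns the interface name, or None when the address is unreachable.
    """
    for _ in range(max_depth):
        addr = ipaddress.ip_address(address)
        matches = [
            (ipaddress.ip_network(prefix), next_hop, interface)
            for prefix, next_hop, interface in ip_table
            if addr in ipaddress.ip_network(prefix)
        ]
        if not matches:
            return None  # no route: the BGP route would be inaccessible
        # Longest prefix wins when several entries cover the address.
        _, next_hop, interface = max(matches, key=lambda m: m[0].prefixlen)
        if next_hop is None:
            return interface
        address = next_hop  # recurse on the next hop
    return None


print(resolve(BGP_TABLE["128.213.1.0/24"], IP_TABLE))
```

The `max_depth` bound is an assumption added so that a misconfigured table with a next-hop loop cannot make the sketch run forever.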
Table 5-2 indicates that 128.213.1.0/24 is reachable via next hop 1.1.1.1. Looking into the IP routing table, network 1.1.1.0/24 is reachable via next hop 3.3.3.3. Another recursive lookup in the IP routing table indicates that network 3.3.3.0/24 is directly connected via Serial 0. This would indicate that traffic toward next hop 1.1.1.1 should go via Serial 0. The same reasoning applies to deliver traffic toward next hop 2.2.2.2. Care must be taken to make sure that reachability of the next hop is advertised via some IGP or static routing. In case the BGP next hop cannot be reached, the BGP route would be considered inaccessible.

NEXT_HOP Behavior on Multiaccess Media

A medium is considered multiaccess (MA) if routers connected to it have the capability to exchange data in a many-to-many relationship. Routers on MA media share the same IP subnet and can physically access all other routers on the media in one hop (directly connected). Ethernet, FDDI, Token Ring, Frame Relay, and ATM are examples of multiaccess media.
IP has a rule on MA media that states that a router should always advertise the actual source of the route in case the source is on the same MA medium as the router. In other words, if RTA (router A) is advertising a route learned from RTB, and RTA and RTB share a common MA medium, when RTA advertises the route, it should indicate RTB as being the source of the route. If not, routers on the same medium would have to make an unnecessary hop via RTA to get to a router that is sitting in the same segment. In figure 5-11, RTA, RTB, and RTC share a common multiaccess medium. RTA and RTC are running EBGP, while RTC and RTB are running OSPF. RTC has learned network 11.11.11.0/24 from RTB via OSPF and is advertising it to RTA via EBGP. Because RTA and RTB are running different protocols, you might think that RTA would consider RTC (10.10.10.2) as its next hop to reach 11.11.11.0/24, but this is incorrect. The correct behavior is for RTA to consider RTB (10.10.10.3) as the next hop because RTB shares the same medium with RTC.
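The multiaccess rule can be sketched as a single Python function. The function name is hypothetical, and two values are assumptions because the text does not give them: the address 10.10.10.1 for RTA and the subnet 10.10.10.0/24 for the shared segment. The addresses for RTC (10.10.10.2) and RTB (10.10.10.3) are taken from the figure 5-11 discussion.

```python
import ipaddress


def advertised_next_hop(advertiser_ip, route_source_ip, peer_ip, shared_subnet):
    """Choose the next hop a router places in an update for a peer.

    If the advertiser, the router the route was learned from, and the peer
    all sit on the same multiaccess subnet, the route source is advertised
    as the next hop; otherwise the advertiser names itself.
    """
    subnet = ipaddress.ip_network(shared_subnet)
    all_on_segment = all(
        ipaddress.ip_address(ip) in subnet
        for ip in (advertiser_ip, route_source_ip, peer_ip)
    )
    return route_source_ip if all_on_segment else advertiser_ip


# RTC (10.10.10.2) advertises 11.11.11.0/24, learned from RTB (10.10.10.3),
# to RTA (assumed to be 10.10.10.1) on the assumed segment 10.10.10.0/24.
print(advertised_next_hop("10.10.10.2", "10.10.10.3", "10.10.10.1", "10.10.10.0/24"))
```

When the route source lies outside the shared subnet, the function falls back to the advertiser's own address, which is the ordinary case of a router announcing itself as the next hop.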
In situations where the media is broadcast, such as Ethernet and FDDI, physical connectivity is a given and the next hop behavior is no problem. By contrast, where the media is nonbroadcast, such as Frame Relay and ATM, special care should be taken, as described in the following section.

NEXT_HOP Behavior Over Nonbroadcast Multiaccess Media

Media such as Frame Relay and ATM are nonbroadcast multiaccess (NBMA). The many-to-many direct interaction between routers is not guaranteed unless virtual circuits are configured from each router to all other routers. This is called a fully meshed topology, and it is not always implemented for a number of reasons. In practice, Frame Relay or ATM virtual circuits are provided by the access carrier at a certain dollar amount per circuit, and additional circuits translate into extra money. In addition to this cost disincentive, most organizations use a hub and spoke approach, where multiple remote sites have virtual circuits built to one or more concentration routers at a central site (the hub site) where information resides. Figure 5-12 illustrates an example of next hop behavior in a nonbroadcast multiaccess environment.
The only difference between the environments illustrated in figure 5-12 and figure 5-11 is that the media in figure 5-12 is a Frame Relay cloud that is NBMA. RTC is the hub router; RTA and RTB are the spokes. Notice how the virtual circuits are laid out between RTC and RTA, and between RTC and RTB, but not between RTA and RTB. This is called a partially meshed topology. RTA gets a BGP routing update about 11.11.11.0/24 from RTC and would try to use RTB (10.10.10.3) as the next hop (the same behavior as on MA media). Routing will fail because no virtual circuit exists between RTA and RTB. Cisco IOS software supports a special case parameter that remedies this situation. The next-hop-self parameter (when configured as part of the BGP neighbor connection) forces the router (in this case, RTC) to advertise 11.11.11.0/24 with itself as the next hop (10.10.10.2). RTA would then direct its traffic to RTC to reach destination 11.11.11.0/24.

Use of next-hop-self Versus Advertising DMZ

The demilitarized zone (DMZ) defines the shared network between ASs. The IP subnet used for the DMZ link might be part of any of the networked ASs or might not belong to any of them. As you have already seen, the next hop address learned from the EBGP peer is carried inside IBGP. Routers inside the AS must therefore be able to reach that next hop via the IGP. One way of doing so is for the DMZ subnet to be part of the IGP and have the subnet advertised in the AS. The other way is to override the next hop address by forcing the next hop to be the IP address of the border IBGP neighbor.

In figure 5-13 the SJ router is receiving updates about 128.213.1.0/24 with next hop 1.1.1.1 (part of the DMZ). For the SJ router to be able to reach this next hop, one option is for network 1.1.1.0/24 to be advertised inside the AS by the SF border router.
The other option is to have the SF router set the next-hop-self parameter as part of the IBGP neighbor connection to the SJ router. This sets the next hop address of all EBGP-learned routes that SF passes to SJ to 2.2.2.2, which is already part of the IGP. The SJ router can now reach the next hop with no problem.
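The two options can be compared with a small recursive next hop lookup. The sketch below is illustrative only: the `resolve` helper, the table layout, and the contents of the SJ router's IGP table (including the 3.3.3.3 gateway and the Serial0 interface) are assumptions chosen to mirror the addresses used in this section.

```python
import ipaddress


def resolve(next_hop, table, max_depth=8):
    """Recursively resolve `next_hop` to an outgoing interface name.

    `table` maps prefix -> ("via", gateway) or ("connected", interface).
    Returns None when the next hop cannot be resolved (route inaccessible).
    """
    entries = [(ipaddress.ip_network(p), e) for p, e in table.items()]
    addr = ipaddress.ip_address(next_hop)
    for _ in range(max_depth):
        matches = [(net, e) for net, e in entries if addr in net]
        if not matches:
            return None
        # Longest-prefix match among the covering entries.
        _, (kind, value) = max(matches, key=lambda m: m[0].prefixlen)
        if kind == "connected":
            return value
        addr = ipaddress.ip_address(value)
    return None


# Hypothetical IGP view of the SJ router, without the DMZ subnet.
igp = {"2.2.2.0/24": ("via", "3.3.3.3"), "3.3.3.0/24": ("connected", "Serial0")}
# Option 1: the SF router also advertises the DMZ subnet into the IGP.
igp_with_dmz = {**igp, "1.1.1.0/24": ("via", "3.3.3.3")}

print(resolve("1.1.1.1", igp))           # None: DMZ next hop is unreachable
print(resolve("1.1.1.1", igp_with_dmz))  # Serial0: DMZ advertised in the IGP
print(resolve("2.2.2.2", igp))           # Serial0: next hop rewritten by next-hop-self
```

Either option makes the BGP route usable; the first lookup shows what happens when neither is applied.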
Choosing one method over the other depends on whether you want to reach the DMZ. An example could be an operator trying to do a ping from inside the AS to a router interface that belongs to the DMZ. For the ping to succeed, the DMZ must be injected in the IGP. In other cases, the DMZ might be reachable via some suboptimal route external to the AS. Instead of reaching the DMZ from inside the AS, the router might attempt to use another EBGP link to reach the DMZ. In this case, using next-hop-self ensures that the next hop is reachable from within the AS. In all other cases, both methods are similar as far as the BGP routing functionality.

The AS_Path Attribute

The AS_path attribute is a well-known mandatory attribute (type code 2). It is the sequence of autonomous system numbers a route has traversed to reach a destination. The AS that originates the route adds its own AS number when sending the route to its external BGP peers. Thereafter, each AS that receives the route and passes it on to other BGP peers will prepend its own AS number to the list. Prepending is the act of adding the AS number to the beginning of the list. The final list represents all the AS numbers that a route has traversed, with the AS number of the AS that originated the route all the way at the end of the list. This type of AS_path list is called an AS_sequence, because all the AS numbers are ordered sequentially.
BGP uses the AS_path attribute as part of the routing updates (UPDATE packet) to ensure a loop-free topology on the Internet. Each route that gets passed between BGP peers will carry a list of all AS numbers that the route has already been through. If the route is advertised to the AS that originated it, that AS will see itself as part of the AS_path attribute list and will not accept the route. BGP speakers prepend their AS numbers when advertising routing updates to other ASs (external peers). When the route is passed to a BGP speaker within the same AS, the AS_path information is left intact. Figure 5-14 illustrates the AS_path attribute at each instance of the route 172.16.10.0/24, originating in AS1 and passed to AS2 then AS3 and AS4 and back to AS1. Note how each AS that passes the route to other external peers adds its own AS number to the beginning of the list. When the route gets back to AS1, the BGP border router will realize that this route has already been through its AS (AS number 1 appears in the list) and would not accept the route.
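The prepend-and-check behavior can be expressed as a short Python sketch. The function names are illustrative assumptions; the AS numbers follow the example of a route that originates in AS1 and travels through AS2, AS3, and AS4 before being offered back to AS1.

```python
def advertise_external(as_path, local_as):
    """Prepend the sender's AS number when the route leaves the AS."""
    return [local_as] + list(as_path)


def accept(as_path, local_as):
    """Reject any route whose AS_path already contains the receiver's AS."""
    return local_as not in as_path


path = []  # the route starts with an empty AS_path inside AS1
for sender in (1, 2, 3, 4):
    path = advertise_external(path, sender)

print(path)             # [4, 3, 2, 1]
print(accept(path, 1))  # False: AS1 sees itself in the list
```

A route passed between IBGP peers would skip `advertise_external` entirely, because the AS_path is left intact inside the AS.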
AS_path information is one of the attributes BGP looks at to determine the best route to take to get to a destination. In comparing two or more different routes, given that all other attributes are identical, a shorter path is always preferred. In case of a tie, other attributes are used to make the decision.

Using Private ASs

To conserve AS numbers, InterNIC generally does not assign a legal AS number to customers whose routing policies are an extension of the policies of their provider. Thus, in the situation where a customer is single-homed or multihomed to the same provider, the provider generally requests that the customer use an AS number taken from the private pool of ASs (64512-65535). As such, all BGP updates the provider receives from its customer contain private AS numbers.
Private AS numbers cannot be leaked to the Internet because they are not unique. For this reason, Cisco has implemented a feature to strip private AS numbers out of the AS_path list before the routes get propagated to the Internet. This is illustrated in figure 5-15.
In figure 5-15, AS1 is providing Internet connectivity to its customer AS65001. Because the customer has only this provider and no plans for having an additional provider in the near future, the customer has been allocated a private AS number. If the customer later needs to connect to another provider, a legal AS number should be assigned. Prefixes originating from AS65001 have an AS_path of 65001. Note prefix 172.16.220.0/24 in figure 5-15 as it leaves AS65001. For AS1 to propagate the prefix to the Internet, it would have to strip the private AS number. When the prefix reaches the Internet, it would look like it has originated from the provider's AS. Note how prefix 172.16.220.0/24 has reached the NAP with AS_path 1.
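A minimal sketch of the stripping step follows. It is not Cisco's implementation; the function names and the `remove_private` flag are hypothetical. The sketch strips only paths made up entirely of private AS numbers and leaves any other path untouched, and it uses the private range quoted in this section.

```python
PRIVATE_AS = range(64512, 65536)  # 64512-65535, as quoted in the text


def strip_private(as_path):
    """Drop the AS_path only when every AS number in it is private."""
    if as_path and all(asn in PRIVATE_AS for asn in as_path):
        return []
    return list(as_path)


def advertise_external(as_path, local_as, remove_private=False):
    """Optionally strip private ASs, then prepend the sender's AS number."""
    path = strip_private(as_path) if remove_private else list(as_path)
    return [local_as] + path


# Prefix 172.16.220.0/24 arrives at the provider (AS1) with AS_path 65001.
print(advertise_external([65001], local_as=1, remove_private=True))  # [1]
print(advertise_external([65001], local_as=1))                       # [1, 65001]
```

With stripping enabled, the prefix leaves the provider looking as if it originated in AS1.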
BGP will strip private ASs only when propagating updates to external peers. This means that the AS stripping would be configured on RTC as part of its neighbor connection to RTE. Private ASs should only be connected to a single provider. If the AS_path contains a mixture of private and legal AS numbers, BGP will view this as an illegal design and will not strip the private AS numbers from the list, and the update will be treated as usual. Only AS_path lists that consist entirely of private AS numbers (in the range 64512 to 65535) are stripped.

AS_Path and Route Aggregation Issues

Route aggregation involves summarizing ranges of routes into one or more aggregates or CIDR blocks to minimize the number of routes in the global routing tables. A drawback of route aggregation is the loss of granularity that existed in the specific routes that form the aggregate. The AS_path information that exists in multiple routes, for example, will be lost when these routes get summarized into one single advertisement. This could lead to routing loops because a route that has passed through an AS might be accepted by the same AS as a new route.

BGP defines another type of AS_path list called an AS-SET, where the ASs are listed in an unordered set. The set includes all the ASs a route has traversed. Aggregates carrying the AS-SET information would have a collective set of the attributes that form the individual routes they summarize.

In figure 5-16, AS1 is advertising 192.213.1.0/24, and AS2 is advertising 192.213.2.0/24. AS3 is aggregating both routes into 192.213.0.0/16. An AS that advertises an aggregate considers itself the originator of that route, irrespective of where that route came from. When AS3 advertises the aggregate 192.213.0.0/16, the AS_path information would be just 3. This would cause a loss of information because the originators of the routes, AS1 and AS2, are no longer listed in the AS_path.
In a situation where the aggregate is somehow advertised back to AS1 and AS2 by some other AS, AS1 and AS2 would accept the route, which could potentially lead to routing loops.
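The loop-detection gap, and how an unordered set of contributing ASs closes it, can be sketched as follows. The representation of a path as a (sequence, set) pair and the function names are assumptions made for illustration only.

```python
def aggregate(local_as, component_paths, as_set=False):
    """Build the AS_path of an aggregate as a (sequence, set) pair.

    Without as_set the aggregator lists only itself; with as_set it also
    carries the unordered set of ASs found in the component routes.
    """
    members = frozenset(asn for path in component_paths for asn in path)
    return [local_as], (members if as_set else frozenset())


def accept(path, local_as):
    """Reject the route if the receiver's AS appears anywhere in the path."""
    sequence, members = path
    return local_as not in sequence and local_as not in members


components = [[1], [2]]  # 192.213.1.0/24 from AS1, 192.213.2.0/24 from AS2
plain = aggregate(3, components)
with_set = aggregate(3, components, as_set=True)

print(accept(plain, 1))     # True: AS1 cannot tell the aggregate includes its own route
print(accept(with_set, 1))  # False: AS1 finds itself in the set
```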
With the notion of AS-SET, it is possible to have AS3 advertise the aggregate 192.213.0.0/16 while keeping information about the components of the aggregate. The set {1 2} indicates that the aggregate has come from both of these ASs in no particular order. The AS_path information of the aggregate with the AS-SET option would be 3 {1 2}.

AS_Path Manipulation

AS_path information is manipulated to affect interdomain routing behavior. Because BGP prefers a shorter path over a longer one, system operators are tempted to change the path information by including dummy AS path numbers that would increase the path length and influence the traffic trajectory one way or the other. Cisco's implementation enables a user to insert AS numbers at the beginning of an AS_path to make the path length longer. The following example shows how this feature can be used.
In figure 5-17, AS50 is connected to two providers, AS200 and AS100. AS100 is directly connected to the NAP, whereas AS200 has to go through an extra hop via AS300 to reach the NAP. Figure 5-17 shows instances of prefix 192.213.1.0/24 as it traverses the ASs on its way to the NAP. When the 192.213.1.0/24 prefix reaches the NAP via AS300, it would have an AS_path of 300 200 50. If the same prefix reaches the NAP via AS100, it would have an AS_path of 100 50, which is shorter. ASs upstream from the NAP would prefer the shorter AS_path length and would direct their traffic toward AS100 at all times for destination 192.213.1.0/24.
AS50 is not too happy about this behavior because it prefers the traffic to come via its higher bandwidth T3 link to AS200. AS50 will manipulate the AS_path information by inserting dummy AS numbers when sending routing updates to AS100. One common practice is for AS50 to repeat its AS number as many times as necessary to tip the balance and make the path via AS200 the shorter one. In figure 5-18, AS50 will insert two AS numbers 50 50 at the beginning of the AS_path of prefix 192.213.1.0/24. When the prefix 192.213.1.0/24 reaches the NAP via AS100, it would have the AS_path 100 50 50 50, which is longer than the AS_path 300 200 50 via AS300. ASs upstream of the NAP would prefer the shortest path and would direct the traffic toward AS300 for destination 192.213.1.0/24.
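The effect of the extra AS numbers on path length can be checked with a short sketch. The `path_at_nap` helper and its parameters are hypothetical; the AS numbers are those of the example above.

```python
def path_at_nap(origin_as, transit, prepend=0):
    """Return the AS_path seen at the NAP.

    `transit` lists the ASs the route crosses after leaving the origin,
    nearest first; `prepend` is how many extra copies of its own AS number
    the origin inserts.
    """
    path = [origin_as] * (1 + prepend)
    for asn in transit:
        path = [asn] + path
    return path


via_as300 = path_at_nap(50, transit=[200, 300])
via_as100 = path_at_nap(50, transit=[100], prepend=2)

print(via_as300)  # [300, 200, 50]
print(via_as100)  # [100, 50, 50, 50]
print(min([via_as300, via_as100], key=len))  # [300, 200, 50]
```

With `prepend=0` the path via AS100 is `[100, 50]` and wins the length comparison, which is the behavior AS50 set out to change.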
The bogus number should always be a duplicate of the AS announcing the route or the neighbor the route is learned from (in case an AS is increasing the path length for incoming updates). Adding any other number is misleading and could potentially lead to routing loops. Note the insertion point in figure 5-18.

The Local Preference Attribute

The local preference is a well-known discretionary attribute (type code 5). The local preference attribute is a degree of preference given to a route to compare it with other routes for the same destination. A higher local preference value is an indication that the route is more preferred. Local preference, as indicated by the name, is local to the autonomous system and gets exchanged between IBGP peers only and is not passed to EBGP peers.
An AS connected via BGP to multiple other ASs will get routing updates about the same destinations from different ASs. Local preference is usually used to set the exit point of an AS to reach a certain destination. Because this attribute is communicated within all BGP routers inside the AS, all BGP routers will have a common view on how to exit the AS. Consider the environment illustrated in figure 5-19. Suppose that company ANET has purchased Internet connections via two service providers XNET and YNET. ANET is connected to YNET via a primary T3 link and to XNET via a backup T1 link.
It is important for ANET to decide what path its outbound traffic is going to take. Of course ANET prefers to use the T3 link via YNET in normal operation because it is a high-speed link. This is where local preference comes into play: the LA router will give the routes coming from YNET a local preference of 300. The SJ router will give the routes coming from XNET a lower value, say 200. Because both the LA and SJ routers are exchanging routing updates via IBGP, they both agree that the exit point of the AS is going to be via YNET because of the higher local preference. In figure 5-19, ANET learns route 128.213.0.0/16 via XNET and YNET. The SJ and LA routers will agree on using YNET as the exit point for destination 128.213.0.0/16 because of the higher local preference value of 300. The local preference manipulation discussed in this case affects the traffic going out of the AS and not traffic coming into the AS. Inbound traffic can still come via the T1 link.

Cisco's proprietary weight parameter is similar to the local preference in that it gives higher preference to the route that has a higher weight. The difference is that the weight parameter is local to the router and does not get exchanged between routers. The weight parameter influences routes coming from different providers to the same router (one router with multiple connections to two or more providers). The weight parameter has a higher precedence than any other attribute; it is used as a main (proprietary) switch to determine route preference.

The ATOMIC_AGGREGATE Attribute

Route aggregation causes a loss of information because the aggregate is coming from different sources that have different attributes. The ATOMIC_AGGREGATE attribute is a well-known discretionary attribute (type code 6) that gets set as an indication of information loss. Basically, if a system propagates an aggregate that causes loss of information, it is required to attach the ATOMIC_AGGREGATE attribute to the route.
The ATOMIC_AGGREGATE should not be set when the aggregate carries some extra information that gives an indication of where the aggregated information came from. An example is an aggregate with the AS-SET parameter, as discussed earlier. An aggregate that carries the set of ASs that form the aggregate is not required to attach the ATOMIC_AGGREGATE attribute.
The AGGREGATOR Attribute

The AGGREGATOR attribute is an optional transitive attribute (type code 7). It specifies the autonomous system and the router that has generated an aggregate. A BGP speaker that performs route aggregation might add the AGGREGATOR attribute, which contains the speaker's AS number and IP address. In Cisco's implementation, the IP address is actually the Router ID (RID), which is the highest IP address on the router or the loopback address if it exists. The loopback interface is the virtual interface discussed earlier in this chapter. Figure 5-22 illustrates the AGGREGATOR attribute. AS300 is receiving routes 192.213.1.0/24 and 192.213.2.0/24 from AS100 and AS200, respectively. When RTA generates aggregate 192.213.0.0/16, it has the option of including the AGGREGATOR attribute, which consists of the AS number 300 and the RID 193.0.34.1 of the router (RTA) that originated the aggregate.
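The ORIGIN Attribute

The ORIGIN attribute is a well-known mandatory attribute (type code 1). It indicates the origin of the routing update (NLRI, which indicates prefix and mask) with respect to the autonomous system that originated it. BGP considers three types of origins: IGP, where the NLRI is interior to the originating AS; EGP, where the NLRI was learned via the Exterior Gateway Protocol; and INCOMPLETE, where the NLRI was learned by some other means.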
BGP considers the ORIGIN attribute in its decision-making process to establish a preference ranking among multiple routes. Specifically, BGP prefers the path with the lowest origin type, where IGP is lower than EGP, and EGP is lower than INCOMPLETE. For more details on how the ORIGIN attribute is calculated, refer to the section, "ORIGIN of Routes," earlier in this chapter.
BGP Decision Process Summary

BGP bases its decision process on the attribute values. When faced with multiple routes to the same destination, BGP chooses the best route for routing traffic toward the destination. The following process summarizes how BGP chooses the best route.
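A partial sketch of this ranking, limited to the attributes covered in this section, might look like the following. It compares weight first, then local preference, then AS_path length, then origin type. It deliberately omits the other comparisons a real implementation makes (for example next hop reachability, MED, EBGP versus IBGP, IGP metric, and router ID). The `Route` class, its default values, and the AS numbers in the example are assumptions.

```python
from dataclasses import dataclass, field

# Lower rank is preferred: IGP < EGP < INCOMPLETE.
ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}


@dataclass
class Route:
    name: str
    weight: int = 0        # assumed default; local to the router
    local_pref: int = 100  # assumed default; shared inside the AS
    as_path: list = field(default_factory=list)
    origin: str = "IGP"


def best_route(routes):
    """Highest weight, then highest local preference, then shortest
    AS_path, then lowest origin type."""
    return min(
        routes,
        key=lambda r: (-r.weight, -r.local_pref, len(r.as_path), ORIGIN_RANK[r.origin]),
    )


# ANET example: YNET routes get local preference 300, XNET routes 200.
# The AS numbers in the paths are made up for illustration.
candidates = [
    Route("via XNET", local_pref=200, as_path=[10, 20]),
    Route("via YNET", local_pref=300, as_path=[30, 40, 20]),
]
print(best_route(candidates).name)  # via YNET
```

The YNET route wins even though its AS_path is longer, because local preference is compared before path length.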
[1] RFC 1321 The MD5 Message-Digest Algorithm
[2] RFC 1997 BGP Communities Attribute