Internet-Draft | BGP SECOPTS | October 2024 |
Fiebig | Expires 5 April 2025 | [Page] |
The Border Gateway Protocol (BGP) is a critical component of the Internet, used to exchange routing information between network domains. Due to this central nature, it is an accepted best practice to ensure basic security properties for BGP and BGP speaking routers. While these general principles are outlined in BCP194, that document does not provide a list of technical and implementation options for securing BGP.¶
This document lists available options for securing BGP, serving as a contemporary, non-exhaustive repository of options and methods. The document explicitly does not make value statements on the efficacy of individual techniques, nor does it mandate or prescribe the use of specific techniques or implementations.¶
Operators are advised to carefully consider whether the listed methods are applicable for their use-case to ensure best current practices are followed in terms of which security properties need to be ensured when operating BGP speakers. Furthermore, the listed options in this document may change over time, and should not be used as a timeless ground-truth of applicable or sufficient methods.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 5 April 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The Border Gateway Protocol (BGP), specified in [RFC4271], is the protocol used in the Internet to exchange routing information between network domains. BGP does not directly include mechanisms that control whether the routes exchanged conform to the various guidelines defined by the Internet community. Besides, BGP itself, by its design, does not have any direct way to protect itself against possible security-related threats. This document intends to serve as a snapshot of currently available methods for ensuring BGP security.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The methods listed in this document are intended for BGP sessions carrying generic Internet routing information within the DFZ. It specifically does not cover security mechanics for other uses of BGP, e.g., when using BGP for NLRI exchange in a data-center context.¶
This document is a non-exhaustive and non-authoritative repository of available tools, methods, and techniques. When consulting this document, operators should consider that available tools and mechanisms, as well as the described circumstances and considerations may change over time.¶
The document does not make specific recommendations for the use of specific mechanisms, implementations, or configurations. Instead, operators are advised to carefully weigh the implications of listed methods and to apply their own judgement to assess whether these methods are appropriate for their network to ensure BGP security properties.¶
The BGP speaker needs to be protected from external attempts to subvert the BGP session. Furthermore, access to management services of the BGP speaker should be limited to neighbors, as these services usually share resources with the control plane and, e.g., automated attacks on management ports may impact the BGP speaker's ability to execute BGP related tasks.¶
To protect a BGP speaker on the network layer, the ability to connect to TCP port 179 on the local device should be restricted to known addresses that are permitted to become a BGP neighbor. Experience has shown that the natural protection TCP should offer is not always sufficient, as it is sometimes run in control-plane software. In the absence of ACLs, it is possible to attack a BGP speaker by simply sending a high volume of connection requests to it. This protection SHOULD be implemented by using an Access Control List (ACL) to limit access to TCP port 179 to authorized hosts.¶
If supported, an ACL specific to the control plane of the router SHOULD be used (receive-ACL, control-plane policing, etc.), to avoid configuration of data-plane filters for packets transiting through the router (and therefore not reaching the control plane). If the hardware cannot do that, interface ACLs can be used to block packets addressed to the local router.¶
Some routers automatically program such an ACL upon BGP configuration. On other devices, this ACL should be configured and maintained manually or using scripts.¶
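As a rough illustration of the access check described above, the following Python sketch models an ACL that admits connections to TCP port 179 only from configured neighbors. The neighbor addresses are taken from documentation ranges, and the function name is hypothetical; a real deployment would express this in the router's own ACL or control-plane policing syntax.

```python
import ipaddress

# Hypothetical neighbor addresses (documentation ranges), for illustration only.
ALLOWED_NEIGHBORS = [
    ipaddress.ip_network("192.0.2.1/32"),
    ipaddress.ip_network("2001:db8::1/128"),
]


def permit_bgp_connection(source: str) -> bool:
    """Return True if a connection to TCP port 179 from `source` is permitted."""
    addr = ipaddress.ip_address(source)
    # Compare only against entries of the same address family.
    return any(
        addr.version == net.version and addr in net
        for net in ALLOWED_NEIGHBORS
    )
```

With these assumed neighbors, `permit_bgp_connection("192.0.2.1")` is accepted, while any other source, such as `198.51.100.7`, is dropped before it reaches the BGP process.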
In addition to strict filtering, rate-limiting MAY be configured for accepted BGP traffic. Rate-limiting BGP traffic consists in permitting only a certain quantity of bits per second (or packets per second) of BGP traffic to the control plane. This protects the BGP router control plane in case the amount of BGP traffic surpasses platform capabilities.¶
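Rate-limiting of this kind is commonly modelled as a token bucket. The sketch below is one possible illustration in Python, assuming a packets-per-second limit with a small burst allowance; the class name and parameters are hypothetical, and actual platforms implement this in hardware or in the control-plane policer.

```python
class TokenBucket:
    """Permit at most `rate_pps` packets per second, with bursts up to `burst`."""

    def __init__(self, rate_pps: float, burst: float) -> None:
        self.rate_pps = rate_pps
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last = 0.0      # timestamp of the previous packet, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at time `now` may pass."""
        elapsed = max(0.0, now - self.last)
        # Refill according to elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_pps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A bucket created as `TokenBucket(rate_pps=10, burst=2)` passes two back-to-back packets, drops a third arriving at the same instant, and passes traffic again once time has elapsed.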
Furthermore, it is possible to use non-globally reachable addresses for BGP session links. Options include using IPv4 routes with an IPv6 next hop in IPv4 sessions (see [RFC9229]), using prefixes not advertised in the GRT ([TBD]), using unnumbered BGP/Link-Local addresses (also using [RFC9229]), or using [RFC1918] addresses for IPv4 sessions. Even though routing based network layer protection MAY be implemented, it SHOULD only be done in addition to deploying ACLs. If any of these approaches is utilized, it MUST be ensured that the BGP speaker originates Path MTU Discovery (PMTUD) related packets (see [RFC1191] for IPv4 and [RFC8201] for IPv6) from a globally reachable address, to ensure that Reverse Path Filtering of external parties does not interfere with PMTUD for transiting traffic.¶
Usually, a BGP speaker's management interface is also reachable in-band, i.e., via the default routing domain / VRF (Virtual Routing and Forwarding) of the control plane. To make it easier to separate BGP and management related control plane traffic, management traffic SHOULD be exclusively handled via dedicated out-of-band management. This network SHOULD be protected from unauthorized connections by ACLs not handled on the BGP speaker itself to ensure that the control plane cannot be overloaded by attacks on the management interfaces of the BGP speaker.¶
Please note that, in general, filtering and rate-limiting of control-plane traffic is a wider topic than "just for BGP". For further recommendations on how to protect the router's control plane, see [RFC6192].¶
Current security issues of TCP-based protocols (therefore including BGP) have been documented in [RFC6952]. The following subsections list the major points raised in this document and give the best practices related to TCP session protection for BGP operation.¶
Attacks on TCP sessions used by BGP (aka BGP sessions), for example, sending spoofed TCP RST packets, could bring down a BGP session. Following a successful ARP spoofing attack (or other similar man-in-the-middle attack), the attacker might even be able to inject packets into the TCP stream (routing attacks).¶
BGP sessions can be secured with a variety of mechanisms.¶
MD5 protection of the TCP session header, described in [RFC2385], was the first available mechanism to protect the integrity of a BGP session. It has been obsoleted by the TCP Authentication Option (TCP-AO; [RFC5925]), which offers stronger protection. While MD5 is still the most used mechanism due to its availability in vendors' equipment, TCP-AO SHOULD be preferred when implemented by both sides of a session.¶
Optionally, if TCP-AO is not supported but both sides of the BGP session support a stronger authentication algorithm than MD5, such as SHA-1 or SHA-256, using the stronger method SHOULD be considered. Aside from that, using keychain-based lifecycle management for cryptographic keys, as suggested in [RFC6518], is highly RECOMMENDED.¶
Additionally, IPsec could also be used for session protection. At the time of publication, there has been no wide-spread adoption of using IPsec for BGP sessions, and further analysis is required to define guidelines.¶
The drawback of TCP session protection is additional configuration and management overhead for the maintenance of authentication information (for example, MD5 passwords). In either case, protection of TCP sessions used by BGP SHOULD be enabled when BGP sessions are established over shared networks where the risk of spoofing is high (like IXPs). Operators are also RECOMMENDED to consider the trade-offs and apply BGP session protection on all other external BGP sessions as well.¶
Aside from this, most vendors use a simple, reversible encoding to store shared secret keys for BGP (and other routing protocols) in devices' configuration files. While this practice simplifies password management tasks, since the passwords can always easily be deciphered, it carries the risk of leaking this information if a configuration is shared, e.g., with a vendor for a support case, or if the device is decommissioned and later resold without having been wiped. Hence, if a device offers more secure storage mechanisms for secrets, these SHOULD be used.¶
Furthermore, operators SHOULD block spoofed packets (packets with a source IP address not belonging to their IP address space) at all edges of their network (see [RFC2827] and [RFC3704]). This protects the TCP session used by Internal BGP (iBGP) from attackers outside the Autonomous System. Similarly, the considerations for using non-globally reachable addresses for links handling BGP sessions from Section 3.1 apply accordingly.¶
Furthermore, as an additional security measure, iBGP sessions SHOULD also be protected using the authentication mechanisms discussed above.¶
In 2018 an attack on BGP was described in the literature which claims to enable BGP route injection without Layer 2 adjacency by leveraging PMTUD (see [FENG-22]). The attack leverages packet fragmentation to bypass standard TCP protection mechanisms, so routes can be injected into an established BGP session. While the attack would be mitigated by the integrity mechanisms suggested in Section 4.1.1, operators SHOULD additionally take precautions to defend against these attacks, especially if authentication mechanisms are not in use. To mitigate this attack, BGP speakers should not allow packet fragmentation on the control plane for BGP traffic between themselves and their neighbors. This is feasible, as even on multi-hop sessions, the path MTU should be known to the operators, meaning that it can be statically and consistently configured for both speakers involved in a session to prevent the need for fragmentation. Hence, operators SHOULD ensure that fragmentation is neither allowed nor necessary for BGP packets between two BGP speakers. If this is not possible, a strict lower limit for the MTU SHOULD be configured. For TCP packets, such as those of a BGP session, this is usually done using MSS (Maximum Segment Size) clamping. Given that IPv6 requires an MTU of at least 1280 bytes [RFC8200], and to keep clamping consistent between IPv4 and IPv6, an MTU of 1280 bytes, i.e., an MSS of 1240 bytes for IPv4 and 1220 bytes for IPv6, is the RECOMMENDED minimum.¶
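The MSS values above follow from subtracting the IP and TCP header sizes from the MTU. The following Python sketch shows the arithmetic, assuming headers without IPv4 options, IPv6 extension headers, or TCP options; the function name is hypothetical.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
IPV6_HEADER_BYTES = 40  # fixed IPv6 header without extension headers
TCP_HEADER_BYTES = 20   # TCP header without options


def mss_for_mtu(mtu: int, ip_version: int) -> int:
    """Return the TCP MSS that fits into `mtu` without fragmentation."""
    if ip_version == 4:
        ip_header = IPV4_HEADER_BYTES
    elif ip_version == 6:
        ip_header = IPV6_HEADER_BYTES
    else:
        raise ValueError("ip_version must be 4 or 6")
    return mtu - ip_header - TCP_HEADER_BYTES
```

For an MTU of 1280 bytes this gives 1280 - 20 - 20 = 1240 bytes for IPv4 and 1280 - 40 - 20 = 1220 bytes for IPv6.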
BGP sessions can be made harder to spoof with the Generalized TTL Security Mechanism (GTSM, aka TTL security), defined in [RFC5082]. Instead of sending TCP packets with TTL value of 1, the BGP speakers send the TCP packets with TTL value of 255, and the receiver checks that the TTL value equals 255. Since a packet that has crossed at least one router cannot arrive with a TTL of 255, BGP TTL security effectively prevents all spoofing attacks coming from third parties not directly connected to the same subnet as the BGP-speaking routers. Operators SHOULD implement TTL security on directly connected BGP neighbors.¶
GTSM can also be applied to multi-hop BGP sessions. To achieve this, TTL needs to be configured with a proper value depending on the distance between BGP speakers (using the principle described above). Nevertheless, it is not as effective because anyone inside the TTL diameter could spoof the TTL.¶
Like MD5 protection, TTL security has to be configured on both ends of a BGP session.¶
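The receive-side check of TTL security can be sketched as follows. This Python fragment assumes that the sender transmits with a TTL of 255 and that `intermediate_routers` (a hypothetical parameter) is the number of routers between the two speakers that decrement the TTL, i.e., 0 for a directly connected neighbor.

```python
MAX_TTL = 255  # value the sending BGP speaker is expected to use


def gtsm_accept(received_ttl: int, intermediate_routers: int = 0) -> bool:
    """Accept a packet only if it cannot have travelled further than expected."""
    if not 0 <= received_ttl <= MAX_TTL:
        raise ValueError("TTL out of range")
    # Each intermediate router decrements the TTL by one.
    return received_ttl >= MAX_TTL - intermediate_routers
```

For a directly connected session only a TTL of 255 passes. For a multi-hop session with two intermediate routers, any TTL of 253 or higher passes, which illustrates why every host inside that diameter can still spoof the session.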
The main aspect of securing BGP resides in controlling the prefixes that are received and advertised on the BGP session. Prefixes exchanged between BGP neighbors are controlled with inbound and outbound filters that can match on well-known/statically typed IP prefixes (as described in this section), a combination of Prefix and AS paths (see ), BGP roles (see Section 6.5.1), or any other attributes of a BGP prefix (for example, BGP communities, as described in Section 6.5.2).¶
This section lists the most commonly used static prefix filters. We define static prefixes as prefixes that are published via an authoritative list which changes, on average, not more frequently than every 12 months. We will utilize these definitions of static prefixes in Section 8 to clarify where and how these filters should be applied.¶
The IANA IPv4 Special-Purpose Address Registry [IANAv4Spec] maintains the list of IPv4 special-purpose prefixes and their routing scope, and it SHOULD be used for prefix-filter configuration. Prefixes with value "False" in column "Global" SHOULD be discarded on Internet BGP sessions (eBGP).¶
The IANA IPv6 Special-Purpose Address Registry [IANAv6Spec] maintains the list of IPv6 special-purpose prefixes and their routing scope, and it SHOULD be used for prefix-filter configuration. Only prefixes with value "False" in column "Global" SHOULD be discarded on Internet BGP sessions.¶
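A filter derived from the two special-purpose registries could be sketched as below. The list in this Python example is only a small, hand-picked subset of registry entries that are not globally reachable, chosen for illustration; an actual filter would be generated from the complete IANA registries. The function name is hypothetical.

```python
import ipaddress

# Illustrative subset of special-purpose prefixes with "Global" set to "False".
# A real filter would be generated from the full IANA registries.
NON_GLOBAL_PREFIXES = [
    ipaddress.ip_network(p)
    for p in (
        "10.0.0.0/8",
        "127.0.0.0/8",
        "192.168.0.0/16",
        "2001:db8::/32",
        "fc00::/7",
        "fe80::/10",
    )
]


def is_special_purpose(prefix: str) -> bool:
    """Return True if `prefix` equals or lies within a non-global prefix."""
    net = ipaddress.ip_network(prefix)
    # subnet_of() requires both networks to be of the same address family.
    return any(
        net.version == special.version and net.subnet_of(special)
        for special in NON_GLOBAL_PREFIXES
    )
```

Routes for which `is_special_purpose()` returns True, such as `10.1.0.0/16` or `fe80::/64`, would be discarded on eBGP sessions.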
IANA allocates prefixes to RIRs that in turn allocate prefixes to LIRs (Local Internet Registries). While it is in general sensible to not accept routing table prefixes that are not allocated by IANA and/or RIRs, it is important to understand that filtering unallocated prefixes requires constant updates, as prefixes are continually allocated. Therefore, automation of such prefix filters is key for the success of this approach. Operators SHOULD NOT consider solutions described in this section if they are not capable of maintaining updated prefix filters: the damage would probably be worse than the intended security policy. In this section we focus on IP address space allocated to RIRs by IANA. Allocations by RIRs are generally more dynamic. Therefore, we will discuss using RIR level data in Section 6.1.¶
IANA has allocated all available IPv4 address space. Therefore, there is no reason for operators to keep checking that prefixes they receive from BGP neighbors are in the IANA-allocated IPv4 address space [IANAv4Reg]. No specific filters need to be put in place by operators who want to make sure that IPv4 prefixes they receive in BGP updates have been allocated by IANA.¶
For IPv6, given the size of the address space, it can be seen as wise to accept only prefixes derived from those allocated by IANA. Operators can dynamically build this list from the IANA-allocated IPv6 space [IANAv6Reg]. As IANA keeps allocating prefixes to RIRs, the aforementioned list should be checked regularly against changes, and if they occur, prefix filters should be computed and pushed on network devices. The list could also be pulled directly by routers when they implement such mechanisms. As there is delay between the time an RIR receives a new prefix and the moment it starts allocating portions of it to its LIRs, there is no need for doing this step quickly and frequently. However, operators SHOULD ensure that all IPv6 prefix filters are updated within a maximum of one month after any change in the list of IPv6 prefixes allocated by IANA.¶
If the process in place (whether manual or automatic) cannot guarantee that the list is updated regularly, then it's better not to configure any filters based on allocated networks. The IPv4 experience has shown that many network operators implemented filters for prefixes not allocated by IANA but did not update them on a regular basis. This created problems for the latest allocations, and required extra work for RIRs that had to "de-bogonize" the newly allocated prefixes. (See [RIPE-351] for information on de-bogonizing.)¶
Most ISPs will not accept advertisements beyond a certain level of specificity (and in return, they do not announce prefixes they consider to be too specific). That acceptable specificity is decided for each session between two BGP neighbors. Some ISP communities have tried to document acceptable specificity. This document does not make any judgement on what the best approach is, it just notes that there are existing practices on the Internet and recommends that the reader refer to them. As an example, the RIPE community has documented that, at the time of writing of this document, IPv4 prefixes longer than /24 and IPv6 prefixes longer than /48 are generally neither announced nor accepted in the Internet [RIPE-399] [RIPE-532]. These values may change in the future.¶
Some operators MAY choose to allow customers to additionally announce more specifics than commonly used on the Internet (see Section 5.3). This can be to allow customers more fine-grained traffic steering in case of multiple BGP sessions between the AS and its customer in multiple locations, and/or to sub-delegate IPv4 address space smaller than a /24 from the AS' allocation to the customer.¶
In that case, the operators SHOULD add a specific accept rule for these exact prefixes before Rule 11. Routes of this type SHOULD be annotated in a way that ensures they are not re-exported to other neighbors (see Section 6.5.2). Furthermore, in case of using more specifics for traffic steering, the customer SHOULD also announce at least the covering /24 to ensure global reachability of the prefix and prevent issues with uRPF (see also [RFC8704] and Section 6.6.3).¶
Similar to too specific routes, most ISPs will not accept advertisements beyond a certain level of aggregation. The general guideline here is the least specific allocation commonly handed out by RIRs to LIRs. At the moment, the largest allocations for IPv4 are contiguous /8s. For IPv6, one /13 allocation exists, followed by several LIRs holding /19s. Several operators currently limit the shortest accepted prefix length for IPv6 to /16. This document does not make any judgement on what the best approach is; it just notes that there are existing practices on the Internet and recommends that the reader refer to them. These values may change in the future.¶
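The bounds on prefix specificity and aggregation discussed in this and the preceding paragraphs can be combined into a single length check. The Python sketch below uses the values mentioned in the text (/8 to /24 for IPv4, /16 to /48 for IPv6) purely as example settings; they are not a recommendation, and the names are hypothetical.

```python
import ipaddress

# Example bounds taken from the text: (shortest, longest) accepted prefix length.
LENGTH_BOUNDS = {4: (8, 24), 6: (16, 48)}


def length_acceptable(prefix: str) -> bool:
    """Return True if the prefix length lies within the configured bounds."""
    net = ipaddress.ip_network(prefix)
    shortest, longest = LENGTH_BOUNDS[net.version]
    return shortest <= net.prefixlen <= longest
```

Under these example bounds, `192.0.2.0/24` and `2001:db8::/32` pass, while `192.0.2.0/25` and `2001:db8::/49` are considered too specific and `2000::/3` is considered too short.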
In this section, we discuss dynamic prefix filters, i.e., filters that decide whether a prefix should be im- or exported or not based on frequently changing parameters and external resources.¶
A more precise check can be performed when one would like to make sure that received prefixes are being originated or transited by Autonomous Systems (ASes) entitled to do so. It has been observed in the past that an AS could easily advertise someone else's prefix (or more specific prefixes) and create black holes or security threats. To partially mitigate this risk, administrators would need to make sure BGP advertisements correspond to information located in the existing registries.¶
An Internet Routing Registry (IRR) is a database containing Internet routing information, described using Routing Policy Specification Language objects as described in [RFC4012]. Operators are given privileges to describe routing policies of their own networks in the IRR, and that information is published, usually publicly. A majority of Regional Internet Registries do also operate an IRR and can control whether registered routes conform to the prefixes that are allocated or directly assigned. However, it should be noted that the list of such prefixes is not necessarily a complete list, and as such the list of routes in an IRR is not the same as the set of RIR-allocated prefixes. Furthermore, especially IRRs not operated by RIRs regularly list conflicting information, see Section 6.1.¶
The cornerstones of IRR-based information are ROUTE (IPv4) and ROUTE6 (IPv6) objects. These document, for a given prefix, the AS/ASes allowed to originate the prefix. Note that for a given prefix, more specific objects may also exist. However, technically, the semantics of a ROUTE/ROUTE6 object are those of an exact match.¶
Operators SHOULD create ROUTE/ROUTE6 objects for all prefixes they originate or plan to originate.¶
An AS-SET is an object that contains AS numbers or other AS-SETs. The purpose of AS-SETs is creating a recursively queryable structure documenting the cone of an AS. An operator may create an AS-SET defining all AS numbers of its customers. A transit provider might create an AS-SET listing the AS numbers or AS-SETS of those ASes it provides upstream to. In turn, these ASes describe the AS numbers/AS-SETS of their customers, etc. Using recursion, it is possible to retrieve from an AS-SET the complete list of AS numbers that the neighbor is likely to announce. For each of these AS numbers, it is also easy to look in the corresponding IRR for all associated prefixes.¶
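The recursive resolution of an AS-SET can be sketched as follows. This Python example operates on a small in-memory dictionary standing in for IRR data; the AS-SET names and members are hypothetical and use documentation ASNs, and no real IRR is queried. Because AS-SETs may reference each other, the sketch tracks visited sets to avoid looping forever.

```python
import re

ASN_PATTERN = re.compile(r"^AS(\d+)$")

# Hypothetical IRR content: AS-SET name -> members (ASNs or other AS-SETs).
EXAMPLE_IRR = {
    "AS64496:AS-CONE": ["AS64496", "AS64497:AS-CUSTOMERS"],
    "AS64497:AS-CUSTOMERS": ["AS64497", "AS64498", "AS64496:AS-CONE"],
}


def expand_as_set(name: str, as_sets: dict) -> set:
    """Return all AS numbers reachable from the AS-SET `name`."""
    asns = set()
    seen = set()
    pending = [name]
    while pending:
        current = pending.pop()
        if current in seen:
            continue  # already expanded; guards against circular references
        seen.add(current)
        for member in as_sets.get(current, []):
            match = ASN_PATTERN.match(member)
            if match:
                asns.add(int(match.group(1)))
            else:
                pending.append(member)  # nested AS-SET, expand later
    return asns
```

Expanding `AS64496:AS-CONE` in this example yields the AS numbers 64496, 64497, and 64498, even though the two sets reference each other.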
Please note that different IRRs may provide conflicting data, especially on AS-SETs. Recently, an attack was observed where a malicious party created an empty AS-SET for a large transit provider (see [NLNOG-22]). As it was created in an RIR database often taking precedence over other IRR sources, several ASes imported this empty AS-SET, and hence filtered all prefixes advertised by this transit provider. To mitigate this issue, hierarchical AS-SETs reside in the IRR of the RIR and explicitly list the ASN to which they pertain, e.g., AS65536:AS-EXAMPLE. Additionally, the IRR source may also be referenced: RIPE::AS65536:AS-EXAMPLE.¶
Operators SHOULD create a hierarchical AS-SET representing their cone. If AS-SETs are included in another AS-SET, they SHOULD be hierarchical.¶
Using AS-SETs and ROUTE/ROUTE6 objects, it is possible to use the IRR information to build, for a given neighbor AS, a list of prefixes the neighbor is authorized to originate or transit. This can be done relatively easily using scripts and existing tools capable of retrieving this information from the registries. This approach is exactly the same for both IPv4 and IPv6.¶
The macro-algorithm for the script is as follows. For the neighbor that is considered, the distant operator has provided the AS and may be able to provide a hierarchically named AS-SET object (aka AS-MACRO). With these two pieces of information, a script can build, for a given neighbor, a filter that lists allowed prefixes and the AS number from which they should be originated. One could decide not to use the origin information and only build monolithic prefix filters from fetched data combining prefixes a neighbor is authorized to transit and originate.¶
As prefixes, AS numbers, and AS-SETs may not all be under the same RIR authority, it is difficult to choose for each object the appropriate IRR to poll. Some IRRs have been created and are not restricted to a given region or authoritative RIR. They allow RIRs to publish information contained in their IRR in a common place. They also make it possible for any subscriber (probably under contract) to publish information too. When doing requests inside such an IRR, it is possible to specify the source of information in order to have the most reliable data. One could check a popular IRR containing many sources (such as RADb [RADb], the Routing Assets Database) and only select as sources some desired RIRs and trusted major ISPs (Internet Service Providers).¶
As objects in IRRs may frequently vary over time, it is important that prefix filters computed using this mechanism are refreshed regularly. Refreshing the filters on a daily basis SHOULD be considered because routing changes must sometimes be done in an emergency and registries may be updated at the very last moment. Note that this approach significantly increases the complexity of the router configurations, as it can quickly add tens of thousands of configuration lines for some important neighbors, e.g., large peers or downstreams. To manage this complexity, operators could use, for example, bgpq4 [bgpq4], a set of tools making it possible to simplify the creation of automated filter configuration from policies stored in an IRR.¶
SIDR (Secure Inter-Domain Routing), described in [RFC6480], has been designed to secure Internet advertisements. Even though technically incorrect, as it is only the name of an important component, the use of techniques entailed in SIDR is commonly referred to as RPKI (Resource Public Key Infrastructure).¶
There are basically two services that SIDR offers:¶
Implementing SIDR mechanisms is expected to solve many of the BGP routing security problems in the long term, but it may take time for deployments to be made and objects to become signed. Also, note that the SIDR infrastructure is complementing (not replacing) the security best practices listed in this document. Therefore, operators SHOULD implement any SIDR proposed mechanism (for example, route origin validation) on top of the other existing mechanisms even if they could sometimes appear to be targeting the same goal.¶
If route origin validation is implemented, the reader SHOULD refer to the rules described in [RFC7115]. In short, each external route received on a router SHOULD be checked against the Resource Public Key Infrastructure (RPKI) data set:¶
In addition to this, operators SHOULD sign their routing objects so their routes can be validated by other networks running origin validation. Please note that, when signing routing objects, operators SHOULD strive to create minimally covering ROAs for their intended announcements, see [RFC7115] and [RFC9319], to reduce the attack surface of forged-origin hijacks and attempts to exhaust routers' route processing capacity in terms of memory and CPU [KIRIN-22]. For example, if an operator received a /29 allocation and intends to announce it in a deaggregation of /32, the corresponding ROA should cover the /29 with a longest allowed prefix of /32, instead of signing for a deaggregation up until /48.¶
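To illustrate why minimally covering ROAs matter, the following Python sketch implements the basic route origin validation logic as commonly described for RPKI (a route is Valid if a covering entry matches both origin AS and maximum length, Invalid if it is covered but nothing matches, and NotFound otherwise). The data structure, function name, prefixes, and ASNs are assumptions made for this example only.

```python
import ipaddress
from typing import NamedTuple


class VRP(NamedTuple):
    """Validated ROA payload: prefix, maximum length, and authorized origin AS."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(route_prefix: str, origin_asn: int, vrps: list) -> str:
    """Return 'Valid', 'Invalid', or 'NotFound' for a route and its origin AS."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        if vrp_net.version != route.version or not route.subnet_of(vrp_net):
            continue  # this VRP does not cover the route
        covered = True
        # AS 0 in a ROA never matches any origin.
        if (route.prefixlen <= vrp.max_length
                and vrp.asn == origin_asn and vrp.asn != 0):
            return "Valid"
    return "Invalid" if covered else "NotFound"


# Hypothetical ROAs for a /29 announced as /32s.
MINIMAL_ROA = [VRP("2001:db8::/29", 32, 64496)]
LOOSE_ROA = [VRP("2001:db8::/29", 48, 64496)]
```

With `MINIMAL_ROA`, a forged-origin announcement of `2001:db8:1::/48` with origin AS 64496 evaluates to Invalid, whereas with `LOOSE_ROA` the same announcement evaluates to Valid and would win by longest-prefix match.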
One should understand that the RPKI model brings new, interesting challenges. The paper "On the Risk of Misbehaving RPKI Authorities" [hotRPKI] explains how the RPKI model can impact the Internet if authorities don't behave as they are supposed to. Further analysis is certainly required on RPKI, which carries part of BGP security.¶
If autonomous system provider authorization is implemented, the reader SHOULD refer to the rules described in [I-D.ietf-sidrops-aspa-verification]. In short, each external route received on a router SHOULD be checked against the ASPA record found in the Resource Public Key Infrastructure (RPKI) based on the relationship to the neighbor.¶
In [I-D.ietf-sidrops-aspa-verification], see following sections based on the neighbor relationship:¶
ASPA validation can result in one of three outcomes, VALID, INVALID, and UNKNOWN.¶
A key component of RPKI ROV is a validator that collates ROAs from the RIR TAs and distributes this information to routers (via the RTR protocol, or others per operator preference). Operators SHOULD run their own validator and SHOULD NOT outsource the collection and validation of ROAs to a third party.¶
A network SHOULD filter its own prefixes on BGP sessions with all its neighbors (inbound direction). This prevents local traffic (from a local source to a local destination) from leaking over an external BGP session, in case someone else is announcing the prefix over the Internet. This also protects the infrastructure that may directly suffer if the backbone's prefix is suddenly preferred over the Internet.¶
In some cases, for example, multihoming scenarios, such filters SHOULD NOT be applied, as this would break the desired redundancy.¶
Filtering prefixes belonging to multi-homed downstreams on sessions with other ASes is NOT RECOMMENDED. This practice may lead to blackholing of traffic if the filter is semi-statically configured, i.e., not removed upon withdrawal of the specific prefix by a downstream. Downstreams may choose to not advertise prefixes to an upstream for a variety of reasons, including traffic engineering and Denial-of-Service attack response. Instead, operators SHOULD assign downstreams' prefixes learned from other neighbors a lower priority than those routes directly learned from downstreams. This can be done, e.g., by adding additional path prepends or using local preference settings. Please note, though, that using local preferences for this purpose may lead to a situation where a downstream is unable to perform traffic engineering apart from withdrawing a route towards its upstream in case of, e.g., a congested link in a multi-homed setup.¶
Even though filtering prefixes belonging to single-homed downstreams on sessions with other ASes carries less risk of immediate negative impact, it is crucial that operators coordinate closely with their downstream if such practices are applied. Otherwise, if a downstream becomes multi-homed, connectivity issues may appear. Hence, assuming that other appropriate filters are in place ensuring, e.g., validity of the announcing AS and the AS-PATH, see Section 8.2, filtering prefixes originated by downstreams on sessions with other ASes solely based on the prefix is NOT RECOMMENDED.¶
TODO: Make more general about annotating routes, also include BGP neighbor roles.¶
Prefixes learned from BGP neighbors may technically conform to static metrics and filter types discussed above. For example, when learning prefixes from peers and/or upstreams which have been originally announced by downstreams of an AS, it is crucial to not leak these routes to upstreams and peers in case they are preferred over those learned directly from a downstream. This may occur, for example, if a downstream uses path prepending with an upstream, while the upstream has a peering session with another AS which is also an upstream of said downstream. With the route advertised by the peer being shorter, the AS may export the learned route via the peer if:¶
To counteract this issue, outbound filtering should consider the source type, i.e., relationship to the neighbor from whom a route was originally learned.¶
To ensure that no prefixes leak via AS relationships (routes learned from peers or upstreams to other peers or upstreams), [RFC9234] introduces BGP roles and the BGP Only to Customer (OTC) attribute. The OTC attribute forms a tandem with ASPA, see Section 6.2.2. Operators SHOULD configure appropriate roles according to Section 3 of [RFC9234] to enable prefix filtering based on BGP relationships. Furthermore, for prefixes imported from upstreams, the OTC attribute SHOULD be set and evaluated according to [RFC9234], Section 5:¶
When OTC is being used, and a route is received, it should be handled as follows:¶
Despite a fall-back mechanism being implemented to support one-sided BGP roles, they must be supported by both neighbors in a BGP session to be fully effective. To completely cover an AS, all neighbors should utilize BGP roles on their sessions. Hence, if at least one neighbor does not yet utilize BGP roles, or if the operator cannot deploy BGP roles and/or use the OTC attribute on their own infrastructure, operators SHOULD additionally utilize BGP large-communities to annotate where they learned prefixes and filter accordingly on sessions where they re-announce these prefixes, see [RFC8195]. While technically possible, standard BGP communities (see [RFC1997]) SHOULD NOT be used for this purpose due to the prevalence of 32-bit ASNs, which can only be represented in large-communities (see [RFC8092]).¶
Operators SHOULD designate a large community namespace for each neighbor relationship, for example, OPERATOR_ASN:100:NEIGHBOR_ASN for upstreams, OPERATOR_ASN:101:NEIGHBOR_ASN for peers, OPERATOR_ASN:102:NEIGHBOR_ASN for downstreams, etc. These communities SHOULD cover all relationships documented in Section 3 of [RFC9234]. Additionally, if operators allow downstreams to announce more specifics than generally accepted in the GRT (see [CCR-22]), they should dedicate a large-community list to that purpose as well, to ensure they can effectively prevent re-announcements of these prefixes.¶
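The annotation scheme can be sketched in Python as follows. This is a minimal illustration and not a vendor configuration: the function values 100, 101, and 102 are taken from the example namespace above, while LOCAL_ASN (a documentation ASN), the helper names, and the simple export rule (routes learned from upstreams or peers are only exported to downstreams) are assumptions made for this example.¶

```python
LOCAL_ASN = 64496  # documentation ASN standing in for OPERATOR_ASN

# Function values from the example namespace in the text.
LEARNED_FROM = {"upstream": 100, "peer": 101, "downstream": 102}


def annotate(communities, relationship, neighbor_asn):
    """Add OPERATOR_ASN:FUNCTION:NEIGHBOR_ASN to a route's large communities."""
    tag = (LOCAL_ASN, LEARNED_FROM[relationship], neighbor_asn)
    return set(communities) | {tag}


def learned_from(communities):
    """Return the relationships recorded in the operator's own namespace."""
    by_value = {value: name for name, value in LEARNED_FROM.items()}
    return {
        by_value[function]
        for (asn, function, _neighbor) in communities
        if asn == LOCAL_ASN and function in by_value
    }


def may_export(communities, export_to):
    """Assumed policy: upstream/peer routes are only sent to downstreams."""
    if export_to == "downstream":
        return True
    return not (learned_from(communities) & {"upstream", "peer"})
```

A route tagged on import from a peer, e.g., annotate(set(), "peer", 64511), is then rejected by may_export for sessions with upstreams and peers, while locally originated routes (carrying no tag) and routes learned from downstreams remain exportable.¶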
For information on how these annotations SHOULD be included in filter sets, please see Section 8.¶
Within the IXP community, most IXPs prefer the IXP LAN prefix to not be advertised to the GRT ([TBD]). While some IXPs may opt to advertise the IXP LAN prefix, e.g., with the route server's ASN, operators present on an IXP MUST respect the choice of the IXP regarding the advertisement state of the IXP LAN prefix. Furthermore, the RIPE region, for example, has reached consensus on reducing the initial IXP allocation size for IPv4 (see [RIPE-804]) to sizes more specific than its own limits on maximum prefix lengths acceptable in the GRT (see [RIPE-399] and [RIPE-532]). When a network is present on an IXP and has sessions with other IXP members over a common subnet (IXP LAN prefix), it SHOULD NOT accept exact matches or more-specific prefixes for the IXP LAN prefix from any of its external BGP neighbors. Accepting these routes may create a black hole for connectivity to the IXP LAN. To reduce the risk of accidental route leaks of IXP LAN prefixes for which the corresponding IXP opted to not have them in the GRT, operators MAY choose to use "BGP next-hop-self" on all routes learned on that IXP, so that they are not required to distribute the IXP LAN prefix within their IGP. Furthermore, IXPs may opt to create ROAs indicating AS0 as the only valid origin AS if they want to prevent their prefixes from being announced on the Internet.¶
If the IXP LAN prefix is accepted at all, it SHOULD only be accepted from the ASes that the IXP authorizes to announce it -- this will usually be automatically achieved by filtering announcements using RPKI and/or an IRR database.¶
It is suggested (see also [APNICTRN-17]) that operators dedicate routers for connections to an IXP that SHOULD only carry routes from the AS's cone, and not a full table or default route. This reduces the chance of accidental route leaks and prevents other IXP members from pointing default routes via the IXP LAN to such a router. Alternatively, operators MAY use a separate routing context (e.g., VRF) for IXP peerings, which only contains routes from the local AS cone.¶
Originally, in order to have PMTUD working in the presence of loose uRPF, it would be necessary that all the networks that may source traffic that could flow through the IXP have a route for the IXP LAN prefix. This relates to "packet too big" ICMP messages sent by IXP members' routers potentially being sourced using an address of the IXP LAN prefix. In the presence of loose uRPF, this ICMP packet is dropped if there is no route for the IXP LAN prefix or a less specific route covering the IXP LAN prefix.¶
Hence, similar to considerations in Section 3 regarding non globally routable transit networks, IXP members SHOULD ensure that "packet too big" ICMP messages sent by their routers have a source address in IP address space advertised to the GRT, e.g., the router's loopback address. Note that this issue causes service interruption in case of lost "packet too big" messages, but may also reduce debuggability in, e.g., traceroutes. If they decide to implement this behavior for all ICMP messages, operators SHOULD ensure that this address is only used for ICMP messages egressing via the interface connected to the IXP LAN. Otherwise, readability of traceroutes will be significantly reduced, as the specific interface a packet passed through is no longer visible in traceroutes.¶
The BGP route flap dampening mechanism makes it possible to give penalties to routes each time they change in the BGP routing table [RFC2439]. Initially, this mechanism was created to protect the entire Internet from multiple events that impact a single network. Studies have shown that implementations of BGP route flap dampening could cause more harm than benefit; therefore, in the past, the RIPE community has recommended against using BGP route flap dampening [RIPE-378]. Later, studies were conducted to propose new route flap dampening thresholds in order to make the solution "usable"; see [RFC7196] and [RIPE-580] (in which RIPE reviewed its recommendations). Following IETF and RIPE recommendations and using BGP route flap dampening with the adjusted configured thresholds is RECOMMENDED.¶
A spike in the number of received and imported prefixes can be a threat to the availability of a BGP speaker. Furthermore, a significant increase in the number of prefixes received from a neighbor might indicate a misconfiguration, e.g., a failure in outbound filtering for the advertising neighbor, or a failure in inbound filtering in the ingesting neighbor. Finally, it is important to limit overall GRT growth: given theoretical attacks utilizing deaggregation of IPv6 prefixes to globally exhaust routers' memory and CPU capacity (see [KIRIN-22]), the number of prefixes accepted to be originated by a neighboring AS across all BGP sessions should be limited.¶
It is RECOMMENDED to configure a limit on the number of routes to be accepted from a neighbor. The following rules are generally RECOMMENDED:¶
It is important to regularly review the limits that are configured as the Internet can quickly change over time. Some vendors propose mechanisms to have two thresholds: while the higher number specified will shut down the session, the first threshold will only trigger a log and can be used to passively adjust limits based on observations made on the network.¶
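The two-threshold behavior can be sketched in Python as follows. This is an illustration under assumptions, not a vendor feature description: the function name, the returned action strings, and the example limits are hypothetical.¶

```python
def check_prefix_limit(received, warn_limit, max_limit):
    """Classify a neighbor's received-prefix count against two thresholds.

    Returns "ok", "warn" (log only), or "shutdown" (tear down the session).
    """
    if warn_limit > max_limit:
        raise ValueError("warn_limit must not exceed max_limit")
    if received > max_limit:
        return "shutdown"
    if received > warn_limit:
        return "warn"
    return "ok"
```

With a warning threshold of 80 and a hard limit of 100 prefixes, a neighbor sending 90 prefixes would only trigger a log entry, which an operator can use to review the configured limits before the session is affected.¶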
When enforcing limits on the number of prefixes sent by neighbors, including upstreams, an operator may lose connectivity to one or multiple peers if, e.g., the GRT or the number of routes in the peer's cone suddenly increases. Such a sudden growth might occur due to organic effects, but could also be triggered by a malicious actor.¶
For example, with RPKI allowing operators to sign ROAs specifying a minimum and maximum prefix length (contrary to ROUTE/ROUTE6 objects), researchers noted that this allows deaggregation attacks ([KIRIN-22]). By configuring a ROA that covers, e.g., a /32, one can effectively authorize an AS to announce 65536 unique prefixes. Leveraging the by now wide availability of free and/or cheap opportunities to obtain IPv6 upstream, a malicious party could use this to cause significant Internet-wide route churn and GRT growth. By constantly announcing and withdrawing prefixes to upstream ASes at various PoPs, churn exceeding the size of the IPv6 full table at the time of writing (around 200k prefixes) could be created.¶
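The figure of 65536 prefixes can be reproduced with a short calculation. The sketch below assumes that the ROA covers a /32 with a maximum length of /48 (the maximum length is not stated explicitly above) and uses the IPv6 documentation prefix; the function names are hypothetical. The second function, which sums over all lengths up to the maximum, is an additional calculation and not a number taken from [KIRIN-22].¶

```python
import ipaddress


def count_more_specifics(covering, length):
    """Number of distinct prefixes of exactly `length` inside `covering`."""
    net = ipaddress.ip_network(covering)
    if not net.prefixlen <= length <= net.max_prefixlen:
        raise ValueError("length must lie between the covering and host length")
    return 2 ** (length - net.prefixlen)


def count_all_authorized(covering, max_length):
    """Number of distinct prefixes of any length up to `max_length`."""
    net = ipaddress.ip_network(covering)
    return sum(
        count_more_specifics(covering, length)
        for length in range(net.prefixlen, max_length + 1)
    )


print(count_more_specifics("2001:db8::/32", 48))  # 65536
print(count_all_authorized("2001:db8::/32", 48))  # 131071
```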
It is therefore RECOMMENDED that operators implement continuous monitoring of all prefix limits configured on BGP sessions. That monitoring SHOULD include verifying configured prefix limits against published information on suggested prefix limits by neighbors, if available. Furthermore, the monitoring SHOULD notify operators of sudden changes in the number of received prefixes, as well as of limits being gradually approached over time.¶
This section discusses filtering AS_PATHs, as well as recommendations for AS_PATH manipulation, and which practices to avoid there.¶
This section lists the RECOMMENDED practices when processing BGP AS_PATHs in addition to the considerations from Section 6.¶
This section lists the RECOMMENDED practices when manipulating BGP AS_PATHs, to limit chances of accidentally producing AS_PATHs that would have to be filtered by neighbors according to Section 7.3.1.¶
Some BGP implementations offer various advanced AS_PATH manipulation features, such as overriding or rewriting a part of the AS_PATH. For instance, a very commonly used mechanism is the so-called "AS Override" feature, primarily intended for use in MPLS L3 VPNs, where the customer's AS number is overridden with the provider's AS number, to allow site-to-site communication where both customer sites use the same AS number. Some vendors went even further, offering a possibility to fully rewrite or even delete the AS_PATH Attribute from incoming or outgoing BGP Update messages.¶
Furthermore, AS_PATH filtering is an option when ASN renumbering is done. Such an operation is common, and mechanisms exist to allow smooth ASN migration [RFC7705]. The usual migration technique, local to a router, consists of modifying the AS_PATH so it is presented to a neighbor with the previous ASN, as if no renumbering was done. This makes it possible to change the ASN of a router without reconfiguring all eBGP neighbors at the same time (as that operation would require synchronization with all neighbors attached to that router). During this renumbering operation, the rules described above may be adjusted.¶
In principle, use of any AS_PATH modification mechanism except AS_PATH prepending in the public Internet SHOULD be avoided altogether. Also, as discussed already, AS_PATH prepends SHOULD NOT be excessive. Operators are RECOMMENDED to not prepend more than five times. The "AS Override" feature MAY still be used in closed environments, such as VPNs not directly exchanging any NLRIs with the Internet. AS_PATH rewriting/deleting SHOULD be avoided. In particular, the practice of providing upstream to customers using a private ASN and then using rewriting on either side is strongly NOT RECOMMENDED.¶
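A simple check for excessive prepending might look as follows in Python. The sketch represents an AS_PATH as a list of ASNs and assumes that "prepending five times" means five additional consecutive copies of an ASN beyond its first occurrence; both the interpretation and the function names are assumptions for this example.¶

```python
from itertools import groupby

MAX_PREPENDS = 5  # recommended upper bound from the text


def max_prepends(as_path):
    """Largest number of additional consecutive copies of any ASN."""
    return max((len(list(run)) - 1 for _asn, run in groupby(as_path)), default=0)


def excessive_prepending(as_path, limit=MAX_PREPENDS):
    """True if any ASN is prepended more often than `limit` times."""
    return max_prepends(as_path) > limit
```

For instance, the path [64496, 64496, 64496, 64500] contains two prepends of AS 64496 and passes the check, while a path repeating the same ASN seven times in a row does not.¶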
When establishing sessions via a shared network, like an IXP, BGP can advertise prefixes with a third-party next hop, thus directing packets not to the neighbor announcing the prefix but somewhere else.¶
This is a desirable property for BGP route-server setups [RFC7947], where the route server will relay routing information but has neither the capacity nor the desire to receive the actual data packets. So, the BGP route server will announce prefixes with a next-hop setting pointing to the router that originally announced the prefix to the route server.¶
In direct sessions between ASes via an IXP LAN, this is undesirable, as one of the neighbors could trick the other one into sending packets into a black hole (unreachable next hop) or to an unsuspecting third party who would then have to carry the traffic. Especially for black-holing, the root cause of the problem is hard to see without inspecting BGP prefixes at the receiving router of the IXP.¶
Therefore, an inbound route policy SHOULD be applied on direct sessions via an IXP LAN in order to set the next hop for accepted prefixes to the BGP neighbor's IP address (belonging to the IXP LAN) that sent the prefix (which is what "next-hop-self" would enforce on the sending side).¶
This policy SHOULD NOT be used on sessions with route-servers or on sessions where operators intentionally permit the other side to send third-party next hops.¶
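The inbound next-hop policy and its exception can be sketched in Python as follows. Routes are modeled as dictionaries with hypothetical "prefix" and "next_hop" keys, the addresses are documentation addresses, and the allow_third_party flag stands in for sessions with route servers or neighbors that are intentionally permitted to send third-party next hops.¶

```python
import ipaddress


def enforce_next_hop(route, neighbor_ip, allow_third_party=False):
    """Return a copy of `route` with the next hop pinned to the neighbor.

    With allow_third_party=True (e.g., route-server sessions), the
    advertised next hop is left untouched.
    """
    result = dict(route)
    if not allow_third_party:
        # Normalize and set the address of the neighbor that sent the route.
        result["next_hop"] = str(ipaddress.ip_address(neighbor_ip))
    return result


received = {"prefix": "192.0.2.0/24", "next_hop": "198.51.100.99"}
print(enforce_next_hop(received, "198.51.100.7"))
```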
This policy also SHOULD be adjusted if the best practice of Remote Triggered Black Holing (aka RTBH as described in [RFC6666]) is implemented. In that case, operators would apply a well-known BGP next hop for routes they want to filter (if an Internet threat is observed from/to this route, for example). This well-known next hop will be statically routed to a null interface. In combination with a unicast RPF check, this will discard traffic from and toward this prefix. BGP speakers can exchange information about black holes using, for example, particular BGP communities, see [RFC6666]. Operators could propagate black-hole information to their neighbors using an agreed-upon BGP community: when receiving a route with that community, a configured policy could change the next hop in order to create the black hole.¶
For BGP, BGP communities [RFC1997], extended BGP communities [RFC4360], and BGP large-communities [RFC8092] have been defined for additional inband signaling. In the remainder of this section, we use the term 'BGP communities' to mean [RFC1997] and [RFC8092] BGP communities alike, while we explicitly refer to [RFC4360] as 'extended BGP communities'.¶
Communities are useful in iBGP and eBGP alike. For example, BGP communities are often used by operators to allow neighbors to signal additional traffic engineering requirements, e.g., asking an upstream not to announce a specific NLRI to one of its neighbors. Similarly, BGP communities are essential for proper filtering of downstreams' prefixes in the absence of ASPA/OTC. While usually more focused on L2VPN and L3VPN scenarios, extended BGP communities may also find specific use when interacting with external neighbors, see, e.g., [RFC4364], Inter-AS VPN Option B. Hence, while they generally should not act transitively, operators SHOULD nevertheless ensure that these communities do not accidentally leak.¶
However, as they may carry instructive information, external unauthorized neighbors should not be allowed to send NLRI with AS-specific BGP communities. Similarly, internally used BGP communities may reveal non-public information or cause disturbance in misconfigured networks. The in- and outbound filtering rules for all forms of BGP communities in Section 7.5.1 and Section 7.5.2 are RECOMMENDED.¶
Additionally, please note the following general recommendations for community scrubbing:¶
While there is a list of well-known and defined transitive BGP attributes, operators sometimes accidentally or intentionally use undocumented BGP attributes. Similarly, newly introduced attributes may not yet be known to a specific implementation.¶
In general, unknown transitive BGP attributes SHOULD NOT be filtered. However, sometimes bugs may occur in implementations that require filtering or correction of attributes on the border to protect BGP speakers before a patch for the implementation is available.¶
This section documents practices for scrubbing and normalizing BGP attribute related data in received NLRI.¶
Over the past years, several instances of network disruptions due to routers being unable to process specific BGP attributes were encountered. As such, operators MAY opt to temporarily scrub specific BGP attributes known to cause service disruptions on their infrastructure. Operators SHOULD NOT scrub unknown transitive attributes in general.¶
However, while being a very useful tool, BGP attribute scrubbing features may cause undesired effects and sometimes even large-scale outages as well. Therefore, they MUST NOT be used as a permanent solution, but only as a last-resort temporary workaround. Furthermore, removing mandatory BGP attributes and optional attributes commonly used in the Internet, such as AS_PATH, Communities, MED etc. may have a significant negative impact beyond an operator's own AS. Hence, it is RECOMMENDED that such attributes are never removed when importing NLRI.¶
When sending NLRI to external neighbors, operators SHOULD avoid sending not yet standardized or only internally used attributes, i.e., scrub attributes they added which are not in public use before exporting NLRI.¶
BGP attributes are stored within BGP UPDATE messages as a vector of Type-Length-Value (TLV) fields. The Attribute Type field contains a set of control bits, such as the Optional Bit (set to 1 for Optional Attributes and 0 for Well-Known), the Transitive Bit (specifying whether the attribute is Transitive, i.e., should be propagated outside the local AS or, Non-Transitive, i.e., should not be propagated outside the local AS) etc. Initially, [RFC4271] mandated that a BGP speaker tears down a BGP session when receiving even a single UPDATE message containing a malformed combination of Attribute TLV headers. However, [RFC7606] allows BGP implementers to optionally add features providing self-correction of malformed attributes in a limited number of cases.¶
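To make the control bits concrete, the following Python sketch decodes the Attribute Flags octet using the bit positions defined in [RFC4271] (Optional, Transitive, Partial, and Extended Length in the four high-order bits) and applies the two consistency rules stated there for well-known and optional non-transitive attributes. The function names are hypothetical, and the check is far from a complete implementation of the validation in [RFC4271] or [RFC7606].¶

```python
def decode_attr_flags(flags):
    """Decode the Attribute Flags octet of a BGP path attribute."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit into one octet")
    return {
        "optional": bool(flags & 0x80),         # 1 = optional, 0 = well-known
        "transitive": bool(flags & 0x40),       # 1 = transitive
        "partial": bool(flags & 0x20),          # 1 = partial
        "extended_length": bool(flags & 0x10),  # 1 = two-octet length field
    }


def flags_plausible(flags):
    """Check the flag combinations constrained by RFC 4271."""
    decoded = decode_attr_flags(flags)
    if not decoded["optional"]:
        # Well-known attributes: Transitive must be 1, Partial must be 0.
        return decoded["transitive"] and not decoded["partial"]
    if not decoded["transitive"]:
        # Optional non-transitive attributes: Partial must be 0.
        return not decoded["partial"]
    return True
```

A flags octet of 0x40 (well-known, transitive) passes this check, whereas 0x00 (well-known, non-transitive) is an example of a malformed combination.¶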
Operators MAY use such self-correcting mechanisms for BGP Attribute TLV headers. However, they SHOULD consider the operational impact such features have, SHOULD monitor for cases where such self-correction is necessary, and SHOULD follow up on such cases to ensure that root causes are identified and addressed.¶
As documented in [RFC3345], the use of BGP Route Reflection [RFC4456] and BGP Confederation [RFC5065] can lead to route oscillation, especially in conjunction with the MULTI_EXIT_DISC (MED) attribute (see [RFC4271]). If BGP route oscillation occurs, routes may be blackholed if dampening is implemented by neighbors, or individual BGP speakers may become overloaded, further aggravating the oscillation issue.¶
Hence, operators SHOULD familiarize themselves with [RFC7964], which describes methods and approaches to counteract MED-related route oscillation. Operators SHOULD carefully evaluate their network's requirements and implement the practices documented in [RFC7964] as appropriate.¶
IXPs are an essential aspect of the modern Internet, and contribute to keeping local traffic local. As such, IXP fabrics often handle a significant amount of traffic, providing challenges for traffic engineering. Hence, this section documents best practices when connecting to an IXP that affect the reliability of the global routing ecosystem.¶
Given that traffic forwarded via an IXP can be more cost-efficient than sending that same traffic via an upstream, many operators set a higher LOCAL_PREF for NLRI received via an IXP. This means that all traffic from the AS and all members of its cone routing via this AS will preferentially be routed via these paths (see [RFC4271]), effectively overriding the effect of AS_PATH prepending, see also [I-D.ietf-grow-as-path-prepending].¶
As noted in [I-D.ietf-grow-as-path-prepending], setting a higher LOCAL_PREF on IXP links means that neighbors on the IXP can no longer use AS_PATH prepending for, e.g., traffic engineering. More crucially, it prevents operators from draining traffic flowing via an IXP when necessary, e.g., prior to a scheduled maintenance. Especially when NLRI are exchanged via an RS, simply terminating the session is usually not possible without also impacting other neighbors.¶
Hence, operators SHOULD NOT set a higher local preference for NLRI received via an IXP RS. Instead, other non-transitive methods, e.g., setting a corresponding MED on imported routes, should be preferred.¶
If, when trying to drain traffic on an IXP link via AS_PATH prepending of NLRI sent to the RS, an operator encounters an IXP member ignoring these prepends, they may be able to selectively withdraw routes from being announced to that member by using communities documented by the IXP to prevent the RS from exporting their NLRI to that specific IXP member.¶
Graceful BGP Session Shutdown (GSHUT) as defined in [RFC8326] is a formalized method for draining traffic from sessions gracefully before, e.g., maintenance. However, while AS_PATH prepending does not have to be supported by two neighbors, GSHUT requires all neighbors to implement it by implementing a policy that assigns a lower LOCAL_PREF to NLRI matching the GRACEFUL_SHUTDOWN BGP community.¶
GSHUT is a more effective method of traffic draining than, e.g., AS_PATH prepending. Hence, in general, GSHUT SHOULD be supported on all eBGP sessions. However, as an IXP member, when ignoring the previous recommendation and setting a higher LOCAL_PREF for sessions via an IXP LAN, GSHUT MUST be supported.¶
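The receiving side of GSHUT can be sketched in Python as an import policy step. The GRACEFUL_SHUTDOWN community value (65535:0) and the LOCAL_PREF value of 0 follow [RFC8326]; the dictionary-based route representation and the function name are assumptions for this example.¶

```python
GRACEFUL_SHUTDOWN = (65535, 0)  # well-known community from RFC 8326
GSHUT_LOCAL_PREF = 0            # low LOCAL_PREF value suggested by RFC 8326


def apply_gshut(route):
    """Lower LOCAL_PREF for routes tagged with GRACEFUL_SHUTDOWN."""
    result = dict(route)
    if GRACEFUL_SHUTDOWN in route["communities"]:
        result["local_pref"] = GSHUT_LOCAL_PREF
    return result
```

Because the tagged route is kept with the lowest preference instead of being withdrawn, traffic can move to alternative paths before the session is actually shut down.¶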
Besides the overall generation of prefix filters and the relationships to which these should be applied, the way in which these can be implemented needs to be considered.¶
Almost all BGP implementations have specific default behavior, including behavior when reaching the end of a policy, behavior when no policy is defined (even though [RFC8212] now requires a default-deny in the absence of policy), etc. However, default behavior and matching characteristics may differ between vendors and implementations. Implicitly relying on vendor-specific default behavior can pose issues if a network operator migrates from one vendor to another, or when operating a mixed-vendor environment. Furthermore, implicit defaults may change, requiring intervention by operators. Therefore, it is RECOMMENDED that operators create explicit policy statements, even for behavior covered by defaults. Such a practice helps simplify automation of router configurations, and prevents incidents due to changing or differing implicit defaults, especially when migrating between vendors and in interoperability scenarios.¶
BGP is a policy-based routing protocol with import/export policies controlling advertisements/acceptance of NLRI (see [RFC4272] Sec. 9.1), and BGP sessions without policy being applied should default to a deny-all stance (see [RFC8212]). The specific implementation of import/export policies varies between vendors in terms of complexity and naming, from basic prefix-based / AS_PATH-based filters, to complex IF-THEN-like policy structures (typical names are: "route maps", "route policies", "policy statements").¶
Independent of the implementation, all BGP policies consist of one or more rule sets, that are executed in a sequence, one after another. The first rule set will scan the complete content of Adj-RIBs-in or Adj-RIBs-out; NLRIs permitted by a rule set will be passed to subsequent rule sets, while denied prefixes are discarded.¶
Policies SHOULD avoid computationally expensive setups, or setups of rules that apply computation to NLRI that will subsequently be discarded. Hence, the more prefixes a rule is likely to discard, the earlier it SHOULD be evaluated.¶
To further illustrate this, consider the following (incomplete) example of a simple inbound filter for a session with a neighbor in a peer relationship who has 60 prefixes in its cone. We assume that, accidentally, an IPv4 full table of 1,000,000 entries is being sent, creating additional load. 2,500 NLRI contain unregistered/private AS numbers, 500 NLRI relate to bogon prefixes, and 5,000 NLRI are RPKI invalid, with none of the routes in the neighbor's cone falling in any of these categories. The number of operations per line is given in parentheses.¶
In this list, rules 1-5 will be executed for most NLRI seen from the neighbor. In total, 5,986,500 operations are executed on the received NLRI, even though ultimately only 60 prefixes should be imported.¶
While most hardware implementations of BGP speakers should be sufficiently equipped with resources to handle such individual spikes, practice shows that operators cannot always use BGP speakers with an abundance of resources. Furthermore, even better-equipped platforms may suffer if multiple neighbors coordinate and utilize this mechanism to induce load. Ultimately, it is also desirable to reduce unnecessary computation independent of security considerations.¶
Hence, it is RECOMMENDED that operators structure rulesets in a way that prioritizes early decisions on the majority of routes. For the example above, this would mean, again noting the number of operations per rule:¶
Overall, this reduces the number of operations for our hypothetical full-table from 5,986,500 operations to 1,000,300. Similar effects can occur when not filtering on, e.g., the OTC attribute first when sending prefixes to customers.¶
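The operation counts of 5,986,500 and 1,000,300 can be reproduced with a small Python model. The route counts per category are taken from the example above. The concrete rule sets are assumptions made for this sketch: it uses six rules, of which two hypothetical placeholder rules drop nothing in this scenario, and it treats the categories as disjoint. One operation is counted for every route still present when a rule is evaluated.¶

```python
TOTAL = 1_000_000
COUNTS = {"cone": 60, "private_asn": 2_500, "bogon": 500, "rpki_invalid": 5_000}
COUNTS["other"] = TOTAL - sum(COUNTS.values())  # all remaining full-table routes

# Each rule is (name, categories it drops). The placeholders stand for
# checks that discard nothing in this scenario (an assumption).
PLACEHOLDER_1 = ("placeholder-1", frozenset())
PLACEHOLDER_2 = ("placeholder-2", frozenset())
PRIVATE_ASN = ("private-asn", frozenset({"private_asn"}))
BOGON = ("bogon", frozenset({"bogon"}))
RPKI = ("rpki-invalid", frozenset({"rpki_invalid"}))
NOT_IN_CONE = ("not-in-cone", frozenset(COUNTS) - {"cone"})


def operations(rules):
    """Count one operation per route still present at each rule."""
    remaining = dict(COUNTS)
    ops = 0
    for _name, drops in rules:
        ops += sum(remaining.values())
        for category in drops:
            remaining[category] = 0
    return ops


cone_last = [PLACEHOLDER_1, PLACEHOLDER_2, PRIVATE_ASN, BOGON, RPKI, NOT_IN_CONE]
cone_first = [NOT_IN_CONE, PLACEHOLDER_1, PLACEHOLDER_2, PRIVATE_ASN, BOGON, RPKI]
print(operations(cone_last))   # 5986500
print(operations(cone_first))  # 1000300
```

Evaluating the cone filter first costs one pass over the full table, after which the remaining five rules only see the 60 routes in the neighbor's cone.¶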
As discussed in Section 8.1.2, many BGP implementations use a sequential order for applying different prefix filters to ingested routes. However, at the same time, several implementations do not perform atomic operations when applying rules. This means that, especially on resource-constrained BGP speakers or BGP speakers under load, consistency of a ruleset may be lost during a rule-set update.¶
For example, consider the following simplified export rule-set towards a peer:¶
If one now wants to swap the order of Rule 2 and 3, an implementation applying rule updates not atomically would proceed as follows:¶
During the timeframe between the execution of Step 1 and 2, an NLRI for a bogon prefix would be passed by the filter. While, technically, this timeframe should be negligibly small, a loaded control plane may create unexpected overhead, allowing prefixes that should be filtered to pass. Similarly, an error during the application of a ruleset that makes the application stop after the execution of Step 1 may have a similar effect if rule-set changes are not atomic.¶
Hence, it is RECOMMENDED that operators assess whether the application of changes to rule-sets on their BGP speakers is atomic. If it is not atomic, operators SHOULD take special care in drafting rule-set updates concerning inconsistent state that could be created by a delayed or incomplete update. If no atomicity is provided by the BGP speaker, and the load-conditions are uncertain, operators SHOULD consider creating a new complete rule-set with the desired changes, and then changing the referenced rule-set for a given neighbor instead of updating an existing rule-set in-place. Naturally, after the new rule-set has been activated, the old rule-set should be deleted.¶
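The replace-instead-of-edit approach can be sketched in Python as follows. The Neighbor class and the rule names are hypothetical; the point of the sketch is that the complete new rule-set is staged first, and the only change visible to route processing is the single switch of the referenced rule-set name.¶

```python
class Neighbor:
    """A neighbor referencing exactly one named rule-set at a time."""

    def __init__(self, rulesets, active):
        self.rulesets = rulesets  # rule-set name -> tuple of rules
        self.active = active      # name of the rule-set currently in use

    def active_rules(self):
        return self.rulesets[self.active]


def replace_ruleset(neighbor, new_name, new_rules):
    """Stage a complete rule-set, switch the reference, delete the old one."""
    old_name = neighbor.active
    if new_name == old_name:
        raise ValueError("the new rule-set needs a distinct name")
    neighbor.rulesets[new_name] = tuple(new_rules)  # stage, not yet in use
    neighbor.active = new_name                      # single reference change
    del neighbor.rulesets[old_name]                 # clean up afterwards
    return old_name


peer = Neighbor({"export-v1": ("rule-1", "rule-2", "rule-3")}, "export-v1")
replace_ruleset(peer, "export-v2", ["rule-1", "rule-3", "rule-2"])
print(peer.active_rules())  # ('rule-1', 'rule-3', 'rule-2')
```

At no point does the neighbor reference a partially updated rule-set: it uses either the complete old version or the complete new one.¶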
As prefix filters are changed regularly, idempotency is essential when issuing automated updates of prefix filters. Specifically, prefix filters SHOULD NOT be generated on the routers themselves.¶
Instead, filter lists SHOULD be generated on dedicated systems. These systems SHOULD ensure the idempotency of changes to filters applied to routers, i.e., they should only deploy a policy, if the policy changed. This ensures no unnecessary regular load is placed on the control plane of BGP speakers.¶
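One way to deploy a filter only if it changed is to compare a digest of the newly generated prefix list with the digest of the last deployed version. The following Python sketch illustrates this under assumptions: the prefix list is treated as an unordered set of entries, router_state is a hypothetical mapping kept on the generating system, and push is a placeholder for whatever mechanism actually configures the BGP speaker.¶

```python
import hashlib


def policy_digest(entries):
    """Digest of a prefix list, independent of ordering and duplicates."""
    canonical = "\n".join(sorted({e.strip() for e in entries if e.strip()}))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deploy_if_changed(router_state, name, entries, push):
    """Call push(name, entries) only if the list differs from the deployed one.

    Returns True if a deployment took place, False otherwise.
    """
    digest = policy_digest(entries)
    if router_state.get(name) == digest:
        return False  # nothing changed, no load on the control plane
    push(name, entries)
    router_state[name] = digest
    return True
```

Regenerating an identical list, even in a different order, then results in no interaction with the router at all.¶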
Some BGP speaker implementations, and especially older BGP speakers, are constrained in terms of the number of prefixes and rules they can apply. A common reaction of operators in such cases is reducing the number of filters applied on sessions. Even though it is NOT RECOMMENDED to aggregate prefix lists for filtering, operators SHOULD consider aggressive aggregation of prefix filter lists to restrict the prefixes accepted from neighbors if the alternative is not using filters at all.¶
Another approach that MAY be suitable for creating rule-sets for downstreams and peers is offline validation. In that case, a dedicated system regularly, e.g., every two hours, obtains the list of prefixes advertised by a given peer or downstream. That list is then validated according to the applicable section below. Subsequently, instead of using a full representation of the neighbor's cone, a condensed prefix list matching the aggregate of the exact prefixes announced is generated and deployed to the BGP speaker. While this increases the timeframe for newly added prefixes to be accepted, and may be unsuitable for, e.g., DDoS defense services, it can also reduce the size of prefix lists significantly.¶
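The condensing step can be sketched with Python's standard ipaddress module, whose collapse_addresses function merges adjacent and overlapping networks. The function name condensed_prefix_list and the input prefixes (documentation ranges) are chosen for this example; the prior validation step is not shown.¶

```python
import ipaddress


def condensed_prefix_list(announced):
    """Aggregate already-validated announced prefixes into a minimal list."""
    networks = [ipaddress.ip_network(prefix) for prefix in announced]
    condensed = []
    # collapse_addresses cannot mix address families, so handle them separately.
    for version in (4, 6):
        family = [net for net in networks if net.version == version]
        condensed.extend(str(net) for net in ipaddress.collapse_addresses(family))
    return condensed


print(condensed_prefix_list([
    "198.51.100.0/25",
    "198.51.100.128/25",
    "203.0.113.0/24",
    "2001:db8::/33",
    "2001:db8:8000::/33",
]))
# ['198.51.100.0/24', '203.0.113.0/24', '2001:db8::/32']
```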
As noted in Section 6.1.3, the creation and application of filter rules should be automated to reduce the margin for error and misconfigurations. Nevertheless, the regeneration of filter rules may fail.¶
Before applying a generated ruleset, an operator should check it for obvious errors and potentially require manual intervention to remediate the issue. Examples include a ruleset for a neighbor suddenly significantly increasing or decreasing in size, or being empty.¶
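Such a plausibility check can be sketched in Python as follows. The function name and the default threshold of a 50% change in size are assumptions for this example; operators would need to choose thresholds that fit the typical variation of their neighbors' rulesets.¶

```python
def ruleset_plausible(previous_size, new_size, max_change=0.5):
    """Flag generated rulesets that are empty or changed size abruptly.

    Returns False if manual intervention should be required.
    """
    if new_size == 0:
        return False  # an empty ruleset is treated as a generation error
    if previous_size == 0:
        return True   # no earlier ruleset to compare against
    change = abs(new_size - previous_size) / previous_size
    return change <= max_change
```

For example, a ruleset growing from 100 to 120 entries would be applied automatically, while one shrinking from 100 to 10 entries would be held back for review.¶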
In case of such a failure, each administrator MAY decide which actions they will take. Options include re-using the previously active rule set, or either accepting or rejecting all routes depending on routing policy. Generally accepting all routes during that time frame could break BGP routing security. However, rejecting them might re-route too much traffic towards upstreams, and could cause more harm than accepting invalid prefixes. Similarly, reusing the previously active rule set may lead to prefixes being wrongfully accepted or rejected, albeit on a smaller scale than for a general accept or reject decision.¶
Hence, to still provide sufficient protection for an individual AS experiencing issues with rule generation, and therefore deciding to deviate to more permissive inbound filters, it is strongly RECOMMENDED that all BGP speakers in general employ inbound and outbound filtering as described in this document.¶
For networks that have the full Internet BGP table, policies should be applied on each BGP neighbor for received and advertised routes. It is RECOMMENDED that each Autonomous System configures rules for advertised and received routes at all its borders, as this will protect the network and its neighbors even in case of misconfiguration. The most commonly used filtering policy is proposed in this section and uses prefix filters defined in Section 5, Section 6, and Section 7.¶
Inbound filtering on sessions with peers does not only ensure that an operator does not ingest maliciously or wrongfully advertised routes, but also serves as an additional safety net in case of unintentional misconfigurations. For inbound filters with peers, the following rules SHOULD be applied in the given order to limit resource use on filter application (see Section 8.1).¶
The RECOMMENDED filters ensure advertisements strictly conform to what is declared in routing registries (Section 6.1). Warning is given as registries are not always accurate (prefixes missing, wrong information, etc.). This varies across the registries and regions of the Internet. Hence, before applying this policy, the reader SHOULD check the impact on the filter and make sure no prefixes are filtered that should actually be accepted.¶
Note that Rule 12 MAY be formulated as an acceptance rule, i.e., accepting all prefixes that are between a /8 and a /24 for IPv4 and between a /16 and a /48 for IPv6.¶
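Formulated as an acceptance rule, the prefix length check can be sketched in Python with the standard ipaddress module. The bounds are those given above; the function name is hypothetical.¶

```python
import ipaddress

# Accepted prefix length range per address family, as given in the text.
LENGTH_BOUNDS = {4: (8, 24), 6: (16, 48)}


def acceptable_length(prefix):
    """True if the prefix length lies within the accepted range."""
    network = ipaddress.ip_network(prefix)
    shortest, longest = LENGTH_BOUNDS[network.version]
    return shortest <= network.prefixlen <= longest


print(acceptable_length("192.0.2.0/24"))    # True
print(acceptable_length("192.0.2.0/25"))    # False
print(acceptable_length("2001:db8::/32"))   # True
print(acceptable_length("2001:db8::/49"))   # False
```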
Additionally, Rule 11 MUST NOT be set for sessions with IXP routeservers, while it SHOULD be set on direct sessions via IXP LANs (see Section 6.6).¶
If BGP roles are used, the OTC attribute should be set according to [RFC9234].¶
The configuration should ensure that only appropriate prefixes are sent, i.e., prefixes a neighbor would not need to filter based on Section 8.2.1.1. These can be, for example, prefixes belonging to both the network in question and its downstreams. This can be achieved by using BGP communities, AS paths, or both.¶
Also, it may be desirable to add the following filters before any further policy to avoid unwanted route announcements due to bad configuration:¶
If it is possible to list the prefixes to be advertised, then just configuring the list of allowed prefixes and denying the rest is technically sufficient. Nevertheless, to ensure robustness in case of failure, especially for manually operated BGP speakers, it is RECOMMENDED that operators apply the full rule-set.¶
Note that Rule 12 is technically not necessary for eBGP. However, in some rare cases misconfigurations or implementation errors may occur, especially for sessions with a neighbor via an IXP LAN (directly or indirectly), where the implementation on the BGP speaker might export routes with a non-local next-hop. While Rule 12 could prevent disturbance in such cases, the likelihood of such events is sufficiently low that operators MAY opt to not use Rule 12.¶
From customers, only customer prefixes SHOULD be accepted; all others SHOULD be discarded. In addition, an operator should ensure that prefixes announced by customers also conform to best practices in terms of other BGP aspects (AS path, IRR compliance, RPKI, etc.). Not doing so might lead to opaque failures in which the customer is able to export routes to the upstream, but these routes are then not accepted by the upstream's neighbors. Applying filtering close to the source makes such issues easier to debug.¶
Technically, the inbound policy with end customers is pretty straightforward: only customer prefixes SHOULD be accepted, all others SHOULD be discarded. For smaller downstreams, the list of accepted prefixes can be manually specified, after having verified that they are valid. This validation can be done with the appropriate IP address management authorities. For larger downstreams, an approach as documented in Section 8.1.5 MAY also be suitable.¶
Additionally, Rule 11 MUST NOT be set in the rare case of an IXP routeserver providing upstream (see [CommunityIX]), while it SHOULD be set when providing upstream to a customer with a direct session via IXP LANs (see Section 7.4).¶
The outbound policy with customers may vary according to the routes the customer wants to receive. In the simplest possible scenario, the customer may want to receive only the default route; this can be done easily by applying a filter with the default route only.¶
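A filter that passes only the default route can be sketched as follows. The sketch covers both address families; the names `is_default_route` and `default_only` are hypothetical and chosen for illustration.¶

```python
import ipaddress
from typing import Iterable, List

# The IPv4 and IPv6 default routes.
DEFAULT_ROUTES = frozenset({
    ipaddress.ip_network("0.0.0.0/0"),
    ipaddress.ip_network("::/0"),
})


def is_default_route(prefix: str) -> bool:
    """Return True only for 0.0.0.0/0 and ::/0."""
    return ipaddress.ip_network(prefix) in DEFAULT_ROUTES


def default_only(prefixes: Iterable[str]) -> List[str]:
    """Keep the default route(s) and drop every other prefix."""
    return [p for p in prefixes if is_default_route(p)]
```

Applied to a set of candidate routes, `default_only` keeps 0.0.0.0/0 and ::/0 and drops everything else, including short prefixes such as 0.0.0.0/8.¶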
In case the customer wants to receive the full routing table (if it is multihomed or if it wants to have a view of the Internet table), the following filters SHOULD be applied on the BGP session:¶
In some cases, the customer may desire to receive the default route in addition to the full BGP table. This can be done by the provider removing the filter for the default route in Rule 7. As the default route may not be present in the routing table, operators SHOULD only originate it for neighbors that requested it.¶
Note that Rule 9 is technically not necessary for eBGP. However, in some rare cases misconfigurations or implementation errors may occur, especially on sessions with a neighbor via an IXP LAN (directly or indirectly), where the implementation on the BGP speaker might export routes with a non-local next-hop. While Rule 9 could prevent disturbance in such cases, the likelihood of such events is sufficiently low that operators MAY opt to not use Rule 9.¶
If the upstream provider is supposed to announce only the default route, a simple filter will be applied to accept only the default prefix and nothing else.¶
If the full routing table is desired from the upstream, the prefix filtering below should be applied:¶
Sometimes, the default route (in addition to the full BGP table) can be desired from an upstream provider. In that case, Rule 9 MAY be removed.¶
Additionally, Rule 10 MUST NOT be set in the rare case of an IXP routeserver providing upstream (see [CommunityIX]), while it SHOULD be set when receiving upstream on a direct session via IXP LANs (see Section 7.4).¶
In general, at least the same outbound filters as applied for Internet peers (Section 8.2.1.2) SHOULD be applied for upstreams. However, different policies could be applied if a particular upstream should not provide transit to all prefixes.¶
When deciding to selectively announce prefixes to an upstream, it is important to be mindful of potential issues with uRPF in case of asymmetric traffic flows. In certain strict uRPF cases, traffic for a prefix may be blackholed if the outbound route to a destination traverses one upstream, while the prefix is only announced to another upstream. It is RECOMMENDED that operators do not implement strict uRPF solely based on visible or selected routes received from a peer. Instead, either an approach similar to the cone determination (see Section 6.1) or loose uRPF should be used (see [RFC8704]).¶
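The blackholing effect described above can be illustrated with a simplified model of strict and loose uRPF checks. All names, interface labels, and the FIB contents are hypothetical. The sketch assumes that strict uRPF only considers the selected best route, and that loose uRPF accepts a packet as soon as any route covers the source address; actual implementations differ in detail, e.g., in how they treat a default route.¶

```python
import ipaddress
from typing import Dict, Optional


def best_interface(fib: Dict[str, str], address: str) -> Optional[str]:
    """Longest-prefix match; returns the outgoing interface or None."""
    addr = ipaddress.ip_address(address)
    best = None  # (prefix length, interface) of the best match so far
    for prefix, interface in fib.items():
        network = ipaddress.ip_network(prefix)
        if network.version != addr.version or addr not in network:
            continue
        if best is None or network.prefixlen > best[0]:
            best = (network.prefixlen, interface)
    return None if best is None else best[1]


def strict_urpf_accepts(fib: Dict[str, str], source: str,
                        arrival_interface: str) -> bool:
    """Accept only if the best route to the source points back
    out of the interface on which the packet arrived."""
    return best_interface(fib, source) == arrival_interface


def loose_urpf_accepts(fib: Dict[str, str], source: str) -> bool:
    """Accept if any route covering the source address exists."""
    return best_interface(fib, source) is not None
```

Consider a router whose FIB maps 198.51.100.0/24 to the interface "customer", while 192.0.2.0/24, which the customer announces only to a different upstream, is reachable via "transit". A packet arriving on "customer" with source address 192.0.2.10 fails the strict check but passes the loose check.¶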
The leaf network will deploy the filters corresponding to the routes it is requesting from its upstream. If a default route is requested, a simple inbound filter can be applied to accept only the default route (Section 5.4). If the leaf network is not capable of listing the prefixes because there are too many (for example, if it requires the full Internet routing table), then it SHOULD follow the filter recommendations in Section 8.2.3.1.¶
A leaf network will most likely have a very straightforward policy: it SHOULD only announce its local routes. For additional scrutiny, it is also RECOMMENDED that leaf ASes follow the recommendations in Section 8.2.3.2 to avoid announcing invalid routes to their upstream providers, and for additional resilience if the network later becomes multihomed.¶
If a mutual-transit relationship as defined in [I-D.ietf-sidrops-aspa-verification] exists between two neighbors, each neighbor SHOULD follow the recommendations in [I-D.ietf-sidrops-aspa-verification]. Furthermore, it is RECOMMENDED that both parties in a mutual-transit relationship take additional precautions to ensure that they do not export routes the other neighbor learned from their own upstreams to peers and upstreams of their own. This can be accomplished, e.g., via annotations of imported routes (see Section 6.5.2) differing based on a filter representing the neighbor's cone (see Section 6.1).¶
While iBGP sessions should generally be trusted, it is good practice to implement basic filters on iBGP sessions carrying external NLRI as well. It is RECOMMENDED that other internal routing signalling is handled by a dedicated IGP or via a dedicated VRF/Routing Domain (see [NSRC-17]) to reduce the likelihood of internal routes leaking due to misconfigurations, with routes appropriately annotated to not be exported (see Section 6.5). Doing so ensures that localized misconfigurations, e.g., leaked (internal) routes, remain confined to a region or PoP instead of spreading throughout the whole AS and to external neighbors, thereby limiting their impact.¶
If the iBGP mesh, or the sessions via a route reflector (see [RFC4456]), of BGP speakers connected to external neighbors only carries external NLRI, the following filters are RECOMMENDED when importing or exporting routes:¶
Depending on the local circumstances an operator MAY deviate from this suggestion and refrain from using individual rules. For example, if the AS ingests a default route from at least one neighbor, Rule 9 should be omitted. Similarly, when allowing downstreams to announce hyperspecifics (see Section 6.5.2), Rule 10 SHOULD be omitted.¶
While an operator MAY opt to not use any of the suggested rules, it is RECOMMENDED that at least Rule 1 is applied to iBGP sessions to ensure absent annotations do not propagate and cause route leaks.¶
This document does not require any IANA actions.¶
This document is entirely about BGP operational security. The document understands security not only as resilience against attacks, but also in the context of safety, i.e., ensuring that systems remain operational and behave as expected even if individual components fail or are mishandled. It depicts best practices that one should adopt to secure BGP infrastructure: protecting BGP routers and BGP sessions, adopting consistent BGP prefix and AS path filters, and configuring other options to secure the BGP network.¶
This document does not aim to describe specific BGP implementations, their potential vulnerabilities, or ways they handle errors. It does not detail how protection could be enforced against attack techniques using crafted packets.¶