Dead Gateway Detection - gateway status based source address
selection for Linux

Julian Anastasov - September 2001

Development status and internals


History (in reverse order):

29-NOV-2002

    Released routes-2.4.20-9.diff with the following fixes:

    - don't restrict lsrc to be local IP address, problem noticed
      by Patrick McHardy.

04-MAY-2002

    Released routes-2.4.19-8.diff with the following fixes:

    - WARNING: 2.4.19pre8 already deletes multipath routes when
      nexthop's device is unregistered

    - properly set the nh_scope after fib_lookup in fib_sync_up,
      other issues in using fib_lookup there

    - fixed many bugs in fib_check_nh including one that leads to
      kernel oops when adding nexthop via downed device

    - allow using local IPs as gateways (added more checks for
      nh->nh_scope == RT_SCOPE_LINK). Now when local gateways are
      deleted as IP address we propagate the event to all nexthops

    - fix some races on SMP when playing with nh->nh_flags

    - allow NAT rules (ip rule ... nat ...) to point to alternative
      routes (ported from routes-2.2.20-7)

    - rewrote rerouting, now ip_route_input and ip_route_output have
      their original interface, the 05_nf_reroute-2.4.19-8.diff patch
      is now shorter. Most of the ip_route_input lsrc argument
      handling is stolen from the rtlsrc-* patches. Now patches that
      fix the ip_route_output calls in foreign code are not needed.

03-FEB-2002

    Released routes-2.2.20-7.diff with the following fixes:

    - 7th version of the alternative_route patch for 2.2:

      01_alt_routes-2.2.20-7.diff:

      - allow NAT rules (ip rule ... nat ...)
        to point to alternative routes

      02_masq_csum_reroute-2.2.20-7.diff:

      - bugfix: missing ip_rt_put(rt) after failed rerouting can lead
        to unaccounted routing cache entries

      05_key_gw-2.2.20-7.diff:

      - rediffed after 02_masq_csum_reroute-2.2.20-7.diff

    Released 02_masq_csum_reroute-2.2.20-IPVS-1.0.8-7.diff:

    - the same bugfix: missing ip_rt_put(rt)

    Added patch only for the LVS users: routes-2.2.20-IPVS-1.0.8-7.diff

14-DEC-2001

    Released routes-2.4.16-6.diff and routes-2.2.20-6.diff with the
    following fixes:

    - 6th version of the static_routes patch:

      00_static_routes-2.4.16-6.diff, 00_static_routes-2.2.20-6.diff:

      - repeat the fib_sync_up checks after a gateway appears. This
        fixes the problem where one or more routes remain in dead
        state because, in the fib_info list, they come before the
        routes for the gateways they are using. Their gateway is
        marked alive after them and they can not notice this fact.
        For this, we repeat the check until no new gateways are added.

24-NOV-2001

    Released routes-2.4.14-5.diff

    The second (pre2) version for Netfilter's NAT in Linux 2.4:

    - don't check for fragments in ip_nat_route_input

16-NOV-2001

    The first (pre1) version for Netfilter's NAT in Linux 2.4

24-OCT-2001

    The 5th version should contain the following fixes:

    - compile fib_sync_up even when CONFIG_IP_ROUTE_MULTIPATH is not
      defined

    - static routes: should check for RTN_UNICAST in fib_sync_up

    The hard work remains: teach the 2.4 NAT to use many gateways

14-OCT-2001

    It seems dgd-2.2.19-3 is not enough to support masquerading with
    an initial multipath route when two of the paths have the same
    device. So, in addition to the alternative routes, changes in the
    masquerading are required to find the masquerading source address
    by looking for the right gateway.
    As a result, the 4th version contains all independent
    functionalities in different patches:

    - ??_static_routes-*: static routes
    - ??_alt_routes-*: alternative routes
    - ??_arp_prefsrc-*: always use the prefsrc in our ARP probes
    - ??_masq_csum_reroute-*: connection rerouting and incremental
      checksum updates for the masquerading
    - ??_key_gw-*: use the gateway address as routing key, for now
      used only for the masquerading

    The alternative routes now match by TOS

29-SEP-2001

    dgd-2.2.19-3.diff:

    Note: this patch version may look ugly and the support for
    alternative routes and multipath routes can work by magic at some
    places. The key to this support is that the remote host must be
    present in the neighbour table. The main difficulties are for
    multipath routes.

    - the alternative routes now can be selected by output device.
      Now the masquerade and other users can use alternative routes
      when creating a route through a specific device

    - support for non-gatewayed (direct) alternative routes, now one
      network can reside on many devices to provide some failover
      capabilities. Don't be surprised by the ARP behaviour.

    - support for alternative non-default routes, i.e. for networks
      with prefixlen>0. The last_dflt value is now defined in the
      fib node

    - fib_select_multipath now updates fib_power when the paths
      change their state. This avoids route failures when some of
      the paths go into bad state. The lookup by output device uses
      a different scheduler and we assume this case is used mostly
      after a lookup without a specific device. The drawback is that
      we don't select the same path when many paths with the same
      device exist.

    - if fib_lookup fails for an output route with a specific output
      device, check that the device is up.

05-SEP-2001

    dgd-2.2.19-2.diff:

    - don't treat the NOARP devices as last resort, check their DEAD
      flag and try to follow the order specified by the user, i.e.
      assume that they are valid.
      The user has to put the NOARP devices at the end of the list
      if the ARP devices have more priority in the group with the
      alternative routes

04-SEP-2001

    dgd-2.2.19-1.diff: first release for Linux 2.2.19


The patches dgd-*.diff contain the following parts:

1. maintaining "proto static" routes in DEAD state

2. if (1) is present, allow marking of nexthops and routes as
   RTNH_F_DEAD when the host/gateway is not available (usually when
   the device is down or the gateway is not present). This way we
   can create static routes/nexthops in dead state without hitting
   ENETUNREACH/ENETDOWN errors.

3. alternative routes: the process of source address selection can
   detect the current state of the hosts and gateways and use only
   the alive ones.

First the terms and axioms:

- a path has the same semantics as a nexthop (and is implemented
  as one)

- routes are unipath or multipath

- the paths can be marked RTNH_F_DEAD (device is down, proto static,
  etc) and this can happen even for ARP devices, for paths in scope
  [host and link]

- the group of routes created with the same metric we call
  "alternative routes"

- the alternative routes are allowed for input or output routes.
  Their users can be traffic originated from the host or source
  address selection for masquerading in NAT gateways. They can be
  used even for packets that have a source address that will not
  change on forwarding (i.e. an already selected source address)

- the nexthop path from one multipath route is selected with
  fib_select_multipath after a valid alternative route is selected.
  This way the alternative routes can be a list of routes of any
  kind: unipath or multipath

Some rules for handling of the alternative routes:

- we can create one or more alternative routes with ip route append
  by specifying the same metric value (or by using the default
  value 0)

- the selection of a valid alternative route can choose one of them
  according to their order, but the status of each path from the
  route takes more priority in this process.
  Because a nexthop can be in suspected state, we can skip even
  reachable gateways on ARP devices when we don't know their state,
  i.e. when they are not in the neighbour table. So the selection
  can really look random, but the first routes have a better chance
  to be selected.

- fib_lookup can return a route that is not the first in the zone,
  i.e. after skipping dead routes (all nexthops marked RTNH_F_DEAD),
  it can even skip such routes with lower metric value. Then
  fib_select_default can select an alternative route from a group
  of alternative routes with higher metric value, i.e. not from the
  first group.

fn_hash_select_default operation:

- last_dflt is now a field in the fib node and is not global for
  all tables or zones. This value is preserved in the first node
  for the specified network.

fib_detect_death operation:

- we have to answer whether the route (checking all nexthops)
  contains alive paths and whether the route can be used as last
  resort if there are no valid alternative routes in the group

- if all nexthops are marked RTNH_F_DEAD we can't select this
  route, even for last resort

- paths with NOARP device are considered alive (if not marked DEAD)

- if the path is scope host (non-gatewayed) the ARP state of the
  route target is checked (the destination host)

- if the route is multipath then we have to refresh the state of
  all nexthops because fib_select_multipath needs fresh information
  about all nexthops. If all paths are suspected we mark the
  selected nexthop from the last resort multipath route as alive.

fib_validate_source operation:

- restrict the allowed devices to the list of devices in all
  nexthops in the route (multipath)

- allow one network to reside on many devices (multipath route or
  many alternative routes).
  To allow this to work we add the restriction that all
  routes/nexthops must have the same scope and prefix length and
  must be defined in the same routing table

Some rules for maintaining routes from proto static:

- the proto static routes are not flushed (removed from the kernel)
  when all their nexthops are in dead state (RTNH_F_DEAD). This way
  they can survive device down events.

- the routes are automatically created in DEAD state if their
  nexthops have devices in down state

Netfilter changes and requirements:

- key "gw" for ip_route_output used to select the right route for
  the gateway

- key "lsrc" for ip_route_input used to find the best unicast route
  between this IP and the destination address (similar to an output
  routing call but still makes the checks needed for an input
  packet).

- new hook function in PRE_ROUTING used to call ip_route_input with
  lsrc argument for all initialized connections that have source
  address manipulation scheduled for post_routing in the detected
  direction.

- MASQUERADE/DNAT: completed

- ipchains/ipfwadm: completed, still with possible MTU problems and
  relying on the fact there are no other functions in the FORWARD
  hook after us

- SNAT: needs SNAT rule match by route's gateway (TODO, takers?)

Other changes:

- new flag: RTNH_F_SUSPECT (we don't know the real state of the
  nexthop, it can be alive or not alive)

- new mask RTNH_F_BADSTATE (nexthop is DEAD or suspected)

- fib_select_multipath also checks for suspected nexthops and this
  way we skip the nexthops that are not reachable if this multipath
  route is an alternative route in some zone.

- in fib_inetaddr_event on inetaddr NETDEV_UP event call fib_sync_up
  after an IP address is added and this way try to mark alive the
  nexthops that were marked dead due to a missing gateway. To ensure
  that a route will notice that its gateway becomes alive from the
  same netdev up event, we repeat the check if a new gateway is
  marked alive, and this way any routes before this gateway will be
  marked alive.
- in route.c modify ip_route_output to call fib_select_default
  before fib_select_multipath and add fib_select_multipath for
  ip_route_input to allow using alternative routes
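
Illustration (not part of the patches): the selection rules listed
under "fib_detect_death operation" can be sketched in plain user-space
C as follows. This is only a minimal model under stated assumptions.
All names with the sk_ prefix, the struct layouts and the numeric flag
values are hypothetical and are not taken from the dgd/routes patches.
The sketch ignores neighbour table lookups, scope handling and the
refresh of multipath nexthop state. It walks one group of alternative
routes in order, returns the first route that has an alive path, falls
back to the first route that is only suspected (last resort), and
refuses routes whose nexthops are all dead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for RTNH_F_DEAD, RTNH_F_SUSPECT and
 * RTNH_F_BADSTATE; the values are assumptions for this sketch. */
#define SK_NH_DEAD     0x1u
#define SK_NH_SUSPECT  0x2u
#define SK_NH_BADSTATE (SK_NH_DEAD | SK_NH_SUSPECT)

struct sk_nexthop {
    unsigned flags;   /* SK_NH_* bits */
    int noarp;        /* non-zero if the device is a NOARP device */
};

struct sk_route {
    const struct sk_nexthop *nh;  /* one entry for unipath routes */
    size_t nhs;                   /* number of nexthops */
};

enum sk_state { SK_UNUSABLE, SK_LAST_RESORT, SK_ALIVE };

/* Classify one route by looking at all of its nexthops. */
static enum sk_state sk_route_state(const struct sk_route *r)
{
    enum sk_state best = SK_UNUSABLE;

    for (size_t i = 0; i < r->nhs; i++) {
        unsigned f = r->nh[i].flags;

        if (f & SK_NH_DEAD)
            continue;               /* dead paths never count */
        if (r->nh[i].noarp || !(f & SK_NH_BADSTATE))
            return SK_ALIVE;        /* NOARP and not dead => alive */
        best = SK_LAST_RESORT;      /* suspected, state unknown */
    }
    return best;                    /* all dead => SK_UNUSABLE */
}

/* Return the index of the selected alternative route in the group,
 * or -1 if no route in the group can be used. */
static int sk_select_alternative(const struct sk_route *routes, size_t n)
{
    int last_resort = -1;

    assert(routes != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        enum sk_state s = sk_route_state(&routes[i]);

        if (s == SK_ALIVE)
            return (int)i;          /* first alive route wins */
        if (s == SK_LAST_RESORT && last_resort < 0)
            last_resort = (int)i;   /* remember first suspected one */
    }
    return last_resort;
}
```

With a group of three unipath routes whose nexthops are dead,
suspected and alive (in that order), the sketch selects the third
route. If the alive route is removed it falls back to the suspected
one, and a group containing only dead routes yields -1.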