Improving EgressIP on a secondary interface of an OpenShift (OVN-K CNI) node
Published on 13 Jun 2024 by Vinu K
First of all, what is an EgressIP? An EgressIP ensures that traffic from one or more pods in one or more namespaces has a consistent source IP address for services outside the cluster network. It uses a namespaceSelector and an optional podSelector to identify that traffic. The OVN-K documentation explains the traffic flow in depth. When an EgressIP was assigned to a secondary interface of an OpenShift node, the pod traffic could not reach destinations outside that interface's subnet. The available workaround was not practical, because it required manually modifying the node's rule table for the pod's source IP.
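As a rough illustration of how the selectors pick the traffic, here is a minimal Go sketch. It assumes matchLabels-style selectors only (no matchExpressions) and uses hypothetical names (Labels, matchLabels, egressIPSelects); it is not OVN-K code. A pod is selected when its namespace matches the namespaceSelector and the pod matches the podSelector, and an empty podSelector matches every pod in the selected namespaces.

```go
package main

import "fmt"

// Labels is a simplified, hypothetical stand-in for Kubernetes object labels.
type Labels map[string]string

// matchLabels reports whether every key/value pair in selector is present in labels.
// An empty selector matches everything, as with Kubernetes matchLabels.
func matchLabels(selector, labels Labels) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// egressIPSelects is a hypothetical helper: a pod is selected only when its
// namespace matches nsSelector AND the pod itself matches podSelector.
func egressIPSelects(nsSelector, podSelector, nsLabels, podLabels Labels) bool {
	return matchLabels(nsSelector, nsLabels) && matchLabels(podSelector, podLabels)
}

func main() {
	nsSel := Labels{"env": "prod"}
	podSel := Labels{"app": "foo"}
	fmt.Println(egressIPSelects(nsSel, podSel, Labels{"env": "prod"}, Labels{"app": "foo"})) // true
	fmt.Println(egressIPSelects(nsSel, podSel, Labels{"env": "dev"}, Labels{"app": "foo"}))  // false
}
```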
What went wrong
Whenever a Pod selected by an EgressIP is created, a policy routing rule with priority 6000 is added on the node to which the EgressIP has been assigned. The rules can be listed with the ip rule command. A policy routing rule tells the system which routing table to use to determine the route. For example, 6000: from 10.128.1.55 lookup 1134 tells the system to use table 1134 when the source address is 10.128.1.55, which is why this is often referred to as source routing. However, table 1134 does not appear in /etc/iproute2/rt_tables. Why? The /etc/iproute2/rt_tables file is only a human-readable reference that maps numerical table IDs to names; it is not updated when the ovnkube-controller creates a table dynamically. The ip route show table 1134 command shows the routes used for the Pod traffic.
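Here is the actual issue. The controller adds only a single any (default) route, with no gateway, to the rule table. Without a gateway, the kernel treats every destination as directly reachable on the interface and tries to resolve it with ARP, so traffic to networks that do not belong to the interface (for example, the internet) cannot be routed.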
In 4.15.9, the following functions build the route for the secondary link:
func getDefaultRouteForLink(link netlink.Link, v6 bool) *routemanager.RoutesPerLink {
	return &routemanager.RoutesPerLink{Link: link,
		Routes: []routemanager.Route{
			getDefaultRoute(link.Attrs().Index, v6),
		},
	}
}

func getDefaultRoute(linkIdx int, v6 bool) routemanager.Route {
	anyCIDR := defaultV4AnyCIDR
	if v6 {
		anyCIDR = defaultV6AnyCIDR
	}
	return routemanager.Route{
		Table:  getRouteTableID(linkIdx),
		Subnet: anyCIDR,
	}
}
The above generates the route:
default dev <interface>
See the logs from the ovnkube-controller.
I0613 03:59:46.272484 7647 egressip.go:381] Processing Egress IP foo
I0613 04:00:07.368172 7647 egressip.go:591] Adding pod egress IP status: {node.sno4.onp.local 192.168.124.99} for EgressIP: foo and pod: foo/pod/[10.128.0.208/23]
I0613 04:00:10.474732 7647 egressip.go:381] Processing Egress IP foo
I0613 04:00:10.480805 7647 egressip.go:542] Generating config for EgressIP foo IP 192.168.124.99 which is hosted by a non-OVN managed interface (name enp7s0)
I0613 04:00:10.535929 7647 route_manager.go:92] Route Manager: attempting to add routes for link: Route(s) for link name: "enp7s0", with 1 routes: Route 1: "Table 1134 Subnet: 0.0.0.0/0"
I0613 04:00:10.536104 7647 route_manager.go:102] Route Manager: completed adding route: Route(s) for link name: "enp7s0", with 1 routes: Route 1: "Table 1134 Subnet: 0.0.0.0/0"
I0613 04:00:10.537000 7647 route_manager.go:145] Route Manager: netlink route addition event: "{Ifindex: 134 Dst: <nil> Src: <nil> Gw: <nil> Flags: [] Table: 1134 Realm: 0}"
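See the packet captures from the node when the Pod pings google.com and then the gateway of the secondary network (192.168.124.1). The first ping fails with ICMP host unreachable and no echo request is seen on enp7s0, while the second succeeds because the gateway is on the interface's own subnet.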
15:29:33.393427 5fd314eac04272b P IP 10.128.1.55 > 142.251.42.78: ICMP echo request, id 15, seq 1, length 64
15:29:33.393912 ovn-k8s-mp0 In IP 10.128.1.55 > 142.251.42.78: ICMP echo request, id 15, seq 1, length 64
15:29:36.459963 ovn-k8s-mp0 Out IP 192.168.124.99 > 10.128.1.55: ICMP host 142.251.42.78 unreachable, length 92
15:29:36.460344 5fd314eac04272b Out IP 192.168.124.99 > 10.128.1.55: ICMP host 142.251.42.78 unreachable, length 92
16:08:51.191993 5fd314eac04272b P IP 10.128.1.55 > 192.168.124.1: ICMP echo request, id 18, seq 1, length 64
16:08:51.192063 ovn-k8s-mp0 In IP 10.128.1.55 > 192.168.124.1: ICMP echo request, id 18, seq 1, length 64
16:08:51.192088 enp7s0 Out IP 192.168.124.99 > 192.168.124.1: ICMP echo request, id 18, seq 1, length 64
16:08:51.192269 enp7s0 In IP 192.168.124.1 > 192.168.124.99: ICMP echo reply, id 18, seq 1, length 64
16:08:51.192288 ovn-k8s-mp0 Out IP 192.168.124.1 > 10.128.1.55: ICMP echo reply, id 18, seq 1, length 64
16:08:51.192308 5fd314eac04272b Out IP 192.168.124.1 > 10.128.1.55: ICMP echo reply, id 18, seq 1, length 64
The fix
The function getDefaultRouteForLink has been replaced by generateRoutesForLink, and the fix ships in the 4.15.10 release. The new function copies all of the interface's routes from the main table into the EgressIP rule table. If the main table has no default route for the interface, it adds one.
The output below compares the function in the unfixed (4.15.9) and fixed (4.15.10) releases.
$ oc adm release info 4.15.9 --commits | grep 'ovn-kubernetes' | awk '{print $3}' | head -n 1
42b1cc427538a736f8c056171b4de7e6c6a366fb
$ git checkout 42b1cc427
HEAD is now at 42b1cc427 Merge pull request #2074 from tssurya/OCPBUGS-29599
$ git blame go-controller/pkg/node/controllers/egressip/egressip.go | grep -A6 'func getDefaultRouteForLink'
b64825622f (Martin Kennelly 2023-06-27 13:52:40 +0100 1086) func getDefaultRouteForLink(link netlink.Link, v6 bool) *routemanager.RoutesPerLink {
b64825622f (Martin Kennelly 2023-06-27 13:52:40 +0100 1087) return &routemanager.RoutesPerLink{Link: link,
b64825622f (Martin Kennelly 2023-06-27 13:52:40 +0100 1088) Routes: []routemanager.Route{
b64825622f (Martin Kennelly 2023-06-27 13:52:40 +0100 1089) getDefaultRoute(link.Attrs().Index, v6),
b64825622f (Martin Kennelly 2023-06-27 13:52:40 +0100 1090) },
b64825622f (Martin Kennelly 2023-06-27 13:52:40 +0100 1091) }
b64825622f (Martin Kennelly 2023-06-27 13:52:40 +0100 1092) }
$ oc adm release info 4.15.10 --commits | grep 'ovn-kubernetes' | awk '{print $3}' | head -n 1
feca446a2e3848f79e533bb28763b2d61074de6e
$ git checkout feca446a2
HEAD is now at feca446a2 Merge pull request #2094 from arghosh93/SDN-4544
$ git blame go-controller/pkg/node/controllers/egressip/egressip.go | grep -A8 'func generateRoutesForLink'
ced7a1c229 (Martin Kennelly 2024-01-29 10:08:36 +0000 620) func generateRoutesForLink(link netlink.Link, isV6 bool) ([]netlink.Route, error) {
ced7a1c229 (Martin Kennelly 2024-01-29 10:08:36 +0000 621) linkRoutes, err := netlink.RouteList(link, util.GetIPFamily(isV6))
ced7a1c229 (Martin Kennelly 2024-01-29 10:08:36 +0000 622) if err != nil {
ced7a1c229 (Martin Kennelly 2024-01-29 10:08:36 +0000 623) return nil, fmt.Errorf("failed to get routes for link %s: %v", link.Attrs().Name, err)
ced7a1c229 (Martin Kennelly 2024-01-29 10:08:36 +0000 624) }
ced7a1c229 (Martin Kennelly 2024-01-29 10:08:36 +0000 625) linkRoutes = ensureAtLeastOneDefaultRoute(linkRoutes, link.Attrs().Index, isV6)
ced7a1c229 (Martin Kennelly 2024-01-29 10:08:36 +0000 626) overwriteRoutesTableID(linkRoutes, getRouteTableID(link.Attrs().Index))
ced7a1c229 (Martin Kennelly 2024-01-29 10:08:36 +0000 627) return linkRoutes, nil
b64825622f (Martin Kennelly 2023-06-27 13:52:40 +0100 628) }
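The difference between the return values *routemanager.RoutesPerLink and []netlink.Route, in the context of OVN-K's routemanager and github.com/vishvananda/netlink, is:
- *routemanager.RoutesPerLink is a pointer to a struct that pairs a link with the routes to install for it. In 4.15.9 that list holds a single routemanager.Route with only a table ID and a subnet set, so no gateway is programmed.
- []netlink.Route is a slice of netlink.Route structs: a collection of full kernel routes (destination, source, gateway, table, and so on) copied from the main table.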
See the logs of the ovnkube-controller after the fix.
I0613 06:10:42.946169 6486 egressip.go:426] Processing Egress IP foo
I0613 06:11:05.136532 6486 egressip.go:591] Adding pod egress IP status: {node.sno4.example.local 192.168.124.99} for EgressIP: foo and pod: foo/pod/[10.128.0.137/23]
I0613 06:11:09.176605 6486 route_manager.go:149] Route Manager: netlink route addition event: "{Ifindex: 143 Dst: 192.168.124.99/32 Src: 192.168.124.99 Gw: <nil> Flags: [] Table: 255 Realm: 0}"
I0613 06:11:09.185916 6486 route_manager.go:93] Route Manager: attempting to add route: {Ifindex: 143 Dst: 0.0.0.0/0 Src: 192.168.124.45 Gw: 192.168.124.1 Flags: [] Table: 1143 Realm: 0}
I0613 06:11:09.191761 6486 route_manager.go:110] Route Manager: completed adding route: {Ifindex: 143 Dst: 0.0.0.0/0 Src: 192.168.124.45 Gw: 192.168.124.1 Flags: [] Table: 1143 Realm: 0}
I0613 06:11:09.191837 6486 route_manager.go:149] Route Manager: netlink route addition event: "{Ifindex: 143 Dst: 0.0.0.0/0 Src: 192.168.124.45 Gw: 192.168.124.1 Flags: [] Table: 1143 Realm: 0}"
I0613 06:11:09.191870 6486 route_manager.go:93] Route Manager: attempting to add route: {Ifindex: 143 Dst: 192.168.124.0/24 Src: 192.168.124.45 Gw: <nil> Flags: [] Table: 1143 Realm: 0}
I0613 06:11:09.192082 6486 route_manager.go:110] Route Manager: completed adding route: {Ifindex: 143 Dst: 192.168.124.0/24 Src: 192.168.124.45 Gw: <nil> Flags: [] Table: 1143 Realm: 0}
I0613 06:11:09.195575 6486 route_manager.go:149] Route Manager: netlink route addition event: "{Ifindex: 143 Dst: 192.168.124.0/24 Src: 192.168.124.45 Gw: <nil> Flags: [] Table: 1143 Realm: 0}"
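See the packet capture from the node when the Pod pings google.com. This time the echo request leaves enp7s0 with the EgressIP 192.168.124.99 as its source, and the reply is delivered back to the Pod.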
15:11:57.599484 5fd314eac04272b P IP 10.128.1.55 > 142.250.183.110: ICMP echo request, id 14, seq 1, length 64
15:11:57.600066 ovn-k8s-mp0 In IP 10.128.1.55 > 142.250.183.110: ICMP echo request, id 14, seq 1, length 64
15:11:57.600092 enp7s0 Out IP 192.168.124.99 > 142.250.183.110: ICMP echo request, id 14, seq 1, length 64
15:11:57.635614 enp7s0 In IP 142.250.183.110 > 192.168.124.99: ICMP echo reply, id 14, seq 1, length 64
15:11:57.635630 ovn-k8s-mp0 Out IP 142.250.183.110 > 10.128.1.55: ICMP echo reply, id 14, seq 1, length 64
15:11:57.636298 5fd314eac04272b Out IP 142.250.183.110 > 10.128.1.55: ICMP echo reply, id 14, seq 1, length 64