[BUG] Error when deploying multiple VPC-NAT-GWs with same lanIP in Cilium chain mode #4952
Comments
I found a similar closed issue regarding the inability to use the same EIP for DNAT and SNAT in VPC gateways, which was resolved by a pull request that added support for shared EIPs in iptables NAT [1]. In your case, the error arises because the same lanIp is assigned to more than one VPC NAT Gateway. To work around this issue, you should ensure that each VPC NAT Gateway has a unique lanIp.
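A minimal sketch of that suggested workaround, assuming the standard fields of Kube-OVN's VpcNatGateway CRD; the VPC, subnet, and gateway names below are placeholders, not taken from this issue:

```yaml
# Hypothetical manifests illustrating the workaround: each gateway gets its
# own lanIp, so no two Cilium endpoints end up sharing an address.
apiVersion: kube-ovn.io/v1
kind: VpcNatGateway
metadata:
  name: vpc1-natgw1
spec:
  vpc: vpc1                   # placeholder VPC name
  subnet: vpc1-subnet1        # placeholder subnet inside vpc1
  lanIp: 10.0.1.254           # LAN-side address of the gateway pod
  selector:
    - "kubernetes.io/os: linux"
---
apiVersion: kube-ovn.io/v1
kind: VpcNatGateway
metadata:
  name: vpc2-natgw1
spec:
  vpc: vpc2
  subnet: vpc2-subnet1
  lanIp: 10.0.1.253           # a different lanIp even if the subnet CIDRs are identical
  selector:
    - "kubernetes.io/os: linux"
```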
As a temporary workaround, deleting the conflicting Cilium endpoint allows the new gateway Pod to start. However, once the Cilium endpoint is deleted, the existing VPC-NAT-GW stops functioning, and all gateways except the latest one become unable to communicate.
@oilbeater @zbb88888 @zhangzujian Hi. Is this behavior within specification? I look forward to your response.
More than one VPC NAT Gateway with the same lanIp, where the lanIps are in different VPCs: I think it is a bug that the pod cannot become ready. Could you please show the error log from kube-ovn-cni? It uses ARP or ping to check whether the gateway is pingable.
Maybe something is blocking the ping packets.
It seems that Cilium does not support identical/conflicting endpoint IPs. I suggest you report this problem to the Cilium community.
@zhangzujian
You are creating two vpc-nat-gateways for two different VPCs. VPCs are isolated from each other, and different VPCs can have identical CIDRs/IPs. This is not a bug but a feature. BTW, underlay subnets can also have identical CIDRs/IPs since they work in a different networking mode.
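For readers unfamiliar with this, a minimal sketch of what "identical CIDRs in different VPCs" looks like with Kube-OVN's Vpc and Subnet CRDs; the resource and namespace names are hypothetical:

```yaml
# Hypothetical example: two isolated VPCs whose subnets use the same CIDR.
apiVersion: kube-ovn.io/v1
kind: Vpc
metadata:
  name: vpc1
spec:
  namespaces:
    - ns1
---
apiVersion: kube-ovn.io/v1
kind: Subnet
metadata:
  name: vpc1-subnet1
spec:
  vpc: vpc1
  cidrBlock: 10.0.1.0/24    # same CIDR as vpc2-subnet1; valid because the VPCs are isolated
  protocol: IPv4
  namespaces:
    - ns1
---
apiVersion: kube-ovn.io/v1
kind: Vpc
metadata:
  name: vpc2
spec:
  namespaces:
    - ns2
---
apiVersion: kube-ovn.io/v1
kind: Subnet
metadata:
  name: vpc2-subnet1
spec:
  vpc: vpc2
  cidrBlock: 10.0.1.0/24
  protocol: IPv4
  namespaces:
    - ns2
```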
Thank you for your reply. I understand that the VPC functionality allows identical IPs to be used. However, I believe vpc-nat-gw might be an exception. Is it a matter of CNI priority, i.e. that Cilium performs endpoint assignment before Kube-OVN? I plan to test this by changing the CNI priority.
As I predicted, when I raised Kube-OVN's priority by giving its CNI config a lower number than Cilium's and rebuilt the environment, vpc-nat-gw Pods with identical IPs deployed normally, without any Cilium endpoint assignment errors.

ubuntu@ubuntu:~$ sudo ls /etc/cni/net.d/
00-multus.conf  01-kube-ovn.conflist  05-cilium.conflist  multus.d
ubuntu@ubuntu:~$ kubectl get pod -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system cilium-envoy-hkbvd 1/1 Running 0 8m13s 192.168.0.20 ubuntu <none> <none>
kube-system cilium-operator-78f5fdf98f-drhh6 1/1 Running 1 (6m26s ago) 8m13s 192.168.0.20 ubuntu <none> <none>
kube-system cilium-tkftt 1/1 Running 0 8m13s 192.168.0.20 ubuntu <none> <none>
kube-system coredns-668d6bf9bc-ntb4n 1/1 Running 0 40s 10.16.0.19 ubuntu <none> <none>
kube-system coredns-668d6bf9bc-qffsk 1/1 Running 0 25s 10.16.0.20 ubuntu <none> <none>
kube-system etcd-ubuntu 1/1 Running 0 12m 192.168.0.20 ubuntu <none> <none>
kube-system hubble-relay-75d5bdf84b-mlpt4 1/1 Running 0 8m13s 10.16.0.5 ubuntu <none> <none>
kube-system hubble-ui-69d69b64cf-vfbvt 2/2 Running 0 8m13s 10.16.0.6 ubuntu <none> <none>
kube-system kube-apiserver-ubuntu 1/1 Running 0 12m 192.168.0.20 ubuntu <none> <none>
kube-system kube-controller-manager-ubuntu 1/1 Running 2 (6m34s ago) 12m 192.168.0.20 ubuntu <none> <none>
kube-system kube-multus-ds-2txkf 1/1 Running 0 4m58s 192.168.0.20 ubuntu <none> <none>
kube-system kube-ovn-cni-cs626 1/1 Running 0 11m 192.168.0.20 ubuntu <none> <none>
kube-system kube-ovn-controller-59c7b45555-svvgz 1/1 Running 1 (6m21s ago) 11m 192.168.0.20 ubuntu <none> <none>
kube-system kube-ovn-monitor-7df749fb4-b4xl6 1/1 Running 0 11m 192.168.0.20 ubuntu <none> <none>
kube-system kube-ovn-pinger-5hm4l 1/1 Running 0 11m 10.16.0.2 ubuntu <none> <none>
kube-system kube-proxy-nsvdv 1/1 Running 0 11m 192.168.0.20 ubuntu <none> <none>
kube-system kube-scheduler-ubuntu 1/1 Running 1 (6m27s ago) 12m 192.168.0.20 ubuntu <none> <none>
kube-system ovn-central-5bbcc6b688-bx99t 1/1 Running 0 11m 192.168.0.20 ubuntu <none> <none>
kube-system ovs-ovn-ll4pf 1/1 Running 0 11m 192.168.0.20 ubuntu <none> <none>
kube-system vpc-nat-gw-vpc1-natgw1-0 1/1 Running 0 61s 10.0.1.254 ubuntu <none> <none>
kube-system vpc-nat-gw-vpc2-natgw1-0 1/1 Running 0 9s 10.0.1.254 ubuntu <none> <none>
ubuntu@ubuntu:~$

Why does the official Cilium integration procedure set Cilium's priority higher than Kube-OVN's?
I think this is what happened:
With Kube-OVN's config sorted first, Cilium chaining is effectively disabled.
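If that reading is correct, the mechanism is roughly this: the CNI runtime picks the lexicographically first config in /etc/cni/net.d, so once 01-kube-ovn.conflist sorts ahead of 05-cilium.conflist, pods are wired up by kube-ovn alone and cilium-cni is never invoked. That removes the endpoint conflict, but also Hubble visibility and Cilium network policy. In the official integration, Cilium is installed in generic-veth chaining mode so that its plugin runs after kube-ovn; a rough, hedged sketch of the relevant Cilium Helm values (key names from the generic-veth chaining approach, to be checked against the Kube-OVN/Cilium docs for your versions):

```yaml
# Hedged sketch of the Cilium Helm values used by the chaining setup.
cni:
  chainingMode: generic-veth     # run cilium-cni after the primary CNI (kube-ovn)
  customConf: true               # Cilium does not write its own standalone CNI config
  configMap: cni-configuration   # ConfigMap providing the kube-ovn -> cilium-cni conflist
```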
Ah, I apologize. Indeed, the Cilium chain was not functioning, and Hubble and Cilium Network Policy were not working.
Yes, I believe that's exactly what was happening.
Cilium, of course.
I understand. I'll check whether Cilium can address this issue. If that's not possible, I'd like to explore alternatives. What configuration options does Kube-OVN offer to implement such a scenario?
If you are using Kube-OVN VPCs together with Cilium chaining, there is a chance that multiple ordinary pods are assigned identical IPs by Kube-OVN. This may also trigger the Cilium endpoint error.
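To make that concrete, a hypothetical example of how two ordinary pods could end up with the same address: Kube-OVN lets a pod be pinned to a VPC subnet and a fixed IP via annotations, so pods in subnets of different VPCs with identical CIDRs can legitimately receive the same IP. The subnet names, namespaces, and IP below are placeholders:

```yaml
# Hypothetical example: two ordinary pods pinned to subnets of different VPCs.
# Kube-OVN happily gives both the same fixed IP, but for Cilium they become
# duplicate endpoint IPs.
apiVersion: v1
kind: Pod
metadata:
  name: app-in-vpc1
  namespace: ns1
  annotations:
    ovn.kubernetes.io/logical_switch: vpc1-subnet1   # subnet (logical switch) in vpc1
    ovn.kubernetes.io/ip_address: 10.0.1.10          # fixed IP inside that subnet
spec:
  containers:
    - name: app
      image: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: app-in-vpc2
  namespace: ns2
  annotations:
    ovn.kubernetes.io/logical_switch: vpc2-subnet1
    ovn.kubernetes.io/ip_address: 10.0.1.10          # same address, different VPC
spec:
  containers:
    - name: app
      image: nginx
```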
The root cause may be that Cilium does not support VPC or network isolation, so identical endpoints/IPs are invalid. |
I see. So, in environments using multiple VPCs, it would be better not to use the Cilium chain?
This is exactly what I mean.
Kube-OVN Version
v1.13.2
Cilium Version
v1.16.6
Kubernetes Version
v1.32.1
Operating-system/Kernel Version
"Ubuntu 22.04.5 LTS"
5.15.0-125-generic
Description
Hello.
I am using Cilium chaining.
In this environment, when deploying multiple VPC NAT Gateways with the same lanIp, I'm encountering a bug where the Cilium endpoints conflict, preventing the VPC NAT Gateway pods from being created.
Is there any way to work around this?
While the first deployed vpc-nat-gw-vpc1-natgw1-0 is functioning normally, the second deployed vpc-nat-gw-vpc2-natgw1-0 is not working.
vpc-nat-gw1 yaml (working)
vpc-nat-gw2 yaml (failed)
vpc-nat-gw2 detail
kube-ovn controller logs
cilium agent log
cilium endpoint list
Steps To Reproduce
environment setup:
Current Behavior
When trying to create VPC NAT Gateways with the same lanIp, they cannot be created properly.
Expected Behavior
VPC NAT Gateways should be created successfully even when they have the same lanIp.