IPsrcaddr: resource stop problem #2019

Open
Rico29 opened this issue Jan 27, 2025 · 3 comments

@Rico29

Rico29 commented Jan 27, 2025

Hello,
I have an issue with the IPsrcaddr resource on the latest Debian 12 with up-to-date packages.

I've pulled findif.sh and IPsrcaddr from the ClusterLabs/resource-agents GitHub repo.

The problem occurs when moving a resource (or resource group) to another node.

Reproduction:

Node 1:

root@freepbx-lab-ha1:~# crm status
[...]
Node List:
  * Online: [ freepbx-lab-ha1 freepbx-lab-ha2 ]

Full List of Resources:
  * email_alert (ocf:heartbeat:MailTo):  Started freepbx-lab-ha2
  * Resource Group: grp_services:
    * shared_ip (ocf:heartbeat:IPaddr2):         Started freepbx-lab-ha1
    * src_ip    (ocf:heartbeat:IPsrcaddr):       Started freepbx-lab-ha1
    * srv_freepbx       (systemd:freepbx):       Started freepbx-lab-ha1

root@freepbx-lab-ha1:~# ip a
[...]
3: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether bc:24:11:6b:df:9e brd ff:ff:ff:ff:ff:ff
    inet 192.168.222.211/24 brd 192.168.222.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet 192.168.222.210/32 brd 192.168.222.255 scope global bond0
       valid_lft forever preferred_lft forever

root@freepbx-lab-ha1:~# ip r
default via 192.168.222.1 dev bond0 proto keepalived src 192.168.222.210 onlink 
192.168.222.0/24 dev bond0 proto keepalived scope link src 192.168.222.210 
192.168.222.212 dev bond0 scope link src 192.168.222.211 

Node 2:

root@freepbx-lab-ha2:~# crm status
[...]
Node List:
  * Online: [ freepbx-lab-ha1 freepbx-lab-ha2 ]

Full List of Resources:
  * email_alert (ocf:heartbeat:MailTo):  Started freepbx-lab-ha2
  * Resource Group: grp_services:
    * shared_ip (ocf:heartbeat:IPaddr2):         Started freepbx-lab-ha1
    * src_ip    (ocf:heartbeat:IPsrcaddr):       Started freepbx-lab-ha1
    * srv_freepbx       (systemd:freepbx):       Started freepbx-lab-ha1

root@freepbx-lab-ha2:~# ip a
[...]
3: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether bc:24:11:bf:7d:11 brd ff:ff:ff:ff:ff:ff
    inet 192.168.222.212/24 brd 192.168.222.255 scope global bond0
       valid_lft forever preferred_lft forever

root@freepbx-lab-ha2:~# ip r
default via 192.168.222.1 dev bond0 proto keepalived src 192.168.222.212 onlink 
192.168.222.0/24 dev bond0 proto kernel scope link src 192.168.222.212 
192.168.222.211 dev bond0 scope link src 192.168.222.212 

At startup, everything works correctly: node 1 owns the IPaddr2 address, and the default route uses this address as its source address.
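A quick way to confirm which source address the default route carries on each node is to pull the src field out of ip route output. This is a minimal sketch; the helper name default_src is hypothetical, and it reads route lines on stdin so it can be fed either live output (ip -4 route show default) or pasted text:

```shell
#!/bin/sh
# Hypothetical helper: print the "src" address of the default route.
# Reads "ip route" output on stdin, e.g.:  ip -4 route show default | default_src
default_src() {
    awk '$1 == "default" {
        # Scan the fields for the "src" keyword and print the token after it.
        for (i = 1; i < NF; i++)
            if ($i == "src") { print $(i + 1); exit }
    }'
}

# Example with the node 1 default route from the output above:
printf '%s\n' 'default via 192.168.222.1 dev bond0 proto keepalived src 192.168.222.210 onlink' | default_src
# prints 192.168.222.210
```

On the node running grp_services this should print the IPaddr2 address; on the other node it should print that node's own address.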

When moving the resource group with crm resource move grp_services freepbx-lab-ha2, I get this status and error:

Node List:
  * Online: [ freepbx-lab-ha1 freepbx-lab-ha2 ]

Full List of Resources:
  * email_alert (ocf:heartbeat:MailTo):  Started freepbx-lab-ha2
  * Resource Group: grp_services:
    * shared_ip (ocf:heartbeat:IPaddr2):         Started freepbx-lab-ha1
    * src_ip    (ocf:heartbeat:IPsrcaddr):       FAILED freepbx-lab-ha1 (blocked)
    * srv_freepbx       (systemd:freepbx):       Stopped

Failed Resource Actions:
  * src_ip stop on freepbx-lab-ha1 returned 'error' (command 'ip route replace  192.168.222.0/24 dev bond0 proto kernel scope link src 192.168.222.211) at Mon Jan 27 12:59:40 2025 after 48ms

Running the "ip route..." command manually on the given node returns no error:

# ip route replace  192.168.222.0/24 dev bond0 proto kernel scope link src 192.168.222.211 && echo $?
0

How can I fix this?
Regards

@oalbrigt
Contributor

If you run pcs resource update src_ip trace_ra=1 (or the crm equivalent), you will get trace files for every run of each action under /var/lib/heartbeat.

Then try to move it again; you should be able to identify exactly which command fails.
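Once tracing is enabled and the move has been retried, the trace of the failed stop can be located and searched for the ip route call. The sketch below assumes the default trace location (/var/lib/heartbeat/trace_ra/IPsrcaddr/) and file names of the form resource.action.timestamp; both are assumptions about the ocf-shellfuncs defaults, so check the actual directory on your system. The helper name latest_trace is hypothetical.

```shell
#!/bin/sh
# Enable tracing first (from the comment above):
#   pcs resource update src_ip trace_ra=1
# With crmsh, "crm resource param src_ip set trace_ra 1" should be equivalent.

# Hypothetical helper: print the newest trace file for a resource/action.
# $1 = trace directory, $2 = resource name, $3 = action (e.g. stop).
# Assumes names end in a sortable timestamp (YYYY-MM-DD.HH:MM:SS).
latest_trace() {
    dir=$1 rsc=$2 action=$3
    for f in "$dir/$rsc.$action".*; do
        # Skip the literal pattern when nothing matches.
        [ -e "$f" ] && printf '%s\n' "$f"
    done | LC_ALL=C sort | tail -n 1
}

# Example (assumed default path):
#   f=$(latest_trace /var/lib/heartbeat/trace_ra/IPsrcaddr src_ip stop)
#   grep -n 'ip route' "$f"   # the xtrace lines show the exact arguments used
```

Because the trace is the agent's shell trace, the grep shows the ip route command with its arguments already expanded, which is what the pacemaker summary line may not show in full.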

@Rico29
Author

Rico29 commented Feb 26, 2025

Hello
Sorry for the late response, I was working on other things. But I've found the problem.

The IPaddr2 resource adds a virtual IP address to the interface, but this IP address is not marked as "secondary", so the PRIMARY_IP variable contains two IP addresses: the original one (defined in the host network config) and the one added by the IPaddr2 resource:

# ip -4 -o addr show dev bond0.324 primary | awk '{split($4,a,"/");print a[1]}'
172.24.0.2
172.24.0.1
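A two-line PRIMARY_IP would also explain why the logged command succeeds when typed by hand with a single address. If the agent expands the variable unquoted, the shell splits it on the newline and ip receives two addresses after src. The sketch below reproduces that with the same awk pipeline on sample output shaped like the one above (the /32 on the second line is illustrative). It only prints the argument list instead of calling ip, and the unquoted expansion is an assumption about how IPsrcaddr builds the command, not something checked against its source.

```shell
#!/bin/sh
# Sample "ip -4 -o addr show dev bond0.324 primary" output with two primary
# addresses (trailing fields shortened); the /32 prefix is illustrative.
sample='6: bond0.324    inet 172.24.0.2/16 brd 172.24.255.255 scope global bond0.324
6: bond0.324    inet 172.24.0.1/32 scope global bond0.324'

# Same awk as quoted above: take field 4 and strip the prefix length.
PRIMARY_IP=$(printf '%s\n' "$sample" | awk '{split($4,a,"/");print a[1]}')

# Print each argument on its own line instead of running "ip route replace".
show_args() { for arg in "$@"; do printf '<%s>\n' "$arg"; done; }

# Unquoted expansion (assumed): the newline splits into two arguments after "src".
show_args src $PRIMARY_IP    # prints <src>, <172.24.0.2>, <172.24.0.1> on separate lines
# Quoted expansion: a single argument that still contains a newline.
show_args src "$PRIMARY_IP"
```

Either way, ip does not get one valid address after src, whereas the command run manually with a single address works.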

So the fix you proposed in #1450 works fine.

But is that the right solution? Shouldn't IPaddr2 set up the virtual IP address as "secondary"?

When adding the IP address as secondary, I get this warning:

# ip addr add 172.24.0.1/16 dev bond0.324 secondary
Warning: secondary option is not mutable from userspace

But the IP address is correctly added as "secondary":

# ip -4 -o addr show dev bond0.324
6: bond0.324    inet 172.24.0.2/16 brd 172.24.255.255 scope global bond0.324\       valid_lft forever preferred_lft forever
6: bond0.324    inet 172.24.0.1/16 scope global secondary bond0.324\       valid_lft forever preferred_lft forever

and the awk in IPsrcaddr works as expected, returning only the primary IP address.
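As far as I understand the Linux kernel's behaviour, the secondary flag is not settable from userspace (hence the warning); the kernel marks a new IPv4 address secondary when the interface already holds an address with the same prefix length whose subnet contains the new one. In the test above, 172.24.0.1/16 falls inside 172.24.0.2/16, which would make it secondary regardless of the keyword. In the original report, 192.168.222.210/32 has a different prefix length from 192.168.222.211/24, so it stays primary. The sketch below applies that rule to one pair of addresses; ip_to_int and would_be_secondary are hypothetical helpers, and the sketch ignores scope and multi-address cases the kernel also handles.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Hypothetical helper: succeed if adding $2 (addr/len) next to an existing
# $1 (addr/len) would, per the rule above, yield a secondary address.
would_be_secondary() {
    old_ip=${1%/*} old_len=${1#*/}
    new_ip=${2%/*} new_len=${2#*/}
    # Different prefix lengths: the new address stays primary.
    [ "$old_len" -eq "$new_len" ] || return 1
    mask=$(( (0xFFFFFFFF << (32 - new_len)) & 0xFFFFFFFF ))
    # Same prefix length: secondary only if both fall in the same subnet.
    [ $(( $(ip_to_int "$old_ip") & mask )) -eq $(( $(ip_to_int "$new_ip") & mask )) ]
}

would_be_secondary 172.24.0.2/16 172.24.0.1/16 && echo secondary || echo primary
# prints secondary
would_be_secondary 192.168.222.211/24 192.168.222.210/32 && echo secondary || echo primary
# prints primary
```

If this reading is right, configuring IPaddr2 with the same prefix length as the node's base address (its cidr_netmask parameter) should let the kernel mark the virtual IP secondary without any extra flag; whether that suits every IPaddr2 setup is a separate question.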

So, is this IPaddr2's responsibility or IPsrcaddr's?
Regards

@oalbrigt
Contributor

That should be in the IPaddr2 agent.

I'll look into adding the logic to add the IP as secondary.
