Intro
OVMRU_002036E OVM-Repo2 - Cannot present the Repository to server: calavsovm02. The server needs to be in a cluster. [Thu Jun 22 10:25:34 EDT 2023]
We could mount the repositories manually as a test (see the example below), but these repositories should mount automatically when a node is added to the cluster. The GUI shows the node as a member of the cluster, yet the repositories are not visible on node02.
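For reference, an OVM repository is an OCFS2 filesystem and can be mounted by hand for a quick test, assuming the o2cb cluster stack is already online on the node. The device WWID and repository UUID below are placeholders, not values taken from this environment:
[root@ovs-node02 ~]# mount -t ocfs2 /dev/mapper/<repository-wwid> /OVS/Repositories/<repository-uuid>
[root@ovs-node02 ~]# mount | grep ocfs2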
If the node is already part of the cluster, I would recommend removing it from the cluster before making any changes. In this scenario, the node addition does not progress past the mount attempt shown below. The following log is from the ovs-agent.
Agent log output
"DEBUG (ocfs2:182) cluster debug: {'/sys/kernel/debug/o2dlm': [],
'/sys/kernel/debug/o2net': ['connected_nodes', 'stats', 'sock_containers', 'send_tracking'],
'/sys/kernel/debug/o2hb': ['0004FB0000050000B705B4397850AAD6', 'failed_regions', 'quorum_regions', 'live_regions', 'livenodes'],
'service o2cb status': 'Driver for "configfs": Loaded\nFilesystem "configfs": Mounted\nStack glue driver: Loaded\nStack plugin "o2cb":
Loaded\nDriver for "ocfs2_dlmfs": Loaded\nFilesystem "ocfs2_dlmfs":
Mounted\nChecking O2CB cluster "f6f6b47b38e288e0": Online\n Heartbeat dead threshold: 61\n Network idle timeout: 60000\n Network keepalive delay: 2000\n Network reconnect delay: 2000\n Heartbeat mode: Global\nChecking O2CB heartbeat: Active\n 0004FB0000050000B705B4397850AAD6 /dev/dm-2\nNodes in O2CB cluster: 0 1 \nDebug file system at /sys/kernel/debug: mounted\n'}
[2023-06-22 11:10:25 12640] DEBUG (ocfs2:258) Trying to mount /dev/mapper/36861a6fddaa0481ec0dd3584514a8d62 to /poolfsmnt/0004fb0000050000b705b4397850aad6 "
/var/log/messages output
Jun 27 12:55:32 calavsovm02 kernel: [ 659.079952] o2net: Connection to node calavsovm01 (num 0) at 10.110.110.101:7777 shutdown, state 7
Jun 27 12:55:34 calavsovm02 kernel: [ 661.080005] o2net: Connection to node calavsovm01 (num 0) at 10.110.110.101:7777 shutdown, state 7
Jun 27 12:55:36 calavsovm02 kernel: [ 663.079916] o2net: Connection to node calavsovm01 (num 0) at 10.110.110.101:7777 shutdown, state 7
Jun 27 12:55:38 calavsovm02 kernel: [ 665.080167] o2net: Connection to node calavsovm01 (num 0) at 10.110.110.101:7777 shutdown, state 7
Jun 27 12:55:40 calavsovm02 kernel: [ 667.079905] o2net: No connection established with node 0 after 60.0 seconds, check network and cluster configuration.
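These o2net messages show node02 failing to establish a connection to node01 over the cluster interconnect on TCP port 7777. As a quick reachability check (assuming nc is available on the node), the port can be probed from node02:
[root@ovs-node02 ~]# nc -zv 10.110.110.101 7777
Keep in mind that a small TCP probe can succeed even when larger frames are dropped, so this only confirms basic reachability, not that the path handles the configured MTU.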
Troubleshooting steps
Validate storage settings on both servers
First, validate that the storage is visible from both nodes. This can be checked by running the multipath -ll command; the same LUNs (WWIDs) should be listed on node01 and node02.
[root@ovs-node01 ~]# multipath -ll
36861a6fddaa0481ec0dd3584514a8d62 dm-0 EQLOGIC,100E-00
size=16G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
`- 11:0:0:0 sdc 8:32 active ready running
36861a6fddaa0787dbeddf57e514abd8a dm-1 EQLOGIC,100E-00
size=3.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
`- 10:0:0:0 sdb 8:16 active ready running
36861a6fddaa0d8306edd157b4d4aed23 dm-2 EQLOGIC,100E-00
size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
`- 9:0:0:0 sdd 8:48 active ready running
[root@ovs-node02 ~]# multipath -ll
36861a6fddaa0481ec0dd3584514a8d62 dm-1 EQLOGIC,100E-00
size=16G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
`- 12:0:0:0 sde 8:64 active ready running
36861a6fddaa0787dbeddf57e514abd8a dm-2 EQLOGIC,100E-00
size=3.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
`- 11:0:0:0 sdd 8:48 active ready running
36861a6fddaa0d8306edd157b4d4aed23 dm-0 EQLOGIC,100E-00
size=2.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
`- 10:0:0:0 sdc 8:32 active ready running
[root@calavsovm02 oswatcher]#
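What matters here is that both nodes see the same three WWIDs, even though the dm-N names and SCSI paths differ between them. One quick way to compare is to list just the WWIDs on each node and diff the two lists, for example:
[root@ovs-node02 ~]# multipath -ll | awk '/EQLOGIC/ {print $1}' | sort
Run the same command on node01; the output should be identical on both nodes.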
Validate network settings
In this architecture, the storage connection is established via bond1, and both servers are configured to use jumbo frames (MTU 9000).
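Before looking at the ifcfg files, the MTU that is actually in effect on the bond can be confirmed at runtime, for example:
[root@ovs-node02 ~]# ip link show bond1 | grep -o 'mtu [0-9]*'
[root@ovs-node02 ~]# cat /sys/class/net/bond1/mtu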
Node02 Network settings
[root@ovs-node02 network-scripts]# cat ifcfg-bond1
DEVICE=bond1
BONDING_OPTS="mode=6 miimon=250 use_carrier=1 updelay=500 downdelay=500 primary_reselect=2 primary=eth1"
BOOTPROTO=static
IPADDR=*.*.*.*
NETMASK=*.*.*.*
ONBOOT=yes
MTU=9000
-- ifcfg-eth1
[root@ovs-node02 network-scripts]# cat ifcfg-eth1
DEVICE="eth1"
BOOTPROTO="none"
DHCP_HOSTNAME="ovs-node02"
HWADDR="*.*.*.*"
NM_CONTROLLED="yes"
ONBOOT="yes"
TYPE="Ethernet"
UUID="36f65c92-3ad9-487b-b9ab-5bd792372d37"
MASTER=bond1
SLAVE=yes
MTU=9000
-- ifcfg-eth2
[root@ovs-node02 network-scripts]# cat ifcfg-eth2
DEVICE="eth2"
BOOTPROTO="none"
DHCP_HOSTNAME="ovs-node02"
HWADDR="*.*.*.*"
NM_CONTROLLED="yes"
ONBOOT="yes"
TYPE="Ethernet"
UUID="44589639-3b05-453b-9dc5-1b1a3b4d4c4d"
MASTER=bond1
SLAVE=yes
MTU=9000
Validate that jumbo frames are working
The logs above point to a network connectivity problem, so we validated bond1 with 8000-byte and 1000-byte ping payloads.
As shown below, the 8000-byte ping produces no replies, while the 1000-byte ping succeeds. This points to a configuration mismatch on the switch side: jumbo frames are not being passed end to end.
Test with an 8000-byte packet
[root@ovs-node02 ~]# ping -s 8000 -M do 10.110.210.201
PING 10.110.210.201 (10.110.210.201) 8000(8028) bytes of data.
Test with a 1000-byte packet
[root@ovs-node02 ~]# ping -s 1000 -M do 10.110.210.201
PING 10.110.210.201 (10.110.210.201) 1000(1028) bytes of data.
1008 bytes from 10.110.210.201: icmp_seq=1 ttl=64 time=0.183 ms
1008 bytes from 10.110.210.201: icmp_seq=2 ttl=64 time=0.194 ms
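The -s value is the ICMP payload size; the on-wire size shown in parentheses adds the 20-byte IP header and 8-byte ICMP header (8000 + 28 = 8028, 1000 + 28 = 1028). To exercise the full 9000-byte MTU without fragmentation, the largest payload that fits is 9000 - 28 = 8972:
[root@ovs-node02 ~]# ping -s 8972 -M do 10.110.210.201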
Solution
When we initially tried to add the node to the cluster, node02 was sending 9000-byte packets over the storage network to node01. Because jumbo frames were not working as expected end to end, the automatic storage mounting failed.
The solution is to change the network MTU on node02 to 1500 and restart the network service. To be safe, you can also reboot node02 instead (the restart and verification commands are shown after the config below).
Change the network MTU to 1500 on node02
[root@ovs-node02 network-scripts]# cat ifcfg-bond1
DEVICE=bond1
BONDING_OPTS="mode=6 miimon=250 use_carrier=1 updelay=500 downdelay=500 primary_reselect=2 primary=eth1"
BOOTPROTO=static
IPADDR=*.*.*.*
NETMASK=*.*.*.*
ONBOOT=yes
MTU=1500
[root@ovs-node02 network-scripts]#
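With the file updated, a minimal sequence to apply and verify the change might look like the following (assuming the classic network service used on OVM Server; a full reboot of node02 is the more cautious option mentioned above):
[root@ovs-node02 ~]# service network restart
[root@ovs-node02 ~]# ip link show bond1 | grep -o 'mtu [0-9]*'
[root@ovs-node02 ~]# ping -s 1472 -M do 10.110.210.201
Here 1472 is the largest ICMP payload that fits in a 1500-byte MTU without fragmentation (1500 - 28).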
Now try to add node02 to the cluster again. This solution worked in our environment; once the node is in the cluster, map the other repositories to node02.
Conclusion
These issues are complex and take time to understand fully. I would recommend opening a service request with Oracle before making any changes.
Look carefully at the ovs-agent logs and /var/log/messages to understand the issue. I would also suggest collecting a sosreport from the problematic node.