Friday, June 30, 2023

OVM - Remove stale cluster entries

 






Intro 

It has been ages since Oracle released its own hypervisor, Oracle VM (OVM). OVM technology is based on paravirtualization and uses a Xen-based hypervisor. The latest available release is OVM 3.4.6.3. Oracle has announced extended support for OVM: the extended support period started in March 2021 and will end on March 31, 2024.

If you need more information about OVM support, read the following article: https://blogs.oracle.com/virtualization/post/announcing-oracle-vm-3-extended-support/

This is the end of the road for OVM; there will be no further OVM releases. Oracle's virtualization technology going forward is Oracle KVM. Oracle KVM is much more stable than OVM and gives you more flexibility in your virtualization environment. If you are planning to stay on-premises, I would say this is the right time to plan your journey to KVM.

In this article, I will cover an issue we recently faced in an OVM environment.

 

Overview of the issue

    
We faced a new issue in an OVM cluster environment, caused by a sudden data center power outage. Once everything was back online, we were not able to start the OVM hypervisor on one of the nodes, so we had to perform a complete reinstallation of that node.

When I tried to add the node back to the cluster, we faced an issue with mounting the repositories. The next option was to remove the node from the cluster again; this action was performed via the GUI.

During validation, I realized that the cluster entries were still present on node02. We found stale entries on both the master node and node02.

The following My Oracle Support (formerly MetaLink) note covers the issue of stale entries:
OVM - How To Remove Stale Entry of the Oracle VM Server which was Removed from The Pool (Doc ID 2418834.1)

How do you identify stale entries?

First, validate the o2cb status. O2CB is the OCFS2 cluster service and holds the information about the cluster; the node information is in the "Nodes in O2CB cluster" line.

[root@ovm-node02 ~]# service o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster "f6f6b47b38e288e0": Online
  Heartbeat dead threshold: 61
  Network idle timeout: 60000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
  Heartbeat mode: Global
Checking O2CB heartbeat: Active
  0004FB0000050000B705B4397850AAD6 /dev/dm-2
Nodes in O2CB cluster: 0 1
Debug file system at /sys/kernel/debug: mounted
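To make this check repeatable, the node list can be pulled out of the status output with a small shell function. This is a minimal sketch, not part of OVM: the function names are hypothetical, it assumes the status output keeps the "Nodes in O2CB cluster:" line format shown above, and you supply the number of servers you expect in the pool.

```shell
#!/usr/bin/env bash
# Sketch: summarise the "Nodes in O2CB cluster" line of `service o2cb status`.
# Reads the status output on stdin so it can be tested against saved output.

# Print the node numbers O2CB currently considers cluster members, e.g. "0 1".
o2cb_member_nodes() {
  sed -n 's/^Nodes in O2CB cluster:[[:space:]]*//p'
}

# Print how many node numbers appear on that line.
o2cb_member_count() {
  o2cb_member_nodes | wc -w | tr -d '[:space:]'
}

# Compare the member count with the number of servers you expect in the pool.
# $1 = expected count (you supply this; it is not read from OVM).
o2cb_check_members() {
  local expected=$1 actual
  actual=$(o2cb_member_count)
  if [ "$actual" -ne "$expected" ]; then
    echo "WARNING: O2CB lists $actual node(s), expected $expected" >&2
    return 1
  fi
  echo "OK: O2CB lists $actual node(s)"
}

# Usage on a hypervisor when only one server should be left in the pool:
#   service o2cb status | o2cb_check_members 1
```

With the output above and an expected count of 1, the check warns because O2CB still lists nodes 0 and 1.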

Now let's check the node entries from node02. If the node had been correctly removed from the cluster, you should see only one entry, but here there are two.

[root@ovm-node02 ovm-node02]# ls -lrth /sys/kernel/config/cluster/f6f6b47b38e288e0/node/
total 0
drwxr-xr-x 2 root root 0 Jun 23 09:28 ovm-node02
drwxr-xr-x 2 root root 0 Jun 23 09:33 ovm-node01
[root@ovm-node02 ovm-node02]#
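The same comparison can be scripted against configfs. This is a minimal sketch with hypothetical function names; it assumes each registered node shows up as a directory under /sys/kernel/config/cluster/<cluster>/node/, as in the listing above, and that you pass in the names of the servers that should still be in the pool.

```shell
#!/usr/bin/env bash
# Sketch: list the node entries O2CB has registered in configfs for a cluster,
# and report any entry that is not one of the servers you expect in the pool.
# The configfs root is a parameter so the functions can be tried on a copy.

# $1 = cluster name (e.g. f6f6b47b38e288e0)
# $2 = configfs cluster root (defaults to /sys/kernel/config/cluster)
o2cb_configfs_nodes() {
  local dir="${2:-/sys/kernel/config/cluster}/$1/node" entry
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 2
  fi
  # Every registered node appears as a directory named after the server.
  for entry in "$dir"/*; do
    if [ -d "$entry" ]; then basename "$entry"; fi
  done
}

# $1 = cluster name, $2 = configfs root, remaining args = expected node names.
# Prints each registered node that is not in the expected list.
o2cb_stale_nodes() {
  local cluster=$1 root=$2
  shift 2
  o2cb_configfs_nodes "$cluster" "$root" | while IFS= read -r node; do
    known=no
    for expected in "$@"; do
      if [ "$node" = "$expected" ]; then known=yes; fi
    done
    if [ "$known" = no ]; then echo "$node"; fi
  done
}

# Usage, expecting only ovm-node01 to remain in the cluster:
#   o2cb_stale_nodes f6f6b47b38e288e0 /sys/kernel/config/cluster ovm-node01
```

Any name it prints is a candidate stale entry to investigate before touching the cluster configuration.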

The next step is to validate the database entries on the master node (ovm-node01). The output shows two IP addresses in pool_member_ip_list.



[root@ovm-node01]# ovs-agent-db dump_db server
{'cluster_state': 'DLM_Ready',
 'clustered': True,
 'fs_stat_uuid_list': ['0004fb000005000015c1fb14ef761f40',
                       '0004fb000005000079ae03177c3edc7e',
                       '0004fb000005000065985109f8834e8b'],
 'is_master': True,
 'manager_event_url': 'https://192.168.85.152:7002/ovm/core/wsapi/rest/internal/Server/08:00:20:ff:ff:ff:ff:ff:ff:ff:00:10:e0:ef:de:6a/Event',
 'manager_ip': '192.168.85.152',
 'manager_statistic_url': 'https://192.168.85.152:7002/ovm/core/wsapi/rest/internal/Server/08:00:20:ff:ff:ff:ff:ff:ff:ff:00:10:e0:ef:de:6a/Statistic',
 'manager_uuid': '0004fb0000010000c8ecbd219dc6b1ee',
 'node_number': 0,
 'pool_alias': 'EclipsysOVM',
 'pool_master_ip': '192.168.85.177',
 'pool_member_ip_list': ['192.168.85.177', '192.168.85.178'],
 'pool_uuid': '0004fb0000020000f6f6b47b38e288e0',
 'poolfs_nfsbase_uuid': '',
 'poolfs_target': '/dev/mapper/36861a6fddaa0481ec0dd3584514a8d62',
 'poolfs_type': 'lun',
 'poolfs_uuid': '0004fb0000050000b705b4397850aad6',
 'registered_hostname': 'ovm-node01',
 'registered_ip': '192.168.85.177',
 'roles': set(['utility', 'xen'])}
[root@calavsovm01 ovm-node01]#
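pool_member_ip_list can also be checked with a short filter over the dump. This is a sketch under the assumption that the list prints on a single line, as above; the function names are hypothetical, and only IPv4 addresses are matched.

```shell
#!/usr/bin/env bash
# Sketch: pull pool_member_ip_list out of `ovs-agent-db dump_db server` output
# (read on stdin) and report members you do not expect in the pool.
# Assumes the list is printed on a single line, as in the dump above; a very
# long list may be wrapped across lines and would need a different parser.

# Print each IPv4 address in pool_member_ip_list on its own line.
ovs_pool_members() {
  grep "'pool_member_ip_list'" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}'
}

# Args = IP addresses expected in the pool; prints any other member.
ovs_unexpected_members() {
  local members ip e known
  members=$(ovs_pool_members)
  for ip in $members; do
    known=no
    for e in "$@"; do
      if [ "$ip" = "$e" ]; then known=yes; fi
    done
    if [ "$known" = no ]; then echo "$ip"; fi
  done
}

# Usage on the master node, expecting only ovm-node01 (192.168.85.177):
#   ovs-agent-db dump_db server | ovs_unexpected_members 192.168.85.177
```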

Remove the node from the cluster using the command line

Now we can remove the second node, ovm-node02, from the cluster. The command is run from the master node, ovm-node01.


[root@ovm-node01]# o2cb remove-node f6f6b47b38e288e0 ovm-node02
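Because o2cb remove-node changes the live cluster configuration, it can help to wrap it in a dry run that first confirms the node is still registered. The wrapper below is a hypothetical helper, not an Oracle tool; CONFIGFS_ROOT is an invented override used only so the check can be tried against a test directory.

```shell
#!/usr/bin/env bash
# Sketch: a guarded wrapper around `o2cb remove-node <cluster> <node>`.
# It only acts if the node is still registered in configfs, and it is a dry
# run unless "--apply" is passed. CONFIGFS_ROOT is a hypothetical override
# used for testing; it defaults to the live configfs path.

# $1 = cluster name, $2 = node name, $3 = "--apply" to really run the command.
o2cb_remove_stale_node() {
  local cluster=$1 node=$2 mode=${3:-}
  local root=${CONFIGFS_ROOT:-/sys/kernel/config/cluster}
  if [ ! -d "$root/$cluster/node/$node" ]; then
    echo "$node is not registered in cluster $cluster; nothing to do"
    return 0
  fi
  if [ "$mode" = "--apply" ]; then
    o2cb remove-node "$cluster" "$node"
  else
    echo "would run: o2cb remove-node $cluster $node"
  fi
}

# Usage on ovm-node01: review the dry run first, then apply.
#   o2cb_remove_stale_node f6f6b47b38e288e0 ovm-node02
#   o2cb_remove_stale_node f6f6b47b38e288e0 ovm-node02 --apply
```

Defaulting to a dry run is a deliberate choice here: the printed command can be compared against the cluster name and node name from the earlier checks before anything is changed.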

Validate node entries

After removing node02, we can see only one node entry in the cluster configuration.


[root@ovm-node01]# ls /sys/kernel/config/cluster/f6f6b47b38e288e0/node/
ovm-node02
[root@ovm-node01]#

Validate using O2CB

First, restart the ovs-agent on both nodes and validate the o2cb cluster status from node01.


[root@ovm-node01]# service ovs-agent restart
Stopping Oracle VM Agent:                                  [  OK  ]
Starting Oracle VM Agent:                                  [  OK  ]

[root@ovm-node01 ~]# service ovs-agent status
log server (pid 32442) is running...
notificationserver server (pid 32458) is running...
remaster server (pid 32464) is running...
monitor server (pid 32466) is running...
ha server (pid 32468) is running...
stats server (pid 32470) is running...
xmlrpc server (pid 32474) is running...
fsstats server (pid 32476) is running...
apparentsize server (pid 32477) is running...
[root@ovm-node01 ~]#
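Rather than reading the status list by eye on each node, the restart can be followed by a small check. This sketch assumes every healthy line ends in "is running..." as shown above; the function name is hypothetical, and because it does not know how many agent servers your OVM release should run, it only flags lines that do not report running.

```shell
#!/usr/bin/env bash
# Sketch: after `service ovs-agent restart`, confirm that every Oracle VM
# Agent server reported by `service ovs-agent status` (read on stdin) is up.

ovs_agent_check() {
  local status running problems
  status=$(cat)
  # Count lines that report a running server.
  running=$(printf '%s\n' "$status" | grep -c 'is running\.\.\.$')
  # Collect every non-empty line that does not report a running server.
  problems=$(printf '%s\n' "$status" | grep -v -e 'is running\.\.\.$' -e '^[[:space:]]*$')
  if [ "$running" -eq 0 ] || [ -n "$problems" ]; then
    echo "ovs-agent check failed; running=$running" >&2
    [ -n "$problems" ] && printf '%s\n' "$problems" >&2
    return 1
  fi
  echo "OK: $running ovs-agent servers running"
}

# Usage on each node after the restart:
#   service ovs-agent status | ovs_agent_check
```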


I would also recommend restarting node02 after the node removal. Once the node is back online, validate /etc/ocfs2/cluster.conf:


 
 [root@ovm-node01 ~]# cat /etc/ocfs2/cluster.conf
cluster:
        heartbeat_mode = global
        node_count = 1
        name = f6f6b47b38e288e0

node:
        number = 0
        cluster = f6f6b47b38e288e0
        ip_port = 7777
        ip_address = 10.110.110.101
        name = ovm-node01

heartbeat:
        cluster = f6f6b47b38e288e0
        region = 0004FB0000050000B705B4397850AAD6
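A quick consistency check on cluster.conf compares the declared node_count with the number of node: stanzas and lists the node names. This awk-based sketch assumes the indented key = value layout shown above; the function name is hypothetical, and it is a read-only check, not a replacement for the OVM tools that manage the file.

```shell
#!/usr/bin/env bash
# Sketch: summarise /etc/ocfs2/cluster.conf and flag a mismatch between the
# declared node_count and the number of node: stanzas actually present.

# $1 = path to cluster.conf (defaults to /etc/ocfs2/cluster.conf).
# Prints "node_count=N stanzas=M nodes=name1 name2 ..." and returns 1 if N != M.
ocfs2_conf_summary() {
  awk '
    # Section headers such as "cluster:", "node:", "heartbeat:" start at column 0.
    /^node:/   { in_node = 1; stanzas++; next }
    /^[a-z]+:/ { in_node = 0; next }
    /^[ \t]*node_count[ \t]*=/ {
      split($0, kv, "="); gsub(/[ \t]/, "", kv[2]); declared = kv[2]
    }
    # Only "name" keys inside a node: stanza are node names.
    in_node && /^[ \t]*name[ \t]*=/ {
      split($0, kv, "="); gsub(/[ \t]/, "", kv[2]); names = names " " kv[2]
    }
    END {
      printf "node_count=%s stanzas=%d nodes=%s\n", declared, stanzas, substr(names, 2)
      if (declared + 0 != stanzas) exit 1
    }
  ' "${1:-/etc/ocfs2/cluster.conf}"
}

# Usage: ocfs2_conf_summary || echo "cluster.conf looks inconsistent"
```

For the file above, the summary should read node_count=1 stanzas=1 nodes=ovm-node01.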
 
 

Note: Restarting ovs-agent does not affect running VMs.


The OVM agent database on the master node now lists only one IP address in pool_member_ip_list:

[root@ovm-node01]# ovs-agent-db dump_db server
{'cluster_state': 'DLM_Ready',
 'clustered': True,
 'fs_stat_uuid_list': ['0004fb000005000015c1fb14ef761f40',
                       '0004fb000005000079ae03177c3edc7e',
                       '0004fb000005000065985109f8834e8b'],
 'is_master': True,
 'manager_event_url': 'https://192.168.85.152:7002/ovm/core/wsapi/rest/internal/Server/08:00:20:ff:ff:ff:ff:ff:ff:ff:00:10:e0:ef:de:6a/Event',
 'manager_ip': '192.168.85.152',
 'manager_statistic_url': 'https://192.168.85.152:7002/ovm/core/wsapi/rest/internal/Server/08:00:20:ff:ff:ff:ff:ff:ff:ff:00:10:e0:ef:de:6a/Statistic',
 'manager_uuid': '0004fb0000010000c8ecbd219dc6b1ee',
 'node_number': 0,
 'pool_alias': 'EclipsysOVM',
 'pool_master_ip': '192.168.85.177',
 'pool_member_ip_list': ['192.168.85.177'],
 'pool_uuid': '0004fb0000020000f6f6b47b38e288e0',
 'poolfs_nfsbase_uuid': '',
 'poolfs_target': '/dev/mapper/36861a6fddaa0481ec0dd3584514a8d62',
 'poolfs_type': 'lun',
 'poolfs_uuid': '0004fb0000050000b705b4397850aad6',
 'registered_hostname': 'ovm-node01',
 'registered_ip': '192.168.85.177',
 'roles': set(['utility', 'xen'])}
[root@ovm-node01]#
  
  

Conclusion

There can be situations where the GUI does not remove the entries from the OVM hypervisor. Always validate the OVM database and cluster entries before retrying the node addition to the cluster, and make sure the cluster-shared repositories mount automatically.
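For the last point, mounted repositories can be checked from the mount table. This is a minimal sketch: it assumes the repositories are OCFS2 filesystems listed in /proc/mounts and that you know which mount point to expect (on OVM servers this is typically under /OVS/Repositories/, but confirm the path on your own servers). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that a repository is mounted as OCFS2 after the node rejoins.
# The mount-table path is a parameter (default /proc/mounts) so the check can
# be tried on saved output. The repository mount point you pass in, such as
# /OVS/Repositories/<repository-uuid>, is an assumption about your layout.

# $1 = mount table file. Prints "mountpoint device" for each ocfs2 mount.
ovm_ocfs2_mounts() {
  awk '$3 == "ocfs2" { print $2, $1 }' "${1:-/proc/mounts}"
}

# $1 = expected mount point, $2 = optional mount table file.
ovm_check_repo_mounted() {
  local want=$1
  if ovm_ocfs2_mounts "${2:-/proc/mounts}" |
       awk -v w="$want" '$1 == w { found = 1 } END { exit !found }'; then
    echo "mounted: $want"
  else
    echo "NOT mounted: $want" >&2
    return 1
  fi
}

# Usage after a reboot of the node:
#   ovm_check_repo_mounted /OVS/Repositories/<repository-uuid>
```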
