Wednesday, January 26, 2022

ODA upgrade issues - 12.1.2.6 to 12.1.2.10

ODA (Oracle Database Appliance) upgrades are really challenging. There is a high possibility that a server upgrade will give you hiccups during the grid component upgrade, mostly because the oakcli orchestration tool completes certain checks sequentially before moving on to the patching stage. Having a clear understanding of each and every step gives you the confidence to troubleshoot and overcome complex issues.

Patience is really important here. Once you reboot the oda_base nodes to take backups, mounting the repos on dom0 takes close to 15-20 minutes. During that time you can validate whether the exportfs cluster settings are in order.
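While you wait, the exports can also be checked at the NFS level. This is a minimal sketch; 192.168.18.21 is the havip_3 address from the srvctl output further below, so substitute your own addresses.

# On the oda_base node: list what is currently being exported
exportfs -v

# From dom0: confirm the repo export is visible over the HAVIP
# (192.168.18.21 is the havip_3 address shown later in this post)
showmount -e 192.168.18.21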

To verify the correctness of the exportfs configuration on the cluster side, the commands noted below can be executed.

First, check the crsctl havip dependencies. If the dependencies are up and running, it will take close to 15 minutes for the repo to show up on dom0.



[root@ecl-odabase-0 ~]# crsctl status res ora.havip_3.havip -p | grep -i havip
NAME=ora.havip_3.havip
TYPE=ora.havip.type
HAVIP_DESCRIPTION=HAVIP_NODE1
HAVIP_ID=havip_3
START_DEPENDENCIES=hard(ora.net3.network,uniform:type:ora.havip_3.export.type) attraction(ora.data.kali_test.acfs,ora.data.kali_test.acfs,ora.data.qualys.acfs,ora.data.vmdata.acfs,ora.reco.vmsdev.acfs) dispersion:active(type:ora.havip.type) pullup(ora.net3.network) pullup:always(type:ora.havip_3.export.type)
STOP_DEPENDENCIES=hard(intermediate:ora.net3.network,uniform:intermediate:type:ora.havip_3.export.type)
[root@ecl-odabase-0 ~]#

After checking the dependencies, check the havip status.



[root@ecl-odabase-0 ~]# crsctl status res -t | grep -A 2 -i havip
ora.havip_3.havip
      1        ONLINE  ONLINE       ecl-odabase-0            STABLE
ora.havip_4.havip
      1        ONLINE  ONLINE       ecl-odabase-1            STABLE


Execute the srvctl config and status commands to check that the havips are healthy.



srvctl config havip
srvctl status havip

Expected output



[root@ecl-odabase-0 ~]# srvctl config havip
HAVIP exists: /havip_3/192.168.18.21, network number 3
Description: HAVIP_NODE1
Home Node: ecl-odabase-0
HAVIP is enabled.
HAVIP is individually enabled on nodes:
HAVIP is individually disabled on nodes: ecl-odabase-1
HAVIP exists: /havip_4/192.168.19.21, network number 4
Description: HAVIP_NODE2
Home Node: ecl-odabase-1
HAVIP is enabled.
HAVIP is individually enabled on nodes:
HAVIP is individually disabled on nodes: ecl-odabase-0

[root@ecl-odabase-0 ~]# srvctl status havip
HAVIP ora.havip_3.havip is enabled
HAVIP ora.havip_3.havip is individually disabled on nodes ecl-odabase-1
HAVIP ora.havip_3.havip is running on nodes ecl-odabase-0
HAVIP ora.havip_4.havip is enabled
HAVIP ora.havip_4.havip is individually disabled on nodes ecl-odabase-0
HAVIP ora.havip_4.havip is running on nodes ecl-odabase-1
[root@ecl-odabase-0 ~]#
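Since each HAVIP comes online together with its exportfs resources, it is worth confirming those as well. A minimal sketch using the same srvctl tooling:

-- clusterware-managed NFS exports backing the repos
srvctl config exportfs
srvctl status exportfs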



These are the checks triggered via the oakcli orchestration tool.

1. Check whether the VMs are up and running; if so, oakcli will shut down the VMs.

2. Check whether the repos are up and running; if so, oakcli will shut down the repos.

3. Check that the grid is up and running on both nodes; since patching happens in a rolling fashion, oakcli will shut down the grid on the node being patched. The same checks can be run manually, as sketched below.
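Running these checks yourself before kicking off the patch makes it easier to see what oakcli is about to act on. A minimal sketch, run as root on oda_base:

-- 1. VM status (these will be shut down by oakcli)
oakcli show vm

-- 2. Repo status (these will be shut down as well)
oakcli show repo

-- 3. Clusterware status on both nodes
crsctl check cluster -all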


There are certain checks that need to be completed by oakcli before moving on to the patching stage. The chances of hitting errors while upgrading the grid, or of a repo becoming invisible from dom0, are high. Repo issues mostly occur due to missing configuration on the cluster side. I went through a roller-coaster ride while upgrading an ODA from 12.1.2.6 to 12.1.2.10.

12.1.2.6 to 12.1.2.10 is a minor upgrade, but we still encountered major issues with the repo and grid components. In this article, I will illustrate how to overcome this major grid issue.

On completion of the node 2 upgrade, we had to deal with a repo visibility issue from dom0. After consulting Oracle, we rebooted the oda_base nodes via ILOM. Once the reboot completed, ASM had exited rolling patch mode; the cluster had been in rolling patch mode because node02 had completed its grid patching while node01 had not.

This is the Oracle documentation we used to start ASM in rolling patch mode:

https://docs.oracle.com/database/121/OSTMG/GUID-CF124D9F-FF28-4B71-B179-640F1258E98A.htm#OSTMG95330



SELECT SYS_CONTEXT('SYS_CLUSTER_PROPERTIES', 'CLUSTER_STATE') FROM DUAL;

SELECT SYS_CONTEXT('SYS_CLUSTER_PROPERTIES', 'CURRENT_PATCHLVL') FROM DUAL;
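These queries run against the ASM instance. A minimal sketch of opening that session on node 1 (the instance name +ASM1 is an assumption; adjust it for your environment):

-- as the grid owner on node 1
export ORACLE_HOME=/u01/app/12.1.0.2/grid
export ORACLE_SID=+ASM1
$ORACLE_HOME/bin/sqlplus / as sysasm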

Sample output, where the cluster is in a normal state:



SQL> show parameter db_uni

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_unique_name                       string      +ASM
SQL> SELECT SYS_CONTEXT('SYS_CLUSTER_PROPERTIES', 'CLUSTER_STATE') FROM DUAL;

SYS_CONTEXT('SYS_CLUSTER_PROPERTIES','CLUSTER_STATE')
--------------------------------------------------------------------------------
Normal

SQL> SELECT SYS_CONTEXT('SYS_CLUSTER_PROPERTIES', 'CURRENT_PATCHLVL') FROM DUAL;

SYS_CONTEXT('SYS_CLUSTER_PROPERTIES','CURRENT_PATCHLVL')
--------------------------------------------------------------------------------
221230242

SQL>

First, try to start ASM in rolling patch mode:


ALTER SYSTEM START ROLLING PATCH;

Solution: if you cannot put ASM into rolling patch mode.

In our scenario, we were not able to put ASM into rolling patch mode. The only option left was to compare the patches and apply the missing patches on node01. There is a simpler way to compare the patches on both nodes than running the opatch lsinventory command: kfod.



[grid@piodadb0 ~]$ kfod op=patches
---------------
List of Patches
===============
19769480
20299023
20831110
21359755
21436941
21948344
21948354
22266052
[grid@piodadb0 ~]$ kfod op=patchlvl
-------------------
Current Patch level
===================
964513507
[grid@piodadb0 ~]$


-- Node02 - fully patched; a few patches are flagged as missing below because they do not even appear in opatch lsinventory.
[grid@piodadb1 ~]$ kfod op=patches
---------------
List of Patches
===============
19769480
20299023
20831110
21359755
21436941
21948344 -- Missing
21948354
22266052 -- Missing 
22291127
23054246
24006101
24732082
24828633
24828643

-- get the patch level
[grid@piodadb1 ~]$ kfod op=patchlvl
-------------------
Current Patch level
===================
2648582804

To make the grid patch level consistent, we installed all the missing grid patches on node01 manually.

Apply grid patches manually.

All the required commands are in the patching readme file; I have copied the sample commands here to make it easier.


-- Patch location
/opt/oracle/oak/pkgrepos/orapkgs/DB/12.1.0.2.170117/Patches/24732082

-- sample commands 
$ <GI_HOME>/OPatch/opatch apply -oh <GI_HOME> -local <UNZIPPED_PATCH_LOCATION>/%BUGNO%/<OCW TRACKING BUG>
$ <GI_HOME>/OPatch/opatch apply -oh <GI_HOME> -local <UNZIPPED_PATCH_LOCATION>/%BUGNO%/<ACFS TRACKING BUG>
$ <GI_HOME>/OPatch/opatch apply -oh <GI_HOME> -local <UNZIPPED_PATCH_LOCATION>/%BUGNO%/<DBWLM TRACKING BUG>
$ <GI_HOME>/OPatch/opatch apply -oh <GI_HOME> -local <UNZIPPED_PATCH_LOCATION>/%BUGNO%/<RDBMS PSU TRACKING BUG>

-- actual commands
export ORACLE_HOME=/u01/app/12.1.0.2/grid
$ORACLE_HOME/crs/install/rootcrs.sh -prepatch 

Note: if needed, unlock the grid home manually
$grid_home/crs/install/rootcrs.sh -unlock

$ /u01/app/12.1.0.2/grid/OPatch/opatch apply -oh /u01/app/12.1.0.2/grid -local /u01/12.1.0.2.170117_Patch/24917825/24828633 #OCW
$ /u01/app/12.1.0.2/grid/OPatch/opatch apply -oh /u01/app/12.1.0.2/grid -local /u01/12.1.0.2.170117_Patch/24917825/24828643 #ACFS
$ /u01/app/12.1.0.2/grid/OPatch/opatch apply -oh /u01/app/12.1.0.2/grid -local /u01/12.1.0.2.170117_Patch/24917825/21436941 #DBWLM 
$ /u01/app/12.1.0.2/grid/OPatch/opatch apply -oh /u01/app/12.1.0.2/grid -local /u01/12.1.0.2.170117_Patch/24917825/24732082 #Database Bundle

## Post patching

export ORACLE_HOME=/u01/app/12.1.0.2/grid
$ORACLE_HOME/rdbms/install/rootadd_rdbms.sh
$ORACLE_HOME/crs/install/rootcrs.sh -postpatch
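After the postpatch step, a quick sanity check that the stack came back cleanly (a short sketch):

-- verify the CRS stack is healthy on this node
crsctl check crs

-- confirm the resources are back online
crsctl status res -t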


On completion of the manual grid patching, I found that the grid cluster still showed a different patch level on each node. First, make sure to run opatch lsinventory and understand the patches installed in the grid home. We can also use the kfod op=patchlvl command to verify the grid patch level, and execute kfod op=patches to verify the installed patches.

As a practice, I used opatch lspatches to validate the PSUs applied from the quarterly patch bundle.




-- run this on both nodes
kfod op=patches
-- grid patching level
kfod op=patchlvl
[grid@piodadb0 ~]$ /u01/app/12.1.0.2/grid/OPatch/opatch lspatches
24732082;Database Patch Set Update : 12.1.0.2.170117 (24732082)
24828643;ACFS Patch Set Update : 12.1.0.2.170117 (24828643)
24828633;OCW Patch Set Update : 12.1.0.2.170117 (24828633)
21436941;WLM Patch Set Update: 12.1.0.2.5 (21436941)

OPatch succeeded.
[grid@piodadb0 ~]$

Also, it's better to check the applied date as verification.


export GI=/u01/app/12.1.0.2/grid
/u01/app/12.1.0.2/grid/OPatch/opatch lsinventory -oh $GI | grep ^Patch

[grid@piodadb0 ~]$ /u01/app/12.1.0.2/grid/OPatch/opatch lsinventory -oh $GI | grep ^Patch
Patch  24732082     : applied on Sun Dec 05 12:56:58 EST 2021
Patch description:  "Database Patch Set Update : 12.1.0.2.170117 (24732082)"
Patch  24828643     : applied on Sun Dec 05 12:54:57 EST 2021
Patch description:  "ACFS Patch Set Update :  12.1.0.2.170117 (24828643)"
Patch  24828633     : applied on Sun Dec 05 12:51:01 EST 2021
Patch description:  "OCW Patch Set Update : 12.1.0.2.170117 (24828633)"
Patch  21436941     : applied on Wed Oct 21 22:30:48 EDT 2015
Patch description:  "WLM Patch Set Update: 12.1.0.2.5 (21436941)"
Patch level status of Cluster nodes :
[grid@piodadb0 ~]$

We found that some patches were still registered at the cluster level even though they were not present at the binary level. First, consult an Oracle engineer about this and get confirmation before removing these patch entries on the existing node. Since these patches were never installed at the binary level, we had to remove their entries from the grid registration, using the command mentioned below.


$grid_home/bin/patchgen commit -rb <missing-patch>
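For example, committing the rollback entries for the two patches flagged as missing in the node02 kfod listing above would look like the sketch below; run it only after Oracle confirms the patch list.

-- grid home as used earlier in this post
grid_home=/u01/app/12.1.0.2/grid

-- the two entries flagged "-- Missing" in the kfod output above
$grid_home/bin/patchgen commit -rb 21948344
$grid_home/bin/patchgen commit -rb 22266052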

Once the patch entries have been removed at the binary level, we need to run the patch command to lock the grid. Once the patch command completes, verify the cluster patch level and the installed patches on both nodes.



$grid_home/crs/install/rootcrs.sh -patch
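Once rootcrs.sh -patch completes, a short sketch to confirm both nodes agree:

-- cluster active patch level (should now match on both nodes)
crsctl query crs activeversion -f

-- patch list and patch level as seen by ASM (run on both nodes)
kfod op=patches
kfod op=patchlvl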

Conclusion

Before proceeding with patching, always create a proactive SR with Oracle, as virtualized ODA upgrades are tricky.
