Tuesday, September 14, 2021

Virtualized ODA upgrade 18.3.0.0 to 18.8.0.0 - Journey to 12.1.2.12 - 19.13.0.0 - Part 2

 

ODA upgrade 18.3.0.0 to 18.8.0.0





I hope the previous blog was useful for patching the ODA from 12.1.2.12 to 18.3.0.0. 

Our plan is to upgrade an ODA running 12.1.2.12 to 19.13.0.0. Before moving to 19.13.0.0, we need to upgrade the ODA to 18.8.0.0. (The full plan is in ODA upgrade 12.1.2.12 to 18.3.0.0 -- Journey to 19.13.0.0 - Part 1.)

This blog elaborates on the steps taken to patch a virtualized ODA from 18.3.0.0 to 18.8.0.0. On the virtualized ODA platform (X5), the upgrade commands are orchestrated by the oakcli utility.

The previous patching upgraded the grid from 12c to 18c, which was the major upgrade. This 18.8.0.0 patching applies the grid PSU on top of the current 18c grid binaries, and a few storage patches are included as well.

This article is focused on 18.8.0.0 upgrade for X5 hardware platform.

Find the hardware version:


[root@ecl-odabase-0 delshare]# oakcli show env_hw
VM-ODA_BASE ODA X5-2
[root@ecl-odabase-0 delshare]#



Patching plan :

12.1.2.12 - >  18.3.0.0 - complete
18.3.0.0  - >  18.8.0.0 - In - progress
18.8.0.0  - >  19.8.0.0 -
19.8.0.0  - >  19.9.0.0 -
For an overview, please find the patching sequence below. 

Also make sure to run oakcli show disk to validate the disk status; if there are any disk failures, address them before patching.

###########  Patching sequence

1. computenodes -ODA_BASE
2. storage
3. database - create new 18.8.0.0 home and move database to 18.8 or 
           upgrade oracle database with latest psu that comes with 18.8 bundle

############ Pre-check: disk status 
oakcli show disk
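The disk pre-check above can be automated by filtering the "oakcli show disk" output for any disk that is not ONLINE. A minimal sketch; the sample rows below are hypothetical (illustrative names and states, not from this system), and on a real node you would pipe the live command output through the same awk filter.

```shell
#!/bin/sh
# Sketch: flag any disk whose STATE column is not ONLINE.
# Hypothetical sample in the "oakcli show disk" layout; on a real ODA_BASE
# node, feed the live command output into the same filter instead.
sample_output='NAME            PATH            TYPE    STATE           STATE_DETAILS
e0_pd_00        /dev/sda        HDD     ONLINE          Good
e0_pd_01        /dev/sdb        HDD     ONLINE          Good
e0_pd_02        /dev/sdc        HDD     FAILED          PredictiveFail'

# Skip the header row, then print the name of every non-ONLINE disk.
failed=$(echo "$sample_output" | awk 'NR > 1 && $4 != "ONLINE" { print $1 }')
if [ -n "$failed" ]; then
    echo "address these disks before patching: $failed"
else
    echo "all disks ONLINE"
fi
```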

1. Preparation

1.1 Space requirement



First of all, we need to ensure there is enough space on / (the root mount point), /u01, and /opt. At least 20 GB should be available on each. If not, we can do some cleanup or extend the LVM partitions to gain space.

df -h / /u01 /opt

[root@ecl-odabase-0 18.8.0.0]# df -h / /opt /u01
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2       55G   33G   20G  63% /
/dev/xvda2       55G   33G   20G  63% /
/dev/xvdb1       92G   61G   27G  70% /u01
[root@ecl-odabase-0 18.8.0.0]#
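The 20 GB requirement can be checked in one pass with a small script. This is a generic sketch using df -P (portable POSIX output), not an ODA-specific tool; the threshold comes from the requirement above.

```shell
#!/bin/sh
# Check that each filesystem used by the patching has at least 20 GB free.
REQUIRED_KB=$((20 * 1024 * 1024))   # 20 GB expressed in 1K blocks

check_space() {
    mp="$1"
    # df -P line 2 columns: Filesystem 1024-blocks Used Available Capacity Mounted-on
    avail_kb=$(df -P "$mp" | awk 'NR == 2 { print $4 }')
    if [ "$avail_kb" -ge "$REQUIRED_KB" ]; then
        echo "OK: $mp (${avail_kb} KB free)"
    else
        echo "WARNING: $mp has only ${avail_kb} KB free"
    fi
}

for mp in / /u01 /opt; do
    check_space "$mp"
done
```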

Download the Oracle Database Appliance Server Patch for OAK Stack and Virtualized Platforms (patch 30518438).

Stage the patches under the /u01 mount point and unpack the binaries.

# oakcli unpack -package /tmp/p30518438_188000_Linux-x86-64_1of2.zip
# oakcli unpack -package /tmp/p30518438_188000_Linux-x86-64_2of2.zip

/u01/PATCH
# oakcli unpack -package /u01/PATCH/18.8.0.0/p30518438_188000_Linux-x86-64_1of2.zip
# oakcli unpack -package /u01/PATCH/18.8.0.0/p30518438_188000_Linux-x86-64_2of2.zip
Expected output after unpacking:

[root@ecl-odabase-0 18.8.0.0]# oakcli unpack -package /u01/PATCH/18.8.0.0/p30518438_188000_Linux-x86-64_1of2.zip
Unpacking will take some time,  Please wait...
Successfully unpacked the files to repository.

[root@ecl-odabase-0 18.8.0.0]# oakcli unpack -package /u01/PATCH/18.8.0.0/p30518438_188000_Linux-x86-64_2of2.zip
Unpacking will take some time,  Please wait...
Successfully unpacked the files to repository.
[root@ecl-odabase-0 18.8.0.0]#
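Both halves of the bundle can be unpacked in one loop. This is a sketch: the DRY_RUN guard keeps it from calling oakcli outside an ODA, and the staging path is the one used above.

```shell
#!/bin/sh
# Unpack both pieces of patch 30518438 into the oakcli repository.
PATCH_DIR=/u01/PATCH/18.8.0.0
DRY_RUN=1   # set to 0 on a real ODA_BASE node

for part in 1 2; do
    zip="$PATCH_DIR/p30518438_188000_Linux-x86-64_${part}of2.zip"
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: oakcli unpack -package $zip"
    else
        oakcli unpack -package "$zip" || { echo "unpack failed: $zip" >&2; exit 1; }
    fi
done
```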

Once the unpacking completes, verify the patch repository.

############ verify repository against the 18.8.0.0 bundle patches 
oakcli update -patch 18.8.0.0.0 --verify

1.2 Backup ODA Base 


An ODA_BASE backup can be taken from Dom0. Also take a full database backup and VM backups before performing this upgrade activity.


- Take a level-0 backup of the running databases
- Back up the running VMs
- Back up ODA_BASE (DomU) from Dom0
2. Pre-Patching Steps

Before running the patching commands, always check the pre-patching reports for the OS and components. If there are any major issues, work with Oracle Support to address them before patching.

2.1 OS pre-validation steps  

Use the below-mentioned command to validate readiness for the OS patch. Run it from both nodes.
########## Validate ospatch
oakcli validate -c ospatch -ver 18.8.0.0.0
The following command validates the ODA components.

########## Validate from first node

oakcli validate -a

3. Patching 

 We will follow the below-mentioned patching sequence: first the compute nodes (ODA_BASE), then storage, and finally the database PSU.

###########  Patching sequence
computenodes -ODA_BASE
storage
database - create new 18.8.0.0 home and move database to 18.8 or 
           upgrade oracle database with latest psu that comes with 18.8 bundle
First, note down the running VMs and the repo details. Use the below command to record the running repos.

[root@ecl-odabase-0 18.3.0.0]# oakcli show repo



          NAME                          TYPE            NODENUM  FREE SPACE     STATE           SIZE


          kali_test                     shared          0               N/A     OFFLINE         N/A

          kali_test                     shared          1               N/A     OFFLINE         N/A

          odarepo1                      local           0               N/A     N/A             N/A

          odarepo2                      local           1               N/A     N/A             N/A

          qualys                        shared          0               N/A     OFFLINE         N/A

          qualys                        shared          1               N/A     OFFLINE         N/A

          vmdata                        shared          0               N/A     OFFLINE         N/A

          vmdata                        shared          1            99.99%     ONLINE          4068352.0M

          vmsdev                        shared          0               N/A     OFFLINE         N/A

          vmsdev                        shared          1               N/A     UNKNOWN         N/A
          
Use the below-mentioned command to note down the running VMs.

[root@ecl-odabase-1 PATCH]# oakcli show vm

          NAME                                  NODENUM         MEMORY          VCPU            STATE           REPOSITORY

        kali_server                             0               4196M              2            UNKNOWN         kali_test
        qualyssrv                               0               4196M              2  
Note: in 18.8 there is a bug related to TFA:

TFA should be stopped manually on both nodes before starting the patching process.

/etc/init.d/init.tfa stop
expected output (TFA):

[root@ecl-odabase-0 18.8.0.0]# /etc/init.d/init.tfa stop
Stopping TFA from init for shutdown/reboot
oracle-tfa stop/waiting
WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.
TFAmain Force Stopped Successfully : status mismatch
TFA Stopped Successfully
Killing TFA running with pid 19343
. . .
Successfully stopped TFA..
[root@ecl-odabase-0 18.8.0.0]#

[root@ecl-odabase-1 ~]# /etc/init.d/init.tfa stop
Stopping TFA from init for shutdown/reboot
oracle-tfa stop/waiting
WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.
TFA-00104 Cannot establish connection with TFA Server. Please check TFA Certificates
Killing TFA running with pid 10681
. . .
Successfully stopped TFA..
[root@ecl-odabase-1 ~]#
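Stopping TFA on both nodes can be scripted from one node. A sketch: the node names are this environment's, the DRY_RUN guard avoids touching a live system, and the ssh call assumes the passwordless root ssh that oakcli patching itself relies on between ODA_BASE nodes.

```shell
#!/bin/sh
# Stop TFA on both ODA_BASE nodes before patching (18.8 TFA bug workaround).
DRY_RUN=1   # set to 0 to actually stop TFA

for node in ecl-odabase-0 ecl-odabase-1; do
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run on $node: /etc/init.d/init.tfa stop"
    else
        ssh -l root "$node" /etc/init.d/init.tfa stop
    fi
done
```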

3.1 Patching ODA Base servers.


Run the below-mentioned patching command in a screen terminal so we do not need to panic about connection drops. If the connection is interrupted during the patching window, we can re-attach the screen using screen -r.

screen
screen -ls -- screen terminal verification
script /tmp/odabase_upgrade_18800_19082021.txt - record all the steps 
/opt/oracle/oak/bin/oakcli update -patch 18.8.0.0.0 --server

3.2 Troubleshooting server patching issues

Patching failed due to a grid pre-check failure. 
This is the error displayed in the terminal window:



 TFA-00002 Oracle Trace File Analyzer (TFA) is not running
 TFA-00002 Oracle Trace File Analyzer (TFA) is not running
 TFA-00002 Oracle Trace File Analyzer (TFA) is not running
 stop: Unknown instance:
 TFA-00002 Oracle Trace File Analyzer (TFA) is not running
 SUCCESS: 2021-08-17 15:14:55: Successfully update AHF rpm.
 INFO: 2021-08-17 15:14:55: ------------------Patching Grid-------------------------
 INFO: 2021-08-17 15:14:57: Clusterware is not running on local node
 INFO: 2021-08-17 15:14:57: Attempting to start clusterware and its resources on local
 node
 INFO: 2021-08-17 15:16:16: Successfully started the clusterware on local node

 INFO: 2021-08-17 15:16:16: Checking for available free space on /, /tmp, /u01
 INFO: 2021-08-17 15:16:16: Shutting down Clusterware and CRS on local node.
 INFO: 2021-08-17 15:16:16: Clusterware is running on local node
 INFO: 2021-08-17 15:16:16: Attempting to stop clusterware and its resources locally
 SUCCESS: 2021-08-17 15:17:18: Successfully stopped the clusterware on local node

 INFO: 2021-08-17 15:17:18: Shutting down CRS on the node...
 SUCCESS: 2021-08-17 15:17:21: Successfully stopped CRS processes on the node
 INFO: 2021-08-17 15:17:21: Checking for running CRS processes on the node.
 INFO: 2021-08-17 15:17:43: Starting up CRS and Clusterware on the node
 INFO: 2021-08-17 15:17:43: Starting up CRS on the node...
 SUCCESS: 2021-08-17 15:20:49: CRS has started on the node

 INFO: 2021-08-17 15:20:51: Running cluvfy to correct cluster state
 ERROR: 2021-08-17 15:24:49: Clusterware state is not NORMAL.
 ERROR: 2021-08-17 15:24:49: Failed to patch server (grid) component

error at Command = /usr/bin/ssh -l root ecl-odabase-1 /opt/oracle/oak/pkgrepos/System/18.8.0.0.0/bin/PatchDriver -tag 20210817140429 -server -version 18.8.0.0.0> and errnum=
ERROR  : Command = /usr/bin/ssh -l root ecl-odabase-1 /opt/oracle/oak/pkgrepos/System/18.8.0.0.0/bin/PatchDriver -tag 20210817140429 -server -version 18.8.0.0.0 did not complete successfully. Exit code 1 #Step -1#
Exiting...
ERROR: Unable to apply the patch 
ODA patching and other logs are under the /opt mount point. The patching log location is /opt/oracle/oak/log/ecl-odabase-0/patch/18.8.0.0.0

ecl-odabase-1: PRVG-11368 : A SCAN is recommended to resolve to "3" or more IP
                addresses, but SCAN "ecl-oda-scan" resolves to only
                "/10.11.30.48,/10.11.30.49"

 ecl-odabase-0: PRVG-11368 : A SCAN is recommended to resolve to "3" or more IP
                addresses, but SCAN "ecl-oda-scan" resolves to only
                "/10.11.30.48,/10.11.30.49"

   Verifying DNS/NIS name service 'ecl-oda-scan' ...FAILED
   PRVG-1101 : SCAN name "ecl-oda-scan" failed to resolve

 Verifying Clock Synchronization ...FAILED
   Verifying Network Time Protocol (NTP) ...FAILED
     Verifying NTP daemon is synchronized with at least one external time source
     ...FAILED
     ecl-odabase-1: PRVG-13602 : NTP daemon is not synchronized with any
                    external time source on node "ecl-odabase-1".

     ecl-odabase-0: PRVG-13602 : NTP daemon is not synchronized with any
                    external time source on node "ecl-odabase-0".


 CVU operation performed:      stage -post crsinst
 Date:                         Aug 17, 2021 3:20:55 PM
 CVU home:                     /u01/app/18.0.0.0/grid/
 User:                         grid

2021-08-17 15:24:49: Executing cmd: /u01/app/18.0.0.0/grid/bin/crsctl query crs activeversion -f
2021-08-17 15:24:49: Command output:
>  Oracle Clusterware active version on the cluster is [18.0.0.0.0]. The cluster upgrade state is [UPGRADE FINAL]. The cluster active patch level is [3769208751]. ,
>End Command output
2021-08-17 15:24:49: ERROR:  Clusterware state is not NORMAL.

We followed: How to resolve the cluster upgrade state of [UPGRADE FINAL] after successfully upgrading Grid Infrastructure (GI) to 18c or higher (Doc ID 2583141.1)

The solution to address the grid pre-patching issues:

######### Solution

1. Issue "/u01/app/18.0.0.0/grid/bin/cluvfy stage -post crsinst -gi_upgrade -n all"
2. Fix the critical errors that above command reports
3. Rerun "/u01/app/18.0.0.0/grid/bin/cluvfy stage -post crsinst -collect cluster -gi_upgrade -n all"
4. Issue "/u01/app/18.0.0.0/grid/bin/crsctl query crs activeversion -f" and confirm that the cluster upgrade state is [NORMAL].
5. If the above command still reports that the cluster upgrade state is [UPGRADE FINAL], repeat steps 1 to 3 and fix all critical errors.

Three issues had to be addressed in this scenario:

  1.  NTP 
  2.  DNS 
  3.  only two SCAN addresses configured

3.2.1 Resolution for NTP

In this environment we do not have an NTP server, so the only option is to use the cluster's built-in time synchronization. To use the built-in cluster time feature, we need to move the NTP configuration files aside and restart the cluster.


mv /etc/ntp.conf /etc/ntp.conf.ori
rm /var/run/ntpd.pid
crsctl start crs
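The same steps, made slightly defensive (a sketch; run as root on each node; the NTP_DIR variable is only there so the rename logic can be rehearsed outside /etc):

```shell
#!/bin/sh
# Move the NTP config aside so Clusterware CTSS takes over time sync in
# active mode. NTP_DIR defaults to /etc; override it to rehearse safely.
NTP_DIR="${NTP_DIR:-/etc}"

if [ -f "$NTP_DIR/ntp.conf" ]; then
    mv "$NTP_DIR/ntp.conf" "$NTP_DIR/ntp.conf.ori"
    echo "moved ntp.conf aside"
else
    echo "ntp.conf already absent"
fi
# Remove a stale ntpd pid file if present (-f makes this a no-op otherwise).
rm -f /var/run/ntpd.pid
# then restart the stack on the node: crsctl start crs
```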

3.2.2 Resolution for DNS issue 

 In this scenario, we do not have a DNS server, so the plan is to use the /etc/hosts file as an alternative. Add the required IP addresses to /etc/hosts on both servers. Name resolution consults /etc/hosts first; only if that fails does it fall back to the nameserver entries in /etc/resolv.conf, which are commented out here.

[root@ecl-odabase-0 18.8.0.0]# cat /etc/resolv.conf

# Following added by OneCommand
search newco.local
#nameserver 10.11.30.254
# End of section
[root@ecl-odabase-0 18.8.0.0]#

[oracle@ecl-odabase-0 gg_191004]$ cat /etc/hosts


# Following added by OneCommand
127.0.0.1    localhost.localdomain localhost

# PUBLIC HOSTNAMES

# PRIVATE HOSTNAMES
192.168.16.27    ecl-oda-lab1-priv0.newco.local    ecl-oda-lab1-priv0
192.168.16.28    ecl-oda-lab2-priv0.newco.local    ecl-oda-lab2-priv0

# NET(0-3) HOSTNAMES
10.11.30.155    ecl-odabase-0.newco.local ecl-odabase-0
10.11.30.156    ecl-odabase-1.newco.local ecl-odabase-1

# VIP HOSTNAMES
10.11.30.157    ecl-oda-0-vip.newco.local  ecl-oda-0-vip
10.11.30.158    ecl-oda-1-vip.newco.local  ecl-oda-1-vip

# Below are SCAN IP addresses for reference.
# SCAN_IPS=(10.11.30.48 10.11.30.49)
10.11.30.48     ecl-oda-scan.newco.local  ecl-oda-scan
10.11.30.49     ecl-oda-scan.newco.local  ecl-oda-scan
10.11.30.50     ecl-oda-scan.newco.local  ecl-oda-scan

10.11.30.105  eclipsys-noc.localdomain          eclipsys-noc
[oracle@ecl-odabase-0 gg_191004]$
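A quick check that the SCAN name now has three entries. A sketch where a shell variable stands in for /etc/hosts; the sample mirrors this environment's entries, and on a real node you would grep the file itself.

```shell
#!/bin/sh
# cluvfy recommends the SCAN resolve to 3 or more addresses; count the
# /etc/hosts entries for it. On a real node: grep -c ecl-oda-scan /etc/hosts
SCAN_NAME=ecl-oda-scan
hosts_sample='10.11.30.48     ecl-oda-scan.newco.local  ecl-oda-scan
10.11.30.49     ecl-oda-scan.newco.local  ecl-oda-scan
10.11.30.50     ecl-oda-scan.newco.local  ecl-oda-scan'

scan_count=$(echo "$hosts_sample" | grep -c "$SCAN_NAME")
if [ "$scan_count" -ge 3 ]; then
    echo "SCAN entries: $scan_count (OK)"
else
    echo "SCAN entries: $scan_count (need 3 or more)"
fi
```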

3.2.3 Resolution for the missing SCAN address. 

 In this environment, only two SCAN IP addresses were configured, so we need to add a third. Check with your network team and obtain an IP address from the same range as the existing SCAN addresses. In this environment the new SCAN address is 10.11.30.50. Once you add it to /etc/hosts on both nodes, run the below-mentioned command to discover the new SCAN address.

/u01/app/18.0.0.0/grid/bin/srvctl modify scan -n ecl-oda-scan
Now run the config command to verify:

[root@ecl-odabase-0 ~]# /u01/app/18.0.0.0/grid/bin/srvctl config scan
SCAN name: ecl-oda-scan, Network: 1
Subnet IPv4: 10.11.30.0/255.255.255.0/eth0, static
Subnet IPv6:
SCAN 1 IPv4 VIP: 10.11.30.48
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes:
SCAN VIP is individually disabled on nodes:
SCAN 2 IPv4 VIP: 10.11.30.49
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes:
SCAN VIP is individually disabled on nodes:
SCAN 3 IPv4 VIP: 10.11.30.50
SCAN VIP is enabled.
Once it’s discovered, make sure to check the SCAN listener status and start the newly configured SCAN listener.

[root@ecl-odabase-0 ~]# srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node ecl-odabase-1
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node ecl-odabase-0
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is not running
[root@ecl-odabase-0 ~]#
Now it is time to run the cluster post-check again. If there are any remaining issues, we need to address them before patching.

/u01/app/18.0.0.0/grid/bin/cluvfy stage -post crsinst -collect cluster -gi_upgrade -n all
When there are no more cluster issues, we can restart the patching using the commands already mentioned in section 3.1 (Patching ODA Base servers).

script /tmp/odabase_upgrade_18800_19082021.txt - record all the steps 
/opt/oracle/oak/bin/oakcli update -patch 18.8.0.0.0 --server

3.3 Server patching failed on ilom 


 Again we faced an obstacle during server patching; this time it failed on the ILOM patching.

Note: we faced this error while performing the 18.3 to 18.8 ODA patch, on node 01.

Error:
ERROR  : Ran '/usr/bin/scp  root@192.168.16.28:/opt/oracle/oak/install/oakpatch_summary /opt/oracle/oak/install/oakpatch_summary' and it returned code(1) and output is:
         ssh: connect to host 192.168.16.28 port 22: Connection timed out

INFO: Infrastructure patching summary on node: 192.168.16.28

INFO: Running post-install scripts
INFO: Running postpatch on node 1...
ERROR  : Ran '/usr/bin/ssh -l root 192.168.16.28 /opt/oracle/oak/pkgrepos/System/18.8.0.0.0/bin/postpatch -v 18.8.0.0.0 --infra --gi -tag 20210819112727' and it returned code(255) and output is:
         ssh: connect to host 192.168.16.28 port 22: Connection timed out

error at Command = /usr/bin/ssh -l root 192.168.16.28 /opt/oracle/oak/pkgrepos/System/18.8.0.0.0/bin/postpatch -v 18.8.0.0.0 --infra --gi -tag 20210819112727> and errnum=
ERROR  : Command = /usr/bin/ssh -l root 192.168.16.28 /opt/oracle/oak/pkgrepos/System/18.8.0.0.0/bin/postpatch -v 18.8.0.0.0 --infra --gi -tag 20210819112727 did not complete successfully. Exit code 255 #Step -1#
Exiting...
ERROR: Unable to apply the patch 

3.3.1 ILOM patching solution

The only solution is to restart ODA_BASE and Dom0 from the ILOM console; after the reboot, check the ODA components.

Log in to the ILOM and power cycle the node 01 server.
### verify the component version once the node is fully up 
oakcli show version -detail
Validation output


========================
18.8 After patching 
========================
#### Node 01

[root@ecl-odabase-0 ~]# oakcli show version -detail
Reading the metadata. It takes a while...

System Version  Component Name            Installed Version         Supported Version
--------------  ---------------           ------------------        -----------------
18.8.0.0.0
                Controller_INT            4.650.00-7176             Up-to-date
                Controller_EXT            13.00.00.00               Up-to-date
                Expander                  0018                      001E
                SSD_SHARED {
                [ c1d20,c1d21,c1d22,      A29A                      Up-to-date
                c1d23,c1d44,c1d45,c1
                d46,c1d47 ]
                [ c1d16,c1d17,c1d18,      A29A                      Up-to-date
                c1d19,c1d40,c1d41,c1
                d42,c1d43 ]
                             }
                HDD_LOCAL                 A7E0                      Up-to-date
                HDD_SHARED {
                [ c1d0,c1d1,c1d2,c1d      PAG1                      PD51
                3,c1d4,c1d5,c1d6,c1d
                7,c1d8,c1d9,c1d10,c1
                d11,c1d12,c1d13,c1d1
                4,c1d15,c1d28 ]
                [ c1d24,c1d25,c1d26,      A3A0                      Up-to-date
                c1d27,c1d29,c1d30,c1
                d31,c1d32,c1d33,c1d3
                4,c1d35,c1d36,c1d37,
                c1d38,c1d39 ]
                             }
                ILOM                      4.0.4.52 r132805          Up-to-date
                BIOS                      30300200                  Up-to-date
                IPMI                      1.8.15.0                  Up-to-date
                HMP                       2.4.5.0.1                 Up-to-date
                OAK                       18.8.0.0.0                Up-to-date
                OL                        6.10                      Up-to-date
                OVM                       3.4.4                     Up-to-date
                GI_HOME                   18.8.0.0.191015           Up-to-date
                DB_HOME                   12.1.0.2.180717           12.1.0.2.191015
[root@ecl-odabase-0 ~]#
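The "oakcli show version -detail" output can be scanned for components that still need patching by comparing the last column against "Up-to-date". A sketch; the sample rows mirror the output above, and since the real output wraps multi-disk groups across several lines, this simple filter is only a first pass.

```shell
#!/bin/sh
# Flag any component row whose Supported Version column is not "Up-to-date".
# Sample rows taken from the validation output shown above.
version_sample='Expander                  0018                      001E
HDD_LOCAL                 A7E0                      Up-to-date
DB_HOME                   12.1.0.2.180717           12.1.0.2.191015'

pending=$(echo "$version_sample" | awk '$NF != "Up-to-date" { print $1 }')
echo "components still to patch: $pending"
```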

3.4 Troubleshooting shared repo start-up issues.


On completion, we noticed that the shared repositories were not coming up due to HAVIP startup issues, because all the exportfs mount points were missing from the cluster.

The Dom0 mount points are mounted as NFS shares, and the dynamic entries created under /etc/mtab were missing.
Please find a sample /etc/mtab entry for your perusal.
/dev/sda3 / ext3 rw 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/sda2 /OVS ext3 rw 0 0
/dev/sda1 /boot ext3 rw 0 0
tmpfs /dev/shm tmpfs rw 0 0
debugfs /sys/kernel/debug debugfs rw 0 0
xenfs /proc/xen xenfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
none /var/lib/xenstored tmpfs rw 0 0
192.168.18.21:/u01/app/sharedrepo/vmstor1 /OVS/Repositories/vmstor1 nfs rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,nfsvers=3,timeo=600,addr=192.168.18.21 0 0
[root@pinode0 ~]#
This shared-repo issue is recorded under the known issues section, but we found a slight difference: in our case the repo was not mounted on the Dom0 server. 

https://docs.oracle.com/en/engineered-systems/oracle-database-appliance/18.8/cmtrn/issues-with-oda-odacli.html#GUID-5BA56322-127F-424F-8D1E-DEB3939CD60C




Error log :

########## Error

####################### DOM 0 NODE01
2021-08-30 12:19:41,201 [Cmd_EnvId] [MainThread] [repoactions] [INFO] [162] Checking for shared repos
2021-08-30 12:19:44,228 [Cmd_EnvId] [MainThread] [repoactions] [ERROR] [182] Error encountered while checking for shared repos: OAKERR:7084The HAVIP 192.168.18.21 is not pingable
2021-08-30 12:19:44,230 [Cmd_EnvId] [MainThread] [agentutils] [DEBUG] [181] Created xml string 0OAKERR:7084The HAVIP 192.168.18.21 is not pingable
2021-08-30 12:19:59,169 [Cmd_EnvId] [MainThread] [repoactions] [INFO] [162] Checking for shared repos
2021-08-30 12:20:02,193 [Cmd_EnvId] [MainThread] [repoactions] [ERROR] [182] Error encountered while checking for shared repos: OAKERR:7084The HAVIP 192.168.18.21 is not pingable
2021-08-30 12:20:02,194 [Cmd_EnvId] [MainThread] [agentutils] [DEBUG] [181] Created xml string 0OAKERR:7084The HAVIP 192.168.18.21 is not pingable

####################### DOM 0 NODE02
2021-08-30 12:30:16,328 [Cmd_EnvId] [MainThread] [repoactions] [INFO] [162] Checking for shared repos
2021-08-30 12:30:19,364 [Cmd_EnvId] [MainThread] [repoactions] [ERROR] [182] Error encountered while checking for shared repos: OAKERR:7084The HAVIP 192.168.19.21 is not pingable
2021-08-30 12:30:19,366 [Cmd_EnvId] [MainThread] [agentutils] [DEBUG] [181] Created xml string 0OAKERR:7084The HAVIP 192.168.19.21 is not pingable
2021-08-30 12:31:36,167 [Cmd_EnvId] [MainThread] [repoactions] [INFO] [162] Checking for shared repos
2021-08-30 12:31:39,188 [Cmd_EnvId] [MainThread] [repoactions] [ERROR] [182] Error encountered while checking for shared repos: OAKERR:7084The HAVIP 192.168.19.21 is not pingable
2021-08-30 12:31:39,189 [Cmd_EnvId] [MainThread] [agentutils] [DEBUG] [181] Created xml string 0OAKERR:7084The HAVIP 192.168.19.21 is not pingable
2021-08-30 12:32:53,686 [Cmd_EnvId] [MainThread] [repoactions] [INFO] [162] Checking for shared repos

Secondly, check the ACFS mount point status using the below-mentioned command.


[root@ecl-odabase-0 ~]# /sbin/acfsutil registry -l
Device : /dev/asm/datastore-37 : Mount Point : /u02/app/oracle/oradata/datastore : Options : none : Nodes : all : Disk Group: DATA : Primary Volume : DATASTORE : Accelerator Volumes :
Device : /dev/asm/datcdbdev-37 : Mount Point : /u02/app/oracle/oradata/datcdbdev : Options : none : Nodes : all : Disk Group: DATA : Primary Volume : DATCDBDEV : Accelerator Volumes :
Device : /dev/asm/kali_test-37 : Mount Point : /u01/app/sharedrepo/kali_test : Options : none : Nodes : all : Disk Group: DATA : Primary Volume : KALI_TEST : Accelerator Volumes :
Device : /dev/asm/qualys-37 : Mount Point : /u01/app/sharedrepo/qualys : Options : none : Nodes : all : Disk Group: DATA : Primary Volume : QUALYS : Accelerator Volumes :
Device : /dev/asm/vmdata-37 : Mount Point : /u01/app/sharedrepo/vmdata : Options : none : Nodes : all : Disk Group: DATA : Primary Volume : VMDATA : Accelerator Volumes :
Device : /dev/asm/flashdata-216 : Mount Point : /u02/app/oracle/oradata/flashdata : Options : none : Nodes : all : Disk Group: FLASH : Primary Volume : FLASHDATA : Accelerator Volumes :
Device : /dev/asm/datastore-445 : Mount Point : /u01/app/oracle/fast_recovery_area/datastore : Options : none : Nodes : all : Disk Group: RECO : Primary Volume : DATASTORE : Accelerator Volumes :
Device : /dev/asm/db_backup-445 : Mount Point : /db_backup : Options : none : Nodes : all : Disk Group: RECO : Primary Volume : DB_BACKUP : Accelerator Volumes :
Device : /dev/asm/delshare-445 : Mount Point : /delshare : Options : none : Nodes : all : Disk Group: RECO : Primary Volume : DELSHARE : Accelerator Volumes :
Device : /dev/asm/prdmgtshare-445 : Mount Point : /prdmgtshare : Options : none : Nodes : all : Disk Group: RECO : Primary Volume : PRDMGTSHARE : Accelerator Volumes :
Device : /dev/asm/rcocdbdev-445 : Mount Point : /u01/app/oracle/fast_recovery_area/rcocdbdev : Options : none : Nodes : all : Disk Group: RECO : Primary Volume : RCOCDBDEV : Accelerator Volumes :
Device : /dev/asm/vmsdev-445 : Mount Point : /u01/app/sharedrepo/vmsdev : Options : none : Nodes : all : Disk Group: RECO : Primary Volume : VMSDEV : Accelerator Volumes :
Device : /dev/asm/datastore-158 : Mount Point : /u01/app/oracle/oradata/datastore : Options : none : Nodes : all : Disk Group: REDO : Primary Volume : DATASTORE : Accelerator Volumes :
Device : /dev/asm/rdocdbdev-158 : Mount Point : /u01/app/oracle/oradata/rdocdbdev : Options : none : Nodes : all : Disk Group: REDO : Primary Volume : RDOCDBDEV : Accelerator Volumes :
If all the ACFS mount points are mounted, check the cluster status.

[root@ecl-odabase-0 ~]# /u01/app/18.0.0.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       ecl-odabase-0            STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       ecl-odabase-0            STABLE
ora.crf
      1        ONLINE  ONLINE       ecl-odabase-0            STABLE
ora.crsd
      1        ONLINE  ONLINE       ecl-odabase-0            STABLE
ora.cssd
      1        ONLINE  ONLINE       ecl-odabase-0            STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       ecl-odabase-0            STABLE
ora.ctssd
      1        ONLINE  ONLINE       ecl-odabase-0            OBSERVER,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       ecl-odabase-0            STABLE
ora.drivers.oka
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       ecl-odabase-0            STABLE
ora.gipcd
      1        ONLINE  ONLINE       ecl-odabase-0            STABLE
ora.gpnpd
      1        ONLINE  ONLINE       ecl-odabase-0            STABLE
ora.mdnsd
      1        ONLINE  ONLINE       ecl-odabase-0            STABLE
ora.storage
      1        ONLINE  ONLINE       ecl-odabase-0            STABLE
--------------------------------------------------------------------------------
[root@ecl-odabase-0 ~]# ps -ef | grep pmon
Check the HAVIP status; it shows that the exportfs resources are not mounted.

####### HAVIP issue 

-- as root
/u01/app/18.0.0.0/grid/bin/srvctl config havip

[grid@ecl-odabase-0 trace]$ /u01/app/18.0.0.0/grid/bin/srvctl start havip -id havip_3 -n ecl-odabase-0
PRCE-1026 : Cannot start HAVIP resource without an Export FS resource.

Now let's check the ODA_BASE agent log file. This is where you can find the actual problem.

Log file : /opt/oracle/oak/log//odabaseagent/odabaseagent*

Error messages below:

 OAKERR8038 The filesystem could not be exported as a crs resource  
OAKERR:5015 Start repo operation has been disabled by flag

We can validate the mounted NFS shares using the below-mentioned command.

showmount -e

3.4.1 Solution for the shared repo issue.


 Enable the shared repo startup from ODA_BASE and then reboot the cluster nodes in a rolling fashion. The better option is to stop the cluster and reboot the nodes from the ILOM.

 Metalink note: Shared Repo Startup Fails with OAKERR:8038 and OAKERR:5015 on ODA 12.2.1.2.0 (Doc ID 2379347.1) 

 Known issues Link :

 https://docs.oracle.com/en/engineered-systems/oracle-database-appliance/18.8/cmtrn/issues-with-oda-odacli.html#GUID-5BA56322-127F-424F-8D1E-DEB3939CD60C

 [root@ecl-odabase-0 ~]# oakcli enable startrepo -node 0
Start repo operation is now ENABLED on node 0
[root@ecl-odabase-0 ~]# oakcli enable startrepo -node 1
Start repo operation is now ENABLED on node 1
[root@ecl-odabase-0 ~]#
 
oakcli show repo

Now only two components are left to patch: 
  1.  Storage 
  2.  Database


3.5 Storage patching

Before storage patching, make sure to stop the VMs and shared repos. 

script /tmp/output_storage_08202021.txt
/opt/oracle/oak/bin/oakcli update -patch <version> --storage

/opt/oracle/oak/bin/oakcli update -patch 18.8.0.0.0 --storage
Run the below-mentioned command for verification.

===============================
After verification
===============================

[root@ecl-odabase-0 ~]# oakcli show version -detail
Reading the metadata. It takes a while...
System Version  Component Name            Installed Version         Supported Version
--------------  ---------------           ------------------        -----------------
18.8.0.0.0
                Controller_INT            4.650.00-7176             Up-to-date
                Controller_EXT            13.00.00.00               Up-to-date
                Expander                  001E                      Up-to-date
                SSD_SHARED {
                [ c1d20,c1d21,c1d22,      A29A                      Up-to-date
                c1d23,c1d44,c1d45,c1
                d46,c1d47 ]
                [ c1d16,c1d17,c1d18,      A29A                      Up-to-date
                c1d19,c1d40,c1d41,c1
                d42,c1d43 ]
                             }
                HDD_LOCAL                 A7E0                      Up-to-date
                HDD_SHARED {
                [ c1d24,c1d25,c1d26,      A3A0                      Up-to-date
                c1d27,c1d29,c1d30,c1
                d31,c1d32,c1d33,c1d3
                4,c1d35,c1d36,c1d37,
                c1d38,c1d39 ]
                [ c1d0,c1d1,c1d2,c1d      PD51                      Up-to-date
                3,c1d4,c1d5,c1d6,c1d
                7,c1d8,c1d9,c1d10,c1
                d11,c1d12,c1d13,c1d1
                4,c1d15,c1d28 ]
                             }
                ILOM                      4.0.4.52 r132805          Up-to-date
                BIOS                      30300200                  Up-to-date
                IPMI                      1.8.15.0                  Up-to-date
                HMP                       2.4.5.0.1                 Up-to-date
                OAK                       18.8.0.0.0                Up-to-date
                OL                        6.10                      Up-to-date
                OVM                       3.4.4                     Up-to-date
                GI_HOME                   18.8.0.0.191015           Up-to-date
                DB_HOME                   12.1.0.2.180717           12.1.0.2.191015
[root@ecl-odabase-0 ~]#

4. Post Patching Validation


Once this patching is complete, validate the ODA environment as described below.
ps -ef | grep pmon - check that the databases are up and running


ps -ef | grep pmon 
grid     22358     1  0 Sep10 ?        00:00:17 asm_pmon_+ASM1
grid     26041     1  0 Sep10 ?        00:00:17 apx_pmon_+APX1
oracle   93837     1  0 Sep13 ?        00:00:05 ora_pmon_clonedb1
root     98071 97908  0 14:02 pts/0    00:00:00 grep pmon

Also, execute oakcli show repo to validate the running shared repositories and oakcli show vm to validate the running VMs.
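This check can also be scripted. The helper below (hypothetical, not an oakcli feature) scans `oakcli show repo` output and flags any shared repo that is not ONLINE:

```shell
# Sketch: flag any shared repo whose STATE is not ONLINE in
# `oakcli show repo` output (assumes the column order
# NAME TYPE NODENUM FREE-SPACE STATE SIZE, as shown below).
check_repos() {
  awk '$2 == "shared" && $5 != "ONLINE" { bad=1; print $1 " on node " $3 " is " $5 }
       END { if (!bad) print "all shared repos ONLINE" }'
}

# Example usage:
# oakcli show repo | check_repos
```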



[root@ecl-odabase-0 ~]# oakcli show repo

          NAME                          TYPE            NODENUM  FREE SPACE     STATE           SIZE

          kali_test                     shared          0            94.74%     ONLINE          512000.0M
          kali_test                     shared          1            94.74%     ONLINE          512000.0M
          odarepo1                      local           0               N/A     N/A             N/A
          odarepo2                      local           1               N/A     N/A             N/A
          qualys                        shared          0            98.35%     ONLINE          204800.0M
          qualys                        shared          1            98.35%     ONLINE          204800.0M
          vmdata                        shared          0            99.99%     ONLINE          4068352.0M
          vmdata                        shared          1            99.99%     ONLINE          4068352.0M
          vmsdev                        shared          0            99.99%     ONLINE          1509376.0M
          vmsdev                        shared          1            99.99%     ONLINE



[root@ecl-odabase-0 ~]# oakcli show vm

          NAME                                  NODENUM         MEMORY          VCPU            STATE           REPOSITORY

        kali_server                             0               4196M              2            OFFLINE         kali_test
        qualyssrv                               0               4196M              2  
