Friday, June 24, 2022

Azure DB template image - Part 1 (VM creation, Oracle grid and DB software-only installation)

 



Intro

In the current era, businesses are focusing on optimizing IT infrastructure workloads by adopting cloud platforms such as AWS, Azure, GCP, and OCI. Cloud environments provide many on-demand resources that enable organizations to unleash their full potential and expand their market share.

Moving to the cloud also enables greater flexibility to control IT infrastructure costs. In most organizations, the major workloads come from databases, and cloud platforms offer many services that give a clear understanding of cost predictions for these heavy workloads.

When moving to the cloud, it's really important to compare costs and the features each cloud provides. Microsoft Azure entered the cloud market at an early stage and is a strong platform for migrating application and database workloads. Migrating databases to Azure is challenging, and the market for it is really hot.

Many businesses want multiple test environments before moving changes to production, so it's important to be able to create custom images and build an environment quickly and efficiently. In this article, I will cover how to create a custom Oracle snapshot that can be used to create a VM.

Let's start with VM creation.

Create a VM in Azure

Log in to the Azure portal and select Virtual machines.



Azure works on an on-demand subscription model. Provide a suitable name for the VM and select a region (it's better to go with the closest region to avoid network latency). Then select the respective availability zone and the security type.

Here I'm going to use OEL 8.5, and the server size is Standard D2s_v3 for this testing. For database workloads, it's recommended to go with the E-series: E-series servers are memory-optimized to cater to database workloads. Azure has a really good selection of memory-optimized VM sizes for database servers.

The links below give a better understanding of the server series and the associated cost of each.


Server Versions 

https://azure.microsoft.com/en-ca/pricing/details/virtual-machines/series/

Associated cost 

https://azure.microsoft.com/en-ca/pricing/details/virtual-machines/linux/#edv4-series




There are two options to set up administrator access:

  1. Set up a user and a strong password.
  2. Use a generated SSH public key.




As this is an Oracle ASM standalone installation, I have added four disks, including two disks for DATA (external redundancy) and one for FRA.



As this is a test, I'm using a public IP address. Make sure to create the virtual network first; only then can you select it here.




The last step of VM creation is to check that validation has passed.


Partition the disk and mount /u01


Execute lsblk to view all the attached disks and use fdisk to partition the disk. Once you format the partition with the ext4 file system, make sure the new partition is registered in the kernel's partition table by executing partprobe. Do not miss this step; if you do, the VM will not come up and will hang at boot.

These are the commands you need to execute to add the partition to the VM.

##### correct commands
fdisk /dev/sdc      # create a new primary partition on the disk
mkfs.ext4 /dev/sdc1 # format the new partition as ext4
partprobe /dev/sdc  # re-read the partition table so the kernel registers the change
blkid               # note the UUID of /dev/sdc1
mkdir /u01
mount -a            # after adding the UUID entry for this disk to /etc/fstab
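The mount -a step assumes the new UUID has been added to /etc/fstab. A minimal sketch of the entry, with a placeholder UUID (use the value blkid reports for /dev/sdc1); the nofail option keeps the VM bootable even if the disk is ever detached:

##### sample /etc/fstab entry (UUID is a placeholder - use your own from blkid)
UUID=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee  /u01  ext4  defaults,nofail  0  2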

After creating the partition, you can use lsblk to validate it.


[azuser@localhost ~]$ lsblk
NAME               MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                  8:0    0   30G  0 disk
├─sda1               8:1    0  800M  0 part /boot
├─sda2               8:2    0 28.7G  0 part
│ ├─rootvg-rootlv  252:0    0 18.7G  0 lvm  /
│ └─rootvg-crashlv 252:1    0   10G  0 lvm  /var/crash
├─sda14              8:14   0    4M  0 part
└─sda15              8:15   0  495M  0 part /boot/efi
sdb                  8:16   0   75G  0 disk
└─sdb1               8:17   0   75G  0 part /mnt
sdc                  8:32   0  128G  0 disk
└─sdc1               8:33   0  128G  0 part 
sdd                  8:48   0  512G  0 disk
└─sdd1               8:49   0  512G  0 part
sde                  8:64   0  512G  0 disk


Sample blkid output


Execute the partprobe command so the kernel registers the new partition, and use mkfs.ext4 to format it. As the last step, execute blkid to get the respective block ID (UUID).




[root@oradb-01 ~]# blkid
/dev/mapper/rootvg-crashlv: UUID="cdf585f6-703a-4aef-b8de-404c9883a58e" BLOCK_SIZE="512" TYPE="xfs"
/dev/sde2: UUID="Z1I6oC-RWrU-DQPh-tkiS-VZP3-aGNb-khAaSc" TYPE="LVM2_member" PARTUUID="7239f924-9e5f-406f-b5b7-88862e86ea8d"
/dev/sde1: UUID="c5d1e180-5e1d-43c6-9ed6-c8ac37cb8061" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="83c7509b-9398-4ca4-847a-59f03e4f9577"
/dev/sde14: PARTUUID="1833132d-49f6-4ae0-861a-30eda86be687"
/dev/sde15: SEC_TYPE="msdos" UUID="3503-1054" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="0c2aee3b-7c66-42e5-91b3-52a8d2d7db16"
/dev/sdf1: UUID="06cd70bc-adec-4ab9-951e-eada6e45941d" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="619336a7-01"
/dev/sda1: UUID="6f9f562e-afe1-423e-8846-9ab001150f7c" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="727ac55d-01"
/dev/mapper/rootvg-rootlv: UUID="c866304a-09ef-4c3b-87cf-00f321a52356" BLOCK_SIZE="512" TYPE="xfs"
[root@oradb-01 ~]#

lsblk output after mounting /u01

 

[azuser@localhost ~]$ lsblk
NAME               MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                  8:0    0   30G  0 disk
├─sda1               8:1    0  800M  0 part /boot
├─sda2               8:2    0 28.7G  0 part
│ ├─rootvg-rootlv  252:0    0 18.7G  0 lvm  /
│ └─rootvg-crashlv 252:1    0   10G  0 lvm  /var/crash
├─sda14              8:14   0    4M  0 part
└─sda15              8:15   0  495M  0 part /boot/efi
sdb                  8:16   0   75G  0 disk
└─sdb1               8:17   0   75G  0 part /mnt
sdc                  8:32   0  256G  0 disk
└─sdc1               8:33   0  256G  0 part /u01
sdd                  8:48   0  512G  0 disk
└─sdd1               8:49   0  512G  0 part
sde                  8:64   0  512G  0 disk


Azure documentation is very well organized and crystal clear on the ASM installation steps. Please find below the link from the Azure knowledge base for the ASM installation.

https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/oracle/configure-oracle-asm

Install the required RPMs for the Oracle 19c database

Download the oracle-database-preinstall-19c RPM and install it on the server.


curl -o oracle-database-preinstall-19c-1.0-1.el7.x86_64.rpm https://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64/getPackage/oracle-database-preinstall-19c-1.0-1.el7.x86_64.rpm
yum -y localinstall oracle-database-preinstall-19c-1.0-1.el7.x86_64.rpm
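Note that the commands above pull the el7 package. Since this VM runs OEL 8.5, the el8 package is the better fit; on Oracle Linux 8 it is normally available straight from the AppStream repository, so a simpler alternative is:

# on Oracle Linux 8, the preinstall RPM ships in the ol8_appstream repository
dnf install -y oracle-database-preinstall-19c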

Install the required RPMs for Oracle 19c grid


First, install the oracle-database-preinstall-19c RPM. Once the preinstall is complete, make sure to install the required oracleasm RPMs; we need these for the ASM disk configuration.



[root@oradb-01 ~]# rpm -qa |grep oracleasm
kmod-redhat-oracleasm-2.0.8-12.2.0.1.el8.x86_64
oracleasm-support-2.1.12-1.el8.x86_64
oracleasmlib-2.0.17-1.el8.x86_64
[root@oradb-01 ~]#
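If these packages are not already present, here is a sketch of how they can be installed on OEL 8 (the exact kmod package name depends on the running kernel, and oracleasmlib is typically downloaded from oracle.com; treat the names below as assumptions):

# kernel driver and support tools - the kmod package name depends on the kernel in use
dnf install -y kmod-oracleasm oracleasm-support
# user-space library, downloaded from oracle.com if not available via dnf
rpm -ivh oracleasmlib-2.0.17-1.el8.x86_64.rpm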


Add the required groups and users for the installation. Here I create only the oracle user and use it for both the grid and DB installations.



groupadd -g 54345 asmadmin
groupadd -g 54346 asmdba
groupadd -g 54347 asmoper
# oinstall and dba already exist - they are created by the preinstall RPM
useradd -u 3000 -g oinstall -G dba,asmadmin,asmdba,asmoper oracle
# if the preinstall RPM already created the oracle user, useradd will fail - use usermod instead
usermod -g oinstall -G dba,asmdba,asmadmin,asmoper oracle
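A quick check that the user and group memberships landed as intended:

# verify the oracle user's group memberships
id oracle
# expect oinstall as the primary group plus dba, asmadmin, asmdba and asmoper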

Configure ASM

After installing the ASM packages, configure the ASM library driver so it loads at server startup.



[root@oradb-01 ~]# oracleasm configure -i
Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver.  The following questions will determine whether the driver is
loaded on boot and what permissions it will have.  The current values
will be shown in brackets ('[]').  Hitting ENTER without typing an
answer will keep that current value.  Ctrl-C will abort.

Default user to own the driver interface []: oracle
Default group to own the driver interface []: asmadmin
Start Oracle ASM library driver on boot (y/n) [n]: Y
Scan for Oracle ASM disks on boot (y/n) [y]: Y
Writing Oracle ASM library driver configuration: done
[root@oradb-01 ~]#

Once the configuration is complete, validate the ASM status. Make sure the status is correct after restarting the oracleasm service; any errors here are going to impact server startup.

Validate ASM status


[root@oradb-01 ~]# oracleasm status
Checking if ASM is loaded: yes
Checking if /dev/oracleasm is mounted: yes
[root@oradb-01 ~]#

Validate from systemctl



[root@oradb-01 ~]# systemctl start oracleasm
[root@oradb-01 ~]# systemctl status oracleasm
● oracleasm.service - Load oracleasm Modules
   Loaded: loaded (/usr/lib/systemd/system/oracleasm.service; enabled; vendor preset: disabled)
   Active: active (exited) since Thu 2022-06-23 17:23:51 UTC; 5s ago
  Process: 16982 ExecStart=/usr/sbin/oracleasm.init start_sysctl (code=exited, status=0/SUCCESS)
 Main PID: 16982 (code=exited, status=0/SUCCESS)

Jun 23 17:23:50 oradb-01 systemd[1]: Starting Load oracleasm Modules...
Jun 23 17:23:50 oradb-01 oracleasm.init[16982]: Initializing the Oracle ASMLib driver: OK
Jun 23 17:23:51 oradb-01 oracleasm.init[16982]: Scanning the system for Oracle ASMLib disks: OK
Jun 23 17:23:51 oradb-01 systemd[1]: Started Load oracleasm Modules.
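With the service healthy, the shared disks can later be stamped as ASM disks, as described in the Azure document linked above. A sketch, assuming the DATA and FRA LUNs were partitioned as /dev/sdd1 and /dev/sde1 (device names may differ on your VM):

# mark the partitions as ASM disks (device names are assumptions)
oracleasm createdisk DATA01 /dev/sdd1
oracleasm createdisk FRA01 /dev/sde1
oracleasm scandisks
oracleasm listdisks    # should now list DATA01 and FRA01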


Create directories for the grid and DB installations



mkdir -p /u01/app/19.0.0.0/grid 
mkdir -p /u01/app/oracle/product/19.0.0/dbhome_1
chown oracle:oinstall /u01/app/19.0.0.0/grid
chown oracle:oinstall /u01/app/oracle/product/19.0.0/dbhome_1

19c grid installation

Here we will perform a software-only installation because we are going to use these images later to create new environments.

I have posted the screenshots for the software-only installation. As per this figure, select only "Set Up Software Only".






As per the below-mentioned figure, select the respective ASM groups for the installation.









As this is a test installation, ignore the physical memory requirement warning.


Installation summary 


Execute the root scripts (orainstRoot.sh and root.sh) as root to complete the installation.



Expected output of the root scripts.



[root@oradb-01 ~]# /u01/app/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
[root@oradb-01 ~]# /u01/app/19.0.0.0/grid/root.sh
Performing root user operation.

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/19.0.0.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.

To configure Grid Infrastructure for a Cluster or Grid Infrastructure for a Stand-Alone Server execute the following command as oracle user:
/u01/app/19.0.0.0/grid/gridSetup.sh
This command launches the Grid Infrastructure Setup Wizard. The wizard also supports silent operation, and the parameters can be passed through the response file that is available in the installation media.

[root@oradb-01 ~]#
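As the output above notes, the wizard also supports silent operation. A minimal sketch of the same software-only install driven from the command line, run as the oracle user (parameter names follow the 19c grid response file; paths and groups assume the values used in this post - verify against grid_install.rsp before relying on it):

# software-only grid installation in silent mode (sketch)
/u01/app/19.0.0.0/grid/gridSetup.sh -silent \
  oracle.install.option=CRS_SWONLY \
  INVENTORY_LOCATION=/u01/app/oraInventory \
  ORACLE_BASE=/u01/app/oracle \
  oracle.install.asm.OSDBA=asmdba \
  oracle.install.asm.OSOPER=asmoper \
  oracle.install.asm.OSASM=asmadmin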


DB Installation

As I mentioned earlier, we are going to use these images to create new database servers, so we should perform a software-only installation.



Azure does not support Oracle RAC, so make sure to select a single-instance installation only.



Select the respective edition. Here I will select Enterprise Edition.


Specify the base directory 



Select the required OS groups (roles) for the installation.



I have not provided sudo privileges for the installer to run the root scripts automatically; I will run them manually as root.



Validate the prerequisites page. I will write another blog post on configuring the swap mount in Azure.



Execute root.sh as root to complete the installation.
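For reference, the DB software-only install can be scripted the same way; a sketch run as the oracle user (parameter names follow the 19c db_install.rsp response file; paths assume the values used in this post - verify against the response file before relying on it):

# software-only single-instance DB installation in silent mode (sketch)
/u01/app/oracle/product/19.0.0/dbhome_1/runInstaller -silent \
  oracle.install.option=INSTALL_DB_SWONLY \
  UNIX_GROUP_NAME=oinstall \
  INVENTORY_LOCATION=/u01/app/oraInventory \
  ORACLE_BASE=/u01/app/oracle \
  oracle.install.db.InstallEdition=EE \
  oracle.install.db.OSDBA_GROUP=dba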



Create a Snapshot of the disk

Once the grid and database software-only installations are complete without any issues, we need to power off the server to get consistent disk images.




We need to take two snapshots:
  1. OS disk
  2. Oracle binary installation disk
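The same two snapshots can be scripted with the Azure CLI; a sketch with placeholder resource group, VM, disk, and snapshot names (adjust to your environment):

# names below are placeholders - deallocate the VM first for consistent images
az vm deallocate --resource-group ora-rg --name oradb-01
az snapshot create --resource-group ora-rg --name oradb01-os-snap --source oradb01-osdisk
az snapshot create --resource-group ora-rg --name oradb01-u01-snap --source oradb01-u01disk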

The below-mentioned figure shows the create snapshot option after navigating to the disk.



Once the disk snapshots are created, navigate to the snapshots tab to verify them.




Conclusion

In this article, I have covered VM creation and the software-only installation of the grid and DB homes. Azure does not offer template images with Oracle ASM installed. In Part 2, I will cover creating a new VM using these snapshot disks.


Monday, June 6, 2022

ODA - 18.8.0.0 - OVM upgrade issue

 



ODA (Oracle Database Appliance) upgrades are challenging, and you need to be ready to spend extra hours on them due to unexpected failures. As a proactive measure, you should prepare for these failures by creating a proactive SR with Oracle. In this article, I will cover an upgrade issue encountered going from 18.3.0.0 to 18.8.0.0. The major component in the upgrade is the grid.

We encountered a major issue while upgrading the OVM on DOM0. Even though the component list shows that the OVM version is up to date, the oakcli orchestration tool revalidates this version by rerunning the required RPM on dom0, and the bug was hit in the opensm component. opensm is "an InfiniBand compliant Subnet Manager and Administration".


Using "oakcli update -patch 18.8.0.0.0 --verify" commands can verify the patch version after unpack.


###### localhost
[root@localhost ~]# oakcli update -patch  18.8.0.0.0 --verify
INFO: 2022-05-27 22:00:05: Reading the metadata file now...
                Component Name            Installed Version         Proposed Patch Version
                ---------------           ------------------        -----------------
                Controller_INT            4.650.00-7176             Up-to-date
                Controller_EXT            13.00.00.00               Up-to-date
                Expander                  0018                      001E
                SSD_SHARED {
                [ c1d20,c1d21,c1d22,      A29A                      Up-to-date
                c1d23 ]
                [ c1d16,c1d17,c1d18,      A29A                      Up-to-date
                c1d19 ]
                             }
                HDD_LOCAL                 A7E0                      Up-to-date
                HDD_SHARED {
                [ c1d0,c1d1,c1d2,c1d      PAG1                      PD51
                3,c1d4,c1d6,c1d7,c1d
                8,c1d9,c1d10,c1d12,c
                1d13,c1d14,c1d15 ]
                [ c1d5,c1d11 ]            PD51                      Up-to-date
                             }
                ILOM                      4.0.2.26.b r125868        4.0.4.52 r132805
                BIOS                      30130500                  30300200
                IPMI                      1.8.12.4                  1.8.15.0
                HMP                       2.4.1.0.14                2.4.5.0.1
                OAK                       18.3.0.0.0                18.8.0.0.0
                OL                        6.10                      Up-to-date
                OVM                       3.4.4                     Up-to-date
                GI_HOME                   18.3.0.0.180717           18.8.0.0.191015
                DB_HOME                   12.1.0.2.160119           12.1.0.2.191015
                ASR                       18.3.1                    19.4.0
[root@localhost ~]#

This OVM issue directly impacted patching from 18.3.0.0 to 18.8.0.0: DOM0 patching failed when trying to upgrade OVM to 3.4.4. While upgrading DOM0, stopping opensm with the command "service opensmd stop" caused the process to not come up after the network service restart. The opensm parameters are read from the “/etc/rc.local” file.


Error 

Main Server Patching Log : 

This is the main server patching log, with the OVM update error shown at the end for clarity.

2022-05-28 04:44:03: Executing cmd: /u01/app/18.0.0.0/grid/bin/crsctl check cluster
2022-05-28 04:44:03: Command output:
>  CRS-4535: Cannot communicate with Cluster Ready Services
>  CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
>  CRS-4534: Cannot communicate with Event Manager , 
>End Command output
2022-05-28 04:44:03: SUCCESS:  Successfully stopped the clusterware on local node
2022-05-28 04:44:03: Executing cmd: /opt/oracle/oak/bin/oakcli fstop oak
2022-05-28 04:44:06: Command output:
>  2022-05-28 04:44:03.583081370:[init.oak]:[Stopping oakd]
>  2022-05-28 04:44:06.640504545:[init.oak]:[Successfully stopped the oakd..] , 
>End Command output
2022-05-28 04:49:34: WARNING:  Unable to update the device OVM to:  3.4.4


DOM0 Log :

The dom0 log shows the patcher repeatedly trying to obtain a value by running the pgrep opensm command; the opensm service needs to be up and running for this to succeed. These logs are useful to get a clear picture of the problem: “/var/log/opensm.log” and “/var/log/messages”.


2022-05-28 04:51:02,732 root DEBUG Running: pgrep opensm
2022-05-28 04:51:02,732 root DEBUG Running: pgrep opensm
2022-05-28 04:51:02,732 root DEBUG Running: pgrep opensm
2022-05-28 04:51:02,732 root DEBUG Running: pgrep opensm
2022-05-28 04:51:02,732 root DEBUG Running: pgrep opensm
2022-05-28 04:51:02,732 root DEBUG Running: pgrep opensm
2022-05-28 04:51:02,732 root DEBUG Running: pgrep opensm
2022-05-28 04:51:02,732 root DEBUG Running: pgrep opensm
2022-05-28 04:51:02,732 root DEBUG Running: pgrep opensm
2022-05-28 04:51:02,732 root DEBUG Running: pgrep opensm
2022-05-28 04:51:02,732 root DEBUG Running: pgrep opensm
2022-05-28 04:51:02,813 root DEBUG Failed to run the cmd: pgrep opensm
2022-05-28 04:51:02,813 root DEBUG Failed to run the cmd: pgrep opensm
2022-05-28 04:51:02,813 root DEBUG Failed to run the cmd: pgrep opensm
2022-05-28 04:51:02,813 root DEBUG Failed to run the cmd: pgrep opensm
2022-05-28 04:51:02,813 root DEBUG Failed to run the cmd: pgrep opensm
2022-05-28 04:51:02,813 root DEBUG Failed to run the cmd: pgrep opensm
2022-05-28 04:51:02,813 root DEBUG Failed to run the cmd: pgrep opensm
2022-05-28 04:51:02,813 root DEBUG Failed to run the cmd: pgrep opensm
2022-05-28 04:51:02,813 root DEBUG Failed to run the cmd: pgrep opensm
2022-05-28 04:51:02,813 root DEBUG Failed to run the cmd: pgrep opensm
2022-05-28 04:51:02,813 root DEBUG Failed to run the cmd: pgrep opensm

Solution

Make sure to back up the “/etc/rc.local” file before making any changes. The fix is to add the new “-y” parameter to the opensm commands.
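A simple way to take that backup before editing (the suffix format is just an example):

# keep a dated copy of rc.local before editing
cp -p /etc/rc.local /etc/rc.local.bak_$(date +%Y%m%d)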

The opensm man page excerpt below explains the “-y” parameter.


-y, --stay_on_fatal

This option will cause SM not to exit on fatal initialization issues: if SM discovers duplicated GUIDs or a 12x link with lane reversal badly configured. By default, the SM will exit on these errors.




-- Before
[root@localhost patch]# cat /etc/rc.local | grep -i opensm
service opensmd stop
opensm -g 0x0010e00001889ad9 -W -p 15 -B
opensm -g 0x0010e00001889ada -W -p 0 -B

-- After 
[root@localhost patch]# cat /etc/rc.local | grep -i opensm
service opensmd stop
opensm -y -g 0x0010e00001889ad9 -W -p 15 -B
opensm -y -g 0x0010e00001889ada -W -p 0 -B


Conclusion

I strongly recommend running these commands manually in both nodes' terminals before the reboot. This helps to validate that the commands do not return any errors.

Expected output :



-- Node01
[root@localhost patch]# opensm -y -g 0x0010e00001889ad9 -W -p 15 -B
-------------------------------------------------
OpenSM 3.3.19
 Reading Cached Option File: /etc/opensm/opensm.conf
-E- Parsing error in field guid, expected numeric input received: 0x0010e00001889ad9 0x0010e00001889ada
 Unrecognized token: "max_seq_redisc"
 Unrecognized token: "rereg_on_guid_migr"
 Unrecognized token: "aguid_inout_notice"
 Unrecognized token: "sm_assign_guid_func"
 Unrecognized token: "reports"
 Unrecognized token: "per_module_logging"
 Unrecognized token: "consolidate_ipv4_mask"
Command Line Arguments:
 Staying on fatal initialization errors
 Guid 0x10e00001889ad9
 Priority = 15
 Daemon mode
 Log File: /var/log/opensm.log
 
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

[root@localhost patch]# opensm -y -g 0x0010e00001889ada -W -p 0 -B
-------------------------------------------------
OpenSM 3.3.19
 Reading Cached Option File: /etc/opensm/opensm.conf
-E- Parsing error in field guid, expected numeric input received: 0x0010e00001889ad9 0x0010e00001889ada
 Unrecognized token: "max_seq_redisc"
 Unrecognized token: "rereg_on_guid_migr"
 Unrecognized token: "aguid_inout_notice"
 Unrecognized token: "sm_assign_guid_func"
 Unrecognized token: "reports"
 Unrecognized token: "per_module_logging"
 Unrecognized token: "consolidate_ipv4_mask"
Command Line Arguments:
 Staying on fatal initialization errors
 Guid 0x10e00001889ada
 Priority = 0
 Daemon mode
 Log File: /var/log/opensm.log
-------------------------------------------------
[root@pinode0 patch]#



12.1.2.10 - 12.1.2.12 - ODA upgrade - Pre-patching issue

 






ODA upgrades elevate your technical knowledge to a different level each time; every upgrade gives you a new obstacle to tackle in a limited window.

While upgrading an ODA from 12.1.2.10 to 12.1.2.12, I found an unusual issue during the pre-patching stage.

Patching was executed from the master node. The master node can be identified by executing oakcli show ismaster:


[root@piodadb1 ~]# oakcli show ismaster
OAKD is in Master Mode
[root@piodadb1 ~]#

Patching got stuck on the "/opt/oracle/oak/pkgrepos/System/12.1.2.12.0/bin/copydom0patch.py 12.1.2.12.0 http://192.168.18.1:7882" step. 

This step copies the patch bundle to dom0 and extracts it. Once the extraction is complete, the same script is used to apply the upgrade on dom0.

This is the log that did not move for more than 45 minutes.

Error: ODA_BASE



>End Command output
2022-01-22 11:26:52: Setting the DATA:RECO ratio to 43:57
2022-01-22 11:26:52: Executing cmd: /opt/oracle/oak/bin/oakcli modify config -DATA 43
2022-01-22 11:26:52: Executing cmd: python -c 'import sys;sys.path.append("/opt/oracle/oak/adapters/");import common_agentutils as cu;node=cu.getvmagentnodes();print("IP1:%s"%node[0]["IP"]);print("IP2:%s"%node[1]["IP"]);print("port1:%s"%node[0]["port"]);print("port2:%s"%node[1]["port"]);'
2022-01-22 11:26:52: Command output:
>  IP1:192.168.18.1
>  IP2:192.168.19.1
>  port1:7882
>  port2:7882 ,
>End Command output
2022-01-22 11:26:52: This is V4 machine
2022-01-22 11:26:52: Executing cmd: /opt/oracle/oak/pkgrepos/System/12.1.2.12.0/bin/copydom0patch.py 12.1.2.12.0 http://192.168.18.1:7882  -- Hanging for 45 min 

This is the part of the dom0 log, located under /opt/oracle/oak/log/<hostname>/patch/oakpatching_*.log, where the updates got stuck after the extraction.

DOM0 log:



2000-09-21 09:38:32,942 root DEBUG Running: mkdir -p /tmp/otpatchdir
2000-09-21 09:38:32,942 root DEBUG Running: mkdir -p /tmp/otpatchdir
2000-09-21 09:38:32,942 root DEBUG Running: mkdir -p /tmp/otpatchdir
2000-09-21 09:38:33,006 root DEBUG Successfully run the command: mkdir -p /tmp/otpatchdir
2000-09-21 09:38:33,006 root DEBUG Successfully run the command: mkdir -p /tmp/otpatchdir
2000-09-21 09:38:33,006 root DEBUG Successfully run the command: mkdir -p /tmp/otpatchdir
2000-09-21 09:42:47,355 root DEBUG Running: cat /tmp/otpatchdir/* > /opt/oracle/oak/pkgrepos/orapkgs/OVS/12.1.2.12.0/Base/OAKPatchBundle_12.1.2.12.0_170926.tar
2000-09-21 09:42:47,355 root DEBUG Running: cat /tmp/otpatchdir/* > /opt/oracle/oak/pkgrepos/orapkgs/OVS/12.1.2.12.0/Base/OAKPatchBundle_12.1.2.12.0_170926.tar
2000-09-21 09:42:47,355 root DEBUG Running: cat /tmp/otpatchdir/* > /opt/oracle/oak/pkgrepos/orapkgs/OVS/12.1.2.12.0/Base/OAKPatchBundle_12.1.2.12.0_170926.tar
2000-09-21 09:42:56,241 root DEBUG Successfully run the command: cat /tmp/otpatchdir/* > /opt/oracle/oak/pkgrepos/orapkgs/OVS/12.1.2.12.0/Base/OAKPatchBundle_12.1.2.12.0_170926.tar
2000-09-21 09:42:56,241 root DEBUG Successfully run the command: cat /tmp/otpatchdir/* > /opt/oracle/oak/pkgrepos/orapkgs/OVS/12.1.2.12.0/Base/OAKPatchBundle_12.1.2.12.0_170926.tar
2000-09-21 09:42:56,241 root DEBUG Successfully run the command: cat /tmp/otpatchdir/* > /opt/oracle/oak/pkgrepos/orapkgs/OVS/12.1.2.12.0/Base/OAKPatchBundle_12.1.2.12.0_170926.tar
2000-09-21 09:42:56,242 root DEBUG Running: tar -xof  /opt/oracle/oak/pkgrepos/orapkgs/OVS/12.1.2.12.0/Base/OAKPatchBundle_12.1.2.12.0_170926.tar  -C  /opt/oracle/oak
2000-09-21 09:42:56,242 root DEBUG Running: tar -xof  /opt/oracle/oak/pkgrepos/orapkgs/OVS/12.1.2.12.0/Base/OAKPatchBundle_12.1.2.12.0_170926.tar  -C  /opt/oracle/oak
2000-09-21 09:42:56,242 root DEBUG Running: tar -xof  /opt/oracle/oak/pkgrepos/orapkgs/OVS/12.1.2.12.0/Base/OAKPatchBundle_12.1.2.12.0_170926.tar  -C  /opt/oracle/oak

Root Cause


In the script, there is an exception clause with a sleep command after the extraction. In our case, the system times of the ODA_BASE and DOM0 environments were different: somehow the DOM0 time was set to the year 2000 while ODA_BASE was in 2022.

 
Due to these different server times, the script was not progressing and kept hitting the sleep call.



[root@localhost~]# cat /opt/oracle/oak/pkgrepos/System/12.1.2.12.0/bin/copydom0patch.py
#!/usr/bin/python
import os
import sys
import glob
import socket
import time
import httplib
from SimpleXMLRPCServer import SimpleXMLRPCServer
import xmlrpclib
from subprocess import Popen,PIPE

#TODO error handling
OAK_HOME = "/opt/oracle/oak"
version = sys.argv[1]
dom0addr = sys.argv[2]
fileloc = os.path.join(OAK_HOME, "pkgrepos", "orapkgs", "OVS", version, "Base/*")
filelist = glob.glob(fileloc)
srcfile = filelist[0]

tmppatchdest = os.path.join("/", "tmp", "otpatchdir")
tmppatchfile = os.path.join(tmppatchdest, "dom0patchfile.zip")

# create a temporary dir
cmd = "rm -rf " + tmppatchdest
proc = Popen(cmd, shell=True, stderr=PIPE, stdout=PIPE)
proc.wait()
cmd = "mkdir -p " + tmppatchdest
proc = Popen(cmd, shell=True, stderr=PIPE, stdout=PIPE)
proc.wait()

#split the patch file
cmd =  "split -b 25000000 -d "+ srcfile + " " +tmppatchfile
proc = Popen(cmd, shell=True, stderr=PIPE, stdout=PIPE)
proc.wait()

# get a server connection object
#proxy = xmlrpclib.ServerProxy("http://192.168.16.24:7881")
proxy = xmlrpclib.ServerProxy(dom0addr);

# bug20645053: First try to ping and see if the DOM0 agent is alive
# We do this only 3 times.

count = 0
while (count < 2):
    try:
        print "trying to ping the dom0 agent"
        proxy.ping()
        count = 6
    except:
        time.sleep(60)  # this sleep was causing the issue due to the different times on ODA and dom0
        count = count + 1

try:
    proxy.ping()
    print "successfully pinged the dom0 agent\n"
except (socket.error, xmlrpclib.Fault ,xmlrpclib.ProtocolError, xmlrpclib.ResponseError, httplib.InvalidURL), err:
    print "fatal error: could not ping DOM0 agent"
    print "exception %s", err
    sys.exit(1)

#create the destination area
# bug20645053: If we are unable to create a socket then exit
# with a "fatal error" message.
try:
    out = proxy.mkdest(tmppatchdest)
except (socket.error), err:
    print "fatal error: could not connect to the DOM0 agent"
    print "exception %s", err
    sys.exit(1)

#copy the patchfile to destination
globdir = os.path.join(tmppatchdest, "*")
for file in glob.glob(globdir):
    file = file.strip('\t\n\r')
    handle = open(file, "rb")
    out = proxy.copypatch(tmppatchdest, file, xmlrpclib.Binary(handle.read()))

# remove the  temporary dir
cmd = "rm -rf " + tmppatchdest
proc = Popen(cmd, shell=True, stderr=PIPE, stdout=PIPE)
proc.wait()


Conclusion

Before starting the update, make sure to validate the server time on all the nodes, including dom0. I would recommend configuring NTP on all the servers, which gives you better control over server time.
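A quick sanity check before patching; the hostnames below are placeholders for the two ODA_BASE nodes and their dom0s:

# hostnames are placeholders - run from a machine with root SSH access to all four
for h in oda-node0 oda-node1 dom0-node0 dom0-node1; do
  ssh root@$h 'hostname; date'
done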
