Oracle® Enterprise Manager Ops Center Administration Guide 12c Release 1 (12.1.1.0.0), Part Number E25143-04
Oracle Enterprise Manager Ops Center has several capabilities that you can use to recover data and resume functions if the Enterprise Controller system or a Proxy Controller system fails.
If you set up a High Availability configuration during the installation and configuration process, you can fail over to the standby Enterprise Controller if the active Enterprise Controller fails.
This chapter describes these features and the procedures for using them.
Oracle Enterprise Manager Ops Center has several tools that can be used for disaster recovery. These tools let you preserve Oracle Enterprise Manager Ops Center data and functionality if the Enterprise Controller or Proxy Controller systems fail.
Some of the procedures described in this section use the ecadm command. See the Oracle Enterprise Manager Ops Center Feature Reference Guide for more information about this command.
On Oracle Solaris systems, this command is in the /opt/SUNWxvmoc/bin/ directory.
On Linux systems, this command is in the /opt/sun/xvmoc/bin/ directory.
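For example, a quick way to confirm that the command is available and that the Enterprise Controller services are running is the status subcommand. This is a minimal sketch for an Oracle Solaris system; the output shown is illustrative:
# /opt/SUNWxvmoc/bin/ecadm status
online
#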
High Availability is a setup involving multiple Enterprise Controllers using Oracle Clusterware and a remote database. The active Enterprise Controller is used for all Oracle Enterprise Manager Ops Center operations. The standby Enterprise Controllers are configured as backups.
If the active Enterprise Controller must be taken offline, you can make a standby Enterprise Controller active. A standby Enterprise Controller is also activated if the active Enterprise Controller fails.
Use two or more systems of the same model that are configured identically with the following:
Processor class
Operating system
Oracle Enterprise Manager Ops Center software version, including updates
Network interfaces that are cabled identically to the same subnets
Add an asset tag to identify the active Enterprise Controller and to distinguish it from the standby Enterprise Controller. You can add a tag by using the Edit Asset action.
Maintain the standby Enterprise Controller's system in the same way as the active Enterprise Controller. The active and standby Enterprise Controllers must use the same version of Oracle Enterprise Manager Ops Center software. If you cannot use the user interface to verify the installed software versions at the time that you need to transfer functions to the standby system, view the content of the /n1gc-setup/.version.properties file. The product.version property lists the specific revision level of the installed software. For example:
# cat /n1gc-setup/.version.properties
#Note: This file is created at build time.
#Wed Jun 30 15:28:45 PDT 2010
version=dev-ga
date=2010/06/30 15\:28
build.variation=xvmopscenter
product.version=2.6.0.1395
product.installLocation=/var/opt/sun/xvm/EnterpriseController_installer_2.6.0.1395
#
Verify that the product.version property lists the same version on the active and standby Enterprise Controllers before you perform a relocate procedure.
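For example, a minimal check before a relocate procedure; the standby hostname standby-ec is a placeholder for your environment:
# grep product.version /n1gc-setup/.version.properties
product.version=2.6.0.1395
# ssh standby-ec grep product.version /n1gc-setup/.version.properties
product.version=2.6.0.1395
#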
User accounts and data that are not associated with Oracle Enterprise Manager Ops Center are not part of the relocate process. Only Oracle Enterprise Manager Ops Center data is moved between the active and standby Enterprise Controllers.
UI sessions are lost on relocate.
The EC HA configuration applies only to the Enterprise Controller and its co-located Proxy Controller and not to other standalone Proxy Controllers.
Each asset, such as a server or operating system, is managed by a specific Proxy Controller. If a Proxy Controller fails or is uninstalled, you are prompted to migrate assets to another Proxy Controller if one is available.
You can also manually move assets to a new Proxy Controller.
You can use Oracle Clusterware and Oracle Real Application Clusters software to create a High Availability configuration. A High Availability configuration includes one active Enterprise Controller node and one or more standby Enterprise Controller nodes, all using an external database. If the active Enterprise Controller node fails, a standby node is made active, and a notification informs the user that the relocation has occurred.
If you are using a single Enterprise Controller configuration, you can switch to a High Availability configuration.
This procedure assumes that you have already installed and configured a single Enterprise Controller. If you have not installed and configured an Enterprise Controller, see the Oracle Enterprise Manager Ops Center Installation Guide for Oracle Solaris Operating System or the Oracle Enterprise Manager Ops Center Installation Guide for Linux Operating Systems for information on installing with High Availability.
Installing and configuring Oracle Clusterware is the first step in setting up High Availability in your environment.
Install Oracle Clusterware in your environment. The Oracle Clusterware installation documentation is available at http://download.oracle.com/docs/cd/B28359_01/install.111/b28262/toc.htm for Oracle Solaris systems and at http://download.oracle.com/docs/cd/B28359_01/install.111/b28263/toc.htm for Linux systems.
If you are using a local database, switch to a remote database using the procedure in the Database Management chapter.
Once your environment is prepared, configure the current Enterprise Controller as the active node.
To Make the Current Enterprise Controller the Active Node
Stop the Enterprise Controller using the ecadm command and the stop subcommand.
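For example, a minimal sketch on an Oracle Solaris system; the -w option, which waits for the services to stop before returning, is assumed to be available in your ecadm version:
# /opt/SUNWxvmoc/bin/ecadm stop -w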
Use the ecadm command with the ha-configure-primary subcommand to configure the system as the active Enterprise Controller.
For example:
# ./ecadm ha-configure-primary
INFO: HAECClusterwareAdapter/doConfigurePrimary() Stopping Ops Center ...
INFO: HAECClusterwareAdapter/doConfigurePrimary() Ops Center stopped
INFO: HAECClusterwareAdapter/createActionScript() created Resource Action Script='/var/opt/sun/xvm/ha/EnterpriseController'
INFO: HAECClusterwareAdapter/doConfigurePrimary() created Clusterware Action Script='/var/opt/sun/xvm/ha/EnterpriseController'
INFO: HAECClusterwareAdapter/doConfigurePrimary() created Clusterware Resource='EnterpriseController'
INFO: HAECClusterwareAdapter/doHAStart() starting resource='EnterpriseController' on node='primary-system'
INFO: HAECClusterwareAdapter/doHAStart()statusSB='CRS-2672: Attempting to start 'EnterpriseController' on 'primary-system'
CRS-2676: Start of 'EnterpriseController' on 'primary-system' succeeded'
INFO: HAECClusterwareAdapter/doHAStart() started resource='EnterpriseController' on node='primary-system'
INFO: HAECClusterwareAdapter/doConfigurePrimary() Ops Center started on node='primary-system'
ecadm: --- Enterprise Controller successfully configured HA primary node
#
Once you have configured one Enterprise Controller as the active node, you can install and configure standby nodes.
To Install the Enterprise Controller on a Standby Node
If you are installing on Oracle Solaris 11, and if the system requires an HTTP proxy to reach the Internet, set the http_proxy and https_proxy environment variables using the following format:
http_proxy: <protocol>://<host>:<port> - This variable specifies the proxy server to use for HTTP.
https_proxy: <protocol>://<host>:<port> - This variable specifies the proxy server to use for HTTPS.
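For example, a minimal sketch that sets both variables before running the installer; the proxy host webproxy.example.com and port 8080 are placeholders for your environment:
# http_proxy=http://webproxy.example.com:8080
# https_proxy=http://webproxy.example.com:8080
# export http_proxy https_proxy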
Create a temporary directory on your system, then copy or move the appropriate Oracle Enterprise Manager Ops Center archive for your system from delivery media to the temporary directory that you created. For example:
# mkdir /var/tmp/OC
# cp enterprise-controller.SunOS.i386.12.1.0.2001.tar.zip /var/tmp/OC
The installation archive consumes about 3.5 GBytes of disk space.
Change to the directory where the installation archive is located on your system.
# cd /var/tmp/OC
#
Expand the installation archive, then list the contents of the expanded directory.
If your installation archive has the .tar.zip extension, use the gzcat and tar commands to uncompress and un-tar the archive, then list the contents of the temporary directory.
For example:
# gzcat enterprise-controller.Solaris.i386.12.1.0.2001.tar.zip | tar xf -
# ls
enterprise-controller.Solaris.i386.12.1.0.2001.tar.gz
xvmoc_full_bundle
#
If your installation archive has the .zip extension, use the unzip command to uncompress the archive. For example:
# unzip enterprise-controller.Solaris.i386.12.1.0.2001.zip
# ls
enterprise-controller.Solaris.i386.12.1.0.2001.zip
xvmoc_full_bundle
#
Create a remote database properties file on the Enterprise Controller system. The remote database properties file must contain the location of the remote database and a user name and password that can access the database.
For example:
# vi /var/tmp/OC/DB/RemoteDBProps.txt
mgmtdb.appuser=user
mgmtdb.password=userpass
mgmtdb.roappuser=user
mgmtdb.ropassword=userpass
mgmtdb.dburl=jdbc:oracle:thin:@<database hostname>:<port>/<database service name>
Change to the xvmoc_full_bundle directory and run the install script with the --remoteDBprops=<path to remote database properties file> and --StandbyEC options. Each installation archive contains an install script that is appropriate only for its intended OS and platform. For example:
# cd xvmoc_full_bundle
# ./install -s --remoteDBprops=/<Path to remote database properties file>/remoteDbCreds.txt --StandbyEC
The Oracle Configuration Manager installation text is displayed. Enter the My Oracle Support user name or email address that you want to associate with Oracle Enterprise Manager Ops Center.
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name:
If you want security updates to appear on your My Oracle Support page, enter your My Oracle Support password.
Provide your My Oracle Support password to receive security updates
via your My Oracle Support account.
Password (optional):
The screen clears, then the install script displays a list of installation tasks that automatically updates as the installation proceeds. For example:
Ops Center Enterprise Controller Installer
(version 12.1.0.0 on SunOS)

1. Check for installation prerequisites.        [Completed]
2. Configure file systems.                      [Completed]
3. Install prerequisite packages.               [Not Completed]
4. Install Agent components.                    [Not Completed]
5. Create Deployable Proxy Bundles.             [Not Completed]
6. Install application packages.                [Not Completed]
7. Run postinstall tasks.                       [Not Completed]
8. Install Expect.                              [Not Completed]
9. Install IPMI tool.                           [Not Completed]
10. Set database credentials.                   [Not Completed]
11. Install and configure Oracle Database.      [Not Completed]
12. Install Service container components.       [Not Completed]
13. Install Core Channel components.            [Not Completed]
14. Install Proxy Core components.              [Not Completed]
..........................
19. Initialize and start services.              [Not Completed]

(2 of 19 Completed)
Executing current step: Install prerequisite packages...
Review and correct any problems when the install script checks for installation prerequisites that are not met. For example, this install script detected insufficient disk space:
Warning for Step: Check for installation prerequisites.
The following is a portion of the installer log which may indicate the cause
of the warning. If this does not indicate the cause of the warning, you will
need to view the full log file. More information on how to do that is
available below. You may choose to ignore this warning by selecting to continue.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Ignoring job: 01checkRPMs.pl
Ignoring job: 03removeEmptyDirs.pl
Executing job: jobs/00checkPrereqs.pl --install
WARNING: Installation prerequisites not met:
Disk: / 72G needed, 24G available.
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Please fix the problem and then try this step again.
For a full log of the failed install see the file: /var/tmp/installer.log.9361.
t. Try this step again (correct the failure before proceeding)
c. Continue (ignore the warning)
x. Exit
Enter selection: (t/c/x)
You can enter t to try again, c to continue and ignore the warning, or x to exit the install script. Typically, you would exit the install script, correct the problem, and then run the install script again; the script resumes from where it stopped. Entering t usually produces the same error unless you are able to correct the problem before trying the step again. Choose to continue and ignore the warning only if you accept the impact that the error condition will have on your installation.
If the install script finds that all prerequisites have been satisfied, or if you choose to continue despite the warning, the install script continues and installs all Enterprise Controller and Proxy Controller components.
When complete, the install script displays a confirmation that all components have been installed. The /var/tmp/installer.log.latest file contains the installation log.
Create a password file containing the root user name and password for the active Enterprise Controller. For example:
# touch /tmp/creds.props
# chmod 400 /tmp/creds.props
# vi /tmp/creds.props
# cat /tmp/creds.props
username:root
password:XXXXX
Use the ecadm command with the ha-configure-standby subcommand and the -p <password file> option to configure the node as a standby node.
For example:
# ecadm ha-configure-standby -p /tmp/creds.props
INFO: HAECClusterwareAdapter/doConfigureStandby() Stopping Ops Center ...
INFO: HAECClusterwareAdapter/doConfigureStandby() Ops Center stopped
INFO: remoteFileCopy() copied '/etc/passwd' from remoteHostname='primary-system' to local file='/tmp/activeNodepw'
<output omitted>
ecadm: --- Enterprise Controller successfully configured HA standby node
Use the ecadm command with the ha-status subcommand and the -d option to check the status of the standby Enterprise Controller.
For example:
# ecadm ha-status -d
INFO: HAECClusterwareAdapter/doHAStatus() Status:
# HAEC Cluster Info: Thu Sep 29 15:49:09 MDT 2011
haec.cluster.active.node=primary
haec.cluster.nodes=standby, primary
haec.ec.public.nics=nge1
haec.ec.status=ONLINE
<output omitted>
haec.cluster.script=/var/opt/sun/xvm/ha/EnterpriseController
haec.cluster.crsctl=/u01/app/11.2.0/grid/bin/crsctl
# End of Cluster Info
ecadm: --- Enterprise Controller ha-status command succeeded
Status stored in file: /var/opt/sun/xvm/ha/HAECStatus
#
You can convert your High Availability configuration to a single Enterprise Controller.
To Convert a High Availability Configuration to a Single Enterprise Controller
As root, log on to each standby Enterprise Controller node.
On each standby Enterprise Controller node, use the ecadm command with the ha-unconfigure-standby subcommand to remove the node from the High Availability configuration.
The node is removed from the cluster.
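For example, a minimal sketch, run as root on each standby node (output omitted):
# ecadm ha-unconfigure-standby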
As root, log on to the active Enterprise Controller node.
Use the ecadm command with the stop-no-relocate subcommand to stop the active node without bringing up a new node.
The active Enterprise Controller node is stopped.
Use the ecadm command with the ha-unconfigure-primary subcommand to unconfigure the Enterprise Controller as part of a High Availability configuration.
The active Enterprise Controller node is unconfigured as the active node.
Use the ecadm command with the start subcommand to start the active node.
The Enterprise Controller is restarted.
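For example, the complete conversion sequence on the active node might look like the following sketch (output omitted); the -w option, which waits for startup to complete, is assumed to be available in your ecadm version:
# ecadm stop-no-relocate
# ecadm ha-unconfigure-primary
# ecadm start -w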
You can manually relocate from the current Enterprise Controller to a standby Enterprise Controller.
To Relocate to a Standby Enterprise Controller
As root, log on to the active Enterprise Controller node.
Use the ecadm command with the ha-relocate subcommand to switch to a different node.
Another node is activated and the current node is switched to standby mode.
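For example, a minimal sketch (output omitted); the ha-status subcommand with the -d option can be used afterward to confirm which node is now active:
# ecadm ha-relocate
# ecadm ha-status -d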
Oracle Clusterware provides support for one network address known as the Single Client Access Name (SCAN). However, in some deployments, systems must communicate with the Enterprise Controller on a network separate from the SCAN network.
You can add and manage network resources for high availability using the Clusterware crsctl command.
For more information about these commands, and for information about deleting, starting, stopping, or checking the status of network resources, see the Oracle Clusterware Administration and Deployment Guide 11g Release 2.
You can add a network resource using the crsctl command.
To add a network resource, run the crsctl add resource command with the following format:
/u01/app/11.2.0/grid/bin/crsctl add resource <resource name> -type application -attr ACTION_SCRIPT=/u01/app/11.2.0/grid/bin/usrvip,USR_ORA_NETMASK=<netmask>,USR_ORA_VIP=<vip IP address>,USR_ORA_START_TIMEOUT=0,USR_ORA_STOP_TIMEOUT=0,USR_ORA_STOP_MODE=immediate,USR_ORA_IF=<network interface>,USR_ORA_OPI=false,USR_ORA_CHECK_TIMEOUT=0,USR_ORA_DISCONNECT=false,USR_ORA_PRECONNECT=none,HOSTING_MEMBERS=<node1>:<node2>
The following options are included in this format:
<resource name> - Specifies the resource name.
-type application - Specifies that the resource is registered as an application-type resource.
USR_ORA_IF=<network interface> - Specifies the network interface (NIC) for the network resource.
USR_ORA_VIP=<vip IP address> - Specifies the IP address for the network resource.
USR_ORA_NETMASK=<netmask> - Specifies the netmask for the network resource.
HOSTING_MEMBERS=<node1>:<node2> - Specifies the cluster nodes hosting the Enterprise Controller.
ACTION_SCRIPT=/u01/app/11.2.0/grid/bin/usrvip - Specifies the action script that Oracle Clusterware uses to manage the virtual IP.
PLACEMENT=favored - Specifies the placement policy that determines which of the hosting members runs the resource.
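For example, the following sketch adds a resource named ec_vip on interface nge1; the resource name, IP address, netmask, and node names are placeholders for your environment:
/u01/app/11.2.0/grid/bin/crsctl add resource ec_vip -type application -attr ACTION_SCRIPT=/u01/app/11.2.0/grid/bin/usrvip,USR_ORA_NETMASK=255.255.255.0,USR_ORA_VIP=192.0.2.100,USR_ORA_START_TIMEOUT=0,USR_ORA_STOP_TIMEOUT=0,USR_ORA_STOP_MODE=immediate,USR_ORA_IF=nge1,USR_ORA_OPI=false,USR_ORA_CHECK_TIMEOUT=0,USR_ORA_DISCONNECT=false,USR_ORA_PRECONNECT=none,HOSTING_MEMBERS=primary-system:standby-system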
You can modify an existing network resource using the crsctl command.
To modify a network resource, run the crsctl modify resource command with the following format:
./crsctl modify resource <resource name> -attr <attribute>=<new value>, <attribute>=<new value>,...
The following attributes can be modified:
USR_ORA_IF=<network interface> - Specifies the network interface (NIC) for the network resource.
USR_ORA_VIP=<vip IP address> - Specifies the IP address for the network resource.
USR_ORA_NETMASK=<netmask> - Specifies the netmask for the network resource.
HOSTING_MEMBERS=<node1>:<node2> - Specifies the cluster nodes hosting the Enterprise Controller.
ACTION_SCRIPT=/u01/app/11.2.0/grid/bin/usrvip - Specifies the action script that Oracle Clusterware uses to manage the virtual IP.
PLACEMENT=favored - Specifies the placement policy that determines which of the hosting members runs the resource.
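For example, this sketch changes the virtual IP address of the hypothetical ec_vip resource added earlier; the address is a placeholder:
./crsctl modify resource ec_vip -attr USR_ORA_VIP=192.0.2.101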
You can remove a standby Enterprise Controller node from the cluster.
To Remove a Standby Enterprise Controller Node
As root, log on to the standby Enterprise Controller node.
Use the ecadm command with the ha-unconfigure-standby subcommand to remove the node from the High Availability configuration.
The node is removed from the cluster. You can uninstall the Enterprise Controller on the node using the normal Enterprise Controller uninstall procedure.
You can check the status of the cluster from any Enterprise Controller node.
To Check the Status of the Enterprise Controller Cluster
As root, log on to an Enterprise Controller node.
Use the ecadm command with the ha-status subcommand and the -d option to check the status of the cluster.
The node's status is displayed.
You can stop the active node without making a different node active. The user interface and the command-line interface will be unusable while all Enterprise Controller nodes are shut down.
To Temporarily Shut Down the Active Enterprise Controller Without a Relocate
As root, log on to the active Enterprise Controller node.
Use the ecadm command with the stop-no-relocate subcommand to stop the active node without bringing up a new node.
The active node is stopped.
Use the ecadm command with the start subcommand to start the active node.
The active node is restarted.
You can view the cluster configuration from the user interface.
To Access the Cluster Management UI
Click the Enterprise Controller in the Administration section of the Navigation pane.
Click Manage Cluster Configuration in the Actions pane.
The Cluster Management UI is displayed.
Each asset is managed by a specific Proxy Controller. If a Proxy Controller fails or is uninstalled, you will be notified and given the option to migrate the failed Proxy Controller's assets to another Proxy Controller. You can also move an asset from one functional Proxy Controller to another.
To migrate an asset to a new Proxy Controller, the destination Proxy Controller must either be connected to the networks of the assets being moved, or be associated with those networks and have them enabled.
Each asset is managed by a Proxy Controller. You can migrate an asset from one functional Proxy Controller to another to balance job load or if you intend to uninstall a Proxy Controller.
To Migrate Assets Between Proxy Controllers
Select the source Proxy Controller in the Administration section of the Navigation pane.
Click the Managed Assets tab.
Select one or more assets to move, then click the Migrate Assets icon.
If another Proxy Controller is available that can manage the assets, the Asset Migration wizard is displayed.
If no other Proxy Controller is available that can manage the assets, an error message is displayed.
Select the destination Proxy Controller from the list of Proxy Controllers, or select Auto Balance across Proxy Controllers to automatically select a destination Proxy Controller.
Click Migrate.
A job is launched to migrate the selected assets to the destination Proxy Controller. The migration status is displayed in the job and in the Managed Assets tab.
Each asset is managed by a Proxy Controller. If a Proxy Controller fails, an alert is sent giving you the option of migrating assets from the failed Proxy Controller to another Proxy Controller.
If you expect the Proxy Controller to come back online, leave the assets under its management. However, if you do not expect the Proxy Controller to come back online, you can migrate its assets to another available Proxy Controller. This action also removes the failed Proxy Controller.
To Migrate Assets from a Failed Proxy Controller
Open the alert indicating that a Proxy Controller has failed.
Click Migrate Assets.
If another Proxy Controller is available that can manage the assets, the Asset Migration wizard is displayed.
If no other Proxy Controller is available that can manage the assets, an error message is displayed.
Select the destination Proxy Controller from the list of Proxy Controllers, or select Auto Balance across Proxy Controllers to automatically select a destination Proxy Controller.
Click Migrate.
A job is launched to migrate the selected assets to the destination Proxy Controller. The migration status is displayed in the job and in the Managed Assets tab.