3 Oracle High Availability Service

This chapter provides details on the Oracle High Availability Service metrics.

Oracle High Availability Service Alert Log

This section provides details on the Oracle high availability service alert log metrics.

Alert Log Name

Shows the name and full path of the Cluster Ready Services (CRS) alert log.

Metric Summary

The following table shows how often the metric's value is collected.

Target Version	Collection Frequency
All Versions	Every 5 Minutes

Data Source

Not Available

User Action

Specific to your site.

Oracle High Availability Service Alert Log Error

Collects CRS-1012, CRS-1201, CRS-1202, CRS-1401, CRS-1402, CRS-1602, and CRS-1603 messages in the Cluster Ready Services (CRS) alert log at the host level.

CRS-1012, CRS-1201, and CRS-1401 alert log messages trigger warning alerts.

CRS-1202, CRS-1402, CRS-1602, and CRS-1603 alert log messages trigger critical alerts.

Metric Summary

The following table shows how often the metric's value is collected and compared against the default thresholds. The 'Consecutive Number of Occurrences Preceding Notification' column indicates the consecutive number of times the comparison against thresholds should hold TRUE before an alert is generated.

Table 3-1 Metric Summary Table

Target Version	Evaluation and Collection Frequency	Upload Frequency	Operator	Default Warning Threshold	Default Critical Threshold	Consecutive Number of Occurrences Preceding Notification	Alert Text
All Versions	Every 5 Minutes	After Every Sample	MATCH	CRS-(1201|1401|1012)	CRS-(1202|1402|1602|1603)	1*	%clusterwareErrStack% See %alertLogName% for details.

* Once an alert is triggered for this metric, it must be manually cleared.
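
The mapping from message codes to alert severity can be illustrated with a short Python sketch. It assumes that the MATCH operator behaves like a regular expression search and that the pipe in the default thresholds of Table 3-1 denotes alternation. The function name and the sample log lines are hypothetical and are not taken from Enterprise Manager.

```python
import re

# Default thresholds from Table 3-1, read as regular expressions (assumption).
WARNING_PATTERN = re.compile(r"CRS-(1201|1401|1012)")
CRITICAL_PATTERN = re.compile(r"CRS-(1202|1402|1602|1603)")


def classify_crs_line(line):
    """Return 'CRITICAL', 'WARNING', or None for one alert log line."""
    if CRITICAL_PATTERN.search(line):
        return "CRITICAL"
    if WARNING_PATTERN.search(line):
        return "WARNING"
    return None


# Hypothetical log lines, made up for illustration only.
sample_lines = [
    "[cssd(1234)]CRS-1602: example text",
    "[crsd(5678)]CRS-1201: example text",
    "[crsd(5678)]CRS-1203: example text",
]
severities = [classify_crs_line(line) for line in sample_lines]
```

In this sketch the CRS-1203 line yields no severity because that code belongs to the CRS Resource Alert Log Error metric, not to this one.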

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Time/Line Number" object.

If warning or critical threshold values are currently set for any "Time/Line Number" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Time/Line Number" object, use the Edit Thresholds page.

Data Source

Not Available

User Action

Specific to your site.

CRS Resource Alert Log Error

Collects CRS-1203, CRS-1205, and CRS-1206 messages in the Cluster Ready Services (CRS) alert log at the host level and issues 'CRS Resource Alert Log Error' alerts at the critical level.

Metric Summary

Table 3-2 Metric Summary Table

Target Version	Evaluation and Collection Frequency	Upload Frequency	Operator	Default Warning Threshold	Default Critical Threshold	Consecutive Number of Occurrences Preceding Notification	Alert Text
All Versions	Every 5 Minutes	After Every Sample	MATCH	Not Defined	CRS-120(3|5|6)	1*	%resourceErrStack% See %alertLogName% for details.

* Once an alert is triggered for this metric, it must be manually cleared.

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Time/Line Number" object.

If warning or critical threshold values are currently set for any "Time/Line Number" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Time/Line Number" object, use the Edit Thresholds page.

Data Source

Not Available

User Action

Specific to your site.

OCR Alert Log Error

Collects CRS-1009 messages in the Cluster Ready Services (CRS) alert log at the host level and issues 'OCR Alert Log Error' type alerts. OCR refers to Oracle Cluster Registry.

Metric Summary

Table 3-3 Metric Summary Table

Target Version	Evaluation and Collection Frequency	Upload Frequency	Operator	Default Warning Threshold	Default Critical Threshold	Consecutive Number of Occurrences Preceding Notification	Alert Text
All Versions	Every 5 Minutes	After Every Sample	MATCH	Not Defined	CRS-1009	1*	%ocrErrStack% See %alertLogName% for details.

* Once an alert is triggered for this metric, it must be manually cleared.

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Time/Line Number" object.

If warning or critical threshold values are currently set for any "Time/Line Number" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Time/Line Number" object, use the Edit Thresholds page.

Data Source

Not Available

User Action

Specific to your site.

Alert Log Name

This metric collects certain Cluster Ready Services (CRS) error messages and issues either WARNING or CRITICAL alerts based on the error codes.

CRS Nodeapp Status

This metric monitors the status of the following: Node Applications (nodeapps), Virtual Internet Protocol (IP), Global Services Daemon (GSD), and Oracle Notification System (ONS).

Nodeapp Status

Monitors the status of the following: Node Applications (nodeapps), Virtual Internet Protocol (IP), Global Services Daemon (GSD), and Oracle Notification System (ONS). A critical alert is raised for the nodeapp if its status is 'OFFLINE NOT RESTARTING'. A warning alert is raised for the nodeapp if its status is either 'UNKNOWN' or 'OFFLINE'.

Metric Summary

Table 3-4 Metric Summary Table

Target Version	Evaluation and Collection Frequency	Upload Frequency	Operator	Default Warning Threshold	Default Critical Threshold	Consecutive Number of Occurrences Preceding Notification	Alert Text
All Versions	Every 5 Minutes	After Every Sample	MATCH	UNKNOWN|OFFLINE	OFFLINE NOT RESTARTING	1	CRS resource %nodeapps% is %status%
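
The following Python sketch shows one way the two default thresholds in Table 3-4 could be applied to a nodeapp status. It assumes MATCH is a regular expression search and that the critical threshold is tested before the warning threshold, because the string 'OFFLINE NOT RESTARTING' also contains 'OFFLINE'. The function names and the 'CLEAR' return value are illustrative and not part of the product.

```python
import re

# Default thresholds from Table 3-4, read as regular expressions (assumption).
CRITICAL_STATUS = re.compile(r"OFFLINE NOT RESTARTING")
WARNING_STATUS = re.compile(r"UNKNOWN|OFFLINE")


def nodeapp_severity(status):
    """Map a nodeapp status string to a severity label."""
    # Test critical first: "OFFLINE NOT RESTARTING" also matches "OFFLINE".
    if CRITICAL_STATUS.search(status):
        return "CRITICAL"
    if WARNING_STATUS.search(status):
        return "WARNING"
    return "CLEAR"  # hypothetical label for "no alert"


def alert_text(nodeapp, status):
    """Fill in the alert text template 'CRS resource %nodeapps% is %status%'."""
    return f"CRS resource {nodeapp} is {status}"
```

If the order of the two tests were reversed, a nodeapp that is not restarting would be reported only as a warning.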

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Nodeapp" object.

If warning or critical threshold values are currently set for any "Nodeapp" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Nodeapp" object, use the Edit Thresholds page.

Data Source

Not Available

User Action

Refer to the Real Application Clusters Administration and Deployment Guide for Node Applications startup and troubleshooting information.

CRS Virtual IP Relocation Status

This metric monitors whether there is a Virtual Internet Protocol (IP) relocation taking place. When a Virtual IP is relocated from the host (node) on which it was originally configured, a critical alert is generated.

Current Node

Shows the current host (node) on which the Virtual Internet Protocol is configured.

Metric Summary

The following table shows how often the metric's value is collected.

Target Version	Collection Frequency
All Versions	Every 5 Minutes

Data Source

Not Available

User Action

None.

Virtual IP Relocated

Shows whether the Virtual Internet Protocol has relocated from the host (node) where it was originally configured. The value is TRUE if relocation happened. Otherwise it is FALSE. When the value is TRUE, a critical alert is raised.

Metric Summary

Table 3-5 Metric Summary Table

Target Version	Evaluation and Collection Frequency	Upload Frequency	Operator	Default Warning Threshold	Default Critical Threshold	Consecutive Number of Occurrences Preceding Notification	Alert Text
All Versions	Every 5 Minutes	After Every Sample	=	Not Defined	TRUE	1	CRS resource %vip% was relocated to %current_node%
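
As a rough illustration of how this metric value and its alert text relate, the Python sketch below compares the node where a Virtual IP was configured with the node where it currently runs, then fills in the alert text template from Table 3-5. The helper names, the exact string comparison, and the VIP and node names are assumptions made for the example.

```python
def vip_relocated(configured_node, current_node):
    """Return 'TRUE' if the VIP runs on a node other than its configured one."""
    # Exact string comparison is an assumption; real host names may need normalizing.
    return "TRUE" if configured_node != current_node else "FALSE"


def render_alert(template, values):
    """Replace each %name% placeholder in the template with its value."""
    text = template
    for name, value in values.items():
        text = text.replace(f"%{name}%", value)
    return text


TEMPLATE = "CRS resource %vip% was relocated to %current_node%"

# Hypothetical VIP and node names.
vip, configured, current = "node1-vip", "node1", "node2"

message = None
if vip_relocated(configured, current) == "TRUE":  # operator "=", critical threshold TRUE
    message = render_alert(TEMPLATE, {"vip": vip, "current_node": current})
```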

Multiple Thresholds

For this metric you can set different warning and critical threshold values for each "Virtual IP Name" object.

If warning or critical threshold values are currently set for any "Virtual IP Name" object, those thresholds can be viewed on the Metric Detail page for this metric.

To specify or change warning or critical threshold values for each "Virtual IP Name" object, use the Edit Thresholds page.

Data Source

Not Available

User Action

Specific to your site.

Response

This metric provides the status of the host, that is, whether it is up or down.

Status

The metric indicates whether the host is reachable. A host can be unreachable for various reasons: the network might be down, or the Management Agent on the host might be down (possibly because the host itself is shut down).

Incident

This category of metrics provides information on the Incident target.

Alert Log Error Trace File

The alert log error trace file is the name of an associated server trace file generated when the problem that caused this incident occurred. If no additional trace file was generated, this field is blank.

Metric Summary

The following table shows how often the metric's value is collected.

Target Version	Collection Frequency
All Versions	Every 5 Minutes

Data Source

The alert log error trace file name is extracted from the database alert log.

User Action

The alert log error trace file name is provided so that the user can look in this file for more information about the problem that occurred.

Alert Log Name

The fully specified (includes directory path) name of the current XML alert log file.

Metric Summary

The following table shows how often the metric's value is collected.

Target Version	Collection Frequency
All Versions	Every 15 Minutes

Data Source

This name is retrieved by searching the OMS ADR_HOME/alert directory for the most recent (current) log file.
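
A lookup of this kind can be sketched in Python as follows. This is only an approximation under stated assumptions: it treats the file with the latest modification time as the current log and considers only files with an .xml extension. The function name is hypothetical, and the directory must be supplied by the caller.

```python
from pathlib import Path


def most_recent_log(alert_dir):
    """Return the most recently modified .xml file in alert_dir, or None."""
    # Assumption: "most recent" means latest modification time.
    candidates = [p for p in Path(alert_dir).glob("*.xml") if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Calling resolve() on the returned path gives the fully specified name, including the directory path.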

User Action

The alert log file name is provided so that the user can look in this file for more information about the problem that occurred.

ECID

The Execution Context ID (ECID) tracks requests as they move through the application server. This information is useful for diagnostic purposes because it can be used to correlate related problems encountered by a single user attempting to accomplish a single task.

Metric Summary

The following table shows how often the metric's value is collected.

Target Version	Collection Frequency
All Versions	Every 15 Minutes

Data Source

The ECID is extracted from the database alert log.

User Action

Diagnostic incidents usually indicate software errors and should be reported to Oracle using the Enterprise Manager Support Workbench. When packaging problems using Support Workbench, the ECID will be used by Support Workbench to correlate and include any additional problems in the package.

Impact

An optional field (may be empty) assessing the impact of the problem that occurred.

Metric Summary

The following table shows how often the metric's value is collected.

Target Version	Collection Frequency
All Versions	Every 15 Minutes

Data Source

The impact is extracted from the database alert log.

User Action

This field is purely informational. Diagnostic incidents usually indicate software errors and should be reported to Oracle using the Enterprise Manager Support Workbench.

Incident ID

The Incident ID is a number that uniquely identifies a diagnostic incident (single occurrence of a problem).

Metric Summary

The following table shows how often the metric's value is collected.

Target Version	Collection Frequency
All Versions	Every 15 Minutes

Data Source

The incident ID is extracted from the database alert log.

User Action

Diagnostic incidents usually indicate software errors and should be reported to Oracle using the Enterprise Manager Support Workbench. A problem groups one or more incidents, each of which is a single occurrence of that problem. In Support Workbench, use the incident ID to select the correct problem to package and send to Oracle. In the ADRCI command-line tool, use the incident ID with the show incident command to get details about the incident.
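
For scripted use, the ADRCI call might be assembled as in the Python sketch below. It only builds the argument list and does not run anything. The helper name is hypothetical, and the sketch assumes that adrci is on the PATH and that the ADR home is already selected or unambiguous; verify the options against the ADRCI documentation for your release.

```python
def adrci_show_incident_args(incident_id):
    """Build an argument list for a non-interactive ADRCI call (not executed here)."""
    incident_id = int(incident_id)  # reject non-numeric input early
    command = f'show incident -mode detail -p "incident_id={incident_id}"'
    # ADRCI accepts a command string through its exec= parameter.
    return ["adrci", f"exec={command}"]
```

The resulting list can be passed to subprocess.run on a host where ADRCI is installed.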

Generic Incident Status

This metric reflects the number of incidents of type Generic Incident that Enterprise Manager found the last time it scanned the alert log.

Metric Summary for Database Control and Cloud Control

Target Version	Evaluation and Collection Frequency	Upload Frequency	Operator	Default Warning Threshold	Default Critical Threshold	Consecutive Number of Occurrences Preceding Notification	Alert Text
11.1.0.x; 11.2.0.x	Every 5 Minutes	After Every Sample	>	Not Defined	0	1	%value% distinct types of incidents have been found in the alert log.
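
The Operator, Default Critical Threshold, and Consecutive Number of Occurrences Preceding Notification columns can be read together as a simple rule. The Python sketch below is one interpretation of that rule, not the Enterprise Manager implementation: the class name is hypothetical, and only the two non-MATCH operators that appear in this chapter are handled.

```python
import operator

# Comparison operators used by the non-MATCH metrics in this chapter.
OPERATORS = {">": operator.gt, "=": operator.eq}


class ThresholdCheck:
    """Alert when the comparison holds for N consecutive samples."""

    def __init__(self, op, threshold, occurrences=1):
        self.compare = OPERATORS[op]
        self.threshold = threshold
        self.occurrences = occurrences
        self.streak = 0  # consecutive samples that crossed the threshold

    def evaluate(self, value):
        """Record one sample and return True if an alert should be raised."""
        if self.compare(value, self.threshold):
            self.streak += 1
        else:
            self.streak = 0
        return self.streak >= self.occurrences


# Generic Incident Status defaults: operator ">", critical threshold 0, 1 occurrence.
check = ThresholdCheck(">", 0, occurrences=1)
results = [check.evaluate(v) for v in [0, 2, 0]]
```

With an occurrence count of 1, a single sample above the threshold is enough to raise the alert; a higher count would require that many samples in a row.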

Data Source

Incident metric

User Action

Use Support Workbench in Enterprise Manager to examine the details of the incidents.

Generic Internal Error Status

This metric reflects the number of incidents of type Generic Internal Error that Enterprise Manager found the last time it scanned the alert log.

Metric Summary for Database Control and Cloud Control

Target Version	Evaluation and Collection Frequency	Upload Frequency	Operator	Default Warning Threshold	Default Critical Threshold	Consecutive Number of Occurrences Preceding Notification	Alert Text
11.1.0.x; 11.2.0.x	Every 5 Minutes	After Every Sample	>	Not Defined	0	1	Generic internal errors have been found in the alert log.

Data Source

Incident metric

User Action

Use Support Workbench in Enterprise Manager to examine the details of the incidents.