Proactive Database Monitoring

Oracle Database makes it easy to monitor the health and performance of your database. It monitors the vital signs (or metrics) related to database health and performance, analyzes the workload running against the database, and automatically identifies any issues that need your attention as an administrator. The identified issues are presented as alerts and performance findings on the Database Home page. You can also configure Oracle Enterprise Manager Database Control (Database Control) to notify you of issues by e-mail.

This section discusses the following topics:

About Alerts
Performance Self-Diagnostics: Automatic Database Diagnostic Monitor
Monitoring General Database State and Workload
Managing Alerts

About Alerts

Alerts help you monitor your database. Most alerts notify you when particular metric thresholds are exceeded. For each alert, you can set critical and warning threshold values. These threshold values are boundary values that, when exceeded, indicate that the system is in an undesirable state. For example, when a tablespace becomes 97 percent full, Oracle Database considers this undesirable and generates a critical alert.

Other alerts correspond to database events such as Snapshot Too Old or Resumable Session Suspended. These alerts indicate that the event has occurred.

In addition to notifying you, an alert can perform an action such as running a script. For instance, a script that shrinks tablespace objects can be a useful response to a Tablespace Space Used warning alert.

By default, Oracle Database issues several alerts, including the following:

Tablespace Space Used (warning at 85 percent full, critical at 97 percent full)
Current Open Cursors Count (warning when the count goes above 1200)
Session Limit Usage (warning at 90 percent, critical at 97 percent)
Broken Job Count and Failed Job Count (warning when either count goes above 0)
Dump Area Used (warning at 95 percent full)
Archive Area Used (warning at 80 percent full)
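To show how warning and critical thresholds interact, the following Python sketch evaluates a metric reading against the default thresholds listed above. The metric keys, the DEFAULT_THRESHOLDS table, and the evaluate function are hypothetical; the sketch also assumes that "at N percent" means greater than or equal to N and that "goes above N" means strictly greater than N. Oracle Database performs this evaluation internally, so this only models the idea.

```python
import operator
from typing import Optional

# Hypothetical model of the default alerts listed above:
# (comparison, warning value, critical value). None means "no threshold".
# Assumption: "at N percent" is read as >= N; "goes above N" is read as > N.
DEFAULT_THRESHOLDS = {
    "Tablespace Space Used (%)": (operator.ge, 85, 97),
    "Current Open Cursors Count": (operator.gt, 1200, None),
    "Session Limit Usage (%)": (operator.ge, 90, 97),
    "Broken Job Count": (operator.gt, 0, None),
    "Failed Job Count": (operator.gt, 0, None),
    "Dump Area Used (%)": (operator.ge, 95, None),
    "Archive Area Used (%)": (operator.ge, 80, None),
}


def evaluate(metric: str, value: float) -> Optional[str]:
    """Return "Critical", "Warning", or None for a single metric reading."""
    compare, warning, critical = DEFAULT_THRESHOLDS[metric]
    # Check the critical boundary first so the more severe state wins.
    if critical is not None and compare(value, critical):
        return "Critical"
    if warning is not None and compare(value, warning):
        return "Warning"
    # Inside both boundaries: no alert (an open alert would be cleared).
    return None


print(evaluate("Tablespace Space Used (%)", 90))     # Warning
print(evaluate("Tablespace Space Used (%)", 97))     # Critical
print(evaluate("Current Open Cursors Count", 1200))  # None
```

Checking the critical boundary first matters: a tablespace at 97 percent also exceeds the 85 percent warning value, but it should be reported as a single critical alert.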

You can modify these alerts and others by changing their metric thresholds.

For more information, see "Managing Alerts".

Performance Self-Diagnostics: Automatic Database Diagnostic Monitor

Oracle Database includes a self-diagnostic engine called Automatic Database Diagnostic Monitor (ADDM). ADDM makes it possible for Oracle Database to diagnose its own performance and determine how any identified problems can be resolved.

To facilitate automatic performance diagnosis using ADDM, Oracle Database periodically collects snapshots of the database state and workload. Snapshots are sets of historical data for specific time periods that ADDM uses for performance comparisons. By default, a snapshot is collected every hour. Each snapshot provides a statistical summary of the state of the system at a point in time. Snapshots are stored in the Automatic Workload Repository (AWR), which resides in the SYSAUX tablespace. They are kept in this repository for a set time (8 days by default) before they are purged to make room for new snapshots.
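With the default settings, the amount of history kept in AWR follows from simple arithmetic: 8 days of hourly snapshots is 8 × 24 = 192 snapshots. The following Python sketch shows that calculation and which snapshots would fall outside the retention window. The function names are hypothetical; Oracle Database performs the actual purging itself.

```python
from datetime import datetime, timedelta
from typing import List

SNAPSHOT_INTERVAL = timedelta(hours=1)  # default snapshot collection interval
RETENTION = timedelta(days=8)           # default AWR retention period


def snapshots_retained(interval: timedelta = SNAPSHOT_INTERVAL,
                       retention: timedelta = RETENTION) -> int:
    """Approximate number of snapshots kept at steady state."""
    # timedelta // timedelta yields an int.
    return retention // interval


def purge_candidates(snapshot_times: List[datetime], now: datetime,
                     retention: timedelta = RETENTION) -> List[datetime]:
    """Snapshots older than the retention period, which would be purged."""
    return [t for t in snapshot_times if now - t > retention]


print(snapshots_retained())  # 192 (8 days x 24 hourly snapshots)

now = datetime(2024, 1, 10)  # arbitrary example date
hourly = [now - timedelta(hours=h) for h in range(200)]
print(len(purge_candidates(hourly, now)))  # 7 (snapshots 193 to 199 hours old)
```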

ADDM analyzes data captured in AWR to determine the major problems in the system, and in many cases, recommends solutions and quantifies expected benefits. ADDM analysis results are represented as a set of findings.

Performance problems that ADDM can identify include the following:

Resource contention (bottlenecks), such as when your database is using large amounts of CPU time or memory due to high-load SQL statements
Poor connection management, such as when your application is making too many logins to the database
Lock contention in a multiuser environment, such as when one user process acquires a lock to safely update data in a table, causing other user processes that need to acquire locks against the same table to wait, resulting in slower database performance

Monitoring General Database State and Workload

The Database Home page (Figure 10-1) enables you to monitor the state and workload of your database. It provides a central place for general database state information and is updated periodically.

Figure 10-1 Database Home Page

To monitor the general database state and workload:

Go to the Database Home page.

See "Accessing the Database Home Page".
(Optional) Click the Refresh button to update the information displayed.

By default, the Database Home page automatically refreshes every 60 seconds. You can prevent automatic refresh by selecting Manually in the View Data list at the top right-hand corner of the page. You must then click Refresh to view the latest information.

The date and time that data was last collected from the database is displayed to the left of the Refresh button.
Get a quick overview of the database state in the General section, which includes the following information:
- Status of the database instance, Up or Down
  
  Click the Status link to drill down to database availability details.
- Time the database was last started
- Instance name
- Oracle Database version
- Host name
  
  Click the Host link to drill down to host details.
- Listener name
  
  Click the Listener link to drill down to listener details.
Click View All Properties to see the Oracle home path and whether the database is read-only or read/write.
View CPU utilization in the Host CPU section, which includes the following information:
- Bar chart
  
  This chart shows the percentage of CPU time used by the database and other processes. The chart legend contains links for the database instance and for other CPU processes.
  
  Click the Other link in the chart legend to see how the utilization of CPU, memory, and disk I/O changes over time.
  
  Click the instance name link in the chart legend to see the Top Activity page. It includes a graph of active sessions over time, details about SQL statements issued, and the most active sessions.
- CPU load
  
  This is the average number of processes waiting to be scheduled for the CPU in the previous minute.
  
  Click the Load link to see how the utilization of CPU, memory, and disk I/O changes over time.
- Paging
  
  This is the number of memory pages (fixed-length blocks of instructions, data, or both) that are paged out (moved out of active memory) each second.
  
  Click the Paging link to see how the utilization of CPU, memory, and disk I/O changes over time.
Investigate the Active Sessions section, where you can further explore the cause of performance problems, such as your database taking up most of the CPU time on the server. This section displays a bar graph with the following information:
- Waits
  
  This is the average number of active sessions waiting on all wait classes combined, excluding user I/O and idle wait events. Wait classes are groupings of wait events based on the type of wait.
  
  If other processes are taking up most of your CPU time, then this indicates that some other application running on the database host computer could be causing performance problems.
  
  Click the Wait link to go to the Performance page to view potential problems inside and outside the database.
- User I/O
  
  This is the average number of active sessions waiting for user I/O. User I/O means that the workload originating from the user causes the database to read data from disk or write data to disk.
  
  Click the User I/O link to go to the Performance page to view potential problems inside and outside the database.
- CPU
  
  This is the average number of active sessions using CPU.
  
  Click the CPU link to see a chart showing more detailed information about active sessions over time.
View the Diagnostic Summary section, which includes the following information:
- ADDM Findings
  
  This shows the count of ADDM findings from the most recent ADDM run. Click the number adjacent to the ADDM Findings link to go to the ADDM page.
- Period Start Time
  
  This is the start time of the time period most recently analyzed by ADDM. It is shown only if there are ADDM findings.
- Alert Log
  
  This is the timestamp of the most recent alert log entry that describes an ORA- error.
  
  Click the Alert Log link to go to the Alert Log Errors page, which shows a list of log entries that contain errors.
- Active Incidents
  
  This shows the count of active incidents, which are occurrences of critical errors in the database. You are encouraged to investigate critical errors and report them to Oracle Support Services. Click the count to go to the Support Workbench home page.
- Database Instance Health
  
  Click Database Instance Health to display the Database Instance Health page, which includes graphical timelines of incidents, ADDM findings, and alerts. You can use these timelines to identify correlations between incidents, alerts, and performance issues on the system.
View the SQL Response Time section.

This is the current response time of a tracked set of SQL statements compared to the response time for the reference collection. A reference collection, or SQL Tuning Set, is a set of SQL statements that represents the typical SQL workload on your production system. If the current response time and the reference collection response time are equal, then the system is running as it should. If the current response time is greater than the reference collection response time, then one or more SQL statements are performing more slowly than they should. The lower the current response time, the more efficiently the tracked SQL statements are running.

Click the SQL Response Time link to see response time metrics for the previous 24 hours. If the reference collection is empty, then click Reset Reference Collection to go to a page where you can create a reference collection.
View the Space Summary section.

If the number adjacent to the Segment Advisor Recommendations label is not zero, it means the Segment Advisor has identified candidate segments for space defragmentation. Click the number to view recommendations for how to defragment these segments.
View the Alerts section, which includes the following items:
- Category list
  
  Optionally choose a category from the list to view alerts only in that category.
- Critical
  
  This is the number of metrics that have exceeded critical thresholds plus the number of other critical alerts, such as those caused by incidents (critical errors).
- Warning
  
  This is the number of metrics that have exceeded warning thresholds.
- Alerts table
  
  Click the message to learn more about the alert.
View the ADDM Performance Analysis section, if present. This section contains the following items:
- Period Start Time
  
  This is the start time of the period most recently analyzed by ADDM.
- Period Duration in minutes
  
  This is the duration of the period most recently analyzed by ADDM.
- Instance name
- ADDM findings table
  
  This table lists the ADDM findings, their estimated impact on database performance, a description of the finding, and the number of times the finding occurred in snapshots collected during the previous 24 hours. For example, a finding with Occurrences listed as 34 of 43 has occurred in 34 of the last 43 snapshots.
Click the finding to view finding details, to view recommendations, and in some cases to implement recommendations or start advisors.
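The SQL Response Time comparison described in the procedure above can be modeled as a ratio of current to reference response times. The following Python sketch assumes a simple ratio of averages over the statements present in both sets; Database Control's exact formula is not described here, and the function name and statement identifiers are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def compare_response_time(current_ms: Dict[str, float],
                          reference_ms: Dict[str, float]) -> Tuple[float, List[str]]:
    """Compare tracked SQL against a reference collection.

    Both arguments map a (hypothetical) statement identifier to its response
    time in milliseconds. Returns (ratio, slower): a ratio of 1.0 means the
    tracked SQL runs as fast as the reference; above 1.0 means slower.
    """
    tracked = current_ms.keys() & reference_ms.keys()
    if not tracked:
        # Corresponds to an empty reference collection: nothing to compare.
        raise ValueError("reference collection is empty or shares no statements")
    ratio = mean(current_ms[s] for s in tracked) / mean(reference_ms[s] for s in tracked)
    # Statements that individually run slower than their reference time.
    slower = sorted(s for s in tracked if current_ms[s] > reference_ms[s])
    return ratio, slower


ratio, slower = compare_response_time({"q1": 12.0, "q2": 40.0},
                                      {"q1": 10.0, "q2": 40.0, "q3": 5.0})
print(round(ratio, 2), slower)  # 1.04 ['q1']
```

Listing the individually slower statements alongside the aggregate ratio mirrors the guidance above: a ratio greater than 1.0 only says that something is slower, while the list points to which statements to investigate.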

To view database performance over time:

At the top of the Database Home page, click Performance.

The Performance page appears, displaying a summary of CPU utilization, average active sessions, instance disk I/O, and instance throughput for the recent time period.
Use the Additional monitoring links to drill down to Top Activity and other data.

The actions you can take to improve host performance depend on your system, and can include eliminating unnecessary processes, adding memory, or adding CPUs.

Managing Alerts

The following topics describe how to manage alerts:

Viewing Metrics and Thresholds
Setting Metric Thresholds
About Responding to Alerts
Clearing Alerts
Setting Up Direct Alert Notification

Viewing Metrics and Thresholds

To diagnose performance problems effectively, statistics must be available. Oracle Database generates many types of cumulative statistics for the system, sessions, and individual SQL statements. It also tracks cumulative statistics on segments and services. A metric is defined as the rate of change in some cumulative statistic. Metrics are computed and stored in the Automatic Workload Repository, and are displayed on the All Metrics page, which you can open by clicking All Metrics under Related Links on the Database Home page.
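To make the definition of a metric concrete, the following Python sketch computes a per-second rate from two samples of a cumulative statistic. The Sample type and metric_rate function are hypothetical; Oracle Database computes and stores its metrics internally, and this only illustrates the arithmetic.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    timestamp: float  # seconds since an arbitrary starting point
    value: float      # cumulative statistic, such as total user calls


def metric_rate(previous: Sample, current: Sample) -> float:
    """Rate of change per second between two samples of a cumulative statistic."""
    elapsed = current.timestamp - previous.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = current.value - previous.value
    if delta < 0:
        # A cumulative counter only decreases if it was reset,
        # for example by an instance restart.
        raise ValueError("counter reset detected")
    return delta / elapsed


# 3000 calls over 60 seconds is 50 calls per second.
print(metric_rate(Sample(0.0, 1000.0), Sample(60.0, 4000.0)))  # 50.0
```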

To view metrics for your database:

On the Database Home page under Related Links, click All Metrics.

The All Metrics page appears.
Click a specific metric link.

A details page appears, with more information about the metric. Online Help for this page describes the metric.

For each of these metrics, you can define warning and critical threshold values, and whenever a threshold is exceeded, Oracle Database issues an alert. Alerts are displayed on the Database Home page under the Alerts heading (or under Related Alerts for nondatabase alerts, such as alerts for a component of Oracle Net).

Figure 10-2 shows two warning alerts for the Tablespace Space Used metric.

Figure 10-2 Alerts Section of Database Home Page

Setting thresholds is discussed in "Setting Metric Thresholds". Actions you might take to respond to alerts are discussed in "About Responding to Alerts".

When the condition that triggered the alert is resolved and the metric value is no longer outside the boundary, Oracle Database clears the alert. Metrics are important for measuring the health of the database and serve as input for self-tuning and recommendations made by Oracle Database advisors.

Setting Metric Thresholds

Oracle Database provides a set of predefined metrics, some of which have predefined thresholds. You may want to set thresholds for other metrics or alter existing threshold settings.

One means of setting a threshold is described in "Changing Space Usage Alert Thresholds for a Tablespace", where you set warning and critical thresholds on the amount of space consumed in a tablespace. A more general means of setting thresholds is available using the Edit Thresholds page.

To set metric thresholds:

Go to the Database Home page.

See "Accessing the Database Home Page".
Under the Related Links heading, click Metric and Policy Settings.

The Metric and Policy Settings page appears.

This page displays the existing thresholds for metrics and any response actions that have been specified.

In the View list, do one of the following:
- Select Metrics with thresholds to view only those metrics with thresholds, either predefined by Oracle or previously set by you.
- Select All Metrics to view all metrics, whether or not they have thresholds defined.
To set or modify a warning threshold for a particular metric, enter the value you want in the Warning Threshold field for that metric.
To set or modify a critical threshold for a particular metric, enter the value you want in the Critical Threshold field for that metric.
To disable or reenable metric collection for a particular metric, or to change its collection schedule, complete the following steps:
1. Click the Collection Schedule link for the metric.
  
  The Edit Collection Settings page for that metric appears.
2. Click Disable to disable collection for this metric, or click Enable to enable it.
3. Choose the scale for your collection schedule from the Frequency Type list.
4. Enter a number in the Repeat Every field.
5. Do one of the following:
  - Click Continue to save your choices and return to the Metric and Policy Settings page.
  - Click Cancel to return to the Metric and Policy Settings page without saving your choices.
Click the single-pencil icon for a metric to open the Edit Advanced Settings page, where you can change Corrective Actions, (Monitoring) Template Override, and Advanced Threshold Settings.
Click the triple-pencil icon for a metric to set different threshold values for different instances of the object type being measured.

For example, for each tablespace you can set different warning and critical levels for the Tablespace Space Used metric.
Do one of the following:
- Click OK to save your changes and return to the Database Home page.
- Click Cancel to return to the Database Home page without saving your changes.
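Per-object thresholds, such as the per-tablespace levels set with the triple-pencil icon, behave like overrides: an object with its own thresholds uses them, and every other object falls back to the metric's default. The following Python sketch models that lookup for Tablespace Space Used. The function names and the APP_DATA tablespace are hypothetical, and the usage percentages are example values.

```python
from typing import Dict, Optional, Tuple

# (warning, critical) in percent; None means "no threshold of that kind".
Thresholds = Tuple[Optional[float], Optional[float]]


def thresholds_for(obj: str, default: Thresholds,
                   overrides: Dict[str, Thresholds]) -> Thresholds:
    """Use an object's own thresholds if set, else the metric default."""
    return overrides.get(obj, default)


def tablespace_alerts(used_pct: Dict[str, float], default: Thresholds,
                      overrides: Dict[str, Thresholds]) -> Dict[str, str]:
    """Map each tablespace that crosses a threshold to its alert severity."""
    alerts: Dict[str, str] = {}
    for ts, pct in used_pct.items():
        warning, critical = thresholds_for(ts, default, overrides)
        if critical is not None and pct >= critical:
            alerts[ts] = "Critical"
        elif warning is not None and pct >= warning:
            alerts[ts] = "Warning"
    return alerts


used = {"USERS": 88.0, "SYSAUX": 88.0, "APP_DATA": 99.0}  # example values
overrides = {"SYSAUX": (90.0, 98.0)}  # SYSAUX tolerates more before alerting
print(tablespace_alerts(used, (85.0, 97.0), overrides))
# {'USERS': 'Warning', 'APP_DATA': 'Critical'}
```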

About Responding to Alerts

When you receive an alert, follow any recommendations it provides, or consider running ADDM or another advisor, as appropriate, to get more detailed diagnostics of system or object activity.

For example, if you receive a Tablespace Space Used alert, you might take a corrective measure by running the Segment Advisor on the tablespace to identify objects that can be shrunk. You can then shrink those objects to create available (free) space. See "Reclaiming Unused Space".

You can also set a corrective script to run in response to an alert, as described in "Setting Metric Thresholds".

Clearing Alerts

Most alerts are cleared (removed) automatically when the cause of the problem disappears. Other alerts, such as Generic Alert Log Error, are sent to you for notification and need to be acknowledged by the database administrator.

After taking the necessary corrective measures, you can acknowledge an alert by clearing or purging it. Clearing an alert sends the alert to the Alert History, which can be viewed from the Database Home page under Related Links. Purging an alert removes it from the Alert History.
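The difference between clearing and purging can be modeled with two lists, one for open alerts and one for the Alert History. AlertConsole is a hypothetical class for illustration only; it is not how Database Control stores alerts, and it assumes that purging removes an alert whether it is still open or already in the history.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlertConsole:
    open_alerts: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)

    def clear(self, alert: str) -> None:
        """Acknowledge an open alert by moving it to the Alert History."""
        self.open_alerts.remove(alert)
        self.history.append(alert)

    def purge(self, alert: str) -> None:
        """Remove an alert without keeping it in the Alert History."""
        if alert in self.open_alerts:
            self.open_alerts.remove(alert)
        if alert in self.history:
            self.history.remove(alert)


console = AlertConsole(["Generic Alert Log Error", "Tablespace Space Used"])
console.clear("Generic Alert Log Error")
print(console.open_alerts, console.history)
# ['Tablespace Space Used'] ['Generic Alert Log Error']
console.purge("Generic Alert Log Error")
print(console.history)  # []
```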

To clear or purge an alert:

On the Database Home page under Diagnostic Summary, click the Alert Log link.

The Alert Log Errors page appears.
From the View Data list, select the period for which you want information.
Click Refresh to refresh the page with the latest information.
Do one of the following:
- Click Show Open Alerts to hide alerts that have been cleared.
- Click Show Open and Cleared Alerts to see all alerts.
Note:
Only one of these buttons is displayed at a time, depending on which alerts are currently shown.
Select one or more alerts by clicking their Select options.
Do one of the following:
- Click Clear to clear the selected alerts.
- Click Purge to purge the selected alerts.
- Click Clear Every Open Alert to clear all open alerts.
- Click Purge Every Alert to purge all alerts.

Setting Up Direct Alert Notification

Database Control displays all alerts on the Database Home page. However, you can also have Database Control notify you directly when specific alerts arise. For example, if you specify that you want e-mail notification for critical alerts, and you have set a critical threshold for the Response Time per Call metric, then you might be sent an e-mail message similar to the following:

Host Name=mydb.us.example.com
Metric=Response Time per Call
Timestamp=08-NOV-2006 10:10:01 (GMT -7:00)
Severity=Critical
Message=Response time per call has exceeded the threshold. See the
latest ADDM analysis.
Rule Name=
Rule Owner=SYSMAN

The e-mail message contains a link to the host name and the latest ADDM analysis.
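If you route notifications to your own tooling, the Name=Value layout of the example message is easy to parse. The following Python sketch assumes the format shown in the example above (one field per line, with a wrapped value continuing on lines that contain no equal sign); the parse_notification function is hypothetical, and real messages may contain other fields.

```python
from typing import Dict, Optional


def parse_notification(body: str) -> Dict[str, str]:
    """Parse a Name=Value alert notification body into a dictionary.

    Lines without '=' are treated as continuations of the previous value,
    as with the wrapped Message line in the example message.
    """
    fields: Dict[str, str] = {}
    last_key: Optional[str] = None
    for line in body.splitlines():
        if "=" in line:
            # Split on the first '=' only; values may contain more.
            key, _, value = line.partition("=")
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None and line.strip():
            fields[last_key] += " " + line.strip()
    return fields


message = """Host Name=mydb.us.example.com
Metric=Response Time per Call
Severity=Critical
Message=Response time per call has exceeded the threshold. See the
latest ADDM analysis."""
print(parse_notification(message)["Message"])
# Response time per call has exceeded the threshold. See the latest ADDM analysis.
```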

By default, alerts in critical state such as DB Down, Generic Alert Log Error Status, and Tablespace Space Used are set up for notification. However, to receive these notifications, you must set up your e-mail information.

To set up your e-mail information:

From any Database Control page, click the Setup link, which is visible in the header and footer areas.
On the Setup page, select Notification Methods.
Enter the required information into the Mail Server section of the Notification Methods page. Click Help at the bottom of the page for assistance.

There are other methods of notification, including scripts and SNMP (Simple Network Management Protocol) traps. SNMP traps can be used to communicate with third-party applications.

At this point, you have set up a method of notification, but you have not set up an e-mail address to receive the notification. To do so, complete the following steps.
From any Database Control page, click the Preferences link, which is visible in the header and footer areas.
On the Preferences page, select General. Click Add Another Row in the E-mail Addresses section to enter your e-mail address.
Click Test to verify that e-mail messages can be sent using the specified information. After the test completes, click OK.
(Optional) To edit notification rules, such as to change the severity state for receiving notification, select Rules under the heading Notification on the left-hand side of the page.

The Notification Rules page appears. Click Help for more information about this page.