Oracle® Database Administrator’s Reference 10g Release 1 (10.1) for UNIX Systems: AIX-Based Systems, hp HP-UX PA-RISC (64-bit), hp Tru64 UNIX, Linux x86, and Solaris Operating System (SPARC) Part No. B10812-01
This appendix contains information about administering Oracle Database on AIX.
Memory contention occurs when processes require more memory than is available. To cope with the shortage, the system pages programs and data between memory and disks.
Excessive paging activity decreases performance substantially. This can become a problem with database files created on journaled file systems (JFS and JFS2). In this situation, a large number of SGA data buffers might also have analogous file system buffers containing the most frequently referenced data. The behavior of the AIX file buffer cache manager can have a significant impact on performance. It can cause an I/O bottleneck, resulting in lower overall system throughput.
On AIX, tuning buffer-cache paging activity is possible, but you must do it carefully and infrequently. Use the /usr/sbin/vmo command to tune the following AIX system parameters:
See Also: For more information about AIX system parameters, see the AIX 5L Performance Management Guide. |
The purpose of the AIX file buffer cache is to reduce disk access frequency when journaled file systems are used. If this cache is too small, disk usage increases and potentially saturates one or more disks. If the cache is too large, memory is wasted.
See Also: For more information about the implications of increasing the AIX file buffer cache, see "Controlling Buffer-Cache Paging Activity". |
You can configure the AIX file buffer cache by adjusting the minperm% and maxperm% parameters. In general, if the buffer hit ratio is low (less than 90 percent), as determined by the sar -b command, increasing the minperm% parameter value might help. If maintaining a high buffer hit ratio is not critical, decreasing the minperm% parameter value increases the physical memory available. Refer to the AIX documentation for more information about increasing the size of the AIX file buffer cache.
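For example, the following command samples buffer activity five times at 5-second intervals; the %rcache and %wcache columns report the read and write cache hit ratios. The interval and count shown here are arbitrary illustrative values:

$ sar -b 5 5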
The performance gain cannot be quantified easily, because it depends on the degree of multiprogramming and the I/O characteristics of the workload.
AIX provides a mechanism for you to loosely control the ratio of page frames used for files rather than those used for computational (working or program text) segments by adjusting the minperm% and maxperm% parameter values according to the following guidelines:
If the percentage of real memory occupied by file pages falls below the minperm% value, the virtual memory manager (VMM) page-replacement algorithm steals both file and computational pages, regardless of repage rates.
If the percentage of real memory occupied by file pages rises above the maxperm% value, the virtual memory manager page-replacement algorithm steals only file pages.
If the percentage of real memory occupied by file pages is between the minperm% and maxperm% parameter values, the virtual memory manager normally steals only file pages, but if the repaging rate for file pages is higher than the repaging rate for computational pages, computational pages are stolen as well.
Use the following algorithm to calculate the default values:
minperm (in pages) = ((number of page frames)-1024) * 0.2
Then convert the minperm to a minperm% by dividing it by the total number of real-memory page frames.
maxperm (in pages) = ((number of page frames)-1024) * 0.8
Then convert the maxperm to a maxperm% by dividing it by the total number of real-memory page frames.
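For example, on a hypothetical system with 1,048,576 page frames (4 GB of real memory with a 4 KB page size), the defaults work out as follows:

minperm = (1048576 - 1024) * 0.2, which is approximately 209510 pages, or about 20 percent of real memory

maxperm = (1048576 - 1024) * 0.8, which is approximately 838042 pages, or about 80 percent of real memory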
Use the following command to change the value of the minperm% parameter to 5 percent of the total number of page frames, and the value of the maxperm% parameter to 20 percent of the total number of page frames:
# /usr/sbin/vmo -r -o minperm%=5 -o maxperm%=20
The default values are 20 percent and 80 percent, respectively.
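To display the values currently in effect before changing them, you can use a command similar to the following; the grep pattern is illustrative only:

# /usr/sbin/vmo -a | grep perm%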
To optimize for quick response when opening new database connections, adjust the minfree parameter to maintain enough free pages in the system to load the application into memory without adding additional pages to the free list. To determine the real memory size (resident set size, working set) of a process, use the following command:
$ ps v process_id
Set the minfree parameter to this value or to 8 frames, whichever is larger.
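As an illustration, assuming a hypothetical server process with process ID 204834 whose resident set is about 3840 KB (roughly 960 4 KB frames), you might check its size and then set minfree with commands similar to the following; the process ID and value shown are placeholders, not recommendations:

$ ps v 204834
# /usr/sbin/vmo -o minfree=960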
If the database files are on raw devices, or if you are using Direct I/O, you can set the minperm% and maxperm% parameters to low values, for example 5 percent and 20 percent, respectively. This is because the AIX file buffer cache is not used either for raw devices or for Direct I/O. The memory might be better used for other purposes, such as for the Oracle System Global Area.
Inadequate paging space (swap space) usually causes the system to hang or suffer abnormally slow response times. On AIX, you can dynamically add paging space on raw disk partitions. The amount of paging space you should configure depends on the amount of physical memory present and the paging space requirements of your applications. Use the lsps command to monitor paging space use and the vmstat command to monitor system paging activities. To increase the paging space, use the smit pgsp command.
On platforms where paging space is pre-allocated, Oracle recommends that you set the paging space to a value larger than the amount of RAM. But on AIX paging space is not allocated until needed. The system uses swap space only if it runs out of real memory. If the memory is sized correctly, there is no paging and the page space can be small. Workloads where the demand for pages does not fluctuate significantly perform well with a small paging space. Workloads likely to have peak periods of increased paging require enough paging space to handle the peak number of pages.
As a general rule, an initial setting for the paging space is half the size of RAM plus 4 GB, with an upper limit of 32 GB. Monitor the paging space use with the lsps -a command, and increase or decrease the paging space size accordingly. The %Used metric in the output of lsps -a is typically less than 25 percent on a healthy system. A properly sized deployment requires very little paging space, and an excessive amount of swapping indicates that the RAM on the system might be undersized.
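For example, on a hypothetical system with 16 GB of RAM, this rule suggests an initial paging space of (16 / 2) + 4 = 12 GB, which is well under the 32 GB upper limit.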
CAUTION: Do not undersize the paging space. If you do, the system can terminate active processes when it runs out of space. However, over-sizing the paging space has little or no negative impact. |
Constant and excessive paging indicates that the real memory is over-committed. In general, you should:
Avoid constant paging unless the system is equipped with very fast expanded storage that makes paging between memory and expanded storage much faster than Oracle can read and write data between the SGA and disks.
Allocate the limited memory resource where it is most beneficial to system performance. This is sometimes an iterative process of balancing memory requirements and trade-offs.
If memory is not adequate, build a prioritized list of the memory-requiring processes and elements of the system. Assign memory where the performance gains are the greatest. A prioritized list might look like the following:
OS and RDBMS kernels
User and application processes
Redo log buffer
PGAs and shared pool
Database block buffer caches
For instance, if you query Oracle dynamic performance tables and views and find that both the shared pool and database buffer cache require more memory, assigning the limited spare memory to the shared pool might be more beneficial than assigning it to the database block buffer caches.
The following AIX commands provide paging status and statistics:
vmstat -s
vmstat interval [repeats]
sar -r interval [repeats]
You can configure the Oracle database block size for better I/O throughput. On AIX, you can set the value of the DB_BLOCK_SIZE initialization parameter to between 2 KB and 32 KB, with a default of 4 KB. If the Oracle database is installed on a journaled file system, then the block size should be a multiple of the file system block size (4 KB on JFS, 16 KB to 1 MB on GPFS). For databases on raw partitions, the Oracle database block size should be a multiple of the operating system physical block size (512 bytes on AIX).
Oracle recommends smaller Oracle database block sizes (2 KB or 4 KB) for online transaction processing (OLTP) or mixed workload environments and larger block sizes (8 KB, 16 KB, or 32 KB) for decision support system (DSS) workload environments.
By increasing the LOG_BUFFER size you might be able to improve the speed of archiving the database, particularly if transactions are long or numerous. Monitor the log file I/O activity and system throughput to determine the optimum LOG_BUFFER size. Tune the LOG_BUFFER parameter carefully to ensure that the overall performance of normal database activity does not degrade.
Note: The LOG_ARCHIVE_BUFFER_SIZE parameter was obsoleted with Oracle8i. |
For high-speed data loading, such as using the SQL*Loader direct path option in addition to loading data in parallel, the CPU spends most of its time waiting for I/O to complete. By increasing the number of buffers, you can usually push the CPU usage harder, thereby increasing overall throughput.
The number of buffers (set by the SQL*Loader BUFFERS parameter) you choose depends on the amount of available memory and how hard you want to push CPU usage. See Oracle Database Utilities for information about adjusting the file processing options string for the BUFFERS parameter.
The performance gains depend on CPU usage and the degree of parallelism that you use when loading data.
See Also: For more generic information about the SQL*Loader utility, see Oracle Database Utilities. |
Set the BUFFER parameter for the Import utility to a large value to optimize the performance of high-speed networks when they are used. For instance, if you use the IBM RS/6000 Scalable POWERparallel Systems (SP) switch, you should set the BUFFER parameter to a value of at least 1 MB.
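For example, a hypothetical import over the SP switch might be invoked as follows; the user credentials, dump file name, and buffer size are placeholder values only:

$ imp userid=system/password file=expdat.dmp full=y buffer=1048576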
Disk I/O contention can result from poor memory management (with subsequent paging and swapping), or poor distribution of tablespaces and files across disks.
Make sure that the I/O activity is distributed evenly across multiple disk drives by using AIX utilities such as filemon, sar, iostat, and other performance tools to identify any disks with high I/O activity.
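For example, you might sample disk activity with commands similar to the following; the interval, count, output file, and trace duration are illustrative values:

$ iostat 10 6
# filemon -o /tmp/filemon.out -O lv,pv; sleep 60; trcstop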
The AIX Logical Volume Manager (LVM) can stripe data across multiple disks to reduce disk contention. The primary objective of striping is to achieve high performance when reading and writing large sequential files. Effective use of the striping features in the LVM allows you to spread I/O more evenly across disks, resulting in greater overall performance.
Note: Do not add logical volumes to Automatic Storage Management (ASM) disk groups. ASM works best when you add raw disk devices to disk groups. If you are using ASM, do not use LVM for striping. Automatic Storage Management implements striping and mirroring. |
When you define a striped logical volume, you must specify the following items:
Item | Recommended Settings |
---|---|
Drives | At least two physical drives. The drives should have minimal activity when performance-critical sequential I/O is executed. Sometimes you might need to stripe the logical volume between two or more adapters. |
Stripe unit size | Although the stripe unit size can be any power of two from 2 KB to 128 KB, stripe sizes of 32 KB and 64 KB are good values for most workloads. For Oracle database files, the stripe size must be a multiple of the database block size. |
Size | The number of physical partitions allocated to the logical volume must be a multiple of the number of disk drives used. |
Attributes | Cannot be mirrored. Set the copies attribute to a value of 1. |
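As an illustration, a striped logical volume spread across four disks with a 64 KB stripe unit size could be created with a command similar to the following; the volume group, logical volume, and disk names, as well as the number of partitions (a multiple of four here), are placeholder values:

# mklv -y oradata01 -S 64K datavg 64 hdisk2 hdisk3 hdisk4 hdisk5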
Note the following considerations when you are deciding whether to use journaled file systems or raw logical volumes:
File systems are continually being improved, as are various file system implementations. In some cases, file systems provide better I/O performance than raw devices.
File systems require some additional configuration (the AIX minservers and maxservers parameters) and add a small CPU overhead because asynchronous I/O on file systems is serviced outside the kernel.
Different vendors implement the file system layer in different ways to exploit the strengths of different disks. This makes it difficult to compare file systems across platforms.
The introduction of more powerful LVM interfaces substantially reduces the tasks of configuring and backing up logical disks based on raw logical volumes.
The Direct I/O and Concurrent I/O feature included in AIX 5L improves file system performance to a level comparable to raw logical volumes.
If you use a journaled file system, it is easier to manage and maintain database files than if you use raw devices. In earlier versions of AIX, file systems supported only buffered read and write and added extra contention because of imperfect inode locking. These two issues are solved by the JFS2 Concurrent I/O feature and the GPFS Direct I/O feature, enabling file systems to be used instead of raw devices, even when optimal performance is required.
Note: To use the Oracle Real Application Clusters option, you must place data files in an ASM disk group on raw devices or on a GPFS file system. You cannot use JFS or JFS2. Direct I/O is implicitly enabled when you use GPFS. |
AIX 5L includes Direct I/O and Concurrent I/O support. Direct I/O and Concurrent I/O support allows database files to exist on file systems while bypassing the operating system buffer cache and removing inode locking operations that are redundant with the features provided by Oracle Database.
Where possible, Oracle recommends enabling Concurrent I/O or Direct I/O on file systems containing Oracle data files. The following table lists file systems available on AIX and the recommended setting.
File System | Option | Description |
---|---|---|
JFS | dio | Concurrent I/O is not available on JFS. Direct I/O (dio) is available, but performance is degraded compared to JFS2 with Concurrent I/O. |
JFS large file | none | Oracle does not recommend using JFS large file for Oracle Database because its 128 KB alignment constraint prevents you from using Direct I/O. |
JFS2 | cio | Concurrent I/O (cio) is a better setting than Direct I/O (dio) on JFS2 because it has support for multiple concurrent readers and writers on the same file. |
GPFS | N/A | Oracle Database silently enables Direct I/O on GPFS for optimum performance. GPFS' Direct I/O already supports multiple readers and writers on multiple nodes. Therefore, Direct I/O and Concurrent I/O are the same thing on GPFS. |
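For example, a JFS2 file system containing Oracle data files can be mounted with Concurrent I/O using a command similar to the following; the device and mount point names are placeholders:

# mount -o cio /dev/oradata_lv /oradata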
Starting with Oracle Database 10g, you can use the FILESYSTEMIO_OPTIONS initialization parameter to configure a database to use either direct I/O or concurrent I/O when accessing datafiles, depending on the file system that is used to store them. When you specify the value SETALL for this parameter:
Datafiles on a JFS file system are accessed using direct I/O
Datafiles on a JFS2 file system are accessed using concurrent I/O
Note: With this setting for the FILESYSTEMIO_OPTIONS parameter, you do not have to use the cio or dio mount options for these file systems to use concurrent or direct I/O. This means that you can place the software files (the Oracle home directory) and the database files on the same file system if required. |
Oracle recommends that you use the value SETALL instead of the value DIRECTIO, because the DIRECTIO value disables asynchronous I/O. The default value for this parameter is ASYNC.
Note: External utilities or tools cannot read datafiles on JFS2 file systems if you use the values SETALL or DIRECTIO for the FILESYSTEMIO_OPTIONS parameter. For more information, refer to the AIX documentation which covers the open() restriction in cio mode. |
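The following sketch shows one way to apply the recommended SETALL value from SQL*Plus; it assumes the instance uses a server parameter file and is restarted afterwards for the change to take effect:

$ sqlplus "/ as sysdba" <<EOF
ALTER SYSTEM SET FILESYSTEMIO_OPTIONS=SETALL SCOPE=SPFILE;
EOF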
Although you do not have to use a separate file system for database files to use direct or concurrent I/O, you can still obtain performance benefits by placing Oracle Database log files on a separate JFS2 file system created using the agblksize=512 option and mounted with the cio option. This delivers logging performance within a few percentage points of the performance of a raw device.
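As an illustration, a dedicated log file system might be created and mounted with commands similar to the following; the volume group name, mount point, and size (given here in 512-byte blocks, equivalent to 4 GB) are placeholder values:

# crfs -v jfs2 -g oravg -m /oralog -a agblksize=512 -a size=8388608
# mount -o cio /oralog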
If you are using GPFS, you can use the same file system for all purposes including the Oracle home directory, database files, and logs. For optimal performance, you should use a large GPFS block size (typically at least 512 KB). GPFS is designed for scalability and there is no requirement to create multiple GPFS file systems as long as the amount of data fits in a single GPFS file system.
To move from a journaled file system to raw devices without having to manually reload all of the data, perform the following steps as the root user:
Create a raw device (preferably in a BigVG) using the new raw logical volume device type (-T O), which allows putting the first Oracle block at offset zero for optimal performance:
# mklv -T O -y new_raw_device VolumeGroup NumberOfPartitions
Note: The raw device should be larger than the existing file. Be sure to size the new raw device carefully to prevent wasting space. |
Set the permissions on the raw device.
Use the dd command to convert and copy the contents of the JFS file to the new raw device, as follows:
# dd if=old_JFS_file of=new_raw_device bs=1m
Note: If you do not use the -T O option with the mklv command, then you must specify an offset in this dd command. Use the $ORACLE_HOME/bin/offset utility to determine the correct offset value. |
Rename the data file.
The first Oracle block on a raw logical volume is not necessarily at offset zero, whereas the first Oracle block on a file system is always at offset zero. To determine the offset and locate the first block on a raw logical volume, use the $ORACLE_HOME/bin/offset command. The offset can be 4096 bytes or zero on AIX logical volumes created with the mklv -T O option.
When you have determined the offset, you can copy over data from a raw logical volume to a file system using the dd command and skipping the offset. The following example assumes an offset of 4096 bytes:
# dd if=old_raw_device bs=4k skip=1|dd of=new_file bs=256k
You can instruct Oracle Database to use a number of blocks smaller than the maximum capacity of a raw logical volume. If you do, you must add a count clause to make sure that only data that contains Oracle blocks is copied. The following example assumes an offset of 4096 bytes, an Oracle block size of 8 KB, and 150000 blocks:
# dd if=old_raw_device bs=4k skip=1|dd bs=8k count=150000|dd of=new_file bs=256k
Oracle Database takes full advantage of asynchronous I/O (AIO) provided by AIX, resulting in faster database access.
AIX 5L supports asynchronous I/O (AIO) for database files created both on file system partitions and on raw devices. AIO on raw devices is implemented fully in the AIX kernel and does not require database processes to service the AIO requests. When using AIO on file systems, the kernel server processes (aioserver) control each request from the time it is taken off the queue until it completes. The kernel server processes are also used for I/O with virtual shared disks (VSDs) and HSDs with FastPath disabled. By default, FastPath is enabled. The number of aioserver processes determines the number of AIO requests that can be executed in the system concurrently, so it is important to tune the number of aioserver processes when using file systems to store Oracle Database data files.
Note: If you are using AIO with VSDs and HSDs with AIO FastPath enabled (the default), the maximum buddy buffer size must be greater than or equal to 128 KB. |
Use one of the following commands to set the number of servers. This applies only when using asynchronous I/O on file systems rather than raw devices:
smit aio
chdev -l aio0 -a maxservers='m' -a minservers='n'
See Also: For more information about SMIT, see the System Management Interface Tool (SMIT) online help, and for more information about the smit aio and chdev commands, see the man pages. |
Note: Starting with AIX 5L version 5.2, there are two AIO subsystems available: Legacy AIO (aio0 ) and POSIX AIO (posix_aio0 ). Oracle Database 10g release 1 (10.1.0) uses Legacy AIO (aio0 ). Both AIO subsystems have the same performance characteristics and the rootpre.sh script that you run before installing Oracle Database enables both. |
Set the minimum value to the number of servers to be started at system boot. Set the maximum value to the number of servers that can be started in response to a large number of concurrent requests. These parameters apply to file systems only; they do not apply to raw devices.
The default value for the minimum number of servers is 1. The default value for the maximum number of servers is 10. These values are usually too low to run Oracle Database on large systems with 4 CPUs or more, if you are not using kernelized AIO. Oracle recommends that you set the parameters to the values listed in the following table:
Parameter | Value |
---|---|
minservers | Oracle recommends an initial value equal to the number of CPUs on the system or 10, whichever is lower. |
maxservers | Starting with AIX 5L version 5.2, this parameter counts the maximum number of AIO servers per CPU, whereas on previous versions of AIX it was a system-wide value. If you are using GPFS, set maxservers to worker1threads divided by the number of CPUs. This is the optimal setting and increasing maxservers will not lead to additional I/O performance. If you are using JFS/JFS2, set the initial value to (10 * number of logical disks / number of CPUs) and monitor the actual number of aioservers started during a typical workload using the pstat or ps commands. If the actual number of active aioservers is equal to the maxservers, then increase the maxservers value. |
maxreqs | Set the initial value to (4 * number of logical disks * queue depth). You can determine the queue depth (typically 3) by running the following command, where device is the name of the disk device: $ lsattr -E -l device |
If the value of the maxservers or maxreqs parameter is set too low, you will see the following warning messages repeated:
Warning: lio_listio returned EAGAIN
Performance degradation may be seen.
You can avoid these errors by increasing the value of the maxservers parameter. To display the number of AIO servers running, enter the following commands as the root user:
# pstat -a | grep -c aios
# ps -k | grep aioserver
Check the number of active AIO servers periodically and change the values of the minservers and maxservers parameters if necessary. The changes take place when the system restarts.
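For example, you might review the current AIO configuration and then raise the limits with commands similar to the following; the values shown are illustrative starting points rather than recommendations, and as noted above the new values do not take effect until the system restarts:

# lsattr -El aio0
# chdev -l aio0 -a minservers=20 -a maxservers=60 -a maxreqs=16384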
I/O Slaves are specialized Oracle processes that perform only I/O. They are rarely used on AIX, as asynchronous I/O is the default and recommended way for Oracle to perform I/O operations on AIX. I/O Slaves are allocated from shared memory buffers. I/O Slaves use a set of initialization parameters, listed in the following table.
Parameter | Range of Values | Default Value |
---|---|---|
DISK_ASYNCH_IO | TRUE/FALSE | TRUE |
TAPE_ASYNCH_IO | TRUE/FALSE | TRUE |
BACKUP_TAPE_IO_SLAVES | TRUE/FALSE | FALSE |
DBWR_IO_SLAVES | 0 - 999 | 0 |
DB_WRITER_PROCESSES | 1-20 | 1 |
Generally, you do not need to adjust the parameters in the preceding table. However, on large workloads, the database writer might become a bottleneck. If it does, increase the value of DB_WRITER_PROCESSES. As a general rule, do not increase the number of database writer processes above one for every two CPUs in the system or partition.
There are times when you need to turn off asynchronous I/O, for example, if instructed to do so by Oracle Support for debugging. You can use the DISK_ASYNCH_IO and TAPE_ASYNCH_IO parameters to switch off asynchronous I/O for disk or tape devices. Because the number of I/O slaves for each process type defaults to zero, by default no I/O Slaves are deployed.
Set the DBWR_IO_SLAVES parameter to a value greater than 0 only if the DISK_ASYNCH_IO or TAPE_ASYNCH_IO parameter is set to FALSE. Otherwise, the database writer process (DBWR) becomes a bottleneck. In this case, the optimal value on AIX for the DBWR_IO_SLAVES parameter is 4.
If Direct I/O or Concurrent I/O is specified using FILESYSTEMIO_OPTIONS or when using raw devices, the file system does not perform any read-ahead on sequential scans. The read ahead is performed by Oracle Database as specified by the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter.
Setting a large value for the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter usually yields better I/O throughput on sequential scans. On AIX, this parameter ranges from 1 to 512, but using a value higher than 16 usually does not provide additional performance gain.
Set this parameter so that its value when multiplied by the value of the DB_BLOCK_SIZE parameter produces a number larger than the LVM stripe size. Such a setting causes more disks to be used.
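For example, with a hypothetical 8 KB database block size, an LVM stripe size of 64 KB, and DB_FILE_MULTIBLOCK_READ_COUNT set to 16, each multiblock read requests 8 KB * 16 = 128 KB, which spans at least two stripe units and therefore engages more than one disk.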
The write behind feature enables the operating system to group write I/Os together up to the size of a partition. Doing this increases performance because the number of I/O operations is reduced. The file system divides each file into 16 KB partitions to increase write performance, limit the number of dirty pages in memory, and minimize disk fragmentation. The pages of a particular partition are not written to disk until the program writes the first byte of the next 16 KB partition. To set the size of the buffer for write behind to eight 16 KB partitions, enter the following command:
# /usr/sbin/ioo -o numclust=8
To disable write behind, enter the following command:
# /usr/sbin/ioo -o numclust=0
The information in this section applies only to file systems, and only when neither Direct I/O nor Concurrent I/O are used.
The Virtual Memory Manager (VMM) anticipates the need for pages of a sequential file. It observes the pattern in which a process accesses a file. When the process accesses two successive pages of the file, the VMM assumes that the program will continue to access the file sequentially, and schedules additional sequential reads of the file. These reads overlap the program processing and make data available to the program sooner. Two VMM thresholds, implemented as kernel parameters, determine the number of pages it reads ahead:
minpgahead
The number of pages read ahead when the VMM first detects the sequential access pattern
maxpgahead
The maximum number of pages that VMM reads ahead in a sequential file
Set the minpgahead and maxpgahead parameters to appropriate values for your application. The default values are 2 and 8 respectively. Use the /usr/sbin/ioo command to change these values. You can use higher values for the maxpgahead parameter in systems where the sequential performance of striped logical volumes is of paramount importance. To set the minpgahead parameter to 32 pages and the maxpgahead parameter to 64 pages, enter the following command as the root user:
# /usr/sbin/ioo -r -o minpgahead=32 -o maxpgahead=64
Set both the minpgahead and maxpgahead parameters to a power of two, for example, 2, 4, 8, ... 512, 1024, and so on.
Disk I/O pacing is an AIX mechanism that allows the system administrator to limit the number of pending I/O requests to a file. This prevents disk I/O intensive processes from saturating the CPU. Therefore, the response time of interactive and CPU-intensive processes does not deteriorate.
You can achieve disk I/O pacing by adjusting two system parameters: the high-water mark and the low-water mark. When a process writes to a file that already has a pending high-water mark I/O request, the process is put to sleep. The process wakes up when the number of outstanding I/O requests falls below or equals the low-water mark.
You can use the smit command to change the high and low-water marks. Determine the water marks through trial and error. Use caution when setting the water marks because they affect performance. Tuning the high and low-water marks has less effect on disk I/O larger than 4 KB.
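As an illustration, the water marks can also be set from the command line by changing the maxpout and minpout attributes of the sys0 device; the values shown here are placeholder starting points only:

# chdev -l sys0 -a maxpout=33 -a minpout=24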
You can determine disk I/O saturation by analyzing the result of iostat, in particular, the percentage of iowait and tm_act. A high iowait percentage combined with high tm_act percentages on specific disks is an indication of disk saturation. Note that a high iowait value alone is not necessarily an indication of an I/O bottleneck.
Oracle Real Application Clusters running on the SP architecture uses VSDs or HSDs as the common storage that is accessible from all instances on different nodes. If an I/O request is to a VSD where the logical volume is local to the node, local I/O is performed. The I/O traffic to VSDs that are not local goes through network communication layers.
For better performance, it is important to minimize remote I/O as much as possible. Redo logs of each instance should be placed on the VSDs that are on local logical volumes. Each instance should have its own undo segments that are on VSDs mapped to local logical volumes if updates and insertions are intensive.
In each session, each user is allowed only one temporary tablespace. The temporary tablespaces should each contain at least one data file local to each of the nodes.
Carefully design applications and databases (by partitioning applications and databases, for instance) to minimize remote I/O.
If you disable mirror write consistency (MWC) for an Oracle data file allocated on a raw logical volume (LV), the Oracle Database crash recovery process uses resilvering to recover after a system crash. This resilvering process prevents database inconsistencies or corruption. During crash recovery, if a data file is allocated on a logical volume with more than one copy, the resilvering process performs a checksum on the data blocks of all of the copies. It then performs one of the following:
If the data blocks in a copy have valid checksums, the resilvering process uses that copy to update the copies that have invalid checksums.
If all copies have blocks with invalid checksums, the resilvering process rebuilds the blocks using information from the redo log file. It then writes the data file to the logical volume and updates all of the copies.
On AIX, the resilvering process works only for data files allocated on raw logical volumes for which MWC is disabled. Resilvering is not required for data files on mirrored logical volumes with MWC enabled, because MWC ensures that all copies are synchronized. If the system crashes while you are upgrading a previous release of Oracle Database that used data files on logical volumes for which MWC was disabled, enter the syncvg command to synchronize the mirrored LV before starting Oracle Database. If you do not synchronize the mirrored LV before starting the database, Oracle Database might read incorrect data from an LV copy.
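For example, a mirrored logical volume can be synchronized with a command similar to the following; the logical volume name is a placeholder:

# syncvg -l ora_mirror_lv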
Note: If a disk drive fails, resilvering does not occur. You must enter the syncvg command before you can reactivate the LV. |
Caution: Oracle supports resilvering for data files only. Do not disable MWC for redo log files. |
The CPU is another system component for which processes might contend. Although the AIX kernel allocates CPU effectively most of the time, many processes compete for CPU cycles. If your system has more than one CPU (SMP), there might be different levels of contention on each CPU.
The default value for the runtime slice of the AIX RR dispatcher is ten milliseconds. Use the schedo command to change the time slice. However, be careful when using this command. A longer time slice causes a lower context-switch rate if the average voluntary switch rate of the applications is lower. As a result, fewer CPU cycles are spent on context switching for a process, and the system throughput should improve.
However, a longer runtime slice can deteriorate response time, especially on a uniprocessor system. The default runtime slice is usually acceptable for most applications. When the run queue is high and most of the applications and Oracle shadow processes are capable of running a much longer duration, you might want to increase the time slice by entering the following command:
# /usr/sbin/schedo -o timeslice=n
In the previous command, choosing a value for n of 0 results in a slice of 10 milliseconds (ms), choosing a value of 1 results in a slice of 20 ms, choosing a value of 2 results in a slice of 30 ms, and so on. The time slice is calculated using the following formula:
(n+1) * 10
Binding certain processes to a processor can improve performance substantially on an SMP system. Processor binding is available and fully functional on AIX 5L.
However, starting with AIX 5L version 5.2, specific improvements in the AIX scheduler allow Oracle Database processes to be scheduled optimally without the need for processor binding. Therefore, Oracle no longer recommends binding processes to processors when running on AIX 5L version 5.2 or later.
Oracle recommends that you use RMAN to back up raw devices. If you do use the dd command to back up raw devices, use it with caution, as follows.
The offset of the first Oracle block on a raw device may be 0 or 4 KB, depending on the device subtype. You can use the offset command to determine the proper offset.
When creating a logical volume, Oracle recommends using an offset of zero, which is possible if you use the -T O option. However, existing raw logical volumes created with earlier versions of Oracle Database typically have a non-zero offset. Use the $ORACLE_HOME/bin/offset command to determine the offset of a logical volume containing an Oracle datafile. The following example shows how to back up and restore a raw device whose first Oracle block is at an offset of 4 KB:
$ dd if=/dev/raw_device of=/dev/rmt0.1 bs=256k
To restore the raw device from tape, enter commands similar to the following:
$ dd if=/dev/rmt0.1 of=/dev/raw_device count=63 seek=1 skip=1 bs=4k
$ mt -f /dev/rmt0.1 bsf 1
$ dd if=/dev/rmt0.1 of=/dev/raw_device seek=1 skip=1 bs=256k
The following sections provide information about Oracle Real Application Clusters.
If you are using an IP-based interconnect, such as Gigabit Ethernet, Oracle Real Application Clusters uses the user datagram protocol (UDP) for interprocess communications on AIX. You can tune UDP kernel parameters to improve Oracle performance. You can modify kernel UDP buffering on AIX by changing the udp_sendspace and udp_recvspace parameters. The udp_sendspace value must always be greater than the value of the Oracle Database DB_BLOCK_SIZE initialization parameter. Otherwise, one or more of the Oracle Real Application Clusters instances will fail at startup.
Set the value of the udp_sendspace parameter to the product of DB_BLOCK_SIZE and DB_FILE_MULTIBLOCK_READ_COUNT. For example, if the value of the DB_BLOCK_SIZE parameter is 16 KB and the value of the DB_FILE_MULTIBLOCK_READ_COUNT parameter is 16, set the udp_sendspace parameter value to 256 KB, that is, 262144.
Set the value of the udp_recvspace parameter to at least four times the value of the udp_sendspace parameter.
You can use the following commands to change the settings:
$ /usr/sbin/no -o udp_sendspace=65536
$ /usr/sbin/no -o udp_recvspace=655360
To make the changes persist after system restart, edit the /etc/rc.net file. The following is an example of the entry in the rc.net file:
if [ -f /usr/sbin/no ] ; then
/usr/sbin/no -o ipqmaxlen=512
/usr/sbin/no -o udp_sendspace=65536
/usr/sbin/no -o udp_recvspace=655360
fi
To monitor the suitability of the udp_recvspace parameter settings, enter the following command:
$ netstat -p udp | grep "socket buffer overflows"
If the number of overflows is not zero, increase the value of the udp_recvspace parameter. You can use the following command to reset error counters before monitoring again:
$ netstat -Zs -p udp
See Also: For more information about AIX tuning parameters, see the AIX 5L Performance Management Guide. |
If you are experiencing a failover time of more than 10 minutes with Transparent Application Failover, consider tuning the network parameters rto_length, rto_low, and rto_high to reduce the failover time.
The lengthy Transparent Application Failover time is caused by a TCP timeout and retransmission problem in which clients connected to a crashed node do not receive acknowledgement from the failed instance. Consequently, the client continues to retransmit the same packet using an Exponential Backoff algorithm (refer to TCP/IP documentation for more information).
On AIX, the default timeout value is set to approximately 9 minutes. You can use the no command to tune this value using the rto_length, rto_low, and rto_high parameters. Using these parameters, you can control how often and how many times a client should retransmit the same packet before it gives up. The rto_low (default is 1 second) and rto_high (default is 64 seconds) parameters control how often to retransmit the packet, while the rto_length (default is 13) parameter controls how many times to retransmit the packet.
For example, using the Exponential Backoff algorithm with the AIX default values, the timeout value is set to approximately 9.3 minutes. However, using the same algorithm, and setting rto_length to 7, the timeout value is reduced to 2.5 minutes.
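For example, the retransmission count could be reduced with a command similar to the following; apply the same caution as for any no parameter and verify network quality first, as described in the note below:

# /usr/sbin/no -o rto_length=7

As with the UDP parameters described earlier, add the command to the /etc/rc.net file if the change should persist across system restarts.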
Note: Check the quality of the network transmission before setting any of the parameters described in this section. To check the quality of the network transmission, use the netstat command. Bad quality network transmissions might require a longer timeout value. |
With Oracle Database 10g, Real Application Clusters (RAC) uses the group services provided by the AIX 5L RSCT Peer Domains (RPD). RAC no longer relies on specific services provided by HACMP. In particular, there is no need to configure the PGSD_SUBSYS variable in the information repository. RAC is compatible with HACMP; however, HACMP is not required for RAC installations. HACMP is typically present when concurrent raw logical volumes are used instead of a GPFS file system. PSSP is not supported for this release of Oracle Database. If you are using an IP-based interconnect, such as Gigabit Ethernet, IEEE 802.3ad, EtherChannel, or IP over SP Switch, RAC determines the interfaces to use from the IP address values specified by the CLUSTER_INTERCONNECTS initialization parameter.
When you install Oracle Cluster Ready Services (CRS), you must specify the IP addresses of the private interconnects on each node. When you install RAC, it uses the private interfaces that you specified during the CRS installation. If you want to use an alternative cluster interconnect for RAC, change the value specified for the CLUSTER_INTERCONNECTS parameter.
When the interconnect (IPC) used by Real Application Clusters is based on the Internet Protocol (IP), RAC takes advantage of the fault tolerance and link aggregation provided by AIX 5L using the IEEE 802.3ad Link Aggregation or EtherChannel technologies. This replaces the Fault Tolerant IPC feature (FT-IPC) that was used in previous versions of Real Application Clusters.
Link Aggregation using 802.3ad provides the same level of fault tolerance and adds support for bandwidth aggregation. It also simplifies the configuration of Real Application Clusters.
The CLUSTER_INTERCONNECTS parameter specifies the IP addresses that RAC should use for the interconnect. This parameter typically contains only the IP address of the interface created through IEEE 802.3ad Link Aggregation or EtherChannel. For more information, refer to the AIX System Management Guide: Communications and Networks: EtherChannel and IEEE 802.3ad Link Aggregation.
A 32-bit application program on AIX can run with one of three different virtual address-space models: the Default Address Space Model, the Large Address Space Model, and the Very Large Address Space Model. The Very Large Address Space Model is available on AIX 5L and later versions. The address-space model that a 32-bit application program runs with determines how AIX assigns the 16 available segments, and consequently how many segments the application has available for heap and shared memory. These address-space models apply only to 32-bit applications; for 64-bit applications, which have access to 2^36 segments, the allocation of segments between data and shared memory is not an issue, because there are plenty of segments available. For 32-bit Oracle clients that require additional heap space, you can set the address-space model according to the needs of your application.
By setting the proper MAXDATA values, the address space can be split between data and shared memory as required for the 32-bit Oracle application. The following table shows examples of some of the options and their effects on Oracle memory usage:
Address Space Model | MAXDATA | Max Data Size | Max Shared Memory Size | Heap Starts at Segment | Shared Library | Comments |
---|---|---|---|---|---|---|
Default | not specified | 256 MB | 2.5 GB | 0x2 | shared | Default oracle linking. |
Large | MAXDATA=0x80000000 | 2 GB | 512 MB | 0x3 | shared | This is the maximum value of MAXDATA allowed in the Large Address Space model. |
Very Large | MAXDATA=0@DSA | 256 MB | 3.0 GB | 0x2 | private | AIX 5L, version 5.2 or later. MAXDATA=0@DSA is a special case which moves the shared library text from 0xC to the 0x2 private text segment. |
Large | MAXDATA=0x20000000 | 512 MB | 2 GB | 0x3 | shared | None. |
Very Large | MAXDATA=0x90000000@DSA | 2.25 GB | 256 MB | 0x3 | shared | AIX 5L version 5.1 or later. |
You can control the address model used by AIX by setting MAXDATA in one of the following ways:
By setting options on the ld command at link time.
By editing the header of the executable.
By exporting the LDR_CNTRL environment variable at run time (overrides the options set using the first or second methods). If you use the LDR_CNTRL environment variable to control the memory options, the same value must be set for all Oracle processes, for example, the Oracle Net listener and SQL*Plus.
To enable the large address space model, which allows your program to have more than 256 MB of user data by allocating it from the shared memory segments, you would normally use the -bmaxdata link flag. For example, the -bmaxdata:0x80000000 option would allow you to use eight 256 MB segments (segments 3-10) for user space (2 GB); this is the maximum and probably not what you would normally need to use.
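As an illustration of the run-time method, the following sketch sets a hypothetical MAXDATA value through the LDR_CNTRL environment variable before starting a 32-bit client program, and then removes it; the value is a placeholder, and start_32bit_oracle_client stands for whichever 32-bit program you are starting:

$ export LDR_CNTRL=MAXDATA=0x40000000@DSA
$ start_32bit_oracle_client
$ unset LDR_CNTRL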
It is also possible to patch an executable to enable the large address space model without relinking. The following script [obsoleted by ldedit] sets the MAXDATA value to 0x80000000, which is the maximum value:
/usr/bin/echo '\0200\0\0\0' | dd of=executable_file_name bs=4 count=1 seek=19 conv=notrunc
In AIX 5L version 5.2, you can use the ldedit command to set the MAXDATA value. For example, enter the following command to change the MAXDATA value to 0x80000000@DSA:
/usr/ccs/bin/ldedit -bmaxdata:0x80000000/dsa a.out
AIX 5L is the latest version of the AIX operating system from IBM. It is designed to exploit advanced 64-bit system and software architectures. A feature in AIX 5L is the option of running the kernel in either 64-bit or 32-bit mode. In AIX 5L version 5.2, the 32-bit kernel is installed by default, but there is an option to select the 64-bit kernel at installation time, or to select it after the installation. For information about changing the AIX kernel mode, refer to OracleMetaLink Note:169426.1.
To run a 64-bit kernel or application, you need 64-bit hardware. To determine if your hardware supports 64-bit mode, enter the following command:
$ /usr/bin/getconf HARDWARE_BITMODE
If the system output returns the number 64, it can run 64-bit applications or the 64-bit kernel, or both.
To determine the mode of a kernel already running, enter the following command:
$ /usr/bin/getconf KERNEL_BITMODE
If the system returns the number 64, the 64-bit kernel is running.
Note: Oracle Database 10g runs only on 64-bit hardware. However, you can install it on systems using either the 32-bit or 64-bit kernel. |
In AIX 5L, the bit-mode (32-bit or 64-bit) of the application and kernel are independent. The only requirement to run 64-bit applications is that the hardware supports 64-bit mode. You can run 64-bit applications using either the 32-bit or 64-bit kernel, or 32-bit applications using either the 32-bit or 64-bit kernel.
If the application requires the use of a kernel extension, then the bit-mode of the extension and the kernel must match. Oracle Database 10g does not require a kernel extension.
On smaller systems, both bit-modes of the kernel provide similar performance, however, the 64-bit kernel provides enhanced functionality and scalability. The 64-bit kernel addresses bottlenecks which could limit throughput on 32-bit systems. The 64-bit kernel also improves scalability by allowing you to use larger sizes of physical memory. The 32-bit kernel is limited to 96 GB of physical memory. The 64-bit kernel is optimized for running 64-bit applications on POWER4 systems.
A logical partition (LPAR) is the division of a system’s processors, memory, and I/O adapters into multiple environments, where each environment can be operated independently with its own operating system and applications. Logical partitioning on POWER4 pSeries servers uses hardware partitioning.
Dynamic Logical Partitioning (DLPAR) extends the capability of LPAR by providing the ability to logically attach and detach a managed system's resources (like processors, memory, or I/O adapters) to and from a hardware partition without its operating system being rebooted.
Oracle 10g can run in an AIX DLPAR logical partition and can take advantage of additional resources, such as processors, memory or I/O, which are dynamically added to the system. Also, if resources such as processors or memory are no longer required to meet the performance requirement of the instance, they can be removed while the Oracle instance continues to run and process transactions. When the instance starts, you need to set the Oracle dynamic SGA parameters (such as SGA_MAX_SIZE) correctly, so that the SGA size can be increased dynamically to take advantage of additional memory which might be added to the DLPAR.
Each partition must have at least one processor, 256 MB memory, one I/O adapter associated with boot device, and a network adapter.
The movement of an LPAR resource (processor, memory, or I/O adapter) from one hardware partition to another is managed with the Hardware Management Console (HMC). The HMC is a dedicated appliance connected to the managed system through a serial connection to perform hardware management tasks. The resource movement requests can be initiated either through selections made on a graphical user interface of the HMC, which can be accessed locally from the console itself, or from a remote Web-based System Manager (WebSM) client. These resource requests can also be initiated by issuing HMC command line functions.
For more information about creating and managing LPARs and moving resources across partitions, refer to the eServer Hardware Management Console for pSeries Installation and Operations Guide, document number SA38-0590 from the following Web site:
http://publib16.boulder.ibm.com/pseries/en_US/infocenter/base/HW_hmc.htm