Oracle® Application Server 10g High Availability Guide
10g (9.0.4) Part No. B10495-02
This release of Oracle Application Server 10g (9.0.4) improves and extends the high availability solutions for Oracle Application Server. Several new solutions for the Oracle Application Server 10g Infrastructure have been tested and are described in this book. All of these solutions seek to ensure that applications that you deploy on Oracle Application Server 10g meet the availability required to achieve your business goals. The solutions and procedures described in this book seek to eliminate single points of failure in Oracle Application Server components, with little or no outage in service.
This chapter explains high availability and its importance from the perspective of Oracle Application Server.
The availability of a system or any component in that system is defined by the percentage of time that it works normally. A system works normally when it meets its correctness and performance specifications. For example, a system that works normally for twelve hours per day is 50% available. A system that has 99% availability is down 3.65 days per year on average. System administrators can expect critical systems to have 99.99% or even 99.999% availability. At 99.999% availability, a system experiences only about five minutes of downtime per year; at 99.99%, about 53 minutes.
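The relationship between an availability percentage and yearly downtime can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `downtime_minutes_per_year` is an illustrative assumption, and the calculation uses a 365-day year.

```python
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY  # assumes a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the expected downtime per year, in minutes."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


# Print the downtime for the availability levels mentioned in the text.
for pct in (50.0, 99.0, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    days = minutes / MINUTES_PER_DAY
    print(f"{pct}% available: {minutes:.1f} minutes/year ({days:.2f} days)")
```

For 99% availability this gives 3.65 days per year, and for 99.999% it gives a little over five minutes per year.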
Availability may not be constant over time. For example, availability may be higher during the daytime when most transactions occur, and lower during the night and on weekends. In the event of an unexpected disaster, such as a fire or earthquake, a system may go down suddenly for a period of time. However, because the Internet provides a global set of users, it is a common requirement that systems always be available.
Redundant components can improve availability, but only if a spare component takes over immediately for a failed component. If it takes ten minutes to detect a component failure and twenty additional minutes to start the spare component, then the system experiences a 50% reduction in availability for that hour of service.
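The effect of detection and restart time on availability can be worked through directly. This sketch assumes a single failure within the measured window and that the service is completely unavailable until the spare component is running; the function name is hypothetical.

```python
def availability_during_window(window_minutes: float,
                               detect_minutes: float,
                               restart_minutes: float) -> float:
    """Return percent availability for a window containing one failure.

    Assumes the service is fully down from the moment of failure until
    the spare has started (detection time plus restart time).
    """
    outage = min(detect_minutes + restart_minutes, window_minutes)
    return 100 * (window_minutes - outage) / window_minutes


# The example from the text: 10 minutes to detect, 20 more to start the spare.
print(availability_during_window(60, 10, 20))
# A faster failover: 10 seconds to detect, 20 seconds to start the spare.
print(availability_during_window(60, 10 / 60, 20 / 60))
```

The first call yields 50% for the hour, matching the example above. The second shows how much of that loss is recovered when failover completes in seconds rather than minutes.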
Oracle Application Server is designed to provide a wide variety of high availability solutions, ranging from load balancing and basic clustering to providing maximum system availability during catastrophic hardware and software failures.
Oracle Application Server consists of many components that can be deployed in distributed topologies. The underlying paradigm used to enable high availability for Oracle Application Server is clustering, which unites various Oracle Application Server components in certain permutations to offer scalable and unified functionality, and redundancy should any of the individual components fail.
Before you continue, we recommend that you read the book Oracle Application Server 10g Concepts to gain an understanding of the different components in Oracle Application Server. The descriptions there will allow you to understand the rest of the text in this guide more efficiently.
Oracle Application Server has several solutions and techniques to achieve high availability, which are all described in this guide. They allow you to achieve the following goals:
Redundancy
A highly available system requires its sub-systems to be redundant. All Oracle Application Server components can be deployed redundantly using the procedures and solutions described in this book. Depending on the type of components, they can be deployed in an active-active configuration or active-passive configuration.
In active-active configuration, multiple instances of a component service client requests at the same time. If one instance fails, the requests being serviced by that instance can be fulfilled by other active instances; the failure and failover of that instance is transparent to clients. An active-active configuration can usually be achieved by clustering instances of components together.
In active-passive configuration, requests are usually serviced by one instance of a component. Upon failure of that component, another instance is made active to respond to the request workload.
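The difference between the two configurations can be illustrated with a small routing model. This is a conceptual sketch only, not how Oracle Application Server routes requests; the class names, instance names, and round-robin policy are all assumptions made for illustration.

```python
class ActiveActive:
    """All healthy instances serve requests; here, in round-robin order."""

    def __init__(self, instances):
        self.healthy = list(instances)
        self._next = 0

    def fail(self, name):
        # A failed instance is removed; the remaining ones keep serving.
        self.healthy.remove(name)

    def route(self):
        if not self.healthy:
            raise RuntimeError("no healthy instance available")
        name = self.healthy[self._next % len(self.healthy)]
        self._next += 1
        return name


class ActivePassive:
    """One instance serves requests; a standby is promoted on failure."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def fail_active(self):
        # Promote the standby; no further standby remains afterwards.
        self.active, self.standby = self.standby, None

    def route(self):
        if self.active is None:
            raise RuntimeError("no active instance available")
        return self.active


aa = ActiveActive(["instance1", "instance2"])
print([aa.route() for _ in range(4)])  # requests alternate between instances
aa.fail("instance1")
print(aa.route())                      # surviving instance takes the load

ap = ActivePassive("primary", "standby")
print(ap.route())
ap.fail_active()
print(ap.route())                      # standby now serves requests
```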
Death Detection and Auto Restart
Software processes belonging to Oracle Application Server components, local or distributed, are managed by a central process management system. This system is able to detect the death of processes and restart them even if they are distributed over multiple machines. The system allows customization of parameter values that define process death and restart (such as number of heartbeats). The processes implementing the process management system are themselves redundant as each has a shadow process.
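Heartbeat-based death detection can be sketched as follows. This is an illustrative model under stated assumptions, not the Oracle process management implementation: the `Supervisor` class, the `max_missed` parameter, and the process names are hypothetical, and a "restart" is simulated by incrementing a counter.

```python
from dataclasses import dataclass


@dataclass
class ManagedProcess:
    name: str
    missed_heartbeats: int = 0
    restarts: int = 0


class Supervisor:
    """Declares a process dead after `max_missed` silent intervals."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed  # hypothetical tunable threshold
        self.processes = {}

    def register(self, name):
        self.processes[name] = ManagedProcess(name)

    def heartbeat(self, name):
        # A heartbeat from a live process resets its missed counter.
        self.processes[name].missed_heartbeats = 0

    def tick(self):
        """Run once per heartbeat interval; return names that were restarted."""
        restarted = []
        for proc in self.processes.values():
            proc.missed_heartbeats += 1
            if proc.missed_heartbeats >= self.max_missed:
                proc.restarts += 1          # simulated restart
                proc.missed_heartbeats = 0
                restarted.append(proc.name)
        return restarted


supervisor = Supervisor(max_missed=3)
supervisor.register("http_listener")
supervisor.register("j2ee_container")

# Only http_listener reports in; j2ee_container stays silent.
for interval in range(3):
    supervisor.heartbeat("http_listener")
    print(interval, supervisor.tick())
```

In this run the silent process is restarted on the third interval. Raising `max_missed` makes detection slower but more tolerant of a briefly delayed heartbeat.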
Clustering
Clustering components of a system together allows the components to be viewed functionally as a single entity from the perspective of a client. A cluster increases the scalability, availability, and manageability of the components.
Several types of clusters can exist with Oracle Application Server components. Procedures to create and configure these clusters are comprehensively documented in this book.
State Replication and Routing
For stateful client requests, Oracle Application Server can replicate client state in order to enable stateful failover of requests in the event that processes servicing these requests fail. For J2EE applications, client state can be replicated declaratively or programmatically, depending on the mechanism being used. For most other components, state-based routing using cookies is available.
Connection Failure Management
Clients often connect to services on the server and reuse these connections. When a process implementing one of these services on the server is restarted, the connection may need to be re-established.
Oracle Application Server components ensure that if a reused connection fails, the connection is retried before a failure condition is propagated to the rest of the system. This makes such failures transparent to clients.
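The retry-before-propagate pattern can be sketched as follows. This is a minimal illustration, not the logic used inside any Oracle Application Server component; `FlakyConnection`, `RetryingClient`, and the `max_retries` parameter are hypothetical names.

```python
class FlakyConnection:
    """Hypothetical connection that becomes stale when the server restarts."""

    def __init__(self):
        self.alive = True

    def send(self, request):
        if not self.alive:
            raise ConnectionError("stale connection")
        return f"ok:{request}"


class RetryingClient:
    """Reuses one connection and reconnects before reporting a failure."""

    def __init__(self, connect, max_retries=1):
        self._connect = connect        # factory that opens a new connection
        self._max_retries = max_retries
        self._conn = connect()
        self.reconnects = 0

    def send(self, request):
        for attempt in range(self._max_retries + 1):
            try:
                return self._conn.send(request)
            except ConnectionError:
                if attempt == self._max_retries:
                    raise              # retries exhausted: propagate failure
                self._conn = self._connect()
                self.reconnects += 1


client = RetryingClient(FlakyConnection)
print(client.send("first"))

client._conn.alive = False             # simulate a server process restart
print(client.send("second"))           # succeeds after one reconnect
print(client.reconnects)
```

A retry of this kind is generally safe only when resending the request cannot cause duplicate side effects.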
Backup and Recovery
Oracle Application Server provides facilities for backing up system state and using this backup to recover from failures. In certain circumstances, a component or system failure may not be repairable. The Oracle Application Server Backup and Recovery Tool can be used to back up the system at certain intervals and restore a backup when an unrepairable failure occurs.
For specific problems localized to the HTTP listener and J2EE container, a runtime configuration management system allows these components to be checkpointed quickly and also allows configuration errors to be undone.
Disaster Recovery
Natural and physical disasters can happen to areas where an Oracle Application Server site hosting critical applications is physically located. A solution for recovering from such disasters is documented in this guide. This solution is a site-to-site recovery solution that allows the backing up of the state of an entire Oracle Application Server site and recovering it to another site that is physically distant from the first.
Table 1-1 depicts the various types of failures that are possible with the Oracle Application Server system and the strategies that are used to prevent or solve the failures. For the purpose of discussion, maintenance activities during planned downtime are also included.
Table 1-1 System downtime, failures, and availability solutions
As depicted, solutions exist to prevent or recover from problems ranging from unplanned system failures to unintentional human errors. These solutions enable Oracle Application Server to be robust and reliable, and to offer high availability to the applications that it hosts.
This guide has been organized into several chapters using the layers of the middle tier and Oracle Application Server Infrastructure as a baseline. When the term "middle tier" is mentioned in this book, the reference is made generically to the Oracle Application Server middle tier installation types. However, where Oracle Application Server Clusters are discussed, only the J2EE and Web Cache installation type is implied, as this is the only middle tier installation type that can be part of an Oracle Application Server Cluster.
Chapters 2 and 3 contain the description and configuration of the middle tier for high availability, respectively. Chapters 4 and 5 have a similar organization of information but for the Infrastructure. Chapter 6 contains the setup and operational information for the site-to-site Oracle Application Server Disaster Recovery solution.
The following table provides a list of cross-references to high availability information in other documents in the Oracle library. This information mostly pertains to high availability of various Oracle Application Server components.
Table 1-2 Cross-references to high availability information in Oracle documentation
Component | Location of Information |
---|---|
Overall high availability concepts | In the high availability chapter of Oracle Application Server 10g Concepts. |
Oracle installer | In the chapter for installing in a high availability environment in Oracle Application Server 10g Installation Guide. |
Oracle Application Server Backup and Recovery Tool | In the backup and restore part of Oracle Application Server 10g Administrator's Guide. |
Oracle Application Server Web Cache | Oracle Application Server Web Cache Administrator's Guide |
Identity Management service replication | In "Advanced Configurations" chapter of Oracle Application Server Single Sign-On Administrator's Guide. |
Identity Management high availability deployment | In "Directory Replication and High Availability" chapter of Oracle Internet Directory Administrator's Guide. In "Oracle Identity Management Deployment Planning" chapter of Oracle Identity Management Concepts and Deployment Planning Guide. |
Database high availability | Oracle High Availability Architecture and Best Practices |
Distributed Configuration Management commands | Distributed Configuration Management Reference Guide |
Oracle Process Management and Notification commands | Oracle Process Manager and Notification Server Administrator's Guide |
OC4J high availability | Oracle Application Server Containers for J2EE Services Guide; Oracle Application Server Containers for J2EE User's Guide; Oracle Application Server Containers for J2EE Enterprise JavaBeans Developer's Guide |
Java Object Cache | Oracle Application Server Web Services Developer's Guide |
Load balancing to OC4J processes | Oracle HTTP Server Administrator's Guide |
Oracle Application Server Wireless high availability | Oracle Application Server Wireless Administrator's Guide |
Oracle Application Server Reports Services high availability | Oracle Application Server Reports Services Publishing Reports to the Web |
Oracle Application Server Discoverer high availability | Oracle Application Server Discoverer Configuration Guide |
Oracle Application Server Forms Services high availability | Oracle Application Server Forms Services Deployment Guide |
Oracle Application Server InterConnect ini file information | Oracle Application Server InterConnect User's Guide |
In addition, references to these and other documentation are noted in the text of this guide, where applicable.