Oracle® Fusion Middleware Release Notes 11g Release 1 (11.1.1) for Linux x86 Part Number E10133-04
This chapter describes issues associated with Oracle Fusion Middleware high availability and enterprise deployment. It includes the following topics:
This section describes general issues and workarounds. It includes the following topics:
Section 6.1.1, "Discoverer Managed Server Starts in Admin Mode on Unpacked Machine"
Section 6.1.2, "mod_wl Not Supported for OHS Routing to Managed Server Cluster"
If you use the pack and unpack commands to replicate a Managed Server for Oracle Portal, Oracle Forms, Oracle Reports, or Oracle Discoverer in a cluster, make sure to copy the applications from the first node to the second node (because these are externally staged applications). For details about using the pack and unpack commands, see Oracle WebLogic Server Creating Templates and Domains Using the Pack and Unpack Commands.
Oracle Fusion Middleware supports only mod_wl_ohs and does not support mod_wl for Oracle HTTP Server routing to a cluster of Managed Servers.
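For reference, routing to a Managed Server cluster with mod_wl_ohs is configured with the WebLogicCluster directive in mod_wl_ohs.conf. This is a sketch only; the host names, ports, and location path below are placeholders, not values from this deployment:

```
<Location /soa-infra>
  SetHandler weblogic-handler
  WebLogicCluster apphost1.example.com:8001,apphost2.example.com:8001
</Location>
```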
For Oracle Fusion Middleware high availability deployments, Oracle strongly recommends following only the configuration procedures documented in the Oracle Fusion Middleware High Availability Guide and the Oracle Fusion Middleware Enterprise Deployment Guides.
This section describes configuration issues and their workarounds. It includes the following topics:
Section 6.2.1, "XEngine Not Installed on Second Node in a Clustered Environment"
Section 6.2.2, "jca.retry.count Doubled in a Clustered Environment"
Section 6.2.4, "WebLogic Server Restart after Abrupt Machine Failure"
Section 6.2.5, "Port Translation with Oracle Portal Loopback"
Section 6.2.6, "Fusion Middleware Control May Display Incorrect Status"
Section 6.2.7, "Accumulated BPEL Instances Cause Performance Decrease"
Section 6.2.8, "Extra Message Enqueue when a Cluster Server is Brought Down and Back Up"
Section 6.2.9, "Duplicate Unrecoverable Human Workflow Instance Created with Oracle RAC Failover"
Section 6.2.11, "Load Balancer Issue when Two Nodes Each have a Managed Server"
Section 6.2.12, "No High Availability Support for SOA B2B TCP/IP"
In a clustered environment, the XEngine does not get installed on the second node when the node is on another computer. This is because the XEngine extraction occurs only when you run the Configuration Wizard (which is not run automatically on the second node). The workaround is to perform the XEngine extraction manually in this case. After completing the XEngine extraction, you must restart the server.
In a clustered environment, each node maintains its own in-memory HashMap for inbound retry. The jca.retry.count property is specified as 3 for the inbound retry feature. However, each node tries three times, so the total retry count becomes 6 if the clustered environment has two nodes.
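Because the retry map is per node, the effective number of delivery attempts scales with the cluster size. A minimal sketch of the arithmetic (the function name is illustrative, not part of the adapter API):

```python
def effective_retries(jca_retry_count, cluster_nodes):
    """Worst-case total retry attempts across the cluster.

    Each node keeps its own in-memory retry map, so the configured
    jca.retry.count applies per node rather than per cluster.
    """
    return jca_retry_count * cluster_nodes


# jca.retry.count = 3 on a two-node cluster yields 6 total retries.
print(effective_retries(3, 2))  # 6
```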
All the machines in a cluster must be in the same time zone. WAN clusters are not supported by Oracle Fusion Middleware high availability. Even machines in the same time zone may have issues when servers are started from the command line. Oracle recommends using Node Manager to start the servers.
If Oracle WebLogic Server does not restart after abrupt machine failure when JMS messages and transaction logs are stored on an NFS-mounted directory, the following errors may appear in the server log files:
<MMM dd, yyyy hh:mm:ss a z> <Error> <Store> <BEA-280061> <The persistent store "_WLS_server_soa1" could not be deployed:
weblogic.store.PersistentStoreException: java.io.IOException:
[Store:280021]There was an error while opening the file store file "_WLS_SERVER_SOA1000000.DAT"
weblogic.store.PersistentStoreException: java.io.IOException:
[Store:280021]There was an error while opening the file store file "_WLS_SERVER_SOA1000000.DAT"
        at weblogic.store.io.file.Heap.open(Heap.java:168)
        at weblogic.store.io.file.FileStoreIO.open(FileStoreIO.java:88)
If an abrupt machine failure occurs, WebLogic Server restart or whole server migration may fail if the transaction logs or JMS persistence store directory is mounted using NFS. WebLogic Server maintains locks on the files used for storing JMS data and transaction logs to protect against potential data corruption if two instances of the same WebLogic Server are accidentally started. Because the NFS protocol is stateless and the storage device does not become aware of the machine failure, the locks are not released by the storage device. As a result, after an abrupt machine failure followed by a restart, any subsequent attempt by WebLogic Server to acquire locks on the previously locked files may fail. Refer to your storage vendor documentation for additional information on the locking of files stored in NFS-mounted directories on the storage device.
Use one of the following two solutions to unlock the logs and data files.
Solution 1
Manually unlock the logs and JMS data files and start the servers by creating a copy of the locked persistence store file and using the copy for subsequent operations. To create a copy of the locked persistence store file, rename the file, and then copy it back to its original name. The following sample steps assume that transaction logs are stored in the /shared/tlogs directory and JMS data is stored in the /shared/jms directory.
cd /shared/tlogs
mv _WLS_SOA_SERVER1000000.DAT _WLS_SOA_SERVER1000000.DAT.old
cp _WLS_SOA_SERVER1000000.DAT.old _WLS_SOA_SERVER1000000.DAT
cd /shared/jms
mv SOAJMSFILESTORE_AUTO_1000000.DAT SOAJMSFILESTORE_AUTO_1000000.DAT.old
cp SOAJMSFILESTORE_AUTO_1000000.DAT.old SOAJMSFILESTORE_AUTO_1000000.DAT
mv UMSJMSFILESTORE_AUTO_1000000.DAT UMSJMSFILESTORE_AUTO_1000000.DAT.old
cp UMSJMSFILESTORE_AUTO_1000000.DAT.old UMSJMSFILESTORE_AUTO_1000000.DAT
With this solution, the WebLogic file locking mechanism continues to provide protection from any accidental data corruption if multiple instances of the same servers are accidentally started. However, the servers must be restarted manually after abrupt machine failures. File stores create multiple consecutively numbered .DAT files when they are used to store large amounts of data. All files may need to be copied and renamed when this occurs.
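When a store has rolled over into several consecutively numbered .DAT files, the rename-and-copy step must be applied to each file. A sketch of that loop, assuming the directory layout shown above (the helper name is illustrative, not an Oracle tool):

```python
import shutil
from pathlib import Path


def release_nfs_locks(store_dir):
    """Rename each file store segment and copy it back under its
    original name, so subsequent opens use a fresh file and do not
    hit the stale NFS lock. The .old files are kept as backups.
    """
    released = []
    for dat in sorted(Path(store_dir).glob("*.DAT")):
        backup = dat.with_name(dat.name + ".old")
        dat.rename(backup)            # mv X.DAT X.DAT.old
        shutil.copyfile(backup, dat)  # cp X.DAT.old X.DAT
        released.append(dat.name)
    return released
```

Run this against both the transaction log and JMS store directories (for example, /shared/tlogs and /shared/jms) before restarting the servers.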
Solution 2
Disable WebLogic file locking by disabling the native I/O wlfileio2 driver. The following sample steps move the shared object for the driver to a backup location, effectively removing it.
cd WL_HOME/server/native/platform/cpu_arch
mv libwlfileio2.so /shared/backup
With this solution, because WebLogic file locking is disabled, automated server restarts and failovers succeed. However, it may result in performance degradation, so be very cautious when using it. Always configure the database-based leasing option, which enforces an additional locking mechanism using database tables and prevents automated restart of more than one instance of the same WebLogic Server. Additional procedural precautions must be implemented to avoid any human error and to ensure that one and only one instance of a server is manually started at any given point in time. Similarly, take extra precautions to ensure that no two domains have a store with the same name that references the same directory.
In a high availability Portal implementation, it is often required to configure the Parallel Page Engine to loopback requests through a load balancer. When configuring the load balancer for Portal Loopback ensure that it is not configured with Port Translation. For example:
The correct configuration: Load Balancer Listens for Requests on Port 7777 and Passes them onto Web Cache Port 7777.
The incorrect configuration: Load Balancer Listens for Requests on Port 8888 and Passes them onto Web Cache Port 7777.
In some instances, Fusion Middleware Control may display the incorrect status of a component immediately after the component has been restarted or failed over.
In a scaled-out clustered environment, a large number of BPEL instances accumulated in the database causes database performance to decrease, and the following error is generated: MANY THREADS STUCK FOR 600+ SECONDS.
To avoid this error, remove old BPEL instances from the database.
In a non-XA environment, MQSeries Adapters do not guarantee exactly-once delivery of messages from inbound adapters to the endpoint when a local transaction is used. In this scenario, if an inbound message is published to the endpoint and the SOA server is brought down before the transaction commits, the inbound message is rolled back, and the same message is dequeued again and published to the endpoint. This creates an extra message in the outbound queue.
In an XA environment, MQ Messages are actually not lost but held by Queue Manager due to an inconsistent state. To retrieve the held messages, restart the Queue Manager.
As soon as Oracle Human Workflow commits its transaction, control passes back to BPEL, which almost instantaneously commits its transaction. If the Oracle RAC instance goes down within this window, the message is retried on failover, which can cause duplicate tasks. The duplicate task can show up in two ways: either a duplicate task appears in worklistapp, or an unrecoverable BPEL instance is created. This BPEL instance appears in BPEL Recovery. It is not possible to recover this BPEL instance as a consumer, because the task has already completed.
The following information refers to Chapter 10, "Managing the Topology," of the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle SOA Suite.
When performing a planned stop of the Administration Server's node (rebooting or shutting down the Administration Server's machine), the OS NFS service may be stopped before the Administration Server itself is stopped. Depending on the configuration of services at the OS level, this can cause the detection of missing files in the Administration Server's domain directory and trigger their deletion in the domain directories on other nodes, which can result in the framework deleting some of the files under domain_dir/fmwconfig/. This behavior is typically not observed for unplanned downtimes, such as machine panic, power loss, or machine crash. To avoid this behavior, shut down the Administration Server before performing reboots or, alternatively, use the appropriate OS configuration to order services so that the NFS service is stopped after the Administration Server's process. See your OS administration documentation for the required configuration of the services' order.
If the cluster configuration is made up of the following:
Unicast messaging for cluster communication
Clustered Servers running on different physical machines
No ListenAddress specified for the servers in the cluster
Be sure to do the following:
Define a custom network channel for cluster-broadcast protocol on each of the clustered servers. The channel must have the same name on each server.
Set its listen address and port to an IP address and port of the machine where the server is running.
Set the Unicast Broadcast Channel name for the cluster to be the newly defined custom channel. This channel should be outbound-enabled.
High availability failover support is not available for the SOA B2B TCP/IP protocol. This primarily affects deployments using HL7 over MLLP. For inbound communication in a clustered environment, all B2B servers are active, and the address exposed for inbound traffic is a load balancer virtual server. In an outage scenario where an active managed server is no longer available, the persistent TCP/IP connection is lost and the client is expected to reestablish the connection.
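Because the persistent connection is not failed over, MLLP clients need reconnect logic of their own. A minimal client-side sketch (the function name and retry policy are assumptions, not part of the B2B product):

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=2.0):
    """Re-establish a persistent TCP connection after the remote
    endpoint (for example, a failed-over managed server) drops it.
    Retries the connect a fixed number of times before giving up.
    """
    last_err = None
    for _ in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=5)
        except OSError as err:
            last_err = err
            time.sleep(delay)
    raise ConnectionError("could not reach %s:%d" % (host, port)) from last_err
```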
The routing of requests from Oracle HTTP Server to the composite endpoints exposed in the WLS_SOA servers begins as soon as the WLS_SOA servers change to "running" status. However, the soa-infra application may still be unavailable while the WLS_SOA server is running, and composite deployment and syncing across a cluster can take some time after the server starts. This may lead to Oracle HTTP Server routing to the soa-infra context URL while the required composites are not yet available on a server that is being started (after a failover, server migration, or simple restart on the node). Oracle recommends including the appropriate retry code in clients when invoking the endpoints to overcome the possible HTTP 404 error codes. After the full composite syncing completes, the errors stop.
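One way to implement that client-side retry, assuming the endpoint invocation is wrapped in a callable that returns an HTTP status code (the names and defaults here are illustrative, not prescribed by the product):

```python
import time


def invoke_with_retry(call, retries=5, delay=2.0, retry_statuses=(404, 503)):
    """Retry a composite endpoint invocation while soa-infra is still
    deploying after a restart, failover, or server migration.
    `call` performs one HTTP request and returns its status code.
    """
    status = None
    for _ in range(retries):
        status = call()
        if status not in retry_statuses:
            return status
        time.sleep(delay)
    return status


# Simulated endpoint: 404 while composites sync, then 200 once ready.
responses = iter([404, 404, 200])
print(invoke_with_retry(lambda: next(responses), delay=0.0))  # 200
```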
This section describes documentation errata. It includes the following topics:
Section 6.3.1, "Documentation Errata for the Fusion Middleware High Availability Guide"
Section 6.3.2, "Documentation Errata for the Fusion Middleware Enterprise Deployment Guides"
This section contains Documentation Errata for Oracle Fusion Middleware High Availability Guide.
Several manuals in the Oracle Fusion Middleware 11g documentation set have information on Oracle Fusion Middleware system requirements, prerequisites, specifications, and certification information.
The latest information on Oracle Fusion Middleware system requirements, prerequisites, specifications, and certification information can be found in the following documents on Oracle Technology Network:
Oracle Fusion Middleware System Requirements and Specifications at:
http://www.oracle.com/technology/software/products/ias/files/fusion_certification.html
This document contains information related to hardware and software requirements, minimum disk space and memory requirements, and required system libraries, packages, or patches.
Oracle Fusion Middleware Certification information at:
http://www.oracle.com/technology/software/products/ias/files/fusion_certification.html
This document contains information related to supported installation types, platforms, operating systems, databases, JDKs, and third-party products.
Section 5.11.1.5, "Synchronizing System Clocks," of the Oracle Fusion Middleware High Availability Guide, states that "Oracle recommends synchronizing system clocks on each of the cluster nodes for high availability SOA deployments."
Synchronizing system clocks in a cluster is not a recommendation; it is a mandatory requirement.
Oracle Access Manager components use proprietary protocols called Oracle Access Protocol (OAP) and Oracle Identity Protocol (OIP) to communicate with each other.
Oracle Access Protocol (OAP) enables communication between Access System components (for example, Policy Manager, Access Manager, and WebGate) during user authentication and authorization. This protocol was formerly known as NetPoint Access Protocol (NAP) or COREid Access Protocol.
Oracle Identity Protocol (OIP) governs communications between Identity System components (for example, Identity Server, WebPass) and a Web server. This protocol was formerly known as NetPoint Identity Protocol (NIP) or COREid Identity Protocol.
In the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle Identity Management, Section 1.4.2 "Understanding the Application Tier" includes a reference to the NAP (Network Access Protocol) port, which should be a reference to the Oracle Access Protocol (OAP) port. This section also includes a reference to the NIP (Network Identity Protocol) port, which should be a reference to the Oracle Identity Protocol (OIP) port.
In the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle Identity Management, Section 1.4.3 "Understanding the Web Tier" includes a reference to the Network Access Protocol (NAP), which should be a reference to the Oracle Access Protocol (OAP).
In the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle Identity Management, Table 2-2 "Ports Used in the Oracle Identity Management Enterprise Deployment Topology" includes two references to the NAP protocol, which should be references to the OAP protocol. This table also includes two references to the NIP protocol, which should be references to the OIP protocol.
In the Oracle Fusion Middleware High Availability Guide, before section "12.6.5.6.4 Restart WLS_REPORTS and WLS_REPORTS1," the following procedure for creating an Oracle Reports server cluster is missing:
By creating a Reports cluster with a database reports queue it is possible to link all of the Reports servers to the same queue. The benefit of this procedure is that when a server has spare capacity, it can take and execute the next report in the queue, thereby distributing the load. It also ensures that if a cluster member becomes unavailable, another Reports server can detect this and run any reports on which the failed server was working.
Create a Reports cluster by adding a cluster entry to the rwservlet.properties file on both APPHOST1 and APPHOST2.
Cluster APPHOST1
Edit the rwservlet.properties file located in the DOMAIN_HOME/user_projects/domains/ReportsDomain/servers/WLS_REPORTS/stage/reports/reports/configuration directory.
Add the following line:
<cluster clustername="cluster_reports" clusternodes="rep_wls_reports1_APPHOST2_reports2"/>
Note: The value of clusternodes is the value which appears in the <server> tag in the rwservlet.properties file located on APPHOST2.

Note: The clusternodes parameter should list all of the Reports servers in the cluster (comma separated) EXCEPT the local Reports server.

Cluster APPHOST2
Edit the rwservlet.properties file located in the DOMAIN_HOME/user_projects/domains/ReportsDomain/servers/WLS_REPORTS1/stage/reports/reports/configuration directory.
Add the following line:
<cluster clustername="cluster_reports" clusternodes="rep_wls_reports_APPHOST1_reports1"/>
Note: The value of clusternodes is the value which appears in the <server> tag in the rwservlet.properties file located on APPHOST1.

Note: The clusternodes parameter should list all of the Reports servers in the cluster (comma separated) EXCEPT the local Reports server.

This section contains Documentation Errata for Oracle Fusion Middleware Enterprise Deployment Guides.
Section 6.3.2.1, "Quartz Requires Synchronizing System Clocks in a Cluster"
Section 6.3.2.2, "Changes Required for Deploying FOD in an EDG SOA Topology"
Section 6.3.2.3, "Configuration Changes Propagation Information Missing from SOA EDG"
Section 6.3.2.4, "Converting Discussions Forum from Multicast to Unicast"
In Chapter 2, "Database and Environment Preconfiguration," of the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle SOA Suite, the following information is missing:
Quartz
Oracle SOA Suite uses Quartz to maintain its jobs and schedules in the database. For the Quartz jobs to be run on different Oracle SOA nodes in a cluster, it is required that the system clocks on the cluster nodes be synchronized.
The following information is missing from section 10.2, "Deploying Composites and Artifacts in SOA Enterprise Deployment Topology" of the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle SOA Suite:
When deploying SOA Fusion Order Demo, the following steps are required in addition to the deployment steps provided in the FOD README file.
Change the nostage property to false in the build.xml file of the Web applications so that ear files are copied to each node. Edit the CreditCardAuthorization and OrderApprovalHumanTask build.xml files, located in the FOD_dir\CreditCardAuthorization\bin and FOD_dir\OrderApprovalHumanTask\bin directories, and change the following field:
<target name="deploy-application">
    <wldeploy action="deploy" name="${war.name}"
        source="${deploy.ear.source}" library="false"
        nostage="true"
        user="${wls.user}" password="${wls.password}" verbose="false"
        adminurl="${wls.url}" remote="true" upload="true"
        targets="${server.targets}" />
</target>

To:

<target name="deploy-application">
    <wldeploy action="deploy" name="${war.name}"
        source="${deploy.ear.source}" library="false"
        nostage="false"
        user="${wls.user}" password="${wls.password}" verbose="false"
        adminurl="${wls.url}" remote="true" upload="true"
        targets="${server.targets}" />
</target>
Change the target for the Web applications so that deployments are targeted to the SOA Cluster and not to an individual server. Edit the build.properties file for FOD, located in the FOD_Dir/bin directory, and change the following field:
# wls target server (for shiphome set to server_soa, for ADRS use AdminServer)
server.targets=SOA_Cluster (the SOA cluster name in your SOA EDG)
Change the JMS seed templates so that Uniform Distributed Destinations are used instead of regular Destinations and the JMS artifacts are targeted to the EDG JMS Modules. Edit the createJMSResources.seed file, located in the FOD_DIR\bin\templates directory, and change:
# lookup the SOAJMSModule - it's a system resource
jmsSOASystemResource = lookup("SOAJMSModule","JMSSystemResource")
jmsResource = jmsSOASystemResource.getJMSResource()
cfbean = jmsResource.lookupConnectionFactory('DemoSupplierTopicCF')
if cfbean is None:
    print "Creating DemoSupplierTopicCF connection factory"
    demoConnectionFactory = jmsResource.createConnectionFactory('DemoSupplierTopicCF')
    demoConnectionFactory.setJNDIName('jms/DemoSupplierTopicCF')
    demoConnectionFactory.setSubDeploymentName('SOASubDeployment')
...
topicbean = jmsResource.lookupTopic('DemoSupplierTopic')
if topicbean is None:
    print "Creating DemoSupplierTopic jms topic"
    demoJMSTopic = jmsResource.createTopic("DemoSupplierTopic")
    demoJMSTopic.setJNDIName('jms/DemoSupplierTopic')
    demoJMSTopic.setSubDeploymentName('SOASubDeployment')
To:
# lookup the SOAJMSModule - it's a system resource
jmsSOASystemResource = lookup("SOAJMSModuleUDDs","JMSSystemResource")
jmsResource = jmsSOASystemResource.getJMSResource()
cfbean = jmsResource.lookupConnectionFactory('DemoSupplierTopicCF')
if cfbean is None:
    print "Creating DemoSupplierTopicCF connection factory"
    demoConnectionFactory = jmsResource.createConnectionFactory('DemoSupplierTopicCF')
    demoConnectionFactory.setJNDIName('jms/DemoSupplierTopicCF')
    demoConnectionFactory.setSubDeploymentName('SOAJMSSubDM')
...
topicbean = jmsResource.lookupTopic('DemoSupplierTopic')
if topicbean is None:
    print "Creating DemoSupplierTopic jms topic"
    demoJMSTopic = jmsResource.createDistributedTopic("DemoSupplierTopic")
    demoJMSTopic.setJNDIName('jms/DemoSupplierTopic')
    demoJMSTopic.setSubDeploymentName('SOAJMSSubDM')
The following information is missing from Chapter 10, "Managing the Topology" of the Oracle Fusion Middleware Enterprise Deployment Guide for Oracle SOA Suite.
Configuration Changes being applied to the SOA and BAM components in an EDG Topology:
If you are using Oracle SOA Suite in a clustered environment, any configuration property changes you make in Oracle Enterprise Manager on one node must be made on all nodes. Configuration properties are set in Oracle Enterprise Manager through the following options of the SOA Infrastructure menu:
Administration > System MBean Browser
SOA Administration > any property selections
Services and References > Properties tab
In addition, consider the following when making configuration changes to BAM Server in a BAM EDG Topology:
Since server migration is used, the BAM Server is moved to a different node's domain directory. It is necessary to pre-create the BAM Server configuration in the failover node. The BAM Server configuration files are located in the following directory:
ORACLE_BASE/admin/<domain_name>/mserver/<domain_name>/servers/<servername>/tmp/_WL_user/oracle-bam_11.1.1/*/APP-INF/classes/config/
Where '*' represents a directory name randomly generated by Oracle WebLogic Server during deployment, for example, 3682yq.
In order to create the files in preparation for possible failovers, you can force a server migration and copy the files from the source node. For example, for BAM:
Configure the driver for WLS_BAM1 in BAMHOST1.
Force a failover of WLS_BAM1 to BAMHOST2. Verify the directory structure for the BAM Server in the failover node:
cd ORACLE_BASE/admin/<domain_name>/mserver/<domain_name>/servers/<servername>/tmp/_WL_user/oracle-bam_11.1.1/*/APP-INF/classes/config/

Where '*' represents a directory name randomly generated by Oracle WebLogic Server during deployment, for example, 3682yq.
Do a remote copy of the BAM Server configuration file from BAMHOST1 to BAMHOST2:
BAMHOST1> scp ORACLE_BASE/admin/<domain_name>/mserver/<domain_name>/servers/<servername>/tmp/_WL_user/oracle-bam_11.1.1/*/APP-INF/classes/config/* oracle@BAMHOST2:ORACLE_BASE/admin/<domain_name>/mserver/<domain_name>/servers/<servername>/tmp/_WL_user/oracle-bam_11.1.1/*/APP-INF/classes/config/
The procedure for converting Discussions Forum from multicast to unicast is missing from Chapter 6, Configuring High Availability for Oracle ADF and WebCenter Applications, in the Oracle Fusion Middleware High Availability Guide.
To convert Discussions Forum from multicast to unicast:
Step 1: Enable system properties in the Oracle Coherence configuration files
To override the default Oracle Coherence settings, set the relevant system properties in the Coherence .xml files. For Discussions Forum, edit the tangosol-coherence-override.xml file. This file is part of the coherence.jar deployed with Discussions Forum. Make the following changes to this file:
Extract the tangosol-coherence-override.xml file from coherence.jar in the WLS_Services deployment directory (jar xvf coherence.jar).
Add the following lines to the file within the cluster-config element:
<unicast-listener>
  <well-known-addresses>
    <socket-address id="1">
      <address system-property="tangosol.coherence.wka1"></address>
      <port system-property="tangosol.coherence.wka1.port">8088</port>
    </socket-address>
    <socket-address id="2">
      <address system-property="tangosol.coherence.wka2"></address>
      <port system-property="tangosol.coherence.wka2.port">8088</port>
    </socket-address>
    ...
    <socket-address id="9">
      <address system-property="tangosol.coherence.wka9"></address>
      <port system-property="tangosol.coherence.wka9.port">8088</port>
    </socket-address>
  </well-known-addresses>
</unicast-listener>
Jar up the file and restart the server (jar cvf coherence.jar *).
Step 2: Add the startup parameters
To add the relevant startup parameters:
In the Oracle WebLogic Server Administration Console, select Servers, WLS_Services1, Configuration, and then Server Start.
In the Arguments box, add the following:
-Dtangosol.coherence.wka1=Host1 -Dtangosol.coherence.wka2=Host2 -Dtangosol.coherence.localhost=Host1 -Dtangosol.coherence.wka1.port=8088 -Dtangosol.coherence.wka2.port=8088
Where Host1 is where WLS_Services1 is running.
Repeat steps 1 and 2 for WLS_Services2, swapping Host1 for Host2 and Host2 for Host1.
Restart the WLS_Services servers.
Step 3: Validate the changes
To validate the changes:
Log on to the Discussions Forum Administration panel.
Select Cache Settings in the left pane.
At the bottom of the screen, ensure that Clustering is set to enabled.
Repeat steps 1 through 3 for all members of the cluster.
As servers join the cluster they appear at the top of the screen.