16 Oracle RTD Batch Framework

Oracle RTD Batch Framework is a set of components that can be used to provide batch facilities in an Inline Service. This enables the Inline Service to be used not just for processing interactive Integration Point requests, but also for running a batch of operations of any kind. Typically, a batch will read a set of input rows from a database table, flat file, or spreadsheet, process each input row in turn, and optionally write one or more rows to an output table for each input row.

The following examples describe in outline form how you can use Oracle RTD batch processing facilities in a non-interactive setting:

Create a "learning" batch to train models to learn from historical data about the effectiveness of offers previously presented to customers.
Create an "offer selection" batch which starts with a set of customers, and selects the best product to offer to each customer.
Create a "customer selection" batch which starts with a single product, and selects the best customers to whom to offer the product.
Create a batch set of e-mails where Oracle RTD selects the right content for each e-mail.

Within an Inline Service, the Inline Service developer defines one or more Java classes implementing the BatchJob interface, with one BatchJob for each named batch that the Inline Service wishes to support. In the Inline Service, each of the BatchJob implementations is registered with the Oracle RTD Batch framework, making the job types available to be started by an external batch administration application. External applications may start, stop, and query the status of registered batch jobs through a BatchAdminClient class provided by the Batch Framework. The Batch Console, released with Oracle RTD, is a command-line utility that enables you to perform these batch-related administrative tasks.

Note:

The following terms are referenced throughout the Oracle RTD documentation:

RTD_HOME: This is the directory into which the Oracle RTD client-side tools are installed.
RTD_RUNTIME_HOME: This is the application server specific directory in which the application server runs Oracle RTD.

For more information, see the chapter "About the Oracle RTD Run-Time Environment" in Oracle Fusion Middleware Administrator's Guide for Oracle Real-Time Decisions.

The topics in this section are the following:

Section 16.1, "Batch Framework Architecture"
Section 16.2, "Implementing Batch Jobs"
Section 16.3, "Administering Batch Jobs"

16.1 Batch Framework Architecture

This section presents an overview of the components of the batch framework architecture and shows how batch facilities can be used across cluster servers.

16.1.1 Batch Framework Components

The following diagram shows the components of the batch framework architecture on a single Oracle RTD instance.

Surrounding text describes bf_arch_ovwx.gif.

The main batch framework components and their functions are:

Batch Admin Client

The Batch Admin Client provides a set of Java APIs that can be used by Java client applications to manage batches registered on remote Real-Time Decision Servers. This includes starting and stopping batches, and obtaining batch status information.

Customers may create their own batch client application using the APIs provided in the Batch Admin Client.

The Batch Console is a client side command line utility that manages batches registered on remote Real-Time Decision Servers. Internally, the Batch Console uses the APIs provided by the Batch Admin Client.
Batch Manager

This is a cluster-wide singleton service that executes batch commands issued by client code through the Batch Admin Client. The Batch Manager manages each Batch Agent in the cluster.

The Batch Manager also executes commands from the Batch Console.
Batch Agent

The batch agent is the interface between a batch job and the batch framework. It is a service that registers batches with the Batch Manager when the batch-enabled Inline Service is deployed, and executes batch commands on behalf of the Batch Manager.

In a clustered environment, all the batch framework components appear in each Oracle RTD instance. However, the Batch Manager is only active in one of the instances, and that active Batch Manager controls all the Batch Admin Client and Batch Agent requests in the cluster.

16.1.2 Use of Batch Framework in a Clustered Environment

The following diagram illustrates an example of the use of the batch framework in a clustered environment.

Surrounding text describes bf_arch_cluster2.gif.

A batch client application, such as the Batch Console, communicates with the Batch Manager by invoking batch management commands, for example to start, stop, or pause a job.

Developers using Decision Studio can create and deploy Inline Services with batches to any instance where Oracle RTD is installed, such as that on Cluster server 2.

Note:

In a clustered environment, Inline Services are deployed to all servers running the Decision Service.

The diagram shows the Batch Agent on the Cluster server 2 instance registering batches with the Batch Manager.

The Batch Manager can then run batch jobs on any instance, such as that on Cluster server 3, so long as they were previously registered.

16.2 Implementing Batch Jobs

This section presents an overview of the runtime object model required to implement batches.

In order for an Inline Service to be batch-enabled, it must contain one or more batch job Java classes implementing the BatchJob interface, and register them with the batch framework.

Note:

The examples that appear in this section reference the CrossSell Inline Service released with Oracle RTD, which contains the batch job CrossSellSelectOffers.

This section consists of the following topics:

Section 16.2.1, "Implementing the BatchJob Interface"
Section 16.2.2, "Registering Batch Jobs with the Batch Framework"

16.2.1 Implementing the BatchJob Interface

You start the implementation of a batch job in Decision Studio by creating a Java class that implements the BatchJob interface.

First, you create Java packages and classes under the src branch of the Inline Service.

The following image shows the "batch processing" Java class OfferSelectJob.java declared in the package crosssell.batch:

Surrounding text describes sr_bj_src.gif.

The easiest way to create the Java classes is to subclass BatchJobBase, which is provided with the batch framework.

The principal methods of a batch job are called in the following sequence when the job is started:

init()

Called once by the framework before starting the batch's processing loop.
getNextInput()

Returns the next input row to be processed by the batch.
executeRow()

The BatchJob implements this method to process the input row that was returned by getNextInput. Generally, this is called in a different thread from getNextInput.
flushOutputs()

Called by the framework to allow the BatchJob to flush its output table buffers.
cleanup()

Called by the framework after the batch is finished or is being stopped. Cleans up any resources allocated by the batch job, such as the result set created by its init() method.

For full details of the methods of the BatchJob interface, see the following Javadoc entry:

RTD_HOME\client\Batch\javadocs\com\sigmadynamics\batch\BatchJob.html
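
The call sequence described above can be illustrated with a minimal, self-contained sketch. The interface LifecycleSketchJob, the String row type, and the convention that getNextInput() returns null at end of input are all assumptions made for this illustration; they are not the real BatchJob signatures, which are documented in the Javadoc. The driver also runs on a single thread, whereas the framework generally calls executeRow in a different thread from getNextInput.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified stand-in for the lifecycle methods; NOT the real BatchJob interface. */
interface LifecycleSketchJob {
    void init();
    String getNextInput();        // assumption: null signals that the input is exhausted
    void executeRow(String row);
    void flushOutputs();
    void cleanup();
}

/** Toy job that upper-cases each input row and buffers the result until flushed. */
final class UpperCaseJob implements LifecycleSketchJob {
    final List<String> calls = new ArrayList<>();    // records the order of lifecycle calls
    final List<String> written = new ArrayList<>();  // stands in for an output table
    private final List<String> source;
    private final List<String> buffer = new ArrayList<>();
    private Iterator<String> cursor;

    UpperCaseJob(List<String> source) { this.source = source; }

    public void init() { calls.add("init"); cursor = source.iterator(); }

    public String getNextInput() {
        calls.add("getNextInput");
        return cursor.hasNext() ? cursor.next() : null;
    }

    public void executeRow(String row) { calls.add("executeRow"); buffer.add(row.toUpperCase()); }

    public void flushOutputs() { calls.add("flushOutputs"); written.addAll(buffer); buffer.clear(); }

    public void cleanup() { calls.add("cleanup"); cursor = null; }
}

final class LifecycleSketch {
    /** Drives a job through init, the row loop, a final flush, and cleanup. */
    static void run(LifecycleSketchJob job) {
        job.init();
        try {
            String row;
            while ((row = job.getNextInput()) != null) {
                job.executeRow(row);
            }
            job.flushOutputs();
        } finally {
            job.cleanup();   // always release resources, even if a row fails
        }
    }

    public static void main(String[] args) {
        UpperCaseJob job = new UpperCaseJob(Arrays.asList("a", "b"));
        run(job);
        System.out.println(job.calls);
        System.out.println(job.written);
    }
}
```

The try/finally block reflects the statement that cleanup() is called whether the batch finishes or is being stopped. How often the real framework calls flushOutputs() is not modeled here.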

Batch Job Example

An example of a batch job, OfferSelectJob.java, appears in the CrossSell Inline Service released with Oracle RTD. This batch job selects the best offer for a set of customers, and saves the offers to a table.

16.2.2 Registering Batch Jobs with the Batch Framework

This section describes how to register the batch jobs with the Oracle RTD batch framework. You must first register the Java classes that contain the batch jobs as imported Java classes, and then explicitly register the batch jobs with the batch framework using the batchAgent.registerBatch method.

This section consists of the following topics:

Section 16.2.2.1, "BatchAgent"
Section 16.2.2.2, "Registering the Imported Java Classes in the Inline Service"
Section 16.2.2.3, "Registering the Batch Jobs in the Inline Service"

16.2.2.1 BatchAgent

The batch agent is the interface between a batch job and the batch framework. You use the batch agent to register each batch job with the batch framework.

An Inline Service can locate its batch agent through a getter in the Logic tab of its Application object. For example, in a context where the Inline Service has access to a session, you can use the following command to access the BatchAgent:

BatchAgent batchAgent = session().getApp().getBatchAgent();

16.2.2.2 Registering the Imported Java Classes in the Inline Service

You must register the Java classes in the Inline Service, as follows:

Click the Application object's Advanced button.
In the Imported Java Classes pane, enter one line for each batch job class in the Inline Service, of the form:
```
<package>.<class>
```
For example:
```
crosssell.batch.OfferSelectJob
```

16.2.2.3 Registering the Batch Jobs in the Inline Service

An Inline Service must register its BatchJob implementations in the Logic tab of the Application, in the Initialization Logic pane, using the batchAgent.registerBatch API.

The Inline Service can locate its batch agent, which is its interface to the Batch Framework, through a getter in its Application object. Enter a line such as the following:

BatchAgent batchAgent = getBatchAgent();

followed by an invocation of batchAgent.registerBatch for each batch job in the Inline Service.

For full details of the parameters for batchAgent.registerBatch, see the following Javadoc entry:

RTD_HOME\client\Batch\javadocs\com\sigmadynamics\batch\BatchAgent.html

In summary form, the parameters for batchAgent.registerBatch are as follows:

batchName: A short name used to register the batch class in the cluster. It should be unique across the cluster.
batchJobClass: The fully qualified name of the batch's BatchJob implementation class.
description: If non-null, a string describing the purpose of the batch.
parameterDescriptions: An optional set of properties describing the parameters supported by the batch.
parameterDefaults: An optional set of properties providing the default values for parameters supported by the batch.

For example, to register the batch CrossSellSelectOffers, which uses the class crosssell.batch.OfferSelectJob, enter the following in the Initialization Logic for the Application:

BatchAgent batchAgent = getBatchAgent();
batchAgent.registerBatch("CrossSellSelectOffers",
                         "crosssell.batch.OfferSelectJob",
                         OfferSelectJob.description,
                         OfferSelectJob.paramDescriptions,
                         OfferSelectJob.paramDefaults);
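
The registration call above refers to three static members of the job class. The following is a minimal sketch of how such members might be declared. It assumes that the "set of properties" parameters are java.util.Properties objects, and it borrows the parameter names sqlCustomers and rowsBetweenStatusUpdates from the Batch Console examples in this chapter. The class name OfferSelectJobParamsSketch and all of the strings are illustrative only; the actual declarations in the CrossSell Inline Service may differ.

```java
import java.util.Properties;

/** Hypothetical holder for the static members referenced by registerBatch; not the shipped OfferSelectJob. */
final class OfferSelectJobParamsSketch {
    /** Free-text purpose of the batch (illustrative wording). */
    static final String description =
        "Selects the best offer for a set of customers and saves the offers to a table";

    /** One entry per supported parameter: name -> human-readable description. */
    static final Properties paramDescriptions = new Properties();

    /** One entry per supported parameter: name -> default value. */
    static final Properties paramDefaults = new Properties();

    static {
        // Parameter names are assumptions taken from the console examples in this chapter.
        paramDescriptions.setProperty("sqlCustomers",
            "SQL query that selects the customers to process");
        paramDescriptions.setProperty("rowsBetweenStatusUpdates",
            "Number of rows to process between batch status updates");

        paramDefaults.setProperty("sqlCustomers",
            "SELECT Id FROM Customers WHERE Id < 300");
        paramDefaults.setProperty("rowsBetweenStatusUpdates", "1000");
    }

    public static void main(String[] args) {
        // Print each parameter with its description and default value.
        for (String name : paramDefaults.stringPropertyNames()) {
            System.out.println(name + ": " + paramDescriptions.getProperty(name)
                + " (default: " + paramDefaults.getProperty(name) + ")");
        }
    }
}
```

Keeping the descriptions and defaults keyed by the same parameter names means that a client querying either set sees a consistent list of parameters.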

16.3 Administering Batch Jobs

The main way to administer batch jobs is through the command-line Batch Console utility, for example, to start, stop, and query the statuses of batches.

This utility uses the BatchAdminClient Java interface. The BatchAdminClient Java interface also provides methods for starting and managing batches for use by external programs.

This section contains the following topics:

Section 16.3.1, "Using the BatchAdminClient Interface"
Section 16.3.2, "Using the Batch Console"

16.3.1 Using the BatchAdminClient Interface

The BatchAdminClient Java interface provides methods for starting and managing batches for use by external programs.

Table 16-1 lists the methods for the BatchAdminClient interface.

Table 16-1 BatchAdminClient Methods

Return Type	Method	Description
int	clearBatchStatuses()	Removes batch status information for all batches that have completed.
int	clearBatchStatuses(int numToKeep)	Removes batch status information for the oldest batches that have completed, retaining the most recent numToKeep.
int	clearBatchStatuses(java.lang.String batchName)	Removes batch status information for all batches that have completed and have the specified batch name.
int	clearBatchStatuses(java.lang.String batchName, int numToKeep)	Removes batch status information for the oldest batches that have completed and have the specified batch name, retaining the most recent numToKeep.
BatchStatusBrief[]	getActiveBatches()	Returns an ordered list, possibly empty, of brief status information for all batch jobs currently running, paused, or waiting to run.
java.lang.String	getBatchDescription(java.lang.String batchName)	Returns a string, possibly empty, describing the purpose of the batch.
java.lang.String[]	getBatchNames()	Gets a list of batches registered with the batch framework.
java.util.Properties	getBatchParameterDefaults(java.lang.String batchName)	Gets properties containing the default values of the startup parameters supported by the batch.
java.util.Properties	getBatchParameterDescriptions(java.lang.String batchName)	Gets properties describing the parameters supported by the batch.
BatchStatusBrief[]	getJobHistory()	Returns an ordered list, possibly empty, of brief status information for all batch jobs whose status information is still retained by the batch manager (that is, status information that has not been discarded by clearBatchStatuses).
BatchStatusBrief[]	getJobHistory(int maxToShow)	Returns an ordered list, possibly empty, of brief status information for at most maxToShow batch jobs whose status information is still retained by the batch manager (that is, status information that has not been discarded by clearBatchStatuses).
BatchStatus	getStatus(java.lang.String batchID)	Returns the status of a batch identified by the batchID that was returned when it was submitted by a call to startBatch().
void	pauseBatch(java.lang.String batchID)	Stops a batch and does not clean up its resources, so it can be resumed.
void	restartBatch(java.lang.String batchID)	Restarts a stopped batch.
void	resumeBatch(java.lang.String batchID)	Continues a paused batch.
java.lang.String	startBatch(java.lang.String batchName)	Starts a batch in the default concurrency group with default start parameters.
java.lang.String	startBatch(java.lang.String batchName, BatchRequest startParameters)	Starts a batch in the default concurrency group with the supplied start parameters.
java.lang.String	startBatch(java.lang.String batchName, java.lang.String concurrencyGroup)	Starts a batch in the specified concurrency group using default start parameters.
java.lang.String	startBatch(java.lang.String batchName, java.lang.String concurrencyGroup, BatchRequest startParameters)	Starts a batch in the specified concurrency group using the supplied start parameters.
void	stopBatch(java.lang.String batchID)	Stops a batch and cleans up its resources by calling BatchJob.cleanup().
void	stopBatch(java.lang.String batchID, boolean discardSandboxes)	Stops a batch, cleans up its resources (by calling BatchJob.cleanup()), and optionally discards any learning data and output table records generated by the batch since its last checkpoint.

For full details of the BatchAdminClient interface, see the following Javadoc entry:

RTD_HOME\client\Batch\javadocs\com\sigmadynamics\batch\client\BatchAdminClient.html
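
To illustrate how an external program might sequence these calls, the following sketch defines a small stand-in interface and an in-memory fake, so that it runs without an RTD server. AdminClientSketch and InMemoryAdminClient are hypothetical. The method names and String parameters mirror a subset of Table 16-1, but getState is a simplification of getStatus (which returns a BatchStatus object), the state name "Stopped" is an assumption, and how a real BatchAdminClient is obtained and connected is not shown in this chapter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical subset modeled on Table 16-1; NOT the real BatchAdminClient interface. */
interface AdminClientSketch {
    String startBatch(String batchName);   // returns the batchID of the new job
    void pauseBatch(String batchID);
    void resumeBatch(String batchID);
    void stopBatch(String batchID);
    String getState(String batchID);       // simplification of getStatus(batchID)
}

/** In-memory fake that only tracks job states. */
final class InMemoryAdminClient implements AdminClientSketch {
    private final Map<String, String> states = new LinkedHashMap<>();
    private int nextId = 1;

    public String startBatch(String batchName) {
        String id = "batch-" + nextId++;
        states.put(id, "Running");
        return id;
    }

    public void pauseBatch(String batchID) { transition(batchID, "Running", "Paused"); }

    public void resumeBatch(String batchID) { transition(batchID, "Paused", "Running"); }

    public void stopBatch(String batchID) {
        requireKnown(batchID);
        states.put(batchID, "Stopped");    // assumed state name
    }

    public String getState(String batchID) {
        requireKnown(batchID);
        return states.get(batchID);
    }

    private void transition(String id, String from, String to) {
        requireKnown(id);
        if (!from.equals(states.get(id))) {
            throw new IllegalStateException(id + " is " + states.get(id) + ", expected " + from);
        }
        states.put(id, to);
    }

    private void requireKnown(String id) {
        if (!states.containsKey(id)) {
            throw new IllegalArgumentException("Unknown batchID: " + id);
        }
    }
}

final class AdminClientDemo {
    public static void main(String[] args) {
        AdminClientSketch client = new InMemoryAdminClient();
        String id = client.startBatch("CrossSellSelectOffers");
        client.pauseBatch(id);     // paused jobs keep their resources and can be resumed
        client.resumeBatch(id);
        client.stopBatch(id);      // a stopped job cannot be resumed
        System.out.println(id + " -> " + client.getState(id));
    }
}
```

The key point is that the batchID returned by startBatch is the handle passed to every subsequent call for that job instance.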

16.3.2 Using the Batch Console

The Batch Console is a command-line utility, batch-console.jar. Use the Batch Console to start, stop, and query the status of batches.

To start the Batch Console, run the following commands:

cd BATCH_HOME

Typically, BATCH_HOME is C:\OracleBI\RTD\client\Batch.
java [-Djavax.net.ssl.trustStore="<trust_store_location>"] -jar batch-console.jar -user <batch_user_name> -pw <batch_user_password> [-url <RTD_server_URL>] [-help]
Notes:
1. You must enter batch user name and password information. If you do not specify values for the -user and -pw parameters, you will be prompted for them.
2. <RTD_server_URL> (default value http://localhost:8080) is the address of the Decision Service. In a cluster, it is typically the load balancer's virtual address representing the Decision Service's J2EE cluster.
3. Use the -Djavax.net.ssl.trustStore="<trust_store_location>" parameter only if SSL is used to connect to the Real-Time Decision Server, where <trust_store_location> is the full path of the truststore file. For example, -Djavax.net.ssl.trustStore="C:\OracleBI\RTD\etc\ssl\sdtrust.store". In this case, <RTD_server_URL> should look like https://localhost:8443.
4. If you enter -help, with or without other command line parameters, the Batch Console displays a usage list of all its command line parameters, including -help.
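
If you launch the Batch Console from another Java program or a wrapper tool, the command line described in the notes above can be assembled programmatically. The following is a minimal sketch under that assumption. The class BatchConsoleCommandSketch and its build method are hypothetical helpers, not part of Oracle RTD. The sketch only builds the argument list and does not start a process.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles the Batch Console command line as an argument list. */
final class BatchConsoleCommandSketch {

    /**
     * Builds the arguments in the documented order.
     * Pass null for url to use the console's default, and null for trustStore when SSL is not used.
     */
    static List<String> build(String user, String password, String url, String trustStore) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        if (trustStore != null) {
            // JVM option, so it must come before -jar. No shell quoting is needed in an argument list.
            cmd.add("-Djavax.net.ssl.trustStore=" + trustStore);
        }
        cmd.add("-jar");
        cmd.add("batch-console.jar");
        cmd.add("-user");
        cmd.add(user);
        cmd.add("-pw");
        cmd.add(password);
        if (url != null) {
            cmd.add("-url");
            cmd.add(url);
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder credentials; the URL and trust store path follow the examples in the notes.
        List<String> cmd = build("batchUser", "secret",
            "https://localhost:8443", "C:\\OracleBI\\RTD\\etc\\ssl\\sdtrust.store");
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list could be passed to java.lang.ProcessBuilder, with the working directory set to BATCH_HOME so that batch-console.jar is found.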

To see a list of the interactive commands within Batch Console, enter ? at the command prompt:

command <requiredParam>  -- [alias] Description
 
?                        -- Show this usage text
help                     -- Show this usage text
exit                     -- Terminate this program
quit                     -- Terminate this program
batchNames               -- [bn]      Show all registered Batch
batchDesc <batchName>    -- [bd]      Show Batch Description
paramDesc <batchName>    -- [pd]      Show a batch's Parameter Descriptions
paramDef <batchName>     -- [pdef]    Show a batch's Parameter Default values
addProp <key> <value>    -- [ap]      Add one Property for next job start
removeProp <key>         -- [rp]      Remove one startup Property
showAddedProps           -- [sap]     Show all Added startup Properties
removeAddedProps         -- [rap]     Remove all Added startup Properties
startJob <batchName>     -- [start]   Start a batch job, returning a jobID
startInGroup <batchName> <groupName>
                         -- [startg]  Start a batch job in a Concurrency Group
status <jobID>           -- [sts]     Show a job's detailed runtime Status
activeJobs               -- [jobs]    Show brief status of all running, 
                                      paused, waiting jobs
jobHistory               -- [hist]    Show brief status of all submitted jobs
stopJob <jobID>          -- [stop]    Stop a job, without abililty to resume
stopJobDiscardSandbox <jobID>
                         -- [stopds]  Stop a job, without abililty
                                      to resume, discard learning sandboxes
restartJob <jobID>       -- [restart] Restart a batch job
pauseJob <jobID>         -- [pause]   Pause a job
resumeJob <jobID>        -- [resume]  Resume a paused job
discardStatusAll         -- [dsa]     Discard status information 
                                      for all non-active jobs
discardStatusOld <numToKeep>
                         -- [dso]     Discard Status for oldest non-active jobs
discardStatusName <batchName>
                         -- [dsn]     Discard Status for non-active
                                      jobs of named batch
discardStatusNameOld <batchName> <numToKeep>
                         -- [dsno]    Discard Status for oldest 
                                      non-active jobs of named batch

The rest of this section contains the following topics:

Section 16.3.2.1, "Notes on Batch Console Commands"
Section 16.3.2.2, "Running Jobs Sequentially"
Section 16.3.2.3, "Running Jobs Concurrently"

16.3.2.1 Notes on Batch Console Commands

To get a list of registered batches, enter bn or batchNames.
To get the default parameter values for a batch, enter paramDef <batchName> or pdef <batchName>.

For example, your batch may have the following parameters:
- sqlCustomers - to select the customers to process
- rowsBetweenStatusUpdates - to control how often to update the batch status
The default values for these parameters could be as follows:
- sqlCustomers = SELECT Id FROM Customers WHERE Id < 300
- rowsBetweenStatusUpdates = 1000
To supply parameter values for the next batch invocation, use the addProp command, or its alias, ap.

For example, you can override the sqlCustomers parameter to include all customers, with the following command:
- ap sqlCustomers SELECT Id FROM Customers
And if you want to update the batch status after every 1500 customers are processed, enter the following command:
- ap rowsBetweenStatusUpdates 1500
You can view all such explicitly added parameters with the showAddedProps command, or its alias, sap.

For example, if you used the preceding ap commands, the sap output would be:
```
Property                       Value
--------                       -----
rowsBetweenStatusUpdates       1500
sqlCustomers                   SELECT Id FROM Customers
```
To start a batch, use the startJob command, or its alias, start.

The output will be similar to the following:
- batchID=batch-2
The returned batchID, also known as a job-ID, identifies this job instance. You can use it to query the status of the job.

To see the runtime status of the job, pass its batchID value to the status command, or to its alias, sts.

sts batch-2

The output will be similar to the following:

ID         Name                   State         Rows Errors Restarts 
--         ----                   -----         ---- ------ -------- 
batch-2    MyBatchJob1            Running      4,500      0        0 

 SubmitDateTime     WaitTime     RunTime      Group    Server
 --------------     --------     ------       -----    ------
 06/24/08-10:25:37  0m, 0s       0m, 0s       Default  RTDServer

If you run the status command later, you can see that the job finished without errors, after processing 50,000 customers in 9 minutes and 44 seconds:

ID         Name                   State         Rows Errors Restarts 
--         ----                   -----         ---- ------ -------- 
batch-2    MyBatchJob1            Finished    50,000      0        0 

 SubmitDateTime     WaitTime     RunTime      Group    Server
 --------------     --------     ------       -----    ------
 06/24/08-10:25:37  0m, 0s       9m, 44s      Default  RTDServer
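
The relationship between a batch's default parameter values and the values added with addProp can be modeled as a simple merge, in which explicitly added values take precedence and defaults fill in the rest. The following sketch illustrates that model with java.util.Properties. The class StartParamsSketch is hypothetical, and the way the Batch Console and the batch framework combine the values internally is an assumption; only the observable behavior described in this section is modeled.

```java
import java.util.Properties;

/** Hypothetical model of how added startup properties override a batch's defaults. */
final class StartParamsSketch {

    /** Returns the effective parameters: added values win, defaults fill the gaps. */
    static Properties effective(Properties defaults, Properties added) {
        Properties merged = new Properties();
        merged.putAll(defaults);
        merged.putAll(added);   // later entries replace defaults with the same key
        return merged;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("sqlCustomers", "SELECT Id FROM Customers WHERE Id < 300");
        defaults.setProperty("rowsBetweenStatusUpdates", "1000");

        // Equivalent of: ap sqlCustomers SELECT Id FROM Customers
        //                ap rowsBetweenStatusUpdates 1500
        Properties added = new Properties();
        added.setProperty("sqlCustomers", "SELECT Id FROM Customers");
        added.setProperty("rowsBetweenStatusUpdates", "1500");

        Properties params = effective(defaults, added);
        System.out.println(params.getProperty("sqlCustomers"));
        System.out.println(params.getProperty("rowsBetweenStatusUpdates"));
    }
}
```

With an empty set of added properties, as after removeAddedProps, the merge returns the defaults unchanged.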

16.3.2.2 Running Jobs Sequentially

When jobs are submitted to be started, they are assigned to a concurrency group. If you do not specify a group, the job is assigned to the default concurrency group, named Default.

Jobs in the same concurrency group run sequentially, one at a time, in the order in which they were submitted. So if you start a second job before the first finishes, the second job waits until the first one finishes.

This section shows the starting of the batch MyBatchJob1, and then the starting of two other batches, MyBatchJob2 and MyBatchJob3.

Before starting MyBatchJob1, use the sap command to verify that the console has values set for the two parameters, rowsBetweenStatusUpdates and sqlCustomers.

After starting MyBatchJob1, clear these parameters using the removeAddedProps command (rap), so that the next two jobs use default values for all their parameters.

The jobs command shows a brief status of all running and waiting jobs. It shows the first job running, and the other two waiting.

command: batchNames
        MyBatchJob1
        MyBatchJob2
        MyBatchJob3
        MyBatchJob4
        MyBatchJob5
command: showAddedProps
        Property                       Value
        --------                       -----
        rowsBetweenStatusUpdates       1500
        sqlCustomers                   SELECT Id FROM Customers
command: start MyBatchJob1
        batchID=batch-3
command: removeAddedProps
command: start MyBatchJob2
        batchID=batch-4
command: start MyBatchJob3
        batchID=batch-5
command: jobs
  ID         Name          State    Group    Server
  --         ----          -----    -----    ------
  batch-3    MyBatchJob1   Running  Default  RTDServer
  batch-4    MyBatchJob2   Waiting  Default  none
  batch-5    MyBatchJob3   Waiting  Default  none

16.3.2.3 Running Jobs Concurrently

The startInGroup command, or its alias, startg, may be used to assign a job to a specific concurrency group. Starting two jobs in different groups allows them to run at the same time.

For example:

command: startg MyBatchJob4 myGroup1
        batchID=batch-6
command: startg MyBatchJob5 myGroup2
        batchID=batch-7
command: jobs
  ID         Name           State    Group     Server
  --         ----           -----    -----     ------
  batch-6    MyBatchJob4    Running  myGroup1  RTDServer
  batch-7    MyBatchJob5    Running  myGroup2  RTDServer

Note:

Jobs assigned to the same concurrency group may run on different servers, but the jobs cannot run concurrently. Only jobs in different groups are allowed to run concurrently.
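
The scheduling rule for concurrency groups can be modeled as one first-in, first-out queue per group: the job at the head of each queue runs, and the jobs behind it wait. The following sketch is a toy model of that rule only. The class ConcurrencyGroupSketch, its method names, and the "Not active" state are hypothetical, and the model ignores servers, pausing, and restarts.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of concurrency groups: one FIFO queue per group, head of each queue is running. */
final class ConcurrencyGroupSketch {
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();
    private final Map<String, String> names = new LinkedHashMap<>();
    private int nextId = 1;

    /** Submits a job to the default group, named Default. */
    String start(String batchName) {
        return startInGroup(batchName, "Default");
    }

    /** Submits a job to the named group and returns its job ID. */
    String startInGroup(String batchName, String group) {
        String jobId = "batch-" + nextId++;
        names.put(jobId, batchName);
        queues.computeIfAbsent(group, g -> new ArrayDeque<>()).addLast(jobId);
        return jobId;
    }

    /** Returns "Running" for the head of a group's queue, "Waiting" for jobs behind it. */
    String state(String jobId) {
        for (Deque<String> queue : queues.values()) {
            if (jobId.equals(queue.peekFirst())) return "Running";
            if (queue.contains(jobId)) return "Waiting";
        }
        return "Not active";
    }

    /** Marks the running job as done so that the next job in its group can run. */
    void finish(String jobId) {
        for (Deque<String> queue : queues.values()) {
            if (jobId.equals(queue.peekFirst())) {
                queue.removeFirst();
                return;
            }
        }
        throw new IllegalStateException(jobId + " is not running");
    }

    public static void main(String[] args) {
        ConcurrencyGroupSketch model = new ConcurrencyGroupSketch();
        String first = model.start("MyBatchJob1");
        String second = model.start("MyBatchJob2");
        String other = model.startInGroup("MyBatchJob4", "myGroup1");
        for (String id : new String[] {first, second, other}) {
            System.out.println(id + " " + model.names.get(id) + " " + model.state(id));
        }
    }
}
```

In this model, two jobs submitted to Default never run at the same time, while a job in myGroup1 runs immediately because its queue is independent.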