PREPROCESSOR

Caution:

There are security implications to consider when using the PREPROCESSOR clause. See Oracle Database Security Guide for more information.

If the file you want to load contains data records that are not in a format supported by the ORACLE_LOADER access driver, then use the PREPROCESSOR clause to specify a user-supplied preprocessor program that will execute for every data file. Note that the program specification must be enclosed in a shell script if it uses arguments (see the description of "file_spec").

The preprocessor program converts the data to a record format supported by the access driver and then writes the converted record data to standard output (stdout), which the access driver reads as input. The syntax of the PREPROCESSOR clause is as follows:

directory_spec

Specifies the directory object containing the name of the preprocessor program to execute for every data file. The user accessing the external table must have the EXECUTE privilege for the directory object that is used. If directory_spec is omitted, then the default directory specified for the external table is used.

Caution:

For security reasons, Oracle strongly recommends that a separate directory, not the default directory, be used to store preprocessor programs. Do not store any other files in the directory in which preprocessor programs are stored.

The preprocessor program must reside in a directory object, so that access to it can be controlled for security reasons. The OS system manager must create a directory corresponding to the directory object and must verify that OS-user ORACLE has access to that directory. DBAs must ensure that only approved users are allowed access to the directory object associated with the directory path. Although multiple database users can have access to a directory object, only those with the EXECUTE privilege can run a preprocessor in that directory. No existing database user with read-write privileges to a directory object will be able to use the preprocessing feature. DBAs can prevent preprocessors from ever being used by never granting the EXECUTE privilege to anyone for a directory object.

See Also:

Oracle Database SQL Language Reference for information about granting the EXECUTE privilege

file_spec

The name of the preprocessor program. It is appended to the path name associated with the directory object that is being used (either the directory_spec or the default directory for the external table). The file_spec cannot contain an absolute or relative directory path.

If the preprocessor program requires any arguments (for example, gunzip -c), then you must specify the program name and its arguments in an executable shell script (or on Windows operating systems, in a batch (.bat) file). The shell script must reside in directory_spec. Keep the following in mind when you create a shell script for use with the PREPROCESSOR clause:

  • The full path name must be specified for system commands such as gunzip.

  • The preprocessor shell script must have EXECUTE permissions

  • The data file listed in the external table LOCATION clause should be referred to by $1. (On Windows operating systems, the LOCATION clause should be referred to by %1.)

  • On Windows operating systems, the first line in the .bat file must be the following:

    @echo off
    

    Otherwise, by default, Windows systems echo the contents of the batch file (which will be treated as input by the external table access driver).

See Example 15-2 for an example of using a shell script.

It is important to verify that the correct version of the preprocessor program is in the operating system directory.

Example 15-1 shows a sample use of the PREPROCESSOR clause when creating an external table. Note that the preprocessor file is in a separate directory from the data files and log files.

Example 15-1 Specifying the PREPROCESSOR Clause

SQL> CREATE TABLE xtab (recno varchar2(2000))
     2    ORGANIZATION EXTERNAL (
     3    TYPE ORACLE_LOADER
     4    DEFAULT DIRECTORY data_dir
     5    ACCESS PARAMETERS (
     6    RECORDS DELIMITED BY NEWLINE
     7    PREPROCESSOR execdir:'zcat'
     8    FIELDS (recno char(2000)))
     9    LOCATION ('foo.dat.gz'))
   10    REJECT LIMIT UNLIMITED;
Table created.

Example 15-2 shows how to specify a shell script on the PREPROCESSOR clause when creating an external table.

Example 15-2 Using the PREPROCESSOR Clause with a Shell Script

SQL> CREATE TABLE xtab (recno varchar2(2000))
     2    ORGANIZATION EXTERNAL (
     3    TYPE ORACLE_LOADER
     4    DEFAULT DIRECTORY data_dir
     5    ACCESS PARAMETERS (
     6    RECORDS DELIMITED BY NEWLINE
     7    PREPROCESSOR execdir:'uncompress.sh'
     8    FIELDS (recno char(2000)))
     9    LOCATION ('foo.dat.gz'))
   10    REJECT LIMIT UNLIMITED;
Table created.

Using Parallel Processing with the PREPROCESSOR Clause

External tables treats each data file specified on the LOCATION clause as a single granule. To make the best use of parallel processing with the PREPROCESSOR clause, the data to be loaded should be split into multiple files (granules). This is because external tables limits the degree of parallelism to the number of data files present. For example, if you specify a degree of parallelism of 16, but have only 10 data files, then in effect the degree of parallelism is 10 because 10 slave processes will be busy and 6 will be idle. It is best to not have any idle slave processes. So if you do specify a degree of parallelism, then ideally it should be no larger than the number of data files so that all slave processes are kept busy.

See Also:

Restrictions When Using the PREPROCESSOR Clause

  • The PREPROCESSOR clause is not available on databases that use the Oracle Database Vault feature.

  • The PREPROCESSOR clause does not work in conjunction with the COLUMN TRANSFORMS clause.