Oracle® Text Application Developer's Guide
11g Release 1 (11.1)
Part Number B28303-02
Contents
List of Tables
Title and Copyright Information
Preface
Audience
Documentation Accessibility
Related Documents
Conventions
1 Understanding Oracle Text Application Development
1.1 Introduction to Oracle Text
1.2 Document Collection Applications
1.2.1 Flowchart of Text Query Application
1.3 Catalog Information Applications
1.3.1 Flowchart for Catalog Query Application
1.4 Document Classification Applications
1.5 XML Search Applications
1.5.1 Using Oracle Text
1.5.2 Using the Oracle XML DB Framework
1.5.3 Combining Oracle Text features with Oracle XML DB
1.5.3.1 Using the Text-on-XML Method
1.5.3.2 Using the XML-on-Text Method
2 Getting Started with Oracle Text
2.1 Overview of Getting Started with Oracle Text
2.2 Creating an Oracle Text User
2.3 Query Application Quick Tour
2.3.1 Building Web Applications with the Oracle Text Wizard
2.3.1.1 Oracle JDeveloper
2.3.1.2 Oracle Text Wizard Addins
2.3.1.3 Oracle Text Wizard Instructions
2.4 Catalog Application Quick Tour
2.5 Classification Application Quick Tour
2.5.1 Steps for Creating a Classification Application
3 Indexing with Oracle Text
3.1 About Oracle Text Indexes
3.1.1 Types of Oracle Text Indexes
3.1.2 Structure of the Oracle Text CONTEXT Index
3.1.2.1 Merged Word and Theme Index
3.1.3 The Oracle Text Indexing Process
3.1.3.1 Datastore Object
3.1.3.2 Filter Object
3.1.3.3 Sectioner Object
3.1.3.4 Lexer Object
3.1.3.5 Indexing Engine
3.1.4 Partitioned Tables and Indexes
3.1.4.1 Querying Partitioned Tables
3.1.5 Creating an Index Online
3.1.6 Parallel Indexing
3.1.7 Indexing and Views
3.2 Considerations For Indexing
3.2.1 Location of Text
3.2.1.1 Supported Column Types
3.2.1.2 Storing Text in the Text Table
3.2.1.3 Storing File Path Names
3.2.1.4 Storing URLs
3.2.1.5 Storing Associated Document Information
3.2.1.6 Format and Character Set Columns
3.2.1.7 Supported Document Formats
3.2.1.8 Summary of DATASTORE Types
3.2.2 Document Formats and Filtering
3.2.2.1 No Filtering for HTML
3.2.2.2 Filtering Mixed-Format Columns
3.2.2.3 Custom Filtering
3.2.3 Bypassing Rows for Indexing
3.2.4 Document Character Set
3.2.4.1 Character Set Detection
3.2.4.2 Language Detection
3.2.4.3 Mixed Character Set Columns
3.2.5 Document Language
3.2.5.1 Language Features Outside BASIC_LEXER
3.2.5.2 Indexing Multi-language Columns
3.2.6 Indexing Special Characters
3.2.6.1 Printjoin Characters
3.2.6.2 Skipjoin Characters
3.2.6.3 Other Characters
3.2.7 Case-Sensitive Indexing and Querying
3.2.8 Language-Specific Features
3.2.8.1 Automatic Language Detection with AUTO_LEXER
3.2.8.2 Indexing Themes
3.2.8.3 Base-Letter Conversion for Characters with Diacritical Marks
3.2.8.4 Alternate Spelling
3.2.8.5 Composite Words
3.2.8.6 Korean, Japanese, and Chinese Indexing
3.2.9 Fuzzy Matching and Stemming
3.2.10 Better Wildcard Query Performance
3.2.11 Document Section Searching
3.2.12 Stopwords and Stopthemes
3.2.12.1 Automatic Language Detection and Stoplists
3.2.12.2 Multi-Language Stoplists
3.2.13 Index Performance
3.2.14 Query Performance and Storage of LOB Columns
3.2.15 Mixed Query Performance
3.3 Creating Oracle Text Indexes
3.3.1 Summary of Procedure for Creating a Text Index
3.3.2 Creating Preferences
3.3.2.1 Datastore Examples
3.3.2.2 NULL_FILTER Example: Indexing HTML Documents
3.3.2.3 PROCEDURE_FILTER Example
3.3.2.4 BASIC_LEXER Example: Setting Printjoin Characters
3.3.2.5 MULTI_LEXER Example: Indexing a Multi-Language Table
3.3.2.6 BASIC_WORDLIST Example: Enabling Substring and Prefix Indexing
3.3.3 Creating Section Groups for Section Searching
3.3.3.1 Example: Creating HTML Sections
3.3.4 Using Stopwords and Stoplists
3.3.4.1 Multi-Language Stoplists
3.3.4.2 Stopthemes and Stopclasses
3.3.4.3 PL/SQL Procedures for Managing Stoplists
3.3.5 Creating a CONTEXT Index
3.3.5.1 CONTEXT Index and DML
3.3.5.2 Default CONTEXT Index Example
3.3.5.3 Incrementally Creating an Index with ALTER INDEX and CREATE INDEX
3.3.5.4 Creating a CONTEXT Index Incrementally with POPULATE_PENDING
3.3.5.5 Custom CONTEXT Index Example: Indexing HTML Documents
3.3.5.6 CONTEXT Index Example: Query Processing with FILTER BY and ORDER BY
3.3.6 Creating a CTXCAT Index
3.3.6.1 CTXCAT Index and DML
3.3.6.2 About CTXCAT Sub-Indexes and Their Costs
3.3.6.3 Creating CTXCAT Sub-indexes
3.3.6.4 Creating CTXCAT Index
3.3.7 Creating a CTXRULE Index
3.3.7.1 Step One: Create a Table of Queries
3.3.7.2 Step Two: Create the CTXRULE Index
3.3.7.3 Step Three: Classify a Document
3.4 Maintaining Oracle Text Indexes
3.4.1 Viewing Index Errors
3.4.2 Dropping an Index
3.4.3 Resuming Failed Index
3.4.4 Re-creating an Index
3.4.4.1 Re-creating a Global Index
3.4.5 Rebuilding an Index
3.4.6 Dropping a Preference
3.5 Managing DML Operations for a CONTEXT Index
3.5.1 Viewing Pending DML
3.5.2 Synchronizing the Index
3.5.2.1 Example
3.5.2.2 Maxtime Parameter for SYNC_INDEX
3.5.2.3 Locking Parameter for SYNC_INDEX
3.5.3 Optimizing the Index
3.5.3.1 CONTEXT Index Structure
3.5.3.2 Index Fragmentation
3.5.3.3 Document Invalidation and Garbage Collection
3.5.3.4 Single Token Optimization
3.5.3.5 Viewing Index Fragmentation and Garbage Data
3.5.3.6 Examples: Optimizing the Index
4 Querying with Oracle Text
4.1 Overview of Queries
4.1.1 Querying with CONTAINS
4.1.1.1 CONTAINS SQL Example
4.1.1.2 CONTAINS PL/SQL Example
4.1.1.3 Structured Query with CONTAINS
4.1.2 Querying with CATSEARCH
4.1.2.1 CATSEARCH SQL Query
4.1.2.2 CATSEARCH Example
4.1.3 Querying with MATCHES
4.1.3.1 MATCHES SQL Query
4.1.3.2 MATCHES PL/SQL Example
4.1.4 Word and Phrase Queries
4.1.4.1 CONTAINS Phrase Queries
4.1.4.2 CATSEARCH Phrase Queries
4.1.5 Querying Stopwords
4.1.6 ABOUT Queries and Themes
4.1.6.1 Querying Stopthemes
4.1.7 Query Expressions
4.1.7.1 CONTAINS Operators
4.1.7.2 CATSEARCH Operator
4.1.7.3 MATCHES Operator
4.1.8 Case-Sensitive Searching
4.1.8.1 Word Queries
4.1.8.2 ABOUT Queries
4.1.9 Query Feedback
4.1.10 Query Explain Plan
4.1.11 Using a Thesaurus in Queries
4.1.12 Document Section Searching
4.1.13 Using Query Templates
4.1.14 Query Rewrite
4.1.15 Query Relaxation
4.1.16 Query Language
4.1.17 Alternative and User-defined Scoring
4.1.18 Alternative Grammar
4.1.19 Query Analysis
4.1.20 Other Query Features
4.2 The CONTEXT Grammar
4.2.1 ABOUT Query
4.2.2 Logical Operators
4.2.3 Section Searching
4.2.4 Proximity Queries with NEAR and NEAR_ACCUM Operators
4.2.5 Fuzzy, Stem, Soundex, Wildcard and Thesaurus Expansion Operators
4.2.6 Using CTXCAT Grammar
4.2.7 Stored Query Expressions
4.2.7.1 Defining a Stored Query Expression
4.2.7.2 SQE Example
4.2.8 Calling PL/SQL Functions in CONTAINS
4.2.9 Optimizing for Response Time
4.2.9.1 Other Factors that Influence Query Response Time
4.2.10 Counting Hits
4.2.10.1 SQL Count Hits Example
4.2.10.2 Counting Hits with a Structured Predicate
4.2.10.3 PL/SQL Count Hits Example
4.2.11 Using DEFINESCORE and DEFINEMERGE for User-defined Scoring
4.3 The CTXCAT Grammar
4.3.1 Using CONTEXT Grammar with CATSEARCH
5 Presenting Documents in Oracle Text
5.1 Highlighting Query Terms
5.1.1 Text highlighting
5.1.2 Theme Highlighting
5.1.3 CTX_DOC Highlighting Procedures
5.1.3.1 Markup Procedure
5.1.3.2 Highlight Procedure
5.1.3.3 Concordance
5.2 Obtaining Lists of Themes, Gists, and Theme Summaries
5.2.1 Lists of Themes
5.2.1.1 In-Memory Themes
5.2.1.2 Result Table Themes
5.2.2 Gist and Theme Summary
5.2.2.1 In-Memory Gist
5.2.2.2 Result Table Gists
5.2.2.3 Theme Summary
5.3 Document Presentation and Highlighting
5.3.1 Highlighting Example
5.3.2 Document List of Themes Example
5.3.3 Gist Example
6 Classifying Documents in Oracle Text
6.1 Overview of Document Classification
6.1.1 Classification Applications
6.2 Classification Solutions
6.3 Rule-Based Classification
6.3.1 Rule-based Classification Example
6.3.2 CTXRULE Parameters and Limitations
6.4 Supervised Classification
6.4.1 Decision Tree Supervised Classification
6.4.1.1 Decision Tree Supervised Classification Example
6.4.2 SVM-Based Supervised Classification
6.4.2.1 SVM-Based Supervised Classification Example
6.5 Unsupervised Classification (Clustering)
6.5.1 Clustering Example
7 Tuning Oracle Text
7.1 Optimizing Queries with Statistics
7.1.1 Collecting Statistics
7.1.1.1 Example
7.1.2 Re-Collecting Statistics
7.1.3 Deleting Statistics
7.2 Optimizing Queries for Response Time
7.2.1 Other Factors that Influence Query Response Time
7.2.2 Improved Response Time with FIRST_ROWS(n) Hint for ORDER BY Queries
7.2.2.1 About the FIRST_ROWS Hint
7.2.3 Improved Response Time using Local Partitioned CONTEXT Index
7.2.3.1 Range Search on Partition Key Column
7.2.3.2 ORDER BY Partition Key Column
7.2.4 Improved Response Time with Local Partitioned Index for Order by Score
7.3 Optimizing Queries for Throughput
7.3.1 CHOOSE and ALL ROWS Modes
7.3.2 FIRST_ROWS Mode
7.4 Composite Domain Index (CDI) in Oracle Text
7.4.1 Performance Tuning with CDI
7.5 Solving Index and Query Bottlenecks Using Tracing
7.6 Using Parallel Queries
7.6.1 Parallel Queries on a Local Context Index
7.6.2 Parallelizing Queries Across Oracle RAC Nodes
7.7 Tuning Queries with Blocking Operations
7.8 Frequently Asked Questions About Query Performance
7.8.1 What is Query Performance?
7.8.2 What is the fastest type of text query?
7.8.3 Should I collect statistics on my tables?
7.8.4 How does the size of my data affect queries?
7.8.5 How does the format of my data affect queries?
7.8.6 What is a functional versus an indexed lookup?
7.8.7 What tables are involved in queries?
7.8.8 Does sorting the results slow a text-only query?
7.8.9 How do I make an ORDER BY score query faster?
7.8.10 Which Memory Settings Affect Querying?
7.8.11 Does out of line LOB storage of wide base table columns improve performance?
7.8.12 How can I make a CONTAINS query on more than one column faster?
7.8.13 Is it OK to have many expansions in a query?
7.8.14 How can local partition indexes help?
7.8.15 Should I query in parallel?
7.8.16 Should I index themes?
7.8.17 When should I use a CTXCAT index?
7.8.18 When is a CTXCAT index NOT suitable?
7.8.19 What optimizer hints are available, and what do they do?
7.9 Frequently Asked Questions About Indexing Performance
7.9.1 How long should indexing take?
7.9.2 Which index memory settings should I use?
7.9.3 How much disk overhead will indexing require?
7.9.4 How does the format of my data affect indexing?
7.9.5 Can parallel indexing improve performance?
7.9.6 How can I improve index performance for creating local partitioned index?
7.9.7 How can I tell how much indexing has completed?
7.10 Frequently Asked Questions About Updating the Index
7.10.1 How often should I index new or updated records?
7.10.2 How can I tell when my indexes are getting fragmented?
7.10.3 Does memory allocation affect index synchronization?
8 Searching Document Sections in Oracle Text
8.1 About Oracle Text Document Section Searching
8.1.1 Enabling Oracle Text Section Searching
8.1.1.1 Create a Section Group
8.1.1.2 Define Your Sections
8.1.1.3 Index Your Documents
8.1.1.4 Section Searching with the WITHIN Operator
8.1.1.5 Path Searching with INPATH and HASPATH Operators
8.1.2 Oracle Text Section Types
8.1.2.1 Zone Section
8.1.2.2 Field Section
8.1.2.3 Stop Section
8.1.2.4 MDATA Section
8.1.2.5 SDATA Section
8.1.2.6 Attribute Section
8.1.2.7 Special Sections
8.2 HTML Section Searching with Oracle Text
8.2.1 Creating HTML Sections
8.2.2 Searching HTML Meta Tags
8.2.2.1 Example: Creating Sections for <META> Tags
8.3 XML Section Searching with Oracle Text
8.3.1 Automatic Sectioning
8.3.2 Attribute Searching
8.3.2.1 Creating Attribute Sections
8.3.2.2 Searching Attributes with the INPATH Operator
8.3.3 Creating Document Type Sensitive Sections
8.3.4 Path Section Searching
8.3.4.1 Creating an Index with PATH_SECTION_GROUP
8.3.4.2 Top-Level Tag Searching
8.3.4.3 Any-Level Tag Searching
8.3.4.4 Direct Parentage Searching
8.3.4.5 Tag Value Testing
8.3.4.6 Attribute Searching
8.3.4.7 Attribute Value Testing
8.3.4.8 Path Testing
8.3.4.9 Section Equality Testing with HASPATH
9 Working With a Thesaurus in Oracle Text
9.1 Overview of Oracle Text Thesaurus Features
9.1.1 Oracle Text Thesaurus Creation and Maintenance
9.1.1.1 CTX_THES Package
9.1.1.2 Thesaurus Operators
9.1.1.3 ctxload Utility
9.1.2 Using a Case-sensitive Thesaurus
9.1.3 Using a Case-insensitive Thesaurus
9.1.4 Default Thesaurus
9.1.5 Supplied Thesaurus
9.1.5.1 Supplied Thesaurus Structure and Content
9.1.5.2 Supplied Thesaurus Location
9.2 Defining Terms in a Thesaurus
9.2.1 Defining Synonyms
9.2.2 Defining Hierarchical Relations
9.3 Using a Thesaurus in a Query Application
9.3.1 Loading a Custom Thesaurus and Issuing Thesaurus-based Queries
9.3.1.1 Advantage
9.3.1.2 Limitations
9.3.2 Augmenting Knowledge Base with Custom Thesaurus
9.3.2.1 Advantage
9.3.2.2 Limitations
9.3.2.3 Linking New Terms to Existing Terms
9.3.2.4 Loading a Thesaurus with ctxload
9.3.2.5 Compiling a Loaded Thesaurus
9.4 About the Supplied Knowledge Base
9.4.1 Adding a Language-Specific Knowledge Base
9.4.1.1 Limitations
10 Administering Oracle Text
10.1 Oracle Text Users and Roles
10.1.1 CTXSYS User
10.1.2 CTXAPP Role
10.1.3 Granting Roles and Privileges to Users
10.2 DML Queue
10.3 The CTX_OUTPUT Package
10.4 The CTX_REPORT Package
10.5 Text Manager in Oracle Enterprise Manager
10.5.1 Using Text Manager
10.5.2 Viewing General Information for a Text Index
10.5.3 Checking Text Index Health
10.6 Servers and Indexing
10.7 Database Feature Usage Tracking in Oracle Enterprise Manager
10.7.1 Package Usage Statistics
10.7.2 Index Usage Statistics
10.7.3 SQL Operator Usage Statistics
10.8 Oracle Text on Oracle Real Application Clusters
11 Migrating Oracle Text Applications
11.1 Migrating to Oracle Text 11g Release 1 (11.1)
11.2 Migrating to Oracle Text 10g Release 2 (10.2)
11.2.1 New Filter (INSO_FILTER versus AUTO_FILTER)
11.2.1.1 Migrating to the AUTO_FILTER Filter Type
A CONTEXT Query Application
A.1 Web Query Application Overview
A.2 The PSP Web Application
A.2.1 Web Application Prerequisites
A.2.2 Building the Web Application
A.2.3 PSP Sample Code
A.2.3.1 loader.ctl
A.2.3.2 loader.dat
A.2.3.3 search_htmlservices.sql
A.2.3.4 search_html.psp
A.3 The JSP Web Application
A.3.1 Web Application Prerequisites
A.3.2 JSP Sample Code
A.3.2.1 search_html.jsp
B CATSEARCH Query Application
B.1 CATSEARCH Web Query Application Overview
B.2 The JSP Web Application
B.2.1 Building the JSP Web Application
B.2.2 JSP Sample Code
B.2.2.1 loader.ctl
B.2.2.2 loader.dat
B.2.2.3 catalogSearch.jsp
Glossary
Index