Creating a Model that Includes Text Mining
Learn how to create a model that includes text mining.
Oracle Data Mining supports unstructured text within columns of VARCHAR2, CHAR, CLOB, BLOB, and BFILE, as described in the following table:
                  
Table 7-2 Column Data Types That May Contain Unstructured Text

| Data Type | Description |
|---|---|
|  | Oracle Data Mining interprets |
|  | Oracle Data Mining interprets |
|  | Oracle Data Mining interprets |
|  | Oracle Data Mining interprets  Oracle Data Mining interprets |

The settings described in the following table control the term extraction process for text attributes in a model. Instructions for specifying model settings are in "Specifying Model Settings".

Table 7-3 Model Settings for Text

| Setting Name | Data Type | Setting Value | Description |
|---|---|---|---|
|  |  | Name of an Oracle Text policy object created with | Affects how individual tokens are extracted from unstructured text. See "Creating a Text Policy". |
|  |  | 1 <= value <= 100000 | Maximum number of features to use from the document set (across all documents of each text column) passed to  Default is 3000. |

A model can include one or more text attributes. A model with text attributes can also include categorical and numerical attributes.
To create a model that includes text attributes:
- Create an Oracle Text policy object.
- Specify the model configuration settings that are described in "Table 7-3".
- Specify which columns must be treated as text and, optionally, provide text transformation instructions for individual attributes.
- Pass the model settings and text transformation instructions to DBMS_DATA_MINING.CREATE_MODEL.

Note:
All algorithms except O-Cluster can support columns of unstructured text.
The use of unstructured text is not recommended for association rules (Apriori).
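
The steps above can be sketched in PL/SQL as follows. This is an illustration under stated assumptions, not a definitive procedure. Every object name (`my_text_policy`, `my_text_settings`, `my_docs`, `doc_id`, `doc_text`, `label`, `my_text_model`) is hypothetical. The sketch assumes that the settings table already contains the text settings naming the policy, that the training table exists, and that the user has been granted execute access to `CTX_DDL`.

```sql
-- Step 1: create an Oracle Text policy object (hypothetical name).
BEGIN
  ctx_ddl.create_policy('my_text_policy');
END;
/

-- Step 2 is assumed done: my_text_settings holds the text settings,
-- including the name of the policy created above.

-- Steps 3 and 4: identify the text column, then create the model.
DECLARE
  xforms dbms_data_mining_transform.TRANSFORM_LIST;
BEGIN
  -- The attribute specification 'TEXT' marks doc_text as unstructured text.
  -- The expression is the column itself, so no other transformation applies.
  dbms_data_mining_transform.SET_TRANSFORM(
    xforms, 'doc_text', NULL, 'doc_text', NULL, 'TEXT');

  -- Pass both the settings table and the transformation list.
  dbms_data_mining.CREATE_MODEL(
    model_name          => 'my_text_model',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'my_docs',
    case_id_column_name => 'doc_id',
    target_column_name  => 'label',
    settings_table_name => 'my_text_settings',
    xform_list          => xforms);
END;
/
```

Classification is used here only as an example mining function. Any algorithm other than O-Cluster could take its place, subject to the note above.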