Oracle® Data Mining Concepts
11g Release 1 (11.1)

Part Number B28129-03

10 Apriori

This chapter describes Apriori, the algorithm used by Oracle Data Mining for calculating association rules.

See Also:

Chapter 8, "Association"

This chapter contains the following topics: About Apriori and Data for Association Rules.

About Apriori

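An association mining problem can be decomposed into two subproblems:

Find all combinations of items in a set of transactions that occur with a specified minimum frequency. These combinations are called frequent itemsets.

Calculate rules that express the probable co-occurrence of items within frequent itemsets.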

A rule consists of an antecedent and a consequent. The antecedent describes a condition. The consequent describes the result implied by the condition. For example, in the rule "ABC implies D," ABC is the antecedent, and D is the consequent. Oracle Data Mining association supports only rules with a single consequent.
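The antecedent and consequent structure can be illustrated with a minimal Python sketch. It is not Oracle Data Mining code: the function name `single_consequent_rules`, the `supports` dictionary, and the support values are hypothetical, and confidence is computed with the standard definition, support(antecedent and consequent together) divided by support(antecedent).

```python
def single_consequent_rules(itemset, supports, min_confidence):
    """List rules (antecedent, consequent, confidence) with one-item consequents.

    `supports` maps frozenset itemsets to their support (fraction of
    transactions containing the itemset). Hypothetical helper for illustration.
    """
    itemset = frozenset(itemset)
    if len(itemset) < 2:
        return []  # a rule needs a non-empty antecedent and a consequent
    rules = []
    for consequent in sorted(itemset):
        antecedent = itemset - {consequent}
        # Standard confidence: support(whole itemset) / support(antecedent).
        confidence = supports[itemset] / supports[antecedent]
        if confidence >= min_confidence:
            rules.append((antecedent, consequent, confidence))
    return rules


# Made-up supports for the itemset ABCD and its three-item subsets.
supports = {
    frozenset("ABCD"): 0.25,
    frozenset("ABC"): 0.5,
    frozenset("ABD"): 0.25,
    frozenset("ACD"): 0.25,
    frozenset("BCD"): 0.25,
}
rules = single_consequent_rules("ABCD", supports, min_confidence=0.5)
for antecedent, consequent, confidence in rules:
    print("".join(sorted(antecedent)), "implies", consequent, confidence)
```

With these assumed supports, the rule "ABC implies D" has confidence 0.25 / 0.5 = 0.5, so it just meets the threshold.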

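The Apriori algorithm works by iteratively enumerating itemsets of increasing length, keeping only those that meet the minimum support threshold. Because every subset of a frequent itemset must itself be frequent, candidate itemsets of a given length are built only from the frequent itemsets that are one item shorter.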
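The level-wise enumeration can be sketched in a few lines of Python. This is a textbook-style illustration under simplifying assumptions (in-memory transactions, support expressed as a fraction), not the implementation used by Oracle Data Mining; the function name `frequent_itemsets` and the sample transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_length=None):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    current = {}
    for item in sorted({i for b in baskets for i in b}):
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    result = dict(current)
    k = 2
    while current and (max_length is None or k <= max_length):
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must already be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Count: keep only candidates that meet the support threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


transactions = [
    {"A", "B", "C", "D"},
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "D"},
    {"B", "C"},
]
itemsets = frequent_itemsets(transactions, min_support=0.4)
for itemset, s in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), s)
```

In this sample, the pairs BD and CD fall below the threshold, so the candidates ABD and ACD are pruned without being counted. Only ABC is counted at length three.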

You can use model settings to specify the maximum length and the minimum support and confidence for rules. These settings apply to the association mining function. See Chapter 8, "Association".

Association rule mining is not recommended for finding associations involving rare events in problem domains with a large number of items. Classification models may be more suitable in such problem domains.

Apriori discovers patterns with frequency above the minimum support threshold. Therefore, to find associations involving rare events, the algorithm must run with very low minimum support values. However, a very low threshold can sharply increase the number of enumerated itemsets, especially when the number of items is large, and this can significantly increase execution time.
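To get a feel for the scale, the following Python sketch counts how many distinct itemsets exist up to a given length. It is an upper bound that assumes nothing is pruned by the support threshold, which is roughly what a very low threshold approaches; the function name `max_candidate_itemsets` is hypothetical.

```python
from math import comb


def max_candidate_itemsets(n_items, max_length):
    """Number of distinct non-empty itemsets of length 1..max_length."""
    return sum(comb(n_items, k) for k in range(1, max_length + 1))


for length in (1, 2, 3):
    print(length, max_candidate_itemsets(1000, length))
```

For a catalog of 1,000 items, this bound grows from 1,000 single items to 500,500 itemsets of length up to two, and to 166,667,500 itemsets of length up to three.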

Data for Association Rules

Association models are designed to use transactional data. Nulls in transactional data are assumed to represent values that are known but not present in the transaction. For example, three items out of hundreds of possible items might be purchased in a single transaction. The items that were not purchased are known but not present in the transaction.

Transactional data, by its nature, is sparse. Only a small fraction of the attributes are nonzero or non-null in any given row. Apriori interprets all null values as indications of sparsity.

Examples of sparse data include market basket and text mining data. In a market basket problem, there might be 1,000 products in the company's catalog, and the average size of a basket (the collection of items that a customer purchases in a typical transaction) might be 20 products. In this example, a transaction (case or record) has on average 20 out of 1,000 attributes that are not null. This implies that the fraction of non-null attributes in the table (the density) is 20/1,000, or 2%. This density is typical for market basket and text processing problems. Data that has a significantly higher density can require extremely large amounts of temporary space to build associations.
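The density arithmetic can be checked with a short Python sketch. The function name `density` and the sample baskets are hypothetical; each basket is represented as the set of items present, so items that are absent (null) are simply not listed.

```python
def density(transactions, catalog_size):
    """Fraction of (transaction, item) cells that are non-null."""
    filled = sum(len(set(t)) for t in transactions)
    return filled / (len(transactions) * catalog_size)


# Three made-up baskets of 20 products each, from a catalog of 1,000 products.
baskets = [set(range(start, start + 20)) for start in (0, 100, 200)]
d = density(baskets, catalog_size=1000)
print(f"{d:.1%}")
```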

See Also:

"Transactions"

Equi-width binning is not recommended for association models. When Apriori uses equi-width binning, outliers cause most of the data to concentrate in a few bins, sometimes a single bin. As a result, the discriminating power of the algorithm can be significantly reduced.
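The effect of an outlier on equi-width bins can be seen in a small Python sketch. It is a generic illustration of equal-width binning, not the binning routine used by Oracle Data Mining; the function name `equi_width_counts` and the sample values are hypothetical.

```python
def equi_width_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    counts = [0] * n_bins
    if hi == lo:
        counts[0] = len(values)  # all values identical: a single bin holds them
        return counts
    width = (hi - lo) / n_bins
    for v in values:
        # The maximum value lands exactly on the upper edge; clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# 99 ordinary values plus one extreme outlier.
values = list(range(1, 100)) + [10000]
counts = equi_width_counts(values, n_bins=4)
print(counts)
```

Here the single outlier stretches the range so far that all 99 ordinary values share the first bin, which leaves the binned attribute with almost no ability to distinguish between them.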

Note:

Apriori is not affected by Automatic Data Preparation.

See Also:

Chapter 19, "Automatic and Embedded Data Preparation"

Oracle Data Mining Application Developer's Guide for information about transactional data, nested columns, and missing value treatment

Chapter 20, "Text Mining"