Mining Diverse Patterns

@(Pattern Discovery in Data Mining)

Mining Multi-level Association Rules

The intuition behind hierarchical min_sup settings: level-reduced min-support (items at lower levels are expected to have lower support, so lower levels get lower thresholds)

Efficient mining: Shared multi-level mining (Use the lowest min-support to pass down the set of candidates)
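To make the level-reduced thresholds concrete, here is a minimal sketch. The taxonomy, the transactions, and the threshold values are all made up for illustration, and the sketch only counts single items at each level in one scan rather than passing down full candidate itemsets.

```python
from collections import Counter

# Hypothetical two-level taxonomy (leaf item -> parent category).
PARENT = {
    "2% milk": "milk",
    "skim milk": "milk",
    "wheat bread": "bread",
    "white bread": "bread",
}

# Level-reduced min-support: the lower (more specific) level gets the
# lower threshold. Both values are illustrative assumptions.
MIN_SUP = {1: 0.50, 2: 0.25}


def frequent_items_by_level(transactions):
    """Count items at both levels in one scan, then apply each level's threshold."""
    n = len(transactions)
    counts = {1: Counter(), 2: Counter()}
    for t in transactions:
        leaves = set(t)
        counts[2].update(leaves)                        # level 2: specific items
        counts[1].update({PARENT[i] for i in leaves})   # level 1: their ancestors
    return {
        level: {
            item: c / n
            for item, c in counts[level].items()
            if c / n >= MIN_SUP[level]
        }
        for level in (1, 2)
    }


transactions = [
    ["2% milk", "wheat bread"],
    ["skim milk"],
    ["2% milk", "white bread"],
    ["wheat bread"],
]
frequent = frequent_items_by_level(transactions)
for level in (1, 2):
    print(level, sorted(frequent[level]))
```

With a uniform 50% threshold, skim milk and white bread (25% each) would be dropped at the lower level; the reduced threshold keeps them.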

Redundancy Filtering at Mining Multi-Level Associations:

* Multi-level association mining may generate many redundant rules
* Redundancy filtering: some rules may be redundant due to “ancestor” relationships between items
* Example (suppose 2% milk accounts for about 1/4 of the milk sold, in gallons):
    1. milk ⇒ wheat bread [support = 8%, confidence = 70%]
    2. 2% milk ⇒ wheat bread [support = 2%, confidence = 72%]

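A rule is redundant if its support is close to the “expected” value derived from its “ancestor” rule and its confidence is similar to the ancestor's. Rule (1) is an ancestor of rule (2). The expected support of rule (2) is 8% × 1/4 = 2%, which matches its actual support, and the two confidences (72% vs. 70%) are close, so rule (2) is redundant and can be pruned.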
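The redundancy test can be sketched as a small function. The notes do not say how close counts as "close", so the two tolerances below (`sup_tol`, `conf_tol`) are assumptions picked for illustration, as are the function and parameter names.

```python
def is_redundant(ancestor_sup, ancestor_conf, child_sup, child_conf,
                 child_share, sup_tol=0.25, conf_tol=0.05):
    """Return True if the descendant rule adds nothing beyond its ancestor.

    child_share: fraction of the ancestor item accounted for by the
                 descendant item (e.g. 2% milk is about 1/4 of all milk).
    sup_tol:     assumed relative tolerance on the expected support.
    conf_tol:    assumed absolute tolerance on the confidence.
    """
    expected_sup = ancestor_sup * child_share
    sup_close = abs(child_sup - expected_sup) <= sup_tol * expected_sup
    conf_close = abs(child_conf - ancestor_conf) <= conf_tol
    return sup_close and conf_close


# Rule (1): milk => wheat bread      [support 8%, confidence 70%]
# Rule (2): 2% milk => wheat bread   [support 2%, confidence 72%]
rule2_redundant = is_redundant(0.08, 0.70, 0.02, 0.72, child_share=0.25)
print(rule2_redundant)
```

If rule (2) had a support of, say, 5% instead of the expected 2%, the same check would keep it, because it would then carry information the ancestor rule does not.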

Customized Min-Supports for Different Kinds of Items

* We have used the same min-support threshold for all the items or itemsets to be mined in each association mining
* In reality, some items (e.g., diamond, watch, …) are valuable but less frequent
* It is necessary to have customized min-support settings for different kinds of items
* One method: use group-based “individualized” min-support
    * E.g., {diamond, watch}: 0.05%; {bread, milk}: 5%; …
* How to mine such rules efficiently?
    * Existing scalable mining algorithms can be easily extended to cover such cases
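A minimal sketch of group-based min-support, using the two thresholds from the example above. The group names and the rule for mixed itemsets (take the lowest threshold among the items' groups) are assumptions; the notes do not say how mixed itemsets are handled.

```python
# Thresholds from the example: {diamond, watch}: 0.05%, {bread, milk}: 5%.
# The group names "luxury" and "grocery" are hypothetical labels.
GROUP_MIN_SUP = {"luxury": 0.0005, "grocery": 0.05}
ITEM_GROUP = {
    "diamond": "luxury",
    "watch": "luxury",
    "bread": "grocery",
    "milk": "grocery",
}


def itemset_min_sup(itemset):
    """Assumed convention: an itemset uses the lowest threshold of its items' groups."""
    return min(GROUP_MIN_SUP[ITEM_GROUP[item]] for item in itemset)


def is_frequent(itemset, support):
    """Compare an itemset's support against its group-based threshold."""
    return support >= itemset_min_sup(itemset)


print(is_frequent({"diamond"}, 0.001))  # 0.1% clears the 0.05% luxury threshold
print(is_frequent({"bread"}, 0.001))    # 0.1% does not clear the 5% grocery threshold
```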

Mining Multi-dimensional Associations

Single-dimensional rules (e.g., items are all in “product” dimension)

buys(X, “milk”) ⇒ buys(X, “bread”)

Multi-dimensional rules (i.e., items in two or more dimensions or predicates)

* Inter-dimension association rules (no repeated predicates): age(X, “18-25”) ⇒ buys(X, “coke”)
* Hybrid-dimension association rules (repeated predicates): age(X, “18-25”) ∧ buys(X, “popcorn”) ⇒ buys(X, “coke”)

Attributes can be categorical or numerical:

* Categorical attributes (e.g., profession, product; no ordering among values): data cube for inter-dimension association
* Quantitative attributes: numeric, with an implicit ordering among values; handled by discretization, clustering, and gradient approaches

Mining Quantitative Associations
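Of the approaches listed for quantitative attributes, discretization is the easiest to sketch: a numeric age is mapped to a range predicate, after which the record can be mined like any other set of predicates. The bin boundaries, the records, and the helper names below are made up for illustration.

```python
# Hypothetical age bins used to discretize the quantitative attribute.
AGE_BINS = [(18, 25), (26, 35), (36, 50)]


def to_predicates(record):
    """Turn one customer record into a set of predicate strings."""
    preds = {f'buys(X, "{item}")' for item in record["buys"]}
    for lo, hi in AGE_BINS:
        if lo <= record["age"] <= hi:
            preds.add(f'age(X, "{lo}-{hi}")')
    return preds


def support_and_confidence(records, lhs, rhs):
    """Support and confidence of the rule lhs => rhs over the records."""
    rows = [to_predicates(r) for r in records]
    n_lhs = sum(1 for p in rows if lhs <= p)
    n_both = sum(1 for p in rows if (lhs | rhs) <= p)
    support = n_both / len(rows)
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


records = [
    {"age": 20, "buys": ["coke"]},
    {"age": 23, "buys": ["coke", "popcorn"]},
    {"age": 24, "buys": ["milk"]},
    {"age": 40, "buys": ["coke"]},
]
lhs = {'age(X, "18-25")'}
rhs = {'buys(X, "coke")'}
support, confidence = support_and_confidence(records, lhs, rhs)
print(support, confidence)
```

Here three of the four customers fall in the 18-25 bin and two of them buy coke, so the inter-dimension rule has support 2/4 and confidence 2/3.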

Mining Negative Correlations

Rare Pattern vs. Negative Pattern

Defining Negative Correlated Patterns

Support-based definition

Kulczynski measure-based definition
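As these two definitions are usually stated, A and B (both frequent) are negatively correlated under the support-based definition if sup(A ∪ B) is much smaller than sup(A) × sup(B), and under the Kulczynski-based definition if (P(A|B) + P(B|A)) / 2 < ε for some negative-pattern threshold ε. The sketch below compares the two. The counts, the `factor` that stands in for "much smaller", and the value of `epsilon` are assumptions, and the frequency check on A and B is omitted for brevity.

```python
def support_based_negative(n, count_a, count_b, count_ab, factor=0.1):
    """Support-based test: sup(A u B) far below sup(A) * sup(B).

    factor is an assumed stand-in for "much smaller than".
    """
    sup_a, sup_b, sup_ab = count_a / n, count_b / n, count_ab / n
    return sup_ab < factor * sup_a * sup_b


def kulczynski(count_a, count_b, count_ab):
    """Kulc(A, B) = (P(A|B) + P(B|A)) / 2; independent of the total n."""
    return 0.5 * (count_ab / count_b + count_ab / count_a)


def kulc_negative(count_a, count_b, count_ab, epsilon=0.05):
    """Kulczynski-based test with an assumed threshold epsilon."""
    return kulczynski(count_a, count_b, count_ab) < epsilon


# Hypothetical counts: A and B each occur in 100 transactions, 1 in common.
small_db = support_based_negative(200, 100, 100, 1)
large_db = support_based_negative(100_000, 100, 100, 1)
kulc = kulczynski(100, 100, 1)
print(small_db, large_db, kulc, kulc_negative(100, 100, 1))
```

The support-based verdict flips when the same counts sit in a much larger database full of transactions containing neither A nor B, while the Kulczynski value does not depend on the total number of transactions at all.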

Exercise

Mining Compressed Patterns

Given a table of patterns and their supports:

Why mine compressed patterns? Because mining often produces too many scattered patterns, many of which are not very meaningful.

We can see that P1 and P2 are similar in both itemsets and support, and that P1 and P5 also have similar itemsets. But how do we compress such similar patterns?

We can compare the options:

* Closed patterns
    * P1, P2, P3, P4, P5 (no two have identical support)
    * Emphasizes support too much
    * There is no compression
* Max-patterns
    * P3: information loss
* Desired output (a good balance):
    * P2, P3, P4

So we can define a compression method, starting from a measure of how far apart two patterns are:

pattern distance measure
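The notes do not give the formula, so the following is an assumption: a commonly used pattern distance is the Jaccard distance between the sets of transactions that contain each pattern, Dist(P1, P2) = 1 − |T(P1) ∩ T(P2)| / |T(P1) ∪ T(P2)|. A minimal sketch with made-up transaction IDs:

```python
def pattern_distance(tids_a, tids_b):
    """Jaccard distance between the supporting transaction sets of two patterns.

    tids_a, tids_b: sets of IDs of the transactions containing each pattern.
    Returns 0.0 for identical supporting sets and 1.0 for disjoint ones.
    """
    union = tids_a | tids_b
    if not union:
        return 0.0  # two patterns with no supporting transactions
    return 1.0 - len(tids_a & tids_b) / len(union)


# Hypothetical transaction-ID sets for two patterns.
t_p1 = {1, 2, 3}
t_p2 = {1, 2, 3, 4}
distance = pattern_distance(t_p1, t_p2)
print(distance)
```

Because the distance is computed on supporting transactions rather than on raw support counts, two patterns only come out as close when they are backed by largely the same transactions, which is what makes one of them a reasonable representative for the other.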
