Mining Diverse Patterns
@(Pattern Discovery in Data Mining)
Mining Multi-level Association Rules
Intuition for setting a hierarchical min_sup: level-reduced min-support (items at lower levels are expected to have lower support, so each lower level gets a smaller threshold)
Efficient mining: shared multi-level mining (use the lowest min-support to pass down the set of candidates)
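The two ideas above can be sketched on single items. This is only an illustration under stated assumptions: the taxonomy `PARENT`, the thresholds in `MIN_SUP`, and the toy `TRANSACTIONS` are all hypothetical, and a real miner would extend the same filtering to larger itemsets.

```python
from collections import Counter

# Hypothetical two-level taxonomy: level-2 item -> level-1 category.
PARENT = {
    "2% milk": "milk",
    "skim milk": "milk",
    "wheat bread": "bread",
    "white bread": "bread",
}

# Assumed level-reduced thresholds: the lower level gets the lower min-support.
MIN_SUP = {1: 0.6, 2: 0.3}

# Toy data, made up for this sketch.
TRANSACTIONS = [
    ["2% milk", "wheat bread"],
    ["2% milk", "white bread"],
    ["skim milk"],
    ["wheat bread"],
]


def frequent_items_by_level(transactions):
    """Count both levels in one scan, then apply per-level thresholds."""
    n = len(transactions)
    level1, level2 = Counter(), Counter()
    for t in transactions:
        level2.update(set(t))                  # each item once per transaction
        level1.update({PARENT[i] for i in t})  # each category once per transaction

    # Shared mining: categories are passed down using the lowest threshold.
    lowest = min(MIN_SUP.values())
    survivors = {c for c, k in level1.items() if k / n >= lowest}

    # Each level then reports with its own threshold.
    freq1 = {c: k / n for c, k in level1.items() if k / n >= MIN_SUP[1]}
    freq2 = {
        i: k / n
        for i, k in level2.items()
        if PARENT[i] in survivors and k / n >= MIN_SUP[2]
    }
    return freq1, freq2


freq1, freq2 = frequent_items_by_level(TRANSACTIONS)
print(freq1)
print(freq2)
```

With a uniform threshold of 0.6, no level-2 item in this toy data would be frequent; the reduced threshold of 0.3 keeps "2% milk" and "wheat bread".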
Redundancy Filtering at Mining Multi-Level Associations:
* Multi-level association mining may generate many redundant rules
* Redundancy filtering: some rules may be redundant due to “ancestor” relationships between items
* Example (suppose 2% milk accounts for about 1/4 of the milk sold, in gallons):
    1. milk ⇒ wheat bread [support = 8%, confidence = 70%]
    2. 2% milk ⇒ wheat bread [support = 2%, confidence = 72%]

A rule is redundant if its support is close to the “expected” value derived from its “ancestor” rule and its confidence is similar to the ancestor's. Rule (1) is an ancestor of rule (2). The expected support of rule (2) is 8% × 1/4 = 2%, which matches its actual support, and the two confidences are close (72% vs. 70%), so rule (2) is redundant and can be pruned.
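The redundancy test can be sketched as a small check. The function name `is_redundant` and the tolerances `sup_tol` and `conf_tol` are assumptions made for illustration; the notes do not say how close "close" has to be.

```python
def is_redundant(rule, ancestor, share, sup_tol=0.25, conf_tol=0.05):
    """Return True if `rule` adds little information beyond `ancestor`.

    share:    fraction of the ancestor item's sales taken by the descendant item
    sup_tol:  allowed relative deviation from the expected support (assumed)
    conf_tol: allowed absolute difference in confidence (assumed)
    """
    expected_support = ancestor["support"] * share
    support_close = (
        abs(rule["support"] - expected_support) <= sup_tol * expected_support
    )
    confidence_close = abs(rule["confidence"] - ancestor["confidence"]) <= conf_tol
    return support_close and confidence_close


ancestor = {"support": 0.08, "confidence": 0.70}  # milk => wheat bread
rule = {"support": 0.02, "confidence": 0.72}      # 2% milk => wheat bread

print(is_redundant(rule, ancestor, share=0.25))  # True
```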
Customized Min-Supports for Different Kinds of Items
* So far we have used the same min-support threshold for all the items or item sets to be mined in each association mining
* In reality, some items (e.g., diamond, watch, …) are valuable but less frequent
* It is necessary to have customized min-support settings for different kinds of items
* One method: use group-based “individualized” min-support
    * E.g., {diamond, watch}: 0.05%; {bread, milk}: 5%; …
* How to mine such rules efficiently?
    * Existing scalable mining algorithms can be easily extended to cover such cases
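A minimal sketch of group-based min-support follows. The group names and the rule for itemsets that mix groups (take the lowest threshold among the groups involved) are assumptions for illustration; the notes only give the two example thresholds.

```python
# Thresholds from the example above; the group names are hypothetical.
GROUP_MIN_SUP = {"luxury": 0.0005, "grocery": 0.05}  # 0.05% and 5%
ITEM_GROUP = {
    "diamond": "luxury",
    "watch": "luxury",
    "bread": "grocery",
    "milk": "grocery",
}


def min_support_for(itemset):
    """Assumed rule: a mixed itemset uses the lowest threshold of its groups."""
    return min(GROUP_MIN_SUP[ITEM_GROUP[item]] for item in itemset)


def is_frequent(itemset, support):
    """Compare an itemset's support against its group-specific threshold."""
    return support >= min_support_for(itemset)


print(is_frequent({"diamond", "watch"}, 0.001))  # True: 0.1% >= 0.05%
print(is_frequent({"bread", "milk"}, 0.001))     # False: 0.1% < 5%
```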
Mining Multi-dimensional Associations
Single-dimensional rules (e.g., items are all in the “product” dimension)
buys(X, “milk”) ⇒ buys(X, “bread”)
Multi-dimensional rules (i.e., items in two or more dimensions or predicates)
Inter-dimension association rules (no repeated predicates)
age(X, “18-25”) ∧ occupation(X, “student”) ⇒ buys(X, “coke”)
Hybrid-dimension association rules (repeated predicates)
age(X, “18-25”) ∧ buys(X, “popcorn”) ⇒ buys(X, “coke”)
Attributes can be categorical or numerical
* Categorical attributes (e.g., profession, product: no ordering among values): data cube for inter-dimension association
* Quantitative attributes: numeric, with an implicit ordering among values; handled by discretization, clustering, and gradient approaches
Mining Quantitative Associations
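Of the approaches to quantitative attributes, discretization is the simplest to sketch: a numeric value is mapped to a range label, which can then be mined like any categorical item. The bin edges, labels, and the `attribute=value` item encoding below are assumptions chosen to match the “18-25” range used in these notes.

```python
from bisect import bisect_right

# Hypothetical bins: [0, 18), [18, 26), [26, 40), [40, inf)
EDGES = [18, 26, 40]
LABELS = ["0-17", "18-25", "26-39", "40+"]


def discretize_age(age):
    """Map a numeric age to its range label."""
    return LABELS[bisect_right(EDGES, age)]


def to_items(record):
    """Turn a record into predicate-style items usable by an itemset miner."""
    return {
        f"age={discretize_age(record['age'])}",
        f"buys={record['buys']}",
    }


print(discretize_age(21))  # 18-25
print(sorted(to_items({"age": 21, "buys": "coke"})))
```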
Mining Negative Correlations
Rare Pattern vs. Negative Pattern
Defining Negative Correlated Patterns
Support-based definition
Kulczynski measure-based definition
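The notes name the two definitions without spelling them out. The sketch below assumes the usual textbook forms: the support-based test flags A and B when sup(A ∪ B) is much smaller than sup(A) × sup(B), and the Kulczynski-based test flags them when (P(A|B) + P(B|A)) / 2 falls below a threshold ε. The `factor` and `epsilon` values are illustrative assumptions, and both tests presume A and B have already passed the min-support filter.

```python
def kulczynski(count_a, count_b, count_ab):
    """Kulc(A, B) = (P(B|A) + P(A|B)) / 2, computed from raw counts."""
    return 0.5 * (count_ab / count_a + count_ab / count_b)


def negative_by_support(count_a, count_b, count_ab, n, factor=0.1):
    """Support-based test: sup(A u B) far below sup(A) * sup(B).

    `factor` models "much smaller than" and is an assumed value.
    """
    return count_ab / n < factor * (count_a / n) * (count_b / n)


def negative_by_kulc(count_a, count_b, count_ab, epsilon=0.05):
    """Kulczynski-based test; `epsilon` is an assumed threshold."""
    return kulczynski(count_a, count_b, count_ab) < epsilon


# A and B each occur 100 times and co-occur once.
for n in (200, 100_000):
    print(n, negative_by_support(100, 100, 1, n), negative_by_kulc(100, 100, 1))
```

The support-based verdict changes with the total number of transactions `n`, that is, with how many transactions contain neither A nor B, while the Kulczynski value depends only on the counts of A, B, and their co-occurrence.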
Exercise
Mining Compressed Patterns
Given a table of patterns and their supports:
Why mine compressed patterns? Because mining produces too many scattered patterns, and many of them are not very meaningful.
P1 and P2 are similar in both item-sets and support, and P1 and P5 also have similar item-sets. How can such similar patterns be compressed?
Consider the alternatives:
* Closed patterns
    * P1, P2, P3, P4, P5 (no two have identical support)
    * Emphasizes support too much
    * There is no compression
* Max-patterns
    * P3: information loss
* Desired output (a good balance):
    * P2, P3, P4

So we need to define a compression method.
Pattern distance measure
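The formula for the distance is not given in these notes. One common choice in the pattern-compression literature is the Jaccard distance between the sets of transactions that support each pattern, Dist(P1, P2) = 1 − |T(P1) ∩ T(P2)| / |T(P1) ∪ T(P2)|. The sketch below assumes that definition; the transaction IDs are made up.

```python
def pattern_distance(tids_p1, tids_p2):
    """Jaccard distance between two patterns' supporting transaction-ID sets."""
    union = tids_p1 | tids_p2
    if not union:
        return 0.0  # two patterns with no supporting transactions: treat as identical
    return 1.0 - len(tids_p1 & tids_p2) / len(union)


# Hypothetical supporting transactions: every transaction containing the
# longer pattern also contains the shorter one.
t_short = {1, 2, 3, 4}
t_long = {1, 2, 3}

print(pattern_distance(t_short, t_long))  # 0.25
```

Under this measure, patterns whose supporting transactions almost coincide are close to each other, so one of them can stand in for the other.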