A Brief Summary of the "textanalytics" Course (1): Two Kinds of Word Relations

3. Mining Syntagmatic (Combinatorial) Relations

Problem definition: given a text segment, predict whether a word w occurs in it. The occurrence of w is modeled as a binary random variable Xw, with Xw = 1 if w appears in the segment and Xw = 0 otherwise.

The key to this problem: the more random Xw is, the more difficult the prediction would be.

Entropy H(X) measures randomness of X:
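For the binary variable Xw this is the standard Shannon entropy (logs base 2):

H(X_w) = -\sum_{v \in \{0,1\}} p(X_w = v) \log_2 p(X_w = v)

H(Xw) peaks at 1 bit when p(Xw = 1) = 0.5 and drops to 0 when the outcome is certain.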

High entropy means high randomness, which makes X harder to predict.

Conditional Entropy H(X|Y) is the uncertainty about X that remains after we observe Y:
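It averages the entropy of X over the possible values of Y (standard definition, written in the XW1/XW2 notation used below):

H(X_{w1} \mid X_{w2}) = \sum_{u \in \{0,1\}} p(X_{w2} = u)\, H(X_{w1} \mid X_{w2} = u)

Conditioning never hurts: H(X|Y) <= H(X), with equality exactly when X and Y are independent, so a strongly correlated W2 drives H(XW1|XW2) down.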

Conditional Entropy for mining the syntagmatic relations of one word W1:
- For every other word W2, compute the conditional entropy H(XW1|XW2).
- Sort all the candidate words in ascending order of H(XW1|XW2).
- Take the top-ranked candidate words as words that have potential syntagmatic relations with W1.
A minimal sketch of this procedure follows below.
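The sketch below assumes each text segment is represented as a Python set of words; the names entropy, cond_entropy, and syntagmatic_candidates are illustrative, not from the course:

```python
import math

def entropy(p):
    """Shannon entropy (base 2) of a binary variable with P(X=1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def cond_entropy(w1, w2, segments):
    """Estimate H(X_w1 | X_w2) from presence/absence counts over segments."""
    n = len(segments)
    n_w2 = sum(1 for s in segments if w2 in s)                # count(X_w2 = 1)
    n_both = sum(1 for s in segments if w1 in s and w2 in s)  # count(1, 1)
    n_w1_not_w2 = sum(1 for s in segments if w1 in s and w2 not in s)
    h = 0.0
    if n_w2 > 0:        # branch X_w2 = 1, weighted by p(X_w2 = 1)
        h += (n_w2 / n) * entropy(n_both / n_w2)
    if n - n_w2 > 0:    # branch X_w2 = 0, weighted by p(X_w2 = 0)
        h += ((n - n_w2) / n) * entropy(n_w1_not_w2 / (n - n_w2))
    return h

def syntagmatic_candidates(w1, vocab, segments, k=10):
    """Rank candidates by ascending H(X_w1 | X_w2): lower = more predictive of w1."""
    scored = [(cond_entropy(w1, w2, segments), w2) for w2 in vocab if w2 != w1]
    return sorted(scored)[:k]

# Toy usage: each segment is a set of words.
segments = [{"eat", "bread"}, {"eat", "rice"}, {"drink", "tea"}, {"eat", "bread"}]
print(syntagmatic_candidates("eat", {"bread", "rice", "tea", "drink"}, segments, k=3))
```

Note that negative correlation also lowers conditional entropy: a word that reliably signals the absence of W1 is just as predictive as one that signals its presence.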

A problem with using conditional entropy: while H(XW1|XW2) and H(XW1|XW3) are comparable, H(XW1|XW2) and H(XW3|XW2) aren't, because they measure the uncertainty of different target variables with different upper bounds H(XW1) and H(XW3). (So we can only mine, for a fixed W1, the words that most often occur together with it; we cannot mine which word pairs across the whole corpus, with no W1 fixed, co-occur most strongly.)

Mutual Information I(X;Y) measures entropy reduction, and it makes it possible to mine the strongest K syntagmatic relations from a whole collection:

This is exactly because MI is symmetric:
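The defining identities from information theory make both points precise:

I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = I(Y;X), \qquad I(X;Y) \ge 0

Because I(XW1;XW2) is symmetric and shares a common zero point (independence), the scores of different word pairs are directly comparable: compute I for every pair in the collection and keep the top K.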

Summary of Syntagmatic Relation Discovery: syntagmatic relations can be discovered by measuring correlations between the occurrences of two words. Three concepts from information theory are used:
- Entropy H(X): measures the uncertainty of a random variable X.
- Conditional entropy H(X|Y): the entropy of X given that we know Y.
- Mutual information I(X;Y): the entropy reduction of X (or Y) due to knowing Y (or X).
Mutual information provides a principled way of discovering syntagmatic relations; a small end-to-end sketch follows.
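To tie the pieces together, here is a minimal MI-based sketch under the same segment-as-set assumption; mutual_information and strongest_relations are illustrative names, and a real implementation would smooth the counts (e.g. with pseudo-counts) to avoid zero probabilities:

```python
import math
from itertools import combinations

def mutual_information(w1, w2, segments):
    """Estimate I(X_w1; X_w2) in bits from presence/absence counts over segments."""
    n = len(segments)
    p1 = sum(1 for s in segments if w1 in s) / n   # p(X_w1 = 1)
    p2 = sum(1 for s in segments if w2 in s) / n   # p(X_w2 = 1)
    mi = 0.0
    for v1 in (False, True):
        for v2 in (False, True):
            # Joint probability p(X_w1 = v1, X_w2 = v2).
            joint = sum(1 for s in segments
                        if (w1 in s) == v1 and (w2 in s) == v2) / n
            marginal = (p1 if v1 else 1 - p1) * (p2 if v2 else 1 - p2)
            if joint > 0:   # terms with joint = 0 contribute nothing
                mi += joint * math.log2(joint / marginal)
    return mi

def strongest_relations(vocab, segments, k=5):
    """Score every unordered word pair once (MI is symmetric) and keep the top k."""
    scored = [(mutual_information(w1, w2, segments), w1, w2)
              for w1, w2 in combinations(sorted(vocab), 2)]
    return sorted(scored, reverse=True)[:k]
```

Unlike the conditional-entropy loop above, this ranks all pairs in the collection on one scale, which is exactly what the symmetry of MI buys us.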
