A Brief Summary of the "textanalytics" Course (1): Two Kinds of Word Relations

3. Mining Syntagmatic (Combinatorial) Relations

Problem definition: given a text segment, predict whether a word w occurs in it. The occurrence of w is modeled as a binary random variable Xw, with Xw = 1 if w appears in the segment and Xw = 0 otherwise.

The key to this problem: the more random Xw is, the more difficult the prediction would be.

Entropy H(X) measures randomness of X:
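For the binary variable Xw this is the standard Shannon entropy (logs base 2):

H(X_w) = -\sum_{v \in \{0,1\}} p(X_w = v) \log_2 p(X_w = v)

H(Xw) peaks at 1 bit when p(Xw = 1) = 0.5 and drops to 0 when the outcome is certain.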

High entropy means high randomness, which makes X harder to predict.

Conditional Entropy H(X|Y) is the uncertainty about X that remains after we observe Y:
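It averages the entropy of X over the possible values of Y (standard definition, written in the XW1/XW2 notation used below):

H(X_{w1} \mid X_{w2}) = \sum_{u \in \{0,1\}} p(X_{w2} = u)\, H(X_{w1} \mid X_{w2} = u)

Conditioning never hurts: H(X|Y) <= H(X), with equality exactly when X and Y are independent, so a strongly correlated W2 drives H(XW1|XW2) down.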

Conditional Entropy for mining the syntagmatic relations of one word W1:
- For every other word W2, compute the conditional entropy H(XW1|XW2).
- Sort all the candidate words in ascending order of H(XW1|XW2).
- Take the top-ranked candidate words as words that have potential syntagmatic relations with W1.
A minimal sketch of this procedure follows below.
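The sketch below assumes each text segment is represented as a Python set of words; the names entropy, cond_entropy, and syntagmatic_candidates are illustrative, not from the course:

```python
import math

def entropy(p):
    """Shannon entropy (base 2) of a binary variable with P(X=1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def cond_entropy(w1, w2, segments):
    """Estimate H(X_w1 | X_w2) from presence/absence counts over segments."""
    n = len(segments)
    n_w2 = sum(1 for s in segments if w2 in s)                # count(X_w2 = 1)
    n_both = sum(1 for s in segments if w1 in s and w2 in s)  # count(1, 1)
    n_w1_not_w2 = sum(1 for s in segments if w1 in s and w2 not in s)
    h = 0.0
    if n_w2 > 0:        # branch X_w2 = 1, weighted by p(X_w2 = 1)
        h += (n_w2 / n) * entropy(n_both / n_w2)
    if n - n_w2 > 0:    # branch X_w2 = 0, weighted by p(X_w2 = 0)
        h += ((n - n_w2) / n) * entropy(n_w1_not_w2 / (n - n_w2))
    return h

def syntagmatic_candidates(w1, vocab, segments, k=10):
    """Rank candidates by ascending H(X_w1 | X_w2): lower = more predictive of w1."""
    scored = [(cond_entropy(w1, w2, segments), w2) for w2 in vocab if w2 != w1]
    return sorted(scored)[:k]

# Toy usage: each segment is a set of words.
segments = [{"eat", "bread"}, {"eat", "rice"}, {"drink", "tea"}, {"eat", "bread"}]
print(syntagmatic_candidates("eat", {"bread", "rice", "tea", "drink"}, segments, k=3))
```

Note that negative correlation also lowers conditional entropy: a word that reliably signals the absence of W1 is just as predictive as one that signals its presence.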

A problem with using conditional entropy: while H(XW1|XW2) and H(XW1|XW3) are comparable, H(XW1|XW2) and H(XW3|XW2) aren't, because they measure the uncertainty of different target variables with different upper bounds H(XW1) and H(XW3). (So we can only mine, for a fixed W1, the words that most often occur together with it; we cannot mine which word pairs across the whole corpus, with no W1 fixed, co-occur most strongly.)

Mutual Information I(X;Y) measures entropy reduction, and it makes it possible to mine the strongest K syntagmatic relations from a whole collection:

This is exactly because MI is symmetric:
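The defining identities from information theory make both points precise:

I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = I(Y;X), \qquad I(X;Y) \ge 0

Because I(XW1;XW2) is symmetric and shares a common zero point (independence), the scores of different word pairs are directly comparable: compute I for every pair in the collection and keep the top K.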

Summary of Syntagmatic Relation Discovery: syntagmatic relations can be discovered by measuring correlations between the occurrences of two words. Three concepts from information theory are used:
- Entropy H(X): measures the uncertainty of a random variable X.
- Conditional entropy H(X|Y): the entropy of X given that we know Y.
- Mutual information I(X;Y): the entropy reduction of X (or Y) due to knowing Y (or X).
Mutual information provides a principled way of discovering syntagmatic relations; a small end-to-end sketch follows.
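To tie the pieces together, here is a minimal MI-based sketch under the same segment-as-set assumption; mutual_information and strongest_relations are illustrative names, and a real implementation would smooth the counts (e.g. with pseudo-counts) to avoid zero probabilities:

```python
import math
from itertools import combinations

def mutual_information(w1, w2, segments):
    """Estimate I(X_w1; X_w2) in bits from presence/absence counts over segments."""
    n = len(segments)
    p1 = sum(1 for s in segments if w1 in s) / n   # p(X_w1 = 1)
    p2 = sum(1 for s in segments if w2 in s) / n   # p(X_w2 = 1)
    mi = 0.0
    for v1 in (False, True):
        for v2 in (False, True):
            # Joint probability p(X_w1 = v1, X_w2 = v2).
            joint = sum(1 for s in segments
                        if (w1 in s) == v1 and (w2 in s) == v2) / n
            marginal = (p1 if v1 else 1 - p1) * (p2 if v2 else 1 - p2)
            if joint > 0:   # terms with joint = 0 contribute nothing
                mi += joint * math.log2(joint / marginal)
    return mi

def strongest_relations(vocab, segments, k=5):
    """Score every unordered word pair once (MI is symmetric) and keep the top k."""
    scored = [(mutual_information(w1, w2, segments), w1, w2)
              for w1, w2 in combinations(sorted(vocab), 2)]
    return sorted(scored, reverse=True)[:k]
```

Unlike the conditional-entropy loop above, this ranks all pairs in the collection on one scale, which is exactly what the symmetry of MI buys us.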
