3. Mining Syntagmatic (Combination) Relations
Problem definition: predict whether a word W appears in a text segment, modeled as a binary random variable X_W.
The key to this problem: the more random X_W is, the more difficult the prediction would be.
Entropy H(X) measures the randomness of X:
High entropy means high randomness, which makes X harder to predict.
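As a minimal sketch (the `entropy` helper and the probabilities below are illustrative, not from the original notes), the entropy of a binary word-occurrence variable X_W with P(X_W = 1) = p is H = -p log2 p - (1-p) log2 (1-p):

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a binary variable with P(X = 1) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# A word occurring in half of all segments is maximally unpredictable:
print(entropy(0.5))   # 1.0
# A very rare (or near-ubiquitous) word is easy to predict:
print(entropy(0.01))  # close to 0
```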
Conditional Entropy
Using conditional entropy to mine the syntagmatic relations of one word W1:
- For every other word W2, compute the conditional entropy H(X_W1 | X_W2).
- Sort all the candidate words in ascending order of H(X_W1 | X_W2).
- Take the top-ranked candidate words as words that have potential syntagmatic relations with W1.
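The ranking procedure above can be sketched as follows. The `segments` corpus (each segment a set of words) and the helper names are assumptions for illustration; H(X_W1 | X_W2) is estimated by conditioning on whether W2 is present or absent:

```python
import math

def binary_entropy(p):
    """Entropy (bits) of a binary variable with P(X = 1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conditional_entropy(segments, w1, w2):
    """Estimate H(X_W1 | X_W2) from segments, each a set of words."""
    n = len(segments)
    h = 0.0
    for present in (True, False):  # condition on X_W2 = 1 and X_W2 = 0
        subset = [s for s in segments if (w2 in s) == present]
        if not subset:
            continue  # this value of X_W2 never occurs
        p_w1 = sum(1 for s in subset if w1 in s) / len(subset)
        h += (len(subset) / n) * binary_entropy(p_w1)
    return h

# Toy corpus; rank candidates by ascending H(X_cat | X_w):
segments = [{"cat", "eats", "meat"}, {"cat", "sat"}, {"dog", "eats"}, {"sat"}]
ranked = sorted(["eats", "sat", "dog"],
                key=lambda w: conditional_entropy(segments, "cat", w))
```

Candidates with the lowest H(X_W1 | X_W2) are the words whose presence best predicts W1.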
A problem with using conditional entropy: while H(X_W1|X_W2) and H(X_W1|X_W3) are comparable, H(X_W1|X_W2) and H(X_W3|X_W2) are not! (It can only find which words most often co-occur with a fixed W1; it cannot mine the strongest word pairs across the whole corpus, since those pairs need not involve W1.)
Mutual Information I(X;Y) measures entropy reduction, I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X), and can be used to mine the strongest K syntagmatic relations from a collection:
This works precisely because MI is symmetric (I(X;Y) = I(Y;X)), so scores for different word pairs are directly comparable.
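A minimal sketch of the MI estimate from co-occurrence counts (the toy `segments` corpus and function names are assumptions for illustration). Computing it via the equivalent form I(X;Y) = H(X) + H(Y) - H(X,Y) makes the symmetry explicit:

```python
import math

def H(probs):
    """Entropy (bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(segments, w1, w2):
    """Estimate I(X_W1; X_W2) = H(X_W1) + H(X_W2) - H(X_W1, X_W2)."""
    n = len(segments)
    p1 = sum(1 for s in segments if w1 in s) / n
    p2 = sum(1 for s in segments if w2 in s) / n
    # Joint distribution over the four (present, absent) combinations:
    joint = [sum(1 for s in segments if (w1 in s) == a and (w2 in s) == b) / n
             for a in (True, False) for b in (True, False)]
    return H([p1, 1 - p1]) + H([p2, 1 - p2]) - H(joint)

segments = [{"cat", "eats", "meat"}, {"cat", "sat"}, {"dog", "eats"}, {"sat"}]
# Symmetry makes scores comparable across arbitrary word pairs:
i_ab = mutual_information(segments, "cat", "eats")
i_ba = mutual_information(segments, "eats", "cat")
assert abs(i_ab - i_ba) < 1e-12
```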
Summary of Syntagmatic Relation Discovery: syntagmatic relations can be discovered by measuring correlations between occurrences of two words, using three concepts from information theory:
- Entropy H(X): measures the uncertainty of a random variable X.
- Conditional entropy H(X|Y): entropy of X given that we know Y.
- Mutual information I(X;Y): entropy reduction of X (or Y) due to knowing Y (or X).
Mutual information provides a principled way of discovering syntagmatic relations.