Machine Learning in Action, Part 2: Naive Bayes, with Article Classification Implemented in C#

from numpy import ones, log

def trainNB0(trainMatrix, trainCategory):
    numTrainDocs = len(trainMatrix)
    numWords = len(trainMatrix[0])
    # Prior probability of class 1
    pAbusive = sum(trainCategory) / float(numTrainDocs)
    # Start counts at 1 and denominators at 2.0 so that no word gets zero probability
    p0Num = ones(numWords); p1Num = ones(numWords)
    p0Denom = 2.0; p1Denom = 2.0
    for i in range(numTrainDocs):
        if trainCategory[i] == 1:
            p1Num += trainMatrix[i]
            p1Denom += sum(trainMatrix[i])
        else:
            p0Num += trainMatrix[i]
            p0Denom += sum(trainMatrix[i])
    # Work with logs so that multiplying many small probabilities does not underflow
    p1Vect = log(p1Num / p1Denom)
    p0Vect = log(p0Num / p0Denom)
    return p0Vect, p1Vect, pAbusive

def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
    # vec2Classify must be a NumPy array so that * is element-wise; see *Note 1
    p1 = sum(vec2Classify * p1Vec) + log(pClass1)
    p0 = sum(vec2Classify * p0Vec) + log(1.0 - pClass1)
    if p1 > p0:
        return 1
    else:
        return 0

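*Note 1

By Bayes' rule, p(Ci|w) = p(w|Ci)p(Ci)/p(w). The denominator p(w) is the same for every class, so only the numerator needs to be compared. Taking the natural logarithm of that product gives ln(p(w|Ci)p(Ci)) = ln(p(w|Ci)) + ln(p(Ci)).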
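A small sketch of why the log form is preferred in practice: multiplying many small conditional probabilities underflows to zero in floating point, while summing their logarithms stays finite. The probabilities below are made-up values for illustration, not taken from the article's data.

```python
import math

# Hypothetical per-word conditional probabilities p(w_j|Ci); made-up values.
probs = [1e-5] * 100

# Direct product: 1e-500 is far below the smallest positive double,
# so the running product underflows to 0.0.
product = 1.0
for p in probs:
    product *= p

# Sum of logs: the same quantity in log space stays representable.
log_sum = sum(math.log(p) for p in probs)

print(product)  # 0.0
print(log_sum)  # roughly -1151.29
```

Once the product has collapsed to 0.0 for every class, the classes can no longer be compared, which is the practical reason for taking logs.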

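In the example below, every class makes up the same share of the training samples, so leaving out the log(p(Ci)) term does not change the final classification.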
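To illustrate the point about equal priors, here is a short sketch. The log-likelihood values are invented for the example; only the 10-of-30 class share comes from the setup described in this article.

```python
import math

def argmax(scores):
    """Return the index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical log-likelihoods ln p(w|Ci) for three classes; made-up values.
log_likelihoods = [-42.7, -39.1, -45.3]

# Equal priors: each class contributes 10 of the 30 training articles.
log_prior = math.log(10 / 30)

with_prior = [ll + log_prior for ll in log_likelihoods]

# Adding the same constant to every score cannot change which one is largest.
print(argmax(log_likelihoods), argmax(with_prior))  # 1 1
```

If the classes were unbalanced, the prior term would differ per class and would have to be kept.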

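Here is a quick example in C# that classifies articles by type. Randomly chosen words work less well than words picked for the task, so the vocabulary here consists of terms selected from all three categories.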

1. Build the vocabulary for the word vector: 中超/亚冠/国足/足协/英超/西甲/欧冠/意甲/德甲/篮球/NBA/CBA/高尔夫/乒乓/排球/网球/羽毛球/跑步/赛车/棋牌/台球/游泳/马术/拳击/田径/功夫/扑克/体育/球队/球员/训练/国家队/联赛/俱乐部/场地/翻盘/绝杀/热身/队友/冠军/亚军/季军/犯规/赛季/加时/反超/半场/争夺/战术/阵容/比赛/德比/恢复/进球/失球/奥斯卡/娱乐/影迷/电影/电视/音乐/戏剧/视频/演员/导演/明星/经纪人/歌手/连续剧/展映/粉丝/写真/演技/作秀/节目/艺人/超模/女星/模特/男星/性感/主创/院线/影业/拍摄/编剧/情节/影像/剧情/主演/上映/票房/开机/剧集/表演/收视/预告片/主持人/艾美奖/角色/剧院/乐迷/影迷/演出/专辑/乐坛/剧场/文艺/芭蕾/戏曲/舞蹈/军事/军队/军机/炸弹/军方/坦克/军舰/炸死/军演/战备/部队/军区/国防/士兵/舰船/潜艇/飞机/直升机/舰队/保卫/演习/武器/反击/打击/阅兵/对抗/防卫/海军/空军/陆军/武装/战略/空袭/冲突/装甲/步兵/作战/导弹/边防/侦察/战斗机/雷达/轰炸/防御/据点/火力/航空母舰/进攻/弹药/军营/包围/攻占/俘虏/参战/战友/战斗/入侵
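A minimal sketch of how one article could be turned into a 0/1 vector over such a vocabulary. It assumes plain substring matching with no word segmentation and uses a shortened vocabulary. The function name `text_to_vector` and the sample sentence are hypothetical, and the author's C# code may work differently.

```python
def text_to_vector(vocab, text):
    """Set-of-words model: 1 if the vocabulary word occurs in the text, else 0."""
    return [1 if word in text else 0 for word in vocab]

# Shortened vocabulary for illustration only.
vocab = ["中超", "NBA", "电影", "导演", "军队", "导弹"]

# Hypothetical sample sentence, not taken from the training articles.
text = "这部电影的导演昨日出席了首映礼"

print(text_to_vector(vocab, text))  # [0, 0, 1, 1, 0, 0]
```

Each position in the result corresponds to one vocabulary word, so the vector length always equals the vocabulary size.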

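2. Download 10 articles from Sohu for each of the three categories to form the training set, compute each article's row of the document matrix, and label each article with its category.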

Sample file name format: number_categoryLabel.txt
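With this naming scheme the category label can be read straight from the file name. A short sketch follows. The helper name `parse_sample_name` and the example path are hypothetical, and the label is kept as a string because the article does not say how labels are encoded.

```python
import os

def parse_sample_name(path):
    """Split '<number>_<label>.txt' into (number, label) strings."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    sample_id, label = stem.split("_", 1)
    return sample_id, label

# Hypothetical file name following the format above.
print(parse_sample_name("samples/07_2.txt"))  # ('07', '2')
```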

Document matrix:

00000000000000000010000000000000000000110001000100101000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000010000000000000001111000101000000000000000001100000011000000000010000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000001100000000000000000000000100100000100100000000100000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100100000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000001001000010000000000000001001000000100000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000001000001000001010000000011111111111000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011000000000000001101000000100001000000000000110000111000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000101000011000000000000000010000000110100000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000001000000000000000100000000110000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000101000011000000000000000000000101100001000011000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000001110000000100001011000100100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000100100010000000000000000000000001000010000010000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001100000001111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000010000000000000000000000001000000000000000000000000001001000000100100000000000000001000000000001000000000000100000000000000000000000000000001000000000000000000000000000010000000000000000000000000000000000000000000000000001000100000100000000000000000001000001000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000001100100000000010010100000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000101000000000001000000000100000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000100000000001000000001000100000000000010000000000000000100000000000000000001110011000000000100000010010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000001000000000000001000000001100000100000000000001100000000000000000000000001000010001000000100000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000001000000000110000000000000000000000000111001100100100010001111011000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000001000000000000000001000000101000100110001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000010000000000000000000000001101000001001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000010000000000000010010001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000111100000101000110100001000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000100000000000110010100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Category label vector:
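The label vector holds one class id per row of the document matrix. As a rough sketch of how the two could feed a three-class variant of trainNB0, the code below uses the same smoothing (counts start at 1, denominators at 2.0). The names `train_nb` and `classify_nb`, the integer labels 0 to 2, and the toy data are all assumptions for illustration and are not the author's C# implementation.

```python
import numpy as np

def train_nb(train_matrix, labels, num_classes):
    """Return per-class log word probabilities and log priors."""
    train_matrix = np.asarray(train_matrix, dtype=float)
    labels = np.asarray(labels)
    num_words = train_matrix.shape[1]
    log_word_probs = np.zeros((num_classes, num_words))
    log_priors = np.zeros(num_classes)
    for c in range(num_classes):
        rows = train_matrix[labels == c]
        # Same smoothing as trainNB0: counts start at 1, denominator at 2.0.
        num = 1.0 + rows.sum(axis=0)
        denom = 2.0 + rows.sum()
        log_word_probs[c] = np.log(num / denom)
        log_priors[c] = np.log(len(rows) / len(labels))
    return log_word_probs, log_priors

def classify_nb(vec, log_word_probs, log_priors):
    """Pick the class with the largest log posterior (up to a constant)."""
    scores = log_word_probs @ np.asarray(vec, dtype=float) + log_priors
    return int(np.argmax(scores))

# Toy data: 6 documents over a 6-word vocabulary, 2 documents per class.
train_matrix = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0],
]
labels = [0, 0, 1, 1, 2, 2]

log_word_probs, log_priors = train_nb(train_matrix, labels, 3)
print(classify_nb([0, 0, 1, 0, 0, 0], log_word_probs, log_priors))  # 1
```

With balanced classes the `log_priors` term is the same for every class and could be dropped, as noted earlier. It is kept here so the sketch also covers unbalanced training sets.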
