CART算法实现之建树篇

function tree = make_tree(patterns, targets,Dlength, split_type, inc_node)%Build a tree recursively%Dlength为样本个数% 判断是否为单节点树if (length(unique(targets)) == 1), %There is only one type of targets, and this generates a warning, so deal with it separately tree.right= []; tree.left= []; tree.Raction = targets(1); tree.Laction = targets(1); tree.Resub= 0; tree.label= targets(1); returnend% Ni始终等于属性个数[Ni, M]= size(patterns);Nt= unique(targets); % 分类情况N= hist(targets, Nt); % N为分类个数统计情况tree.Resub= ( M – max(N) )/ Dlength;tree.label= Nt(N == max(N));% if T都属于同一类别or T中只剩下 一个样本% 结束条件% 分配类别的方法可以用当前节点中出现最多的类别% tree's last level| tree.Resub < 1e-3if ((sum(N < Dlength*inc_node) == length(Nt) – 1) | (M == 1)),%No further splitting is neccessary tree.right= []; tree.left= [];if (length(Nt) ~= 1),% 找到实例数最大的类作为该结点的类标记MLlabel = find(N == max(N));elseMLlabel= 1; end tree.Raction = Nt(MLlabel); tree.Laction = Nt(MLlabel);else %Split the node according to the splitting criterion%分裂标准 deltaI= zeros(1,Ni);% Ni=2 split_point = zeros(1,Ni);for i = 1:Ni,% 'CARTfunctions' 得到Gini最小的划分% 'feval' 调用函数% varargin提供了一种函数可变参数列表机制% split_point(i) = fminbnd('CARTfunctions', min(patterns(i,:)), max(patterns(i,:)), op, patterns, targets, i, split_type);split_point(i) = findminGini( patterns, targets, i );I(i)= feval('CARTfunctions', split_point(i), patterns, targets, i); end% 选择最优特征和最优切分点 [m, dim] = min(I); loc= split_point(dim); %So, the split is to be on dimention 'dim' at location 'loc' indices = 1:M;%M为样本个数 tree.Raction = ['patterns(' num2str(dim) ',indices) ~= ' num2str(loc)]; tree.Laction = ['patterns(' num2str(dim) ',indices) == ' num2str(loc)]; in_right= find(eval(tree.Raction)); in_left= find(eval(tree.Laction));if isempty(in_right) | isempty(in_left)%No possible split foundtree.right= [];tree.left= [];if (length(Nt) ~= 1),MLlabel= find(N == max(N));elseMLlabel= 1;endtree.Raction= Nt(MLlabel);tree.Laction= Nt(MLlabel); else%…It's possible to build new nodes% 递归 targets(in_right)… targets(in_left)tree.right = make_tree(patterns(:,in_right), targets(in_right), Dlength, split_type, inc_node);tree.left = make_tree(patterns(:,in_left), targets(in_left), Dlength, split_type, inc_node);endendfunction index = findminGini( patterns, targets, dim)%UNTITLED Summary of this function goes here% Detailed explanation goes here% Fin minimum Gini Nt= unique(patterns(dim,:)); % 测试属性的每个取值for i = 1:length(Nt)Gini(i) = CARTfunctions(Nt(i), patterns, targets, dim);end% 取最小基尼系数,,返回indexD= find(Gini == min(Gini));index = Nt(D(1));endfunction delta = CARTfunctions(split_point, patterns, targets, dim)%Calculate the difference in impurity for the CART algorithmUc= unique(targets);% D1,D2for i = 1:length(Uc), in= find(patterns(dim,:) == split_point); out= find(patterns(dim,:) ~= split_point); if numel(in) == 0,Pl(i) = 0 ; elsePl(i) = length(find(targets(in) == Uc(i)))/length(in); endif numel(out) == 0,Pr(i) = 0 ; elsePr(i) = length(find(targets(out) == Uc(i)))/length(out); end end%Gini=1-sum(pi(n)^2)Er= 1 – sum(Pr.^2);El= 1 – sum(Pl.^2); P= length(find(patterns(dim, 🙂 == split_point)) / length(targets);delta = P*El + (1-P)*Er;

每一发奋努力的背后,必有加倍的赏赐。

CART算法实现之建树篇

相关文章:

你感兴趣的文章:

标签云: