MATLAB Implementations of Selected Data Mining Algorithms: ID3
Essays, Reading Notes

Posted by Xue Feng on 2005-6-27 14:22:13

function D = ID3(train_features, train_targets, params, region)
% Classify using Quinlan's ID3 algorithm
% Inputs:
%   train_features - Train features
%   train_targets  - Train targets
%   params         - [Number of bins for the data, percentage of incorrectly assigned samples at a node]
%   region         - Decision region vector: [-x x -y y number_of_points]
%
% Outputs:
%   D - Decision surface

[Ni, M] = size(train_features);

% Get parameters
[Nbins, inc_node] = process_params(params);
inc_node = inc_node*M/100;

% For the decision region
N      = region(5);
mx     = ones(N,1) * linspace(region(1), region(2), N);
my     = linspace(region(3), region(4), N)' * ones(1,N);
flatxy = [mx(:), my(:)]';

% Preprocessing: transform the features via PCA
[f, t, UW, m]  = PCA(train_features, train_targets, Ni, region);
train_features = UW * (train_features - m*ones(1,M));
flatxy         = UW * (flatxy - m*ones(1,N^2));

% First, bin the data and the decision region data
[H, binned_features] = high_histogram(train_features, Nbins, region);
[H, binned_xy]       = high_histogram(flatxy, Nbins, region);

% Build the tree recursively
disp('Building tree')
tree = make_tree(binned_features, train_targets, inc_node, Nbins);

% Make the decision region according to the tree
disp('Building decision surface using the tree')
targets = use_tree(binned_xy, 1:N^2, tree, Nbins, unique(train_targets));
D = reshape(targets, N, N);
% END

function targets = use_tree(features, indices, tree, Nbins, Uc)
% Classify recursively using a tree
targets = zeros(1, size(features,2));

if (size(features,1) == 1)
    % Only one dimension left, so work on it
    for i = 1:Nbins
        in = indices(find(features(indices) == i));
        if ~isempty(in)
            if isfinite(tree.child(i))
                targets(in) = tree.child(i);
            else
                % No training data fell in this bin, so choose a class at random
                n           = 1 + floor(rand(1)*length(Uc));
                targets(in) = Uc(n);
            end
        end
    end
    return
end

% This is not the last level of the tree, so:
% First, find the dimension we are to work on
dim  = tree.split_dim;
dims = find(~ismember(1:size(features,1), dim));

% And classify according to it
for i = 1:Nbins
    in      = indices(find(features(dim, indices) == i));
    targets = targets + use_tree(features(dims, :), in, tree.child(i), Nbins, Uc);
end
% END use_tree

function tree = make_tree(features, targets, inc_node, Nbins)
% Build a tree recursively
[Ni, L] = size(features);
Uc      = unique(targets);

% When to stop: if only one dimension is left or the number of examples is small
if ((Ni == 1) || (inc_node > L))
    % Compute the children non-recursively; this node becomes a leaf layer
    tree.split_dim = 0;
    for i = 1:Nbins
        indices = find(features == i);
        if ~isempty(indices)
            if (length(unique(targets(indices))) == 1)
                tree.child(i) = targets(indices(1));
            else
                % Majority vote among the samples in this bin
                H             = hist(targets(indices), Uc);
                [m, T]        = max(H);
                tree.child(i) = Uc(T);
            end
        else
            tree.child(i) = inf;   % inf marks a bin with no training data
        end
    end
    return
end

% Compute the node's entropy I from the class priors
for i = 1:length(Uc)
    Pnode(i) = length(find(targets == Uc(i))) / L;
end
Inode = -sum(Pnode.*log(Pnode)/log(2));

% For each dimension, compute the gain ratio impurity
delta_Ib = zeros(1, Ni);
P        = zeros(length(Uc), Nbins);
for i = 1:Ni
    for j = 1:length(Uc)
        for k = 1:Nbins
            indices = find((targets == Uc(j)) & (features(i,:) == k));
            P(j,k)  = length(indices);
        end
    end
    Pk          = sum(P);
    P           = P/L;
    Pk          = Pk/sum(Pk);
    info        = sum(-P.*log(eps+P)/log(2));
    delta_Ib(i) = (Inode-sum(Pk.*info))/-sum(Pk.*log(eps+Pk)/log(2));
end

% Find the dimension maximizing delta_Ib
[m, dim] = max(delta_Ib);

% Split along the 'dim' dimension
tree.split_dim = dim;
dims           = find(~ismember(1:Ni, dim));
for i = 1:Nbins
    indices       = find(features(dim, :) == i);
    tree.child(i) = make_tree(features(dims, indices), targets(indices), inc_node, Nbins);
end
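A note on the splitting criterion, since the code is terse: the quantity delta_Ib that make_tree maximizes is a gain-ratio criterion in the spirit of Quinlan's ID3/C4.5, namely the information gain of splitting on attribute i normalized by the split information. The textbook form, which the code approximates with histogram counts (the eps inside the logarithms only guards against taking log of zero for empty bins), is

\[
\Delta I_b(i) \;=\; \frac{I_{\mathrm{node}} - \sum_{k=1}^{N_{\mathrm{bins}}} P_k\, I_k}{-\sum_{k=1}^{N_{\mathrm{bins}}} P_k \log_2 P_k},
\qquad
I_{\mathrm{node}} = -\sum_j P(\omega_j)\log_2 P(\omega_j),
\qquad
I_k = -\sum_j P(\omega_j \mid k)\log_2 P(\omega_j \mid k),
\]

where P(\omega_j) are the class frequencies at the node and P_k is the fraction of its samples falling into bin k of attribute i. The numerator is the information gain; the denominator penalizes attributes that shatter the data into many small bins.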
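Both comments below ask how the function is actually invoked, so here is a minimal, hypothetical calling sketch. It assumes the companion helpers used above (process_params, PCA, high_histogram) are on the MATLAB path, and the two-class 2-D data is made up purely for illustration:

% A minimal usage sketch: two Gaussian clouds in 2-D
train_features = [randn(2,50)-1, randn(2,50)+1];   % 2 x 100 feature matrix
train_targets  = [zeros(1,50), ones(1,50)];        % class labels 0/1
params = [10, 5];           % 10 histogram bins; stop splitting below 5% of the samples
region = [-4 4 -4 4 100];   % decision region: [-x x -y y number_of_points]

D = ID3(train_features, train_targets, params, region);

% Visualize the resulting decision surface
imagesc(linspace(-4,4,100), linspace(-4,4,100), D); axis xy

That is, params(1) sets how finely each (PCA-transformed) feature is discretized, and params(2) is the stopping threshold: a node becomes a leaf once it holds fewer samples than that percentage of the training set.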




Re: MATLAB Implementations of Selected Data Mining Algorithms: ID3

109414325 (guest) commented on 2010-4-19 21:12:00

Same question here: how are the parameters supposed to be used?




Re: MATLAB Implementations of Selected Data Mining Algorithms: ID3

风之子 (guest) commented on 2009-6-16 21:55:10

Could you explain how the parameters should be passed in?


