The following questions ask you to compare and contrast two learners:
iris-Virginia via two unless branches; (b) how are misclassified
examples used in this structure? (c) suggest a pre-pruning method
based on support that would eliminate the right-hand-side rules;
(d) how might incremental or global pruning be used in the example?

                  truth
              +----+----+
              | 0  | 1  |
         +----+----+----+
detector | 0  | A  | B  |
         | 1  | C  | D  |
         +----+----+----+

A=40 B=10 C=20 D=30
Define accuracy. Compute it for this example. Define probability of false alarm. Compute it for this example. Define probability of detection. Compute it for this example. Define precision. Compute it for this example.
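As a worked sketch of the four measures, the snippet below assumes the usual reading of the table: rows are the detector's output and columns are the truth, so A counts true negatives, B misses, C false alarms and D hits. The variable names `pf` and `pd` are shorthand chosen here, not taken from the question.

```python
from fractions import Fraction

# Cell counts from the table above.
# Assumed reading: A = true negatives, B = misses, C = false alarms, D = hits.
A, B, C, D = 40, 10, 20, 30

accuracy = Fraction(A + D, A + B + C + D)  # share of all cases the detector got right
pf = Fraction(C, A + C)                    # false alarms over all truth=0 cases
pd = Fraction(D, B + D)                    # detections over all truth=1 cases
precision = Fraction(D, C + D)             # detections over all detector=1 cases

print(accuracy, pf, pd, precision)  # prints: 7/10 1/3 3/4 3/5
```

Using `Fraction` keeps the answers as exact ratios rather than rounded decimals.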
Here is a snip from nbc.awk that is relevant to the first three questions
function classify(    i,temp,what,like,c) {
    like = -100000;                  # smaller than any log
    for(c in Classes) {
        temp=log(Classes[c]/Total);  # uses logs to stop numeric errors
        for(i=1;i<NF;i++) {
            if ( $i=="?" ) continue;
            temp += log((Freq[c,i,$i]+1)/(Classes[c]+Attributes[i]));
        };
        if ( temp >= like ) {like = temp; what=c}
    };
    return what;
}
End of NBC.awk questions
Current rule:
If astigmatism = yes and tear production rate = normal then recommendation = hard
Instances covered by the current rule:
lens  tear prod. rate  astigmatism  prescription  age
====  ===============  ===========  ============  ==============
None  Normal           Yes          Hypermetrope  Pre-presbyopic
Hard  Normal           Yes          Myope         Presbyopic
None  Normal           Yes          Hypermetrope  Presbyopic
Hard  Normal           Yes          Myope         Pre-presbyopic
Hard  Normal           Yes          Hypermetrope  Young
Hard  Normal           Yes          Myope         Young
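One way to reason about the current rule is to count, for the rule and for each test that could be added to it, how many covered instances are "Hard" (p) out of how many are covered (t). The sketch below assumes a PRISM-style covering learner that scores candidate tests by p/t; the source does not name the algorithm, and `rule_stats` is a hypothetical helper.

```python
# Instances covered by the current rule: (lens, tear rate, astigmatism, prescription, age).
COVERED = [
    ("None", "Normal", "Yes", "Hypermetrope", "Pre-presbyopic"),
    ("Hard", "Normal", "Yes", "Myope", "Presbyopic"),
    ("None", "Normal", "Yes", "Hypermetrope", "Presbyopic"),
    ("Hard", "Normal", "Yes", "Myope", "Pre-presbyopic"),
    ("Hard", "Normal", "Yes", "Hypermetrope", "Young"),
    ("Hard", "Normal", "Yes", "Myope", "Young"),
]

def rule_stats(rows, test=None, target="Hard"):
    """Return (p, t): target-class hits and total rows matching an optional (column, value) test."""
    matched = [r for r in rows if test is None or r[test[0]] == test[1]]
    hits = sum(1 for r in matched if r[0] == target)
    return hits, len(matched)

print("current rule:", rule_stats(COVERED))
for column, name in ((3, "prescription"), (4, "age")):
    for value in sorted({r[column] for r in COVERED}):
        print(name, "=", value, rule_stats(COVERED, (column, value)))
```

Under that assumption, the test with the highest p/t (ties broken by larger p) would be the next condition added to the rule.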
#   outlook   temperature  humidity  windy  playing
1   Sunny     85           86        False  None
2   Sunny     80           90        True   None
3   Sunny     72           95        False  None
4   Rain      65           70        True   None
5   Rain      71           96        True   None
6   Rain      70           96        False  Some
7   Rain      68           80        False  Some
8   Rain      75           80        False  Some
9   Sunny     69           70        False  Lots
10  Sunny     75           70        True   Lots
11  Overcast  83           88        False  Lots
12  Overcast  64           65        True   Lots
13  Overcast  72           90        True   Lots
14  Overcast  81           75        False  Lots
1. if outlook=sunny then polo,polo,polo,polo,polo,polo,polo,polo,nothing,nothing,nothing,nothing
2. if outlook=overcast then tennis,tennis,tennis,tennis
3. if outlook=raining then golf,golf,polo,polo
Your dialogue might be able to use these formulae:
N = a+b+...+z
info([a,b,..z]) = [-a*log(a) - b*log(b) - .... -z*log(z) + N*log(N)]/N
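The info formula can be checked numerically with a short sketch. It assumes base-2 logarithms (the formula does not state a base) and skips zero counts, since log(0) is undefined. The class counts below are read off the three outlook rules listed earlier; `info` mirrors the formula's name, and the weighted average is one common way to combine the three values, not something the source prescribes.

```python
import math

def info(counts):
    """info([a,b,..z]) = [-a*log(a) - b*log(b) - ... + N*log(N)] / N, using log base 2."""
    n = sum(counts)
    return (sum(-a * math.log2(a) for a in counts if a > 0) + n * math.log2(n)) / n

# Class counts under each outlook rule.
rules = {
    "sunny": [8, 4],     # 8 polo, 4 nothing
    "overcast": [4],     # 4 tennis
    "raining": [2, 2],   # 2 golf, 2 polo
}

total = sum(sum(c) for c in rules.values())
for name, counts in rules.items():
    print(name, round(info(counts), 3))

# Average info after splitting on outlook, weighted by the size of each branch.
weighted = sum(sum(c) * info(c) for c in rules.values()) / total
print("weighted average:", round(weighted, 3))
```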
BEGIN {
    # Command line arguments (none).
    # Internal globals:
    Total=0        # count of all instances
    # Classes      # table of class names/frequencies
    # Freq         # table of counters for values in attributes in classes
    # Seen         # table of counters for values in attributes
    # Attributes   # table of number of values per attribute
}
Pass==1 {train()}
Pass==2 {print $NF "|" classify()}

function train(    i,c) {
    Total++;
    c=$NF;
    Classes[c]++;
    for(i=1;i<=NF;i++) {
        if ($i=="?") continue;
        Freq[c,i,$i]++
        if (++Seen[i,$i]==1) Attributes[i]++}
}
function classify(    i,temp,what,like,c) {
    like = -100000;                  # smaller than any log
    for(c in Classes) {
        temp=log(Classes[c]/Total);  # uses logs to stop numeric errors
        for(i=1;i<NF;i++) {
            if ( $i=="?" ) continue;
            temp += log((Freq[c,i,$i]+1)/(Classes[c]+Attributes[i]));
        };
        if ( temp >= like ) {like = temp; what=c}
    };
    return what;
}
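For readers more comfortable outside awk, here is a rough Python port of the same train/classify logic. It is a sketch, not the course's code: the function names mirror the awk ones, the model is passed around as a tuple instead of globals, and, unlike the awk `train()`, it does not count the class column as an attribute. The toy rows in the usage lines are made up for illustration.

```python
import math
from collections import defaultdict

def train(rows):
    """Count class frequencies, per-class attribute values, and distinct values per attribute."""
    total, classes, freq = 0, defaultdict(int), defaultdict(int)
    seen, n_values = set(), defaultdict(int)
    for row in rows:
        total += 1
        c = row[-1]                       # class is the last field, like $NF
        classes[c] += 1
        for i, v in enumerate(row[:-1]):
            if v == "?":                  # skip missing values
                continue
            freq[c, i, v] += 1
            if (i, v) not in seen:        # first time this value is seen for attribute i
                seen.add((i, v))
                n_values[i] += 1
    return total, classes, freq, n_values

def classify(row, model):
    """Return the class with the highest log-likelihood; the last field of row is ignored."""
    total, classes, freq, n_values = model
    best, what = -math.inf, None
    for c, n in classes.items():
        score = math.log(n / total)       # log prior
        for i, v in enumerate(row[:-1]):
            if v == "?":
                continue
            # add-one smoothing, as in the awk: (count + 1) / (class size + number of values)
            score += math.log((freq.get((c, i, v), 0) + 1) / (n + n_values[i]))
        if score >= best:
            best, what = score, c
    return what

# Hypothetical toy data: one attribute, then the class.
model = train([("a", "x"), ("a", "x"), ("b", "y")])
print(classify(("a", "?"), model), classify(("b", "?"), model))
```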
instance  class
1         male
2         male
3         male
4         male
5         male
6         male
7         male
8         male
9         male
10        female
11        female
12        female
Assuming a 4-way cross-val, what are the numbers of the instances that would go into the sub-samples assuming (i)stratified cross-val and (ii)non-stratified cross-validation?
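The two sampling schemes can be sketched as follows. Both functions are illustrative assumptions rather than the course's method: the non-stratified version simply cuts the instances into consecutive chunks (a real tool would usually shuffle first), and the stratified version groups instances by class and then deals them round-robin so each sub-sample gets as even a class mix as the counts allow.

```python
def sequential_folds(ids, k):
    """Non-stratified: cut the list into k consecutive chunks (assumes len(ids) divides by k)."""
    size = len(ids) // k
    return [ids[i * size:(i + 1) * size] for i in range(k)]

def stratified_folds(ids, labels, k):
    """Stratified: group by class, then deal instances round-robin into k folds."""
    ordered = sorted(zip(ids, labels), key=lambda pair: pair[1])  # stable sort keeps id order
    folds = [[] for _ in range(k)]
    for position, (instance, _) in enumerate(ordered):
        folds[position % k].append(instance)
    return folds

ids = list(range(1, 13))
labels = ["male"] * 9 + ["female"] * 3

plain = sequential_folds(ids, 4)
strat = stratified_folds(ids, labels, 4)
print("non-stratified:", plain)
print("stratified:    ", strat)
```

With 9 males and 3 females in 4 sub-samples, the classes cannot be split perfectly evenly, so one stratified sub-sample ends up all male. The sequential split is worse: it puts all three females into the last sub-sample.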
forget() function do?;
(d) What are the assume types are restraints files?
Also, (e) a data miner has learnt that some range is critical for improving
the importance of some simulation. How could assume be made
aware that those ranges are important?
Here's some awk code (file=wk4g2.awk)
/<pre>/,/<\/pre>/ {
    sub(/\#.*/,"<font class=comment>&</font>");
}
{print $0}
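The awk above combines a range pattern with `sub`. A rough Python equivalent, offered as a sketch with a hypothetical function name, makes the two steps explicit: track whether the current line lies between `<pre>` and `</pre>`, and inside that range wrap the first `#...` run on each line in a `font` tag.

```python
import re

def highlight_comments(lines):
    """Wrap '#...' to end of line in <font class=comment> tags, but only inside <pre>...</pre>."""
    out, inside = [], False
    for line in lines:
        in_range = inside or "<pre>" in line      # the range starts on the <pre> line itself
        if in_range:
            inside = "</pre>" not in line         # the range ends after the </pre> line
            line = re.sub(r"#.*",
                          lambda m: "<font class=comment>" + m.group(0) + "</font>",
                          line, count=1)
        out.append(line)                          # every line is printed, changed or not
    return out

sample = ["<h2>#x</h2>", "<pre>", "# c", 'sep="," # d', "</pre>", "# after"]
result = highlight_comments(sample)
print("\n".join(result))
```

Note that any `#` inside the range is treated as a comment start, including the `#!` on a shebang line.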
Here's some shell script:
gawk -f wk4g2.awk wk4g2.html> wk4g2a.html; chmod a+r wk4g2a.html
Here's the header of wk4g2.html:
<html>
<head>
<LINK REL="stylesheet" HREF="wk4g2.css" TYPE="text/css">
</head>
<body>
<h2>learners.sh</h2>
<pre>
#!/usr/bin/bash
# CS510, Data Mining, Week 4 Report
# Learners.sh, a program to calculate and print the accuracy of several learners
. config
# comma-delimited output.
sep=","
...
Here's a CSS file:
body {background-color: #FFFFFF; }
pre {background-color: #FFFFDD;
color: #0000FF;
border: .05em solid #000000;
}
font.comment{color:#660000}
function any(max,  n)  {return int(max*rand())+1}
function size(group)   {return Members[group,0]}
function who(group,n)  {return Members[group,n]}

NR==1 {next}
      {Members[$2,++Members[$2,0]]=$1}
      {Groups[$2]}
END   {
    for(Group in Groups)
        print Group " " who(Group,any(size(Group)))}
Here's some bash script:
. config
cat<<-EOF | $gawk -f select.awk
fname group
george 1
john 1
thomas 1
andrew 1
martin 2
william 2
zachary 2
..
EOF
What does all this do? Why would students hate this code?
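As an aid to reading the awk, here is a rough Python sketch of the same selection logic: skip the header line, collect names by group, then print one randomly chosen name per group. The function name is hypothetical, and unlike the awk this version ignores lines with fewer than two fields (such as the `..` elision in the listing).

```python
import random

def pick_one_per_group(lines, rng=random):
    """Return {group: one randomly chosen member}, skipping the header line."""
    members = {}
    for line in lines[1:]:                 # NR==1 {next}
        fields = line.split()
        if len(fields) < 2:                # departure from the awk: drop incomplete lines
            continue
        name, group = fields[0], fields[1]
        members.setdefault(group, []).append(name)
    return {group: rng.choice(names) for group, names in members.items()}

roster = ["fname group", "george 1", "john 1", "thomas 1", "andrew 1",
          "martin 2", "william 2", "zachary 2", ".."]
chosen = pick_one_per_group(roster)
for group, name in chosen.items():
    print(group, name)
```

One difference worth knowing: awk's `rand()` produces the same sequence on every run unless `srand()` is called, whereas Python's `random` module is seeded differently each run by default.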
                                            tear     recommended
Age             Spectacle     astigmatism   rate     lenses
===================================================================
young           myope         no            reduced  none
young           myope         yes           normal   soft
young           myope         yes           normal   hard
young           hypermetrope  yes           reduced  none
Pre-presbyopic  myope         no            reduced  none
Pre-presbyopic  myope         no            normal   soft
Pre-presbyopic  hypermetrope  yes           reduced  none
presbyopic      myope         no            reduced  none
presbyopic      hypermetrope  yes           normal   none
presbyopic      hypermetrope  no            normal   soft
Show all your working. Leave fractions as fractions.
Make        Size    Convertible  Type
----        ----    -----------  ----
Mitsubishi  small   yes          coup
Mitsubishi  medium  no           suv
Toyota      small   yes          coup
Toyota      large   no           coup
Toyota      large   no           suv
Benz        small   yes          coup
Benz        large   no           suv
BMW         small   yes          coup
BMW         medium  yes          coup
Ford        small   yes          coup
Ford        large   no           suv
Honda       small   no           coup
and we see a new example:
Make        Size    Convertible  Type
----        ------  -----------  ----
Ford        medium  no           ?
Calculate the following for the new example, given the database of old examples:
A = Likelihood of SUV:
B = Likelihood of coup:
C = Probability of SUV:
D = Probability of coup:
Show all your working. Leave fractions as fractions.
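A hand calculation can be cross-checked with exact fractions. The sketch below assumes plain Naive Bayes with no smoothing: likelihood = P(class) times the product of P(value | class) for each attribute of the new example, and each probability is that class's likelihood divided by the sum of both likelihoods. The helper name `likelihood` is hypothetical; if the intended method adds one to each count (as nbc.awk does), the numbers will differ.

```python
from fractions import Fraction

# The database of old examples: (make, size, convertible, type).
CARS = [
    ("Mitsubishi", "small", "yes", "coup"), ("Mitsubishi", "medium", "no", "suv"),
    ("Toyota", "small", "yes", "coup"), ("Toyota", "large", "no", "coup"),
    ("Toyota", "large", "no", "suv"), ("Benz", "small", "yes", "coup"),
    ("Benz", "large", "no", "suv"), ("BMW", "small", "yes", "coup"),
    ("BMW", "medium", "yes", "coup"), ("Ford", "small", "yes", "coup"),
    ("Ford", "large", "no", "suv"), ("Honda", "small", "no", "coup"),
]

def likelihood(example, cls, rows=CARS):
    """P(cls) * product of P(value | cls), using raw counts (no smoothing)."""
    in_class = [r for r in rows if r[-1] == cls]
    result = Fraction(len(in_class), len(rows))          # prior
    for i, value in enumerate(example):
        matches = sum(1 for r in in_class if r[i] == value)
        result *= Fraction(matches, len(in_class))       # conditional for attribute i
    return result

new_car = ("Ford", "medium", "no")
like = {c: likelihood(new_car, c) for c in ("suv", "coup")}
prob = {c: like[c] / sum(like.values()) for c in like}

print("likelihoods:", like["suv"], like["coup"])      # prints: likelihoods: 1/48 1/384
print("probabilities:", prob["suv"], prob["coup"])    # prints: probabilities: 8/9 1/9
```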
Tim Menzies, tim@menzies.us, http://menzies.us
Copyright (C) Tim Menzies 2004
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2; see http://www.gnu.org/copyleft/gpl.html. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.