Interpreting Weka Output

Below is the output from Weka when running the weka.classifiers.trees.J48 classifier on the file $WEKAHOME/data/iris.arff as training data, with no separate test file, i.e. using the command:

  java weka.classifiers.trees.J48 -t $WEKAHOME/data/iris.arff

Comments in square brackets ([ ]) explain how to interpret the output.

J48 pruned tree
------------------

petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9
|   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
|   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)

Number of Leaves  :   5

Size of the tree :          9

[ Above is the decision tree constructed by the J48 classifier. It indicates how the classifier uses the attributes to make a decision. Each leaf node indicates which class an instance will be assigned to should that node be reached. The numbers in parentheses after a leaf node give the number of training instances that reach that node, followed (after the slash) by how many of those instances are incorrectly classified as a result. With other classifiers, other output is given that indicates how the decisions are made, e.g. a rule set. Note that the tree has been pruned. An unpruned tree can be produced by using the "-U" option. ]
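The printed tree can be read as a nested series of threshold tests. Purely to illustrate how an instance is routed to a leaf (a Python sketch, not Weka code; the thresholds are copied from the tree above):

```python
def classify_iris(petallength, petalwidth):
    """Follow the pruned J48 tree printed above down to a leaf."""
    if petalwidth <= 0.6:
        return "Iris-setosa"
    if petalwidth <= 1.7:
        if petallength <= 4.9:
            return "Iris-versicolor"
        # petallength > 4.9: one more test on petalwidth
        if petalwidth <= 1.5:
            return "Iris-virginica"
        return "Iris-versicolor"
    return "Iris-virginica"

print(classify_iris(petallength=1.4, petalwidth=0.2))  # Iris-setosa
```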

Time taken to build model: 0.05 seconds
Time taken to test model on training data: 0.01 seconds

=== Error on training data ===

Correctly Classified Instances         147               98      %
Incorrectly Classified Instances         3                2      %
Kappa statistic                          0.97 
Mean absolute error                      0.0233
Root mean squared error                  0.108
Relative absolute error                  5.2482 %
Root relative squared error             22.9089 %
Total Number of Instances              150    

[ This gives the error levels when applying the classifier to the training data it was constructed from. For our purposes the most important figures here are the numbers of correctly and incorrectly classified instances. With the exception of the Kappa statistic, the remaining statistics compute various error measures based on the class probabilities assigned by the tree. ]
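For example, the Kappa statistic measures agreement with the true classes corrected for chance agreement, and can be recomputed from the confusion matrix printed below. A Python sketch, assuming the standard definition of Cohen's Kappa:

```python
# Recompute accuracy and the Kappa statistic from the training
# confusion matrix: rows are true classes, columns are predictions.
matrix = [
    [50,  0,  0],   # a = Iris-setosa
    [ 0, 49,  1],   # b = Iris-versicolor
    [ 0,  2, 48],   # c = Iris-virginica
]

total = sum(sum(row) for row in matrix)                       # 150 instances
correct = sum(matrix[i][i] for i in range(3))                 # diagonal = 147
p_observed = correct / total                                  # 0.98

# Chance agreement: for each class, row total times column total.
row_totals = [sum(row) for row in matrix]
col_totals = [sum(matrix[i][j] for i in range(3)) for j in range(3)]
p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2

kappa = (p_observed - p_expected) / (1 - p_expected)
print(round(p_observed, 2), round(kappa, 2))  # 0.98 0.97
```

This reproduces the "Correctly Classified Instances" percentage (98 %) and the Kappa statistic (0.97) shown above.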

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 49  1 |  b = Iris-versicolor
  0  2 48 |  c = Iris-virginica

[ This shows, for each class, how the instances from that class were classified. E.g. of the 50 instances in class "b", 49 were correctly classified but 1 was put into class "c". ]
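Reading the matrix row by row, the diagonal entry of each row is the number of correctly classified instances of that class. A small Python sketch using the matrix above:

```python
# Per-class correct counts from the training confusion matrix above.
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
matrix = [
    [50,  0,  0],   # a
    [ 0, 49,  1],   # b
    [ 0,  2, 48],   # c
]

for i, (label, row) in enumerate(zip(labels, matrix)):
    print(f"{label}: {row[i]} of {sum(row)} correctly classified")
```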

=== Stratified cross-validation ===

Correctly Classified Instances         144               96      %
Incorrectly Classified Instances         6                4      %
Kappa statistic                          0.94 
Mean absolute error                      0.035
Root mean squared error                  0.1586
Relative absolute error                  7.8705 %
Root relative squared error             33.6353 %
Total Number of Instances              150    

[ This gives the error levels during a 10-fold cross-validation; the "-x" option can be used to specify a different number of folds. The correctly/incorrectly classified instances refer to the cases where the instances are used as test data, and these are again the most important statistics here for our purposes. ]
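"Stratified" means that each fold preserves the class proportions of the full dataset. A minimal sketch of how such folds can be built (illustrative Python, not Weka's actual implementation):

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Assign each instance index to one of k folds, keeping class
    proportions roughly equal across folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the instances of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# The iris data has 50 instances of each of the 3 classes.
labels = ["a"] * 50 + ["b"] * 50 + ["c"] * 50
folds = stratified_folds(labels, k=10)
print([len(f) for f in folds])  # each fold holds 15 instances, 5 per class
```

In each of the 10 rounds, one fold serves as test data and the remaining nine as training data; the statistics above are accumulated over the test folds.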

=== Confusion Matrix ===

  a  b  c   <-- classified as
 49  1  0 |  a = Iris-setosa
  0 47  3 |  b = Iris-versicolor
  0  2 48 |  c = Iris-virginica

[ This is the confusion matrix for the 10-fold cross-validation, showing how the instances from each class were classified when they were used as test data. E.g. for class "a", 49 instances were correctly classified and 1 instance was assigned to class "b". ]