Case Study 5
Exercises on the WEKA tool
1. Launch the WEKA tool, and activate the Explorer environment.
2. Open the “weather.nominal” dataset.
- How many instances (examples) are contained in the dataset?
- How many attributes are used to represent the instances?
- Which attribute is the class label?
- What is the data type (e.g., numeric, nominal, etc.) of the attributes in the dataset?
- For each attribute and each of its possible values, how many instances in each class have that value (i.e., what is the class distribution of the attribute values)?
3. Go to the Classify tab. Select the ZeroR classifier. Choose the “Cross-validation” (10 folds) test mode. Run the classifier and observe the results shown in the “Classifier output” window.
- How many instances are incorrectly classified?
- What is the MAE (mean absolute error) made by the classifier?
- What can you infer from the information shown in the Confusion Matrix?
- Visualize the classifier errors. In the plot, how can you differentiate between the correctly and incorrectly classified instances? In the plot, how can you see the detailed information of an incorrectly classified instance?
- How can you save the learned classifier to a file?
- How can you load a learned classifier from a file?
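As background for the questions above: ZeroR simply predicts the majority class of the training data and ignores every attribute. The following is a minimal sketch of that rule in Python, assuming the well-known 9 “yes” / 5 “no” class split of weather.nominal (check the actual counts in the Preprocess tab):

```python
from collections import Counter

def zero_r(train_labels):
    """ZeroR: always predict the most frequent class in the training data."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    return lambda instance: majority  # the instance's attributes are ignored

# Illustrative labels with the assumed 9/5 class split of weather.nominal.
labels = ["yes"] * 9 + ["no"] * 5
classifier = zero_r(labels)

# Evaluating ZeroR on the same labels: every minority-class instance is wrong.
errors = sum(classifier(None) != y for y in labels)
print(errors)  # 5
```

This also explains the Confusion Matrix you will see: one column receives all predictions, because ZeroR never predicts the minority class.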
4. Choose the “Percentage split” (66% for training) test mode. Run the ZeroR classifier and observe the results shown in the “Classifier output” window.
- How many instances are incorrectly classified? Why is this number smaller than that observed in the previous experiment (i.e., using the cross-validation test mode)?
- What is the MAE made by the classifier?
- Visualize the classifier errors to see the detailed information.
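One hint for the “why smaller” question: under 10-fold cross-validation every instance is tested once, while a 66% split tests only the held-out portion. A rough sketch of the split sizes for a 14-instance dataset such as weather.nominal (the rounding rule here is an assumption; WEKA’s exact rounding may differ by one instance):

```python
n_instances = 14   # weather.nominal has 14 instances
train_pct = 66

# Assumed rounding of the training-set size.
n_train = round(n_instances * train_pct / 100)
n_test = n_instances - n_train
print(n_train, n_test)  # 9 train, 5 test: at most 5 instances can be misclassified
```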
5. Now, select the Id3 classifier (you can find this classifier in the weka.classifiers.trees group). Choose the “Cross-validation” (10 folds) test mode. Run the Id3 classifier and observe the results shown in the “Classifier output” window.
- How many instances are incorrectly classified?
- What is the MAE made by the classifier?
- Visualize the classifier errors.
- Compare these results with those observed for the ZeroR classifier in the cross-validation test mode. Which classifier, ZeroR or Id3, shows a better prediction performance for the current dataset and the cross-validation test mode?
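Unlike ZeroR, Id3 grows a decision tree by repeatedly splitting on the attribute with the highest information gain. A small sketch of that criterion (entropy and gain), using an illustrative attribute/label sample rather than the actual weather.nominal data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label list, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(values, labels):
    """Information gain of splitting `labels` by the parallel `values` list."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Illustrative sample: a perfectly informative attribute vs. a useless one.
labels = ["yes", "yes", "no", "no"]
good   = ["a", "a", "b", "b"]   # separates the classes completely
bad    = ["a", "b", "a", "b"]   # tells us nothing about the class

print(info_gain(good, labels))  # 1.0 bit
print(info_gain(bad, labels))   # 0.0 bits
```

Because Id3 actually uses the attributes, its error rate on this dataset should be compared directly against ZeroR’s majority-class baseline.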
6. Choose the “Percentage split” (66% for training) test mode. Run the Id3 classifier and observe the results shown in the “Classifier output” window.
- How many instances are incorrectly classified?
- What is the MAE made by the classifier?
- Visualize the classifier errors.
- Compare the results made by the Id3 classifier for the two considered test modes. In which test mode does the classifier produce a better result (i.e., a smaller error)?
- Which classifier, ZeroR or Id3, shows a better prediction performance for the current dataset and the percentage-split test mode?
Exercises on probabilistic models
• Let’s assume we have the following dataset, which records (over a period of 25 days) whether or not a person played tennis depending on the outlook and wind conditions.
• Each instance (example) is represented by three attributes.
o Outlook: a value of {Sunny, Overcast, Rain}.
o Wind: a value of {Weak, Strong}.
o PlayTennis: the classification attribute (i.e., Yes - the person plays tennis; No - the person does not play tennis).
Date  Outlook   Wind    PlayTennis
1     Sunny     Weak    No
2     Sunny     Strong  No
3     Overcast  Weak    Yes
4     Rain      Weak    Yes
5     Rain      Weak    Yes
6     Rain      Strong  No
7     Overcast  Strong  Yes
8     Sunny     Weak    No
9     Sunny     Weak    Yes
10    Rain      Weak    Yes
11    Sunny     Strong  Yes
12    Overcast  Strong  Yes
13    Overcast  Weak    Yes
14    Rain      Strong  No
15    Sunny     Strong  Yes
16    Overcast  Strong  No
17    Overcast  Weak    Yes
18    Rain      Weak    No
19    Sunny     Weak    No
20    Rain      Strong  Yes
21    Sunny     Weak    Yes
22    Overcast  Weak    No
23    Rain      Weak    Yes
24    Sunny     Strong  Yes
25    Overcast  Weak    No
• We want to predict whether the person will play tennis on each of the three following days.
o Day 26: (Outlook=Sunny, Wind=Strong) → PlayTennis=?
o Day 27: (Outlook=Overcast, Wind=Weak) → PlayTennis=?
o Day 28: (Outlook=Rain, Wind=Weak) → PlayTennis=?
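One way to approach this exercise is a Naive Bayes model: estimate the prior P(PlayTennis) and the conditionals P(Outlook | PlayTennis) and P(Wind | PlayTennis) from the 25 recorded days, then pick the class with the larger posterior product. A sketch using maximum-likelihood counts (no smoothing), with the table above transcribed as data; the prediction itself is left for you to run:

```python
from collections import Counter

# The 25 recorded days: (Outlook, Wind, PlayTennis), copied from the table.
data = [
    ("Sunny", "Weak", "No"),       ("Sunny", "Strong", "No"),
    ("Overcast", "Weak", "Yes"),   ("Rain", "Weak", "Yes"),
    ("Rain", "Weak", "Yes"),       ("Rain", "Strong", "No"),
    ("Overcast", "Strong", "Yes"), ("Sunny", "Weak", "No"),
    ("Sunny", "Weak", "Yes"),      ("Rain", "Weak", "Yes"),
    ("Sunny", "Strong", "Yes"),    ("Overcast", "Strong", "Yes"),
    ("Overcast", "Weak", "Yes"),   ("Rain", "Strong", "No"),
    ("Sunny", "Strong", "Yes"),    ("Overcast", "Strong", "No"),
    ("Overcast", "Weak", "Yes"),   ("Rain", "Weak", "No"),
    ("Sunny", "Weak", "No"),       ("Rain", "Strong", "Yes"),
    ("Sunny", "Weak", "Yes"),      ("Overcast", "Weak", "No"),
    ("Rain", "Weak", "Yes"),       ("Sunny", "Strong", "Yes"),
    ("Overcast", "Weak", "No"),
]

n = len(data)
classes = Counter(c for _, _, c in data)       # counts for P(PlayTennis)
outlook = Counter((o, c) for o, _, c in data)  # joint counts for P(Outlook | PlayTennis)
wind = Counter((w, c) for _, w, c in data)     # joint counts for P(Wind | PlayTennis)

def posterior(o, w, c):
    """Unnormalised posterior P(c) * P(o | c) * P(w | c)."""
    return (classes[c] / n) * (outlook[(o, c)] / classes[c]) * (wind[(w, c)] / classes[c])

def predict(o, w):
    """Maximum a posteriori class for an (Outlook, Wind) observation."""
    return max(("Yes", "No"), key=lambda c: posterior(o, w, c))

for day, (o, w) in [(26, ("Sunny", "Strong")),
                    (27, ("Overcast", "Weak")),
                    (28, ("Rain", "Weak"))]:
    print(f"Day {day}: PlayTennis={predict(o, w)}")
```

The conditional-independence assumption between Outlook and Wind given the class is what makes the model "naive"; verify the intermediate counts (e.g., 15 Yes vs. 10 No days) by hand before trusting the output.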