Case Study 5
Exercises on the WEKA tool
1. Launch the WEKA tool, and activate the Explorer environment.
2. Open the “weather.nominal” dataset.
- How many instances (examples) are contained in the dataset?
- How many attributes are used to represent the instances?
- Which attribute is the class label?
- What is the data type (e.g., numeric, nominal, etc.) of the attributes in the dataset?
- For each attribute and each of its possible values, how many instances in each class have that value (i.e., what is the class distribution of the attribute values)?
3. Go to the Classify tab. Select the ZeroR classifier. Choose the “Cross-validation” (10 folds) test mode. Run the classifier and observe the results shown in the “Classifier output” window.
- How many instances are incorrectly classified?
- What is the MAE (mean absolute error) made by the classifier?
- What can you infer from the information shown in the Confusion Matrix?
- Visualize the classifier errors. In the plot, how can you differentiate between the correctly and incorrectly classified instances? In the plot, how can you see the detailed information of an incorrectly classified instance?
- How can you save the learned classifier to a file?
- How can you load a learned classifier from a file?
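As background for the questions above: ZeroR simply predicts the majority class of the training data and ignores every attribute. The following is a minimal sketch of that rule in Python, assuming the well-known 9 “yes” / 5 “no” class split of weather.nominal (check the actual counts in the Preprocess tab):

```python
from collections import Counter

def zero_r(train_labels):
    """ZeroR: always predict the most frequent class in the training data."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    return lambda instance: majority  # the instance's attributes are ignored

# Illustrative labels with the assumed 9/5 class split of weather.nominal.
labels = ["yes"] * 9 + ["no"] * 5
classifier = zero_r(labels)

# Evaluating ZeroR on the same labels: every minority-class instance is wrong.
errors = sum(classifier(None) != y for y in labels)
print(errors)  # 5
```

This also explains the Confusion Matrix you will see: one column receives all predictions, because ZeroR never predicts the minority class.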
4. Choose the “Percentage split” (66% for training) test mode. Run the ZeroR classifier and observe the results shown in the “Classifier output” window.
- How many instances are incorrectly classified? Why is this number smaller than that observed in the previous experiment (i.e., using the cross-validation test mode)?
- What is the MAE made by the classifier?
- Visualize the classifier errors to see the detailed information.
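One hint for the “why smaller” question: under 10-fold cross-validation every instance is tested once, while a 66% split tests only the held-out portion. A rough sketch of the split sizes for a 14-instance dataset such as weather.nominal (the rounding rule here is an assumption; WEKA’s exact rounding may differ by one instance):

```python
n_instances = 14   # weather.nominal has 14 instances
train_pct = 66

# Assumed rounding of the training-set size.
n_train = round(n_instances * train_pct / 100)
n_test = n_instances - n_train
print(n_train, n_test)  # 9 train, 5 test: at most 5 instances can be misclassified
```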
5. Now, select the Id3 classifier (you can find this classifier in the weka.classifiers.trees group). Choose the “Cross-validation” (10 folds) test mode. Run the Id3 classifier and observe the results shown in the “Classifier output” window.
- How many instances are incorrectly classified?
- What is the MAE made by the classifier?
- Visualize the classifier errors.
- Compare these results with those observed for the ZeroR classifier in the cross-validation test mode. Which classifier, ZeroR or Id3, shows a better prediction performance for the current dataset and the cross-validation test mode?
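Unlike ZeroR, Id3 grows a decision tree by repeatedly splitting on the attribute with the highest information gain. A small sketch of that criterion (entropy and gain), using an illustrative attribute/label sample rather than the actual weather.nominal data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label list, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(values, labels):
    """Information gain of splitting `labels` by the parallel `values` list."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Illustrative sample: a perfectly informative attribute vs. a useless one.
labels = ["yes", "yes", "no", "no"]
good   = ["a", "a", "b", "b"]   # separates the classes completely
bad    = ["a", "b", "a", "b"]   # tells us nothing about the class

print(info_gain(good, labels))  # 1.0 bit
print(info_gain(bad, labels))   # 0.0 bits
```

Because Id3 actually uses the attributes, its error rate on this dataset should be compared directly against ZeroR’s majority-class baseline.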
6. Choose the “Percentage split” (66% for training) test mode. Run the Id3 classifier and observe the results shown in the “Classifier output” window.
- How many instances are incorrectly classified?
- What is the MAE made by the classifier?
- Visualize the classifier errors.
- Compare the results made by the Id3 classifier for the two considered test modes. In which test mode does the classifier produce a better result (i.e., a smaller error)?
- Which classifier, ZeroR or Id3, shows a better prediction performance for the current dataset and the percentage-split test mode?
Exercises on probabilistic models
• Let’s assume we have the following dataset, which records (over a period of 25 days) whether or not a person played tennis depending on the outlook and wind conditions.
• Each instance (example) is represented by three attributes.
o Outlook: a value of {Sunny, Overcast, Rain}.
o Wind: a value of {Weak, Strong}.
o PlayTennis: the classification attribute (i.e., Yes - the person plays tennis; No - the person does not play tennis).
Date  Outlook   Wind    PlayTennis
1     Sunny     Weak    No
2     Sunny     Strong  No
3     Overcast  Weak    Yes
4     Rain      Weak    Yes
5     Rain      Weak    Yes
6     Rain      Strong  No
7     Overcast  Strong  Yes
8     Sunny     Weak    No
9     Sunny     Weak    Yes
10    Rain      Weak    Yes
11    Sunny     Strong  Yes
12    Overcast  Strong  Yes
13    Overcast  Weak    Yes
14    Rain      Strong  No
15    Sunny     Strong  Yes
16    Overcast  Strong  No
17    Overcast  Weak    Yes
18    Rain      Weak    No
19    Sunny     Weak    No
20    Rain      Strong  Yes
21    Sunny     Weak    Yes
22    Overcast  Weak    No
23    Rain      Weak    Yes
24    Sunny     Strong  Yes
25    Overcast  Weak    No
• We want to predict whether the person will play tennis on each of the three following days.
o Day 26: (Outlook=Sunny, Wind=Strong) → PlayTennis=?
o Day 27: (Outlook=Overcast, Wind=Weak) → PlayTennis=?
o Day 28: (Outlook=Rain, Wind=Weak) → PlayTennis=?
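One way to approach this exercise is a Naive Bayes model: estimate the prior P(PlayTennis) and the conditionals P(Outlook | PlayTennis) and P(Wind | PlayTennis) from the 25 recorded days, then pick the class with the larger posterior product. A sketch using maximum-likelihood counts (no smoothing), with the table above transcribed as data; the prediction itself is left for you to run:

```python
from collections import Counter

# The 25 recorded days: (Outlook, Wind, PlayTennis), copied from the table.
data = [
    ("Sunny", "Weak", "No"),       ("Sunny", "Strong", "No"),
    ("Overcast", "Weak", "Yes"),   ("Rain", "Weak", "Yes"),
    ("Rain", "Weak", "Yes"),       ("Rain", "Strong", "No"),
    ("Overcast", "Strong", "Yes"), ("Sunny", "Weak", "No"),
    ("Sunny", "Weak", "Yes"),      ("Rain", "Weak", "Yes"),
    ("Sunny", "Strong", "Yes"),    ("Overcast", "Strong", "Yes"),
    ("Overcast", "Weak", "Yes"),   ("Rain", "Strong", "No"),
    ("Sunny", "Strong", "Yes"),    ("Overcast", "Strong", "No"),
    ("Overcast", "Weak", "Yes"),   ("Rain", "Weak", "No"),
    ("Sunny", "Weak", "No"),       ("Rain", "Strong", "Yes"),
    ("Sunny", "Weak", "Yes"),      ("Overcast", "Weak", "No"),
    ("Rain", "Weak", "Yes"),       ("Sunny", "Strong", "Yes"),
    ("Overcast", "Weak", "No"),
]

n = len(data)
classes = Counter(c for _, _, c in data)       # counts for P(PlayTennis)
outlook = Counter((o, c) for o, _, c in data)  # joint counts for P(Outlook | PlayTennis)
wind = Counter((w, c) for _, w, c in data)     # joint counts for P(Wind | PlayTennis)

def posterior(o, w, c):
    """Unnormalised posterior P(c) * P(o | c) * P(w | c)."""
    return (classes[c] / n) * (outlook[(o, c)] / classes[c]) * (wind[(w, c)] / classes[c])

def predict(o, w):
    """Maximum a posteriori class for an (Outlook, Wind) observation."""
    return max(("Yes", "No"), key=lambda c: posterior(o, w, c))

for day, (o, w) in [(26, ("Sunny", "Strong")),
                    (27, ("Overcast", "Weak")),
                    (28, ("Rain", "Weak"))]:
    print(f"Day {day}: PlayTennis={predict(o, w)}")
```

The conditional-independence assumption between Outlook and Wind given the class is what makes the model "naive"; verify the intermediate counts (e.g., 15 Yes vs. 10 No days) by hand before trusting the output.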