ETC3250: Tutorial 5 Help Sheet

Reminder

Download the .qmd file for the lecture slides. If a hint directs you to a slide and the required code is not explicitly on the slide, it will be in the .qmd. We will give you the code for every assignment/tutorial question; you just need to repurpose it for the specific question.

Question 1: kNN theory

Part A

Use the image (in the tutorial) to identify what probabilities would be assigned to each class for the circled point under a kNN model with K = 6 and weight_func = "rectangular".

Check lecture 4 slides 5 to 8. To see the weight function options, check the help sheet of the function with ?parsnip::nearest_neighbor

To work out the weight function for “rectangular” you can either infer it from the slides (Jack does not directly say what it is, but you can work it out from them), or go to the help sheet for parsnip::details_nearest_neighbor_kknn, to which you would have been directed from the help sheet for parsnip::nearest_neighbor.

Finally, in the references of details_nearest_neighbor_kknn there is a link to a paper. This paper has the definitive answer (kind of), but at this point you probably would have been better off guessing from the slides.
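As a sanity check on your reasoning, under the rectangular (unweighted) kernel every one of the K = 6 neighbours contributes equally, so the class probabilities are just the class proportions among those neighbours. A minimal sketch, using made-up neighbour labels (read yours off the image):

```r
# Hypothetical labels of the 6 nearest neighbours (yours come from the image)
neighbours <- c("A", "A", "A", "B", "B", "C")

# Rectangular kernel: every neighbour gets weight 1, so the probability
# assigned to each class is simply its share of the 6 neighbours
table(neighbours) / length(neighbours)
# class proportions among the neighbours: these are the kNN probabilities
```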

Part B

Use the image above to identify what the class prediction would be for the circled point under a kNN model with K = 6 and weight_func = "inv".

Check lecture 4 slides 9 to 11.
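To make the "inv" kernel concrete: each neighbour votes with weight 1/distance, so close neighbours count for more, and the prediction is the class with the largest total weight. A sketch with hypothetical labels and distances (read yours off the image):

```r
# Hypothetical neighbour labels and their distances to the circled point
neighbours <- c("A", "A", "A", "B", "B", "C")
d          <- c(1.0, 2.0, 2.5, 0.5, 0.8, 3.0)

# Sum the inverse-distance weights within each class,
# then predict the class with the largest total weight
w <- tapply(1 / d, neighbours, sum)
names(which.max(w))
# here "B" wins on weight despite "A" having more raw votes
```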

Part C

Explain if and why it is important to standardise your data when conducting kNN.

Check lecture 4 slide 7.
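One way to convince yourself: kNN distances mix the scales of all features, so a feature measured in large units can dominate. A sketch using two penguins-style measurements (the values here are illustrative, not from the data):

```r
# Body mass (grams) swamps bill depth (mm) in a Euclidean distance
# unless the features are standardised first
x1 <- c(bill_depth = 18.7, body_mass = 3750)
x2 <- c(bill_depth = 13.2, body_mass = 3800)

sqrt(sum((x1 - x2)^2))   # ~50.3: driven almost entirely by the mass difference
```

In tidymodels this is typically handled with step_normalize() inside the recipe, so every numeric predictor is centred and scaled before distances are computed.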

Part D

Identify one limitation of a kNN model for probabilistic classification.

Check lecture 4 slide 18.

Question 2: kNN practice

Part A

Use cross-validation on the training data to select the number of neighbours \(K\), the distance parameter \(p\), and whether to use inverse distance weighting for the penguins data. Think carefully about what criteria to use for your cross-validation.

Check lecture 4 slides 13 to 16.

Is the ROC-AUC always appropriate? How would you calculate the sensitivity and specificity for this classification problem?

Sometimes it can be helpful to plot your cross-validated error to get an idea of how the tuning parameters affect the accuracy.
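A sketch of how the tuning might be set up, assuming a training split named p_tr (as in the Question 4 code); the object names (knn_spec, knn_grid, knn_res) and the grid values are illustrative, not prescribed:

```r
library(tidymodels)

# Tune K (neighbors), the Minkowski distance power p (dist_power),
# and the weighting kernel, via 5-fold CV on the training data
knn_spec <- nearest_neighbor(
    neighbors   = tune(),
    dist_power  = tune(),
    weight_func = tune()
  ) |>
  set_engine("kknn") |>
  set_mode("classification")

knn_grid <- crossing(
  neighbors   = c(3, 5, 7, 9),
  dist_power  = c(1, 2),
  weight_func = c("rectangular", "inv")
)

knn_res <- workflow() |>
  # standardise the predictors inside the recipe so CV is honest
  add_recipe(recipe(species ~ ., data = p_tr) |>
               step_normalize(all_numeric_predictors())) |>
  add_model(knn_spec) |>
  tune_grid(
    resamples = vfold_cv(p_tr, v = 5, strata = species),
    grid      = knn_grid,
    metrics   = metric_set(accuracy, roc_auc)
  )

autoplot(knn_res)   # visualise how the tuning parameters affect the metrics
```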

Part B

Refit the best model that you found in cross-validation and evaluate its test-set performance.

Check lecture 4 slide 15 (the last line of code).
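A sketch of the finalise-and-refit step, assuming knn_res, knn_spec, and a recipe from your tuning in Part A, and an initial_split object named p_split; all of these names are placeholders for your own:

```r
# Pick the best combination from cross-validation, finalise the workflow,
# refit on the full training set, and evaluate on the held-out test set
best_knn <- select_best(knn_res, metric = "accuracy")

final_fit <- workflow() |>
  add_recipe(knn_rec) |>          # knn_rec: your standardising recipe
  add_model(knn_spec) |>
  finalize_workflow(best_knn) |>
  last_fit(p_split)               # p_split: the initial train/test split

collect_metrics(final_fit)        # test-set accuracy and ROC-AUC
```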

Question 3: Logistic Theory

Part A

Sketch the logistic function.

Reminder

Your drawing will need to be more detailed than just a bare S-shaped curve.

Check lecture 4 slides 21 to 22.

What is the x axis? What is the y axis? When should f(x) be 0, 0.5, or 1?
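If you want to check your sketch against the real thing, a minimal base-R plot of the curve:

```r
# The logistic (sigmoid) function maps any real x into (0, 1),
# with f(0) = 0.5 and horizontal asymptotes at 0 and 1
logistic <- function(x) exp(x) / (1 + exp(x))

curve(logistic, from = -6, to = 6, xlab = "x", ylab = "f(x)")
abline(h = c(0, 0.5, 1), lty = 2)   # guide lines at the key values
```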

Part B

Explain the role of the logistic function in logistic regression.

Check lecture 4 slides 21 to 27.

Part C

Use the logistic function to derive the fact that logistic regression models the log-odds as being linear in the features \(x\).

Check lecture 4 slides 21 to 22.
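As a guide, the derivation can run along these lines (a sketch for a single feature; the multi-feature case replaces \(\beta_1 x\) with \(\beta^\top x\)):

\[
p(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
\qquad\Rightarrow\qquad
1 - p(x) = \frac{1}{1 + e^{\beta_0 + \beta_1 x}}
\]

\[
\frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x}
\qquad\Rightarrow\qquad
\log\!\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x
\]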

Part D

Explain if and why it is important to standardise your data when conducting logistic regression.

Check lecture 4 slide 34.

Part E

Explain one advantage and one disadvantage of logistic regression when compared to kNN.

Check lecture 4 slides 37 to 41 and slide 18.

Question 4: Logistic regression practice

Part A

Fit a multi-class logistic regression model to the training data. You may want to look at the multinom_reg tidymodels page to get started.

Check lecture 4 slides 28 to 30.
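A sketch of the fit, using the multi_logistic_fit name that the Part C code below expects, and assuming a training split named p_tr:

```r
library(tidymodels)

# Multi-class (multinomial) logistic regression via the nnet engine
multi_logistic_fit <- multinom_reg() |>
  set_engine("nnet") |>
  set_mode("classification") |>
  fit(species ~ ., data = p_tr)
```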

Part B

Compute the confusion matrices for training and test sets, and thus the error for the test set (classifying based on which class has the highest predicted probability).

Note that, unlike kNN, logistic regression can make in-sample predictions, so it can be useful to compare in-sample and out-of-sample performance.

Check lecture 4 slides 30 to 32.
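A sketch of the confusion matrices with yardstick, assuming the fitted model and splits named as in the Part C code (multi_logistic_fit, p_tr, p_ts):

```r
# Attach predicted classes and probabilities to each set
p_tr_pred <- augment(multi_logistic_fit, new_data = p_tr)
p_ts_pred <- augment(multi_logistic_fit, new_data = p_ts)

# Confusion matrices for training and test
conf_mat(p_tr_pred, truth = species, estimate = .pred_class)
conf_mat(p_ts_pred, truth = species, estimate = .pred_class)

# Classifying to the highest-probability class, test error = 1 - accuracy
accuracy(p_ts_pred, truth = species, estimate = .pred_class)
```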

Part C

Check the logistic regression model fit. First plot the data-in-the-model-space. Separate the observations by their true class label and produce box-plots of the model probabilities assigned to being in the correct class. Do this for both the training and testing observations. You can use this code to make the predictions.

p_tr_pred_prob <- multi_logistic_fit |>
  augment(new_data = p_tr, type.predict = "prob") |>
  # probability the model assigned to the observation's true class
  mutate(.pred_correct =
           as.numeric(species == "Adelie") * .pred_Adelie +
           as.numeric(species == "Chinstrap") * .pred_Chinstrap +
           as.numeric(species == "Gentoo") * .pred_Gentoo) |>
  # probability the model assigned to its own predicted class
  mutate(.pred_predicted =
           as.numeric(.pred_class == "Adelie") * .pred_Adelie +
           as.numeric(.pred_class == "Chinstrap") * .pred_Chinstrap +
           as.numeric(.pred_class == "Gentoo") * .pred_Gentoo)

p_ts_pred_prob <- multi_logistic_fit |>
  augment(new_data = p_ts, type.predict = "prob") |>
  mutate(.pred_correct =
           as.numeric(species == "Adelie") * .pred_Adelie +
           as.numeric(species == "Chinstrap") * .pred_Chinstrap +
           as.numeric(species == "Gentoo") * .pred_Gentoo) |>
  mutate(.pred_predicted =
           as.numeric(.pred_class == "Adelie") * .pred_Adelie +
           as.numeric(.pred_class == "Chinstrap") * .pred_Chinstrap +
           as.numeric(.pred_class == "Gentoo") * .pred_Gentoo)

Then examine the model-in-the-data-space. Use a tour, colouring the observations according to which class is believed to be most likely, and using point shape to show whether or not they were correctly classified.
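One way the tour could be set up, using tourr's animate_xy on the p_ts_pred_prob object created above (a sketch; the column selection and plotting symbols are assumptions you should adapt to your own data):

```r
library(tourr)

# Tour the numeric predictors (dropping the .pred_* probability columns),
# colouring by the most likely class and shaping by correctness
animate_xy(
  p_ts_pred_prob |> select(where(is.numeric) & !starts_with(".pred")),
  col = p_ts_pred_prob$.pred_class,
  pch = ifelse(p_ts_pred_prob$species == p_ts_pred_prob$.pred_class,
               16, 4)   # filled circle = correct, cross = misclassified
)
```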

Note

This question is an exploratory question so there isn’t really a right or wrong answer. Look at the predictions and ask yourself if it is a good model.

Question 5: Misclassifications

Here you are going to use interactive graphics to explore the misclassifications from your kNN (or logistic regression) model. We’ll need to use detourr to accomplish this. The code below makes a scatterplot of the confusion matrix, where points corresponding to a class have been spread apart by jittering. This plot is linked to a tour plot. Try:

- Selecting penguins that have been misclassified, from the display of the confusion matrix, and observing where they are in the data space. Are they in an area where it is hard to distinguish the groups?
- Selecting neighbouring points in the tour, and examining where they are in the confusion matrix.

Note

This question is an exploratory question so there isn’t really a right or wrong answer. The idea is to look at the points that have been misclassified, and understand if it is due to a missing feature in your model.