ETC3250: Tutorial 5 Help Sheet
Question 1: Logistic Regression
Part A
Fit a logistic regression model to the training set.
(No hint, the code is in the tutorial)
Part B
Compute the confusion matrices for training and test sets, and thus the error for the test set. You can use this code to make the predictions.
You have computed a confusion matrix multiple times before. Check lecture 4 slide 15 for an example of using a model to get the confusion matrix for the logistic model.
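As a reminder of the general pattern, here is a minimal sketch in base R. It is not the tutorial's code: the built-in iris data (two species only) stands in for the tutorial data, and the names two, tr, ts, log_fit and pred_class are made up for illustration.

```r
# Stand-in data: two iris species (an assumption; the tutorial uses its own data).
set.seed(1)
two <- droplevels(subset(iris, Species != "setosa"))
idx <- sample(nrow(two), size = 70)
tr <- two[idx, ]   # training set
ts <- two[-idx, ]  # test set

# Logistic regression: models P(Species == "virginica"), the second factor level.
log_fit <- glm(Species ~ Sepal.Length + Petal.Width, data = tr, family = binomial)

# Turn predicted probabilities into predicted classes using a 0.5 threshold.
pred_class <- function(fit, newdata) {
  p <- predict(fit, newdata = newdata, type = "response")
  factor(ifelse(p > 0.5, "virginica", "versicolor"),
         levels = levels(newdata$Species))
}

# Confusion matrices: rows are the true class, columns the predicted class.
cm_tr <- table(truth = tr$Species, predicted = pred_class(log_fit, tr))
cm_ts <- table(truth = ts$Species, predicted = pred_class(log_fit, ts))

# Test error = 1 - proportion of test cases on the diagonal.
err_ts <- 1 - sum(diag(cm_ts)) / sum(cm_ts)
```

Printing cm_tr, cm_ts and err_ts shows the two matrices and the test error. The lecture slide may use different helper functions to do the same thing.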
Part C
This question is exploratory, so there isn't really a right or wrong answer. Do think about it though.
Question 2: LDA
Part A
Is the assumption of equal variance-covariance reasonable to make for this data?
Check lecture 4 slide 18 and slides 40 to 43.
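If you want a numerical check alongside the plots in the slides, one possible sketch is to compute the variance-covariance matrix separately for each class and compare them. The built-in iris data is used here as a stand-in (an assumption, not the tutorial data), and the names X, cov_by_class and sd_by_class are made up for illustration.

```r
# Stand-in data: iris (an assumption; swap in the tutorial's training data).
X <- iris[, 1:4]

# One variance-covariance matrix per class.
cov_by_class <- lapply(split(X, iris$Species), cov)

# Standard deviations of each variable, one column per class,
# as a quick way to compare spread across classes.
sd_by_class <- sapply(cov_by_class, function(S) sqrt(diag(S)))
round(sd_by_class, 2)
```

If the matrices look broadly similar across classes, the equal variance-covariance assumption is easier to defend. How similar is similar enough is a judgement call.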
Part B
Fit the LDA model to the training data.
(No hint, the code is provided)
Part C
Compute the confusion matrices for training and test sets, and thus the error for the test set.
You have computed a confusion matrix multiple times before. Check lecture 4 slide 15 for an example of using a model to get the confusion matrix for the logistic model. Think about how to get the predicted class for an LDA model.
The call predict(lda_fit$fit, p_tr)$class should give you the predicted class for an LDA model. You will need to do this twice, once on your training set and once on your test set.
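A minimal sketch of that pattern, assuming the fitted object is a MASS::lda model (which is what lda_fit$fit is taken to be here). The built-in iris data stands in for the tutorial data, and lda_obj, tr and ts are illustrative names only.

```r
library(MASS)

# Stand-in data and split (assumptions; use your own training and test sets).
set.seed(1)
idx <- sample(nrow(iris), size = 100)
tr <- iris[idx, ]   # training set
ts <- iris[-idx, ]  # test set

# lda_obj plays the role of lda_fit$fit.
lda_obj <- lda(Species ~ ., data = tr)

# predict() on an lda object returns a list; $class holds the predicted class.
pred_tr <- predict(lda_obj, tr)$class
pred_ts <- predict(lda_obj, ts)$class

# Confusion matrices: rows are the true class, columns the predicted class.
cm_tr <- table(truth = tr$Species, predicted = pred_tr)
cm_ts <- table(truth = ts$Species, predicted = pred_ts)

# Test error = 1 - proportion of test cases on the diagonal.
err_ts <- 1 - sum(diag(cm_ts)) / sum(cm_ts)
```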
Part D
Use boxplots and a tour to examine your fitted LDA model and see how it differs from your Logistic regression model. You can use this code to make the predictions.
Check lecture 4 slides 6 to 9 for an understanding of how logistic regression works, and lecture 4 slides 18 to 22 to understand how LDA works. The answer is in the theoretical working of these two models.
Think about the dimensionality of this problem. You are looking at the data in 2D space. What dimension is the model rule drawn in?
Part E
Re-do the plot of the discriminant space, to examine the boundary between groups. You’ll need to generate a set of random points in the domain of the data, predict their class, and project them into the discriminant space. The explore() function in the classifly package can help you generate the box of random points.
Hint: Extra code hint
You should use explore(lda_fit$fit, p_tidy_std) to generate the data and get predictions for those values. Once you have the predictions you should plot them.
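To see what the three steps in the question involve (random points in the box, predicted class, projection), here is a hand-rolled sketch that does not use classifly at all. It assumes a MASS::lda fit, uses the built-in iris data as a stand-in for the standardised tutorial data, and the names X, lda_obj, box and sim are made up for illustration.

```r
library(MASS)

# Stand-in data and model (assumptions; use your own data and fitted LDA model).
set.seed(1)
X <- iris[, 1:4]
lda_obj <- lda(Species ~ ., data = iris)

# Step 1: uniform random points inside the bounding box of the observed data.
n <- 2000
box <- as.data.frame(lapply(X, function(v) runif(n, min(v), max(v))))

# Steps 2 and 3: predict() returns both the predicted class ($class)
# and the coordinates in the discriminant space ($x).
pr <- predict(lda_obj, box)
sim <- data.frame(pr$x, class = pr$class)
```

Plotting LD1 against LD2 coloured by class, for example with plot(sim$LD1, sim$LD2, col = as.integer(sim$class)), shows where the predicted class changes, which is the boundary.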
Part F
What happens to the boundary, if you change the prior probabilities? And why does this happen? Change the prior probabilities to be 1.999/3, 0.001/3, 1/3 for Adelie, Chinstrap, Gentoo, respectively. Re-do the plot of the boundaries in the discriminant space.
Check lecture 4 slide 27 for a hint on how to change the prior probabilities in the code. Describe what happens and think about why.
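For reference, a minimal sketch of changing the priors when calling MASS::lda directly through its prior argument. The slide may set it through a different interface, so treat this as one possible route. The built-in iris data stands in for the penguins data (an assumption), so the three priors apply to setosa, versicolor and virginica in factor-level order, and fit_eq, fit_new, pred_eq and pred_new are illustrative names.

```r
library(MASS)

# Priors from the question, in the order of the factor levels.
# They must sum to 1.
new_prior <- c(1.999 / 3, 0.001 / 3, 1 / 3)

# Stand-in fits (assumption: iris instead of the penguins data).
fit_eq  <- lda(Species ~ ., data = iris, prior = c(1, 1, 1) / 3)
fit_new <- lda(Species ~ ., data = iris, prior = new_prior)

# Compare how often each class is predicted under the two sets of priors.
pred_eq  <- predict(fit_eq, iris)$class
pred_new <- predict(fit_new, iris)$class
table(equal_prior = pred_eq, new_prior = pred_new)
```

The prior enters the LDA rule as an additive log term for each class, so shrinking one class's prior makes that class harder to predict. Think about what that does to the boundary.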
Question 3: Misclassifications
This question is exploratory, so there isn't really a right or wrong answer.
Question 4: Math
It is hard to explain what is happening without giving away the logic to solve the question. If you want me to go through this question, please let me know and we can go through it as a class in the last 20 mins.