ETC3250: Tutorial 9 Help Sheet
Exercises
Question 1
Show that
Check lecture 8 slide 9 for all the equalities you will need to answer this question.
Remember, you are trying to show that the kernel applied to
Starting from
You want to expand the inner product of
You want to rearrange the expression above until you get it in terms of
Question 2
Part A
Make plots of each data set.
A scatter plot is typically what you use when you want to display data, you have only two variables, and no other information about the structure. Use geom_point
and make sure to colour by the class variable.
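A minimal sketch of the kind of plot intended, using simulated two-class data as a stand-in for the tutorial data (the column names x1, x2 and class are placeholders; swap in the actual names):

```r
library(ggplot2)

# Toy stand-in for the first data set: two well-separated classes
set.seed(1)
d1 <- data.frame(
  x1 = c(rnorm(50, -2), rnorm(50, 2)),
  x2 = c(rnorm(50, -2), rnorm(50, 2)),
  class = factor(rep(c("A", "B"), each = 50))
)

# Scatter plot, coloured by the class variable
ggplot(d1, aes(x = x1, y = x2, colour = class)) +
  geom_point() +
  theme(aspect.ratio = 1)
```

Setting the aspect ratio to 1 keeps distances comparable in both directions, which matters when you are eyeballing a decision boundary.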
Part B
What type of kernel would be appropriate for each? How many support vectors would you expect are needed to define the boundary in each case?
Lecture 8 slide 7 has an example of a linear SVM, and lecture 8 slide 9 has an example of a non-linear classifier. Make sure to differentiate between polynomial and radial kernels for this type of question (i.e. don’t just answer linear or non-linear).
There isn’t an obvious section of the lecture slides I can direct you to for this question, although the answer really comes from the fact that we are using a separating hyperplane, so you should use lecture 8 slide 4. Instead of thinking about an exact number of support vectors to start with, think about the number of support vectors each data set would need relative to the other. To answer this question, I would advise you to think about your results from Question 1. How many dimensions of the original data are you splitting on in the non-linear kernel vs in the linear kernel? Is it different? How does the dimensionality of the separating hyperplane in the non-linear kernel impact the number of support vectors needed?
To draw a line you need at least two points. To draw a surface you need at least three points. Otherwise you do not have a well-defined line or surface. What is the minimum number of points needed to draw a separating hyperplane in the space defined by the non-linear kernel? How about the linear kernel? Of course, this does not mean you would need exactly two points for a linear decision boundary, but the logic is similar.
Part C
Break the data into training and test.
We have done this many times. Check any previous tutorial. You need to use the initial_split
function and remember to use the strata
option because this is a classification task.
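The usual pattern looks like this, with a toy data frame standing in for the tutorial data (the seed value and the 2/3 proportion are just illustrative choices):

```r
library(rsample)

# Stand-in data; use the tutorial data frame here
set.seed(1)
d1 <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  class = factor(rep(c("A", "B"), each = 50))
)

# Stratified split so both classes appear in training and test sets
# in roughly the same proportions
d1_split <- initial_split(d1, prop = 2/3, strata = class)
d1_train <- training(d1_split)
d1_test  <- testing(d1_split)
```

The strata option is what keeps the class balance consistent between the two sets; without it an unlucky split could leave one class under-represented in the test set.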
Part D
Fit the svm model. Try changing the cost parameter to explore the number of support vectors used. Choose the value that gives you the smallest number.
The code to make the model is provided, so you only need to investigate the impact of the cost parameter. Check lecture 8 slide 10 for information on the cost parameter. Using your results from Part B, you should know the theoretical minimum number of support vectors for each kernel. You can fiddle around with cost
until you get close to that threshold (you might not be able to hit it).
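One way to explore this systematically is to loop over a few cost values and record how many support vectors each fit uses. This is a hedged sketch on toy data, using svm_linear for the first data set; the same loop works with svm_rbf for the second. The @nSV slot of the underlying kernlab object holds the support vector count:

```r
library(parsnip)
library(kernlab)

# Toy stand-in for the training data
set.seed(1)
d1_train <- data.frame(
  x1 = c(rnorm(50, -2), rnorm(50, 2)),
  x2 = c(rnorm(50, -2), rnorm(50, 2)),
  class = factor(rep(c("A", "B"), each = 50))
)

# Fit the model at several cost values and count support vectors
costs <- c(0.1, 1, 10, 100)
n_sv <- sapply(costs, function(C) {
  m <- svm_linear(cost = C) |>
    set_mode("classification") |>
    set_engine("kernlab") |>
    fit(class ~ x1 + x2, data = d1_train)
  m$fit@nSV  # number of support vectors in the fitted ksvm object
})

data.frame(cost = costs, n_sv = n_sv)
```

Larger cost values penalise margin violations more heavily, which typically shrinks the margin and reduces the number of support vectors.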
Part E
Can you use the parameter estimates to write out the equation of the separating hyperplane for the linear SVM model?
Di told you to use svm_fit1$fit@coef
and svm_fit1$fit@SVindex
in the formula on lecture 8 slide 6. Remember that you are trying to compute all the
Consider using the apply
function if you need to do a calculation multiple times.
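A hedged sketch of the calculation, following Di's hint, with a toy fitted model standing in for svm_fit1. Note that kernlab scales predictors by default, so scaled = FALSE is passed here to keep the coefficients on the original scale (otherwise you would need to back-transform):

```r
library(parsnip)
library(kernlab)

# Toy stand-in for the training data and fitted linear SVM
set.seed(1)
d1_train <- data.frame(
  x1 = c(rnorm(50, -2), rnorm(50, 2)),
  x2 = c(rnorm(50, -2), rnorm(50, 2)),
  class = factor(rep(c("A", "B"), each = 50))
)

svm_fit1 <- svm_linear(cost = 1) |>
  set_mode("classification") |>
  set_engine("kernlab", scaled = FALSE) |>
  fit(class ~ x1 + x2, data = d1_train)

sv_coef  <- svm_fit1$fit@coef[[1]]   # y_i * alpha_i for each support vector
sv_index <- svm_fit1$fit@SVindex     # which training rows are support vectors
X_sv <- as.matrix(d1_train[sv_index, c("x1", "x2")])

# beta_j = sum over support vectors of (y_i alpha_i) * x_ij;
# apply runs that sum once per column (i.e. once per coefficient)
beta  <- apply(X_sv * sv_coef, 2, sum)
beta0 <- -svm_fit1$fit@b             # kernlab stores the negative intercept

c(intercept = beta0, beta)
```

The separating hyperplane is then beta0 + beta1 * x1 + beta2 * x2 = 0.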
Part F
Compute the confusion table, and the test error for each model.
We have done this many times. Check any previous tutorial or your assignments.
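The usual yardstick pattern, sketched here with a toy fitted model standing in for the tutorial's svm_fit1 and d1_test:

```r
library(parsnip)
library(kernlab)
library(yardstick)
library(dplyr)

# Toy stand-in data, split into training and test
set.seed(1)
d1 <- data.frame(
  x1 = c(rnorm(60, -2), rnorm(60, 2)),
  x2 = c(rnorm(60, -2), rnorm(60, 2)),
  class = factor(rep(c("A", "B"), each = 60))
)
idx <- sample(nrow(d1), 90)
d1_train <- d1[idx, ]
d1_test  <- d1[-idx, ]

svm_fit1 <- svm_linear(cost = 1) |>
  set_mode("classification") |>
  set_engine("kernlab") |>
  fit(class ~ x1 + x2, data = d1_train)

# Attach predicted classes to the test set
d1_pred <- d1_test |>
  mutate(pred = predict(svm_fit1, d1_test)$.pred_class)

conf_mat(d1_pred, truth = class, estimate = pred)
accuracy(d1_pred, truth = class, estimate = pred)  # test error = 1 - accuracy
```

Repeat the same two yardstick calls for each fitted model to compare them.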
Part G
Which observations would you expect to be the support vectors? Overlay indications of the support vectors from each model to check whether the model thinks the same as you.
Lecture 8 slide 5 has a visualisation of the typical support vectors. This question only requires you to compare that to what you expected.
Part H
Would a neural network be able to fit the second data set? Think about what would need to be done to design a neural network architecture for it.
One possible answer to this question can be given without knowing much about neural networks. After all, the simplest boundary you can give is linear. Is there anything you can do to your data that would make the modelling problem simple enough that any model could do the classification?
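One hedged illustration of that idea, using simulated ring-shaped data as a stand-in: if one class sits inside a circle and the other outside, adding a squared-radius feature makes the classes separable by a single threshold in the new variable.

```r
library(dplyr)

# Toy stand-in: an inner disc and an outer ring
set.seed(1)
theta <- runif(100, 0, 2 * pi)
r <- c(runif(50, 0, 1), runif(50, 2, 3))
d2 <- data.frame(
  x1 = r * cos(theta),
  x2 = r * sin(theta),
  class = factor(rep(c("inner", "outer"), each = 50))
)

# Add the squared radius as a new feature
d2_aug <- d2 |> mutate(r2 = x1^2 + x2^2)

# In r2, a single cutoff now separates the classes:
# inner points have r2 in [0, 1], outer points have r2 in [4, 9]
range(d2_aug$r2[d2_aug$class == "inner"])
range(d2_aug$r2[d2_aug$class == "outer"])
```

Once the data is transformed this way, even a very simple classifier (a linear boundary in the new feature) can do the job, which is the connection back to Question 1.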