Contact

  • QQ: 99515681
  • Email: 99515681@qq.com
  • Hours: 8:00-21:00
  • WeChat: codinghelp


Date: 2023-10-04 10:44

Week 2 Homework


Homework

(Level 1: high-level understanding) (10 pt)

• Which of the following can describe SVM? (select all that apply)

  • It is a supervised learning algorithm.

  • It is an unsupervised learning algorithm.

  • It can be used to solve a classification problem.

  • The algorithm is based on the concept of finding a hyperplane that separates the data with a maximal margin, in order to classify future data points with more confidence.

• Which of the following problems is SVM best suited for? (select one)

  • Predict housing prices based on location, size, and other factors.

  • Classify images, such as identifying the contents of a photograph.

  • Group customers according to their purchasing behavior.

  • Extract important features from a dataset, allowing for more accurate analysis.


Homework

(Level 1: high-level understanding) (5 pt)

• What is the purpose of the training set, validation set, and testing set in machine learning? (one each)

  • To evaluate the performance of the model on unseen data

  • To fine-tune the model's parameters and learn the underlying patterns in the data

  • To evaluate the performance of the model during the training process

Homework

(Level 1: high-level understanding) (5 pt)

• What are the ML problems due to data?

  1. If we are trying to predict the price of a house, but the training data includes irrelevant information such as the color of the house or the name of the previous owner.

  2. If we are trying to predict the success of a new product based on customer data, but we only have a small amount of historical customer data.

  3. If we are trying to predict customer behavior, but the data contains a lot of missing values or errors.

  4. If we are trying to predict customer behavior, but the training data only includes data from a single demographic group.

• The choices are (one each):

  a. insufficient quantity of training data

  b. nonrepresentative training data

  c. poor-quality data

  d. irrelevant features

Homework

(Level 2: manual exercise) (20 pt)

• Given the following dataset of labeled points in two dimensions, use a support vector machine to find the best separating hyperplane. (Note: please use a hard margin. High-school geometry should be sufficient; no need to solve the NP optimization problem.)

  • Positive samples: {(3,3), (4,4)}

  • Negative samples: {(2,1), (1,2)}

• Use the constructed hyperplane to predict the class of a new data point at (2,2.4).
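If you want to sanity-check your hand-derived hyperplane, a hard margin can be approximated in scikit-learn with a linear-kernel SVC and a very large C. This is only a verification sketch, not the intended geometric solution:

```python
import numpy as np
from sklearn.svm import SVC

# The four labeled points from the exercise
X = np.array([[3, 3], [4, 4], [2, 1], [1, 2]])
y = np.array([1, 1, -1, -1])

# A very large C approximates a hard-margin SVM
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane: w =", w, ", b =", b)      # w·x + b = 0
print("class of (2, 2.4):", clf.predict([[2, 2.4]])[0])
```

Since the four points are symmetric about the line x₂ = x₁, the maximal-margin hyperplane must be perpendicular to the direction (1, 1); the solver's w should reflect that.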


Homework

(Level 3: extension of the basic algorithm) (20 pt)

• There may be outliers or noise in the data from real-world applications.

• To address this issue, a soft margin can be used in a modified optimization problem, known as a soft-margin SVM:

  • Objective: min (1/2)‖w‖² + C Σᵢ ξᵢ

  • Constraints: yᵢ(w ⋅ xᵢ + b) ≥ 1 − ξᵢ and ξᵢ ≥ 0

  • ξᵢ is the slack, which allows xᵢ to be inside the margin

• SVM without the slacks is known as hard-margin SVM.

• Where is xᵢ relative to the margin when its ξᵢ value is 0?

• Where is xᵢ relative to the margin when 0 < ξᵢ ≤ 1?

• Where is xᵢ relative to the margin when ξᵢ > 1?
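As an illustration of the role of C (not an answer to the questions above), here is a small sketch with a deliberately mislabeled outlier; the data points are invented for this demonstration. A small C tolerates large slacks (wider margin), while a large C penalizes slack heavily:

```python
import numpy as np
from sklearn.svm import SVC

# Two separable clusters plus one deliberately mislabeled outlier
X = np.array([[3, 3], [4, 4], [3, 4],      # positives
              [2, 1], [1, 2], [1, 1],      # negatives
              [3.2, 3.2]])                 # outlier, labeled negative
y = np.array([1, 1, 1, -1, -1, -1, -1])

models = {}
for C in (0.1, 1000.0):
    models[C] = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: train accuracy={models[C].score(X, y):.2f}, "
          f"support vectors={len(models[C].support_)}")
```

Because the outlier makes the data non-separable, even the large-C model must assign it a slack ξ > 1 (a misclassification); the small-C model accepts more margin violations in exchange for a wider margin.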


Homework

(Level 4: computer-based exercise) (20 pt)

• (Use HW2-4.ipynb as the template.)

• This is related to HW1-4.

• Use the popular scikit-learn package for SVM.

• Assuming (0, 1, 1) is the ground truth of the decision boundary, create 40 unique samples (20 positive and 20 negative).

• First, evenly split the 40 samples into two sets: one is called the training samples, and the other the testing samples.

• Second, train a hard-margin SVM using 100% of the training samples, and test the accuracy on the unseen testing samples. (Repeat 10 times for the average accuracy.)

• Third, train a hard-margin SVM using 60% of the training samples (e.g., 6 positive and 6 negative samples), and test the accuracy on the unseen testing samples. (Repeat 10 times for the average accuracy.)

• Compare your results for PLA vs. SVM. What do you observe?
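A possible starting point for this exercise, assuming the ground truth (0, 1, 1) means bias b = 0 and weights w = (1, 1), i.e., label = sign(x₁ + x₂) (HW2-4.ipynb may structure this differently):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_samples(n_per_class=20):
    """Sample points in [-1, 1]^2, labeled by sign(x1 + x2).

    A small band around the boundary is excluded so a hard margin exists.
    """
    pos, neg = [], []
    while len(pos) < n_per_class or len(neg) < n_per_class:
        p = rng.uniform(-1, 1, size=2)
        m = p[0] + p[1]
        if m > 0.1 and len(pos) < n_per_class:
            pos.append(p)
        elif m < -0.1 and len(neg) < n_per_class:
            neg.append(p)
    X = np.vstack([pos, neg])
    y = np.array([1] * n_per_class + [-1] * n_per_class)
    return X, y

accs = []
for trial in range(10):
    X, y = make_samples()
    idx = rng.permutation(len(X))
    train, test = idx[:20], idx[20:]
    clf = SVC(kernel="linear", C=1e6)   # large C approximates a hard margin
    clf.fit(X[train], y[train])
    accs.append(clf.score(X[test], y[test]))

print(f"average test accuracy over 10 runs: {np.mean(accs):.3f}")
```

For the 60% variant, fit on a class-balanced subset of the training indices instead of all 20.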


Homework

(Level 5: computer-based exercise) (20 pt)

• (Use HW2-5.ipynb as the template.)

• Use LIBSVM (https://www.csie.ntu.edu.tw/~cjlin/libsvm/)

  • You need to install libsvm-official (in Colab, to install a new Python package, put "!" in front of the command line)

• Follow "A Practical Guide to Support Vector Classification": https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

• Use the A.1 Astroparticle Physics dataset: svmguide1 and svmguide1.t

• Use the radial basis function (RBF) kernel (i.e., -t 2)

• First, train the model without scaling the dataset, with γ = 2 and C = 32, then report your prediction accuracy on the testing data.

• Second, scale the datasets with default parameters, train the model, and then report your prediction accuracy on the testing data.

  • What do you observe? Why?

• Third, change C = 2, 8, 32, 128, 512, repeat the model training (using the scaled datasets), and report the prediction accuracy on the training data and on the testing data.

  • What do you observe across different C's? Why?


