Applications of the Cost Function and Gradient Descent
Q1
Consider the following training set of m=4 training examples:
x | y |
---|---|
1 | 0.5 |
2 | 1 |
4 | 2 |
0 | 0 |
Consider the linear regression model hθ(x)=θ0+θ1x. What are the values of θ0 and θ1 that you would expect to obtain upon running gradient descent on this model? (Linear regression will be able to fit this data perfectly.)
A. θ0=0.5,θ1=0
B. θ0=0.5,θ1=0.5
C. θ0=1,θ1=1
D. θ0=1,θ1=0.5
F. θ0=0,θ1=0.5
Analysis: the four training examples all lie exactly on the line y = 0.5x (for example, x = 1 gives y = 0.5 and x = 4 gives y = 2), so linear regression fits the data perfectly with intercept 0 and slope 0.5; the answer is F.
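As a quick numerical check (a minimal NumPy sketch; the variable names are only illustrative), an exact least-squares fit on these four points recovers the same parameters:

```python
import numpy as np

# Training set from Q1
x = np.array([1.0, 2.0, 4.0, 0.0])
y = np.array([0.5, 1.0, 2.0, 0.0])

# Design matrix [1, x] for the hypothesis hθ(x) = θ0 + θ1*x
X = np.column_stack([np.ones_like(x), x])

# Exact least-squares solution; gradient descent converges to the same point here
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)          # [0.  0.5]  ->  θ0 = 0, θ1 = 0.5
print(X @ theta - y)  # residuals are essentially zero: the fit is exact
```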
Q2
Let f be some function so that f(θ0,θ1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima). Suppose we use gradient descent to try to minimize f(θ0,θ1) as a function of θ0 and θ1. Which of the following statements are true? (Check all that apply.)
A. If θ0 and θ1 are initialized at the global minimum, then one iteration will not change their values.
B. Setting the learning rate α to be very small is not harmful, and can only speed up the convergence of gradient descent.
C. If the first few iterations of gradient descent cause f(θ0,θ1) to increase rather than decrease, then the most likely cause is that we have set the learning rate α to too large a value.
D. No matter how θ0 and θ1 are initialized, so long as α is sufficiently small, we can safely expect gradient descent to converge to the same solution.
Analysis: A and C are correct. At the global minimum the gradient is zero, so one iteration leaves θ0 and θ1 unchanged (A), and a learning rate α that is too large can overshoot the minimum and make f(θ0,θ1) increase (C). A very small α is not harmful but only slows convergence down, so B is false; and because f may have local optima, different initializations can converge to different solutions even with a small α, so D is false.
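These points can be verified numerically. The sketch below uses an arbitrary toy quadratic as f (it is not the linear-regression cost, just an assumed stand-in) to show that an iteration started at the minimum changes nothing, that a too-large α makes f grow, and that a very small α merely converges slowly:

```python
import numpy as np

# Toy smooth objective, used only for illustration: f(θ0, θ1) = θ0^2 + 4*θ1^2
def f(theta):
    return theta[0] ** 2 + 4 * theta[1] ** 2

def grad(theta):
    return np.array([2 * theta[0], 8 * theta[1]])

def gradient_descent(theta, alpha, steps=5):
    history = [f(theta)]
    for _ in range(steps):
        theta = theta - alpha * grad(theta)  # simultaneous update of θ0 and θ1
        history.append(f(theta))
    return history

# (A) initialized at the minimum: the gradient is zero, so nothing changes
print(gradient_descent(np.array([0.0, 0.0]), alpha=0.1))

# (C) α too large: f increases every iteration instead of decreasing
print(gradient_descent(np.array([1.0, 1.0]), alpha=0.3))

# A very small α still decreases f, just slowly (why B is false)
print(gradient_descent(np.array([1.0, 1.0]), alpha=0.001))
```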
Q3
For this question, assume that we are using the training set from Q1. Recall our definition of the cost function was J(θ0,θ1) = 1/(2m) · ∑_{i=1}^{m} (hθ(x^(i)) - y^(i))^2. What is J(0,1)? In the box below, please enter your answer (simplify fractions to decimals when entering your answer, and use '.' as the decimal delimiter, e.g., 1.5).
Analysis: with θ0 = 0 and θ1 = 1 the hypothesis is hθ(x) = x, so the squared errors on the four examples are (1 - 0.5)^2 = 0.25, (2 - 1)^2 = 1, (4 - 2)^2 = 4 and (0 - 0)^2 = 0. Their sum is 5.25, and J(0,1) = 5.25 / (2 · 4) = 0.65625.
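The same value can be reproduced with a short NumPy sketch (the cost helper is an illustrative name, not part of the quiz):

```python
import numpy as np

# Training set from Q1
x = np.array([1.0, 2.0, 4.0, 0.0])
y = np.array([0.5, 1.0, 2.0, 0.0])

def cost(theta0, theta1):
    """J(θ0, θ1) = 1/(2m) * sum((hθ(x_i) - y_i)^2)"""
    m = len(x)
    h = theta0 + theta1 * x
    return np.sum((h - y) ** 2) / (2 * m)

print(cost(0, 1))  # (0.25 + 1 + 4 + 0) / 8 = 0.65625
```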
Multivariate Linear Regression
Suppose m=4 students have taken some class, and the class had a midterm exam and a final exam. You have collected a dataset of their scores on the two exams, which is as follows:
midterm exam | (midterm exam)^2 | final exam |
---|---|---|
89 | 7921 | 96 |
72 | 5184 | 74 |
94 | 8836 | 87 |
69 | 4761 | 78 |
You'd like to use polynomial regression to predict a student's final exam score from their midterm exam score. Concretely, suppose you want to fit a model of the form hθ(x)=θ0+θ1x1+θ2x2, where x1 is the midterm score and x2 is (midterm score)^2. Further, you plan to use both feature scaling (dividing by the "max-min", or range, of a feature) and mean normalization.
What is the normalized feature x1(3)? (Hint: midterm = 94, final = 87 is training example 3.) Please round off your answer to two decimal places and enter in the text box below.
Formula: normalized feature = (value - mean) / (max - min)
Analysis: x1 is the raw midterm score, so the normalization uses the midterm column.
Mean = (89 + 72 + 94 + 69) / 4 = 81
Max - Min = 94 - 69 = 25
x1(3) = (94 - 81) / 25 = 13 / 25 = 0.52
Rounded to two decimal places: 0.52
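The computation can be reproduced with a few lines of NumPy (array and variable names are illustrative):

```python
import numpy as np

# Midterm scores (feature x1) of the four students
midterm = np.array([89.0, 72.0, 94.0, 69.0])

mean = midterm.mean()                         # 81.0
value_range = midterm.max() - midterm.min()   # 94 - 69 = 25
x1_norm = (midterm - mean) / value_range      # mean normalization + range scaling

# Training example 3 has midterm = 94 (index 2 with zero-based indexing)
print(round(float(x1_norm[2]), 2))            # 0.52
```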