IV. Linear Regression with Multiple Variables (Week 2)

Study notes for Andrew Ng's Machine Learning course.

Multivariate Linear Regression

(linear regression with multiple variables, i.e. multiple features)

Multiple Features (Variables)

{x superscript (i) denotes the i-th training example; x subscript i denotes the i-th value (feature) within a particular training example}

The hypothesis for linear regression with multiple features (variables)

An additional feature x_0 = 1 is added (for notational convenience).

For every example i, the feature vector x^(i) has x^(i)_0 equal to 1.
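The convention above can be sketched in code. This is a minimal illustration (function names are my own, not from the course): with x_0 = 1 prepended to every example, the hypothesis becomes a single matrix-vector product h_θ(x) = θᵀx.

```python
import numpy as np

def add_bias_column(X):
    """Prepend a column of ones (the x0 = 1 feature) to the design matrix."""
    return np.column_stack([np.ones(X.shape[0]), X])

def hypothesis(theta, X):
    """Vectorized hypothesis: h_theta(x^(i)) = theta^T x^(i) for every row of X."""
    return X @ theta

# Two illustrative examples (house size, number of bedrooms); values are made up.
X = np.array([[2104.0, 3.0],
              [1416.0, 2.0]])
Xb = add_bias_column(X)            # each row now starts with x0 = 1
theta = np.array([10.0, 0.1, 5.0]) # arbitrary parameters for the sketch
print(hypothesis(theta, Xb))       # one prediction per training example
```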

Gradient Descent for Multiple Variables

Model representation

Solve for the parameters θ by using the gradient descent algorithm to minimize the cost function.

{Left: the gradient descent algorithm for the parameters of single-variable linear regression; right: the algorithm for the multivariate case.}
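The multivariate update rule, θ_j := θ_j − α·(1/m)·Σᵢ (h_θ(x^(i)) − y^(i))·x_j^(i) applied simultaneously for all j, can be sketched as follows (a minimal illustration with assumed names, not the course's own code):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iterations=1000):
    """Batch gradient descent for linear regression.

    X is the design matrix with the x0 = 1 column already included;
    all theta_j are updated simultaneously each iteration.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        errors = X @ theta - y               # h_theta(x^(i)) - y^(i) for all i
        theta -= alpha * (X.T @ errors) / m  # simultaneous update of every theta_j
    return theta

# Toy data where y = 1 + 2*x exactly, so theta should approach [1, 2].
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
print(gradient_descent(X, y, alpha=0.1, iterations=2000))
```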

Gradient Descent in Practice I – Feature Scaling

{Feature scaling: getting the features onto similar ranges of values. This helps gradient descent work well: it runs much faster and converges in far fewer iterations.}

Why:

If you make sure that the features are on a similar scale, gradient descent can converge more quickly (right figure below). Without feature scaling, gradient descent may oscillate back and forth and take a long time before it finally finds its way to the global minimum (left figure below).

How to do feature scaling?

1. Divide by the maximum value (max) or by the range (max – min)

Sometimes feature scaling is done by dividing each feature by its maximum value. {Dividing every feature by its max puts the values roughly within a range like [-1, 1], which is good enough.}

If you end up having a different feature that winds up being between -2 and +0.5, that is close enough to minus one and plus one, and that's fine. {x1, x2, x3 need not all fall exactly in the interval [-1, 1]; reasonably close is sufficient.}

But if a different feature, say x3, ranges over [-100, +100], or x4 takes on values in [-0.0001, +0.0001], these are very different scales from minus 1 and plus 1, so they might be poorly scaled features. {The differences between feature ranges must not be too large.} A good rule of thumb for the range of a feature: if it takes on values in roughly [-3, 3], it should be just fine.

2. Mean normalization

That is, x_i := (x_i − μ_i) / s_i, where μ_i is the average value of x_i over the training set and s_i is the range (max − min) or the standard deviation.

Note: x1 or x2 can actually end up slightly larger than 0.5, but that is close enough; any values that get the features into anything close to these sorts of ranges will do fine. In short: feature scaling does not have to be exact; it only needs to make gradient descent run quite a lot faster.
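The two scaling options above can be sketched as follows (helper names are my own; a minimal illustration, not the course's code):

```python
import numpy as np

def scale_by_range(X):
    """Option 1: divide each feature (column) by its range (max - min)."""
    return X / (X.max(axis=0) - X.min(axis=0))

def mean_normalize(X):
    """Option 2: subtract the mean, then divide by the range (max - min)."""
    mu = X.mean(axis=0)
    rng = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / rng

# Illustrative data: one large-scale feature, one small-scale feature.
X = np.array([[2104.0, 3.0],
              [1416.0, 2.0],
              [852.0, 1.0]])
print(mean_normalize(X))  # each column now has mean 0, values near [-0.5, 0.5]
```

Notice that after mean normalization a value can land slightly outside [-0.5, 0.5], exactly as the note above says; that is still close enough.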

Gradient Descent in Practice II – Learning Rate α

How to make sure gradient descent is working correctly, i.e., how to tell that the iterations have converged:

1. Plotting (left figure): plot the cost J(θ) against the number of iterations; by looking at this plot, you can tell whether gradient descent has converged.

2. Automatic convergence test (right figure): declare convergence if J(θ) decreases by less than some small threshold ε in one iteration.
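Both checks can be combined in one sketch: track J(θ) each iteration (the list can be plotted for check 1) and stop when the per-iteration decrease falls below ε (check 2). Names and the ε value are illustrative assumptions, not from the course:

```python
import numpy as np

def cost(theta, X, y):
    """Squared-error cost J(theta) = (1/2m) * sum_i (h(x^(i)) - y^(i))^2."""
    errors = X @ theta - y
    return (errors @ errors) / (2 * len(y))

def gradient_descent_with_history(X, y, alpha=0.1, epsilon=1e-3, max_iter=10000):
    """Run gradient descent, recording J(theta) each iteration for plotting."""
    m, n = X.shape
    theta = np.zeros(n)
    history = [cost(theta, X, y)]
    for _ in range(max_iter):
        theta -= alpha * (X.T @ (X @ theta - y)) / m
        history.append(cost(theta, X, y))
        if history[-2] - history[-1] < epsilon:  # automatic convergence test
            break
    return theta, history

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta, history = gradient_descent_with_history(X, y)
print(len(history))  # with a well-chosen alpha, J decreases every iteration
```

If the recorded J(θ) values ever increase, the learning rate α is likely too large.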


