Machine Learning – Andrew Ng Course Study Notes
Multivariate Linear Regression
(linear regression with multiple variables, i.e. multiple features)
Multiple Features (Variables)
{x with superscript i denotes the i-th training example; x with subscript i denotes the i-th value within a particular training example}
The hypothesis for linear regression with multiple features (variables)
An additional zeroth feature x0 is added (for convenience of notation)
For every example i there is a feature vector x^(i), and x^(i)_0 is set equal to 1.
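As a concrete illustration (a minimal NumPy sketch, not code from the course), the x0 = 1 convention amounts to prepending a column of ones, after which the hypothesis is just a matrix-vector product:

```python
import numpy as np

def add_intercept(X):
    """Prepend the constant feature x0 = 1 to every training example (row)."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def hypothesis(theta, X):
    """h_theta(x) = theta^T x, evaluated for all examples at once."""
    return X @ theta

# Two hypothetical training examples with two features each (e.g. size, #bedrooms).
X = np.array([[2104.0, 3.0],
              [1416.0, 2.0]])
Xb = add_intercept(X)                 # each row now starts with x0 = 1
theta = np.array([1.0, 0.1, 10.0])    # illustrative parameter values
preds = hypothesis(theta, Xb)
```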
Gradient Descent for Multiple Variables
Model representation
Solve for the parameters θ by using the gradient descent algorithm to minimize the cost function
{Left: the gradient descent algorithm for solving the parameters of single-variable linear regression;
right: the corresponding algorithm for multivariate linear regression}
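The multivariate update rule can be sketched in vectorized NumPy (an illustrative implementation, assuming X already contains the x0 = 1 column):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent for multivariate linear regression.
    Simultaneously updates every parameter:
        theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)
    X is assumed to already include the x0 = 1 column."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        error = X @ theta - y               # h_theta(x^(i)) - y^(i) for all i
        theta -= alpha * (X.T @ error) / m  # vectorized simultaneous update
    return theta

# Toy data set generated from y = 1 + 2x.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y)
```

The key point is that `theta` is updated in one vectorized step, which keeps the update simultaneous across all θ_j, exactly as the algorithm requires.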
Gradient Descent in Practice I – Feature Scaling
{Feature scaling: getting the features onto similar ranges of values as each other. This makes gradient descent work well: it runs much faster and converges in far fewer iterations.}
Why:
If you make sure the features are on a similar scale, gradient descent can converge more quickly (right figure below). Without feature scaling, gradient descent may oscillate back and forth and take a long time before it finally finds its way to the global minimum (left figure below).
How to do feature scaling?
1. Divide by the maximum value (max) or by the range (max − min)
Sometimes feature scaling is performed simply by dividing by the maximum value. {Divide each feature by its max so that the values fall roughly into [−1, 1]; anything in that neighborhood is fine.}
If you end up having a different feature that winds up being between −2 and +0.5, this is close enough to minus one and plus one, and that's fine. {x1, x2, x3 need not all lie exactly in [−1, 1]; being reasonably close is enough}
If you have a different feature, say x3, that ranges over [−100, +100], or x4 takes on values between [−0.0001, +0.0001], these are very different from minus 1 and plus 1, so such a feature is poorly scaled. {The ranges must not differ too much.} A good rule of thumb for a feature's range: values in roughly [−3, +3] should be just fine.
2. Mean normalization
That is, x_i := (x_i − μ_i) / s_i, where μ_i is the average value of feature i over the training set and s_i is its range (max − min) or, alternatively, its standard deviation.
Note: x1 or x2 can actually come out slightly larger than 0.5, but that is close enough; any values that get the features into anything close to these sorts of ranges will do fine. In short: feature scaling does not have to be too exact, only good enough to get gradient descent to run quite a lot faster.
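Both techniques can be sketched as small hypothetical helpers (the data below is illustrative, not from the notes' figures):

```python
import numpy as np

def scale_by_max(X):
    """Technique 1: divide each feature (column) by its maximum absolute value."""
    return X / np.abs(X).max(axis=0)

def mean_normalize(X):
    """Technique 2 (mean normalization): x_j := (x_j - mu_j) / s_j,
    where mu_j is the column mean and s_j the column range (max - min)."""
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / s

# House sizes in [800, 2104] and bedroom counts in [1, 3] end up on
# comparable scales after either transformation.
X = np.array([[2104.0, 3.0], [1416.0, 2.0], [800.0, 1.0]])
Xs = scale_by_max(X)
Xn = mean_normalize(X)
```

On this data the first normalized column's maximum comes out just above 0.5, which, as noted above, is close enough.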
Gradient Descent in Practice II – Learning Rate α
How to make sure gradient descent is working correctly, i.e. how to tell that the iterations have converged:
1. Plotting (left figure): plot the cost J(θ) against the number of iterations; by looking at this plot you can usually tell whether gradient descent has converged.
2. Automatic convergence test (right figure): for example, declare convergence if J(θ) decreases by less than some small threshold ε in a single iteration.
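The two checks can be sketched together (an illustrative helper, not course code): record J(θ) every iteration so it can be plotted, and stop early once the per-step decrease falls below ε:

```python
import numpy as np

def gradient_descent_monitored(X, y, alpha=0.1, iters=1000, eps=1e-9):
    """Gradient descent that records the cost J(theta) at every iteration.
    The returned history can be plotted against the iteration number
    (check 1); training stops early once J decreases by less than eps
    in a single iteration (check 2, the automatic convergence test)."""
    m, n = X.shape
    theta = np.zeros(n)
    history = []
    for _ in range(iters):
        error = X @ theta - y
        J = (error @ error) / (2 * m)      # squared-error cost
        if history and history[-1] - J < eps:
            break                          # converged: J barely decreased
        history.append(J)
        theta -= alpha * (X.T @ error) / m
    return theta, history

# Toy data generated from y = x; J should decrease every iteration.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 2.0])
theta, history = gradient_descent_monitored(X, y)
```

If α is well chosen, the history curve decreases on every iteration; a curve that rises or oscillates signals that α is too large.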
Fear not that the road ahead holds no friend; who in all the world does not know you?