
Machine Learning Notes (1)

Linear Regression

Posted by mtt on September 25, 2018

Linear Regression

Suppose the input feature vector is $x = (x_{1}, \ldots , x_{n})$, and we have $m$ data points $x^{(1)}, \ldots , x^{(m)} \in \mathbb{R}^{n}$ with corresponding labels $y^{(1)}, \ldots , y^{(m)} \in \mathbb{R}$. We can then collect the inputs into a design matrix $X \in \mathbb{R}^{m \times n}$ whose $i$-th row is $(x^{(i)})^{T}$, and the labels into a vector $Y = (y^{(1)}, \ldots , y^{(m)})^{T}$.

Model

  • hypothesis: $h_{\theta}(x) = \theta^{T}x$
  • parameters: $\theta \in \mathbb{R}^{n}$
  • objective function: $J(\theta) = \frac{1}{2}\sum_{i=1}^m (\theta^{T}x^{(i)} - y^{(i)})^{2} = \frac{1}{2}(X\theta - Y)^{T}(X\theta - Y)$
  • goal: find $\theta$ to $\min_{\theta} J(\theta)$

Closed-Form Solution

The objective function is a convex function of $\theta$, so its minimum is attained where the gradient vanishes. Setting $\nabla_{\theta}J = X^{T}(X\theta - Y) = 0$ gives the normal equations $X^{T}X\theta = X^{T}Y$. When $X^{T}X$ is invertible (i.e., $X$ has full column rank), we have $\theta = (X^{T}X)^{-1}X^{T}Y$.
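As a concrete illustration, here is a minimal NumPy sketch of the closed-form solution; the synthetic data, noise level, and `true_theta` below are placeholders, not part of the derivation:

```python
import numpy as np

# Synthetic placeholder data: m = 50 points, n = 3 features.
rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))                    # design matrix, rows are x^(i)
true_theta = np.array([2.0, -1.0, 0.5])
Y = X @ true_theta + 0.1 * rng.normal(size=m)  # labels with Gaussian noise

# Normal equations: X^T X theta = X^T Y.
# Solving the linear system is preferred over explicitly inverting X^T X.
theta = np.linalg.solve(X.T @ X, X.T @ Y)
print(theta)  # close to true_theta
```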

Iterative Methods

1. Gradient Descent

$\theta := \theta - \alpha\nabla_{\theta}J$, where $\nabla_{\theta}J = X^{T}(X\theta - Y)$ (see the sketch following Newton's method below)

2. SGD (stochastic gradient descent): update using one example at a time, $\theta := \theta - \alpha(\theta^{T}x^{(i)} - y^{(i)})x^{(i)}$
3. Newton’s Method

To solve $f(x) = 0$, we iterate using the point where the tangent line at $x_{t}$ crosses the x-axis, i.e., $x_{t+1} = x_{t} - \frac{f(x_{t})}{f'(x_{t})}$. Applying this to $\nabla_{\theta}J = 0$ gives $\theta_{t+1} = \theta_{t} - H^{-1}\nabla_{\theta}J$, where $H_{ij} = \frac{\partial^{2}J}{\partial\theta_{i}\partial\theta_{j}}$ is the Hessian.
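Below is a sketch of batch gradient descent and Newton's method for this least-squares objective, assuming the same design-matrix conventions as above; the learning rate is illustrative. Since $J$ is quadratic, the Hessian is the constant matrix $X^{T}X$:

```python
import numpy as np

def grad_J(theta, X, Y):
    # Gradient of J(theta) = 1/2 ||X theta - Y||^2.
    return X.T @ (X @ theta - Y)

def gradient_descent(X, Y, alpha=0.01, steps=1000):
    # Batch gradient descent; alpha must be small relative to the
    # largest eigenvalue of X^T X or the iteration diverges.
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= alpha * grad_J(theta, X, Y)
    return theta

def newton(X, Y, steps=1):
    # Newton's method; for least squares the Hessian X^T X is
    # constant, so a single step reaches the exact minimizer.
    theta = np.zeros(X.shape[1])
    H = X.T @ X
    for _ in range(steps):
        theta -= np.linalg.solve(H, grad_J(theta, X, Y))
    return theta
```

For this quadratic objective, `newton(X, Y)` returns the same $\theta$ as the normal equations above in one step.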

A Probabilistic View of the Choice of J

Assume $y^{(i)} = \theta^{T}x^{(i)} + \varepsilon^{(i)}$ with $\varepsilon^{(i)} \sim N(0, \sigma^{2})$. Then $y^{(i)} \mid x^{(i)};\theta \sim N(\theta^{T}x^{(i)}, \sigma^{2})$. Assuming the $\varepsilon^{(i)}$ are IID (independent and identically distributed), the likelihood is
$L(\theta) = P(Y|X;\theta) = \prod_{i=1}^m P(y^{(i)}|x^{(i)};\theta) = \prod_{i=1}^m \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y^{(i)} - \theta^{T}x^{(i)})^{2}}{2\sigma^{2}}\right)$

$l(\theta) = \ln{L(\theta)} = m\ln{\frac{1}{\sqrt{2\pi}\sigma}} - \sum_{i=1}^m \frac{(y^{(i)} - \theta^{T}x^{(i)})^{2}}{2\sigma^{2}}$

Maximizing the likelihood is therefore equivalent to $\max\ l(\theta)$, which is equivalent to $\min\ J(\theta)$.
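As a quick numerical sanity check (on a hypothetical one-dimensional dataset), the negative log-likelihood and $J(\theta)$ differ only by the positive factor $1/\sigma^{2}$ and an additive constant, so they share the same minimizer:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 1.5 * x + rng.normal(scale=0.3, size=100)  # true theta = 1.5, sigma = 0.3

sigma = 0.3
thetas = np.linspace(0.0, 3.0, 301)
J = np.array([0.5 * np.sum((t * x - y) ** 2) for t in thetas])
nll = np.array([100 * np.log(np.sqrt(2 * np.pi) * sigma)
                + np.sum((y - t * x) ** 2) / (2 * sigma ** 2) for t in thetas])

# Both objectives are minimized at the same theta on the grid.
assert thetas[J.argmin()] == thetas[nll.argmin()]
```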

Locally Weighted Regression

$J(\theta) = \sum_{i = 1}^m w^{(i)}(y^{(i)} - \theta^T x^{(i)})^2$, where a common choice of weight is $w^{(i)} = \exp\left(-\frac{(x^{(i)} - x)^{2}}{2}\right)$ for the query point $x$.
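A sketch of locally weighted prediction at a single query point, using one-dimensional inputs and the unit-bandwidth weight above; the intercept column is an added convenience for the local fit, not part of the formula:

```python
import numpy as np

def lwr_predict(x_query, X, Y):
    # X: (m,) inputs, Y: (m,) labels.
    # Weights w^(i) = exp(-(x^(i) - x)^2 / 2), centered at the query point.
    w = np.exp(-((X - x_query) ** 2) / 2.0)
    A = np.stack([np.ones_like(X), X], axis=1)   # intercept + slope columns
    W = np.diag(w)
    # Weighted normal equations: (A^T W A) theta = A^T W Y.
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ Y)
    return theta[0] + theta[1] * x_query
```

A new weighted fit is solved for every query point, which is why the method is non-parametric and keeps the whole training set around.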

Logistic Regression

Assume $y^{(i)} \in \{0, 1\}$, and let $h_{\theta}(x) \in [0, 1]$ predict the probability that $y$ is 1. Model:

  • hypothesis: $h_{\theta}(x) = g(\theta^{T}x)$, where $g$ is the sigmoid function: $g(z) = \frac{1}{1 + e^{-z}}$
  • parameters: $\theta \in \mathbb{R}^{n}$
  • objective function:
    Since $P(y = 1|x;\theta) = h_{\theta}(x)$ and $P(y = 0|x;\theta) = 1 - h_{\theta}(x)$,
    $L(\theta) = P(Y|X;\theta) = \prod_{i=1}^m P(y^{(i)}|x^{(i)};\theta) = \prod_{i=1}^m h_{\theta}(x^{(i)})^{y^{(i)}}(1 - h_{\theta}(x^{(i)}))^{1 - y^{(i)}}$
    $l(\theta) = \sum_{i=1}^m y^{(i)}\log{h_{\theta}(x^{(i)})} + (1 - y^{(i)})\log{(1 - h_{\theta}(x^{(i)}))}$
  • goal: find $\theta$ to $\max_{\theta} l(\theta)$

Solving

Using $g'(z) = g(z)(1 - g(z))$, the derivative of the log-likelihood is

$\frac{\partial{l(\theta)}}{\partial{\theta_{j}}} = \sum_{i=1}^m (y^{(i)} - h_{\theta}(x^{(i)}))x_{j}^{(i)}$

so gradient ascent updates $\theta_{j} := \theta_{j} + \alpha\sum_{i=1}^m (y^{(i)} - h_{\theta}(x^{(i)}))x_{j}^{(i)}$.
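A minimal batch gradient-ascent sketch built on this derivative; the data shapes, learning rate, and step count are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_fit(X, y, alpha=0.1, steps=2000):
    # Maximize l(theta) by batch gradient ascent.
    # X: (m, n) design matrix; y: (m,) labels in {0, 1}.
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        h = sigmoid(X @ theta)           # h_theta(x^(i)) for every example
        theta += alpha * X.T @ (y - h)   # dl/dtheta_j = sum_i (y - h) x_j
    return theta
```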

Why Choose the Sigmoid Function?

See Generalized Linear Models.