Perceptrons - an introduction to computational geometry
Paper summary

### Perceptron Classification

The perceptron computes the following function for some weight vector $\vec{w}$ and bias scalar $b$. Given an input $\vec{x}$ it produces a binary prediction:

$$ f(x) = \left\{ \begin{matrix} 1 & \text{if } \vec{w} \cdot \vec{x} + b > 0 \\ -1 & \text{otherwise} \end{matrix}\right. $$

### Perceptron Learning

The values $\vec{w}$ and $b$ are learned from the sample data by minimizing the misclassification error of the predictions. The sample data take the form $(x_i, y_i)$, where $y_i$ is the correct label ($1$ or $-1$). When a prediction is correct, the score $\vec{w} \cdot \vec{x}_i + b$ has the same sign as $y_i$, so the product $-y_i(\vec{w} \cdot \vec{x}_i + b)$ is negative; when it is incorrect, the product is positive. Taking the $\max$ of $0$ and this product therefore measures how badly $w$ and $b$ misclassify that example. $J_i(w,b)$ is the error for one example; averaging these over all samples gives the total error:

$$J_i(w,b) = \max(0, -y_i(\vec{w} \cdot \vec{x}_i + b))$$

$$J(w,b) = \frac{1}{N} \displaystyle\sum_{i=1}^N \max(0, -y_i(\vec{w} \cdot \vec{x}_i + b))$$

Note that the loss uses the raw score $\vec{w} \cdot \vec{x}_i + b$ rather than the thresholded output $f(x_i)$, so that it has a useful gradient. To apply gradient descent to this problem we calculate the gradient of $J_i(w,b)$ with respect to each $w_j \in w$, which tells us how to adjust $w_j$ to minimize $J_i(w,b)$. Because of the $\max$, this gradient splits into two cases:

$$ \frac{\partial J_i}{\partial w_j} = \left\{ \begin{matrix} 0 & \text{if } y_i(\vec{w} \cdot \vec{x}_i + b) > 0 \\ -y_i x_{ij} & \text{otherwise} \end{matrix}\right. $$

This gradient $\frac{\partial J_i}{\partial w_j}$ is then used to adjust $w_j$: subtracting it from $w_j$ shifts the output of $f(x_i)$ so that the error $J_i(w,b)$ is reduced. Subtracting the full gradient will generally overshoot the minimum, so only a fraction $\lambda$ of the gradient is subtracted. A learning rate of $\lambda = 0.05$ is a common starting point, but this term is still a point of debate and is generally set by experience.
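The classification rule and per-example gradient update above can be sketched as a short NumPy script. This is a minimal illustration, not code from the paper; the function names and the toy four-point dataset are my own invention for demonstration purposes.

```python
import numpy as np

def predict(w, b, x):
    """f(x): +1 if w.x + b > 0, else -1."""
    return 1 if np.dot(w, x) + b > 0 else -1

def train(X, y, lam=0.05, epochs=100):
    """Stochastic gradient descent on J_i(w,b) = max(0, -y_i (w.x_i + b))."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            # The gradient is zero when the example is classified with the
            # correct sign (y_i * (w.x_i + b) > 0); otherwise it is
            # -y_i * x_ij for w_j and -y_i for b. Subtracting lam * gradient
            # therefore nudges the score toward the correct side.
            if y_i * (np.dot(w, x_i) + b) <= 0:
                w += lam * y_i * x_i
                b += lam * y_i
    return w, b

# Hypothetical linearly separable toy data: label is +1 when x0 + x1 > 1.
X = np.array([[0., 0.], [2., 0.], [0., 2.], [2., 2.]])
y = np.array([-1, 1, 1, 1])

w, b = train(X, y)
preds = [predict(w, b, x) for x in X]  # all four points classified correctly
```

Because the toy data are linearly separable, the perceptron convergence theorem guarantees the loop stops making updates after finitely many mistakes, regardless of the exact value of $\lambda$.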
Perceptrons - an introduction to computational geometry
Minsky, Marvin and Papert, Seymour
MIT Press - 1987 via Bibsonomy
Keywords: dblp

