[link]
Bartlett shows that generalization bounds for multilayer perceptrons with weights of limited size can be derived using the so-called fat-shattering dimension. Similar to the classical VC dimension, the fat-shattering dimension quantifies the expressiveness of hypothesis classes in machine learning. Specifically, considering a sequence of points $x_1, \ldots, x_d$, a hypothesis class $H$ is said to shatter this sequence if, for any label assignment $b_1, \ldots, b_d \in \{-1,1\}$, a function $h \in H$ exists that correctly classifies the sequence, i.e., $\text{sign}(h(x_i)) = b_i$. The VC dimension is the largest $d$ for which this is possible. The VC dimension has been studied for a wide range of machine learning models (i.e., hypothesis classes). For example, it is well known that multilayer perceptrons with at least two layers have infinite VC dimension – which seems natural as two-layer perceptrons are universal approximators. As a result, most VC-dimension-based bounds on the generalization performance of multilayer networks (and, thus, also of more general deep networks) do not apply, as the VC dimension is infinite. The fat-shattering dimension, in contrast, does not strictly require the sequence $x_1,\ldots, x_d$ to be correctly classified into the labels $b_1,\ldots,b_d$. Instead, the sequence is said to be $\gamma$-shattered if real values $r_1,\ldots,r_d$ exist such that for every labeling $b_1,\ldots,b_d$, some $h \in H$ satisfies $(h(x_i) - r_i)b_i \geq \gamma$. Note that the values $r_i$ are fixed across labelings, i.e., they are chosen “before” knowing the labels. The fat-shattering dimension is the largest $d$ for which this is possible. Thus, the fat-shattering dimension relaxes the VC dimension in that the models in $H$ are allowed some “slack” (for lack of a better word). Note that $H$ contains real-valued functions.
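For small, finite hypothesis classes, the definition of $\gamma$-shattering can be checked directly by brute force. The following sketch is my own illustration (not from the paper): it searches witness values $r_i$ over a finite candidate grid `r_grid` and tests whether every labeling can be realized with margin $\gamma$; the function name, the grid restriction, and the toy class $h_s(x) = s \cdot x$ are all my own choices for illustration.

```python
import itertools

def gamma_shatters(points, hypotheses, gamma, r_grid):
    """Brute-force check of the gamma-shattering definition.

    points:     list of inputs x_1, ..., x_d
    hypotheses: finite list of real-valued functions h
    gamma:      required margin
    r_grid:     finite candidate set for the witness values r_i
                (a simplification; the definition allows any reals)
    """
    d = len(points)
    # The r_i are fixed first, "before" the labels are known.
    for rs in itertools.product(r_grid, repeat=d):
        # Then every labeling b in {-1, +1}^d must be realizable
        # by some h with margin gamma: (h(x_i) - r_i) * b_i >= gamma.
        if all(
            any(all((h(x) - r) * b >= gamma
                    for x, r, b in zip(points, rs, labels))
                for h in hypotheses)
            for labels in itertools.product([-1, 1], repeat=d)
        ):
            return True
    return False

# Toy class: h_s(x) = s * x with s in {-1, +1}.
hypotheses = [lambda x, s=s: s * x for s in (-1.0, 1.0)]
print(gamma_shatters([1.0], hypotheses, gamma=0.5, r_grid=[0.0]))       # True
print(gamma_shatters([1.0, 2.0], hypotheses, gamma=0.5, r_grid=[0.0]))  # False
```

With this toy class, a single point is $0.5$-shattered, but two points are not: the labeling $(+1, -1)$ would require $h(1) \geq r_1 + \gamma$ and $h(2) \leq r_2 - \gamma$ simultaneously, which neither $h_s$ achieves.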
Based on this definition, Bartlett shows that for multilayer perceptrons with $l$ layers in which each layer's weights $w$ are constrained as $\|w\|_1 \leq A$, the fat-shattering dimension scales with $A^{l(l + 1)}$. More importantly, however, the fat-shattering dimension is finite. Thus, generalization bounds based on the fat-shattering dimension apply and are discussed by Bartlett; I refer to the paper for details on the bound. Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).