This paper studies a linear latent factor model, where one observes "examples" consisting of high-dimensional vectors $x_1, x_2, \ldots \in \mathbb{R}^d$, and one wants to predict "labels" consisting of scalars $y_1, y_2, \ldots \in \mathbb{R}$. Crucially, one is working in the "one-shot learning" regime, where the number of training examples $n$ is small (say, $n=2$ or $n=10$), while the dimension $d$ is large (say, $d \rightarrow \infty$). The paper considers a well-known method, principal component regression (PCR), and proves some somewhat surprising theoretical results: PCR is inconsistent, but a modified PCR estimator is weakly consistent. The modified estimator is obtained by "expanding" the PCR estimator, in contrast to the usual "shrinkage" methods for high-dimensional data.

This paper aims to provide an analysis of principal component regression in the setting where the feature vectors $x$ are noisy. The authors let $x = v + e$, where $e$ is some corruption of the nominal feature vector $v$, and $v = a u$ with $a \sim N(0,\eta^2 \gamma^2 d)$; the observations are $y = \theta/(\gamma \sqrt{d}) \langle v,u \rangle + \xi$. This formulation differs slightly from the standard one because the design vectors are noisy, which can make it harder to identify the linear relationship between $x$ and $y$. Projecting onto the top principal components of $x$ is therefore a standard way to regularize the estimation. The paper is relevant to the ML community. The key message of using a bias-corrected estimate of $y$ is interesting, but not necessarily new. Handling bias in regularized methods is a common problem (cf. Regularization and variable selection via the Elastic Net, Zou and Hastie, 2005). The authors present theoretical analysis to justify their results. I find the paper interesting; however, I am not sure whether the number of new results and the level of insight warrant acceptance.
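To make the setup concrete, here is a minimal simulation sketch of the one-factor model described above, with plain one-component PCR fitted to it. It is not the authors' code. Several details are assumptions made only for illustration: $u$ is taken to be a unit vector, the corruption $e$ is standard Gaussian, the label noise $\xi$ is Gaussian with standard deviation `sigma`, and the function names `simulate`, `pcr_fit`, and `pcr_predict` are hypothetical. The "expanded" estimator from the paper is not reproduced here.

```python
import numpy as np

def simulate(n, d, theta=1.0, eta=1.0, gamma=1.0, sigma=0.1, rng=None):
    """Draw n examples from x = v + e, v = a*u, y = theta/(gamma*sqrt(d)) <v,u> + xi."""
    rng = np.random.default_rng(rng)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                       # assumption: u is a unit vector
    # a ~ N(0, eta^2 gamma^2 d), i.e. standard deviation eta*gamma*sqrt(d)
    a = rng.normal(0.0, eta * gamma * np.sqrt(d), size=n)
    V = a[:, None] * u[None, :]                  # nominal features v_i = a_i u
    E = rng.standard_normal((n, d))              # assumption: e ~ N(0, I_d)
    X = V + E
    xi = sigma * rng.standard_normal(n)          # assumption: Gaussian label noise
    y = theta / (gamma * np.sqrt(d)) * (V @ u) + xi
    return X, y, u

def pcr_fit(X, y):
    """One-component PCR: project onto the top principal direction, then least squares."""
    # Top right singular vector of X (no centering, since the model is mean zero).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    u_hat = Vt[0]
    z = X @ u_hat                                # scores on the first component
    beta = (z @ y) / (z @ z)                     # scalar least-squares coefficient
    return u_hat, beta

def pcr_predict(X_new, u_hat, beta):
    """Predict labels for new examples with the fitted one-component PCR."""
    return beta * (X_new @ u_hat)

if __name__ == "__main__":
    X, y, u = simulate(n=5, d=2000, rng=0)
    u_hat, beta = pcr_fit(X, y)
    print("alignment |<u_hat, u>|:", abs(u_hat @ u))
    print("in-sample predictions:", pcr_predict(X, u_hat, beta))
```

With $n$ this small, the estimated direction `u_hat` generally does not align perfectly with $u$ even as $d$ grows, which gives some intuition for why the reviews describe a correction to plain PCR; the sketch only illustrates the data-generating model and does not verify the paper's consistency results.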