This paper studies a linear latent factor model, where one observes "examples" consisting of high-dimensional vectors $x_1, x_2, \ldots \in R^d$, and one wants to predict "labels" consisting of scalars $y_1, y_2, \ldots \in R$. Crucially, one is working in the "one-shot learning" regime, where the number of training examples $n$ is small (say, $n=2$ or $n=10$), while the dimension $d$ is large (say, $d \rightarrow \infty$). This paper considers a well-known method, principal component regression (PCR), and proves some rather surprising theoretical results: PCR is inconsistent, but a modified PCR estimator is weakly consistent; the modified estimator is obtained by "expanding" the PCR estimator, which is different from the usual "shrinkage" methods for high-dimensional data.
This paper aims to provide an analysis of principal component
regression in the setting where the feature vectors $x$ are noisy. The authors
let $x = v + e$ where $e$ is some corruption of the nominal feature
vector $v$; and $v = a u$ where $a \sim N(0,\eta^2 \gamma^2 d)$ while
the observations $y = \theta/(\gamma \sqrt{d}) \langle v,u \rangle + \xi$. This
formulation is slightly different from the standard one because the
design vectors are noisy, which can pose challenges in identifying the
linear relationship between $x$ and $y$. Thus, using the top principal
components of $x$ is a standard way to regularize the
estimation. The paper is relevant to the ML
community. The key message of using a bias-corrected estimate of $y$
is interesting, but not necessarily new. Handling bias in regularized
methods is a common problem (cf. Regularization and variable selection
via the Elastic Net, Zou and Hastie, 2005). The authors present
theoretical analysis to justify their results. I find the paper
interesting; however, I am not sure whether the number of new results and
the level of insight warrant acceptance.
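
To make the setup summarized above concrete, here is a minimal simulation sketch of the model $x = v + e$, $v = a u$, $y = \theta/(\gamma \sqrt{d}) \langle v,u \rangle + \xi$, followed by plain PCR on the top principal component. Several details are my assumptions rather than the paper's: $u$ is taken to be a unit vector, $e$ and $\xi$ are taken to be Gaussian with hypothetical scales `tau` and `sigma`, and the function names are illustrative. The sketch covers only uncorrected PCR; the bias-corrected ("expanded") estimator is not reproduced because its form is not stated here.

```python
import numpy as np

def simulate(n, d, theta=1.0, eta=1.0, gamma=1.0, tau=1.0, sigma=0.1,
             rng=None, u=None):
    """Draw n samples from the latent factor model described in the review.

    Assumptions (not from the paper): u is a unit vector, e ~ N(0, tau^2 I),
    xi ~ N(0, sigma^2).
    """
    rng = np.random.default_rng(rng)
    if u is None:
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
    # a ~ N(0, eta^2 gamma^2 d), so its standard deviation is eta*gamma*sqrt(d)
    a = rng.normal(0.0, eta * gamma * np.sqrt(d), size=n)
    V = np.outer(a, u)                              # rows are v_i = a_i u
    X = V + tau * rng.standard_normal((n, d))       # noisy features x_i = v_i + e_i
    y = theta / (gamma * np.sqrt(d)) * (V @ u) + sigma * rng.standard_normal(n)
    return X, y, u

def pcr_fit(X, y):
    """Plain PCR with one component: regress y on the top principal score."""
    # Top right singular vector of X estimates the direction u (up to sign).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    u_hat = Vt[0]
    scores = X @ u_hat
    beta = (scores @ y) / (scores @ scores)         # least squares, no intercept
    return u_hat, beta

def pcr_predict(X_new, u_hat, beta):
    """Predict labels for new feature vectors; invariant to the sign of u_hat."""
    return beta * (X_new @ u_hat)
```

Calling `simulate` with small `n` and growing `d`, and comparing `pcr_predict` against held-out labels, is one way to see empirically how the uncorrected estimator behaves in the regime the paper analyzes.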
