The Randomized Dependence Coefficient
López-Paz, David and Hennig, Philipp and Schölkopf, Bernhard, 2013
Paper summary (nipsreviews)
The authors propose a nonlinear measure of dependence between two random variables. It is the largest canonical correlation between random nonlinear projections of the variables, computed after a copula transformation that makes the measure invariant to linear transformations of the marginals.
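In symbols, the measure described above can be written roughly as follows. The notation is assumed here for illustration and is not taken from the review: $P$ denotes the copula transformation, $\Phi$ the random nonlinear projection, $\rho$ Pearson's correlation, and $\alpha$, $\beta$ the projection weights found by CCA.

$$
\operatorname{rdc}(X, Y) = \sup_{\alpha, \beta} \, \rho\big(\alpha^\top \Phi(P(X)),\; \beta^\top \Phi(P(Y))\big)
$$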
The paper introduces a new method, the Randomized Dependence Coefficient (RDC), to measure statistical dependence between random variables. It combines a copula transform with a variant of kernel CCA based on random projections, resulting in $O(n \log n)$ complexity. Experiments on synthetic and real benchmark data show promising results for feature selection.
The RDC is a nonlinear dependence estimator that satisfies Rényi's criteria and exploits the very recent Fastfood speedup trick (ICML13) \cite{journals/corr/LeSS14}. The recipe is straightforward: 1) copularize the data, which preserves the dependence structure while discarding the marginals, 2) sample k nonlinear random features of each datum (inspired by Bochner's theorem) and 3) solve the regular CCA eigenvalue problem on the resulting paired datasets. Ultimately, RDC feels like a copularized variant of kCCA (misleading as this may sound). Its efficiency is demonstrated on a set of classical nonlinear bivariate dependence scenarios and, via a forward feature selection procedure, on 12 real datasets.
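The three steps of the recipe can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the defaults `k=20` and `s=1/6`, the sine features with Gaussian weights and uniform phases, and the rank tolerance are all choices made here. It does not use the Fastfood trick, and step 3 is computed with an SVD-based CCA (orthonormalize each feature matrix, then take the largest singular value of the cross-product) in place of the explicit eigenvalue problem, because that form is numerically more forgiving when the random features are nearly collinear.

```python
import numpy as np


def copula_transform(x):
    """Step 1: replace each column by its normalized ranks in (0, 1]."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # Double argsort gives ordinal ranks; ties are broken arbitrarily.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / x.shape[0]


def random_features(u, k, s, rng):
    """Step 2: k random sine features (assumed form: sin(u w + b))."""
    w = rng.normal(0.0, s, size=(u.shape[1], k))  # random Gaussian weights
    b = rng.uniform(-np.pi, np.pi, size=k)        # random phases
    return np.sin(u @ w + b)


def orthonormal_basis(f, tol=1e-10):
    """Center f and return an orthonormal basis of its column space."""
    f = f - f.mean(axis=0)
    u, sv, _ = np.linalg.svd(f, full_matrices=False)
    # Drop directions that are numerically indistinguishable from zero.
    return u[:, sv > tol * sv[0]]


def rdc(x, y, k=20, s=1 / 6, seed=0):
    """Step 3: largest canonical correlation between the two feature sets."""
    rng = np.random.default_rng(seed)
    fx = random_features(copula_transform(x), k, s, rng)
    fy = random_features(copula_transform(y), k, s, rng)
    qx, qy = orthonormal_basis(fx), orthonormal_basis(fy)
    # Singular values of qx^T qy are the canonical correlations.
    corr = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(min(corr[0], 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=2000)
    print("quadratic dependence:", rdc(x, x ** 2))
    print("independent samples: ", rdc(x, rng.normal(size=2000)))
```

A linear correlation coefficient would report almost no association for the quadratic pair, whereas this sketch should score it far higher than the independent pair. The result depends on the random seed, so averaging over several seeds is a reasonable way to reduce that variance.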