## Introduction to elastic net

* Regularization and variable selection method.
* Produces a sparse representation.
* Exhibits a grouping effect: strongly correlated predictors tend to enter or leave the model together.
* Particularly useful when the number of predictors (*p*) >> the number of observations (*n*).
* The LARS-EN algorithm computes the elastic net regularization path.

## Lasso

* Least squares method with an L1 penalty on the regression coefficients.
* Performs continuous shrinkage and automatic variable selection.

### Limitations

* If *p >> n*, lasso can select at most *n* variables.
* Given a group of variables with high pairwise correlation, lasso tends to select only one variable from the group and does not care which one.
* If *n > p* and the predictors are highly correlated, ridge regression outperforms lasso in prediction.

## Naive elastic net

* Least squares method.
* The penalty on the regression coefficients is a convex combination of the lasso and ridge penalties.
* *penalty = (1−α)\*|β|<sub>1</sub> + α\*‖β‖<sup>2</sup>*, where *β* is the coefficient vector.
* *α = 0* => lasso penalty
* *α = 1* => ridge penalty
* Naive elastic net can be solved by transforming it into a lasso problem on augmented data.
* Can be viewed as ridge-type shrinkage followed by lasso-type thresholding.

### Limitations

* The two-stage procedure incurs a double amount of shrinkage, which introduces extra bias without a matching reduction in variance.

## Bridge Regression

* Generalization of lasso and ridge regression that uses an L<sub>q</sub> penalty (*q = 1* gives lasso, *q = 2* gives ridge).
* Cannot produce sparse solutions when *q > 1*.

## Elastic net

* Rescales the naive elastic net coefficients by *(1 + λ<sub>2</sub>)*, where *λ<sub>2</sub>* is the weight on the ridge penalty, to undo the extra shrinkage.
* Retains the good properties of the naive elastic net.

## Justification for scaling

* With an orthogonal design, the rescaled elastic net becomes minimax optimal.
* Scaling reverses the shrinkage introduced by the ridge penalty.

## LARS-EN

* Based on LARS (used to solve lasso).
* Elastic net can be transformed into lasso on augmented data, so pieces of the LARS algorithm can be reused.
* Uses sparseness to save on computation.

## Conclusion

Elastic net often outperforms lasso in prediction accuracy while keeping a similarly sparse representation.
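The reduction of naive elastic net to lasso on augmented data, and the rescaling that turns the naive solution into the elastic net solution, can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the LARS-EN algorithm: the lasso is solved with plain cyclic coordinate descent, the augmentation omits the extra normalization of the augmented design, and the function names, the penalty weights `lam1` (L1) and `lam2` (ridge), and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam1, n_iter=500):
    """Minimize ||y - X b||^2 + lam1 * ||b||_1 by cyclic coordinate descent.

    Assumes every column of X is nonzero.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)   # squared norm of each column
    r = y - X @ b                   # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]     # remove coordinate j from the fit
            rho = X[:, j] @ r
            b[j] = soft_threshold(rho, lam1 / 2.0) / col_sq[j]
            r -= X[:, j] * b[j]     # add the updated coordinate back
    return b


def naive_elastic_net(X, y, lam1, lam2):
    """Minimize ||y - X b||^2 + lam2 * ||b||^2 + lam1 * ||b||_1.

    Stacking sqrt(lam2) * I under X and zeros under y folds the ridge
    term into the squared error, leaving an ordinary lasso problem.
    """
    p = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    return lasso_cd(X_aug, y_aug, lam1)


def elastic_net(X, y, lam1, lam2):
    """Rescale the naive solution by (1 + lam2) to undo the ridge shrinkage.

    The rescaling assumes standardized predictors (centered, unit norm).
    """
    return (1.0 + lam2) * naive_elastic_net(X, y, lam1, lam2)


# Synthetic data with p > n and one highly correlated pair of predictors.
rng = np.random.default_rng(0)
n, p = 20, 40
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)
y = X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(n)
y -= y.mean()

b_naive = naive_elastic_net(X, y, lam1=0.1, lam2=1.0)
b_enet = elastic_net(X, y, lam1=0.1, lam2=1.0)
print("nonzero coefficients:", np.count_nonzero(b_enet))
print("coefficients of the correlated pair:", b_enet[:2])
```

Because the augmented design has *n + p* rows, the lasso step is no longer capped at *n* selected variables, and the two nearly identical columns would be expected to receive similar coefficients, which is the grouping effect described above.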
