# Regularization and Variable Selection via the Elastic Net

Zou, H. and Hastie, T., 2003

Paper summary by shagunsodhani

## Introduction to elastic net
* Regularization and variable selection method.
* Produces sparse representations.
* Exhibits a grouping effect: strongly correlated predictors tend to enter or leave the model together.
* Particularly useful when the number of predictors (*p*) is much larger than the number of observations (*n*).
* The LARS-EN algorithm computes the entire elastic net regularization path.
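A minimal sketch of the method in practice, using scikit-learn's `ElasticNet` on synthetic data with *p* >> *n* and heavily correlated predictors (data and parameter values are illustrative only; note that scikit-learn's `l1_ratio` convention runs opposite to the *α* used below, with `l1_ratio=1` being pure lasso):

```python
# Illustrative sketch: elastic net on a p >> n problem with groups of
# highly correlated predictors (synthetic data, arbitrary parameters).
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n, p = 50, 200                      # many more predictors than observations
z = rng.normal(size=(n, 5))
# 40 noisy copies of 5 latent variables -> strongly correlated columns.
X = np.hstack([z + 0.1 * rng.normal(size=(n, 5)) for _ in range(40)])
y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=n)

# l1_ratio blends the two penalties (1.0 = lasso, 0.0 = ridge).
model = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10_000)
model.fit(X, y)

print(np.count_nonzero(model.coef_))  # number of selected variables
```

Unlike the lasso, which could pick at most *n* = 50 variables here, the elastic net is free to keep whole groups of correlated columns.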
## Lasso
* Least squares method with an L1 penalty on the regression coefficients.
* Performs continuous shrinkage and automatic variable selection simultaneously.
### Limitations
* If *p >> n*, lasso can select at most *n* variables.
* Given a group of variables with high pairwise correlation, the lasso tends to select only one variable from the group, and it does not care which one.
* If *n > p* and the predictors are highly correlated, ridge regression empirically outperforms the lasso in prediction.
## Naive elastic net
* Least squares method.
* The penalty on the regression coefficients is a convex combination of the lasso and ridge penalties.
* *penalty = (1−α)\*|β|<sub>1</sub> + α\*|β|<sup>2</sup>* where *β* is the coefficient vector.
* *α = 0* => lasso penalty
* *α = 1* => ridge penalty
* The naive elastic net can be solved by transforming it into a lasso problem on augmented data.
* Can be viewed as ridge-type shrinkage followed by lasso-type thresholding.
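The augmented-data reduction can be checked numerically. The sketch below uses the paper's equivalent (λ<sub>1</sub>, λ<sub>2</sub>) parameterization of the two penalty terms, with scikit-learn's `Lasso` and `ElasticNet` serving only as solvers (all constants below are illustrative):

```python
# Sketch: the naive elastic net equals a lasso on augmented data.
# Augmentation: X* = (1+lam2)^(-1/2) [X; sqrt(lam2) I], y* = [y; 0].
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
n, p = 20, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)
lam1, lam2 = 1.0, 0.5               # L1 and L2 penalty weights

s = 1.0 / np.sqrt(1.0 + lam2)
X_aug = s * np.vstack([X, np.sqrt(lam2) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])
n_aug = n + p

# scikit-learn's Lasso minimizes (1/2n)||y - Xb||^2 + alpha|b|_1,
# so the lasso penalty lam1/sqrt(1+lam2) is divided by 2*n_aug.
lasso = Lasso(alpha=lam1 * s / (2 * n_aug), fit_intercept=False,
              tol=1e-12, max_iter=200_000)
lasso.fit(X_aug, y_aug)
beta_naive = s * lasso.coef_        # undo the change of variables

# Direct fit of the naive objective ||y - Xb||^2 + lam1|b|_1 + lam2||b||^2,
# mapped onto ElasticNet's (1/2n)||.||^2 + a*r|b|_1 + a*(1-r)/2*||b||^2.
alpha = lam1 / (2 * n) + lam2 / n
l1_ratio = (lam1 / (2 * n)) / alpha
enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False,
                  tol=1e-12, max_iter=200_000)
enet.fit(X, y)

print(np.max(np.abs(beta_naive - enet.coef_)))  # should be ~0
```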
### Limitations
* The two-stage procedure incurs a double amount of shrinkage, which introduces extra bias without reducing variance.
## Bridge Regression
* Generalization of lasso and ridge regression.
* Does not produce sparse solutions when the penalty exponent exceeds 1.
## Elastic net
* Rescales the naive elastic net coefficients by (1 + λ<sub>2</sub>), where λ<sub>2</sub> is the ridge penalty weight, to undo the double shrinkage.
* Retains good properties of the naive elastic net.
## Justification for scaling
* Elastic net becomes minimax optimal.
* Scaling reverses the extra shrinkage introduced by the ridge component while keeping the lasso-type variable selection.
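The rescaling step itself is a one-liner; a tiny sketch (the function name and example values are mine, not the paper's):

```python
# Hypothetical helper: elastic net coefficients from naive elastic net
# coefficients. The (1 + lam2) factor undoes the double shrinkage.
import numpy as np

def rescale_naive_coefficients(beta_naive, lam2):
    """Rescale naive elastic net coefficients by (1 + lam2)."""
    return (1.0 + lam2) * np.asarray(beta_naive)

print(rescale_naive_coefficients([0.4, -0.2, 0.0], 0.5))
```

With λ<sub>2</sub> = 0.5, each nonzero coefficient grows by 50%, while zeros (the selected sparsity pattern) stay zero.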
## LARS-EN
* Based on LARS (the algorithm used to compute the lasso solution path).
* Since the elastic net can be transformed into a lasso on augmented data, LARS-EN reuses the machinery of the LARS algorithm.
* Exploits the sparse structure of the augmented data matrix to save computation.
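For intuition, scikit-learn's `lars_path` computes the plain lasso path with LARS; LARS-EN runs the same kind of stepwise algorithm on the elastic net's augmented data (not shown here — this sketch only illustrates the path object that LARS produces, on throwaway synthetic data):

```python
# Sketch: a lasso regularization path computed by LARS.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=40)

# alphas: penalty values where the active set changes;
# coefs[:, k]: the coefficient vector at step k of the path.
alphas, active, coefs = lars_path(X, y, method="lasso")
print(coefs.shape)  # (n_features, number of path steps)
```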
## Conclusion
The elastic net performs as well as, and often better than, the lasso, particularly when predictors are correlated or *p* >> *n*.