Feature selection for conditional random forests (FeatureSelection, moreparty)

Performs feature selection for a conditional random forest model. Four approaches are available: non-recursive feature elimination (NRFE), recursive feature elimination (RFE), a permutation test approach with permuted response (Altmann et al., 2010), and a permutation test approach with permuted predictors (Hapfelmeier and Ulm, 2013).

FeatureSelection(Y, X, method = 'NRFE', ntree = 1000, measure = NULL,
                 nperm = 30, alpha = 0.05, distrib = 'approx',
                 parallel = FALSE, ...)

Arguments

Y: response vector. Must be of class factor or numeric
X: matrix or data frame containing the predictors
method: method for feature selection. Should be 'NRFE' (non-recursive feature elimination, default), 'RFE' (recursive feature elimination), 'ALT' (permutation of response) or 'HAPF' (permutation of predictors)
ntree: number of trees contained in a forest
measure: name of the measure, taken from the measures package, to use for error and variable importance calculations.
nperm: number of permutations. Only for 'ALT' and 'HAPF' methods.
alpha: alpha level for permutation tests. Only for 'ALT' and 'HAPF' methods.
distrib: the null distribution of the variable importance can be approximated by its asymptotic distribution ("asympt") or via Monte Carlo resampling ("approx", default). Only for 'ALT' and 'HAPF' methods.
parallel: Logical indicating whether or not to run fastvarImp in parallel using a backend provided by the foreach package. Default is FALSE.
...: Further arguments (like positive or negative class) that are needed by the measure.
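
The roles of nperm and alpha in the 'ALT' method can be illustrated with a minimal base R sketch of a permutation test with permuted response. This is only an illustration of the general idea: the toy data are hypothetical, and the importance function below is a simple stand-in (absolute correlation), not the forest-based variable importance that FeatureSelection computes.

```r
set.seed(1)

# Hypothetical toy data: x1 is related to y, x2 is pure noise
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 * x1 + rnorm(n)
X  <- data.frame(x1 = x1, x2 = x2)

# Stand-in importance measure (assumption: NOT the package's forest-based importance)
importance <- function(X, y) sapply(X, function(x) abs(cor(x, y)))

nperm <- 30    # number of permutations
alpha <- 0.05  # significance level

# Importance observed on the original data
observed <- importance(X, y)

# Null distribution: importance recomputed after permuting the response
# (result is a matrix with one row per predictor and one column per permutation)
null_imp <- replicate(nperm, importance(X, sample(y)))

# Empirical p-value per predictor, with the usual +1 correction
p_values <- (rowSums(null_imp >= observed) + 1) / (nperm + 1)

# Predictors whose importance is significant at level alpha
selected <- names(p_values)[p_values < alpha]
print(p_values)
print(selected)
```

With nperm = 30, the smallest attainable empirical p-value is 1/31, so a larger nperm is needed if a stricter alpha is used.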

Details

To be developed soon.

Value

A list with the following elements:

selection.0se: variables selected with the 0 standard error rule
forest.0se: forest corresponding to the variables selected with the 0 standard error rule
oob.error.0se: out-of-bag (OOB) error of the forest selected with the 0 standard error rule
selection.1se: variables selected with the 1 standard error rule
forest.1se: forest corresponding to the variables selected with the 1 standard error rule
oob.error.1se: out-of-bag (OOB) error of the forest selected with the 1 standard error rule
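
The 0 and 1 standard error rules named above can be sketched in base R. This follows one common reading of these rules (the 0 SE rule keeps the model with the lowest error, the 1 SE rule keeps the smallest model whose error is within one standard error of that minimum). The errors, standard errors, and variable names are hypothetical, and the package's own computation may differ.

```r
# Hypothetical OOB errors and standard errors for nested models
# containing 1 to 5 predictors (illustrative values only)
n_vars  <- 1:5
oob_err <- c(0.120, 0.085, 0.080, 0.082, 0.090)
oob_se  <- c(0.012, 0.010, 0.010, 0.010, 0.011)

# 0 SE rule: keep the model with the minimal OOB error
best     <- which.min(oob_err)
size_0se <- n_vars[best]

# 1 SE rule: keep the smallest model whose error does not exceed
# the minimal error plus one standard error
threshold <- oob_err[best] + oob_se[best]
size_1se  <- min(n_vars[oob_err <= threshold])

print(c(size_0se = size_0se, size_1se = size_1se))
```

Under this reading, the 1 SE rule never selects more variables than the 0 SE rule, which matches the iris example in the Examples section, where selection.1se is a subset of selection.0se.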

References

B. Gregorutti, B. Michel, and P. Saint Pierre. "Correlation and variable importance in random forests". arXiv:1310.5726, 2017.

A. Hapfelmeier and K. Ulm. "A new variable selection approach using random forests". Computational Statistics and Data Analysis, 60:50–69, 2013.

A. Altmann, L. Toloşi, O. Sander, and T. Lengauer. "Permutation importance: a corrected feature importance measure". Bioinformatics, 26(10):1340–1347, 2010.

Author

Nicolas Robette

Note

The code is adapted from Hapfelmeier & Ulm (2013).

Only works for regression and binary classification.

Examples

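  library(moreparty)
  data(iris)
  # Recode Species as a binary factor (versicolor vs. other species)
  iris2 <- iris
  iris2$Species <- factor(iris$Species == "versicolor")
  # Default method ('NRFE'), with accuracy as the measure and 200 trees
  featsel <- FeatureSelection(iris2$Species, iris2[, 1:4], measure = 'ACC', ntree = 200)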
#> [1] "initial step"
#> [1] "final step"
  featsel$selection.0se
#> [1] "Petal.Length" "Petal.Width"  "Sepal.Width" 
  featsel$selection.1se
#> [1] "Petal.Length"