FeatureSelection.Rd
Performs feature selection for a conditional random forest model. Four approaches are available: non-recursive feature elimination (NRFE), recursive feature elimination (RFE), a permutation test approach with permuted response (Altmann et al., 2010) and a permutation test approach with permuted predictors (Hapfelmeier and Ulm, 2013).
FeatureSelection(Y, X, method = 'NRFE', ntree = 1000, measure = NULL,
nperm = 30, alpha = 0.05, distrib = 'approx',
parallel = FALSE, ...)
Y: response vector. Must be of class factor or numeric.
X: matrix or data frame containing the predictors.
method: method for feature selection. Should be 'NRFE' (non-recursive feature elimination, default), 'RFE' (recursive feature elimination), 'ALT' (permutation of the response) or 'HAPF' (permutation of the predictors).
ntree: number of trees contained in a forest.
measure: the name of the measure from the measures package that should be used for error and variable importance calculations.
nperm: number of permutations. Only used by the 'ALT' and 'HAPF' methods.
alpha: significance level for the permutation tests. Only used by the 'ALT' and 'HAPF' methods.
distrib: the null distribution of the variable importance can be approximated either by its asymptotic distribution ("asympt") or via Monte Carlo resampling ("approx", default). Only used by the 'ALT' and 'HAPF' methods.
parallel: logical indicating whether to run fastvarImp in parallel using a backend provided by the foreach package. Default is FALSE.
...: further arguments needed by the measure (such as the positive or negative class).
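The distrib, nperm and alpha arguments control a permutation test on each variable's importance. As a rough illustration of what a Monte Carlo p-value means, the sketch below compares an observed importance with importances recomputed under permutation. It is a generic sketch, not the package's internal code: the helper name mc_pvalue is hypothetical, and the (1 + count) / (nperm + 1) convention is one common choice that may differ from what FeatureSelection actually uses.

```r
# Hypothetical helper (not part of the package): Monte Carlo permutation p-value.
# observed:   importance of a variable computed on the original data
# null_draws: importances of the same variable recomputed after each of the
#             nperm permutations (of the response for 'ALT', of the predictor for 'HAPF')
mc_pvalue <- function(observed, null_draws) {
  # Count permuted importances at least as large as the observed one;
  # adding 1 to numerator and denominator keeps the p-value strictly positive
  (1 + sum(null_draws >= observed)) / (length(null_draws) + 1)
}

# Toy example with 4 permuted importances: none reaches 0.05, so p = 1/5
mc_pvalue(0.05, c(0.01, 0.02, -0.01, 0.03))
# returns 0.2
```

Under this convention, with the default nperm = 30 the smallest attainable p-value is 1/31 (about 0.032), which is below the default alpha = 0.05; each variable's p-value would then be compared with alpha.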
To be developed soon.
A list with the following elements:
variables selected with the 0 standard error rule
forest corresponding to the variables selected with the 0 standard error rule
OOB error of the forest selected with the 0 standard error rule
variables selected with the 1 standard error rule
forest corresponding to the variables selected with the 1 standard error rule
OOB error of the forest selected with the 1 standard error rule
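The 0 and 1 standard error rules choose between candidate variable subsets on the basis of their OOB errors. The sketch below shows the usual form of these rules: the 0 standard error rule keeps the subset with the lowest error, while the 1 standard error rule keeps the smallest subset whose error lies within one standard error of that minimum, which explains why it tends to retain fewer variables. It is an illustration under assumptions, not the package's implementation: the helper select_k, the nested top-k subsets and the error values are all hypothetical.

```r
# Hypothetical helper (not part of the package) illustrating the 0-SE and 1-SE rules.
# err: mean OOB error of forests using the top-k variables, k = 1..length(err)
# se:  standard error of each mean OOB error
select_k <- function(err, se) {
  k0 <- which.min(err)                # 0-SE rule: subset with the lowest error
  threshold <- err[k0] + se[k0]       # best error plus one standard error
  k1 <- min(which(err <= threshold))  # 1-SE rule: smallest subset within that bound
  c(k0 = k0, k1 = k1)
}

# Toy errors for subsets of 1 to 4 variables
err <- c(0.30, 0.12, 0.10, 0.11)
se  <- c(0.02, 0.02, 0.03, 0.03)
select_k(err, se)
# returns c(k0 = 3, k1 = 2)
```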
B. Gregorutti, B. Michel, and P. Saint Pierre. "Correlation and variable importance in random forests". arXiv:1310.5726, 2017.
A. Hapfelmeier and K. Ulm. "A new variable selection approach using random forests". Computational Statistics and Data Analysis, 60:50–69, 2013.
A. Altmann, L. Toloşi, O. Sander and T. Lengauer. "Permutation importance: a corrected feature importance measure". Bioinformatics, 26(10):1340–1347, 2010.
The code is adapted from Hapfelmeier and Ulm (2013).
Only regression and binary classification are supported.
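library(moreparty)
data(iris)
# Recode Species as a binary factor (versicolor vs. the other two species),
# since only regression and binary classification are supported
iris2 <- iris
iris2$Species <- factor(iris$Species == "versicolor")
featsel <- FeatureSelection(iris2$Species, iris2[, 1:4], measure = 'ACC', ntree = 200)
#> [1] "initial step"
#> [1] "final step"
# Variables selected with the 0 standard error rule
featsel$selection.0se
#> [1] "Petal.Length" "Petal.Width" "Sepal.Width"
# Variables selected with the 1 standard error rule
featsel$selection.1se
#> [1] "Petal.Length"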