| Title: | Feature Selection for 'mlr3' |
|---|---|
| Description: | Feature selection package of the 'mlr3' ecosystem. It selects the optimal feature set for any 'mlr3' learner. The package works with several optimization algorithms e.g. Random Search, Recursive Feature Elimination, and Genetic Search. Moreover, it can automatically optimize learners and estimate the performance of optimized feature sets with nested resampling. |
| Authors: | Marc Becker [aut, cre] (ORCID: <https://orcid.org/0000-0002-8115-0400>), Patrick Schratz [aut] (ORCID: <https://orcid.org/0000-0003-0748-6624>), Michel Lang [aut] (ORCID: <https://orcid.org/0000-0001-9754-0393>), Bernd Bischl [aut] (ORCID: <https://orcid.org/0000-0001-6002-6980>), John Zobolas [aut] (ORCID: <https://orcid.org/0000-0002-3609-8674>) |
| Maintainer: | Marc Becker <[email protected]> |
| License: | LGPL-3 |
| Version: | 1.5.1 |
| Built: | 2026-05-08 06:28:35 UTC |
| Source: | https://github.com/mlr-org/mlr3fselect |
Useful links:
Report bugs at https://github.com/mlr-org/mlr3fselect/issues
The ArchiveAsyncFSelect stores all evaluated feature subsets and performance scores in a rush::Rush database.
The ArchiveAsyncFSelect is a connector to a rush::Rush database.
The table ($data) has the following columns:
One column for each feature of the search space ($search_space).
One column for each performance measure ($codomain).
runtime_learners (numeric(1))
Sum of training and predict times logged in learners per mlr3::ResampleResult / evaluation.
This does not include potential overhead time.
timestamp (POSIXct)
Time stamp when the evaluation was logged into the archive.
For analyzing the feature selection results, it is recommended to pass the ArchiveAsyncFSelect to as.data.table().
The returned data table contains the mlr3::ResampleResult for each feature subset evaluation.
as.data.table.ArchiveFSelect(x, unnest = "x_domain", exclude_columns = "uhash", measures = NULL)
Returns a tabular view of all evaluated feature subsets.
ArchiveAsyncFSelect -> data.table::data.table()
unnest (character())
Transforms list columns to separate columns. Set to NULL if no column should be unnested.
exclude_columns (character())
Exclude columns from table. Set to NULL if no column should be excluded.
measures (List of mlr3::Measure)
Score feature subsets on additional measures.
bbotk::Archive -> bbotk::ArchiveAsync -> ArchiveAsyncFSelect
benchmark_result(mlr3::BenchmarkResult)
Benchmark result.
ties_method(character(1))
Method to handle ties in the archive.
One of "least_features" (default) or "random".
new()
Creates a new instance of this R6 class.
ArchiveAsyncFSelect$new( search_space, codomain, rush, ties_method = "least_features" )
search_space(paradox::ParamSet)
Search space.
Internally created from provided mlr3::Task by instance.
codomain(paradox::ParamSet)
Specifies codomain of function.
Most importantly, the tags of each output "Parameter" define whether it should be minimized or maximized. The default is to minimize each component.
rush(Rush)
If a rush instance is supplied, the optimization runs without batches.
ties_method(character(1))
The method to break ties when selecting sets while optimizing and when selecting the best set.
Can be "least_features" or "random".
The option "least_features" (default) selects the feature set with the least features.
If there are multiple best feature sets with the same number of features, one is selected randomly.
The random method returns a random feature set from the best feature sets.
Ignored if multiple measures are used.
check_values(logical(1))
If TRUE (default), feature subsets are checked for validity.
learner()
Retrieve mlr3::Learner of the i-th evaluation, by position or by unique hash uhash.
i and uhash are mutually exclusive.
Learner does not contain a model. Use $learners() to get learners with models.
ArchiveAsyncFSelect$learner(i = NULL, uhash = NULL)
i(integer(1))
The iteration value to filter for.
uhash(character(1))
The uhash value to filter for.
learners()
Retrieve list of trained mlr3::Learner objects of the i-th evaluation, by position or by unique hash uhash.
i and uhash are mutually exclusive.
ArchiveAsyncFSelect$learners(i = NULL, uhash = NULL)
i(integer(1))
The iteration value to filter for.
uhash(character(1))
The uhash value to filter for.
predictions()
Retrieve list of mlr3::Prediction objects of the i-th evaluation, by position or by unique hash uhash.
i and uhash are mutually exclusive.
ArchiveAsyncFSelect$predictions(i = NULL, uhash = NULL)
i(integer(1))
The iteration value to filter for.
uhash(character(1))
The uhash value to filter for.
resample_result()
Retrieve mlr3::ResampleResult of the i-th evaluation, by position or by unique hash uhash.
i and uhash are mutually exclusive.
ArchiveAsyncFSelect$resample_result(i = NULL, uhash = NULL)
i(integer(1))
The iteration value to filter for.
uhash(character(1))
The uhash value to filter for.
print()
Printer.
ArchiveAsyncFSelect$print()
...(ignored).
best()
Returns the best scoring feature set(s). For single-crit optimization, the solution that minimizes / maximizes the objective function. For multi-crit optimization, the Pareto set / front.
ArchiveAsyncFSelect$best(n_select = 1, ties_method = "least_features")
n_select(integer(1L))
Amount of points to select.
Ignored for multi-crit optimization.
ties_method(character(1L))
Method to break ties when multiple points have the same score.
Either "least_features" (default) or "random".
Ignored for multi-crit optimization.
If n_select > 1L, the tie method is ignored and the first point is returned.
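The tie-breaking rule described for ties_method can be illustrated with a short base R sketch. It assumes a single measure that is minimized and is not the package's internal implementation; the helper break_ties() and its arguments are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper illustrating the documented tie-breaking rule.
# Assumption: one performance measure that should be minimized.
break_ties = function(scores, n_features, ties_method = "least_features") {
  stopifnot(length(scores) == length(n_features))
  ties_method = match.arg(ties_method, c("least_features", "random"))

  # Candidates: all feature sets that share the best (lowest) score
  best = which(scores == min(scores))

  # "least_features": keep only the candidates with the fewest features
  if (ties_method == "least_features") {
    best = best[n_features[best] == min(n_features[best])]
  }

  # Any remaining ties are broken at random
  if (length(best) > 1L) {
    best = best[sample.int(length(best), 1L)]
  }
  best
}

scores = c(0.20, 0.10, 0.10, 0.30)
n_features = c(3, 4, 2, 1)

# Sets 2 and 3 tie on the score; set 3 uses fewer features
break_ties(scores, n_features)
```

With ties_method = "random", the same call would return either index 2 or index 3.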
push_result()
Push result to the archive.
ArchiveAsyncFSelect$push_result(key, ys, x_domain, extra = NULL)
key(character())
Key of the point.
ys(list())
Named list of results.
x_domain(list())
Is ignored for feature selection.
extra(list())
Named list of additional information.
clone()
The objects of this class are cloneable with this method.
ArchiveAsyncFSelect$clone(deep = FALSE)
deep
Whether to make a deep clone.
Freezes the Redis database of an ArchiveAsyncFSelect to a data.table::data.table().
No further points can be added to the archive, but the data can still be accessed and analyzed.
Useful when the Redis database is not permanently available.
Use the callback mlr3fselect.async_freeze_archive to freeze the archive after the optimization has finished.
as.data.table(archive)
ArchiveAsyncFSelectFrozen -> data.table::data.table()
Returns a tabular view of all performed function calls of the Objective.
bbotk::Archive -> bbotk::ArchiveAsync -> bbotk::ArchiveAsyncFrozen -> ArchiveAsyncFSelectFrozen
benchmark_result(mlr3::BenchmarkResult)
Benchmark result.
bbotk::Archive$format()
bbotk::Archive$help()
bbotk::ArchiveAsync$best()
bbotk::ArchiveAsync$nds_selection()
bbotk::ArchiveAsyncFrozen$clear()
bbotk::ArchiveAsyncFrozen$data_with_state()
bbotk::ArchiveAsyncFrozen$pop_point()
bbotk::ArchiveAsyncFrozen$push_failed_point()
bbotk::ArchiveAsyncFrozen$push_points()
bbotk::ArchiveAsyncFrozen$push_result()
bbotk::ArchiveAsyncFrozen$push_running_point()
new()
Creates a new instance of this R6 class.
ArchiveAsyncFSelectFrozen$new(archive)
archive(ArchiveAsyncFSelect)
The archive to freeze.
learner()
Retrieve mlr3::Learner of the i-th evaluation, by position or by unique hash uhash.
i and uhash are mutually exclusive.
Learner does not contain a model. Use $learners() to get learners with models.
ArchiveAsyncFSelectFrozen$learner(i = NULL, uhash = NULL)
i(integer(1))
The iteration value to filter for.
uhash(character(1))
The uhash value to filter for.
learners()
Retrieve list of trained mlr3::Learner objects of the i-th evaluation, by position or by unique hash uhash.
i and uhash are mutually exclusive.
ArchiveAsyncFSelectFrozen$learners(i = NULL, uhash = NULL)
i(integer(1))
The iteration value to filter for.
uhash(character(1))
The uhash value to filter for.
predictions()
Retrieve list of mlr3::Prediction objects of the i-th evaluation, by position or by unique hash uhash.
i and uhash are mutually exclusive.
ArchiveAsyncFSelectFrozen$predictions(i = NULL, uhash = NULL)
i(integer(1))
The iteration value to filter for.
uhash(character(1))
The uhash value to filter for.
resample_result()
Retrieve mlr3::ResampleResult of the i-th evaluation, by position or by unique hash uhash.
i and uhash are mutually exclusive.
ArchiveAsyncFSelectFrozen$resample_result(i = NULL, uhash = NULL)
i(integer(1))
The iteration value to filter for.
uhash(character(1))
The uhash value to filter for.
print()
Printer.
ArchiveAsyncFSelectFrozen$print()
...(ignored).
clone()
The objects of this class are cloneable with this method.
ArchiveAsyncFSelectFrozen$clone(deep = FALSE)
deep
Whether to make a deep clone.
The ArchiveBatchFSelect stores all evaluated feature sets and performance scores.
The ArchiveBatchFSelect is a container around a data.table::data.table().
Each row corresponds to a single evaluation of a feature set.
See the section on Data Structure for more information.
The archive additionally stores a mlr3::BenchmarkResult ($benchmark_result) that records the resampling experiments.
Each experiment corresponds to a single evaluation of a feature set.
The table ($data) and the benchmark result ($benchmark_result) are linked by the uhash column.
If the archive is passed to as.data.table(), both are joined automatically.
The table ($data) has the following columns:
One column for each feature of the task ($search_space).
One column for each performance measure ($codomain).
runtime_learners (numeric(1))
Sum of training and predict times logged in learners per mlr3::ResampleResult / evaluation.
This does not include potential overhead time.
timestamp (POSIXct)
Time stamp when the evaluation was logged into the archive.
batch_nr (integer(1))
Feature sets are evaluated in batches. Each batch has a unique batch number.
uhash (character(1))
Connects each feature set to the resampling experiment stored in the mlr3::BenchmarkResult.
For analyzing the feature selection results, it is recommended to pass the archive to as.data.table().
The returned data table is joined with the benchmark result which adds the mlr3::ResampleResult for each feature set.
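As a minimal sketch of this workflow, the following runs a small batch feature selection and converts the archive to a table. The task, learner, batch size and budget are arbitrary choices made for illustration, not recommendations.

```r
library(mlr3)
library(mlr3fselect)
library(data.table)

set.seed(1)

# Small illustrative run: 2 batches of 2 randomly drawn feature sets
instance = fselect(
  fselector = fs("random_search", batch_size = 2),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 4
)

# One row per evaluated feature set, joined with the benchmark result
archive_dt = as.data.table(instance$archive)

# Feature columns, the measure, and bookkeeping columns such as batch_nr
names(archive_dt)
```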
The archive provides various getters (e.g. $learners()) to ease the access.
All getters extract by position (i) or unique hash (uhash).
For a complete list of all getters see the methods section.
The benchmark result ($benchmark_result) allows scoring the feature sets again on a different measure.
Alternatively, measures can be supplied to as.data.table().
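The getters and the rescoring option can be sketched as follows. The setup is an assumed toy run (task, learner and budget chosen only for illustration); classif.acc is used as the additional measure because it is the complement of the optimized classif.ce.

```r
library(mlr3)
library(mlr3fselect)
library(data.table)

set.seed(1)

instance = fselect(
  fselector = fs("random_search", batch_size = 2),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 4
)

# Getters extract by position (i) or by unique hash (uhash)
rr = instance$archive$resample_result(i = 1)
preds = instance$archive$predictions(i = 1)

# Score all evaluated feature sets on an additional measure
rescored = as.data.table(instance$archive, measures = msrs("classif.acc"))
rescored[, c("classif.ce", "classif.acc")]
```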
as.data.table.ArchiveBatchFSelect(x, exclude_columns = "uhash", measures = NULL)
Returns a tabular view of all evaluated feature sets.
ArchiveBatchFSelect -> data.table::data.table()
exclude_columns (character())
Exclude columns from table. Set to NULL if no column should be excluded.
measures (list of mlr3::Measure)
Score feature sets on additional measures.
bbotk::Archive -> bbotk::ArchiveBatch -> ArchiveBatchFSelect
benchmark_result(mlr3::BenchmarkResult)
Benchmark result.
ties_method(character(1))
Method to handle ties.
new()
Creates a new instance of this R6 class.
ArchiveBatchFSelect$new( search_space, codomain, check_values = TRUE, ties_method = "least_features" )
search_space(paradox::ParamSet)
Search space.
Internally created from provided mlr3::Task by instance.
codomain(bbotk::Codomain)
Specifies codomain of objective function i.e. a set of performance measures.
Internally created from provided mlr3::Measures by instance.
check_values(logical(1))
If TRUE (default), feature sets are checked for validity.
ties_method(character(1))
The method to break ties when selecting sets while optimizing and when selecting the best set.
Can be "least_features" or "random".
The option "least_features" (default) selects the feature set with the least features.
If there are multiple best feature sets with the same number of features, one is selected randomly.
The random method returns a random feature set from the best feature sets.
Ignored if multiple measures are used.
add_evals()
Adds function evaluations to the archive table.
ArchiveBatchFSelect$add_evals(xdt, xss_trafoed = NULL, ydt)
xdt(data.table::data.table())
x values as data.table. Each row is one point. Contains the value in the search space of the FSelectInstanceBatchMultiCrit object. Can contain additional columns for extra information.
xss_trafoed(list())
Ignored in feature selection.
ydt(data.table::data.table())
Optimal outcome.
learner()
Retrieve mlr3::Learner of the i-th evaluation, by position or by unique hash uhash.
i and uhash are mutually exclusive.
Learner does not contain a model. Use $learners() to get learners with models.
ArchiveBatchFSelect$learner(i = NULL, uhash = NULL)
i(integer(1))
The iteration value to filter for.
uhash(character(1))
The uhash value to filter for.
learners()
Retrieve list of trained mlr3::Learner objects of the i-th evaluation, by position or by unique hash uhash.
i and uhash are mutually exclusive.
ArchiveBatchFSelect$learners(i = NULL, uhash = NULL)
i(integer(1))
The iteration value to filter for.
uhash(character(1))
The uhash value to filter for.
predictions()
Retrieve list of mlr3::Prediction objects of the i-th evaluation, by position or by unique hash uhash.
i and uhash are mutually exclusive.
ArchiveBatchFSelect$predictions(i = NULL, uhash = NULL)
i(integer(1))
The iteration value to filter for.
uhash(character(1))
The uhash value to filter for.
resample_result()
Retrieve mlr3::ResampleResult of the i-th evaluation, by position or by unique hash uhash.
i and uhash are mutually exclusive.
ArchiveBatchFSelect$resample_result(i = NULL, uhash = NULL)
i(integer(1))
The iteration value to filter for.
uhash(character(1))
The uhash value to filter for.
print()
Printer.
ArchiveBatchFSelect$print()
...(ignored).
best()
Returns the best scoring feature sets.
ArchiveBatchFSelect$best(batch = NULL, ties_method = NULL)
batch(integer())
The batch number(s) to limit the best results to.
Default is all batches.
ties_method(character(1))
Method to handle ties.
If NULL (default), the global ties method set during initialization is used.
The default global ties method is least_features which selects the feature set with the least features.
If there are multiple best feature sets with the same number of features, one is selected randomly.
The random method returns a random feature set from the best feature sets.
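A short sketch of $best() on a batch archive, assuming a toy run with a single minimized measure. The task, learner and budget are illustrative choices only.

```r
library(mlr3)
library(mlr3fselect)

set.seed(1)

instance = fselect(
  fselector = fs("random_search", batch_size = 2),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 4
)

archive = instance$archive

# Best feature set over all batches, using the global ties method
best_all = archive$best()

# Same, but break ties at random for this call only
best_random = archive$best(ties_method = "random")

# Restrict the search to the first batch
best_batch1 = archive$best(batch = 1)
```

Because the measure is minimized here, the score of best_all equals the lowest classif.ce stored in archive$data.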
clone()
The objects of this class are cloneable with this method.
ArchiveBatchFSelect$clone(deep = FALSE)
deep
Whether to make a deep clone.
Assertions for CallbackAsyncFSelect class.
assert_async_fselect_callback(callback, null_ok = FALSE)
assert_async_fselect_callbacks(callbacks)
callback (CallbackAsyncFSelect).
null_ok (logical(1))
callbacks (list of CallbackAsyncFSelect).
CallbackAsyncFSelect | List of CallbackAsyncFSelects.
The AutoFSelector wraps a mlr3::Learner and augments it with an automatic feature selection.
The auto_fselector() function creates an AutoFSelector object.
auto_fselector( fselector, learner, resampling, measure = NULL, term_evals = NULL, term_time = NULL, terminator = NULL, store_fselect_instance = TRUE, store_benchmark_result = TRUE, store_models = FALSE, check_values = FALSE, callbacks = NULL, ties_method = "least_features", rush = NULL, id = NULL )
fselector (FSelector)
learner (mlr3::Learner)
resampling (mlr3::Resampling)
measure (mlr3::Measure)
term_evals (integer(1))
term_time (integer(1))
terminator (bbotk::Terminator)
store_fselect_instance (logical(1))
store_benchmark_result (logical(1))
store_models (logical(1))
check_values (logical(1))
callbacks (list of CallbackBatchFSelect)
ties_method (character(1))
rush (Rush)
id (character(1))
The AutoFSelector is a mlr3::Learner which wraps another mlr3::Learner and performs the following steps during $train():
The wrapped (inner) learner is trained on the feature subsets via resampling. The feature selection can be specified by providing a FSelector, a bbotk::Terminator, a mlr3::Resampling and a mlr3::Measure.
A final model is fit on the complete training data with the best-found feature subset.
During $predict() the AutoFSelector just calls the predict method of the wrapped (inner) learner.
There are several sections about feature selection in the mlr3book.
Estimate Model Performance with nested resampling.
The gallery features a collection of case studies and demos about optimization.
Nested resampling can be performed by passing an AutoFSelector object to mlr3::resample() or mlr3::benchmark().
To access the inner resampling results, set store_fselect_instance = TRUE and execute mlr3::resample() or mlr3::benchmark() with store_models = TRUE (see examples).
The mlr3::Resampling passed to the AutoFSelector is meant to be the inner resampling, operating on the training set of an arbitrary outer resampling.
For this reason it is not feasible to pass an instantiated mlr3::Resampling here.
    afs = auto_fselector(
      fselector = fs("random_search"),
      learner = lrn("classif.rpart"),
      resampling = rsmp("holdout"),
      measure = msr("classif.ce"),
      term_evals = 4)

    afs$train(tsk("pima"))
mlr3::Learner -> AutoFSelector
instance_args(list())
All arguments from construction to create the FSelectInstanceBatchSingleCrit.
fselector(FSelector)
Optimization algorithm.
archive(ArchiveBatchFSelect)
Returns FSelectInstanceBatchSingleCrit archive.
learner(mlr3::Learner)
Trained learner.
fselect_instance(FSelectInstanceBatchSingleCrit)
Internally created feature selection instance with all intermediate results.
fselect_result(data.table::data.table)
Short-cut to $result from FSelectInstanceBatchSingleCrit.
predict_type(character(1))
Stores the currently active predict type, e.g. "response".
Must be an element of $predict_types.
hash(character(1))
Hash (unique identifier) for this object.
phash(character(1))
Hash (unique identifier) for this partial object, excluding some components which are varied systematically during tuning (parameter values) or feature selection (feature names).
new()
Creates a new instance of this R6 class.
AutoFSelector$new( fselector, learner, resampling, measure = NULL, terminator, store_fselect_instance = TRUE, store_benchmark_result = TRUE, store_models = FALSE, check_values = FALSE, callbacks = NULL, ties_method = "least_features", rush = NULL, id = NULL )
fselector(FSelector)
Optimization algorithm.
learner(mlr3::Learner)
Learner to optimize the feature subset for.
resampling(mlr3::Resampling)
Resampling that is used to evaluate the performance of the feature subsets.
Uninstantiated resamplings are instantiated during construction so that all feature subsets are evaluated on the same data splits.
Already instantiated resamplings are kept unchanged.
measure(mlr3::Measure)
Measure to optimize. If NULL, default measure is used.
terminator(bbotk::Terminator)
Stop criterion of the feature selection.
store_fselect_instance(logical(1))
If TRUE (default), stores the internally created FSelectInstanceBatchSingleCrit with all intermediate results in slot $fselect_instance.
Is set to TRUE if store_models = TRUE.
store_benchmark_result(logical(1))
Store benchmark result in archive?
store_models(logical(1))
Store models in benchmark result?
check_values(logical(1))
Check the parameters before the evaluation and the results for validity?
callbacks(list of CallbackBatchFSelect)
List of callbacks.
ties_method(character(1))
The method to break ties when selecting sets while optimizing and when selecting the best set.
Can be "least_features" or "random".
The option "least_features" (default) selects the feature set with the least features.
If there are multiple best feature sets with the same number of features, one is selected randomly.
The random method returns a random feature set from the best feature sets.
Ignored if multiple measures are used.
rush(Rush)
If a rush instance is supplied, the optimization runs without batches.
id(character(1))
Identifier for the new instance.
base_learner()
Extracts the base learner from nested learner objects like GraphLearner in mlr3pipelines.
If recursive = 0, the (tuned) learner is returned.
AutoFSelector$base_learner(recursive = Inf)
recursive(integer(1))
Depth of recursion for multiple nested objects.
importance()
The importance scores of the final model.
AutoFSelector$importance()
Named numeric().
selected_features()
The selected features of the final model. These features are selected internally by the learner.
AutoFSelector$selected_features()
character().
oob_error()
The out-of-bag error of the final model.
AutoFSelector$oob_error()
numeric(1).
loglik()
The log-likelihood of the final model.
AutoFSelector$loglik()
logLik.
print()
Printer.
AutoFSelector$print()
...(ignored).
clone()
The objects of this class are cloneable with this method.
AutoFSelector$clone(deep = FALSE)
deep
Whether to make a deep clone.
    # Automatic Feature Selection

    # split to train and external set
    task = tsk("penguins")
    split = partition(task, ratio = 0.8)

    # create auto fselector
    afs = auto_fselector(
      fselector = fs("random_search"),
      learner = lrn("classif.rpart"),
      resampling = rsmp("holdout"),
      measure = msr("classif.ce"),
      term_evals = 4)

    # optimize feature subset and fit final model
    afs$train(task, row_ids = split$train)

    # predict with final model
    afs$predict(task, row_ids = split$test)

    # show result
    afs$fselect_result

    # model slot contains trained learner and fselect instance
    afs$model

    # shortcut trained learner
    afs$learner

    # shortcut fselect instance
    afs$fselect_instance

    # Nested Resampling

    afs = auto_fselector(
      fselector = fs("random_search"),
      learner = lrn("classif.rpart"),
      resampling = rsmp("holdout"),
      measure = msr("classif.ce"),
      term_evals = 4)

    resampling_outer = rsmp("cv", folds = 3)
    rr = resample(task, afs, resampling_outer, store_models = TRUE)

    # retrieve inner feature selection results.
    extract_inner_fselect_results(rr)

    # performance scores estimated on the outer resampling
    rr$score()

    # unbiased performance of the final model trained on the full data set
    rr$aggregate()
Function to create a CallbackAsyncFSelect.
Predefined callbacks are stored in the dictionary mlr_callbacks and can be retrieved with clbk().
Feature selection callbacks can be called from different stages of the feature selection process.
The stages are prefixed with on_*.
Start Feature Selection
- on_optimization_begin
Start Worker
- on_worker_begin
Start Optimization on Worker
- on_optimizer_before_eval
Start Evaluation
- on_eval_after_xs
Start Resampling Iteration
- on_resample_begin
- on_resample_before_train
- on_resample_before_predict
- on_resample_end
End Resampling Iteration
- on_eval_after_resample
- on_eval_before_archive
End Evaluation
- on_optimizer_after_eval
End Optimization on Worker
- on_worker_end
End Worker
- on_fselect_result_begin
- on_result_begin
- on_result_end
- on_optimization_end
End Feature Selection
See also the section on parameters for more information on the stages. A feature selection callback works with ContextAsyncFSelect.
callback_async_fselect( id, label = NA_character_, man = NA_character_, on_optimization_begin = NULL, on_worker_begin = NULL, on_optimizer_before_eval = NULL, on_eval_after_xs = NULL, on_resample_begin = NULL, on_resample_before_train = NULL, on_resample_before_predict = NULL, on_resample_end = NULL, on_eval_after_resample = NULL, on_eval_before_archive = NULL, on_optimizer_after_eval = NULL, on_worker_end = NULL, on_fselect_result_begin = NULL, on_result_begin = NULL, on_result_end = NULL, on_result = NULL, on_optimization_end = NULL )
id (character(1))
label (character(1))
man (character(1))
on_optimization_begin (function())
on_worker_begin (function())
on_optimizer_before_eval (function())
on_eval_after_xs (function())
on_resample_begin (function())
on_resample_before_train (function())
on_resample_before_predict (function())
on_resample_end (function())
on_eval_after_resample (function())
on_eval_before_archive (function())
on_optimizer_after_eval (function())
on_worker_end (function())
on_fselect_result_begin (function())
on_result_begin (function())
on_result_end (function())
on_result (function())
on_optimization_end (function())
When implementing a callback, each function must have two arguments named callback and context.
A callback can write data to the state ($state), e.g. settings that affect the callback itself.
Feature selection callbacks access ContextAsyncFSelect and mlr3::ContextResample.
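A minimal sketch of an asynchronous callback that only uses its own state. The id "mlr3fselect.worker_timer" and the state entries started_at and elapsed are hypothetical names chosen for this example. Constructing the callback needs no Redis connection; actually running it requires an asynchronous feature selection with a rush instance, which is not shown here.

```r
library(mlr3fselect)

# Each stage function takes exactly the arguments `callback` and `context`
worker_timer = callback_async_fselect("mlr3fselect.worker_timer",
  on_worker_begin = function(callback, context) {
    # Remember when the worker started
    callback$state$started_at = Sys.time()
  },
  on_worker_end = function(callback, context) {
    # Store the elapsed worker time in the callback state
    callback$state$elapsed = difftime(Sys.time(), callback$state$started_at, units = "secs")
  }
)

class(worker_timer)
```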
Function to create a CallbackBatchFSelect.
Predefined callbacks are stored in the dictionary mlr_callbacks and can be retrieved with clbk().
Feature selection callbacks can be called from different stages of feature selection.
The stages are prefixed with on_*.
The on_auto_fselector_* stages are only available when the callback is used in an AutoFSelector.
Start Automatic Feature Selection
Start Feature Selection
- on_optimization_begin
Start FSelect Batch
- on_optimizer_before_eval
Start Evaluation
- on_eval_after_design
- on_eval_after_benchmark
- on_eval_before_archive
End Evaluation
- on_optimizer_after_eval
End FSelect Batch
- on_result
- on_optimization_end
End Feature Selection
- on_auto_fselector_before_final_model
- on_auto_fselector_after_final_model
End Automatic Feature Selection
See also the section on parameters for more information on the stages. A feature selection callback works with bbotk::ContextBatch and ContextBatchFSelect.
callback_batch_fselect( id, label = NA_character_, man = NA_character_, on_optimization_begin = NULL, on_optimizer_before_eval = NULL, on_eval_after_design = NULL, on_eval_after_benchmark = NULL, on_eval_before_archive = NULL, on_optimizer_after_eval = NULL, on_result = NULL, on_optimization_end = NULL, on_auto_fselector_before_final_model = NULL, on_auto_fselector_after_final_model = NULL )
id (character(1))
label (character(1))
man (character(1))
on_optimization_begin (function())
on_optimizer_before_eval (function())
on_eval_after_design (function())
on_eval_after_benchmark (function())
on_eval_before_archive (function())
on_optimizer_after_eval (function())
on_result (function())
on_optimization_end (function())
on_auto_fselector_before_final_model (function())
on_auto_fselector_after_final_model (function())
When implementing a callback, each function must have two arguments named callback and context.
A callback can write data to the state ($state), e.g. settings that affect the callback itself.
Avoid writing large data to the state.
    # Write archive to disk
    callback_batch_fselect("mlr3fselect.backup",
      on_optimization_end = function(callback, context) {
        saveRDS(context$instance$archive, "archive.rds")
      }
    )
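As a further sketch, a batch callback can be attached to a run through the callbacks argument of fselect(). The example counts how often the on_optimizer_after_eval stage fires. The id "mlr3fselect.count_batches" and the environment stage_log are hypothetical names; the counter lives in an ordinary R environment rather than in $state so that it can be inspected after the run without assuming where the instance keeps its callbacks.

```r
library(mlr3)
library(mlr3fselect)

set.seed(1)

# Plain environment used as a counter that outlives the optimization
stage_log = new.env()
stage_log$batches = 0L

count_batches = callback_batch_fselect("mlr3fselect.count_batches",
  on_optimizer_after_eval = function(callback, context) {
    # Fires once after each evaluated batch of feature sets
    stage_log$batches = stage_log$batches + 1L
  }
)

instance = fselect(
  fselector = fs("random_search", batch_size = 2),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 4,
  callbacks = list(count_batches)
)

# Number of batches seen by the callback
stage_log$batches
```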
Specialized bbotk::CallbackAsync for asynchronous feature selection.
Callbacks allow customizing the behavior of processes in mlr3fselect.
The callback_async_fselect() function creates a CallbackAsyncFSelect.
Predefined callbacks are stored in the dictionary mlr_callbacks and can be retrieved with clbk().
For more information on feature selection callbacks see callback_async_fselect().
mlr3misc::Callback -> bbotk::CallbackAsync -> CallbackAsyncFSelect
on_eval_after_xs(function())
Stage called after xs is passed.
Called in ObjectiveFSelectAsync$eval().
on_resample_begin(function())
Stage called at the beginning of an evaluation.
Called in workhorse() (internal).
on_resample_before_train(function())
Stage called before training the learner.
Called in workhorse() (internal).
on_resample_before_predict(function())
Stage called before predicting.
Called in workhorse() (internal).
on_resample_end(function())
Stage called at the end of an evaluation.
Called in workhorse() (internal).
on_eval_after_resample(function())
Stage called after feature subsets are evaluated.
Called in ObjectiveFSelectAsync$eval().
on_eval_before_archive(function())
Stage called before performance values are written to the archive.
Called in ObjectiveFSelectAsync$eval().
on_fselect_result_begin(function())
Stage called before the results are written.
Called in FSelectInstance*$assign_result().
clone()
The objects of this class are cloneable with this method.
CallbackAsyncFSelect$clone(deep = FALSE)
deep: Whether to make a deep clone.
Specialized bbotk::CallbackBatch for feature selection.
Callbacks allow customizing the behavior of processes in mlr3fselect.
The callback_batch_fselect() function creates a CallbackBatchFSelect.
Predefined callbacks are stored in the dictionary mlr_callbacks and can be retrieved with clbk().
For more information on callbacks see callback_batch_fselect().
mlr3misc::Callback -> bbotk::CallbackBatch -> CallbackBatchFSelect
on_eval_after_design(function())
Stage called after design is created.
Called in ObjectiveFSelectBatch$eval_many().
on_eval_after_benchmark(function())
Stage called after feature sets are evaluated.
Called in ObjectiveFSelectBatch$eval_many().
on_eval_before_archive(function())
Stage called before performance values are written to the archive.
Called in ObjectiveFSelectBatch$eval_many().
on_auto_fselector_before_final_model(function())
Stage called before the final model is trained.
Called in AutoFSelector$train().
This stage is called after the optimization has finished and before the final model is trained with the best feature set found.
on_auto_fselector_after_final_model(function())
Stage called after the final model is trained.
Called in AutoFSelector$train().
This stage is called after the final model is trained with the best feature set found.
clone()
The objects of this class are cloneable with this method.
CallbackBatchFSelect$clone(deep = FALSE)
deep: Whether to make a deep clone.
# Write archive to disk
callback_batch_fselect("mlr3fselect.backup",
  on_optimization_end = function(callback, context) {
    saveRDS(context$instance$archive, "archive.rds")
  }
)
A CallbackAsyncFSelect accesses and modifies data during the optimization via the ContextAsyncFSelect.
See the section on active bindings for a list of modifiable objects.
See callback_async_fselect() for a list of stages that access ContextAsyncFSelect.
Changes to $instance and $optimizer in the stages executed on the workers are not reflected in the main process.
mlr3misc::Context -> bbotk::ContextAsync -> ContextAsyncFSelect
auto_fselector(AutoFSelector)
The AutoFSelector instance.
xs_objective(list())
The feature subset currently evaluated.
resample_result(mlr3::BenchmarkResult)
The resample result of the feature subset currently evaluated.
aggregated_performance(list())
Aggregated performance scores and training time of the evaluated feature subset.
This list is passed to the archive.
A callback can add additional elements which are also written to the archive.
result_feature_set(character())
The feature set passed to instance$assign_result().
clone()
The objects of this class are cloneable with this method.
ContextAsyncFSelect$clone(deep = FALSE)
deep: Whether to make a deep clone.
The ContextBatchFSelect allows CallbackBatchFSelects to access and modify data while a batch of feature sets is evaluated.
See the section on active bindings for a list of modifiable objects.
See callback_batch_fselect() for a list of stages that access ContextBatchFSelect.
This context is re-created each time a new batch of feature sets is evaluated.
Changes to $objective_fselect, $design and $benchmark_result are discarded after the function is finished.
Modifications to the data table in $aggregated_performance are written to the archive.
Any number of columns can be added.
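A minimal sketch of adding a column through the context is shown below. The callback id `example.add_column` and the column name `logged_by` are hypothetical; the column is added by reference with `data.table::set()` in the `on_eval_before_archive` stage, i.e. before the performance values are written to the archive.

```r
library(mlr3fselect)

# Hypothetical callback that adds one extra column to the archive
add_column = callback_batch_fselect("example.add_column",
  on_eval_before_archive = function(callback, context) {
    # Modify the data table in place; the scalar value is recycled over all rows
    data.table::set(
      context$aggregated_performance,
      j = "logged_by",
      value = "example.add_column"
    )
  }
)

class(add_column)
```

The callback would then be passed to a feature selection run via the `callbacks` argument, e.g. of `fselect()`.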
mlr3misc::Context -> bbotk::ContextBatch -> ContextBatchFSelect
auto_fselector(AutoFSelector)
The AutoFSelector instance.
xss(list())
The feature sets of the latest batch.
design(data.table::data.table)
The benchmark design of the latest batch.
benchmark_result(mlr3::BenchmarkResult)
The benchmark result of the latest batch.
aggregated_performance(data.table::data.table)
Aggregated performance scores and training time of the latest batch.
This data table is passed to the archive.
A callback can add additional columns which are also written to the archive.
clone()
The objects of this class are cloneable with this method.
ContextBatchFSelect$clone(deep = FALSE)
deep: Whether to make a deep clone.
Ensemble feature selection using multiple learners. The ensemble feature selection method is designed to identify the most predictive features from a given dataset by leveraging multiple machine learning models and resampling techniques. Returns an EnsembleFSResult.
embedded_ensemble_fselect(
  task,
  learners,
  init_resampling,
  measure,
  store_benchmark_result = TRUE
)
task |
(mlr3::Task) |
learners |
(list of mlr3::Learner) |
init_resampling |
(mlr3::Resampling) |
measure |
(mlr3::Measure) |
store_benchmark_result |
( |
The method begins by applying an initial resampling technique specified by the user, to create multiple subsamples from the original dataset (train/test splits). This resampling process helps in generating diverse subsets of data for robust feature selection.
For each subsample (train set) generated in the previous step, the method applies learners that support embedded feature selection. These learners are then scored on their ability to predict on the resampled test sets, storing the selected features during training, for each combination of subsample and learner.
Results are stored in an EnsembleFSResult.
An EnsembleFSResult object.
Meinshausen, Nicolai, Buhlmann, Peter (2010). “Stability Selection.” Journal of the Royal Statistical Society Series B: Statistical Methodology, 72(4), 417–473. ISSN 1369-7412, doi:10.1111/J.1467-9868.2010.00740.X, 0809.2932.
Hedou, Julien, Maric, Ivana, Bellan, Gregoire, Einhaus, Jakob, Gaudilliere, K. D, Ladant, Xavier F, Verdonk, Franck, Stelzer, A. I, Feyaerts, Dorien, Tsai, S. A, Ganio, A. E, Sabayev, Maximilian, Gillard, Joshua, Amar, Jonas, Cambriel, Amelie, Oskotsky, T. T, Roldan, Alennie, Golob, L. J, Sirota, Marina, Bonham, A. T, Sato, Masaki, Diop, Maigane, Durand, Xavier, Angst, S. M, Stevenson, K. D, Aghaeepour, Nima, Montanari, Andrea, Gaudilliere, Brice (2024). “Discovery of sparse, reliable omic biomarkers with Stabl.” Nature Biotechnology 2024, 1–13. ISSN 1546-1696, doi:10.1038/s41587-023-02033-x, https://www.nature.com/articles/s41587-023-02033-x.
eefsr = embedded_ensemble_fselect(
  task = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  init_resampling = rsmp("subsampling", repeats = 5),
  measure = msr("classif.ce")
)

eefsr
The EnsembleFSResult stores the results of ensemble feature selection.
It includes methods for evaluating the stability of the feature selection process and for ranking the selected features among others.
Both functions ensemble_fselect() and embedded_ensemble_fselect() return an object of this class.
as.data.table.EnsembleFSResult(x, benchmark_result = TRUE)
Returns a tabular view of the ensemble feature selection.
EnsembleFSResult -> data.table::data.table()
x (EnsembleFSResult)
benchmark_result (logical(1))
Whether to add the learner, task and resampling information from the benchmark result.
c(...)
(EnsembleFSResult, ...) -> EnsembleFSResult
Combines multiple EnsembleFSResult objects into a new EnsembleFSResult.
benchmark_result(mlr3::BenchmarkResult)
The benchmark result.
man(character(1))
Manual page for this object.
result(data.table::data.table)
Returns the result of the ensemble feature selection.
n_learners(numeric(1))
Returns the number of learners used in the ensemble feature selection.
measure(mlr3::Measure)
Returns the 'active' measure that is used in methods of this object.
active_measure(character(1))
Indicates the type of the active performance measure.
During the ensemble feature selection process, the dataset is split into multiple subsamples (train/test splits) using an initial resampling scheme. So, performance can be evaluated using one of two measures:
"outer": measure used to evaluate the performance on the test sets.
"inner": measure used for optimization and to compute performance during inner resampling on the training sets.
n_resamples(numeric(1))
Returns the number of times the task was initially resampled in the ensemble feature selection process.
new()
Creates a new instance of this R6 class.
EnsembleFSResult$new( result, features, benchmark_result = NULL, measure, inner_measure = NULL )
result(data.table::data.table)
The result of the ensemble feature selection.
Mandatory column names should include "resampling_iteration", "learner_id",
"features" and "n_features".
A column named as {measure$id} (scores on the test sets) must also be
always present.
The column with the performance scores on the inner resampling of the train sets is not mandatory,
but note that it should be named as {inner_measure$id}_inner to distinguish from
the {measure$id}.
features(character())
The vector of features of the task that was used in the ensemble feature
selection.
benchmark_result(mlr3::BenchmarkResult)
The benchmark result object.
measure(mlr3::Measure)
The performance measure used to evaluate the learners on the test sets generated
during the ensemble feature selection process.
By default, this serves as the 'active' measure for the methods of this object.
The active measure can be updated using the $set_active_measure() method.
inner_measure(mlr3::Measure)
The performance measure used to optimize and evaluate the learners during the inner resampling process of the training sets, generated as part of the ensemble feature selection procedure.
format()
Helper for print outputs.
EnsembleFSResult$format(...)
... (ignored).
print()
Printer.
EnsembleFSResult$print(...)
... (ignored).
help()
Opens the corresponding help page referenced by field $man.
EnsembleFSResult$help()
set_active_measure()
Use this function to change the active measure.
EnsembleFSResult$set_active_measure(which = "outer")
which(character(1))
Which measure from the ensemble feature selection result
to use in methods of this object.
Should be either "inner" (optimization measure used in training sets)
or "outer" (measure used in test sets, default value).
combine()
Combines a second EnsembleFSResult into the current object, modifying it in-place.
If the second EnsembleFSResult (efsr) is NULL, the method returns the object unmodified.
Both objects must have the same task features and measure.
If the inner_measure differs between the objects or is NULL in either, it will be set to NULL in the combined object.
Additionally, the importance column will be removed if it is missing in either object.
If both objects contain a benchmark_result, these will be combined.
Otherwise, the combined object will have a NULL value for benchmark_result.
This method modifies the object by reference.
To preserve the original state, explicitly $clone() the object beforehand.
Alternatively, you can use the c() function, which internally calls this method.
EnsembleFSResult$combine(efsr)
efsr(EnsembleFSResult)
A second EnsembleFSResult object to combine with the current object.
Returns the object itself, but modified by reference.
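The difference between `c()` and `$combine()` can be sketched as follows. The helper `run_efs()` is a hypothetical wrapper around the `embedded_ensemble_fselect()` call used elsewhere in this manual; the sketch assumes that `c()` leaves its inputs untouched, as stated above.

```r
library(mlr3)
library(mlr3fselect)

# Hypothetical helper that runs a small embedded ensemble feature selection
run_efs = function() {
  embedded_ensemble_fselect(
    task = tsk("sonar"),
    learners = lrns(c("classif.rpart", "classif.featureless")),
    init_resampling = rsmp("subsampling", repeats = 2),
    measure = msr("classif.ce")
  )
}

efsr1 = run_efs()
efsr2 = run_efs()
n1 = nrow(efsr1$result)
n2 = nrow(efsr2$result)

# c() returns a new, combined object
combined = c(efsr1, efsr2)

# $combine() modifies efsr1 by reference, so clone first to keep the original
backup = efsr1$clone(deep = TRUE)
efsr1$combine(efsr2)

nrow(backup$result)
nrow(efsr1$result)
```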
feature_ranking()
Calculates the feature ranking via fastVoteR::rank_candidates().
EnsembleFSResult$feature_ranking( method = "av", use_weights = TRUE, committee_size = NULL, shuffle_features = TRUE )
method(character(1))
The method to calculate the feature ranking. See fastVoteR::rank_candidates()
for a complete list of available methods.
Approval voting ("av") is the default method.
use_weights(logical(1))
The default value (TRUE) uses weights equal to the performance scores
of each voter/model (or the inverse scores if the measure is minimized).
If FALSE, we treat all voters as equal and assign them all a weight equal to 1.
committee_size(integer(1))
Number of top selected features in the output ranking.
This parameter can be used to speed-up methods that build a committee sequentially
("seq_pav"), by requesting only the top N selected candidates/features
and not the complete feature ranking.
shuffle_features(logical(1))
Whether to shuffle the task features randomly before computing the ranking.
Shuffling ensures consistent random tie-breaking across methods and prevents
deterministic biases when features with equal scores are encountered.
Default is TRUE and it's advised to set a seed before running this function.
Set to FALSE if deterministic ordering of features is preferred (same as
during initialization).
The feature ranking process is built on the following framework: models act as voters, features act as candidates, and voters select certain candidates (features). The primary objective is to compile these selections into a consensus ranked list of features, effectively forming a committee.
For every feature a score is calculated, which depends on the "method" argument.
The higher the score, the higher the ranking of the feature.
Note that some methods output a feature ranking instead of a score per feature, so we always include Borda's score, which is method-agnostic, i.e. it can be used to compare the feature rankings across different methods.
We shuffle the input candidates/features so that we enforce random tie-breaking.
Users should set the same seed for consistent comparison between the different feature ranking methods and for reproducibility.
A data.table::data.table listing all the features, ordered by decreasing scores (depends on the "method"). Columns are as follows:
"feature": Feature names.
"score": Scores assigned to each feature based on the selected method (if applicable).
"norm_score": Normalized scores (if applicable), scaled to the range [0, 1], which can be loosely interpreted as selection probabilities (Meinshausen et al. (2010)).
"borda_score": Borda scores for method-agnostic comparison, ranging in [0, 1], where the top feature receives a score of 1 and the lowest-ranked feature receives a score of 0.
This column is always included so that feature ranking methods that output only rankings also have a feature-wise score.
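The voting framework can be illustrated with a small base R sketch. It is not the fastVoteR implementation: the voters, weights and feature names are made up, the score is assumed to be the sum of the weights of all voters approving a feature, and the Borda score is assumed to decrease linearly with the rank.

```r
# Candidate features and the subsets selected by three hypothetical voters (models)
features = c("x1", "x2", "x3", "x4")
votes = list(
  m1 = c("x1", "x2"),
  m2 = c("x1", "x3"),
  m3 = c("x1", "x2", "x4")
)

# Hypothetical voter weights, e.g. derived from performance scores
weights = c(m1 = 0.9, m2 = 0.6, m3 = 0.75)

# Weighted approval voting: sum the weights of all voters that selected a feature
score = vapply(features, function(f) {
  approves = vapply(votes, function(v) f %in% v, logical(1))
  sum(weights[approves])
}, numeric(1))

# Normalize by the total weight so that the scores lie in [0, 1]
norm_score = score / sum(weights)

# Order by decreasing score and assign Borda scores from 1 (top) to 0 (bottom)
ord = order(score, decreasing = TRUE)
p = length(features)
ranking = data.frame(
  feature = features[ord],
  score = score[ord],
  norm_score = norm_score[ord],
  borda_score = (p - seq_len(p)) / (p - 1)
)
ranking
```

There are no ties in this toy example. With tied scores the order would depend on the input order of the features, which is why shuffling under a fixed seed is recommended above.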
stability()
Calculates the stability of the selected features with the stabm package. The results are cached. When the same stability measure is requested again with different arguments, the cache must be reset.
EnsembleFSResult$stability( stability_measure = "jaccard", stability_args = NULL, global = TRUE, reset_cache = FALSE )
stability_measure(character(1))
The stability measure to be used.
One of the measures returned by stabm::listStabilityMeasures() in lower case.
Default is "jaccard".
stability_args(list)
Additional arguments passed to the stability measure function.
global(logical(1))
Whether to calculate the stability globally or for each learner.
reset_cache(logical(1))
If TRUE, the cached results are ignored.
A numeric() value representing the stability of the selected features.
Or a numeric() vector with the stability of the selected features for each learner.
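To give an intuition for the default measure, the sketch below computes a Jaccard-type stability in base R as the mean pairwise Jaccard similarity between selected feature sets. The feature sets are made up, and the sketch only assumes this definition; the package itself delegates the computation to stabm.

```r
# Hypothetical feature sets selected in three resampling iterations
selected = list(
  c("x1", "x2", "x3"),
  c("x1", "x2"),
  c("x1", "x3", "x4")
)

# Jaccard similarity of two sets: size of intersection over size of union
jaccard = function(a, b) length(intersect(a, b)) / length(union(a, b))

# Average the similarity over all pairs of feature sets
pairs = combn(length(selected), 2)
pairwise = apply(pairs, 2, function(idx) {
  jaccard(selected[[idx[1]]], selected[[idx[2]]])
})
stability = mean(pairwise)
stability
```

Values close to 1 indicate that nearly the same features are selected in every iteration; values close to 0 indicate little overlap.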
pareto_front()
This function identifies the Pareto front of the ensemble feature selection process, i.e., the set of points that represent the trade-off between the number of features and performance (e.g. classification error).
EnsembleFSResult$pareto_front(type = "empirical", max_nfeatures = NULL)
type(character(1))
Specifies the type of Pareto front to return. See details.
max_nfeatures(integer(1))
Specifies the maximum number of features for which the estimated Pareto
front is computed. Applicable only when type = "estimated".
If NULL (default), the maximum number of features
is determined by the ensemble feature selection process.
Two options are available for the Pareto front:
"empirical" (default): returns the empirical Pareto front.
"estimated": the Pareto front points are estimated by fitting a linear model with the inverse of the number of features (1/x) as input and the associated performance scores as output.
This method is useful when the Pareto points are sparse and the front assumes a convex shape if better performance corresponds to lower measure values (e.g. classification error), or a concave shape otherwise (e.g. classification accuracy).
When type = "estimated", the estimated Pareto front includes points with the number of features ranging from 1 up to max_nfeatures.
If max_nfeatures is not provided, it defaults to the maximum number of features available in the ensemble feature selection result, i.e. the maximum out of all learners and resamplings included.
A data.table::data.table whose columns, the number of features and the performance, together form the Pareto front.
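Both types of Pareto front can be sketched in base R on made-up data with a minimized measure (classification error). This is an illustration and not the package's code; in particular, fitting the linear model on the empirical Pareto points is an assumption of this sketch.

```r
# Hypothetical results: number of selected features and classification error
res = data.frame(
  n_features = c(2, 3, 5, 5, 8, 10),
  error = c(0.30, 0.22, 0.25, 0.18, 0.17, 0.19)
)

# Empirical front: going from few to many features,
# keep only the points that strictly improve the best error so far
res = res[order(res$n_features, res$error), ]
best_so_far = cummin(res$error)
is_pareto = res$error == best_so_far & !duplicated(best_so_far)
empirical = res[is_pareto, ]

# Estimated front: linear model with 1 / n_features as input,
# predicted for every number of features from 1 to max_nfeatures
max_nfeatures = 10
fit = lm(error ~ I(1 / n_features), data = empirical)
grid = data.frame(n_features = seq_len(max_nfeatures))
estimated = cbind(grid, error = predict(fit, newdata = grid))

empirical
estimated
```

Because the error is modeled as a linear function of 1/x, the estimated front is convex and decreasing here, which matches the shape described above for measures that are minimized.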
knee_points()
This function implements various knee point identification (KPI) methods, which select points in the Pareto front, such that an optimal trade-off between performance and number of features is achieved. In most cases, only one such point is returned.
EnsembleFSResult$knee_points( method = "NBI", type = "empirical", max_nfeatures = NULL )
method(character(1))
Type of method to use to identify the knee point.
type(character(1))
Specifies the type of Pareto front to use for the identification of the knee point.
max_nfeatures(integer(1))
Specifies the maximum number of features for which the estimated Pareto
front is computed. Applicable only when type = "estimated".
If NULL (default), the maximum number of features
is determined by the ensemble feature selection process.
See pareto_front() method for more details.
The available KPI methods are:
"NBI" (default): The Normal-Boundary Intersection method is a geometry-based method which calculates the perpendicular distance of each point from the line connecting the first and last points of the Pareto front.
The knee point is determined as the Pareto point with the maximum distance from this line, see Das (1999).
A data.table::data.table with the knee point(s) of the Pareto front.
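The geometric idea behind "NBI" can be sketched in base R on a made-up Pareto front: compute the perpendicular distance of every point from the line through the first and the last point and pick the maximum. Whether the package rescales the axes beforehand is not stated here, so the sketch works on the raw values.

```r
# Hypothetical Pareto front (classification error is minimized)
front = data.frame(
  n_features = c(1, 2, 3, 5, 8),
  error = c(0.40, 0.25, 0.20, 0.18, 0.17)
)

# End points of the line connecting the first and the last Pareto point
n = nrow(front)
p1 = c(front$n_features[1], front$error[1])
p2 = c(front$n_features[n], front$error[n])
dx = p2[1] - p1[1]
dy = p2[2] - p1[2]

# Perpendicular distance of each point from that line
dist = abs(dy * front$n_features - dx * front$error + p2[1] * p1[2] - p2[2] * p1[1]) /
  sqrt(dx^2 + dy^2)

# The knee point is the Pareto point farthest from the line
knee = front[which.max(dist), ]
knee
```

In this toy example the knee is the point with three features: adding more features beyond it lowers the error only marginally.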
clone()
The objects of this class are cloneable with this method.
EnsembleFSResult$clone(deep = FALSE)
deep: Whether to make a deep clone.
Das, I (1999). “On characterizing the 'knee' of the Pareto curve based on normal-boundary intersection.” Structural Optimization, 18(1-2), 107–115. ISSN 09344373.
Meinshausen, Nicolai, Buhlmann, Peter (2010). “Stability Selection.” Journal of the Royal Statistical Society Series B: Statistical Methodology, 72(4), 417–473. ISSN 1369-7412, doi:10.1111/J.1467-9868.2010.00740.X, 0809.2932.
efsr = ensemble_fselect(
  fselector = fs("rfe", n_features = 2, feature_fraction = 0.8),
  task = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  init_resampling = rsmp("subsampling", repeats = 2),
  inner_resampling = rsmp("cv", folds = 3),
  inner_measure = msr("classif.ce"),
  measure = msr("classif.acc"),
  terminator = trm("none")
)

# contains the benchmark result
efsr$benchmark_result

# contains the selected features for each iteration
efsr$result

# returns the stability of the selected features
efsr$stability(stability_measure = "jaccard")

# returns a ranking of all features
head(efsr$feature_ranking())

# returns the empirical pareto front, i.e. n_features vs. the active measure (here accuracy)
efsr$pareto_front()

# returns the knee points (optimal trade-off between n_features and performance)
efsr$knee_points()

# change to use the inner optimization measure
efsr$set_active_measure(which = "inner")

# Pareto front is calculated on the inner measure
efsr$pareto_front()
Ensemble feature selection using multiple learners. The ensemble feature selection method is designed to identify the most predictive features from a given dataset by leveraging multiple machine learning models and resampling techniques. Returns an EnsembleFSResult.
ensemble_fselect(
  fselector,
  task,
  learners,
  init_resampling,
  inner_resampling,
  inner_measure,
  measure,
  terminator,
  callbacks = NULL,
  store_benchmark_result = TRUE,
  store_models = FALSE
)
fselector |
(FSelector) |
task |
(mlr3::Task) |
learners |
(list of mlr3::Learner) |
init_resampling |
(mlr3::Resampling) |
inner_resampling |
(mlr3::Resampling) |
inner_measure |
(mlr3::Measure) |
measure |
(mlr3::Measure) |
terminator |
(bbotk::Terminator) |
callbacks |
(Named list of lists of CallbackBatchFSelect) |
store_benchmark_result |
( |
store_models |
( |
The method begins by applying an initial resampling technique specified by the user, to create multiple subsamples from the original dataset (train/test splits). This resampling process helps in generating diverse subsets of data for robust feature selection.
For each subsample (train set) generated in the previous step, the method performs wrapper-based feature selection (auto_fselector) using each provided learner, the given inner resampling method, inner performance measure and optimization algorithm. This process generates 1) the best feature subset and 2) a final trained model using these best features, for each combination of subsample and learner. The final models are then scored on their ability to predict on the resampled test sets.
Results are stored in an EnsembleFSResult.
The result object also includes the performance scores calculated during the inner resampling of the training sets, using models with the best feature subsets.
These scores are stored in a column named {measure_id}_inner.
An EnsembleFSResult object.
The active measure of performance is the one applied to the test sets.
This is preferred, as inner resampling scores on the training sets are likely to be overestimated when using the final models.
Users can change the active measure by using the set_active_measure() method of the EnsembleFSResult.
Saeys, Yvan, Abeel, Thomas, Van De Peer, Yves (2008). “Robust feature selection using ensemble feature selection techniques.” Machine Learning and Knowledge Discovery in Databases, 5212 LNAI, 313–325. doi:10.1007/978-3-540-87481-2_21.
Abeel, Thomas, Helleputte, Thibault, Van de Peer, Yves, Dupont, Pierre, Saeys, Yvan (2010). “Robust biomarker identification for cancer diagnosis with ensemble feature selection methods.” Bioinformatics, 26, 392–398. ISSN 1367-4803, doi:10.1093/BIOINFORMATICS/BTP630.
Pes, Barbara (2020). “Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains.” Neural Computing and Applications, 32(10), 5951–5973. ISSN 14333058, doi:10.1007/s00521-019-04082-3.
efsr = ensemble_fselect(
  fselector = fs("random_search"),
  task = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  init_resampling = rsmp("subsampling", repeats = 2),
  inner_resampling = rsmp("cv", folds = 3),
  inner_measure = msr("classif.ce"),
  measure = msr("classif.acc"),
  terminator = trm("evals", n_evals = 10)
)

efsr
Extract inner feature selection archives of nested resampling.
Implemented for mlr3::ResampleResult and mlr3::BenchmarkResult.
The function iterates over the AutoFSelector objects and binds the archives to a data.table::data.table().
AutoFSelector must be initialized with store_fselect_instance = TRUE and resample() or benchmark() must be called with store_models = TRUE.
extract_inner_fselect_archives(x, exclude_columns = "uhash")
x |
|
exclude_columns |
( |
The returned data table has the following columns:
experiment (integer(1))
Index giving the corresponding row number in the original benchmark grid.
iteration (integer(1))
Iteration of the outer resampling.
One column for each feature of the task.
One column for each performance measure.
runtime_learners (numeric(1))
Sum of training and predict times logged in learners per
mlr3::ResampleResult / evaluation. This does not include potential
overhead time.
timestamp (POSIXct)
Time stamp when the evaluation was logged into the archive.
batch_nr (integer(1))
Feature sets are evaluated in batches. Each batch has a unique batch
number.
resample_result (mlr3::ResampleResult)
Resample result of the inner resampling.
task_id (character(1)).
learner_id (character(1)).
resampling_id (character(1)).
# Nested Resampling on Palmer Penguins Data Set

# create auto fselector
at = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 4)

resampling_outer = rsmp("cv", folds = 2)
rr = resample(tsk("penguins"), at, resampling_outer, store_models = TRUE)

# extract inner archives
extract_inner_fselect_archives(rr)
Extract inner feature selection results of nested resampling. Implemented for mlr3::ResampleResult and mlr3::BenchmarkResult.
extract_inner_fselect_results(x, fselect_instance, ...)
x |
|
fselect_instance |
( |
... |
(any) |
The function iterates over the AutoFSelector objects and binds the feature selection results to a data.table::data.table().
AutoFSelector must be initialized with store_fselect_instance = TRUE and resample() or benchmark() must be called with store_models = TRUE.
Optionally, the instance can be added for each iteration.
The returned data table has the following columns:
experiment (integer(1))
Index giving the corresponding row number in the original benchmark grid.
iteration (integer(1))
Iteration of the outer resampling.
One column for each feature of the task.
One column for each performance measure.
features (character())
Vector of selected feature set.
task_id (character(1)).
learner_id (character(1)).
resampling_id (character(1)).
# Nested Resampling on the Iris Data Set

# create auto fselector
at = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 4)

resampling_outer = rsmp("cv", folds = 2)
rr = resample(tsk("iris"), at, resampling_outer, store_models = TRUE)

# extract inner results
extract_inner_fselect_results(rr)
Aggregates a mlr3::ResampleResult or mlr3::BenchmarkResult for a single simple measure. Returns the aggregated score for each resample result.
faggregate(obj, measure, conditions = FALSE)
obj |
|
measure |
|
conditions |
( |
This function is faster than $aggregate() because it does not reassemble the resampling results.
It only works on simple measures which do not require the task, learner, model or train set to be available.
Functions to retrieve objects, set parameters and assign to fields in one go.
Relies on mlr3misc::dictionary_sugar_get() to extract objects from the respective mlr3misc::Dictionary:
fs() for a FSelector from mlr_fselectors.
fss() for a list of FSelectors from mlr_fselectors.
trm() for a bbotk::Terminator from mlr_terminators.
trms() for a list of Terminators from mlr_terminators.
fs(.key, ...)
fss(.keys, ...)
.key |
( |
... |
(any) |
.keys |
( |
R6::R6Class object of the respective type, or a list of R6::R6Class objects for the plural versions.
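The plural versions can be sketched as follows. The keys `"random_search"`, `"sequential"`, `"evals"` and `"run_time"` are assumed to be available in the respective dictionaries, and `trm()` / `trms()` are assumed to be available after loading mlr3fselect, as the list above suggests.

```r
library(mlr3fselect)

# List of two FSelectors retrieved by their keys
fselectors = fss(c("random_search", "sequential"))

# List of two terminators retrieved by their keys
terminators = trms(c("evals", "run_time"))

# Parameters can be set during construction, as in the singular versions
terminator = trm("evals", n_evals = 20)
terminator$param_set$values$n_evals
```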
# random search fselector with batch size of 5
fs("random_search", batch_size = 5)

# run time terminator with 20 seconds
trm("run_time", secs = 20)
Function to optimize the features of a mlr3::Learner.
The function internally creates a FSelectInstanceBatchSingleCrit or FSelectInstanceBatchMultiCrit which describes the feature selection problem.
It executes the feature selection with the FSelector (fselector) and returns the result with the feature selection instance ($result).
The ArchiveBatchFSelect and ArchiveAsyncFSelect ($archive) store all evaluated feature subsets and performance scores.
You can find an overview of all feature selectors on our website.
fselect(
  fselector,
  task,
  learner,
  resampling,
  measures = NULL,
  term_evals = NULL,
  term_time = NULL,
  terminator = NULL,
  store_benchmark_result = TRUE,
  store_models = FALSE,
  check_values = FALSE,
  callbacks = NULL,
  ties_method = "least_features",
  rush = NULL
)
fselector |
(FSelector) |
task |
(mlr3::Task) |
learner |
(mlr3::Learner) |
resampling |
(mlr3::Resampling) |
measures |
(mlr3::Measure or list of mlr3::Measure) |
term_evals |
( |
term_time |
( |
terminator |
(bbotk::Terminator) |
store_benchmark_result |
( |
store_models |
( |
check_values |
( |
callbacks |
(list of CallbackBatchFSelect) |
ties_method |
( |
rush |
( |
The mlr3::Task, mlr3::Learner, mlr3::Resampling, mlr3::Measure and bbotk::Terminator are used to construct a FSelectInstanceBatchSingleCrit.
If multiple performance mlr3::Measures are supplied, a FSelectInstanceBatchMultiCrit is created.
The parameters term_evals and term_time are shortcuts to create a bbotk::Terminator.
If both parameters are passed, a bbotk::TerminatorCombo is constructed.
For other Terminators, pass one with terminator.
If no termination criterion is needed, set term_evals, term_time and terminator to NULL.
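The shortcut behavior can be checked with a short sketch that passes both `term_evals` and `term_time` and inspects the terminator of the returned instance. The explicit `trm("combo", ...)` call is shown only as a roughly equivalent construction; the exact settings used internally are not stated here.

```r
library(mlr3)
library(mlr3fselect)

# Passing both shortcuts should yield a combined terminator
instance = fselect(
  fselector = fs("random_search", batch_size = 2),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 4,
  term_time = 60
)
class(instance$terminator)[1]

# A roughly equivalent terminator constructed by hand
explicit = trm("combo",
  terminators = list(trm("evals", n_evals = 4), trm("run_time", secs = 60))
)
class(explicit)[1]
```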
FSelectInstanceBatchSingleCrit | FSelectInstanceBatchMultiCrit
If no measure is passed, the default measure is used. The default measure depends on the task type.
| Task | Default Measure | Package |
|---|---|---|
| "classif" | "classif.ce" | mlr3 |
| "regr" | "regr.mse" | mlr3 |
| "surv" | "surv.cindex" | mlr3proba |
| "dens" | "dens.logloss" | mlr3proba |
| "classif_st" | "classif.ce" | mlr3spatial |
| "regr_st" | "regr.mse" | mlr3spatial |
| "clust" | "clust.dunn" | mlr3cluster |
There are several sections about feature selection in the mlr3book.
Getting started with wrapper feature selection.
Do a sequential forward selection on the Palmer Penguins data set.
The gallery features a collection of case studies and demos about optimization.
Utilize the built-in feature importance of models with Recursive Feature Elimination.
Run a feature selection with Shadow Variable Search.
For analyzing the feature selection results, it is recommended to pass the archive to as.data.table().
The returned data table is joined with the benchmark result which adds the mlr3::ResampleResult for each feature set.
The archive provides various getters (e.g. $learners()) to ease the access.
All getters extract by position (i) or unique hash (uhash).
For a complete list of all getters see the methods section.
The benchmark result ($benchmark_result) allows scoring the feature sets again on a different measure.
Alternatively, measures can be supplied to as.data.table().
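The analysis steps described above might look like this in practice. This is a minimal sketch: the Pima task, the rpart learner, and the second measure ("classif.acc") are illustrative choices, not part of the original text.

```r
library(mlr3)
library(mlr3fselect)

instance = fselect(
  fselector = fs("random_search", batch_size = 2),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 4
)

# One row per evaluated feature subset, joined with the benchmark result
archive_dt = as.data.table(instance$archive)

# Supply an additional measure when converting the archive
archive_acc = as.data.table(instance$archive, measures = msrs("classif.acc"))

# Getter by position: learner(s) of the first evaluated feature subset
first_learners = instance$archive$learners(i = 1)

# Score the stored benchmark result on a different measure
scores = instance$archive$benchmark_result$score(msr("classif.acc"))
```

The getters and the benchmark result are only available when store_benchmark_result is TRUE, which is the default in the usage shown above.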
    # Feature selection on the Pima Indians data set
    task = tsk("pima")

    # Load learner
    learner = lrn("classif.rpart")

    # Run feature selection
    instance = fselect(
      fselector = fs("random_search", batch_size = 2),
      task = task,
      learner = learner,
      resampling = rsmp("holdout"),
      measures = msr("classif.ce"),
      term_evals = 4)

    # Subset task to optimized feature set
    task$select(instance$result_feature_set)

    # Train the learner with optimal feature set on the full data set
    learner$train(task)

    # Inspect all evaluated feature subsets
    as.data.table(instance$archive)
Function to conduct nested resampling.
fselect_nested( fselector, task, learner, inner_resampling, outer_resampling, measure = NULL, term_evals = NULL, term_time = NULL, terminator = NULL, store_fselect_instance = TRUE, store_benchmark_result = TRUE, store_models = FALSE, check_values = FALSE, callbacks = NULL, ties_method = "least_features" )
| Argument | Type |
|---|---|
| fselector | (FSelector) |
| task | (mlr3::Task) |
| learner | (mlr3::Learner) |
| inner_resampling | (mlr3::Resampling) |
| outer_resampling | (mlr3::Resampling) |
| measure | (mlr3::Measure) |
| term_evals | ( |
| term_time | ( |
| terminator | (bbotk::Terminator) |
| store_fselect_instance | ( |
| store_benchmark_result | ( |
| store_models | ( |
| check_values | ( |
| callbacks | (list of CallbackBatchFSelect) |
| ties_method | ( |
    # Nested resampling on Palmer Penguins data set
    rr = fselect_nested(
      fselector = fs("random_search"),
      task = tsk("penguins"),
      learner = lrn("classif.rpart"),
      inner_resampling = rsmp("holdout"),
      outer_resampling = rsmp("cv", folds = 2),
      measure = msr("classif.ce"),
      term_evals = 4)

    # Performance scores estimated on the outer resampling
    rr$score()

    # Unbiased performance of the final model trained on the full data set
    rr$aggregate()
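To look at what happened inside the inner loop, the sketch below repeats the nested resampling call and then asks for the inner results. It assumes that extract_inner_fselect_results(), a helper exported by mlr3fselect that is not described in this section, is available in the installed version and that store_fselect_instance is left at its default of TRUE.

```r
library(mlr3)
library(mlr3fselect)

rr = fselect_nested(
  fselector = fs("random_search"),
  task = tsk("penguins"),
  learner = lrn("classif.rpart"),
  inner_resampling = rsmp("holdout"),
  outer_resampling = rsmp("cv", folds = 2),
  measure = msr("classif.ce"),
  term_evals = 4
)

# Selected feature set and inner performance for each outer fold
# (assumed helper, see the note above)
inner_results = extract_inner_fselect_results(rr)

# Outer performance, aggregated over the two folds
outer_score = rr$aggregate(msr("classif.ce"))
```

Comparing the inner scores with the outer scores gives a rough sense of how optimistic the inner estimates are.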
The FSelectInstanceAsyncMultiCrit specifies a feature selection problem for a FSelectorAsync.
The function fsi_async() creates a FSelectInstanceAsyncMultiCrit and the function fselect() creates an instance internally.
If no measure is passed, the default measure is used. The default measure depends on the task type.
| Task | Default Measure | Package |
|---|---|---|
| "classif" | "classif.ce" | mlr3 |
| "regr" | "regr.mse" | mlr3 |
| "surv" | "surv.cindex" | mlr3proba |
| "dens" | "dens.logloss" | mlr3proba |
| "classif_st" | "classif.ce" | mlr3spatial |
| "regr_st" | "regr.mse" | mlr3spatial |
| "clust" | "clust.dunn" | mlr3cluster |
For analyzing the feature selection results, it is recommended to pass the ArchiveAsyncFSelect to as.data.table().
The returned data table contains the mlr3::ResampleResult for each feature subset evaluation.
There are several sections about feature selection in the mlr3book.
Getting started with wrapper feature selection.
Do a sequential forward selection on the Palmer Penguins data set.
The gallery features a collection of case studies and demos about optimization.
Utilize the built-in feature importance of models with Recursive Feature Elimination.
Run a feature selection with Shadow Variable Search.
bbotk::OptimInstance -> bbotk::OptimInstanceAsync -> bbotk::OptimInstanceAsyncMultiCrit -> FSelectInstanceAsyncMultiCrit
new()
Creates a new instance of this R6 class.
FSelectInstanceAsyncMultiCrit$new( task, learner, resampling, measures, terminator, store_benchmark_result = TRUE, store_models = FALSE, check_values = FALSE, callbacks = NULL, rush = NULL )
task(mlr3::Task)
Task to operate on.
learner(mlr3::Learner)
Learner to optimize the feature subset for.
resampling(mlr3::Resampling)
Resampling that is used to evaluate the performance of the feature subsets.
Uninstantiated resamplings are instantiated during construction so that all feature subsets are evaluated on the same data splits.
Already instantiated resamplings are kept unchanged.
measures(list of mlr3::Measure)
Measures to optimize.
If NULL, mlr3's default measure is used.
terminator(bbotk::Terminator)
Stop criterion of the feature selection.
store_benchmark_result(logical(1))
Store benchmark result in archive?
store_models(logical(1))
Store models in benchmark result?
check_values(logical(1))
Check the parameters before the evaluation and the results for validity?
callbacks(list of CallbackBatchFSelect)
List of callbacks.
rush(Rush)
If a rush instance is supplied, the optimization runs without batches.
assign_result()
The FSelectorAsync object writes the best found points and estimated performance values here (probably the Pareto set / front). For internal use.
FSelectInstanceAsyncMultiCrit$assign_result(xdt, ydt, extra = NULL, ...)
xdt(data.table::data.table())
x values as data.table. Each row is one point. Contains the value in the search space of the FSelectInstanceAsyncMultiCrit object. Can contain additional columns for extra information.
ydt(numeric())
Optimal outcomes, e.g. the Pareto front.
extra(data.table::data.table())
Additional information.
...(any)
ignored.
clone()
The objects of this class are cloneable with this method.
FSelectInstanceAsyncMultiCrit$clone(deep = FALSE)
deep
Whether to make a deep clone.
The FSelectInstanceAsyncSingleCrit specifies a feature selection problem for a FSelectorAsync.
The function fsi_async() creates a FSelectInstanceAsyncSingleCrit and the function fselect() creates an instance internally.
If no measure is passed, the default measure is used. The default measure depends on the task type.
| Task | Default Measure | Package |
|---|---|---|
| "classif" | "classif.ce" | mlr3 |
| "regr" | "regr.mse" | mlr3 |
| "surv" | "surv.cindex" | mlr3proba |
| "dens" | "dens.logloss" | mlr3proba |
| "classif_st" | "classif.ce" | mlr3spatial |
| "regr_st" | "regr.mse" | mlr3spatial |
| "clust" | "clust.dunn" | mlr3cluster |
For analyzing the feature selection results, it is recommended to pass the ArchiveAsyncFSelect to as.data.table().
The returned data table contains the mlr3::ResampleResult for each feature subset evaluation.
There are several sections about feature selection in the mlr3book.
Getting started with wrapper feature selection.
Do a sequential forward selection on the Palmer Penguins data set.
The gallery features a collection of case studies and demos about optimization.
Utilize the built-in feature importance of models with Recursive Feature Elimination.
Run a feature selection with Shadow Variable Search.
bbotk::OptimInstance -> bbotk::OptimInstanceAsync -> bbotk::OptimInstanceAsyncSingleCrit -> FSelectInstanceAsyncSingleCrit
new()
Creates a new instance of this R6 class.
FSelectInstanceAsyncSingleCrit$new( task, learner, resampling, measure = NULL, terminator, store_benchmark_result = TRUE, store_models = FALSE, check_values = FALSE, callbacks = NULL, ties_method = "least_features", rush = NULL )
task(mlr3::Task)
Task to operate on.
learner(mlr3::Learner)
Learner to optimize the feature subset for.
resampling(mlr3::Resampling)
Resampling that is used to evaluate the performance of the feature subsets.
Uninstantiated resamplings are instantiated during construction so that all feature subsets are evaluated on the same data splits.
Already instantiated resamplings are kept unchanged.
measure(mlr3::Measure)
Measure to optimize. If NULL, default measure is used.
terminator(bbotk::Terminator)
Stop criterion of the feature selection.
store_benchmark_result(logical(1))
Store benchmark result in archive?
store_models(logical(1))
Store models in benchmark result?
check_values(logical(1))
Check the parameters before the evaluation and the results for validity?
callbacks(list of CallbackBatchFSelect)
List of callbacks.
ties_method(character(1))
The method to break ties when selecting sets while optimizing and when selecting the best set.
Can be "least_features" or "random".
The option "least_features" (default) selects the feature set with the fewest features.
If there are multiple best feature sets with the same number of features, one is selected randomly.
The option "random" returns a random feature set from the best feature sets.
Ignored if multiple measures are used.
rush(Rush)
If a rush instance is supplied, the optimization runs without batches.
assign_result()
The FSelectorAsync object writes the best found point and estimated performance value here. For internal use.
FSelectInstanceAsyncSingleCrit$assign_result(xdt, y, extra = NULL, ...)
xdt(data.table::data.table())
x values as data.table. Each row is one point. Contains the value in the search space of the FSelectInstanceAsyncSingleCrit object. Can contain additional columns for extra information.
y(numeric(1))
Optimal outcome.
extra(data.table::data.table())
Additional information.
...(any)
ignored.
clone()
The objects of this class are cloneable with this method.
FSelectInstanceAsyncSingleCrit$clone(deep = FALSE)
deep
Whether to make a deep clone.
The FSelectInstanceBatchMultiCrit specifies a feature selection problem for a FSelector.
The function fsi() creates a FSelectInstanceBatchMultiCrit and the function fselect() creates an instance internally.
There are several sections about feature selection in the mlr3book.
Learn about multi-objective optimization.
The gallery features a collection of case studies and demos about optimization.
For analyzing the feature selection results, it is recommended to pass the archive to as.data.table().
The returned data table is joined with the benchmark result which adds the mlr3::ResampleResult for each feature set.
The archive provides various getters (e.g. $learners()) to ease the access.
All getters extract by position (i) or unique hash (uhash).
For a complete list of all getters see the methods section.
The benchmark result ($benchmark_result) allows scoring the feature sets again on a different measure.
Alternatively, measures can be supplied to as.data.table().
bbotk::OptimInstance -> bbotk::OptimInstanceBatch -> bbotk::OptimInstanceBatchMultiCrit -> FSelectInstanceBatchMultiCrit
result_feature_set(list of character())
Feature sets for task subsetting.
new()
Creates a new instance of this R6 class.
FSelectInstanceBatchMultiCrit$new( task, learner, resampling, measures, terminator, store_benchmark_result = TRUE, store_models = FALSE, check_values = FALSE, callbacks = NULL )
task(mlr3::Task)
Task to operate on.
learner(mlr3::Learner)
Learner to optimize the feature subset for.
resampling(mlr3::Resampling)
Resampling that is used to evaluate the performance of the feature subsets.
Uninstantiated resamplings are instantiated during construction so that all feature subsets are evaluated on the same data splits.
Already instantiated resamplings are kept unchanged.
measures(list of mlr3::Measure)
Measures to optimize.
If NULL, mlr3's default measure is used.
terminator(bbotk::Terminator)
Stop criterion of the feature selection.
store_benchmark_result(logical(1))
Store benchmark result in archive?
store_models(logical(1))
Store models in benchmark result?
check_values(logical(1))
Check the parameters before the evaluation and the results for validity?
callbacks(list of CallbackBatchFSelect)
List of callbacks.
assign_result()
The FSelector object writes the best found feature subsets and estimated performance values here. For internal use.
FSelectInstanceBatchMultiCrit$assign_result(xdt, ydt, extra = NULL, ...)
xdt(data.table::data.table())
x values as data.table. Each row is one point. Contains the value in the search space of the FSelectInstanceBatchMultiCrit object. Can contain additional columns for extra information.
ydt(data.table::data.table())
Optimal outcomes, e.g. the Pareto front.
extra(data.table::data.table())
Additional information.
...(any)
ignored.
print()
Printer.
FSelectInstanceBatchMultiCrit$print(...)
...(ignored).
clone()
The objects of this class are cloneable with this method.
FSelectInstanceBatchMultiCrit$clone(deep = FALSE)
deep
Whether to make a deep clone.
    # Feature selection on Palmer Penguins data set
    task = tsk("penguins")

    # Construct feature selection instance
    instance = fsi(
      task = task,
      learner = lrn("classif.rpart"),
      resampling = rsmp("cv", folds = 3),
      measures = msrs(c("classif.ce", "time_train")),
      terminator = trm("evals", n_evals = 4)
    )

    # Choose optimization algorithm
    fselector = fs("random_search", batch_size = 2)

    # Run feature selection
    fselector$optimize(instance)

    # Optimal feature sets
    instance$result_feature_set

    # Inspect all evaluated sets
    as.data.table(instance$archive)
The FSelectInstanceBatchSingleCrit specifies a feature selection problem for a FSelector.
The function fsi() creates a FSelectInstanceBatchSingleCrit and the function fselect() creates an instance internally.
The instance contains an ObjectiveFSelectBatch object that encodes the black box objective function a FSelector has to optimize.
The instance allows the basic operations of querying the objective at design points ($eval_batch()).
This operation is usually done by the FSelector.
Evaluations of feature subsets are performed in batches by calling mlr3::benchmark() internally.
The evaluated feature subsets are stored in the Archive ($archive).
Before a batch is evaluated, the bbotk::Terminator is queried for the remaining budget.
If the available budget is exhausted, an exception is raised, and no further evaluations can be performed from this point on.
The FSelector is also supposed to store its final result, consisting of a selected feature subset and associated estimated performance values, by calling the method instance$assign_result().
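The interaction between $eval_batch() and the terminator can be sketched by calling the instance directly, without an FSelector. The task, learner, and the budget of one evaluation are illustrative assumptions. The sketch only relies on the condition inheriting from "error" and does not assume more about its class.

```r
library(mlr3)
library(mlr3fselect)
library(data.table)

instance = fsi(
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  terminator = trm("evals", n_evals = 1)
)

# One design point: a logical column per feature, here with all features selected
features = instance$archive$cols_x
xdt = as.data.table(as.list(setNames(rep(TRUE, length(features)), features)))

# The first batch is within budget and is stored in the archive
instance$eval_batch(xdt)

# The budget is now exhausted, so a further batch should raise a condition
res = tryCatch(instance$eval_batch(xdt), error = function(e) e)

class(res)
instance$archive$n_evals
```

In normal use the FSelector issues these calls and handles the condition itself, so user code rarely needs the tryCatch().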
If no measure is passed, the default measure is used. The default measure depends on the task type.
| Task | Default Measure | Package |
|---|---|---|
| "classif" | "classif.ce" | mlr3 |
| "regr" | "regr.mse" | mlr3 |
| "surv" | "surv.cindex" | mlr3proba |
| "dens" | "dens.logloss" | mlr3proba |
| "classif_st" | "classif.ce" | mlr3spatial |
| "regr_st" | "regr.mse" | mlr3spatial |
| "clust" | "clust.dunn" | mlr3cluster |
There are several sections about feature selection in the mlr3book.
Getting started with wrapper feature selection.
Do a sequential forward selection on the Palmer Penguins data set.
The gallery features a collection of case studies and demos about optimization.
Utilize the built-in feature importance of models with Recursive Feature Elimination.
Run a feature selection with Shadow Variable Search.
For analyzing the feature selection results, it is recommended to pass the archive to as.data.table().
The returned data table is joined with the benchmark result which adds the mlr3::ResampleResult for each feature set.
The archive provides various getters (e.g. $learners()) to ease the access.
All getters extract by position (i) or unique hash (uhash).
For a complete list of all getters see the methods section.
The benchmark result ($benchmark_result) allows scoring the feature sets again on a different measure.
Alternatively, measures can be supplied to as.data.table().
bbotk::OptimInstance -> bbotk::OptimInstanceBatch -> bbotk::OptimInstanceBatchSingleCrit -> FSelectInstanceBatchSingleCrit
result_feature_set(character())
Feature set for task subsetting.
new()
Creates a new instance of this R6 class.
FSelectInstanceBatchSingleCrit$new( task, learner, resampling, measure, terminator, store_benchmark_result = TRUE, store_models = FALSE, check_values = FALSE, callbacks = NULL, ties_method = "least_features" )
task(mlr3::Task)
Task to operate on.
learner(mlr3::Learner)
Learner to optimize the feature subset for.
resampling(mlr3::Resampling)
Resampling that is used to evaluate the performance of the feature subsets.
Uninstantiated resamplings are instantiated during construction so that all feature subsets are evaluated on the same data splits.
Already instantiated resamplings are kept unchanged.
measure(mlr3::Measure)
Measure to optimize. If NULL, default measure is used.
terminator(bbotk::Terminator)
Stop criterion of the feature selection.
store_benchmark_result(logical(1))
Store benchmark result in archive?
store_models(logical(1))
Store models in benchmark result?
check_values(logical(1))
Check the parameters before the evaluation and the results for validity?
callbacks(list of CallbackBatchFSelect)
List of callbacks.
ties_method(character(1))
The method to break ties when selecting sets while optimizing and when selecting the best set.
Can be "least_features" or "random".
The option "least_features" (default) selects the feature set with the fewest features.
If there are multiple best feature sets with the same number of features, one is selected randomly.
The option "random" returns a random feature set from the best feature sets.
Ignored if multiple measures are used.
assign_result()
The FSelector writes the best found feature subset and estimated performance value here. For internal use.
FSelectInstanceBatchSingleCrit$assign_result(xdt, y, extra = NULL, ...)
xdt(data.table::data.table())
x values as data.table. Each row is one point. Contains the value in the search space of the FSelectInstanceBatchSingleCrit object. Can contain additional columns for extra information.
y(numeric(1))
Optimal outcome.
extra(data.table::data.table())
Additional information.
...(any)
ignored.
print()
Printer.
FSelectInstanceBatchSingleCrit$print(...)
...(ignored).
clone()
The objects of this class are cloneable with this method.
FSelectInstanceBatchSingleCrit$clone(deep = FALSE)
deep
Whether to make a deep clone.
    # Feature selection on Palmer Penguins data set
    task = tsk("penguins")
    learner = lrn("classif.rpart")

    # Construct feature selection instance
    instance = fsi(
      task = task,
      learner = learner,
      resampling = rsmp("cv", folds = 3),
      measures = msr("classif.ce"),
      terminator = trm("evals", n_evals = 4)
    )

    # Choose optimization algorithm
    fselector = fs("random_search", batch_size = 2)

    # Run feature selection
    fselector$optimize(instance)

    # Subset task to optimal feature set
    task$select(instance$result_feature_set)

    # Train the learner with optimal feature set on the full data set
    learner$train(task)

    # Inspect all evaluated sets
    as.data.table(instance$archive)
The FSelector implements the optimization algorithm.
FSelector is an abstract base class that implements the base functionality each fselector must provide.
There are several sections about feature selection in the mlr3book.
Learn more about fselectors.
The gallery features a collection of case studies and demos about optimization.
Utilize the built-in feature importance of models with Recursive Feature Elimination.
Run a feature selection with Shadow Variable Search.
id(character(1))
Identifier of the object.
Used in tables, plot and text output.
param_set(paradox::ParamSet)
Set of control parameters.
properties(character())
Set of properties of the fselector.
Must be a subset of mlr_reflections$fselect_properties.
packages(character())
Set of required packages.
Note that these packages will be loaded via requireNamespace(), and are not attached.
label(character(1))
Label for this object.
Can be used in tables, plot and text output instead of the ID.
man(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object.
The referenced help page can be opened via method $help().
new()
Creates a new instance of this R6 class.
FSelector$new( id = "fselector", param_set, properties, packages = character(), label = NA_character_, man = NA_character_ )
id(character(1))
Identifier for the new instance.
param_set(paradox::ParamSet)
Set of control parameters.
properties(character())
Set of properties of the fselector.
Must be a subset of mlr_reflections$fselect_properties.
packages(character())
Set of required packages.
Note that these packages will be loaded via requireNamespace(), and are not attached.
label(character(1))
Label for this object.
Can be used in tables, plot and text output instead of the ID.
man(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object.
The referenced help page can be opened via method $help().
format()
Helper for print outputs.
FSelector$format(...)
...(ignored).
(character()).
print()
Print method.
FSelector$print()
(character()).
help()
Opens the corresponding help page referenced by field $man.
FSelector$help()
clone()
The objects of this class are cloneable with this method.
FSelector$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other FSelector:
mlr_fselectors,
mlr_fselectors_design_points,
mlr_fselectors_exhaustive_search,
mlr_fselectors_genetic_search,
mlr_fselectors_random_search,
mlr_fselectors_rfe,
mlr_fselectors_rfecv,
mlr_fselectors_sequential,
mlr_fselectors_shadow_variable_search
The FSelectorAsync implements the asynchronous optimization algorithm.
FSelectorAsync is an abstract base class that implements the base functionality each asynchronous fselector must provide.
There are several sections about feature selection in the mlr3book.
Learn more about fselectors.
The gallery features a collection of case studies and demos about optimization.
Utilize the built-in feature importance of models with Recursive Feature Elimination.
Run a feature selection with Shadow Variable Search.
mlr3fselect::FSelector -> FSelectorAsync
optimize()
Performs the feature selection on a FSelectInstanceAsyncSingleCrit or FSelectInstanceAsyncMultiCrit until termination. The single evaluations will be written into the ArchiveAsyncFSelect that resides in the FSelectInstanceAsyncSingleCrit/FSelectInstanceAsyncMultiCrit. The result will be written into the instance object.
FSelectorAsync$optimize(inst)
clone()
The objects of this class are cloneable with this method.
FSelectorAsync$clone(deep = FALSE)
deep
Whether to make a deep clone.
The FSelectorBatch implements the optimization algorithm.
FSelectorBatch is an abstract base class that implements the base functionality each fselector must provide. A subclass is implemented in the following way:
Inherit from FSelectorBatch.
Specify the private abstract method $.optimize() and use it to call into your optimizer.
You need to call instance$eval_batch() to evaluate design points.
The batch evaluation is requested at the FSelectInstanceBatchSingleCrit/FSelectInstanceBatchMultiCrit object instance, so each batch is possibly executed in parallel via mlr3::benchmark(), and all evaluations are stored inside of instance$archive.
Before the batch evaluation, the bbotk::Terminator is checked, and if it signals termination, an exception of class "terminated_error" is generated.
In that case the current batch of evaluations is still stored in instance, but the numeric scores are not sent back to the handling optimizer as it has lost execution control.
After such an exception is caught, the best set is selected from instance$archive and returned.
Note that more points than specified by the bbotk::Terminator may therefore be evaluated, because the Terminator is only checked before each batch evaluation and not between the evaluations within a batch. How many more depends on the batch size.
Overwrite the private super-method .assign_result() if you want to decide how to estimate the final set in the instance and its estimated performance.
The default behavior is: We pick the best resample experiment, regarding the given measure, then assign its set and aggregated performance to the instance.
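The note about evaluating more points than the terminator specifies can be observed with a budget that is not a multiple of the batch size. This is a minimal sketch with an arbitrary task and learner. The exact count is not asserted here, only that the archive holds at least the requested number of evaluations.

```r
library(mlr3)
library(mlr3fselect)

# Budget of 3 evaluations, but feature subsets are proposed in batches of 2
instance = fselect(
  fselector = fs("random_search", batch_size = 2),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 3
)

# The terminator is only checked before a batch starts,
# so the archive can hold more than 3 evaluations
n_evaluated = instance$archive$n_evals
n_evaluated
```

Choosing a batch size that divides the evaluation budget avoids the extra evaluations.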
.optimize(instance) -> NULL
Abstract base method. Implement to specify feature selection of your subclass.
See technical details sections.
.assign_result(instance) -> NULL
Abstract base method. Implement to specify how the final feature subset is selected.
See technical details sections.
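The subclassing steps described above can be sketched with a deliberately simple fselector. The class name FSelectorBatchSingleFeature, its id, and its label are hypothetical and not part of mlr3fselect. The sketch assumes that an empty paradox::ps() is accepted as param_set and that "single-crit" is a valid entry of mlr_reflections$fselect_properties. It keeps the default .assign_result().

```r
library(mlr3)
library(mlr3fselect)
library(R6)
library(data.table)

# Hypothetical fselector: evaluates every single-feature subset in one batch
FSelectorBatchSingleFeature = R6Class("FSelectorBatchSingleFeature",
  inherit = FSelectorBatch,
  public = list(
    initialize = function() {
      super$initialize(
        id = "single_feature",
        param_set = paradox::ps(),
        properties = "single-crit",
        label = "Single Feature Search"
      )
    }
  ),
  private = list(
    .optimize = function(inst) {
      feature_names = inst$archive$cols_x
      # Row i of this logical matrix selects only feature i
      X = diag(length(feature_names)) == 1
      colnames(X) = feature_names
      # Design points are evaluated through the instance
      inst$eval_batch(as.data.table(X))
    }
  )
)

instance = fsi(
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  terminator = trm("none")
)

fselector = FSelectorBatchSingleFeature$new()
fselector$optimize(instance)

# Best single feature according to the default result assignment
instance$result_feature_set
```

Because .optimize() returns after a single batch, the run ends even with trm("none"). A search that loops indefinitely would instead rely on the terminator to stop it.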
There are several sections about feature selection in the mlr3book.
Learn more about fselectors.
The gallery features a collection of case studies and demos about optimization.
Utilize the built-in feature importance of models with Recursive Feature Elimination.
Run a feature selection with Shadow Variable Search.
mlr3fselect::FSelector -> FSelectorBatch
new()
Creates a new instance of this R6 class.
FSelectorBatch$new( id = "fselector_batch", param_set, properties, packages = character(), label = NA_character_, man = NA_character_ )
id(character(1))
Identifier for the new instance.
param_set(paradox::ParamSet)
Set of control parameters.
properties(character())
Set of properties of the fselector.
Must be a subset of mlr_reflections$fselect_properties.
packages(character())
Set of required packages.
Note that these packages will be loaded via requireNamespace(), and are not attached.
label(character(1))
Label for this object.
Can be used in tables, plot and text output instead of the ID.
man(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object.
The referenced help page can be opened via method $help().
optimize()
Performs the feature selection on a FSelectInstanceBatchSingleCrit or FSelectInstanceBatchMultiCrit until termination. The single evaluations will be written into the ArchiveBatchFSelect that resides in the FSelectInstanceBatchSingleCrit / FSelectInstanceBatchMultiCrit. The result will be written into the instance object.
FSelectorBatch$optimize(inst)
clone()
The objects of this class are cloneable with this method.
FSelectorBatch$clone(deep = FALSE)
deep
Whether to make a deep clone.
Function to construct a FSelectInstanceBatchSingleCrit or FSelectInstanceBatchMultiCrit.
fsi( task, learner, resampling, measures = NULL, terminator, store_benchmark_result = TRUE, store_models = FALSE, check_values = FALSE, callbacks = NULL, ties_method = "least_features" )
| Argument | Type |
|---|---|
| task | (mlr3::Task) |
| learner | (mlr3::Learner) |
| resampling | (mlr3::Resampling) |
| measures | (mlr3::Measure or list of mlr3::Measure) |
| terminator | (bbotk::Terminator) |
| store_benchmark_result | ( |
| store_models | ( |
| check_values | ( |
| callbacks | (list of CallbackBatchFSelect) |
| ties_method | ( |
There are several sections about feature selection in the mlr3book.
Getting started with wrapper feature selection.
Do a sequential forward selection on the Palmer Penguins data set.
The gallery features a collection of case studies and demos about optimization.
Utilize the built-in feature importance of models with Recursive Feature Elimination.
Run a feature selection with Shadow Variable Search.
If no measure is passed, the default measure is used. The default measure depends on the task type.
| Task | Default Measure | Package |
|---|---|---|
| "classif" | "classif.ce" | mlr3 |
| "regr" | "regr.mse" | mlr3 |
| "surv" | "surv.cindex" | mlr3proba |
| "dens" | "dens.logloss" | mlr3proba |
| "classif_st" | "classif.ce" | mlr3spatial |
| "regr_st" | "regr.mse" | mlr3spatial |
| "clust" | "clust.dunn" | mlr3cluster |
    # Feature selection on Palmer Penguins data set
    task = tsk("penguins")
    learner = lrn("classif.rpart")

    # Construct feature selection instance
    instance = fsi(
      task = task,
      learner = learner,
      resampling = rsmp("cv", folds = 3),
      measures = msr("classif.ce"),
      terminator = trm("evals", n_evals = 4)
    )

    # Choose optimization algorithm
    fselector = fs("random_search", batch_size = 2)

    # Run feature selection
    fselector$optimize(instance)

    # Subset task to optimal feature set
    task$select(instance$result_feature_set)

    # Train the learner with optimal feature set on the full data set
    learner$train(task)

    # Inspect all evaluated sets
    as.data.table(instance$archive)
Function to construct a FSelectInstanceAsyncSingleCrit or FSelectInstanceAsyncMultiCrit.
fsi_async( task, learner, resampling, measures = NULL, terminator, store_benchmark_result = TRUE, store_models = FALSE, check_values = FALSE, callbacks = NULL, ties_method = "least_features", rush = NULL )
| Argument | Type |
|---|---|
| task | (mlr3::Task) |
| learner | (mlr3::Learner) |
| resampling | (mlr3::Resampling) |
| measures | (mlr3::Measure or list of mlr3::Measure) |
| terminator | (bbotk::Terminator) |
| store_benchmark_result | (logical(1)) |
| store_models | (logical(1)) |
| check_values | (logical(1)) |
| callbacks | (list of CallbackBatchFSelect) |
| ties_method | (character(1)) |
| rush | (rush::Rush) |
There are several sections about feature selection in the mlr3book.
Getting started with wrapper feature selection.
Do a sequential forward selection on the Palmer Penguins data set.
The gallery features a collection of case studies and demos about optimization.
Utilize the built-in feature importance of models with Recursive Feature Elimination.
Run a feature selection with Shadow Variable Search.
If no measure is passed, the default measure is used. The default measure depends on the task type.
| Task | Default Measure | Package |
|---|---|---|
| "classif" | "classif.ce" | mlr3 |
| "regr" | "regr.mse" | mlr3 |
| "surv" | "surv.cindex" | mlr3proba |
| "dens" | "dens.logloss" | mlr3proba |
| "classif_st" | "classif.ce" | mlr3spatial |
| "regr_st" | "regr.mse" | mlr3spatial |
| "clust" | "clust.dunn" | mlr3cluster |
# Feature selection on Palmer Penguins data set
task = tsk("penguins")
learner = lrn("classif.rpart")

# Construct feature selection instance
instance = fsi(
  task = task,
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  terminator = trm("evals", n_evals = 4)
)

# Choose optimization algorithm
fselector = fs("random_search", batch_size = 2)

# Run feature selection
fselector$optimize(instance)

# Subset task to optimal feature set
task$select(instance$result_feature_set)

# Train the learner with optimal feature set on the full data set
learner$train(task)

# Inspect all evaluated sets
as.data.table(instance$archive)
A mlr3misc::Dictionary storing objects of class FSelector.
Each fselector has an associated help page, see mlr_fselectors_[id].
For a more convenient way to retrieve and construct fselectors, see fs()/fss().
R6::R6Class object inheriting from mlr3misc::Dictionary.
See mlr3misc::Dictionary.
as.data.table(dict, ..., objects = FALSE)
mlr3misc::Dictionary -> data.table::data.table()
Returns a data.table::data.table() with fields "key", "label", "properties" and "packages" as columns.
If objects is set to TRUE, the constructed objects are returned in the list column named object.
Other FSelector:
FSelector,
mlr_fselectors_design_points,
mlr_fselectors_exhaustive_search,
mlr_fselectors_genetic_search,
mlr_fselectors_random_search,
mlr_fselectors_rfe,
mlr_fselectors_rfecv,
mlr_fselectors_sequential,
mlr_fselectors_shadow_variable_search
as.data.table(mlr_fselectors)
mlr_fselectors$get("random_search")
fs("random_search")
Subclass for asynchronous design points feature selection.
This FSelector can be instantiated with the associated sugar function fs():
fs("async_design_points")
design (data.table::data.table)
Design points to try in search, one per row.
mlr3fselect::FSelector -> mlr3fselect::FSelectorAsync -> mlr3fselect::FSelectorAsyncFromOptimizerAsync -> FSelectorAsyncDesignPoints
new()
Creates a new instance of this R6 class.
FSelectorAsyncDesignPoints$new()
clone()
The objects of this class are cloneable with this method.
FSelectorAsyncDesignPoints$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other FSelectorAsync:
mlr_fselectors_async_exhaustive_search,
mlr_fselectors_async_random_search
Feature Selection using the Asynchronous Exhaustive Search Algorithm. Exhaustive Search generates all possible feature sets. The feature sets are evaluated asynchronously.
The feature selection terminates itself when all feature sets are evaluated. It is not necessary to set a termination criterion.
This FSelector can be instantiated with the associated sugar function fs():
fs("async_exhaustive_search")
max_features (integer(1))
Maximum number of features.
By default, number of features in mlr3::Task.
mlr3fselect::FSelector -> mlr3fselect::FSelectorAsync -> FSelectorAsyncExhaustiveSearch
new()
Creates a new instance of this R6 class.
FSelectorAsyncExhaustiveSearch$new()
optimize()
Starts the asynchronous optimization.
FSelectorAsyncExhaustiveSearch$optimize(inst)
clone()
The objects of this class are cloneable with this method.
FSelectorAsyncExhaustiveSearch$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other FSelectorAsync:
mlr_fselectors_async_design_points,
mlr_fselectors_async_random_search
Feature selection using Asynchronous Random Search Algorithm.
This FSelector can be instantiated with the associated sugar function fs():
fs("async_random_search")
max_features (integer(1))
Maximum number of features.
By default, number of features in mlr3::Task.
mlr3fselect::FSelector -> mlr3fselect::FSelectorAsync -> FSelectorAsyncRandomSearch
new()
Creates a new instance of this R6 class.
FSelectorAsyncRandomSearch$new()
clone()
The objects of this class are cloneable with this method.
FSelectorAsyncRandomSearch$clone(deep = FALSE)
deep
Whether to make a deep clone.
Bergstra J, Bengio Y (2012). “Random Search for Hyper-Parameter Optimization.” Journal of Machine Learning Research, 13(10), 281–305. https://jmlr.csail.mit.edu/papers/v13/bergstra12a.html.
Other FSelectorAsync:
mlr_fselectors_async_design_points,
mlr_fselectors_async_exhaustive_search
Feature selection using user-defined feature sets.
The feature sets are evaluated in order as given.
The feature selection terminates itself when all feature sets are evaluated. It is not necessary to set a termination criterion.
This FSelector can be instantiated with the associated sugar function fs():
fs("design_points")
batch_size (integer(1))
Maximum number of configurations to try in a batch.
design (data.table::data.table)
Design points to try in search, one per row.
mlr3fselect::FSelector -> mlr3fselect::FSelectorBatch -> mlr3fselect::FSelectorBatchFromOptimizerBatch -> FSelectorBatchDesignPoints
new()
Creates a new instance of this R6 class.
FSelectorBatchDesignPoints$new()
clone()
The objects of this class are cloneable with this method.
FSelectorBatchDesignPoints$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other FSelector:
FSelector,
mlr_fselectors,
mlr_fselectors_exhaustive_search,
mlr_fselectors_genetic_search,
mlr_fselectors_random_search,
mlr_fselectors_rfe,
mlr_fselectors_rfecv,
mlr_fselectors_sequential,
mlr_fselectors_shadow_variable_search
# Feature Selection

# retrieve task and load learner
task = tsk("pima")
learner = lrn("classif.rpart")

# create design
design = mlr3misc::rowwise_table(
  ~age, ~glucose, ~insulin, ~mass, ~pedigree, ~pregnant, ~pressure, ~triceps,
  TRUE, FALSE,    TRUE,     TRUE,  FALSE,     TRUE,      FALSE,     TRUE,
  TRUE, TRUE,     FALSE,    TRUE,  FALSE,     TRUE,      FALSE,     FALSE,
  TRUE, FALSE,    TRUE,     TRUE,  FALSE,     TRUE,      FALSE,     FALSE,
  TRUE, FALSE,    TRUE,     TRUE,  FALSE,     TRUE,      TRUE,      TRUE
)

# run feature selection on the Pima Indians diabetes data set
instance = fselect(
  fselector = fs("design_points", design = design),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce")
)

# best performing feature set
instance$result

# all evaluated feature sets
as.data.table(instance$archive)

# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
Feature Selection using the Exhaustive Search Algorithm. Exhaustive Search generates all possible feature sets.
The feature selection terminates itself when all feature sets are evaluated. It is not necessary to set a termination criterion.
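To make the size of the search space concrete, the following base R sketch enumerates every non-empty feature subset, optionally capped at max_features. The helper enumerate_feature_sets and the feature names are hypothetical and only illustrate the idea; this is not the package's internal implementation.

```r
# Hypothetical helper (not part of mlr3fselect): list all non-empty
# feature subsets with at most `max_features` features.
enumerate_feature_sets = function(features, max_features = length(features)) {
  sizes = seq_len(min(max_features, length(features)))
  # one list of subsets per subset size, flattened into a single list
  sets = lapply(sizes, function(k) combn(features, k, simplify = FALSE))
  unlist(sets, recursive = FALSE)
}

# Illustrative feature names
features = c("bill_depth", "bill_length", "body_mass", "flipper_length")

all_sets = enumerate_feature_sets(features)
length(all_sets)  # 2^4 - 1 = 15 candidate sets

capped = enumerate_feature_sets(features, max_features = 2)
length(capped)    # choose(4, 1) + choose(4, 2) = 10 candidate sets
```

The number of candidate sets doubles with every additional feature, so capping max_features is the main lever for keeping an exhaustive search tractable.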
This FSelector can be instantiated with the associated sugar function fs():
fs("exhaustive_search")
max_features (integer(1))
Maximum number of features.
By default, number of features in mlr3::Task.
mlr3fselect::FSelector -> mlr3fselect::FSelectorBatch -> FSelectorBatchExhaustiveSearch
new()
Creates a new instance of this R6 class.
FSelectorBatchExhaustiveSearch$new()
clone()
The objects of this class are cloneable with this method.
FSelectorBatchExhaustiveSearch$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other FSelector:
FSelector,
mlr_fselectors,
mlr_fselectors_design_points,
mlr_fselectors_genetic_search,
mlr_fselectors_random_search,
mlr_fselectors_rfe,
mlr_fselectors_rfecv,
mlr_fselectors_sequential,
mlr_fselectors_shadow_variable_search
# Feature Selection

# retrieve task and load learner
task = tsk("penguins")
learner = lrn("classif.rpart")

# run feature selection on the Palmer Penguins data set
instance = fselect(
  fselector = fs("exhaustive_search"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 10
)

# best performing feature set
instance$result

# all evaluated feature sets
as.data.table(instance$archive)

# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
Feature selection using the Genetic Algorithm from the package genalg.
This FSelector can be instantiated with the associated sugar function fs():
fs("genetic_search")
For the meaning of the control parameters, see genalg::rbga.bin().
genalg::rbga.bin() internally terminates after iters iterations.
We set iters = 100000 to allow the termination via our terminators.
If more iterations are needed, set iters to a higher value in the parameter set.
mlr3fselect::FSelector -> mlr3fselect::FSelectorBatch -> FSelectorBatchGeneticSearch
new()
Creates a new instance of this R6 class.
FSelectorBatchGeneticSearch$new()
clone()
The objects of this class are cloneable with this method.
FSelectorBatchGeneticSearch$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other FSelector:
FSelector,
mlr_fselectors,
mlr_fselectors_design_points,
mlr_fselectors_exhaustive_search,
mlr_fselectors_random_search,
mlr_fselectors_rfe,
mlr_fselectors_rfecv,
mlr_fselectors_sequential,
mlr_fselectors_shadow_variable_search
# Feature Selection

# retrieve task and load learner
task = tsk("penguins")
learner = lrn("classif.rpart")

# run feature selection on the Palmer Penguins data set
instance = fselect(
  fselector = fs("genetic_search"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 10
)

# best performing feature set
instance$result

# all evaluated feature sets
as.data.table(instance$archive)

# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
Feature selection using Random Search Algorithm.
The feature sets are randomly drawn.
The sets are evaluated in batches of size batch_size.
Larger batches mean we can parallelize more, smaller batches imply a more fine-grained checking of termination criteria.
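The trade-off between batch size and termination checks can be illustrated with a toy loop in base R. Everything here is an assumption made for illustration: draw_feature_set, evaluate, the feature names, and the way sets are sampled are hypothetical and do not reproduce the package's sampling scheme. The loop only checks the evaluation budget between batches.

```r
set.seed(1)

features = c("bill_depth", "bill_length", "body_mass", "flipper_length")

# Hypothetical sampler: draw one random non-empty feature set
draw_feature_set = function(features, max_features = length(features)) {
  k = sample.int(max_features, 1)
  sort(sample(features, k))
}

# Toy placeholder for a resampled performance estimate
evaluate = function(set) length(set) / 10

n_evals = 5     # evaluation budget
batch_size = 2  # feature sets evaluated per batch

archive = list()
while (length(archive) < n_evals) {
  # a whole batch is drawn and evaluated before the budget is checked again
  batch = replicate(batch_size, draw_feature_set(features), simplify = FALSE)
  scores = vapply(batch, evaluate, numeric(1))
  archive = c(archive, Map(function(s, y) list(features = s, score = y), batch, scores))
}

length(archive)  # 6: the last batch overshoots the budget of 5 by one
```

With batch_size = 1 the same loop would stop at exactly 5 evaluations, at the cost of giving a parallel backend only one feature set to work on at a time.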
This FSelector can be instantiated with the associated sugar function fs():
fs("random_search")
max_features (integer(1))
Maximum number of features.
By default, number of features in mlr3::Task.
batch_size (integer(1))
Maximum number of feature sets to try in a batch.
mlr3fselect::FSelector -> mlr3fselect::FSelectorBatch -> FSelectorBatchRandomSearch
new()
Creates a new instance of this R6 class.
FSelectorBatchRandomSearch$new()
clone()
The objects of this class are cloneable with this method.
FSelectorBatchRandomSearch$clone(deep = FALSE)
deep
Whether to make a deep clone.
Bergstra J, Bengio Y (2012). “Random Search for Hyper-Parameter Optimization.” Journal of Machine Learning Research, 13(10), 281–305. https://jmlr.csail.mit.edu/papers/v13/bergstra12a.html.
Other FSelector:
FSelector,
mlr_fselectors,
mlr_fselectors_design_points,
mlr_fselectors_exhaustive_search,
mlr_fselectors_genetic_search,
mlr_fselectors_rfe,
mlr_fselectors_rfecv,
mlr_fselectors_sequential,
mlr_fselectors_shadow_variable_search
# Feature Selection

# retrieve task and load learner
task = tsk("penguins")
learner = lrn("classif.rpart")

# run feature selection on the Palmer Penguins data set
instance = fselect(
  fselector = fs("random_search"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 10
)

# best performing feature subset
instance$result

# all evaluated feature subsets
as.data.table(instance$archive)

# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
Feature selection using the Recursive Feature Elimination (RFE) algorithm. Recursive feature elimination iteratively removes features with a low importance score. Only works with mlr3::Learners that can calculate importance scores (see the section on optional extractors in mlr3::Learner).
The learner is trained on all features at the start and importance scores are calculated for each feature.
Then the least important feature is removed and the learner is trained on the reduced feature set.
The importance scores are calculated again and the procedure is repeated until the desired number of features is reached.
The non-recursive option (recursive = FALSE) only uses the importance scores calculated in the first iteration.
The feature selection terminates itself when n_features is reached.
It is not necessary to set a termination criterion.
When using a cross-validation resampling strategy, the importance scores of the resampling iterations are aggregated.
The parameter aggregation determines how the importance scores are aggregated.
By default ("rank"), the importance score vector of each fold is ranked and the feature with the lowest average rank is removed.
The option "mean" averages the score of each feature across the resampling iterations and removes the feature with the lowest average score.
Averaging the scores is not appropriate for most importance measures.
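The difference between the two aggregation methods can be seen in a small base R sketch. The importance values below are made up for illustration, and the code is only a sketch of the idea described above, not the package's internal implementation. A single outlier in one fold is enough to make "rank" and "mean" disagree about which feature to drop.

```r
# Rows are resampling iterations, columns are features (toy values)
importance = rbind(
  fold1 = c(a = 0.9, b = 0.2, c = 0.3),
  fold2 = c(a = 0.8, b = 0.1, c = 0.3),
  fold3 = c(a = 0.7, b = 5.0, c = 0.6),  # outlier score for feature b
  fold4 = c(a = 0.9, b = 0.1, c = 0.4)
)

# "rank": rank the scores within each fold (1 = least important),
# then remove the feature with the lowest average rank
ranks = t(apply(importance, 1, rank))
mean_rank = colMeans(ranks)
remove_by_rank = names(which.min(mean_rank))

# "mean": average the raw scores and remove the feature with the lowest average
mean_score = colMeans(importance)
remove_by_mean = names(which.min(mean_score))

remove_by_rank  # "b": least important in three of four folds
remove_by_mean  # "c": the outlier in fold3 inflates the mean score of b
```

Ranking discards the scale of the scores, which makes the aggregation robust to folds in which an importance measure produces unusually large values.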
The ArchiveBatchFSelect holds the following additional columns:
"importance" (numeric())
The importance score vector of the feature subset.
The gallery features a collection of case studies and demos about optimization.
Utilize the built-in feature importance of models with Recursive Feature Elimination.
This FSelector can be instantiated with the associated sugar function fs():
fs("rfe")
n_features (integer(1))
The minimum number of features to select, by default half of the features.
feature_fraction (double(1))
Fraction of features to retain in each iteration.
The default of 0.5 retains half of the features.
feature_number (integer(1))
Number of features to remove in each iteration.
subset_sizes (integer())
Vector of the number of features to retain in each iteration.
Must be sorted in decreasing order.
recursive (logical(1))
If TRUE (default), the feature importance is calculated in each iteration.
aggregation (character(1))
The aggregation method for the importance scores of the resampling iterations.
See details.
The parameters feature_fraction, feature_number and subset_sizes are mutually exclusive.
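As a rough illustration of how the three mutually exclusive parameters each define a sequence of feature set sizes, consider the base R sketch below. The helper rfe_schedule is hypothetical, and its rounding (floor) and edge-case handling are assumptions; the package may compute the sizes differently.

```r
# Hypothetical helper (not part of mlr3fselect): sizes of the feature sets
# that would be evaluated, starting from all n features.
rfe_schedule = function(n, n_features = 1, feature_fraction = NULL,
                        feature_number = NULL, subset_sizes = NULL) {
  # exactly one of the three parameters may be set
  n_set = sum(!vapply(list(feature_fraction, feature_number, subset_sizes),
                      is.null, logical(1)))
  stopifnot(n_set == 1)

  if (!is.null(subset_sizes)) {
    # sizes must be sorted in decreasing order and smaller than n
    stopifnot(!is.unsorted(rev(subset_sizes)), all(subset_sizes < n))
    return(c(n, subset_sizes))
  }

  sizes = n
  repeat {
    current = sizes[length(sizes)]
    nxt = if (!is.null(feature_fraction)) {
      floor(current * feature_fraction)  # assumed rounding
    } else {
      current - feature_number
    }
    nxt = max(nxt, n_features)           # never go below n_features
    if (nxt >= current) break
    sizes = c(sizes, nxt)
  }
  sizes
}

rfe_schedule(16, n_features = 2, feature_fraction = 0.5)  # 16 8 4 2
rfe_schedule(10, n_features = 4, feature_number = 3)      # 10 7 4
rfe_schedule(10, subset_sizes = c(6, 3, 1))               # 10 6 3 1
```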
mlr3fselect::FSelector -> mlr3fselect::FSelectorBatch -> FSelectorBatchRFE
new()
Creates a new instance of this R6 class.
FSelectorBatchRFE$new()
clone()
The objects of this class are cloneable with this method.
FSelectorBatchRFE$clone(deep = FALSE)
deep
Whether to make a deep clone.
Guyon I, Weston J, Barnhill S, Vapnik V (2002). “Gene Selection for Cancer Classification using Support Vector Machines.” Machine Learning, 46(1), 389–422. ISSN 1573-0565, doi:10.1023/A:1012487302797.
Other FSelector:
FSelector,
mlr_fselectors,
mlr_fselectors_design_points,
mlr_fselectors_exhaustive_search,
mlr_fselectors_genetic_search,
mlr_fselectors_random_search,
mlr_fselectors_rfecv,
mlr_fselectors_sequential,
mlr_fselectors_shadow_variable_search
# Feature Selection

# retrieve task and load learner
task = tsk("penguins")
learner = lrn("classif.rpart")

# run feature selection on the Palmer Penguins data set
instance = fselect(
  fselector = fs("rfe"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  store_models = TRUE
)

# best performing feature subset
instance$result

# all evaluated feature subsets
as.data.table(instance$archive)

# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
Feature selection using the Recursive Feature Elimination with Cross-Validation (RFE-CV) algorithm. See FSelectorBatchRFE for a description of the base algorithm. RFE-CV runs a recursive feature elimination in each iteration of a cross-validation to determine the optimal number of features. Then a recursive feature elimination is run again on the complete dataset with the optimal number of features as the final feature set size. The performance of the optimal feature set is calculated on the complete data set and should not be reported as the performance of the final model. Only works with mlr3::Learners that can calculate importance scores (see the section on optional extractors in mlr3::Learner).
The resampling strategy is changed during the feature selection.
The resampling strategy passed to the instance (resampling) is used to determine the optimal number of features.
Usually, a cross-validation strategy is used and a recursive feature elimination is run in each iteration of the cross-validation.
Internally, mlr3::ResamplingCustom is used to emulate this part of the algorithm.
In the final recursive feature elimination run the resampling strategy is changed to mlr3::ResamplingInsample i.e. the complete data set is used for training and testing.
The feature selection terminates itself when the optimal number of features is reached. It is not necessary to set a termination criterion.
The ArchiveBatchFSelect holds the following additional columns:
"iteration" (integer(1))
The resampling iteration in which the feature subset was evaluated.
"importance" (numeric())
The importance score vector of the feature subset.
The gallery features a collection of case studies and demos about optimization.
Utilize the built-in feature importance of models with Recursive Feature Elimination.
This FSelector can be instantiated with the associated sugar function fs():
fs("rfecv")
n_features (integer(1))
The number of features to select.
By default half of the features are selected.
feature_fraction (double(1))
Fraction of features to retain in each iteration.
The default of 0.5 retains half of the features.
feature_number (integer(1))
Number of features to remove in each iteration.
subset_sizes (integer())
Vector of the number of features to retain in each iteration.
Must be sorted in decreasing order.
recursive (logical(1))
If TRUE (default), the feature importance is calculated in each iteration.
The parameters feature_fraction, feature_number and subset_sizes are mutually exclusive.
mlr3fselect::FSelector -> mlr3fselect::FSelectorBatch -> FSelectorBatchRFECV
new()
Creates a new instance of this R6 class.
FSelectorBatchRFECV$new()
clone()
The objects of this class are cloneable with this method.
FSelectorBatchRFECV$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other FSelector:
FSelector,
mlr_fselectors,
mlr_fselectors_design_points,
mlr_fselectors_exhaustive_search,
mlr_fselectors_genetic_search,
mlr_fselectors_random_search,
mlr_fselectors_rfe,
mlr_fselectors_sequential,
mlr_fselectors_shadow_variable_search
# Feature Selection

# retrieve task and load learner
task = tsk("penguins")
learner = lrn("classif.rpart")

# run feature selection on the Palmer Penguins data set
instance = fselect(
  fselector = fs("rfecv"),
  task = task,
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  store_models = TRUE
)

# best performing feature subset
instance$result

# all evaluated feature subsets
as.data.table(instance$archive)

# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
Feature selection using Sequential Search Algorithm.
Sequential forward selection (strategy = "sfs") extends the feature set in each iteration with the feature that increases the model's performance the most.
Sequential backward selection (strategy = "sbs") follows the same idea but starts with all features and removes features from the set.
The feature selection terminates itself when min_features or max_features is reached.
It is not necessary to set a termination criterion.
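The greedy logic of sequential forward selection can be sketched in a few lines of base R. The function sfs, the weights vector and the toy score function are all hypothetical; in the package the score would be a resampled performance estimate, and this sketch does not reproduce the package's internals.

```r
# Toy score that stands in for a performance estimate (higher is better);
# feature d hurts the score, so the best set does not contain it
weights = c(a = 0.5, b = 0.3, c = 0.05, d = -0.1)
score = function(set) sum(weights[set])

# Hypothetical greedy forward search: in each iteration, add the feature
# that yields the best score together with the features selected so far
sfs = function(features, score, max_features = length(features)) {
  selected = character(0)
  path = list()
  while (length(selected) < max_features) {
    candidates = setdiff(features, selected)
    scores = vapply(candidates, function(f) score(c(selected, f)), numeric(1))
    best = names(which.max(scores))
    selected = c(selected, best)
    path[[length(path) + 1]] = list(features = selected, score = max(scores))
  }
  path
}

path = sfs(names(weights), score)

# The search runs until max_features is reached; the result is the best
# feature set found along the path
path_scores = vapply(path, function(p) p$score, numeric(1))
best_set = path[[which.max(path_scores)]]$features
best_set  # "a" "b" "c"
```

The list path plays the role of the optimization path: one entry per iteration with the features selected up to that point.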
This FSelector can be instantiated with the associated sugar function fs():
fs("sequential")
min_features (integer(1))
Minimum number of features. By default, 1.
max_features (integer(1))
Maximum number of features. By default, number of features in mlr3::Task.
strategy (character(1))
Search method sfs (forward search) or sbs (backward search).
mlr3fselect::FSelector -> mlr3fselect::FSelectorBatch -> FSelectorBatchSequential
new()
Creates a new instance of this R6 class.
FSelectorBatchSequential$new()
optimization_path()
Returns the optimization path.
FSelectorBatchSequential$optimization_path(inst, include_uhash = FALSE)
inst (FSelectInstanceBatchSingleCrit)
Instance optimized with FSelectorBatchSequential.
include_uhash (logical(1))
Include uhash column?
clone()
The objects of this class are cloneable with this method.
FSelectorBatchSequential$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other FSelector:
FSelector,
mlr_fselectors,
mlr_fselectors_design_points,
mlr_fselectors_exhaustive_search,
mlr_fselectors_genetic_search,
mlr_fselectors_random_search,
mlr_fselectors_rfe,
mlr_fselectors_rfecv,
mlr_fselectors_shadow_variable_search
# Feature Selection

# retrieve task and load learner
task = tsk("penguins")
learner = lrn("classif.rpart")

# run feature selection on the Palmer Penguins data set
instance = fselect(
  fselector = fs("sequential"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 10
)

# best performing feature set
instance$result

# all evaluated feature sets
as.data.table(instance$archive)

# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
Feature selection using the Shadow Variable Search Algorithm. Shadow variable search creates a permuted copy of each feature (a shadow variable) and stops when one of these copies is selected.
The feature selection terminates itself when the first shadow variable is selected. It is not necessary to set a termination criterion.
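The idea can be sketched in base R on simulated data. The data, the shadow_ naming prefix and the use of in-sample R squared from lm() as the score are assumptions made for illustration; the package evaluates feature sets with the learner, resampling and measure of the instance instead.

```r
set.seed(42)

# Simulated data: y depends on x1 and x2, `noise` is irrelevant
n = 200
data = data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
data$y = 2 * data$x1 - data$x2 + rnorm(n, sd = 0.5)
features = c("x1", "x2", "noise")

# Create one permuted copy (shadow variable) per feature; permuting keeps
# the marginal distribution but breaks the relation to the target
shadows = as.data.frame(lapply(data[features], sample))
names(shadows) = paste0("shadow_", features)
augmented = cbind(data, shadows)

# Forward search over real and shadow features; stop as soon as a
# shadow variable would be the best feature to add
candidates = c(features, names(shadows))
selected = character(0)
repeat {
  remaining = setdiff(candidates, selected)
  r2 = vapply(remaining, function(f) {
    fit = lm(reformulate(c(selected, f), response = "y"), data = augmented)
    summary(fit)$r.squared
  }, numeric(1))
  best = names(which.max(r2))
  if (startsWith(best, "shadow_")) break
  selected = c(selected, best)
}

selected  # real features chosen before the first shadow variable
```

Because a shadow variable carries no information about the target by construction, selecting one signals that the remaining real features are no more useful than noise.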
The gallery features a collection of case studies and demos about optimization.
Run a feature selection with Shadow Variable Search.
This FSelector can be instantiated with the associated sugar function fs():
fs("shadow_variable_search")
mlr3fselect::FSelector -> mlr3fselect::FSelectorBatch -> FSelectorBatchShadowVariableSearch
new()
Creates a new instance of this R6 class.
FSelectorBatchShadowVariableSearch$new()
optimization_path()
Returns the optimization path.
FSelectorBatchShadowVariableSearch$optimization_path(inst)
inst (FSelectInstanceBatchSingleCrit)
Instance optimized with FSelectorBatchShadowVariableSearch.
clone()
The objects of this class are cloneable with this method.
FSelectorBatchShadowVariableSearch$clone(deep = FALSE)
deep
Whether to make a deep clone.
Thomas J, Hepp T, Mayr A, Bischl B (2017). “Probing for Sparse and Fast Variable Selection with Model-Based Boosting.” Computational and Mathematical Methods in Medicine, 2017, 1–8. doi:10.1155/2017/1421409.
Wu Y, Boos DD, Stefanski LA (2007). “Controlling Variable Selection by the Addition of Pseudovariables.” Journal of the American Statistical Association, 102(477), 235–243. doi:10.1198/016214506000000843.
Other FSelector:
FSelector,
mlr_fselectors,
mlr_fselectors_design_points,
mlr_fselectors_exhaustive_search,
mlr_fselectors_genetic_search,
mlr_fselectors_random_search,
mlr_fselectors_rfe,
mlr_fselectors_rfecv,
mlr_fselectors_sequential
# Feature Selection

# retrieve task and load learner
task = tsk("penguins")
learner = lrn("classif.rpart")

# run feature selection on the Palmer Penguins data set
instance = fselect(
  fselector = fs("shadow_variable_search"),
  task = task,
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce")
)

# best performing feature subset
instance$result

# all evaluated feature subsets
as.data.table(instance$archive)

# subset the task and fit the final model
task$select(instance$result_feature_set)
learner$train(task)
This CallbackAsyncFSelect freezes the ArchiveAsyncFSelect to ArchiveAsyncFSelectFrozen after the optimization has finished.
clbk("mlr3fselect.async_freeze_archive")
This CallbackBatchFSelect writes the mlr3::BenchmarkResult after each batch to disk.
clbk("mlr3fselect.backup", path = "backup.rds")

# Run feature selection on the Pima Indians diabetes data set
instance = fselect(
  fselector = fs("random_search"),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 4,
  callbacks = clbk("mlr3fselect.backup", path = tempfile(fileext = ".rds")))
This callback runs internal tuning alongside the feature selection. The internal tuning values are aggregated and stored in the results. The final model is trained with the best feature set and the tuned value.
clbk("mlr3fselect.internal_tuning")
Selects the smallest feature set within one standard error of the best as the result. If there are multiple such feature sets with the same number of features, the first one is selected. If the sets have exactly the same performance but different number of features, the one with the smallest number of features is selected.
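The selection rule can be sketched on a toy archive in base R. The archive values and the standard error se_best are made up, and how the standard error is estimated is deliberately left open here; this is an illustration of the rule as described above, not the callback's implementation.

```r
# Toy archive: one row per evaluated feature set (lower classif.ce is better)
archive = data.frame(
  n_features = c(8, 5, 3, 2, 3),
  classif.ce = c(0.20, 0.21, 0.22, 0.30, 0.215)
)

# Assumed standard error of the performance estimate
se_best = 0.025

# Keep every feature set within one standard error of the best score
best = min(archive$classif.ce)
within = archive[archive$classif.ce <= best + se_best, ]

# Among those, pick the smallest set; which.min() returns the first
# minimum, so ties in the number of features go to the earlier set
chosen = within[which.min(within$n_features), ]
chosen$n_features  # 3
chosen$classif.ce  # 0.22, the first of the two three-feature sets
```

The best score (0.20) needs eight features, while the chosen set gives up two percentage points of classification error in exchange for five fewer features.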
Kuhn, Max, Johnson, Kjell (2013). “Applied Predictive Modeling.” In chapter Over-Fitting and Model Tuning, 61–92. Springer New York, New York, NY. ISBN 978-1-4614-6849-3.
clbk("mlr3fselect.one_se_rule")

# Run feature selection on the pima data set with the callback
instance = fselect(
  fselector = fs("random_search"),
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  term_evals = 10,
  callbacks = clbk("mlr3fselect.one_se_rule"))

# Smallest feature set within one standard error of the best
instance$result
Runs a recursive feature elimination with a mlr3learners::LearnerClassifSVM.
The SVM must be configured with type = "C-classification" and kernel = "linear".
Guyon I, Weston J, Barnhill S, Vapnik V (2002). “Gene Selection for Cancer Classification using Support Vector Machines.” Machine Learning, 46(1), 389–422. ISSN 1573-0565, doi:10.1023/A:1012487302797.
clbk("mlr3fselect.svm_rfe")

library(mlr3learners)

# Create instance with classification svm with linear kernel
instance = fsi(
  task = tsk("sonar"),
  learner = lrn("classif.svm", type = "C-classification", kernel = "linear"),
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  terminator = trm("none"),
  callbacks = clbk("mlr3fselect.svm_rfe"),
  store_models = TRUE
)

fselector = fs("rfe", feature_number = 5, n_features = 10)

# Run recursive feature elimination on the Sonar data set
fselector$optimize(instance)
Stores the objective function that estimates the performance of feature subsets. This class is usually constructed internally by the FSelectInstanceBatchSingleCrit / FSelectInstanceBatchMultiCrit.
bbotk::Objective -> ObjectiveFSelect
task(mlr3::Task).
learner(mlr3::Learner).
resampling(mlr3::Resampling).
measures(list of mlr3::Measure).
store_models(logical(1)).
store_benchmark_result(logical(1)).
callbacks(list of CallbackBatchFSelect).
new()
Creates a new instance of this R6 class.
ObjectiveFSelect$new( task, learner, resampling, measures, check_values = TRUE, store_benchmark_result = TRUE, store_models = FALSE, callbacks = NULL )
task(mlr3::Task)
Task to operate on.
learner(mlr3::Learner)
Learner to optimize the feature subset for.
resampling(mlr3::Resampling)
Resampling that is used to evaluate the performance of the feature subsets.
Uninstantiated resamplings are instantiated during construction so that all feature subsets are evaluated on the same data splits.
Already instantiated resamplings are kept unchanged.
measures(list of mlr3::Measure)
Measures to optimize.
If NULL, mlr3's default measure is used.
check_values(logical(1))
Check the parameters before the evaluation and the results for validity?
store_benchmark_result(logical(1))
Store benchmark result in archive?
store_models(logical(1))
Store models in benchmark result?
callbacks(list of CallbackBatchFSelect)
List of callbacks.
clone()
The objects of this class are cloneable with this method.
ObjectiveFSelect$clone(deep = FALSE)
deep
Whether to make a deep clone.
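The effect of the `deep` argument can be illustrated with plain R6 objects. The classes `Inner` and `Outer` below are hypothetical and exist only for this sketch; they are not part of mlr3fselect. The behaviour shown is the general R6 cloning rule: a shallow clone shares fields that are themselves R6 objects, while a deep clone copies them.

```r
library(R6)

# Hypothetical classes, used only to illustrate clone semantics.
Inner = R6Class("Inner", public = list(value = 1))
Outer = R6Class("Outer", public = list(
  inner = NULL,
  initialize = function() {
    self$inner = Inner$new()
  }
))

original = Outer$new()
shallow = original$clone()          # deep = FALSE: `inner` is shared
deep = original$clone(deep = TRUE)  # deep = TRUE: `inner` is copied

# Modify the nested object through the original.
original$inner$value = 2

shallow$inner$value  # 2, same Inner object as `original`
deep$inner$value     # 1, independent copy
```

For an objective that holds references to a task, learner and resampling, a deep clone is the safer choice whenever the copy will be modified independently of the original.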
Stores the objective function that estimates the performance of feature subsets. This class is usually constructed internally by the FSelectInstanceAsyncSingleCrit or FSelectInstanceAsyncMultiCrit.
bbotk::Objective -> mlr3fselect::ObjectiveFSelect -> ObjectiveFSelectAsync
clone()
The objects of this class are cloneable with this method.
ObjectiveFSelectAsync$clone(deep = FALSE)
deep
Whether to make a deep clone.
Stores the objective function that estimates the performance of feature subsets. This class is usually constructed internally by the FSelectInstanceBatchSingleCrit / FSelectInstanceBatchMultiCrit.
bbotk::Objective -> mlr3fselect::ObjectiveFSelect -> ObjectiveFSelectBatch
archive(ArchiveBatchFSelect).
new()
Creates a new instance of this R6 class.
ObjectiveFSelectBatch$new( task, learner, resampling, measures, check_values = TRUE, store_benchmark_result = TRUE, store_models = FALSE, archive = NULL, callbacks = NULL )
task(mlr3::Task)
Task to operate on.
learner(mlr3::Learner)
Learner to optimize the feature subset for.
resampling(mlr3::Resampling)
Resampling that is used to evaluate the performance of the feature subsets.
Uninstantiated resamplings are instantiated during construction so that all feature subsets are evaluated on the same data splits.
Already instantiated resamplings are kept unchanged.
measures(list of mlr3::Measure)
Measures to optimize.
If NULL, mlr3's default measure is used.
check_values(logical(1))
Check the parameters before the evaluation and the results for validity?
store_benchmark_result(logical(1))
Store benchmark result in archive?
store_models(logical(1))
Store models in benchmark result?
archive(ArchiveBatchFSelect)
Reference to the archive of FSelectInstanceBatchSingleCrit | FSelectInstanceBatchMultiCrit.
If NULL (default), benchmark result and models cannot be stored.
callbacks(list of CallbackBatchFSelect)
List of callbacks.
clone()
The objects of this class are cloneable with this method.
ObjectiveFSelectBatch$clone(deep = FALSE)
deep
Whether to make a deep clone.
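Although this class is usually constructed internally, the constructor documented above can also be called directly. The following is a minimal sketch under some assumptions: it uses the task, learner, resampling and measure from the other examples in this manual, leaves `archive` at its default of NULL, and inspects the `is_instantiated` field of the mlr3 resampling stored in the objective's `resampling` field.

```r
library(mlr3)
library(mlr3fselect)

# Uninstantiated resampling: no train/test splits have been drawn yet.
resampling = rsmp("cv", folds = 3)

objective = ObjectiveFSelectBatch$new(
  task = tsk("pima"),
  learner = lrn("classif.rpart"),
  resampling = resampling,
  measures = list(msr("classif.ce"))
)

# Per the argument description, the objective's resampling should now be
# instantiated, so that every feature subset is scored on the same splits.
objective$resampling$is_instantiated
```

Because no archive is passed, benchmark results and models are not stored, as noted in the description of the `archive` argument.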