Title: | Filter Based Feature Selection for 'mlr3' |
---|---|
Description: | Extends 'mlr3' with filter methods for feature selection. Besides standalone filter methods, built-in methods of any machine-learning algorithm are supported. Partial scoring of multivariate filter methods is supported. |
Authors: | Patrick Schratz [aut], Michel Lang [cre, aut], Bernd Bischl [aut], Martin Binder [aut], John Zobolas [aut] |
Maintainer: | Michel Lang <[email protected]> |
License: | LGPL-3 |
Version: | 0.8.0 |
Built: | 2024-10-31 18:38:54 UTC |
Source: | https://github.com/mlr-org/mlr3filters |
Useful links:
Report bugs at https://github.com/mlr-org/mlr3filters/issues
Base class for filters. Predefined filters are stored in the dictionary mlr_filters. A Filter calculates a score for each feature of a task. Important features get a large value and unimportant features get a small value. Note that filter scores may also be negative.
Some filters support partial scoring of the feature set: if nfeat is not NULL, only the best nfeat features are guaranteed to get a score. Additional features may be ignored for computational reasons, and then get a score value of NA.
id (character(1))
Identifier of the object. Used in tables, plot and text output.
label (character(1))
Label for this object. Can be used in tables, plot and text output instead of the ID.
task_types (character())
Set of supported task types, e.g. "classif" or "regr". Can be set to the scalar value NA to allow any task type. For a complete list of possible task types (depending on the loaded packages), see mlr_reflections$task_types$type.
task_properties (character())
Required task properties, see mlr3::Task.
param_set (paradox::ParamSet)
Set of hyperparameters.
feature_types (character())
Feature types of the filter.
packages (character())
Packages which this filter is relying on.
man (character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object. Defaults to NA, but can be set by child classes.
scores
Stores the calculated filter score values as named numeric vector. The vector is sorted in decreasing order with possible NA values last. The more important the feature, the higher the score. Tied values (this includes NA values) appear in a random, non-deterministic order.
properties (character())
Properties of the filter. Currently, only "missings" is supported. A filter has the property "missings" if and only if it can handle missing values in the features in a graceful way. Otherwise, an error is thrown if missing values are detected.
hash (character(1))
Hash (unique identifier) for this object.
phash (character(1))
Hash (unique identifier) for this partial object, excluding some components which are varied systematically during tuning (parameter values) or feature selection (feature names).
new()
Create a Filter object.
Filter$new(
  id,
  task_types,
  task_properties = character(),
  param_set = ps(),
  feature_types = character(),
  packages = character(),
  label = NA_character_,
  man = NA_character_
)
id (character(1))
Identifier for the filter.
task_types (character())
Types of the task the filter can operate on, e.g. "classif" or "regr". Can be set to scalar NA to allow any task type.
task_properties (character())
Required task properties, see mlr3::Task. Must be a subset of mlr_reflections$task_properties.
param_set (paradox::ParamSet)
Set of hyperparameters.
feature_types (character())
Feature types the filter operates on. Must be a subset of mlr_reflections$task_feature_types.
packages (character())
Set of required packages. Note that these packages will be loaded via requireNamespace(), and are not attached.
label (character(1))
Label for the new instance.
man (character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object. The referenced help page can be opened via method $help().
format()
Format helper for Filter class.
Filter$format(...)
... (ignored).
print()
Printer for Filter class.
Filter$print()
help()
Opens the corresponding help page referenced by field $man.
Filter$help()
calculate()
Calculates the filter score values for the provided mlr3::Task and stores them in field scores. nfeat determines the minimum number of features to score (see details), and defaults to the number of features in task. Loads required packages and then calls private$.calculate() of the respective subclass.
This private method is expected to return a numeric vector, uniquely named with (a subset of) feature names. The returned vector may have missing values. Features with missing score values as well as features with no calculated score are automatically ranked last, in a random order.
If the task has no rows, each feature gets the score NA.
Filter$calculate(task, nfeat = NULL)
task (mlr3::Task)
mlr3::Task to calculate the filter scores for.
nfeat (integer())
The minimum number of features to calculate filter scores for.
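The ordering rules for scores described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: the helper name rank_scores and the example values are hypothetical. It pads features without a score with NA, shuffles so that ties land in random order, and then sorts in decreasing order with NA last.

```r
# Hypothetical helper illustrating the documented ordering of filter scores.
rank_scores <- function(scores, feature_names) {
  # Features without a calculated score get NA
  missing <- setdiff(feature_names, names(scores))
  scores <- c(scores, stats::setNames(rep(NA_real_, length(missing)), missing))
  # Shuffle so that tied values (including NA) end up in random order
  scores <- scores[sample(length(scores))]
  # Decreasing order, NA values last
  scores[order(scores, decreasing = TRUE, na.last = TRUE)]
}

# Made-up scores for three of five features; note that negative scores are allowed
raw <- c(a = 0.2, c = 0.9, d = -0.1)
ranked <- rank_scores(raw, feature_names = c("a", "b", "c", "d", "e"))
print(ranked)
```

The scored features always come first in decreasing order, while the relative position of the two unscored features can change from run to run.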
clone()
The objects of this class are cloneable with this method.
Filter$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other Filter: mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
These functions complement mlr_filters with functions in the spirit of mlr3::mlr_sugar.
flt(.key, ...)
flts(.keys, ...)
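.key | (character(1)) Key passed to the dictionary to retrieve the filter. |
... | (named list()) Named arguments passed to the constructor, to be set as parameters in the paradox::ParamSet, or to be set as public field. |
.keys | (character()) Keys passed to the dictionary to retrieve multiple filters. |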
flt("correlation", method = "kendall")
flts(c("mrmr", "jmim"))
A simple Dictionary storing objects of class Filter. Each Filter has an associated help page, see mlr_filters_[id].
This dictionary can get populated with additional filters by add-on packages.
For a more convenient way to retrieve and construct filters, see flt().
mlr_filters
R6Class object
See Dictionary.
Other Filter: Filter, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
mlr_filters$keys()
as.data.table(mlr_filters)
mlr_filters$get("mim")
flt("anova")
ANOVA F-Test filter calling stats::aov(). Note that this is equivalent to a t-test for binary classification.
The filter value is -log10(p) where p is the p-value. This transformation is necessary to ensure numerical stability for very small p-values.
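The transformation can be illustrated with base R alone. The following is a minimal sketch, assuming a one-way ANOVA of each numeric feature against the class label; the helper name anova_score is hypothetical and the code is not taken from the package source.

```r
# Hypothetical helper: -log10(p) of a one-way ANOVA of one feature against the target.
anova_score <- function(feature, target) {
  fit <- stats::aov(feature ~ target)
  # First row of the ANOVA table holds the p-value of the target term
  p <- summary(fit)[[1]][["Pr(>F)"]][1]
  -log10(p)
}

# Score all four numeric features of the built-in iris data
scores <- vapply(iris[, 1:4], anova_score, numeric(1), target = iris$Species)
print(sort(scores, decreasing = TRUE))

# Back-transform a score to its p-value
print(10^(-scores[["Sepal.Width"]]))
```

Working on the -log10 scale keeps extremely small p-values distinguishable, since larger scores simply mean stronger evidence against equal group means.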
mlr3filters::Filter -> FilterAnova
new()
Create a FilterAnova object.
FilterAnova$new()
clone()
The objects of this class are cloneable with this method.
FilterAnova$clone(deep = FALSE)
deep
Whether to make a deep clone.
For a benchmark of filter methods:
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
task = mlr3::tsk("iris")
filter = flt("anova")
filter$calculate(task)
head(as.data.table(filter), 3)

# transform to p-value
10^(-filter$scores)

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("anova"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
Area under the (ROC) Curve filter, analogously to mlr3measures::auc() from mlr3measures. Missing values of the features are removed before calculating the AUC. If the AUC is undefined for the input, it is set to 0.5 (random classifier). The absolute value of the difference between the AUC and 0.5 is used as final filter value.
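The scoring rule can be sketched in base R. This is an illustrative sketch only, assuming the feature itself is used as the ranking score and the AUC is computed with the rank-sum (Mann-Whitney) formula; the helper name auc_filter_score is hypothetical and the package may compute the AUC differently.

```r
# Hypothetical helper: |AUC - 0.5| of a single numeric feature for a binary target.
auc_filter_score <- function(x, y, positive) {
  # Remove missing feature values before computing the AUC
  keep <- !is.na(x)
  x <- x[keep]
  y <- y[keep]
  is_pos <- y == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  # AUC undefined (one class missing): treat as 0.5, so the score is 0
  if (n_pos == 0 || n_neg == 0) {
    return(0)
  }
  # Rank-sum formula for the AUC; ties receive average ranks
  r <- rank(x)
  auc <- (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  abs(auc - 0.5)
}

y <- c("neg", "neg", "pos", "pos")
print(auc_filter_score(c(1, 2, 3, 4), y, positive = "pos"))  # perfectly separating feature
print(auc_filter_score(c(4, 3, 2, 1), y, positive = "pos"))  # perfectly separating, reversed
print(auc_filter_score(c(1, 2, 1, 2), y, positive = "pos"))  # uninformative feature
```

Taking the absolute difference to 0.5 means a feature that separates the classes in the "wrong" direction is rated as highly as one that separates them in the expected direction.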
mlr3filters::Filter -> FilterAUC
new()
Create a FilterAUC object.
FilterAUC$new()
clone()
The objects of this class are cloneable with this method.
FilterAUC$clone(deep = FALSE)
deep
Whether to make a deep clone.
For a benchmark of filter methods:
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
task = mlr3::tsk("sonar")
filter = flt("auc")
filter$calculate(task)
head(as.data.table(filter), 3)

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("auc"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
Filter using the Boruta algorithm for feature selection. If keep = "tentative", confirmed and tentative features are returned.
Selected features get a score of 1, deselected features get a score of 0. There is no ordering among the selected features; their order is random.
In combination with mlr3pipelines, only the filter criterion cutoff makes sense.
mlr3filters::Filter -> FilterBoruta
new()
Creates a new instance of this R6 class.
FilterBoruta$new()
clone()
The objects of this class are cloneable with this method.
FilterBoruta$clone(deep = FALSE)
deep
Whether to make a deep clone.
Kursa MB, Rudnicki WR (2010). “Feature Selection with the Boruta Package.” Journal of Statistical Software, 36(11), 1-13.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("Boruta")) {
  task = mlr3::tsk("sonar")
  filter = flt("boruta")
  filter$calculate(task)
  as.data.table(filter)
}
Calculates the Correlation-Adjusted (marginal) coRrelation scores (short CAR scores) implemented in care::carscore() in package care. The CAR scores for a set of features are defined as the correlations between the target and the decorrelated features. The filter returns the absolute value of the calculated scores.
Argument verbose defaults to FALSE.
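The definition of CAR scores can be made concrete with a small base R sketch. It assumes plain empirical correlations and an eigendecomposition for the inverse matrix square root; care::carscore() may use shrinkage estimates instead, so its values can differ. The feature subset and variable names are chosen only for illustration.

```r
# Illustrative CAR scores from empirical correlations (an assumption, see lead-in).
X <- as.matrix(mtcars[, c("wt", "hp", "disp")])
y <- mtcars$mpg

R <- stats::cor(X)        # correlation among the features
r_xy <- stats::cor(X, y)  # marginal correlations with the target

# Inverse matrix square root R^(-1/2), used to decorrelate the features
e <- eigen(R, symmetric = TRUE)
R_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)

# CAR scores: correlations between the target and the decorrelated features
omega <- drop(R_inv_sqrt %*% r_xy)
names(omega) <- colnames(X)

# The filter reports absolute values
scores <- abs(omega)
print(sort(scores, decreasing = TRUE))
```

A useful sanity check for this construction is that the squared CAR scores add up to the R-squared of the linear model on the same features.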
mlr3filters::Filter -> FilterCarScore
new()
Create a FilterCarScore object.
FilterCarScore$new()
clone()
The objects of this class are cloneable with this method.
FilterCarScore$clone(deep = FALSE)
deep
Whether to make a deep clone.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("care")) {
  task = mlr3::tsk("mtcars")
  filter = flt("carscore")
  filter$calculate(task)
  head(as.data.table(filter), 3)

  ## changing the filter settings
  filter = flt("carscore")
  filter$param_set$values = list("diagonal" = TRUE)
  filter$calculate(task)
  head(as.data.table(filter), 3)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "care", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("mtcars")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("carscore"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("regr.rpart"))

  graph$train(task)
}
Calculates CARS scores for right-censored survival tasks. Calls the implementation in carSurv::carSurvScore() in package carSurv.
mlr3filters::Filter -> FilterCarSurvScore
new()
Create a FilterCarSurvScore object.
FilterCarSurvScore$new()
clone()
The objects of this class are cloneable with this method.
FilterCarSurvScore$clone(deep = FALSE)
deep
Whether to make a deep clone.
Bommert A, Welchowski T, Schmid M, Rahnenführer J (2021). “Benchmark of filter methods for feature selection in high-dimensional gene expression survival data.” Briefings in Bioinformatics, 23(1). doi:10.1093/bib/bbab354.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
Minimal conditional mutual information maximization filter calling praznik::CMIM() from package praznik.
This filter supports partial scoring (see Filter).
As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k where k is the number of selected features.
Threading is disabled by default (hyperparameter threads is set to 1). Set to a number >= 2 to enable threading, or to 0 for auto-detecting the number of available cores.
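The selection-order scores described above can be sketched in base R. This is a hypothetical helper following the stated formula 1, (k-1)/k, ..., 1/k, not the package's internal code; it also shows how features that were never selected under partial scoring end up with NA.

```r
# Hypothetical helper: turn a greedy selection order into filter scores.
selection_scores <- function(selected, all_features) {
  k <- length(selected)
  # First selected feature gets 1, the last one gets 1/k
  scores <- stats::setNames(rev(seq_len(k)) / k, selected)
  # Features that were never selected (partial scoring) get NA
  unscored <- setdiff(all_features, selected)
  c(scores, stats::setNames(rep(NA_real_, length(unscored)), unscored))
}

# Example: only two of four features were selected (comparable to nfeat = 2)
res <- selection_scores(
  selected = c("Petal.Width", "Sepal.Length"),
  all_features = c("Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width")
)
print(res)
```

Because the scores only encode rank, differences between them carry no information about how much more relevant one feature is than another.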
mlr3filters::Filter -> FilterCMIM
new()
Create a FilterCMIM object.
FilterCMIM$new()
clone()
The objects of this class are cloneable with this method.
FilterCMIM$clone(deep = FALSE)
deep
Whether to make a deep clone.
Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.
For a benchmark of filter methods:
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("praznik")) {
  task = mlr3::tsk("iris")
  filter = flt("cmim")
  filter$calculate(task, nfeat = 2)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("cmim"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
Simple correlation filter calling stats::cor(). The filter score is the absolute value of the correlation.
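What this score amounts to can be shown directly with stats::cor(). The sketch below is a base R illustration on the built-in mtcars data with mpg as the target; the use = "pairwise.complete.obs" setting is an assumption made here so that missing values would be tolerated, and may not match the filter's defaults.

```r
# Absolute Pearson correlation of every feature with the target (illustrative).
features <- mtcars[, setdiff(names(mtcars), "mpg")]
target <- mtcars$mpg

# cor() returns a one-column matrix; drop() turns it into a named vector
scores <- abs(drop(stats::cor(
  features, target,
  use = "pairwise.complete.obs",  # assumption, see lead-in
  method = "pearson"
)))

# Highest absolute correlation first
scores <- sort(scores, decreasing = TRUE)
print(head(scores, 3))
```

Switching method to "spearman" or "kendall" changes the correlation coefficient but not the rest of the procedure.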
mlr3filters::Filter -> FilterCorrelation
new()
Create a FilterCorrelation object.
FilterCorrelation$new()
clone()
The objects of this class are cloneable with this method.
FilterCorrelation$clone(deep = FALSE)
deep
Whether to make a deep clone.
This filter, in its default settings, can handle missing values in the features. However, the resulting filter scores may be misleading or at least difficult to compare if some features have a large proportion of missing values.
If a feature has no non-missing value, the resulting score will be NA.
Missing scores appear in a random, non-deterministic order at the end of the vector of scores.
For a benchmark of filter methods:
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
## Pearson (default)
task = mlr3::tsk("mtcars")
filter = flt("correlation")
filter$calculate(task)
as.data.table(filter)

## Spearman
filter = FilterCorrelation$new()
filter$param_set$values = list("method" = "spearman")
filter$calculate(task)
as.data.table(filter)

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("boston_housing")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("correlation"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("regr.rpart"))

  graph$train(task)
}
Double input symmetrical relevance filter calling praznik::DISR() from package praznik.
This filter supports partial scoring (see Filter).
As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k where k is the number of selected features.
Threading is disabled by default (hyperparameter threads is set to 1). Set to a number >= 2 to enable threading, or to 0 for auto-detecting the number of available cores.
mlr3filters::Filter -> FilterDISR
new()
Create a FilterDISR object.
FilterDISR$new()
clone()
The objects of this class are cloneable with this method.
FilterDISR$clone(deep = FALSE)
deep
Whether to make a deep clone.
Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.
For a benchmark of filter methods:
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("praznik")) {
  task = mlr3::tsk("iris")
  filter = flt("disr")
  filter$calculate(task)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("disr"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
Simple filter emulating caret::findCorrelation(exact = FALSE).
This gives each feature a score between 0 and 1 that is one minus the cutoff value for which it is excluded when using caret::findCorrelation(). The negative is used because caret::findCorrelation() excludes everything above a cutoff, while filters exclude everything below a cutoff. The filter scores are then shifted by +1 to get positive values, in line with the way other filters work.
Consequently, caret::findCorrelation(cutoff = 0.9) lists the same features that are excluded with FilterFindCorrelation at score 0.1 (= 1 - 0.9).
mlr3filters::Filter -> FilterFindCorrelation
new()
Create a FilterFindCorrelation object.
FilterFindCorrelation$new()
clone()
The objects of this class are cloneable with this method.
FilterFindCorrelation$clone(deep = FALSE)
deep
Whether to make a deep clone.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
# Pearson (default)
task = mlr3::tsk("mtcars")
filter = flt("find_correlation")
filter$calculate(task)
as.data.table(filter)

## Spearman
filter = flt("find_correlation", method = "spearman")
filter$calculate(task)
as.data.table(filter)

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("find_correlation"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
Variable Importance filter using embedded feature selection of machine learning algorithms. Takes a mlr3::Learner which is capable of extracting the variable importance (property "importance"), fits the model and extracts the importance values to use as filter scores.
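The idea of reusing a model's embedded importance as a filter score can be illustrated without the mlr3 wrappers. The sketch below calls rpart directly, which is an assumption made for brevity (the filter itself goes through an mlr3::Learner), and reads the importance values stored on the fitted tree.

```r
# Illustrative only: fit a tree and read its embedded variable importance.
if (requireNamespace("rpart", quietly = TRUE)) {
  fit <- rpart::rpart(Species ~ ., data = iris)

  # Named numeric vector; larger values mean more important features
  importance <- fit$variable.importance
  print(importance)

  # Keep the two highest-scoring features, as a filter with nfeat = 2 would
  top2 <- names(sort(importance, decreasing = TRUE))[1:2]
  print(top2)
}
```

Any learner that exposes importance values can play the same role; the choice of learner determines what "important" means.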
mlr3filters::Filter -> mlr3filters::FilterLearner -> FilterImportance
learner (mlr3::Learner)
Learner to extract the importance values from.
new()
Create a FilterImportance object.
FilterImportance$new(learner = mlr3::lrn("classif.featureless"))
learner (mlr3::Learner)
Learner to extract the importance values from.
clone()
The objects of this class are cloneable with this method.
FilterImportance$clone(deep = FALSE)
deep
Whether to make a deep clone.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("rpart")) {
  task = mlr3::tsk("iris")
  learner = mlr3::lrn("classif.rpart")
  filter = flt("importance", learner = learner)
  filter$calculate(task)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "mlr3learners"), quietly = TRUE)) {
  library("mlr3learners")
  library("mlr3pipelines")
  task = mlr3::tsk("sonar")

  learner = mlr3::lrn("classif.rpart")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("importance", learner = learner), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.log_reg"))

  graph$train(task)
}
Information gain filter calling FSelectorRcpp::information_gain() in package FSelectorRcpp. Set parameter "type" to "gainratio" to calculate the gain ratio, or set to "symuncert" to calculate the symmetrical uncertainty (see FSelectorRcpp::information_gain()). Default is "infogain".
Argument equal defaults to FALSE for classification tasks, and to TRUE for regression tasks.
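The three score types can be written down for discrete variables in a few lines of base R. This is a sketch under stated assumptions: it uses the textbook entropy definitions with the natural logarithm, expects already discretized inputs, and the helper names entropy and info_gain are hypothetical. FSelectorRcpp discretizes numeric features first, so its values will generally differ.

```r
# Shannon entropy (natural log) from a table of counts
entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Hypothetical helper: information gain and its two normalized variants
info_gain <- function(x, y, type = c("infogain", "gainratio", "symuncert")) {
  type <- match.arg(type)
  h_x <- entropy(table(x))      # entropy of the feature
  h_y <- entropy(table(y))      # entropy of the target
  h_xy <- entropy(table(x, y))  # joint entropy
  ig <- h_x + h_y - h_xy
  switch(type,
    infogain = ig,
    gainratio = ig / h_x,
    symuncert = 2 * ig / (h_x + h_y)
  )
}

target <- c("u", "u", "v", "v")
informative <- c("a", "a", "b", "b")  # determines the target completely
useless <- c("a", "b", "a", "b")      # independent of the target

print(info_gain(informative, target))
print(info_gain(informative, target, type = "symuncert"))
print(info_gain(useless, target))
```

The normalized variants divide the gain by entropy terms, which reduces the advantage that features with many distinct values have under plain information gain.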
mlr3filters::Filter -> FilterInformationGain
new()
Create a FilterInformationGain object.
FilterInformationGain$new()
clone()
The objects of this class are cloneable with this method.
FilterInformationGain$clone(deep = FALSE)
deep
Whether to make a deep clone.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("FSelectorRcpp")) { ## InfoGain (default) task = mlr3::tsk("sonar") filter = flt("information_gain") filter$calculate(task) head(filter$scores, 3) as.data.table(filter) ## GainRatio filterGR = flt("information_gain") filterGR$param_set$values = list("type" = "gainratio") filterGR$calculate(task) head(as.data.table(filterGR), 3) } if (mlr3misc::require_namespaces(c("mlr3pipelines", "FSelectorRcpp", "rpart"), quietly = TRUE)) { library("mlr3pipelines") task = mlr3::tsk("spam") # Note: `filter.frac` is selected randomly and should be tuned. graph = po("filter", filter = flt("information_gain"), filter.frac = 0.5) %>>% po("learner", mlr3::lrn("classif.rpart")) graph$train(task) }
Joint mutual information filter calling praznik::JMI() in package praznik.
This filter supports partial scoring (see Filter).
As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k where k is the number of selected features.
Threading is disabled by default (hyperparameter threads is set to 1). Set to a number >= 2 to enable threading, or to 0 for auto-detecting the number of available cores.
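As a small illustration of the selection-order scores described above, this sketch assigns 1, (k-1)/k, ..., 1/k to k features in the order they were picked. The feature names and their order are made up for the example and are not output of praznik.

```r
# Hypothetical selection order from a greedy forward search.
selected = c("Petal.Width", "Petal.Length", "Sepal.Length")
k = length(selected)

# The first pick gets 1, the last pick gets 1/k.
scores = setNames(rev(seq_len(k)) / k, selected)
print(scores)
```

The scores therefore carry rank information only; the distance between two scores says nothing about how much more informative one feature is than another.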
mlr3filters::Filter
-> FilterJMI
new()
Create a FilterJMI object.
FilterJMI$new()
clone()
The objects of this class are cloneable with this method.
FilterJMI$clone(deep = FALSE)
deep
Whether to make a deep clone.
Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.
For a benchmark of filter methods:
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("praznik")) {
  task = mlr3::tsk("iris")
  filter = flt("jmi")
  filter$calculate(task, nfeat = 2)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("jmi"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
Minimal joint mutual information maximization filter calling praznik::JMIM() in package praznik.
This filter supports partial scoring (see Filter).
As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k where k is the number of selected features.
Threading is disabled by default (hyperparameter threads is set to 1). Set to a number >= 2 to enable threading, or to 0 for auto-detecting the number of available cores.
mlr3filters::Filter
-> FilterJMIM
new()
Create a FilterJMIM object.
FilterJMIM$new()
clone()
The objects of this class are cloneable with this method.
FilterJMIM$clone(deep = FALSE)
deep
Whether to make a deep clone.
Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.
For a benchmark of filter methods:
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("praznik")) {
  task = mlr3::tsk("iris")
  filter = flt("jmim")
  filter$calculate(task, nfeat = 2)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("jmim"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
Kruskal-Wallis rank sum test filter calling stats::kruskal.test().
The filter value is -log10(p) where p is the p-value. This transformation is necessary to ensure numerical stability for very small p-values.
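The transformation can be reproduced in base R. The sketch below scores each numeric feature of the built-in iris data against the class label; it illustrates the -log10(p) idea only and is not the package's implementation.

```r
# Score each numeric iris feature against the class label.
features = setdiff(names(iris), "Species")
p_values = sapply(features, function(f) {
  stats::kruskal.test(iris[[f]], iris$Species)$p.value
})

# Larger scores correspond to smaller p-values.
scores = -log10(p_values)
print(sort(scores, decreasing = TRUE))

# The transformation is invertible.
print(10^(-scores))
```

On the log scale, p-values that differ by many orders of magnitude remain easy to rank and compare.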
mlr3filters::Filter
-> FilterKruskalTest
new()
Create a FilterKruskalTest object.
FilterKruskalTest$new()
clone()
The objects of this class are cloneable with this method.
FilterKruskalTest$clone(deep = FALSE)
deep
Whether to make a deep clone.
This filter, in its default settings, can handle missing values in the features. However, the resulting filter scores may be misleading or at least difficult to compare if some features have a large proportion of missing values.
If a feature does not have at least one non-missing observation per label, the resulting score will be NA. Missing scores appear in a random, non-deterministic order at the end of the vector of scores.
For a benchmark of filter methods:
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
task = mlr3::tsk("iris")
filter = flt("kruskal_test")
filter$calculate(task)
as.data.table(filter)

# transform to p-value
10^(-filter$scores)

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("kruskal_test"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
Mutual information maximization filter calling praznik::MIM() in package praznik.
This filter supports partial scoring (see Filter).
As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k where k is the number of selected features.
Threading is disabled by default (hyperparameter threads is set to 1). Set to a number >= 2 to enable threading, or to 0 for auto-detecting the number of available cores.
mlr3filters::Filter
-> FilterMIM
new()
Create a FilterMIM object.
FilterMIM$new()
clone()
The objects of this class are cloneable with this method.
FilterMIM$clone(deep = FALSE)
deep
Whether to make a deep clone.
Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.
For a benchmark of filter methods:
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("praznik")) {
  task = mlr3::tsk("iris")
  filter = flt("mim")
  filter$calculate(task, nfeat = 2)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("mim"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
Minimum redundancy maximal relevancy filter calling praznik::MRMR() in package praznik.
This filter supports partial scoring (see Filter).
As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k where k is the number of selected features.
Threading is disabled by default (hyperparameter threads is set to 1). Set to a number >= 2 to enable threading, or to 0 for auto-detecting the number of available cores.
mlr3filters::Filter
-> FilterMRMR
new()
Create a FilterMRMR object.
FilterMRMR$new()
clone()
The objects of this class are cloneable with this method.
FilterMRMR$clone(deep = FALSE)
deep
Whether to make a deep clone.
Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.
For a benchmark of filter methods:
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("praznik")) {
  task = mlr3::tsk("iris")
  filter = flt("mrmr")
  filter$calculate(task, nfeat = 2)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("mrmr"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
Minimal normalised joint mutual information maximization filter calling praznik::NJMIM() from package praznik.
This filter supports partial scoring (see Filter).
As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k where k is the number of selected features.
Threading is disabled by default (hyperparameter threads is set to 1). Set to a number >= 2 to enable threading, or to 0 for auto-detecting the number of available cores.
mlr3filters::Filter
-> FilterNJMIM
new()
Create a FilterNJMIM object.
FilterNJMIM$new()
clone()
The objects of this class are cloneable with this method.
FilterNJMIM$clone(deep = FALSE)
deep
Whether to make a deep clone.
Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.
For a benchmark of filter methods:
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("praznik")) {
  task = mlr3::tsk("iris")
  filter = flt("njmim")
  filter$calculate(task, nfeat = 2)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("njmim"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
Filter which uses the predictive performance of a mlr3::Learner as filter score. Performs a mlr3::resample() for each feature separately. The filter score is the aggregated performance of the mlr3::Measure, or the negated aggregated performance if the measure has to be minimized.
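The per-feature procedure described above can be sketched directly with mlr3. This is an illustration under assumptions, not the package's implementation: it presumes that mlr3 and rpart are installed, chooses classification error as the measure, and lets every feature draw its own holdout split, so the scores are noisy.

```r
library("mlr3")

task = tsk("iris")
learner = lrn("classif.rpart")
resampling = rsmp("holdout")
measure = msr("classif.ce")  # classification error, to be minimized

scores = sapply(task$feature_names, function(f) {
  # Resample on a copy of the task that keeps only feature `f`.
  single = task$clone()$select(f)
  perf = unname(resample(single, learner, resampling)$aggregate(measure))
  # Negate minimized measures so that larger scores are always better.
  if (measure$minimize) -perf else perf
})
print(sort(scores, decreasing = TRUE))
```

Because the error is negated, the feature whose single-feature model has the lowest error ends up with the largest (least negative) score.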
mlr3filters::Filter
-> mlr3filters::FilterLearner
-> FilterPerformance
learner
resampling
measure
new()
Create a FilterPerformance object.
FilterPerformance$new( learner = mlr3::lrn("classif.featureless"), resampling = mlr3::rsmp("holdout"), measure = NULL )
learner
(mlr3::Learner)
mlr3::Learner to use for model fitting.
resampling
(mlr3::Resampling)
mlr3::Resampling to be used within resampling.
measure
(mlr3::Measure)
mlr3::Measure to be used for evaluating the performance.
clone()
The objects of this class are cloneable with this method.
FilterPerformance$clone(deep = FALSE)
deep
Whether to make a deep clone.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("rpart")) {
  task = mlr3::tsk("iris")
  learner = mlr3::lrn("classif.rpart")
  filter = flt("performance", learner = learner)
  filter$calculate(task)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("iris")
  l = lrn("classif.rpart")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("performance", learner = l), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
The permutation filter randomly permutes the values of a single feature in a mlr3::Task to break the association with the response. The permuted feature, together with the unmodified features, is used to perform a mlr3::resample(). The permutation filter score is the difference between the aggregated performance of the mlr3::Measure and the performance estimated on the unmodified mlr3::Task.
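A rough sketch of this procedure, written with mlr3 on the built-in iris data, is shown below. Several points are assumptions rather than documented behaviour: the sign is taken as baseline minus permuted accuracy so that positive values indicate a useful feature, only one permutation per feature is drawn (no Monte-Carlo averaging over nmc), no standardization is applied, and each resampling draws its own holdout split. The helper perf is hypothetical.

```r
library("mlr3")

set.seed(1)
learner = lrn("classif.rpart")
resampling = rsmp("holdout")
measure = msr("classif.acc")

# Hypothetical helper: aggregated accuracy for a data.frame with target "Species".
perf = function(data) {
  task = as_task_classif(data, target = "Species")
  unname(resample(task, learner, resampling)$aggregate(measure))
}

baseline = perf(iris)
features = setdiff(names(iris), "Species")

scores = sapply(features, function(f) {
  permuted = iris
  permuted[[f]] = sample(permuted[[f]])  # break the link to the response
  baseline - perf(permuted)              # drop in accuracy after permutation
})
print(scores)
```

With a single holdout split per call the differences are noisy, which is one reason to average over several Monte-Carlo iterations.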
standardize
logical(1)
Standardize feature importance by maximum score.
nmc
integer(1)
Number of Monte-Carlo iterations to use in computing the feature importance.
mlr3filters::Filter
-> mlr3filters::FilterLearner
-> FilterPermutation
learner
resampling
measure
hash
(character(1))
Hash (unique identifier) for this object.
phash
(character(1))
Hash (unique identifier) for this partial object, excluding some components which are varied systematically during tuning (parameter values) or feature selection (feature names).
new()
Create a FilterPermutation object.
FilterPermutation$new( learner = mlr3::lrn("classif.featureless"), resampling = mlr3::rsmp("holdout"), measure = NULL )
learner
(mlr3::Learner)
mlr3::Learner to use for model fitting.
resampling
(mlr3::Resampling)
mlr3::Resampling to be used within resampling.
measure
(mlr3::Measure)
mlr3::Measure to be used for evaluating the performance.
clone()
The objects of this class are cloneable with this method.
FilterPermutation$clone(deep = FALSE)
deep
Whether to make a deep clone.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("rpart")) {
  learner = mlr3::lrn("classif.rpart")
  resampling = mlr3::rsmp("holdout")
  measure = mlr3::msr("classif.acc")
  filter = flt("permutation",
    learner = learner, measure = measure, resampling = resampling,
    nmc = 2)
  task = mlr3::tsk("iris")
  filter$calculate(task)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("iris")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("permutation", nmc = 2), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
RELIEF filter calling FSelectorRcpp::relief() in package FSelectorRcpp.
mlr3filters::Filter
-> FilterRelief
new()
Create a FilterRelief object.
FilterRelief$new()
clone()
The objects of this class are cloneable with this method.
FilterRelief$clone(deep = FALSE)
deep
Whether to make a deep clone.
This filter can handle missing values in the features. However, the resulting filter scores may be misleading or at least difficult to compare if some features have a large proportion of missing values.
If a feature has no non-missing observation, the resulting score will be (close to) 0.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("FSelectorRcpp")) {
  ## Relief (default)
  task = mlr3::tsk("iris")
  filter = flt("relief")
  filter$calculate(task)
  head(filter$scores, 3)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "FSelectorRcpp", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("iris")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("relief"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
Filter using embedded feature selection of machine learning algorithms. Takes a mlr3::Learner which is capable of extracting the selected features (property "selected_features"), fits the model and extracts the selected features.
Note that contrary to mlr_filters_importance, there is no ordering in the selected features. Selected features get a score of 1, deselected features get a score of 0. The order of selected features is random and different from the order in the learner. In combination with mlr3pipelines, only the filter criterion cutoff makes sense.
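The 0/1 scoring can be reproduced by hand. The sketch below assumes that mlr3 and rpart are installed and uses the rpart learner, which has the "selected_features" property; it mirrors the idea only, not the package's code.

```r
library("mlr3")

task = tsk("iris")
learner = lrn("classif.rpart")
learner$train(task)

# Features actually used in the splits of the fitted tree.
selected = learner$selected_features()

# Selected features get 1, all others get 0.
scores = setNames(as.numeric(task$feature_names %in% selected), task$feature_names)
print(scores)
```

Since every score is either 0 or 1, any cutoff strictly between the two values keeps exactly the selected features, whereas a fraction or a fixed number of features would have to break ties arbitrarily.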
mlr3filters::Filter
-> mlr3filters::FilterLearner
-> FilterSelectedFeatures
learner
(mlr3::Learner)
Learner to extract the selected features from.
new()
Create a FilterSelectedFeatures object.
FilterSelectedFeatures$new(learner = mlr3::lrn("classif.featureless"))
learner
(mlr3::Learner)
Learner to extract the selected features from.
clone()
The objects of this class are cloneable with this method.
FilterSelectedFeatures$clone(deep = FALSE)
deep
Whether to make a deep clone.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_univariate_cox, mlr_filters_variance
if (requireNamespace("rpart")) {
  task = mlr3::tsk("iris")
  learner = mlr3::lrn("classif.rpart")
  filter = flt("selected_features", learner = learner)
  filter$calculate(task)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "mlr3learners", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  library("mlr3learners")
  task = mlr3::tsk("sonar")

  filter = flt("selected_features", learner = lrn("classif.rpart"))

  # Note: All filter scores are either 0 or 1, i.e. setting `filter.cutoff = 0.5` means that
  # we select all "selected features".
  graph = po("filter", filter = filter, filter.cutoff = 0.5) %>>%
    po("learner", mlr3::lrn("classif.log_reg"))

  graph$train(task)
}
Calculates scores for assessing the relationship between individual features and the time-to-event outcome (right-censored survival data) using a univariate Cox proportional hazards model. The goal is to determine which features have a statistically significant association with the event of interest, typically in the context of clinical or biomedical research.
This filter fits a Cox Proportional Hazards model using each feature independently and extracts the p-value that quantifies the significance of the feature's impact on survival. The filter value is -log10(p) where p is the p-value. This transformation is necessary to ensure numerical stability for very small p-values. Also, higher values denote more important features. The filter works only for numeric features, so please ensure that factor variables are properly encoded, e.g. using PipeOpEncode.
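The idea can be sketched with the survival package on its lung data set. The sketch assumes that the Wald test p-value of the single coefficient is the one being transformed (the text above does not say which test is used), and the three features are picked only for illustration.

```r
library("survival")

# Numeric features of the `lung` data, chosen for illustration.
features = c("age", "ph.karno", "wt.loss")

scores = sapply(features, function(f) {
  # One Cox model per feature; rows with missing values are dropped by coxph().
  fit = coxph(as.formula(paste("Surv(time, status) ~", f)), data = lung)
  p = unname(summary(fit)$coefficients[1, "Pr(>|z|)"])
  -log10(p)
})
print(sort(scores, decreasing = TRUE))
```

Each model sees only one feature, so the scores ignore any correlation or interaction between features.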
mlr3filters::Filter
-> FilterUnivariateCox
new()
Create a FilterUnivariateCox object.
FilterUnivariateCox$new()
clone()
The objects of this class are cloneable with this method.
FilterUnivariateCox$clone(deep = FALSE)
deep
Whether to make a deep clone.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_variance
filter = flt("univariate_cox")
filter
Variance filter calling stats::var().
Argument na.rm defaults to TRUE here.
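A base-R sketch of the same computation follows. It injects one missing value into a few mtcars columns (picked only for illustration) to show why na.rm = TRUE matters.

```r
# Three mtcars features, with one missing value injected for illustration.
data = mtcars[, c("disp", "hp", "wt")]
data$hp[1] = NA

# With na.rm = TRUE, the missing value is dropped instead of producing NA.
scores = sapply(data, stats::var, na.rm = TRUE)
print(sort(scores, decreasing = TRUE))

# Without na.rm, the variance of `hp` would be NA.
print(stats::var(data$hp))
```

Variance depends on the measurement scale of each feature, so features recorded in large units rank highest unless the data are rescaled first.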
mlr3filters::Filter
-> FilterVariance
new()
Create a FilterVariance object.
FilterVariance$new()
clone()
The objects of this class are cloneable with this method.
FilterVariance$clone(deep = FALSE)
deep
Whether to make a deep clone.
For a benchmark of filter methods:
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox
task = mlr3::tsk("mtcars")
filter = flt("variance")
filter$calculate(task)
head(filter$scores, 3)
as.data.table(filter)

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("variance"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}