Package 'mlr3filters'

Title: Filter Based Feature Selection for 'mlr3'
Description: Extends 'mlr3' with filter methods for feature selection. Besides standalone filter methods, built-in methods of any machine-learning algorithm are supported. Partial scoring of multivariate filter methods is supported.
Authors: Patrick Schratz [aut] , Michel Lang [cre, aut] , Bernd Bischl [aut] , Martin Binder [aut], John Zobolas [aut]
Maintainer: Michel Lang <[email protected]>
License: LGPL-3
Version: 0.8.0
Built: 2024-11-18 05:10:25 UTC
Source: https://github.com/mlr-org/mlr3filters


mlr3filters: Filter Based Feature Selection for 'mlr3'

Description

Extends 'mlr3' with filter methods for feature selection. Besides standalone filter methods, built-in methods of any machine-learning algorithm are supported. Partial scoring of multivariate filter methods is supported.

Author(s)

Maintainer: Michel Lang [email protected] (ORCID)


Filter Base Class

Description

Base class for filters. Predefined filters are stored in the dictionary mlr_filters. A Filter calculates a score for each feature of a task. Important features get a large value and unimportant features get a small value. Note that filter scores may also be negative.

Details

Some filters support partial scoring of the feature set: If nfeat is not NULL, only the best nfeat features are guaranteed to get a score. Additional features may be ignored for computational reasons, and then get a score value of NA.

Public fields

id

(character(1))
Identifier of the object. Used in tables, plot and text output.

label

(character(1))
Label for this object. Can be used in tables, plot and text output instead of the ID.

task_types

(character())
Set of supported task types, e.g. "classif" or "regr". Can be set to the scalar value NA to allow any task type.

For a complete list of possible task types (depending on the loaded packages), see mlr_reflections$task_types$type.

task_properties

(character())
mlr3::Task properties.

param_set

(paradox::ParamSet)
Set of hyperparameters.

feature_types

(character())
Feature types of the filter.

packages

(character())
Packages which this filter is relying on.

man

(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object. Defaults to NA, but can be set by child classes.

scores

Stores the calculated filter score values as named numeric vector. The vector is sorted in decreasing order with possible NA values last. The more important the feature, the higher the score. Tied values (this includes NA values) appear in a random, non-deterministic order.

Active bindings

properties

(character())
Properties of the filter. Currently, only "missings" is supported. A filter has the property "missings", iff the filter can handle missing values in the features in a graceful way. Otherwise, an assertion is thrown if missing values are detected.

hash

(character(1))
Hash (unique identifier) for this object.

phash

(character(1))
Hash (unique identifier) for this partial object, excluding some components which are varied systematically during tuning (parameter values) or feature selection (feature names).

Methods

Public methods


Method new()

Create a Filter object.

Usage
Filter$new(
  id,
  task_types,
  task_properties = character(),
  param_set = ps(),
  feature_types = character(),
  packages = character(),
  label = NA_character_,
  man = NA_character_
)
Arguments
id

(character(1))
Identifier for the filter.

task_types

(character())
Types of the task the filter can operate on, e.g. "classif" or "regr". Can be set to scalar NA to allow any task type.

task_properties

(character())
Required task properties, see mlr3::Task. Must be a subset of mlr_reflections$task_properties.

param_set

(paradox::ParamSet)
Set of hyperparameters.

feature_types

(character())
Feature types the filter operates on. Must be a subset of mlr_reflections$task_feature_types.

packages

(character())
Set of required packages. Note that these packages will be loaded via requireNamespace(), and are not attached.

label

(character(1))
Label for the new instance.

man

(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object. The referenced help page can be opened via method $help().


Method format()

Format helper for Filter class

Usage
Filter$format(...)
Arguments
...

(ignored).


Method print()

Printer for Filter class

Usage
Filter$print()

Method help()

Opens the corresponding help page referenced by field $man.

Usage
Filter$help()

Method calculate()

Calculates the filter score values for the provided mlr3::Task and stores them in field scores. nfeat determines the minimum number of features to score (see details), and defaults to the number of features in task. Loads required packages and then calls private$.calculate() of the respective subclass.

This private method is expected to return a numeric vector, uniquely named with (a subset of) feature names. The returned vector may have missing values. Features with missing values as well as features with no calculated score are automatically ranked last, in a random order. If the task has no rows, each feature gets the score NA.

Usage
Filter$calculate(task, nfeat = NULL)
Arguments
task

(mlr3::Task)
mlr3::Task to calculate the filter scores for.

nfeat

(integer())
The minimum number of features to calculate filter scores for.
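The post-processing described for calculate() can be sketched in base R as follows. This is only an illustration of the documented contract, not the package's internal code; the helper name finalize_scores() is hypothetical.

```r
# Hypothetical helper: complete a partial score vector and rank it the way
# the documentation describes (decreasing, NA last).
finalize_scores = function(scores, feature_names) {
  stopifnot(
    is.numeric(scores),
    !is.null(names(scores)),
    !anyDuplicated(names(scores)),
    all(names(scores) %in% feature_names)
  )
  # Features without a calculated score start out as NA.
  full = setNames(rep(NA_real_, length(feature_names)), feature_names)
  full[names(scores)] = scores
  # Shuffle first so that ties and NA values are not ordered by input position.
  full = full[sample.int(length(full))]
  full[order(full, decreasing = TRUE, na.last = TRUE)]
}

scores = finalize_scores(c(b = 0.2, a = 0.9, c = NA), c("a", "b", "c", "d"))
print(scores)
```

Features "c" (missing value) and "d" (no score returned) both end up at the tail of the vector with score NA.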


Method clone()

The objects of this class are cloneable with this method.

Usage
Filter$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

Other Filter: mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance


Syntactic Sugar for Filter Construction

Description

These functions complement mlr_filters with functions in the spirit of mlr3::mlr_sugar.

Usage

flt(.key, ...)

flts(.keys, ...)

Arguments

.key

(character(1))
Key passed to the respective dictionary to retrieve the object.

...

(named list())
Named arguments passed to the constructor, to be set as parameters in the paradox::ParamSet, or to be set as public field. See mlr3misc::dictionary_sugar_get() for more details.

.keys

(character())
Keys passed to the respective dictionary to retrieve multiple objects.

Value

Filter.

Examples

flt("correlation", method = "kendall")
flts(c("mrmr", "jmim"))

Dictionary of Filters

Description

A simple Dictionary storing objects of class Filter. Each Filter has an associated help page, see mlr_filters_[id].

This dictionary can get populated with additional filters by add-on packages.

For a more convenient way to retrieve and construct filters, see flt().

Usage

mlr_filters

Format

R6Class object

Usage

See Dictionary.

See Also

Other Filter: Filter, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

mlr_filters$keys()
as.data.table(mlr_filters)
mlr_filters$get("mim")
flt("anova")

ANOVA F-Test Filter

Description

ANOVA F-Test filter calling stats::aov(). Note that this is equivalent to a t-test for binary classification.

The filter value is -log10(p) where p is the p-value. This transformation is necessary to ensure numerical stability for very small p-values.
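The score definition can be reproduced in base R. The sketch below assumes that each numeric feature is modelled against the class label in a one-way ANOVA; the helper anova_score() is hypothetical and not part of the package. The last lines check the stated equivalence with a pooled-variance t-test on a two-class subset.

```r
# Hypothetical helper: -log10 of the one-way ANOVA p-value for one feature.
anova_score = function(x, group) {
  fit = stats::aov(x ~ group)
  p = summary(fit)[[1]][1, "Pr(>F)"]
  -log10(p)
}

# Score all four iris features against the species label.
scores = vapply(iris[, 1:4], anova_score, numeric(1), group = iris$Species)
scores = sort(scores, decreasing = TRUE)
print(scores)

# Back-transform to p-values; these are tiny, which is why the
# log scale is easier to compare and rank.
p_values = 10^(-scores)

# Binary case: the F-test p-value matches the pooled-variance t-test.
two = droplevels(iris[iris$Species != "setosa", ])
p_aov = 10^(-anova_score(two$Sepal.Width, two$Species))
p_t = stats::t.test(Sepal.Width ~ Species, data = two, var.equal = TRUE)$p.value
print(c(aov = p_aov, t_test = p_t))
```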

Super class

mlr3filters::Filter -> FilterAnova

Methods

Public methods

Inherited methods

Method new()

Create a FilterAnova object.

Usage
FilterAnova$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterAnova$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

task = mlr3::tsk("iris")
filter = flt("anova")
filter$calculate(task)
head(as.data.table(filter), 3)

# transform to p-value
10^(-filter$scores)

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("anova"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

AUC Filter

Description

Area under the (ROC) Curve filter, analogous to mlr3measures::auc() from package mlr3measures. Missing values of the features are removed before calculating the AUC. If the AUC is undefined for the input, it is set to 0.5 (random classifier). The absolute value of the difference between the AUC and 0.5 is used as final filter value.
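A minimal base R sketch of this score, assuming the AUC is computed with the rank-based (Mann-Whitney) formula. The helper auc_score() and its argument names are hypothetical; the package itself delegates the AUC computation to mlr3measures.

```r
# Hypothetical helper: |AUC - 0.5| for a single numeric feature.
auc_score = function(x, truth, positive) {
  keep = !is.na(x)                 # drop missing feature values first
  x = x[keep]
  is_pos = truth[keep] == positive
  n_pos = sum(is_pos)
  n_neg = sum(!is_pos)
  if (n_pos == 0 || n_neg == 0) {
    auc = 0.5                      # AUC undefined: treat as random classifier
  } else {
    auc = (sum(rank(x)[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  }
  abs(auc - 0.5)
}

truth = c("n", "n", "p", "p")
auc_score(c(1, 2, 3, 4), truth, positive = "p")  # perfectly separating feature
auc_score(c(4, 3, 2, 1), truth, positive = "p")  # perfectly separating, reversed
```

Because of the absolute value, a feature that ranks the classes in the "wrong" direction is scored as highly as one that ranks them correctly.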

Super class

mlr3filters::Filter -> FilterAUC

Methods

Public methods

Inherited methods

Method new()

Create a FilterAUC object.

Usage
FilterAUC$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterAUC$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

task = mlr3::tsk("sonar")
filter = flt("auc")
filter$calculate(task)
head(as.data.table(filter), 3)

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("auc"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

Boruta Filter

Description

Filter using the Boruta algorithm for feature selection. If keep = "tentative", confirmed and tentative features are returned. Selected features get a score of 1, deselected features get a score of 0. There is no ordering among the selected features; their order is random. In combination with mlr3pipelines, only the filter criterion cutoff makes sense.

Super class

mlr3filters::Filter -> FilterBoruta

Methods

Public methods

Inherited methods

Method new()

Creates a new instance of this R6 class.

Usage
FilterBoruta$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterBoruta$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Kursa MB, Rudnicki WR (2010). “Feature Selection with the Boruta Package.” Journal of Statistical Software, 36(11), 1-13.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("Boruta")) {
  task = mlr3::tsk("sonar")
  filter = flt("boruta")
  filter$calculate(task)
  as.data.table(filter)
}

Correlation-Adjusted Marginal Correlation Score Filter

Description

Calculates the Correlation-Adjusted (marginal) coRrelation scores (short CAR scores) implemented in care::carscore() in package care. The CAR scores for a set of features are defined as the correlations between the target and the decorrelated features. The filter returns the absolute value of the calculated scores.

Argument verbose defaults to FALSE.
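The definition "correlations between the target and the decorrelated features" can be illustrated in base R. The sketch below uses plain empirical correlations, whereas care::carscore() may apply shrinkage estimation, so the numbers are not expected to match the filter output exactly. All variable names are illustrative.

```r
# Empirical CAR scores for mtcars with target mpg (no shrinkage).
X = as.matrix(mtcars[, setdiff(names(mtcars), "mpg")])
y = mtcars$mpg

R = cor(X)        # correlation matrix among the features
r_xy = cor(X, y)  # marginal correlations between features and target

# Inverse matrix square root R^(-1/2) via the eigendecomposition of R.
e = eigen(R, symmetric = TRUE)
R_inv_sqrt = e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)

# CAR scores: marginal correlations after decorrelating the features.
car = drop(R_inv_sqrt %*% r_xy)
names(car) = colnames(X)

# Filter-style scores: absolute values, largest first.
scores = sort(abs(car), decreasing = TRUE)
print(scores)

# Sanity check: squared CAR scores sum to the R^2 of the linear model.
r_squared = summary(lm(mpg ~ ., data = mtcars))$r.squared
print(c(sum_car_sq = sum(car^2), r_squared = r_squared))
```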

Super class

mlr3filters::Filter -> FilterCarScore

Methods

Public methods

Inherited methods

Method new()

Create a FilterCarScore object.

Usage
FilterCarScore$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterCarScore$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("care")) {
  task = mlr3::tsk("mtcars")
  filter = flt("carscore")
  filter$calculate(task)
  head(as.data.table(filter), 3)

  ## changing the filter settings
  filter = flt("carscore")
  filter$param_set$values = list("diagonal" = TRUE)
  filter$calculate(task)
  head(as.data.table(filter), 3)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "care", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("mtcars")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("carscore"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("regr.rpart"))

  graph$train(task)
}

Correlation-Adjusted Survival Score Filter

Description

Calculates CARS scores for right-censored survival tasks. Calls the implementation in carSurv::carSurvScore() in package carSurv.

Super class

mlr3filters::Filter -> FilterCarSurvScore

Methods

Public methods

Inherited methods

Method new()

Create a FilterCarSurvScore object.

Usage
FilterCarSurvScore$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterCarSurvScore$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Bommert A, Welchowski T, Schmid M, Rahnenführer J (2021). “Benchmark of filter methods for feature selection in high-dimensional gene expression survival data.” Briefings in Bioinformatics, 23(1). doi:10.1093/bib/bbab354.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance


Minimal Conditional Mutual Information Maximization Filter

Description

Minimal conditional mutual information maximization filter calling praznik::CMIM() from package praznik.

This filter supports partial scoring (see Filter).

Details

As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k where k is the number of selected features.
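The mapping from selection order to scores can be sketched in base R. The helper selection_order_scores() is hypothetical; it only illustrates the sequence 1, (k-1)/k, ..., 1/k and the NA scores that unselected features receive under partial scoring.

```r
# Hypothetical helper: turn a greedy selection order into filter scores.
# `selected` lists feature names in the order they were picked.
selection_order_scores = function(selected, feature_names) {
  k = length(selected)
  scores = setNames(rep(NA_real_, length(feature_names)), feature_names)
  scores[selected] = rev(seq_len(k)) / k  # first pick gets 1, last pick gets 1/k
  scores
}

features = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
selected = c("Petal.Width", "Petal.Length", "Sepal.Length")  # illustrative order
scores = selection_order_scores(selected, features)
print(scores)
```

With k = 3 the selected features receive 1, 2/3 and 1/3; the remaining feature keeps the score NA.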

Threading is disabled by default (hyperparameter threads is set to 1). Set to a number >= 2 to enable threading, or to 0 for auto-detecting the number of available cores.

Super class

mlr3filters::Filter -> FilterCMIM

Methods

Public methods

Inherited methods

Method new()

Create a FilterCMIM object.

Usage
FilterCMIM$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterCMIM$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("praznik")) {
  task = mlr3::tsk("iris")
  filter = flt("cmim")
  filter$calculate(task, nfeat = 2)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("cmim"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

Correlation Filter

Description

Simple correlation filter calling stats::cor(). The filter score is the absolute value of the correlation.

Super class

mlr3filters::Filter -> FilterCorrelation

Methods

Public methods

Inherited methods

Method new()

Create a FilterCorrelation object.

Usage
FilterCorrelation$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterCorrelation$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Note

This filter, in its default settings, can handle missing values in the features. However, the resulting filter scores may be misleading or at least difficult to compare if some features have a large proportion of missing values.

If a feature has no non-missing value, the resulting score will be NA. Missing scores appear in a random, non-deterministic order at the end of the vector of scores.
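The behaviour described in this note can be reproduced with stats::cor() directly. The sketch assumes that missing values are handled with use = "pairwise.complete.obs"; whether this matches the filter's default setting is an assumption here. The column "empty" is an artificial all-missing feature added for illustration.

```r
# Three real features plus one artificial feature without any observed value.
X = as.matrix(mtcars[, c("wt", "hp", "qsec")])
X[1:5, "hp"] = NA                # a feature with some missing values
X = cbind(X, empty = NA_real_)   # a feature with no non-missing value
y = mtcars$mpg

# Absolute correlation with the target, using pairwise-complete observations.
r = cor(X, y, method = "pearson", use = "pairwise.complete.obs")[, 1]
scores = sort(abs(r), decreasing = TRUE, na.last = TRUE)
print(scores)
```

The score for "hp" is based on 27 instead of 32 observations, which is why scores of features with many missing values are hard to compare with the others.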

References

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

## Pearson (default)
task = mlr3::tsk("mtcars")
filter = flt("correlation")
filter$calculate(task)
as.data.table(filter)

## Spearman
filter = FilterCorrelation$new()
filter$param_set$values = list("method" = "spearman")
filter$calculate(task)
as.data.table(filter)

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("boston_housing")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("correlation"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("regr.rpart"))

  graph$train(task)
}

Double Input Symmetrical Relevance Filter

Description

Double input symmetrical relevance filter calling praznik::DISR() from package praznik.

This filter supports partial scoring (see Filter).

Details

As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k where k is the number of selected features.

Threading is disabled by default (hyperparameter threads is set to 1). Set to a number >= 2 to enable threading, or to 0 for auto-detecting the number of available cores.

Super class

mlr3filters::Filter -> FilterDISR

Methods

Public methods

Inherited methods

Method new()

Create a FilterDISR object.

Usage
FilterDISR$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterDISR$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("praznik")) {
  task = mlr3::tsk("iris")
  filter = flt("disr")
  filter$calculate(task)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("disr"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

Correlation Filter

Description

Simple filter emulating caret::findCorrelation(exact = FALSE).

This gives each feature a score between 0 and 1 that is one minus the cutoff value for which it is excluded when using caret::findCorrelation(). The cutoff is negated because caret::findCorrelation() excludes everything above a cutoff, while filters exclude everything below a cutoff. The negated values are then shifted by +1 to obtain positive scores, in line with the way other filters work.

Consequently, caret::findCorrelation(cutoff = 0.9) lists the same features that are excluded with FilterFindCorrelation at score 0.1 (= 1 - 0.9).

Super class

mlr3filters::Filter -> FilterFindCorrelation

Methods

Public methods

Inherited methods

Method new()

Create a FilterFindCorrelation object.

Usage
FilterFindCorrelation$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterFindCorrelation$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

## Pearson (default)
task = mlr3::tsk("mtcars")
filter = flt("find_correlation")
filter$calculate(task)
as.data.table(filter)

## Spearman
filter = flt("find_correlation", method = "spearman")
filter$calculate(task)
as.data.table(filter)

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("find_correlation"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

Filter for Embedded Feature Selection via Variable Importance

Description

Variable Importance filter using embedded feature selection of machine learning algorithms. Takes an mlr3::Learner which is capable of extracting the variable importance (property "importance"), fits the model and extracts the importance values to use as filter scores.

Super classes

mlr3filters::Filter -> mlr3filters::FilterLearner -> FilterImportance

Public fields

learner

(mlr3::Learner)
Learner to extract the importance values from.

Methods

Public methods

Inherited methods

Method new()

Create a FilterImportance object.

Usage
FilterImportance$new(learner = mlr3::lrn("classif.featureless"))
Arguments
learner

(mlr3::Learner)
Learner to extract the importance values from.


Method clone()

The objects of this class are cloneable with this method.

Usage
FilterImportance$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("rpart")) {
  task = mlr3::tsk("iris")
  learner = mlr3::lrn("classif.rpart")
  filter = flt("importance", learner = learner)
  filter$calculate(task)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "mlr3learners"), quietly = TRUE)) {
  library("mlr3learners")
  library("mlr3pipelines")
  task = mlr3::tsk("sonar")

  learner = mlr3::lrn("classif.rpart")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("importance", learner = learner), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.log_reg"))

  graph$train(task)
}

Information Gain Filter

Description

Information gain filter calling FSelectorRcpp::information_gain() in package FSelectorRcpp. Set parameter "type" to "gainratio" to calculate the gain ratio, or set to "symuncert" to calculate the symmetrical uncertainty (see FSelectorRcpp::information_gain()). Default is "infogain".

Argument equal defaults to FALSE for classification tasks, and to TRUE for regression tasks.
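The three values of "type" correspond to standard entropy-based quantities. The base R sketch below computes them for features that are already categorical; FSelectorRcpp additionally discretizes numeric features, which is not shown here. The helper names entropy() and information_gain() are hypothetical and do not call the package.

```r
# Shannon entropy (natural log) of one variable or of a joint distribution.
entropy = function(...) {
  p = prop.table(table(...))
  p = p[p > 0]
  -sum(p * log(p))
}

# Hypothetical helper mirroring the three documented types.
information_gain = function(x, y, type = c("infogain", "gainratio", "symuncert")) {
  type = match.arg(type)
  ig = entropy(y) + entropy(x) - entropy(x, y)  # H(Y) + H(X) - H(X, Y)
  switch(type,
    infogain = ig,
    gainratio = ig / entropy(x),
    symuncert = 2 * ig / (entropy(x) + entropy(y))
  )
}

y = c("u", "u", "v", "v")
x_informative = c("a", "a", "b", "b")  # determines y completely
x_noise = c("a", "b", "a", "b")        # independent of y

information_gain(x_informative, y)               # equals H(Y) = log(2)
information_gain(x_informative, y, "gainratio")  # normalised to 1
information_gain(x_noise, y, "symuncert")        # no shared information
```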

Super class

mlr3filters::Filter -> FilterInformationGain

Methods

Public methods

Inherited methods

Method new()

Create a FilterInformationGain object.

Usage
FilterInformationGain$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterInformationGain$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("FSelectorRcpp")) {
  ## InfoGain (default)
  task = mlr3::tsk("sonar")
  filter = flt("information_gain")
  filter$calculate(task)
  head(filter$scores, 3)
  as.data.table(filter)

  ## GainRatio

  filterGR = flt("information_gain")
  filterGR$param_set$values = list("type" = "gainratio")
  filterGR$calculate(task)
  head(as.data.table(filterGR), 3)

}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "FSelectorRcpp", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("information_gain"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)

}

Joint Mutual Information Filter

Description

Joint mutual information filter calling praznik::JMI() in package praznik.

This filter supports partial scoring (see Filter).

Details

As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k where k is the number of selected features.

Threading is disabled by default (hyperparameter threads is set to 1). Set to a number >= 2 to enable threading, or to 0 for auto-detecting the number of available cores.

Super class

mlr3filters::Filter -> FilterJMI

Methods

Public methods

Inherited methods

Method new()

Create a FilterJMI object.

Usage
FilterJMI$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterJMI$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("praznik")) {
  task = mlr3::tsk("iris")
  filter = flt("jmi")
  filter$calculate(task, nfeat = 2)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("jmi"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

Minimal Joint Mutual Information Maximization Filter

Description

Minimal joint mutual information maximization filter calling praznik::JMIM() in package praznik.

This filter supports partial scoring (see Filter).

Details

As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k where k is the number of selected features.

Threading is disabled by default (hyperparameter threads is set to 1). Set it to a number >= 2 to enable threading, or to 0 to auto-detect the number of available cores.
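
As a hedged illustration of the threading note, the threads hyperparameter can be changed through the filter's parameter set before calling `$calculate()`. The sketch assumes that 'mlr3', 'mlr3filters' and 'praznik' are installed; the value 2 is arbitrary.

```r
if (requireNamespace("mlr3filters", quietly = TRUE) &&
    requireNamespace("praznik", quietly = TRUE)) {
  library("mlr3filters")
  task = mlr3::tsk("iris")
  filter = flt("jmim")

  # The default is 1 (no threading); 0 would auto-detect the available cores.
  filter$param_set$values$threads = 2

  filter$calculate(task, nfeat = 2)
  print(data.table::as.data.table(filter))
}
```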

Super class

mlr3filters::Filter -> FilterJMIM

Methods

Public methods

Inherited methods

Method new()

Create a FilterJMIM object.

Usage
FilterJMIM$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterJMIM$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("praznik")) {
  task = mlr3::tsk("iris")
  filter = flt("jmim")
  filter$calculate(task, nfeat = 2)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("jmim"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

Kruskal-Wallis Test Filter

Description

Kruskal-Wallis rank sum test filter calling stats::kruskal.test().

The filter value is -log10(p), where p is the p-value. This transformation is necessary to ensure numerical stability for very small p-values.
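
The score of a single feature can be approximated by hand in base R. This is a sketch of the transformation described above, not the filter's actual code; the choice of `Sepal.Width` is arbitrary.

```r
# Kruskal-Wallis test of one numeric feature against the class label.
test = stats::kruskal.test(iris$Sepal.Width, iris$Species)

# Filter-style score: larger values correspond to smaller p-values.
score = -log10(test$p.value)
print(score)

# Back-transform the score to the p-value.
print(10^(-score))
```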

Super class

mlr3filters::Filter -> FilterKruskalTest

Methods

Public methods

Inherited methods

Method new()

Create a FilterKruskalTest object.

Usage
FilterKruskalTest$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterKruskalTest$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Note

This filter, in its default settings, can handle missing values in the features. However, the resulting filter scores may be misleading or at least difficult to compare if some features have a large proportion of missing values.

If a feature does not have at least one non-missing observation per label, the resulting score will be NA. Missing scores appear in a random, non-deterministic order at the end of the vector of scores.

References

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

task = mlr3::tsk("iris")
filter = flt("kruskal_test")
filter$calculate(task)
as.data.table(filter)

# transform to p-value
10^(-filter$scores)

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("kruskal_test"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

Mutual Information Maximization Filter

Description

Mutual information maximization filter calling praznik::MIM() in package praznik.

This filter supports partial scoring (see Filter).

Details

As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k where k is the number of selected features.

Threading is disabled by default (hyperparameter threads is set to 1). Set it to a number >= 2 to enable threading, or to 0 to auto-detect the number of available cores.
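
For intuition, mutual information maximization ranks each feature by its mutual information with the target, without looking at the other features. The base R sketch below is an illustration only: the helper `mutual_info`, the equal-width binning into 5 bins, and the natural logarithm are assumptions of the sketch and need not match what praznik::MIM() does internally.

```r
# Hypothetical helper: mutual information (in nats) of two discrete vectors.
mutual_info = function(x, y) {
  joint = table(x, y) / length(x)                 # joint distribution p(x, y)
  indep = outer(rowSums(joint), colSums(joint))   # product of marginals
  nz = joint > 0                                  # empty cells contribute 0
  sum(joint[nz] * log(joint[nz] / indep[nz]))
}

features = iris[, 1:4]
# Assumption of this sketch: discretise numeric features into 5 bins.
binned = lapply(features, cut, breaks = 5)
mi = vapply(binned, mutual_info, numeric(1), y = iris$Species)

# Features ordered from most to least informative about the target.
print(sort(mi, decreasing = TRUE))
```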

Super class

mlr3filters::Filter -> FilterMIM

Methods

Public methods

Inherited methods

Method new()

Create a FilterMIM object.

Usage
FilterMIM$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterMIM$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("praznik")) {
  task = mlr3::tsk("iris")
  filter = flt("mim")
  filter$calculate(task, nfeat = 2)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("mim"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

Minimum Redundancy Maximal Relevancy Filter

Description

Minimum redundancy maximal relevancy filter calling praznik::MRMR() in package praznik.

This filter supports partial scoring (see Filter).

Details

As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k where k is the number of selected features.

Threading is disabled by default (hyperparameter threads is set to 1). Set it to a number >= 2 to enable threading, or to 0 to auto-detect the number of available cores.
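
Partial scoring combines with the selection-order scores as follows: only the selected features receive a score, and all remaining features are NA. The base R sketch below only mimics the shape of such a result; the helper `partial_scores` is hypothetical and the selection order is invented for illustration.

```r
# Hypothetical helper: scores for a partial selection of k out of p features.
partial_scores = function(all_features, selected) {
  k = length(selected)
  scores = stats::setNames(rep(NA_real_, length(all_features)), all_features)
  scores[selected] = rev(seq_len(k)) / k   # 1, (k-1)/k, ..., 1/k
  scores
}

all_features = c("Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width")
# Assume a call with nfeat = 2 selected these two features, in this order.
scores = partial_scores(all_features, selected = c("Petal.Width", "Sepal.Length"))
print(scores)
```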

Super class

mlr3filters::Filter -> FilterMRMR

Methods

Public methods

Inherited methods

Method new()

Create a FilterMRMR object.

Usage
FilterMRMR$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterMRMR$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("praznik")) {
  task = mlr3::tsk("iris")
  filter = flt("mrmr")
  filter$calculate(task, nfeat = 2)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("mrmr"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

Minimal Normalised Joint Mutual Information Maximization Filter

Description

Minimal normalised joint mutual information maximization filter calling praznik::NJMIM() in package praznik.

This filter supports partial scoring (see Filter).

Details

As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k where k is the number of selected features.

Threading is disabled by default (hyperparameter threads is set to 1). Set it to a number >= 2 to enable threading, or to 0 to auto-detect the number of available cores.

Super class

mlr3filters::Filter -> FilterNJMIM

Methods

Public methods

Inherited methods

Method new()

Create a FilterNJMIM object.

Usage
FilterNJMIM$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterNJMIM$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("praznik")) {
  task = mlr3::tsk("iris")
  filter = flt("njmim")
  filter$calculate(task, nfeat = 2)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("njmim"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

Predictive Performance Filter

Description

Filter which uses the predictive performance of a mlr3::Learner as filter score. Performs a mlr3::resample() for each feature separately. The filter score is the aggregated performance of the mlr3::Measure, or the negated aggregated performance if the measure has to be minimized.
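
The sign convention can be sketched in base R. The helper `performance_to_score` is hypothetical and only illustrates the rule stated above; the performance values are made up.

```r
# Hypothetical helper: convert an aggregated performance value into a filter
# score so that larger scores always mean "better".
performance_to_score = function(performance, minimize) {
  if (minimize) -performance else performance
}

# Classification error is minimized, accuracy is maximized.
print(performance_to_score(0.10, minimize = TRUE))   # error is negated
print(performance_to_score(0.90, minimize = FALSE))  # accuracy is kept as-is
```

With this convention, a feature whose single-feature model has a lower error ends up with a higher score, so sorting by decreasing score works for both kinds of measures.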

Super classes

mlr3filters::Filter -> mlr3filters::FilterLearner -> FilterPerformance

Public fields

learner

(mlr3::Learner)

resampling

(mlr3::Resampling)

measure

(mlr3::Measure)

Methods

Public methods

Inherited methods

Method new()

Create a FilterPerformance object.

Usage
FilterPerformance$new(
  learner = mlr3::lrn("classif.featureless"),
  resampling = mlr3::rsmp("holdout"),
  measure = NULL
)
Arguments
learner

(mlr3::Learner)
mlr3::Learner to use for model fitting.

resampling

(mlr3::Resampling)
mlr3::Resampling to be used within resampling.

measure

(mlr3::Measure)
mlr3::Measure to be used for evaluating the performance.


Method clone()

The objects of this class are cloneable with this method.

Usage
FilterPerformance$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("rpart")) {
  task = mlr3::tsk("iris")
  learner = mlr3::lrn("classif.rpart")
  filter = flt("performance", learner = learner)
  filter$calculate(task)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("iris")
  l = mlr3::lrn("classif.rpart")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("performance", learner = l), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

Permutation Score Filter

Description

The permutation filter randomly permutes the values of a single feature in a mlr3::Task to break the association with the response. The permuted feature, together with the unmodified features, is used to perform a mlr3::resample(). The permutation filter score is the difference between the aggregated performance of the mlr3::Measure and the performance estimated on the unmodified mlr3::Task.

Parameters

standardize

logical(1)
Standardize feature importance by maximum score.

nmc

integer(1)
Number of Monte-Carlo iterations to use in computing the feature importance.
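
The idea can be sketched in base R without any mlr3 objects. This is an illustration under stated assumptions, not the filter's implementation: it uses a linear model on `mtcars`, a single fixed holdout split, mean squared error as the measure, and a sign convention in which a larger increase in error means a more important feature. The variable `nmc` mirrors the parameter above, and the last line mimics standardize.

```r
set.seed(1)
data = mtcars[, c("mpg", "wt", "hp", "qsec")]
features = setdiff(names(data), "mpg")
train = sample(nrow(data), 22)            # one fixed holdout split

# Holdout mean squared error of a linear model (lower is better).
holdout_mse = function(d) {
  fit = stats::lm(mpg ~ ., data = d[train, ])
  mean((d$mpg[-train] - stats::predict(fit, d[-train, ]))^2)
}

baseline = holdout_mse(data)
nmc = 5                                   # number of Monte-Carlo repetitions

scores = vapply(features, function(f) {
  permuted = replicate(nmc, {
    d = data
    d[[f]] = sample(d[[f]])               # break the link to the response
    holdout_mse(d)
  })
  mean(permuted) - baseline               # increase in error after permuting
}, numeric(1))

# Optional standardisation by the maximum score.
print(scores / max(scores))
```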

Super classes

mlr3filters::Filter -> mlr3filters::FilterLearner -> FilterPermutation

Public fields

learner

(mlr3::Learner)

resampling

(mlr3::Resampling)

measure

(mlr3::Measure)

Active bindings

hash

(character(1))
Hash (unique identifier) for this object.

phash

(character(1))
Hash (unique identifier) for this partial object, excluding some components which are varied systematically during tuning (parameter values) or feature selection (feature names).

Methods

Public methods

Inherited methods

Method new()

Create a FilterPermutation object.

Usage
FilterPermutation$new(
  learner = mlr3::lrn("classif.featureless"),
  resampling = mlr3::rsmp("holdout"),
  measure = NULL
)
Arguments
learner

(mlr3::Learner)
mlr3::Learner to use for model fitting.

resampling

(mlr3::Resampling)
mlr3::Resampling to be used within resampling.

measure

(mlr3::Measure)
mlr3::Measure to be used for evaluating the performance.


Method clone()

The objects of this class are cloneable with this method.

Usage
FilterPermutation$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("rpart")) {
  learner = mlr3::lrn("classif.rpart")
  resampling = mlr3::rsmp("holdout")
  measure = mlr3::msr("classif.acc")
  filter = flt("permutation", learner = learner, measure = measure, resampling = resampling,
    nmc = 2)
  task = mlr3::tsk("iris")
  filter$calculate(task)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("iris")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("permutation", nmc = 2), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

RELIEF Filter

Description

RELIEF filter calling FSelectorRcpp::relief() in package FSelectorRcpp.

Super class

mlr3filters::Filter -> FilterRelief

Methods

Public methods

Inherited methods

Method new()

Create a FilterRelief object.

Usage
FilterRelief$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterRelief$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Note

This filter can handle missing values in the features. However, the resulting filter scores may be misleading or at least difficult to compare if some features have a large proportion of missing values.

If a feature has no non-missing observation, the resulting score will be (close to) 0.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("FSelectorRcpp")) {
  ## Relief (default)
  task = mlr3::tsk("iris")
  filter = flt("relief")
  filter$calculate(task)
  head(filter$scores, 3)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "FSelectorRcpp", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("iris")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("relief"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}

Filter for Embedded Feature Selection

Description

Filter using embedded feature selection of machine learning algorithms. Takes a mlr3::Learner which is capable of extracting the selected features (property "selected_features"), fits the model and extracts the selected features.

Note that contrary to mlr_filters_importance, there is no ordering in the selected features. Selected features get a score of 1, deselected features get a score of 0. The order of selected features is random and different from the order in the learner. In combination with mlr3pipelines, only the filter criterion cutoff makes sense.
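
The 0/1 scoring and the effect of a cutoff can be sketched in base R. The feature names and the set of selected features are invented for illustration, and the comparison `>= 0.5` is an assumption of the sketch rather than a statement about how mlr3pipelines applies its cutoff.

```r
all_features = c("Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width")
# Assume the learner reports these features as selected.
selected = c("Petal.Width", "Petal.Length")

# Score 1 for selected features, 0 for all others.
scores = stats::setNames(as.numeric(all_features %in% selected), all_features)
print(scores)

# Any cutoff strictly between 0 and 1 keeps exactly the selected features.
kept = names(scores)[scores >= 0.5]
print(kept)
```

Because all selected features tie at a score of 1, criteria that keep a fixed number or fraction of features would have to break ties arbitrarily, which is why only a cutoff is meaningful here.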

Super classes

mlr3filters::Filter -> mlr3filters::FilterLearner -> FilterSelectedFeatures

Public fields

learner

(mlr3::Learner)
Learner to extract the selected features from.

Methods

Public methods

Inherited methods

Method new()

Create a FilterSelectedFeatures object.

Usage
FilterSelectedFeatures$new(learner = mlr3::lrn("classif.featureless"))
Arguments
learner

(mlr3::Learner)
Learner to extract the selected features from.


Method clone()

The objects of this class are cloneable with this method.

Usage
FilterSelectedFeatures$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_univariate_cox, mlr_filters_variance

Examples

if (requireNamespace("rpart")) {
  task = mlr3::tsk("iris")
  learner = mlr3::lrn("classif.rpart")
  filter = flt("selected_features", learner = learner)
  filter$calculate(task)
  as.data.table(filter)
}

if (mlr3misc::require_namespaces(c("mlr3pipelines", "mlr3learners", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  library("mlr3learners")
  task = mlr3::tsk("sonar")

  filter = flt("selected_features", learner = mlr3::lrn("classif.rpart"))

  # Note: All filter scores are either 0 or 1, i.e. setting `filter.cutoff = 0.5` means that
  # we select all "selected features".

  graph = po("filter", filter = filter, filter.cutoff = 0.5) %>>%
    po("learner", mlr3::lrn("classif.log_reg"))

  graph$train(task)
}

Univariate Cox Survival Filter

Description

Calculates scores for assessing the relationship between individual features and the time-to-event outcome (right-censored survival data) using a univariate Cox proportional hazards model. The goal is to determine which features have a statistically significant association with the event of interest, typically in the context of clinical or biomedical research.

This filter fits a Cox proportional hazards model to each feature independently and extracts the p-value that quantifies the significance of the feature's impact on survival. The filter value is -log10(p), where p is the p-value. This transformation is necessary to ensure numerical stability for very small p-values, and it also means that higher values denote more important features. The filter works only for numeric features, so ensure that factor variables are properly encoded, e.g. using PipeOpEncode.
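
The per-feature computation can be sketched directly with the survival package. This is an illustration under stated assumptions, not the filter's code: it uses the `lung` data shipped with 'survival', restricts itself to two numeric columns without missing values, and takes the Wald test p-value of the single coefficient as the p-value.

```r
if (requireNamespace("survival", quietly = TRUE)) {
  lung = survival::lung
  features = c("age", "sex")   # numeric columns without missing values
  y = survival::Surv(lung$time, lung$status)

  scores = vapply(features, function(f) {
    # Fit a Cox model with a single feature.
    fit = survival::coxph(y ~ x, data = data.frame(x = lung[[f]]))
    # p-value of the Wald test for the single coefficient.
    p = summary(fit)$coefficients[1, "Pr(>|z|)"]
    -log10(p)
  }, numeric(1))

  print(sort(scores, decreasing = TRUE))
}
```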

Super class

mlr3filters::Filter -> FilterUnivariateCox

Methods

Public methods

Inherited methods

Method new()

Create a FilterUnivariateCox object.

Usage
FilterUnivariateCox$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterUnivariateCox$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_variance

Examples

filter = flt("univariate_cox")
filter

Variance Filter

Description

Variance filter calling stats::var().

Argument na.rm defaults to TRUE here.
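
The effect of na.rm can be seen in a base R sketch that computes the per-column variance by hand. The missing values are introduced artificially for illustration; this is not the filter's own code.

```r
data = mtcars
data$hp[1:3] = NA   # introduce missing values for illustration

# Per-column variance, ignoring missing values (na.rm = TRUE).
scores = vapply(data, stats::var, numeric(1), na.rm = TRUE)
print(head(sort(scores, decreasing = TRUE), 3))

# With na.rm = FALSE, a column with missing values gets an NA variance.
print(stats::var(data$hp, na.rm = FALSE))
```

Note that the variance depends on the scale of each feature, so features measured in large units tend to rank first unless the data are rescaled beforehand.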

Super class

mlr3filters::Filter -> FilterVariance

Methods

Public methods

Inherited methods

Method new()

Create a FilterVariance object.

Usage
FilterVariance$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
FilterVariance$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

See Also

Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox

Examples

task = mlr3::tsk("mtcars")
filter = flt("variance")
filter$calculate(task)
head(filter$scores, 3)
as.data.table(filter)

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.

  graph = po("filter", filter = flt("variance"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}