| Title: | Preprocessing Operators and Pipelines for 'mlr3' |
|---|---|
| Description: | Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned. |
| Authors: | Martin Binder [aut, cre], Florian Pfisterer [aut] (ORCID: <https://orcid.org/0000-0001-8867-762X>), Lennart Schneider [aut] (ORCID: <https://orcid.org/0000-0003-4152-5308>), Bernd Bischl [aut] (ORCID: <https://orcid.org/0000-0001-6002-6980>), Michel Lang [aut] (ORCID: <https://orcid.org/0000-0001-9754-0393>), Sebastian Fischer [aut] (ORCID: <https://orcid.org/0000-0002-9609-3197>), Susanne Dandl [aut], Keno Mersmann [ctb], Maximilian Mücke [ctb] (ORCID: <https://orcid.org/0009-0000-9432-9795>), Lona Koers [ctb], Alexander Winterstetter [ctb] |
| Maintainer: | Martin Binder |
| License: | LGPL-3 |
| Version: | 0.11.0 |
| Built: | 2026-06-01 09:37:49 UTC |
| Source: | https://github.com/mlr-org/mlr3pipelines |
Useful links:
Report bugs at https://github.com/mlr-org/mlr3pipelines/issues
These operators create a connection that "pipes" data from the source g1 into the sink g2.
Both source and sink can either be
a Graph or a PipeOp (or an object that can be automatically converted into a Graph or PipeOp, see as_graph() and as_pipeop()).
%>>% and %>>!% try to automatically match output channels of g1 to input channels of g2; this is only possible if either
the number of output channels of g1 (as given by g1$output) is equal to the
number of input channels of g2 (as given by g2$input), or
g1 has only one output channel (i.e. g1$output has one row), or
g2 has only one input channel, which is a vararg channel (i.e. g2$input has one row, with name entry "...").
Connections between channels are created in the
order in which they occur in g1 and g2, respectively: g1's output channel 1 is connected to g2's input
channel 1, channel 2 to 2 etc.
%>>% always creates deep copies of its input arguments, so they cannot be modified by reference afterwards.
To access individual PipeOps after composition, use the resulting Graph's $pipeops list.
%>>!%, on the other hand, tries to avoid cloning its first argument: If it is a Graph, then this Graph
will be modified in-place.
When %>>!% fails, it leaves g1 in an incompletely modified state. It is therefore usually recommended to use
%>>%, since the marginal performance gain from
%>>!% rarely outweighs the risk of either modifying objects by reference that should not be modified or ending up with
graphs that are in an incompletely modified state. However,
when creating long Graphs, chaining with %>>!% instead of %>>% can give noticeable performance benefits,
because %>>% makes a number of clone() calls that is quadratic in chain length, whereas %>>!% makes only a linear number.
concat_graphs(g1, g2, in_place = FALSE) is equivalent to g1 %>>% g2. concat_graphs(g1, g2, in_place = TRUE) is equivalent to g1 %>>!% g2.
Both arguments of %>>% are automatically converted to Graphs using as_graph(); this means that objects on either side may be objects
that can be automatically converted to PipeOps (such as Learners or Filters), or that can
be converted to Graphs. In particular, this includes lists of Graphs, PipeOps, or objects convertible to these, because
as_graph() automatically applies gunion() to lists. See examples. If the first argument of %>>!% is not a Graph, then
it is cloned just as when %>>% is used; %>>!% only avoids clone() if the first argument is a Graph.
Note that if g1 is NULL, g2 converted to a Graph will be returned.
Analogously, if g2 is NULL, g1 converted to a Graph will be returned.
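The copy semantics described above can be illustrated with a small sketch. It assumes mlr3pipelines is attached and uses po("scale") and po("pca") purely as convenient stand-ins; the variable names are illustrative only.

```r
library("mlr3pipelines")

scale_op = po("scale")
pca_op = po("pca")

# %>>% clones both operands, so the Graph holds independent copies.
g = scale_op %>>% pca_op
scale_op$param_set$values$center = FALSE

# The copy inside the Graph is unaffected by the change to scale_op.
center_in_graph = g$pipeops$scale$param_set$values$center

# Changes made through $pipeops do reach the Graph, because that list
# holds the Graph's own PipeOps.
g$pipeops$scale$param_set$values$center = FALSE

# %>>!% modifies a Graph on its left-hand side in place.
g2 = as_graph(po("scale"))
g2 %>>!% po("pca")
g2$ids()

# A NULL operand is dropped; the other side is returned as a Graph.
g3 = concat_graphs(NULL, po("pca"), in_place = FALSE)
g3$ids()
```

Because the in-place variant mutates g2 directly, no assignment is needed on that line; with %>>% the result would have to be assigned to a variable.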
g1 %>>% g2
concat_graphs(g1, g2, in_place = FALSE)
g1 %>>!% g2
g1 |
( |
g2 |
( |
in_place |
( |
Other Graph operators:
as_graph(),
as_pipeop(),
assert_graph(),
assert_pipeop(),
chain_graphs(),
greplicate(),
gunion(),
mlr_graphs_greplicate
o1 = PipeOpScale$new()
o2 = PipeOpPCA$new()
o3 = PipeOpFeatureUnion$new(2)

# The following two are equivalent:
pipe1 = o1 %>>% o2

pipe2 = Graph$new()$
  add_pipeop(o1)$
  add_pipeop(o2)$
  add_edge(o1$id, o2$id)

# Note automatic gunion() of lists.
# The following three are equivalent:
graph1 = list(o1, o2) %>>% o3

graph2 = gunion(list(o1, o2)) %>>% o3

graph3 = Graph$new()$
  add_pipeop(o1)$
  add_pipeop(o2)$
  add_pipeop(o3)$
  add_edge(o1$id, o3$id, dst_channel = 1)$
  add_edge(o2$id, o3$id, dst_channel = 2)

pipe1 %>>!% o3  # modify pipe1 in-place

pipe1  # contains o1, o2, and o3 now.

o1 %>>!% o2

o1  # not changed, because not a Graph.
Add a class hierarchy to the class hierarchy cache. This is necessary whenever an S3 class's class hierarchy is important when inferring compatibility between types.
add_class_hierarchy_cache(hierarchy)
hierarchy |
|
NULL
Other class hierarchy operations:
register_autoconvert_function(),
reset_autoconvert_register(),
reset_class_hierarchy_cache()
# This lets mlr3pipelines handle "data.table" as "data.frame".
# This is an example and not necessary, because mlr3pipelines adds it by default.
add_class_hierarchy_cache(c("data.table", "data.frame"))
The argument is turned into a Graph if possible.
If clone is TRUE, a deep copy is made
if the incoming object is a Graph to ensure the resulting
object is a different reference from the incoming object.
as_graph() is an S3 generic, so other packages can implement methods for it
when they add objects that can naturally be converted to Graphs.
By default, as_graph() tries to:
apply gunion() to x if it is a list, which recursively applies as_graph() to all list elements first;
create a Graph with only one element if x is a PipeOp or can be converted to one using as_pipeop().
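As a brief sketch of these conversion rules, assuming mlr3 and mlr3pipelines are attached (the learner and PipeOps are arbitrary choices made for illustration):

```r
library("mlr3")
library("mlr3pipelines")

# A single PipeOp becomes a Graph with one element.
g_single = as_graph(po("pca"))
g_single$ids()

# A Learner is first converted to a PipeOp, then wrapped in a Graph.
g_learner = as_graph(lrn("classif.rpart"))
g_learner$ids()

# A list is combined with gunion(): the PipeOps sit side by side
# and no edges are created between them.
g_list = as_graph(list(po("scale"), po("pca")))
g_list$ids()
nrow(g_list$edges)
```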
as_graph(x, clone = FALSE)
x |
( |
clone |
( |
Graph x or a deep clone of it.
Other Graph operators:
%>>%(),
as_pipeop(),
assert_graph(),
assert_pipeop(),
chain_graphs(),
greplicate(),
gunion(),
mlr_graphs_greplicate
The argument is turned into a PipeOp
if possible.
If clone is TRUE, a deep copy is made
if the incoming object is a PipeOp to ensure the resulting
object is a different reference from the incoming object.
as_pipeop() is an S3 generic, so other packages can implement methods for it
when they add objects that can naturally be converted to PipeOps. Objects that
can be converted are for example Learner (using PipeOpLearner) or
Filter (using PipeOpFilter).
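The following sketch shows the conversion of a Learner and the effect of the clone argument. It assumes mlr3 and mlr3pipelines are attached; the variable names are illustrative.

```r
library("mlr3")
library("mlr3pipelines")

# A Learner is wrapped in a PipeOpLearner.
op = as_pipeop(lrn("classif.rpart"))
class(op)[1]

# For an object that already is a PipeOp, clone decides whether the
# same reference or a deep copy is returned.
orig = po("pca")
same = as_pipeop(orig)
copy = as_pipeop(orig, clone = TRUE)

identical(same, orig)  # same R6 object
identical(copy, orig)  # a different object with the same configuration
```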
as_pipeop(x, clone = FALSE)
x |
( |
clone |
( |
PipeOp x or a deep clone of it.
Other Graph operators:
%>>%(),
as_graph(),
assert_graph(),
assert_pipeop(),
chain_graphs(),
greplicate(),
gunion(),
mlr_graphs_greplicate
Convert an object to a Multiplicity.
as.Multiplicity(x)
x |
( |
Function that checks that a given object is a Graph and
throws an error if not.
assert_graph(x)
x |
( |
Graph invisible(x)
Other Graph operators:
%>>%(),
as_graph(),
as_pipeop(),
assert_pipeop(),
chain_graphs(),
greplicate(),
gunion(),
mlr_graphs_greplicate
Function that checks that a given object is a PipeOp and
throws an error if not.
assert_pipeop(x)
x |
( |
PipeOp invisible(x)
Other Graph operators:
%>>%(),
as_graph(),
as_pipeop(),
assert_graph(),
chain_graphs(),
greplicate(),
gunion(),
mlr_graphs_greplicate
Takes an arbitrary number of Graphs or PipeOps (or objects that can be automatically
converted into Graphs or PipeOps, see as_graph() and as_pipeop()) as inputs and joins
them in a serial Graph, as if connecting them using %>>%.
Care is taken to avoid unnecessary cloning of components. A call of
chain_graphs(list(g1, g2, g3, g4, ...), in_place = FALSE) is equivalent to
g1 %>>% g2 %>>!% g3 %>>!% g4 %>>!% ....
A call of chain_graphs(list(g1, g2, g3, g4, ...), in_place = TRUE)
is equivalent to g1 %>>!% g2 %>>!% g3 %>>!% g4 %>>!% ... (differing in the
first operator being %>>!% as well).
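A minimal usage sketch, assuming mlr3pipelines is attached; the three PipeOps are arbitrary examples:

```r
library("mlr3pipelines")

# Join three PipeOps into one serial Graph.
g = chain_graphs(list(po("scale"), po("pca"), po("nop")))

# The PipeOps are connected in the order given.
g$ids(sorted = TRUE)
g$edges[, c("src_id", "dst_id")]

# NULL entries are ignored; if nothing remains, the result is NULL.
empty = chain_graphs(list(NULL, NULL))
```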
chain_graphs(graphs, in_place = FALSE)
graphs |
|
in_place |
( |
Graph the resulting Graph, or NULL if there are no non-null values in graphs.
Other Graph operators:
%>>%(),
as_graph(),
as_pipeop(),
assert_graph(),
assert_pipeop(),
greplicate(),
gunion(),
mlr_graphs_greplicate
Remove all NO_OP elements from a list.
filter_noop(x)
x |
|
list: The input list, with all NO_OP elements removed.
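A small sketch of filter_noop() together with is_noop(), assuming mlr3pipelines is attached; the list contents are made up for illustration:

```r
library("mlr3pipelines")

x = list(a = 1, b = NO_OP, c = "text")

# Test single elements ...
is_noop(x$b)
is_noop(x$a)

# ... or drop every NO_OP from the list at once.
kept = filter_noop(x)
names(kept)
```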
Other Path Branching:
NO_OP,
is_noop(),
mlr_pipeops_branch,
mlr_pipeops_unbranch
A Graph is a representation of a machine learning pipeline graph. It can be trained, and subsequently used for prediction.
A Graph is most useful when used together with Learner objects encapsulated as PipeOpLearner. In this case,
the Graph produces Prediction data during its $predict() phase and can be used as a Learner
itself (using the GraphLearner wrapper). However, the Graph can also be used without Learner objects to simply
perform preprocessing of data, and, in principle, does not even need to handle data at all but can be used for general processes with
dependency structure (although the PipeOps for this would need to be written).
Graph$new()
A Graph is made up of a list of PipeOps, and a data.table of edges. Both for training and prediction, the Graph
performs topological sorting of the PipeOps and executes their respective $train() or $predict() functions in order, moving
the PipeOp results along the edges as input to other PipeOps.
pipeops :: named list of PipeOp
Contains all PipeOps in the Graph, named by the PipeOp's $ids.
edges :: data.table with columns src_id (character), src_channel (character), dst_id (character), dst_channel (character)
Table of connections between the PipeOps. A data.table. src_id and dst_id are $ids of PipeOps that must be present in
the $pipeops list. src_channel and dst_channel must respectively be $output and $input channel names of the
respective PipeOps.
is_trained :: logical(1)
Is the Graph, i.e. are all of its PipeOps, trained, and can the Graph be used for prediction?
lhs :: character
Ids of the 'left-hand-side' PipeOps that have some unconnected input channels and therefore act as Graph input layer.
rhs :: character
Ids of the 'right-hand-side' PipeOps that have some unconnected output channels and therefore act as Graph output layer.
input :: data.table with columns name (character), train (character), predict (character), op.id (character), channel.name (character)
Input channels of the Graph. For each channel lists the name, input type during training, input type during prediction,
PipeOp $id of the PipeOp the channel pertains to, and channel name as the PipeOp knows it.
output :: data.table with columns name (character), train (character), predict (character), op.id (character), channel.name (character)
Output channels of the Graph. For each channel lists the name, output type during training, output type during prediction,
PipeOp $id of the PipeOp the channel pertains to, and channel name as the PipeOp knows it.
packages :: character
Set of all required packages for the various methods in the Graph, a set union of all required packages of all contained
PipeOp objects.
state :: named list
Get / Set the $state of each of the PipeOps contained in the Graph.
param_set :: ParamSet
Parameters and parameter constraints. Parameter values are in $param_set$values. These are the union of $param_sets
of all PipeOps in the Graph. Parameter names
as seen by the Graph have the naming scheme <PipeOp$id>.<PipeOp original parameter name>.
Changing $param_set$values also propagates the changes directly to the contained
PipeOps and is an alternative to changing a PipeOp's $param_set$values directly.
hash :: character(1)
Stores a checksum calculated on the Graph configuration, which includes all PipeOp hashes
(and therefore their $param_set$values) and a hash of $edges.
phash :: character(1)
Stores a checksum calculated on the Graph configuration, which includes all PipeOp hashes
except their $param_set$values, and a hash of $edges.
keep_results :: logical(1)
Whether to store intermediate results in the PipeOp's $.result slot, mostly for debugging purposes. Default FALSE.
man :: character(1)
Identifying string of the help page that shows with help().
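Several of the fields above can be inspected on a small Graph. The sketch below assumes mlr3pipelines is attached and uses a scale-then-PCA Graph only as an example; hash_before and the other helper variables are illustrative names.

```r
library("mlr3pipelines")

g = po("scale") %>>% po("pca")

# Input and output layers of the Graph.
g$lhs  # PipeOps with unconnected input channels
g$rhs  # PipeOps with unconnected output channels

# Parameters are exposed as <PipeOp id>.<parameter name>.
hash_before = g$hash
phash_before = g$phash
g$param_set$values$scale.center = FALSE

# The change is visible in the contained PipeOp ...
g$pipeops$scale$param_set$values$center

# ... and alters $hash, while $phash ignores parameter values.
hash_changed = !identical(g$hash, hash_before)
phash_unchanged = identical(g$phash, phash_before)
```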
ids(sorted = FALSE)
(logical(1)) -> character
Get IDs of all PipeOps. This is in order that PipeOps were added if
sorted is FALSE, and topologically sorted if sorted is TRUE.
add_pipeop(op, clone = TRUE)
(PipeOp | Learner | Filter | ..., logical(1)) -> self
Mutates Graph by adding a PipeOp to the Graph. This does not add any edges, so the new PipeOp
will not be connected within the Graph at first.
Instead of supplying a PipeOp directly, an object that can naturally be converted to a PipeOp can also
be supplied, e.g. a Learner or a Filter; see as_pipeop().
The argument given as op is cloned if clone is TRUE (default); to access a Graph's PipeOps
by-reference, use $pipeops.
Note that $add_pipeop() is a relatively low-level operation; it is recommended to build graphs using %>>%.
add_edge(src_id, dst_id, src_channel = NULL, dst_channel = NULL)
(character(1), character(1),
character(1) | numeric(1) | NULL,
character(1) | numeric(1) | NULL) -> self
Add an edge from PipeOp src_id, and its channel src_channel
(identified by its name or number as listed in the PipeOp's $output), to PipeOp dst_id's
channel dst_channel (identified by its name or number as listed in the PipeOp's $input).
If source or destination PipeOp have only one input / output channel and src_channel / dst_channel
are therefore unambiguous, they can be omitted (i.e. left as NULL).
chain(gs, clone = TRUE)
(list of Graphs, logical(1)) -> self
Takes a list of Graphs or PipeOps (or objects that can be automatically converted into Graphs or PipeOps,
see as_graph() and as_pipeop()) as inputs and joins them in a serial Graph coming after self, as if
connecting them using %>>%.
plot(html = FALSE, horizontal = FALSE)
(logical(1), logical(1)) -> NULL
Plot the Graph, using either the igraph package (for html = FALSE, default) or
the visNetwork package for html = TRUE producing a htmlWidget.
The htmlWidget can be rescaled using visOptions.
For html = FALSE, the orientation of the plotted graph can be controlled through horizontal.
print(dot = FALSE, dotname = "dot", fontsize = 24L)
(logical(1), character(1), integer(1)) -> NULL
Print a representation of the Graph on the console. If dot is FALSE, output is a table with one row for each contained PipeOp and
columns ID ($id of PipeOp), State (short representation of $state of PipeOp), sccssors (PipeOps that
take their input directly from the PipeOp on this line), and prdcssors (the PipeOps that produce the data
that is read as input by the PipeOp on this line). If dot is TRUE, print a DOT representation of the Graph on the console.
The DOT output can be named via the argument dotname and the fontsize can also be specified.
set_names(old, new)
(character, character) -> self
Rename PipeOps: Change ID of each PipeOp as identified by old to the corresponding item in new. This should be used
instead of changing a PipeOp's $id value directly!
update_ids(prefix = "", postfix = "")
(character, character) -> self
Pre- or postfix PipeOp's existing ids. Both prefix and postfix default to "", i.e. no changes.
train(input, single_input = TRUE)
(any, logical(1)) -> named list
Train Graph by traversing the Graph's edges and calling all the PipeOps' $train methods in turn.
Return a named list of outputs for each unconnected
PipeOp out-channel, named according to the Graph's $output name column. During training, the $state
member of each PipeOp will be set and the $is_trained slot of the Graph (and each individual PipeOp) will
consequently be set to TRUE.
If single_input is TRUE, the input value will be sent to each unconnected PipeOp's input channel
(as listed in the Graph's $input). Typically, input should be a Task, although this is dependent
on the PipeOps in the Graph. If single_input is FALSE, then
input should be a list with the same length as the Graph's $input table has rows; each list item will be sent
to a corresponding input channel of the Graph. If input is a named list, names must correspond to input channel
names ($input$name) and inputs will be sent to the channels by name; otherwise they will be sent to the channels
in order in which they are listed in $input.
predict(input, single_input = TRUE)
(any, logical(1)) -> list of any
Predict with the Graph by calling all the PipeOps' $predict methods. Input and output, as well as the function
of the single_input argument, are analogous to $train().
help(help_type)
(character(1)) -> help file
Displays the help file of the concrete PipeOp instance. help_type is one of "text", "html", "pdf" and behaves
as the help_type argument of R's help().
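The single_input = FALSE mode of $train() and the $update_ids() method can be sketched as follows. This assumes mlr3 and mlr3pipelines are attached; the Graph with two unconnected PipeOps is built only to obtain two input channels.

```r
library("mlr3")
library("mlr3pipelines")

# Two unconnected PipeOps give a Graph with two input channels.
g = gunion(list(po("scale"), po("pca")))
input_names = g$input$name
output_names = g$output$name

task = tsk("iris")

# With single_input = FALSE, supply one (named) entry per input channel.
inputs = setNames(list(task, task), input_names)
out = g$train(inputs, single_input = FALSE)
names(out)

# Prefix all PipeOp ids; the Graph is modified in place.
g$update_ids(prefix = "a_")
g$ids()
```

Taking the names from g$input$name avoids hard-coding channel names.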
Other mlr3pipelines backend related:
PipeOp,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_graphs,
mlr_pipeops,
mlr_pipeops_updatetarget
library("mlr3")

g = Graph$new()$
  add_pipeop(PipeOpScale$new(id = "scale"))$
  add_pipeop(PipeOpPCA$new(id = "pca"))$
  add_edge("scale", "pca")
g$input
g$output

task = tsk("iris")
trained = g$train(task)
trained[[1]]$data()

task$filter(1:10)
predicted = g$predict(task)
predicted[[1]]$data()
Create a new Graph containing n copies of the input Graph / PipeOp.
To avoid ID collisions, PipeOp IDs are suffixed with _i
where i ranges from 1 to n.
This function is deprecated and will be removed in the next version in favor of using pipeline_greplicate / ppl("greplicate").
greplicate(graph, n)
graph |
|
n |
|
Graph containing n copies of input graph.
Other Graph operators:
%>>%(),
as_graph(),
as_pipeop(),
assert_graph(),
assert_pipeop(),
chain_graphs(),
gunion(),
mlr_graphs_greplicate
Takes an arbitrary number of Graphs or PipeOps (or objects that can be automatically
converted into Graphs or PipeOps, see as_graph() and as_pipeop()) as inputs and joins
them in a new Graph.
The PipeOps of the input Graphs are not joined with new edges across
Graphs, so if length(graphs) > 1, the resulting Graph will be disconnected.
This operation always creates deep copies of its input arguments, so they cannot be modified by reference afterwards.
To access individual PipeOps after composition, use the resulting Graph's $pipeops list.
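A short sketch of the disconnected result, assuming mlr3pipelines is attached. Connecting the union to po("featureunion") is shown only to illustrate how the separate outputs can be joined afterwards.

```r
library("mlr3pipelines")

g = gunion(list(po("scale"), po("pca")))

# No edges are created, so each PipeOp is both an input and an output.
nrow(g$edges)
g$input$op.id
g$output$op.id

# Both outputs can be fed into a PipeOp with a vararg input channel.
joined = g %>>% po("featureunion")
nrow(joined$edges)
joined$rhs
```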
gunion(graphs, in_place = FALSE)
graphs |
|
in_place |
( |
Other Graph operators:
%>>%(),
as_graph(),
as_pipeop(),
assert_graph(),
assert_pipeop(),
chain_graphs(),
greplicate(),
mlr_graphs_greplicate
Test whether a given object is a NO_OP.
is_noop(x)
x |
|
logical(1): Whether x is a NO_OP.
Other Path Branching:
NO_OP,
filter_noop(),
mlr_pipeops_branch,
mlr_pipeops_unbranch
Check if an object is a Multiplicity.
is.Multiplicity(x)
x |
( |
logical(1)
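A minimal sketch of is.Multiplicity() together with as.Multiplicity(), assuming mlr3pipelines is attached and that a plain list is an accepted input for the conversion:

```r
library("mlr3pipelines")

# Construct a Multiplicity directly ...
m = Multiplicity(1, 2)
is.Multiplicity(m)

# ... a plain list is not a Multiplicity until converted.
plain = list(1, 2)
is.Multiplicity(plain)
m2 = as.Multiplicity(plain)
is.Multiplicity(m2)
```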
FilterEnsemble aggregates several Filters by averaging their scores
(or ranks) with user-defined weights. Each wrapped filter is evaluated on the supplied task,
and the resulting feature scores are combined feature-wise by a convex combination determined
through the weights parameter. This allows leveraging complementary inductive biases of
multiple filters without committing to a single criterion. The concept was introduced by
Binder et al. (2020). This implementation follows the idea but leaves the exact choice of
weights to the user.
R6Class object inheriting from Filter.
FilterEnsemble$new(filters)
filters :: list of Filter
Filters that are evaluated and aggregated. Each filter must be cloneable and support the
task type and feature types of the ensemble. The ensemble identifier defaults to the wrapped
filter ids concatenated by ".".
weights :: numeric()
Required non-negative weights, one for each wrapped filter, with at least one strictly positive value.
Values are used as given when calculating the weighted mean. If named, names must match the wrapped filter ids.
rank_transform :: logical(1)
If TRUE, ranks of individual filter scores are used instead of the raw scores. Initialized to FALSE.
filter_score_transform :: function
Function to be applied to the vector of individual filter scores after they were potentially transformed by
rank_transform but before weighting and aggregation. Initialized to identity.
aggregator :: function
Function to aggregate the (potentially transformed) and weighted filter scores across filters. Must take
arguments w for weights and na.rm, the latter of which is always set to TRUE. Defaults to stats::weighted.mean.
result_score_transform :: function
Function to be applied to the vector of aggregated scores after they were potentially transformed by rank_transform and/or
filter_score_transform. Initialized to identity.
Parameters of wrapped filters are available via $param_set and can be referenced using
the wrapped filter id followed by ".", e.g. "variance.na.rm".
$wrapped :: named list of Filter
Read-only access to the wrapped filters.
get_weights_search_space(weights_param_name = "weights", normalize_weights = "uniform", prefix = "w")
(character(1), character(1), character(1)) -> ParamSet
Construct a ParamSet describing a weight search space.
get_weights_tunetoken(normalize_weights = "uniform")
(character(1)) -> TuneToken
Shortcut returning a TuneToken for tuning the weights.
set_weights_to_tune(normalize_weights = "uniform")
(character(1)) -> self
Convenience wrapper that stores the TuneToken returned by
get_weights_tunetoken() in $param_set$values$weights.
All wrapped filters are called with nfeat equal to the number of features to ensure that
complete score vectors are available for aggregation.
Scores are combined per feature by computing a weighted aggregation of transformed (default: identity)
scores or ranks. Additionally, the final scores may also be transformed (default: identity).
The order of transformations is as follows:
Call $calculate() on each wrapped filter to obtain its scores for all features;
If rank_transform is TRUE, convert filter scores to ranks;
Apply filter_score_transform to the scores / ranks;
Calculate the weighted aggregation across all filters using aggregator;
Apply result_score_transform (identity by default) to the vector of scores for each feature aggregated across filters.
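The arithmetic behind these steps can be sketched in base R. This is not the package's implementation. The scores, weights, and variable names below are invented for illustration, and the direction of the rank transform is an assumption.

```r
# Step 1 (assumed already done): hypothetical raw scores of two filters
# for three features.
scores = list(
  variance = c(f1 = 4.0, f2 = 1.0, f3 = 2.5),
  auc      = c(f1 = 0.2, f2 = 0.4, f3 = 0.1)
)
weights = c(variance = 0.75, auc = 0.25)

rank_transform = FALSE
filter_score_transform = identity
result_score_transform = identity

# Steps 2 and 3: optional rank transform, then the per-filter transform.
transformed = lapply(scores, function(s) {
  if (rank_transform) s = rank(s)
  filter_score_transform(s)
})

# Step 4: aggregate feature-wise across filters.
# Rows of the matrix are features, columns are filters.
score_matrix = do.call(cbind, transformed)
aggregated = apply(score_matrix, 1, function(x) {
  stats::weighted.mean(x, w = weights[colnames(score_matrix)], na.rm = TRUE)
})

# Step 5: transform the aggregated scores.
final = result_score_transform(aggregated)
final
```

With these weights, feature f1 receives 0.75 * 4.0 + 0.25 * 0.2 = 3.05, so the high-variance feature dominates the combined ranking.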
Binder M, Moosbauer J, Thomas J, Bischl B (2020). “Multi-objective hyperparameter tuning and feature selection using filter ensembles.” In Proceedings of the 2020 Genetic and Evolutionary Computation Conference, 471–479. doi:10.1145/3377930.3389815.
library("mlr3")
library("mlr3filters")

task = tsk("sonar")

filter = flt("ensemble",
  filters = list(FilterVariance$new(), FilterAUC$new()))
filter$param_set$values$weights = c(variance = 0.5, auc = 0.5)
filter$calculate(task)
head(as.data.table(filter))

# Weighted median as aggregator
filter$param_set$set_values(aggregator = function(x, w, na.rm) {
  if (na.rm) {
    # drop missing scores together with their weights
    keep <- !is.na(x)
    x <- x[keep]
    w <- w[keep]
  }
  o <- order(x)
  x <- x[o]
  w <- w[o]
  # first score at which the cumulative weight reaches half the total
  x[match(TRUE, cumsum(w) >= sum(w) / 2)]
})
filter$calculate(task)
head(as.data.table(filter))

# Aggregate reciprocal ranking
filter$param_set$set_values(rank_transform = TRUE,
  filter_score_transform = function(x) 1 / x,
  result_score_transform = function(x) rank(1 / x, ties.method = "average"))
filter$calculate(task)
head(as.data.table(filter))
A simple Dictionary storing objects of class Graph.
The dictionary contains a collection of often-used graph structures, and its aim
is solely to make often-used functions more accessible.
Each Graph has an associated help page, which can be accessed via ?mlr_graphs_<key>, e.g.
?mlr_graphs_bagging.
R6Class object inheriting from mlr3misc::Dictionary.
Methods inherited from Dictionary, as well as:
add(key, value)
(character(1), function)
Adds constructor value to the dictionary with key key, potentially
overwriting a previously stored item.
as.data.table(dict)
Dictionary -> data.table::data.table
Returns a data.table with column key (character).
Other mlr3pipelines backend related:
Graph,
PipeOp,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_updatetarget
Other Dictionaries:
mlr_pipeops
library(mlr3)
lrn = lrn("regr.rpart")
task = mlr_tasks$get("boston_housing")

# Robustify the learner for the task.
gr = pipeline_robustify(task, lrn) %>>% po("learner", lrn)
# or equivalently
gr = mlr_graphs$get("robustify", task = task, learner = lrn) %>>% po(lrn)
# or equivalently
gr = ppl("robustify", task, lrn) %>>% po("learner", lrn)

# all Graphs currently in the dictionary:
as.data.table(mlr_graphs)
Creates a Graph that performs bagging for a supplied graph.
This is done as follows:
Subsample the data in each step using PipeOpSubsample, then apply graph
Replicate this step iterations times (in parallel via multiplicities)
Average the outputs of the replicated graphs' predictions using the averager
(note that setting collect_multiplicity = TRUE is required)
All input arguments are cloned and have no references in common with the returned Graph.
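The steps listed above can also be written out by hand. The sketch below assumes mlr3 and mlr3pipelines are attached and uses a classification learner with po("classifavg"). It illustrates the described structure and is not necessarily how pipeline_bagging() builds its Graph internally.

```r
library("mlr3")
library("mlr3pipelines")

learner_po = po("learner", lrn("classif.rpart"))

# replicate -> subsample -> learner -> average over the replications
manual = po("replicate", reps = 3) %>>%
  po("subsample", frac = 0.7) %>>%
  learner_po %>>%
  po("classifavg", collect_multiplicity = TRUE)

task = tsk("iris")
manual$train(task)
pred = manual$predict(task)[[1]]
class(pred)[1]
```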
pipeline_bagging(
  graph,
  iterations = 10,
  frac = 0.7,
  averager = NULL,
  replace = FALSE
)
graph |
|
iterations |
|
frac |
|
averager |
|
replace |
|
library(mlr3)
lrn_po = po("learner", lrn("regr.rpart"))
task = mlr_tasks$get("boston_housing")
gr = pipeline_bagging(lrn_po, 3, averager = po("regravg", collect_multiplicity = TRUE))
resample(task, GraphLearner$new(gr), rsmp("holdout"))$aggregate()

# The original bagging method uses bootstrapping, i.e. sampling with replacement.
gr = ppl("bagging", lrn_po, frac = 1, replace = TRUE,
  averager = po("regravg", collect_multiplicity = TRUE))
resample(task, GraphLearner$new(gr), rsmp("holdout"))$aggregate()
Create a multiplexed graph.
All input arguments are cloned and have no references in common with the returned Graph.
pipeline_branch(graphs, prefix_branchops = "", prefix_paths = FALSE)
graphs |
|
prefix_branchops |
|
prefix_paths |
|
library("mlr3")

po_pca = po("pca")
po_nop = po("nop")

branches = pipeline_branch(list(pca = po_pca, nothing = po_nop))
# gives the same as
branches = c("pca", "nothing")
po("branch", branches) %>>%
  gunion(list(po_pca, po_nop)) %>>%
  po("unbranch", branches)

pipeline_branch(list(pca = po_pca, nothing = po_nop),
  prefix_branchops = "br_", prefix_paths = "xy_")
# gives the same as
po("branch", branches, id = "br_branch") %>>%
  gunion(list(xy_pca = po_pca, xy_nothing = po_nop)) %>>%
  po("unbranch", branches, id = "br_unbranch")
Converts all columns of type type_from to type_to, using the corresponding R function (e.g. as.numeric(), as.factor()).
It is possible to further subset the columns that should be affected using the affect_columns argument.
The resulting Graph contains a PipeOpColApply, followed, if appropriate, by a PipeOpFixFactors.
Unlike R's as.factor() function, ppl("convert_types") will convert ordered types into (unordered) factor vectors.
pipeline_convert_types(
  type_from,
  type_to,
  affect_columns = NULL,
  id = NULL,
  fixfactors = NULL,
  more_args = list()
)
type_from |
|
type_to |
|
affect_columns |
|
id |
|
fixfactors |
|
more_args |
|
library("mlr3")

data_chr = data.table::data.table(
  x = factor(letters[1:3]),
  y = letters[1:3],
  z = letters[1:3]
)
task_chr = TaskClassif$new("task_chr", data_chr, "x")
str(task_chr$data())

graph = ppl("convert_types", "character", "factor")
str(graph$train(task_chr)[[1]]$data())

graph_z = ppl("convert_types", "character", "factor",
  affect_columns = selector_name("z"))
graph_z$train(task_chr)[[1]]$data()

# `affect_columns` and `type_from` are both applied. The following
# looks for a 'numeric' column with name 'z', which is not present;
# the task is therefore unchanged.
graph_z = ppl("convert_types", "numeric", "factor",
  affect_columns = selector_name("z"))
graph_z$train(task_chr)[[1]]$data()
Create a new Graph containing n copies of the input Graph / PipeOp. To avoid ID
collisions, PipeOp IDs are suffixed with _i where i ranges from 1 to n.
All input arguments are cloned and have no references in common with the returned Graph.
pipeline_greplicate(graph, n)
graph
n
Graph containing n copies of input graph.
Other Graph operators:
%>>%(),
as_graph(),
as_pipeop(),
assert_graph(),
assert_pipeop(),
chain_graphs(),
greplicate(),
gunion()
library("mlr3")

po_pca = po("pca")
pipeline_greplicate(po_pca, n = 2)
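To make the ID suffixing and the cloning behaviour visible, one can inspect the IDs of the replicated Graph and of the original PipeOp. The snippet below is a small sketch; the variable names are illustrative.

```r
library(mlr3)
library(mlr3pipelines)

po_scale = po("scale")
g = pipeline_greplicate(po_scale, n = 3)

# IDs inside the replicated Graph carry the suffixes _1, _2, _3
g$ids()

# The input PipeOp was cloned, so its own ID is left untouched
po_scale$id
```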
Create a new Graph for a classification Task to
perform "One vs. Rest" classification.
All input arguments are cloned and have no references in common with the returned Graph.
pipeline_ovr(graph)
graph
library("mlr3")

task = tsk("wine")

learner = lrn("classif.rpart")
learner$predict_type = "prob"

# Simple OVR
g1 = pipeline_ovr(learner)
g1$train(task)
g1$predict(task)

# Bagged Learners
gr = po("replicate", reps = 3) %>>%
  po("subsample") %>>%
  learner %>>%
  po("classifavg", collect_multiplicity = TRUE)
g2 = pipeline_ovr(gr)
g2$train(task)
g2$predict(task)

# Bagging outside OVR
g3 = po("replicate", reps = 3) %>>%
  pipeline_ovr(po("subsample") %>>% learner) %>>%
  po("classifavg", collect_multiplicity = TRUE)
g3$train(task)
g3$predict(task)
Creates a Graph that can be used to robustify any subsequent learner.
Performs the following steps:
Drops empty factor levels using PipeOpFixFactors
Imputes numeric features using PipeOpImputeHist and PipeOpMissInd
Imputes factor features using PipeOpImputeOOR
Encodes factors using one-hot-encoding. Factors with a cardinality > max_cardinality are
collapsed using PipeOpCollapseFactors
The graph is built conservatively, i.e. the function always tries to ensure that everything works. If a learner is provided, some steps can be left out; e.g. if the learner can deal with factor variables, no encoding is performed.
All input arguments are cloned and have no references in common with the returned Graph.
pipeline_robustify(
  task = NULL,
  learner = NULL,
  impute_missings = NULL,
  factors_to_numeric = NULL,
  max_cardinality = 1000,
  ordered_action = "factor",
  character_action = "factor",
  POSIXct_action = "numeric"
)
task
learner
impute_missings
factors_to_numeric
max_cardinality
ordered_action
character_action
POSIXct_action
library(mlr3)

lrn = lrn("regr.rpart")
task = mlr_tasks$get("boston_housing")
gr = pipeline_robustify(task, lrn) %>>% po("learner", lrn)
resample(task, GraphLearner$new(gr), rsmp("holdout"))
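The effect of supplying a learner can be seen by comparing the PipeOp IDs of the conservative default Graph with those of a Graph built for a specific learner. This is only a sketch; the exact set of PipeOps depends on the package version and on the learner's properties, so no particular output is assumed here.

```r
library(mlr3)
library(mlr3pipelines)

# Conservative graph: no task or learner given, so all steps are kept
graph_full = pipeline_robustify()
graph_full$ids()

# Graph built for a specific learner; steps the learner does not need
# (e.g. encoding, if the learner handles factors) can be left out
graph_rpart = pipeline_robustify(learner = lrn("regr.rpart"))
graph_rpart$ids()
```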
Create a new Graph for stacking. A stacked learner uses predictions of
several base learners and fits a super learner using these predictions as
features in order to predict the outcome.
All input arguments are cloned and have no references in common with the returned Graph.
pipeline_stacking(
  base_learners,
  super_learner,
  method = "cv",
  folds = 3,
  use_features = TRUE
)
base_learners
super_learner
method
folds
use_features
library(mlr3)
library(mlr3learners)

base_learners = list(
  lrn("classif.rpart", predict_type = "prob"),
  lrn("classif.nnet", predict_type = "prob")
)
super_learner = lrn("classif.log_reg")

graph_stack = pipeline_stacking(base_learners, super_learner)
graph_learner = as_learner(graph_stack)
graph_learner$train(tsk("german_credit"))
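A variant of the example above that needs only learners shipped with mlr3 is sketched below; it also runs prediction on the stacked learner. The choice of learners, the ID "super_rpart" (set so that the super learner's ID differs from the base learners' IDs) and the "sonar" task are illustrative assumptions.

```r
library(mlr3)
library(mlr3pipelines)

base_learners = list(
  lrn("classif.rpart", predict_type = "prob"),
  lrn("classif.featureless", predict_type = "prob")
)
# Distinct ID for the super learner to avoid clashing with the base rpart
super_learner = lrn("classif.rpart", id = "super_rpart")

graph_stack = pipeline_stacking(base_learners, super_learner,
  method = "cv", folds = 3, use_features = TRUE)
graph_stack$ids()

stack_learner = as_learner(graph_stack)
task = tsk("sonar")
stack_learner$train(task)

# In-sample prediction, only to show that the stacked learner predicts
prediction = stack_learner$predict(task)
prediction$score(msr("classif.ce"))
```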
Wraps a Graph that transforms a target during training and inverts the transformation
during prediction. This is done as follows:
Specify a transformation and inversion function using any subclass of PipeOpTargetTrafo, defaults to
PipeOpTargetMutate, afterwards apply graph.
At the very end, during prediction the transformation is inverted using PipeOpTargetInvert.
To set a transformation and inversion function for PipeOpTargetMutate see the
parameters trafo and inverter of the param_set of the resulting Graph.
Note that the input graph is not explicitly checked to actually return a
Prediction during prediction.
All input arguments are cloned and have no references in common with the returned Graph.
pipeline_targettrafo(
  graph,
  trafo_pipeop = PipeOpTargetMutate$new(),
  id_prefix = ""
)
graph
trafo_pipeop
id_prefix
library("mlr3")

tt = pipeline_targettrafo(PipeOpLearner$new(LearnerRegrRpart$new()))
tt$param_set$values$targetmutate.trafo = function(x) log(x, base = 2)
tt$param_set$values$targetmutate.inverter = function(x) list(response = 2 ^ x$response)

# gives the same as
g = Graph$new()
g$add_pipeop(PipeOpTargetMutate$new(param_vals = list(
  trafo = function(x) log(x, base = 2),
  inverter = function(x) list(response = 2 ^ x$response))
  )
)
g$add_pipeop(LearnerRegrRpart$new())
g$add_pipeop(PipeOpTargetInvert$new())
g$add_edge(src_id = "targetmutate", dst_id = "targetinvert",
  src_channel = 1, dst_channel = 1)
g$add_edge(src_id = "targetmutate", dst_id = "regr.rpart",
  src_channel = 2, dst_channel = 1)
g$add_edge(src_id = "regr.rpart", dst_id = "targetinvert",
  src_channel = 1, dst_channel = 2)
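To see the inversion at work, the wrapped Graph can be trained and used for prediction as a GraphLearner. The following sketch reuses the log2 transformation from the example above; the "mtcars" task and the variable names are illustrative choices.

```r
library(mlr3)
library(mlr3pipelines)

tt = pipeline_targettrafo(po("learner", lrn("regr.rpart")))
tt$param_set$values$targetmutate.trafo = function(x) log(x, base = 2)
tt$param_set$values$targetmutate.inverter = function(x) list(response = 2 ^ x$response)

task = tsk("mtcars")
tt_learner = as_learner(tt)
tt_learner$train(task)
prediction = tt_learner$predict(task)

# The learner was fitted on log2(target), but predictions are reported
# on the original scale because the transformation is inverted
range(prediction$response)
range(task$truth())
```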
Computes a weighted average of inputs. Used in the context of computing weighted averages of predictions.
Predictions are averaged using weights (in order of appearance in the data) which are optimized using
nonlinear optimization from the package nloptr for a measure provided in
measure (defaults to classif.ce for LearnerClassifAvg and regr.mse for LearnerRegrAvg).
Learned weights can be obtained from $model.
This Learner implements and generalizes an approach proposed in LeDell (2015) that uses non-linear
optimization in order to learn base-learner weights that optimize a given performance metric (e.g. AUC).
The approach is similar to, but not exactly the same as, the one implemented as AUC in the SuperLearner
R package (when metric is "classif.auc").
For a more detailed analysis and the general idea, the reader is referred to LeDell (2015).
Note that weights always sum to 1, because they are divided by sum(weights) before weighting
incoming features.
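The normalization step can be illustrated in base R. This is a sketch of the arithmetic only, not of the learner's internal code; the prediction vectors and raw weights are made-up numbers.

```r
# Hypothetical positive-class probabilities from two base learners
p1 = c(0.9, 0.2, 0.6)
p2 = c(0.7, 0.4, 0.2)

# Raw (unnormalized) weights, e.g. as proposed by an optimizer
raw_weights = c(3, 1)

# Dividing by sum(weights) makes the weights sum to 1
w = raw_weights / sum(raw_weights)
sum(w)

# Weighted average of the incoming predictions
avg = w[1] * p1 + w[2] * p2
avg
```

Because only the ratio of the weights matters after normalization, c(3, 1) and c(0.75, 0.25) yield the same averaged prediction.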
mlr_learners_classif.avg mlr_learners_regr.avg
R6Class object inheriting from mlr3::LearnerClassif/mlr3::Learner.
The parameters are the parameters inherited from LearnerClassif, as well as:
measure :: Measure | character Measure to optimize for.
Will be converted to a Measure in case it is character.
Initialized to "classif.ce", i.e. misclassification error for classification
and "regr.mse", i.e. mean squared error for regression.
optimizer :: Optimizer | character(1)
Optimizer used to find optimal weights.
If character, converts to Optimizer
via opt. Initialized to OptimizerNLoptr.
Nloptr hyperparameters are initialized to xtol_rel = 1e-8, algorithm = "NLOPT_LN_COBYLA"
and equal initial weights for each learner.
For more fine-grained control, it is recommended to supply an instantiated Optimizer.
log_level :: character(1) | integer(1)
Set a temporary log-level for lgr::get_logger("mlr3/bbotk"). Initialized to: "warn".
LearnerClassifAvg$new(id = "classif.avg")
(chr) -> self
Constructor.
LearnerRegrAvg$new(id = "regr.avg")
(chr) -> self
Constructor.
LeDell, Erin (2015). Scalable Ensemble Learning and Computationally Efficient Variance Estimation. Ph.D. thesis, UC Berkeley.
Other Learners:
mlr_learners_graph
Other Ensembles:
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg
A Learner that encapsulates a Graph to be used in
mlr3 resampling and benchmarks.
The Graph must return a single Prediction on its $predict()
call. The result of the $train() call is discarded, only the
internal state changes during training are used.
The predict_type of a GraphLearner can be obtained or set via its predict_type active binding.
Setting a new predict type will try to set the predict_type in all relevant
PipeOp / Learner encapsulated within the Graph.
Similarly, the predict_type of a Graph will always be the smallest denominator in the Graph.
A GraphLearner is always constructed in an untrained state. When the graph argument has a
non-NULL $state, it is ignored.
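A short sketch of the predict_type active binding: setting it on the GraphLearner is passed on to the wrapped Learner, and a freshly constructed GraphLearner carries no model. The Graph used here is an arbitrary illustration.

```r
library(mlr3)
library(mlr3pipelines)

graph_learner = as_learner(po("scale") %>>% lrn("classif.rpart"))
graph_learner$predict_type

# Setting the predict type on the GraphLearner ...
graph_learner$predict_type = "prob"

# ... also changes it for the Learner wrapped inside the Graph
graph_learner$graph$pipeops$classif.rpart$learner$predict_type

# A new GraphLearner is untrained
is.null(graph_learner$model)
```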
R6Class object inheriting from mlr3::Learner.
GraphLearner$new(graph, id = NULL, param_vals = list(), task_type = NULL, predict_type = NULL, clone_graph = TRUE)
graph :: Graph | PipeOp
Graph to wrap. Can be a PipeOp, which is automatically converted to a Graph.
This argument is usually cloned, unless clone_graph is FALSE; to access the Graph inside GraphLearner by-reference, use $graph.
id :: character(1)
Identifier of the resulting Learner.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
task_type :: character(1)
What task_type the GraphLearner should have; usually automatically inferred for Graphs that are simple enough.
predict_type :: character(1)
What predict_type the GraphLearner should have; usually automatically inferred for Graphs that are simple enough.
clone_graph :: logical(1)
Whether to clone graph upon construction. Unintentionally changing graph by reference can lead to unexpected behaviour,
so TRUE (default) is recommended. In particular, note that the $state of $graph is set to NULL by reference on
construction of GraphLearner, during $train(), and during $predict() when clone_graph is FALSE.
Fields inherited from Learner, as well as:
graph :: Graph
Graph that is being wrapped. This field contains the prototype of the Graph that is being trained, but does not
contain the model. Use graph_model to access the trained Graph after $train(). Read-only.
graph_model :: Graph
Graph that is being wrapped. This Graph contains a trained state after $train(). Read-only.
pipeops :: named list of PipeOp
Contains all PipeOps in the underlying Graph, named by the PipeOp's $ids. Shortcut for $graph_model$pipeops. See Graph for details.
edges :: data.table with columns src_id (character), src_channel (character), dst_id (character), dst_channel (character)
Table of connections between the PipeOps in the underlying Graph. Shortcut for $graph$edges. See Graph for details.
param_set :: ParamSet
Parameters of the underlying Graph. Shortcut for $graph$param_set. See Graph for details.
pipeops_param_set :: named list()
Named list containing the ParamSets of all PipeOps in the Graph. See there for details.
pipeops_param_set_values :: named list()
Named list containing the set parameter values of all PipeOps in the Graph. See there for details.
internal_tuned_values :: named list() or NULL
The internal tuned parameter values collected from all PipeOps.
NULL is returned if the learner is not trained or none of the wrapped learners supports internal tuning.
internal_valid_scores :: named list() or NULL
The internal validation scores as retrieved from the PipeOps.
The names are prefixed with the respective IDs of the PipeOps.
NULL is returned if the learner is not trained or none of the wrapped learners supports internal validation.
validate :: numeric(1), "predefined", "test" or NULL
How to construct the validation data. This also has to be configured for the individual PipeOps such as
PipeOpLearner, see set_validate.GraphLearner.
For more details on the possible values, see mlr3::Learner.
marshaled :: logical(1)
Whether the learner is marshaled.
impute_selected_features :: logical(1)
Whether to heuristically determine $selected_features() as all $selected_features() of all "base learner" Learners,
even if they do not have the "selected_features" property / do not implement $selected_features().
If impute_selected_features is TRUE and the base learners do not implement $selected_features(),
the GraphLearner's $selected_features() method will return all features seen by the base learners.
This is useful in cases where feature selection is performed inside the Graph:
The $selected_features() will then be the set of features that were selected by the Graph.
If impute_selected_features is FALSE, the $selected_features() method will throw an error if $selected_features()
is not implemented by the base learners.
This is a heuristic and may report more features than actually used by the base learners,
in cases where the base learners do not implement $selected_features().
The default is FALSE.
Methods inherited from Learner, as well as:
ids(sorted = FALSE)
(logical(1)) -> character
Get IDs of all PipeOps. This is in order that PipeOps were added if
sorted is FALSE, and topologically sorted if sorted is TRUE.
plot(html = FALSE, horizontal = FALSE)
(logical(1), logical(1)) -> NULL
Plot the Graph, using either the igraph package (for html = FALSE, default) or
the visNetwork package (for html = TRUE), producing an htmlWidget.
The htmlWidget can be rescaled using visOptions.
For html = FALSE, the orientation of the plotted graph can be controlled through horizontal.
marshal
(any) -> self
Marshal the model.
unmarshal
(any) -> self
Unmarshal the model.
base_learner(recursive = Inf, return_po = FALSE, return_all = FALSE, resolve_branching = TRUE)
(numeric(1), logical(1), logical(1), logical(1)) -> Learner | PipeOp | list of Learner | list of PipeOp
Return the base learner of the GraphLearner. If recursive is 0, the GraphLearner itself is returned.
Otherwise, the Graph is traversed backwards to find the first PipeOp containing a $learner_model field.
If recursive is 1, that $learner_model (or containing PipeOp, if return_po is TRUE) is returned.
If recursive is greater than 1, the discovered base learner's base_learner() method is called with recursive - 1.
recursive must be set to 1 if return_po is TRUE, and must be set to at most 1 if return_all is TRUE.
If return_po is TRUE, the container-PipeOp is returned instead of the Learner.
This will typically be a PipeOpLearner or a PipeOpLearnerCV.
If return_all is TRUE, a list of Learners or PipeOps is returned.
If return_po is FALSE, this list may contain Multiplicity objects, which are not unwrapped.
If return_all is FALSE and there are multiple possible base learners, an error is thrown.
This may also happen if only a single PipeOpLearner is present that was trained with a Multiplicity.
If resolve_branching is TRUE, and when a PipeOpUnbranch is encountered, the
corresponding PipeOpBranch is searched, and its hyperparameter configuration is used to select the base learner.
There may be multiple corresponding PipeOpBranchs, which are all considered.
If resolve_branching is FALSE, PipeOpUnbranch is treated as any other PipeOp with multiple inputs; all possible branch paths are considered equally.
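The following sketch shows the most common calls to base_learner() on a simple linear Graph. The Graph and task are illustrative, and the learner is trained first so that the trained base learner is returned.

```r
library(mlr3)
library(mlr3pipelines)

graph_learner = as_learner(po("pca") %>>% lrn("classif.rpart"))
graph_learner$train(tsk("iris"))

# Default: the Learner found by traversing the Graph backwards
base = graph_learner$base_learner()
base$id

# recursive = 0 returns the GraphLearner itself
identical(graph_learner$base_learner(recursive = 0), graph_learner)

# return_po = TRUE (with recursive = 1) returns the containing PipeOp instead
base_po = graph_learner$base_learner(recursive = 1, return_po = TRUE)
class(base_po)[1]
```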
predict_newdata_fast(newdata, task = NULL)
(data.frame, Task | NULL) -> Prediction
Predicts outcomes for new data in newdata using the model fitted during $train().
For the moment, this is merely a thin wrapper around Learner$predict_newdata() to ensure compatibility, meaning that no speedup is currently achieved.
In the future, this method may be optimized to be faster than $predict_newdata().
Unlike $predict_newdata(), this method does not return a mlr3::Prediction object.
Instead, it returns a list with elements depending on $task_type and $predict_type:
for task_type = "classif": response and prob
for task_type = "regr": response and se, or quantiles (if predict_type = "quantiles")
The following standard extractors as defined by the Learner class are available.
Note that these typically only extract information from the $base_learner().
This works well for simple Graphs that do not modify features too much, but may give unexpected results for Graphs that
add new features or move information between features.
As an example, consider a feature A with missing values, and a feature B that is used for imputation, using a po("imputelearner").
In a case where the following Learner performs embedded feature selection and only selects feature A,
the $selected_features() method could return only feature A, and $importance() may even report 0 for feature B.
This would not be entirely accurate when considering the entire GraphLearner, as feature B is used for imputation and would therefore have an impact on predictions.
The following should therefore only be used if the Graph is known to not have an impact on the relevant properties.
importance()
() -> numeric
The $importance() returned by the base learner, if it has the "importance" property.
Throws an error otherwise.
selected_features()
() -> character
The $selected_features() returned by the base learner, if it has the "selected_features" property.
If the base learner does not have the "selected_features" property and impute_selected_features is TRUE,
all features seen by the base learners are returned.
Throws an error otherwise.
oob_error()
() -> numeric(1)
The $oob_error() returned by the base learner, if it has the "oob_error" property.
Throws an error otherwise.
loglik()
() -> numeric(1)
The $loglik() returned by the base learner, if it has the "loglik" property.
Throws an error otherwise.
as_graph() is called on the graph argument, so it can technically also be a list of things, which is
automatically converted to a Graph via gunion(); however, this will usually not result in a valid Graph that can
work as a Learner. graph can furthermore be a Learner, which is then automatically
wrapped in a Graph, which is then again wrapped in a GraphLearner object; this usually only adds overhead and is not
recommended.
Other Learners:
mlr_learners_avg
library("mlr3")

graph = po("pca") %>>% lrn("classif.rpart")

lr = GraphLearner$new(graph)
lr = as_learner(graph)  # equivalent

lr$train(tsk("iris"))

lr$graph$state  # untrained version!
# The following is therefore NULL:
lr$graph$pipeops$classif.rpart$learner_model$model

# To access the trained model from the PipeOpLearner's Learner, use:
lr$graph_model$pipeops$classif.rpart$learner_model$model

# Feature importance (of principal components):
lr$graph_model$pipeops$classif.rpart$learner_model$importance()
A simple Dictionary storing objects of class PipeOp.
Each PipeOp has an associated help page, see mlr_pipeops_[id].
R6Class object inheriting from mlr3misc::Dictionary.
Fields inherited from Dictionary, as well as:
metainf :: environment
Environment that stores the metainf argument of the $add() method.
Only for internal use.
Methods inherited from Dictionary, as well as:
add(key, value, metainf = NULL)
(character(1), R6ClassGenerator, NULL | list)
Adds constructor value to the dictionary with key key, potentially
overwriting a previously stored item. If metainf is not NULL (the default),
it must be a list of arguments that will be given to the value constructor (i.e. value$new())
when it needs to be constructed for as.data.table PipeOp listing.
as.data.table(dict)
Dictionary -> data.table::data.table
Returns a data.table with the following columns:
key :: (character)
Key with which the PipeOp was registered to the Dictionary using the $add() method.
label :: (character)
Description of the PipeOp's functionality.
packages :: (character)
Set of all required packages for the PipeOp's train and predict methods.
tags :: (character)
A set of tags associated with the PipeOp describing its purpose.
feature_types :: (character)
Feature types the PipeOp operates on. Is NA for PipeOps that do not directly operate on a Task.
input.num, output.num :: (integer)
Number of the PipeOp's input and output channels. Is NA for PipeOps which accept a varying number of input
and/or output channels depending on a construction argument.
See input and output fields of PipeOp.
input.type.train, input.type.predict, output.type.train, output.type.predict :: (character)
Types that are allowed as input to or returned as output of the PipeOp's $train() and $predict() methods.
A value of NULL means that a null object, e.g. no data, is taken as input or being returned as output.
A value of "*" means that any type is possible.
If both input.type.train and output.type.train or both input.type.predict and output.type.predict contain
values enclosed by square brackets ("[", "]"), then the respective input or output channel is
Multiplicity-aware. For more information, see Multiplicity.
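The columns described above can be inspected for a handful of PipeOps by subsetting the table. The selection of keys below is arbitrary and serves only as an illustration.

```r
library(mlr3)
library(mlr3pipelines)

tab = data.table::as.data.table(mlr_pipeops)

# Channel counts and channel types for a few selected PipeOps
keys = c("pca", "featureunion", "branch")
tab[tab$key %in% keys,
  c("key", "input.num", "output.num", "input.type.train", "output.type.train")]
```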
Other mlr3pipelines backend related:
Graph,
PipeOp,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_graphs,
mlr_pipeops_updatetarget
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Dictionaries:
mlr_graphs
library("mlr3")

mlr_pipeops$get("learner", lrn("classif.rpart"))

# equivalent:
po("learner", learner = lrn("classif.rpart"))

# all PipeOps currently in the dictionary:
as.data.table(mlr_pipeops)[, c("key", "input.num", "output.num", "packages")]
Generates a more balanced data set by creating synthetic instances of the minority classes using the ADASYN algorithm.
The algorithm generates for each minority instance new data points based on its K nearest neighbors and the difficulty of learning for that data point.
It can only be applied to tasks with numeric features that have no missing values.
See smotefamily::ADAS for details.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpADAS$new(id = "adas", param_vals = list())
id :: character(1)
Identifier of resulting object, default "adas".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with added synthetic rows for the minority class.
The output during prediction is the unchanged input.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
K :: numeric(1)
The number of nearest neighbors used for sampling new values. Default is 5.
See ADAS().
If a target level is unobserved during training, no synthetic data points will be generated for that class. No error is raised; the unobserved class is simply ignored.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
He H, Bai Y, Garcia EA, Li S (2008). “ADASYN: Adaptive synthetic sampling approach for imbalanced learning.” In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 1322-1328. doi:10.1109/IJCNN.2008.4633969.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")

# Create example task
data = data.frame(
  target = factor(sample(c("c1", "c2"), size = 300, replace = TRUE, prob = c(0.1, 0.9))),
  x1 = rnorm(300),
  x2 = rnorm(300)
)
task = TaskClassif$new(id = "example", backend = data, target = "target")
task$head()
table(task$data(cols = "target"))

# Generate synthetic data for minority class
pop = po("adas")
adas_result = pop$train(list(task))[[1]]$data()
nrow(adas_result)
table(adas_result$target)
Adds new data points by generating synthetic instances for the minority class using the Borderline-SMOTE algorithm.
This can only be applied to classification tasks with numeric features that have no missing values.
See smotefamily::BLSMOTE for details.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpBLSmote$new(id = "blsmote", param_vals = list())
id :: character(1)
Identifier of resulting object, default "blsmote".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with added synthetic rows for the minority class.
The output during prediction is the unchanged input.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
K :: numeric(1)
The number of nearest neighbors used for sampling from the minority class. Default is 5.
See BLSMOTE().
C :: numeric(1)
The number of nearest neighbors used for classifying sample points as SAFE/DANGER/NOISE. Default is 5.
See BLSMOTE().
dup_size :: numeric(1)
Desired times of synthetic minority instances over the original number of majority instances. 0 leads to balancing minority and majority class.
Default is 0. See BLSMOTE().
method :: character(1)
The type of Borderline-SMOTE algorithm to use. Default is "type1".
See BLSMOTE().
quiet :: logical(1)
Whether to suppress printing status during training. Initialized to TRUE.
If a target level is unobserved during training, no synthetic data points will be generated for that class. No error is raised; the unobserved class is simply ignored.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
Han H, Wang W, Mao B (2005). “Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning.” In Huang D, Zhang X, Huang G (eds.), Advances in Intelligent Computing, 878–887. ISBN 978-3-540-31902-3, doi:10.1007/11538059_91.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
```r
library("mlr3")
library("mlr3pipelines")

# Create example task
data = smotefamily::sample_generator(500, 0.8)
data$result = factor(data$result)
task = TaskClassif$new(id = "example", backend = data, target = "result")
task$head()
table(task$data(cols = "result"))

# Generate synthetic data for minority class
pop = po("blsmote")
bls_result = pop$train(list(task))[[1]]$data()
nrow(bls_result)
table(bls_result$result)
```
Conducts a Box-Cox transformation on numeric features. The lambda parameter
of the transformation is estimated during training and used for both training
and prediction transformation.
See bestNormalize::boxcox() for details.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpBoxCox$new(id = "boxcox", param_vals = list())
id :: character(1)
Identifier of resulting object, default "boxcox".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their transformed versions.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as a list of class boxcox for each transformed column.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
standardize :: logical(1)
Whether to center and scale the transformed values to attempt a standard
normal distribution. For details see boxcox().
eps :: numeric(1)
Tolerance parameter to identify if lambda parameter is equal to zero.
For details see boxcox().
lower :: numeric(1)
Lower value for estimation of lambda parameter. For details see boxcox().
upper :: numeric(1)
Upper value for estimation of lambda parameter. For details see boxcox().
Uses the bestNormalize::boxcox function.
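For orientation, the textbook Box-Cox transformation for a fixed lambda can be sketched in base R as below. The helper `box_cox` is hypothetical and its default eps is chosen only for the example. It shows the role of eps as the tolerance below which lambda is treated as zero; it does not reproduce the lambda estimation or the optional standardization performed by bestNormalize::boxcox().

```r
# Textbook Box-Cox transform for strictly positive data and a given lambda.
# Hypothetical helper for illustration only.
box_cox <- function(x, lambda, eps = 0.001) {
  stopifnot(all(x > 0))
  if (abs(lambda) < eps) {
    log(x)                    # limiting case for lambda close to zero
  } else {
    (x^lambda - 1) / lambda   # general case
  }
}

x <- c(1, 2, 4)
box_cox(x, lambda = 0)    # same as log(x)
box_cox(x, lambda = 0.5)  # same as 2 * (sqrt(x) - 1)
box_cox(x, lambda = 1)    # same as x - 1
```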
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("boxcox")

task$data()
pop$train(list(task))[[1]]$data()

pop$state
```
Perform alternative path branching: PipeOpBranch has multiple output channels
that connect to different paths in a Graph. At any time, only one of these
paths will be taken for execution. At the end of the different paths, the
PipeOpUnbranch PipeOp must be used to indicate the end of alternative paths.
Not to be confused with PipeOpCopy; the naming scheme is a bit unfortunate.
R6Class object inheriting from PipeOp.
PipeOpBranch$new(options, id = "branch", param_vals = list())
options :: numeric(1) | character
If options is an integer number, it determines the number of
output channels / options that are created, named output1...output<n>. The
$selection parameter will then be an integer.
If options is a character, it determines the names of channels directly.
The $selection parameter will then be a factor (ParamFct).
id :: character(1)
Identifier of resulting object, default "branch".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
PipeOpBranch has one input channel named "input", taking any input ("*") both during training and prediction.
PipeOpBranch has multiple output channels depending on the options construction argument, named "output1", "output2", ...
if options is numeric, and named after each options value if options is a character.
All output channels produce the object given as input ("*") or NO_OP, both during training and prediction.
The $state is left empty (list()).
selection :: numeric(1) | character(1)
Selection of branching path to take. Is a ParamInt if the options parameter
during construction was a numeric(1), and ranges from 1 to options. Is a
ParamFct if the options parameter was a character and its possible values
are the options values. Initialized to either 1 (if the options construction argument is numeric(1))
or the first element of options (if it is character).
Alternative path branching is handled by the PipeOp backend. To indicate that
a path should not be taken, PipeOpBranch returns the NO_OP object on its
output channel. The PipeOp handles each NO_OP input by automatically
returning a NO_OP output without calling private$.train() or private$.predict(),
until PipeOpUnbranch is reached. PipeOpUnbranch will then take multiple inputs,
all except one of which must be a NO_OP, and forward the only non-NO_OP
object on its output.
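The NO_OP mechanism described above can be mimicked with a small base-R toy. Every name in the sketch (`TOY_NO_OP`, `toy_branch`, `toy_step`, `toy_unbranch`) is hypothetical and stands in for the corresponding mlr3pipelines behavior; the real implementation lives in the PipeOp backend and is not shown here.

```r
# Toy stand-ins; the names below are hypothetical and not the mlr3pipelines API.
TOY_NO_OP <- structure(list(), class = "toy_no_op")
is_toy_no_op <- function(x) inherits(x, "toy_no_op")

# Branch: forward the input on the selected channel, TOY_NO_OP on all others.
toy_branch <- function(input, options, selection) {
  out <- rep(list(TOY_NO_OP), length(options))
  names(out) <- options
  out[[selection]] <- input
  out
}

# Intermediate step: pass TOY_NO_OP through without running the computation.
toy_step <- function(x, f) if (is_toy_no_op(x)) x else f(x)

# Unbranch: exactly one input must be a real object; forward it.
toy_unbranch <- function(inputs) {
  live <- Filter(Negate(is_toy_no_op), inputs)
  stopifnot(length(live) == 1)
  live[[1]]
}

steps <- list(double = function(x) x * 2, nothing = identity)
channels <- toy_branch(c(1, 2, 3), options = names(steps), selection = "double")
channels <- Map(toy_step, channels, steps)  # only the "double" path does work
result <- toy_unbranch(channels)
print(result)
```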
Only fields inherited from PipeOp.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Path Branching:
NO_OP,
filter_noop(),
is_noop(),
mlr_pipeops_unbranch
```r
library("mlr3")
library("mlr3pipelines")

pca = po("pca")
nop = po("nop")
choices = c("pca", "nothing")
gr = po("branch", choices) %>>%
  gunion(list(pca, nop)) %>>%
  po("unbranch", choices)

gr$param_set$values$branch.selection = "pca"
gr$train(tsk("iris"))

gr$param_set$values$branch.selection = "nothing"
gr$train(tsk("iris"))
```
Chunks its input into outnum chunks.
Creates outnum Tasks during training, and
simply passes on the input outnum times during prediction.
R6Class object inheriting from PipeOp.
PipeOpChunk$new(outnum, id = "chunk", param_vals = list())
outnum :: numeric(1)
Number of output channels, and therefore number of chunks created.
id :: character(1)
Identifier of resulting object, default "chunk".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
PipeOpChunk has one input channel named "input", taking a Task both during training and prediction.
PipeOpChunk has multiple output channels depending on the outnum construction argument, named "output1", "output2", ...
All output channels produce (respectively disjoint, random) subsets of the input Task during training, and
pass on the original Task during prediction.
The $state is left empty (list()).
shuffle :: logical(1)
Should the data be shuffled before chunking? Initialized to TRUE.
Uses the mlr3misc::chunk_vector() function.
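As a rough picture of the training-time behavior, the base-R sketch below splits a vector of row ids into outnum disjoint chunks of near-equal size, optionally in random order. The helper `toy_chunk` is hypothetical and only approximates the idea; it is not the code of mlr3misc::chunk_vector().

```r
# Split ids into `outnum` disjoint chunks whose sizes differ by at most one.
# Hypothetical helper for illustration only.
toy_chunk <- function(ids, outnum, shuffle = TRUE) {
  # contiguous chunk labels: 1, 1, ..., 2, 2, ...
  labels <- sort(rep(seq_len(outnum), length.out = length(ids)))
  # shuffling the labels assigns rows to chunks at random
  if (shuffle) labels <- sample(labels)
  unname(split(ids, factor(labels, levels = seq_len(outnum))))
}

set.seed(1)
chunks <- toy_chunk(seq_len(178), outnum = 2)
lengths(chunks)  # two chunks of 89 ids each
```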
Only fields inherited from PipeOp.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
```r
library("mlr3")
library("mlr3pipelines")

task = tsk("wine")
opc = mlr_pipeops$get("chunk", 2)

# watch the row number: 89 during training (task is chunked)...
opc$train(list(task))

# ... 178 during predict (task is copied)
opc$predict(list(task))
```
Both undersamples a Task, keeping only a fraction of the rows of the majority class,
and oversamples the minority class by repeating its rows.
Sampling happens only during training phase. Class-balancing a Task by sampling may be
beneficial for classification with imbalanced training data.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpClassBalancing$new(id = "classbalancing", param_vals = list())
id :: character(1)
Identifier of the resulting object, default "classbalancing".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with added or removed rows to balance target classes.
The output during prediction is the unchanged input.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc; however, the affect_columns parameter is not present. Further parameters are:
ratio :: numeric(1)
Ratio of number of rows of classes to keep, relative
to the $reference value. Initialized to 1.
reference :: character(1)
What the $ratio value is measured against. Can be "all" (mean instance count of
all classes), "major" (instance count of class with most instances), "minor"
(instance count of class with fewest instances), "nonmajor" (average instance
count of all classes except the major one), "nonminor" (average instance count
of all classes except the minor one), and "one" ($ratio determines the number of
instances to have, per class). Initialized to "all".
adjust :: character(1)
Which classes to up / downsample. Can be "all" (up and downsample all to match required
instance count), "major", "minor", "nonmajor", "nonminor" (see respective values
for $reference), "upsample" (only upsample), and "downsample". Initialized to "all".
shuffle :: logical(1)
Whether to shuffle the rows of the resulting task.
In case the data is upsampled and shuffle = FALSE, the resulting task will have the original
rows (which were not removed in downsampling) in the original order, followed by all newly added rows
ordered by target class.
Initialized to TRUE.
Up / downsampling happens as follows: At first, a "target class count" is calculated, by taking the mean
class count of all classes indicated by the reference parameter (e.g. if reference is "nonmajor":
the mean class count of all classes that are not the "major" class, i.e. the class with the most samples)
and multiplying this with the value of the ratio parameter. If reference is "one", then the "target
class count" is just the value of ratio (i.e. 1 * ratio).
Then for each class that is referenced by the adjust parameter (e.g. if adjust is "nonminor":
each class that is not the class with the fewest samples), PipeOpClassBalancing either throws out
samples (downsampling), or adds additional rows that are equal to randomly chosen samples (upsampling),
until the number of samples for these classes equals the "target class count".
No upsampling is performed for classes that were not observed during training (i.e. empty factor levels in the target column).
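The two steps above can be traced on class counts alone. The sketch below computes the "target class count" and the resulting per-class counts for given ratio, reference, and adjust values. The helper `toy_balanced_counts` is hypothetical; the rounding of the target count, the tie handling (first class wins), and the assumption that every class has at least one row are choices of this sketch, not statements about PipeOpClassBalancing internals.

```r
# Per-class row counts after balancing, following the description above.
# Hypothetical helper for illustration only.
toy_balanced_counts <- function(counts, ratio = 1, reference = "all", adjust = "all") {
  major <- which.max(counts)  # first class with the most rows
  minor <- which.min(counts)  # first class with the fewest rows

  # Step 1: "target class count" = reference count * ratio
  base <- switch(reference,
    all = mean(counts),
    major = counts[[major]],
    minor = counts[[minor]],
    nonmajor = mean(counts[-major]),
    nonminor = mean(counts[-minor]),
    one = 1,
    stop("unknown reference")
  )
  target <- round(base * ratio)  # rounding is an assumption of this sketch

  # Step 2: pick the classes referenced by `adjust` and set them to the target
  idx <- seq_along(counts)
  change <- switch(adjust,
    all = rep(TRUE, length(counts)),
    major = idx == major,
    minor = idx == minor,
    nonmajor = idx != major,
    nonminor = idx != minor,
    upsample = counts < target,
    downsample = counts > target,
    stop("unknown adjust")
  )
  counts[change] <- target
  counts
}

counts <- c(nonspam = 2788, spam = 1813)  # illustrative class counts
doubled <- toy_balanced_counts(counts, ratio = 2, reference = "minor", adjust = "minor")
fixed <- toy_balanced_counts(counts, ratio = 20, reference = "one", adjust = "all")
print(doubled)  # nonspam 2788, spam 3626
print(fixed)    # nonspam 20, spam 20
```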
Uses task$filter() to remove rows. When identical rows are added during upsampling, task$row_roles$use cannot be used
to duplicate rows because of [inaudible]; instead the task$rbind() function is used, and
a new data.table is attached that contains all rows that are being duplicated exactly as many times as they are being added.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
```r
library("mlr3")
library("mlr3pipelines")

task = tsk("spam")
opb = po("classbalancing")

# target class counts
table(task$truth())

# double the instances in the minority class (spam)
opb$param_set$values = list(ratio = 2, reference = "minor",
  adjust = "minor", shuffle = FALSE)
result = opb$train(list(task))[[1L]]
table(result$truth())

# up or downsample all classes until exactly 20 per class remain
opb$param_set$values = list(ratio = 20, reference = "one",
  adjust = "all", shuffle = FALSE)
result = opb$train(list(task))[[1]]
table(result$truth())
```
Perform (weighted) majority vote prediction from classification Predictions by connecting
PipeOpClassifAvg to multiple PipeOpLearner outputs.
Always returns a "prob" prediction, regardless of the incoming Learner's
$predict_type. The label of the class with the highest predicted probability is selected as the
"response" prediction. If the Learner's $predict_type is set to "prob",
the probability aggregation is controlled by prob_aggr (see below). If $predict_type = "response",
predictions are internally converted to one-hot probability vectors (point mass on the predicted class) before aggregation.
"prob" aggregation:
prob_aggr = "mean" – Linear opinion pool (arithmetic mean of probabilities; default).
Interpretation. Mixture semantics: choose a base model with probability w[i], then draw from its class distribution.
Decision-theoretically, this is the minimizer of sum(w[i] * KL(p[i] || p)) over probability vectors p, where KL(x || y) is the Kullback-Leibler divergence.
Typical behavior. Conservative / better calibrated and robust to near-zero probabilities (never assigns zero unless all do).
This is the standard choice for probability averaging in ensembles and stacking.
prob_aggr = "log" – Log opinion pool / product of experts (geometric mean in probability space):
Average per-model logs (or equivalently, logits) and apply softmax.
Interpretation. Product semantics: p_ens ~ prod_i p_i^{w[i]}; minimizes sum(w[i] * KL(p || p[i])).
Typical behavior. Sharper / lower entropy (emphasizes consensus regions), but can be overconfident and is sensitive
to zeros; use prob_aggr_eps to clip small probabilities for numerical stability. Often beneficial with strong, similarly
calibrated members (e.g., neural networks), less so when calibration is the priority.
The $predict_type of all incoming Learners must agree.
Weights can be set as a parameter; if none are provided, every model's prediction receives equal weight.
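The difference between the two prob_aggr settings can be seen on a single observation. The base-R sketch below implements a weighted linear opinion pool and a weighted log opinion pool for a list of class-probability vectors. The helpers `linear_pool` and `log_pool` are hypothetical, and clamping with eps before taking logs is an assumption about how prob_aggr_eps might be applied; the sketch is not the package's internal code.

```r
# Weighted arithmetic mean of probability vectors (linear opinion pool).
linear_pool <- function(probs, w) {
  w <- w / sum(w)
  Reduce(`+`, Map(function(p, wi) wi * p, probs, w))
}

# Weighted geometric mean, renormalized (log opinion pool / product of experts).
log_pool <- function(probs, w, eps = 1e-12) {
  w <- w / sum(w)
  # clamp small probabilities so that log() never returns -Inf
  log_mix <- Reduce(`+`, Map(function(p, wi) wi * log(pmax(p, eps)), probs, w))
  unnorm <- exp(log_mix - max(log_mix))  # subtract the max for numerical stability
  unnorm / sum(unnorm)                   # softmax of the averaged logs
}

# Two models, two classes, equal weights.
probs <- list(c(a = 0.9, b = 0.1), c(a = 0.5, b = 0.5))
lin <- linear_pool(probs, w = c(1, 1))
lg <- log_pool(probs, w = c(1, 1))
print(lin)  # a = 0.70, b = 0.30
print(lg)   # a = 0.75, b = 0.25
```

In this toy case the log pool puts more mass on the class both models favor, matching the "sharper / lower entropy" behavior described above.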
R6Class inheriting from PipeOpEnsemble/PipeOp.
PipeOpClassifAvg$new(innum = 0, collect_multiplicity = FALSE, id = "classifavg", param_vals = list())
innum :: numeric(1)
Determines the number of input channels.
If innum is 0 (default), a vararg input channel is created that can take an arbitrary number of inputs.
collect_multiplicity :: logical(1)
If TRUE, the input is a Multiplicity collecting channel. This means, a
Multiplicity input, instead of multiple normal inputs, is accepted and the members are aggregated. This requires innum to be 0.
Default is FALSE.
id :: character(1)
Identifier of the resulting object, default "classifavg".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpEnsemble. Instead of a Prediction, a PredictionClassif
is used as input and output during prediction.
The $state is left empty (list()).
The parameters are the parameters inherited from the PipeOpEnsemble, as well as:
prob_aggr :: character(1)
Controls how incoming class probabilities are aggregated. One of "mean" (linear opinion pool; default) or
"log" (log opinion pool / product of experts). See the description above for definitions and interpretation.
Only has an effect if the incoming predictions have "prob" values.
prob_aggr_eps :: numeric(1)
Small positive constant used only for prob_aggr = "log" to clamp probabilities before taking logs, improving numerical
stability and avoiding -Inf. Ignored for prob_aggr = "mean". Default is 1e-12.
Inherits from PipeOpEnsemble by implementing the private$weighted_avg_predictions() method.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpEnsemble/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Other Ensembles:
PipeOpEnsemble,
mlr_learners_avg,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg
```r
library("mlr3")
library("mlr3pipelines")

# Simple Bagging
gr = ppl("greplicate",
  po("subsample") %>>%
  po("learner", lrn("classif.rpart")),
  n = 3
) %>>%
  po("classifavg")

resample(tsk("iris"), GraphLearner$new(gr), rsmp("holdout"))
```
Adds a class-dependent sample weights column to a Task, allowing
Learners and Measures to weight observations
differently during training and evaluation.
Weights are assigned per observation based on the target class and can be written
to the "weights_learner" column, the "weights_measure" column, both, or neither.
Only binary classification tasks (TaskClassif) are supported.
Note: By default, all weights are set to 1. To obtain a meaningful effect, the
minor_weight parameter must be adjusted.
See PipeOpClassWeightsEx for an extended version of this PipeOp which can
handle multiclass classification tasks and offers several methods for automatically
determining weights.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpClassWeights$new(id = "classweights", param_vals = list())
id :: character(1)
Identifier of the resulting object, default "classweights".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with an added weights column according to the target class.
The output during prediction is the unchanged input.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc; however, the affect_columns parameter is not present. Further parameters are:
minor_weight :: numeric(1)
Weight given to samples of the minor class. Major class samples have weight 1. Initialized to 1.
weights_learner :: logical(1)
Whether the created weights should be stored as a weights_learner column or not. Initialized to TRUE.
weights_measure :: logical(1)
Whether the created weights should be stored as a weights_measure column or not. Initialized to FALSE.
Adds a .WEIGHTS column to the Task, which is removed from the feature role and mapped to the requested weight roles.
There will be a naming conflict if this column already exists and is not a weight column already. For potentially pre-existing weight columns,
the weight column role gets dropped, but they remain in the DataBackend of the Task.
The Learner must support weights for this PipeOp to have an effect.
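Conceptually, the weight column holds minor_weight for every row of the less frequent class and 1 for all other rows. A minimal base-R sketch of that mapping for a binary target follows; the helper `toy_class_weights` is hypothetical, and picking the minor class via the smallest class count is an assumption of the sketch.

```r
# Per-observation weights for a binary target: minor class gets `minor_weight`,
# major class gets 1. Hypothetical helper for illustration only.
toy_class_weights <- function(target, minor_weight = 1) {
  counts <- table(target)
  minor <- names(counts)[which.min(counts)]  # class with the fewest rows
  ifelse(target == minor, minor_weight, 1)
}

target <- factor(c("nonspam", "nonspam", "nonspam", "spam"))
w <- toy_class_weights(target, minor_weight = 2)
print(w)  # 1 1 1 2
```

With the default minor_weight = 1 every row gets weight 1, which is why the parameter has to be changed before the weights have any effect.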
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("spam")
opb = po("classweights")

# task weights
if ("weights_learner" %in% names(task)) {
  task$weights_learner  # recent mlr3-versions
} else {
  task$weights  # old mlr3-versions
}

# give the minority class (spam) twice the weight, as if doubling its instances
opb$param_set$values$minor_weight = 2
result = opb$train(list(task))[[1L]]
if ("weights_learner" %in% names(result)) {
  result$weights_learner  # recent mlr3-versions
} else {
  result$weights  # old mlr3-versions
}
Adds a class-dependent sample weights column to a Task, allowing
Learners and Measures to weight observations
differently during training and evaluation.
Weights are assigned per observation based on the target class and can be written
to the "weights_learner" column, the "weights_measure" column, both, or neither.
Binary as well as multiclass classification tasks (TaskClassif) are supported.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpClassWeightsEx$new(id = "classweightsex", param_vals = list())
id :: character(1)
Identifier of the resulting object, default "classweightsex"
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with an added weights column according to the target class.
The output during prediction is the unchanged input.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc; however, the affect_columns parameter is not present. Further parameters are:
weights_learner :: logical(1)
Whether the created weights should be stored as a weights_learner column or not. Initialized to TRUE.
weights_measure :: logical(1)
Whether the created weights should be stored as a weights_measure column or not. Initialized to FALSE.
weight_method :: character(1)
The method used to determine the sample weights. Available methods are "inverse_class_frequency", "inverse_square_root_of_frequency", "median_frequency_balancing" and "explicit".
In case of "explicit", the mapping hyperparameter must be used. Initialized to "explicit".
mapping :: named numeric
A named numeric vector that specifies a finite weight for each target class in the task. This only has an effect if weight_method is "explicit".
Adds a .WEIGHTS column to the Task, which is removed from the feature role and mapped to the requested weight roles.
A naming conflict occurs if a column of this name already exists and is not itself a weight column. Pre-existing weight columns lose their weight
column role, but they remain in the DataBackend of the Task.
When weight_method = "explicit", the mapping must cover every class present in the training data and may not contain additional classes.
The Learner must support weights for this PipeOp to have an effect.
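To make the weight_method options more concrete, the following base R sketch computes per-class weights from class frequencies. It rests on assumptions: the helper class_weight_table is hypothetical, and the formulas are common textbook definitions of these method names; the exact normalization used by the package may differ. The "explicit" branch mirrors the rule that the mapping must name exactly the classes present in the data.

```r
# Hypothetical helper (not part of mlr3pipelines). Formulas are assumptions.
class_weight_table <- function(y, method, mapping = NULL) {
  y <- droplevels(as.factor(y))
  counts <- table(y)
  freq <- setNames(as.numeric(counts) / length(y), names(counts))
  switch(method,
    inverse_class_frequency = 1 / freq,
    inverse_square_root_of_frequency = 1 / sqrt(freq),
    median_frequency_balancing = median(freq) / freq,
    explicit = {
      # the mapping must cover every class and contain no additional ones
      stopifnot(is.numeric(mapping), all(is.finite(mapping)),
        setequal(names(mapping), levels(y)))
      mapping[levels(y)]
    },
    stop("unknown method: ", method)
  )
}

y <- factor(c("a", "a", "a", "b"))
per_class <- class_weight_table(y, "inverse_class_frequency")
# look up the weight of each observation by its class
per_obs <- unname(per_class[as.character(y)])
explicit <- class_weight_table(y, "explicit", mapping = c(b = 10, a = 1))
```

In all variants the rarer class "b" ends up with the larger weight; the methods differ only in how strongly they compensate for the imbalance.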
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("spam")
poicf = po("classweightsex", param_vals = list(
  weights_learner = TRUE,
  weights_measure = TRUE,
  weight_method = "inverse_class_frequency"
))
result = poicf$train(list(task))[[1L]]

if ("weights_learner" %in% names(result)) {
  result$weights_learner  # recent mlr3-versions
} else {
  result$weights  # old mlr3-versions
}

if ("weights_measure" %in% names(result)) {
  result$weights_measure  # recent mlr3-versions
} else {
  result$weights  # old mlr3-versions
}
Applies a function to each column of a task. Use the affect_columns parameter inherited from
PipeOpTaskPreprocSimple to limit the columns this function should be applied to. This can be used
for simple parameter transformations or type conversions (e.g. as.numeric).
The same function is applied during training and prediction. One important relationship for
machine learning preprocessing is that during the prediction phase, the preprocessing on each
data row should be independent of other rows. Therefore, the applicator function should always
return a vector / list where each result component only depends on the corresponding input component and
not on other components. As a rule of thumb, if the function f generates output different
from Vectorize(f), it is not a function that should be used for applicator.
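The rule of thumb can be checked directly in base R. The sketch below compares a function with its element-wise version; the helper same_as_vectorized and the two example functions are made up for illustration.

```r
# Compare a function with the element-wise version created by Vectorize().
same_as_vectorized <- function(f, x) {
  isTRUE(all.equal(f(x), unname(Vectorize(f)(x))))
}

x <- c(1, 4, 9)
double_it <- function(x) x * 2     # each output depends on one input only
center <- function(x) x - mean(x)  # each output depends on all inputs

ok_double <- same_as_vectorized(double_it, x)  # TRUE: suitable as applicator
ok_center <- same_as_vectorized(center, x)     # FALSE: uses other rows
```

A function like center would produce different values for the same row depending on which other rows happen to be in the prediction set, which is exactly what the independence requirement rules out.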
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpColApply$new(id = "colapply", param_vals = list())
id :: character(1)
Identifier of resulting object, default "colapply".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with features changed according to the applicator parameter.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
applicator :: function
Function to apply to each column of the task.
The return value should be a vector of the same length as the input, i.e., the function vectorizes over the input.
A typical example would be as.numeric.
The return value can also be a matrix, data.frame, or data.table.
In this case, the length of the input must match the number of returned rows.
The names of the resulting features of the output Task are based on the (column) name(s) of the return value of the applicator function,
prefixed with the original feature name separated by a dot (.).
Use Vectorize to create a vectorizing function from any function that ordinarily only takes one element input.
Calls map on the data, using the value of applicator as .f, and coerces the output via as.data.table.
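The naming scheme for multi-column return values can be imitated in base R. This is a sketch only: it uses lapply and data.frame instead of the map and as.data.table calls named above, and the toy data and the function f2 are assumptions chosen for illustration.

```r
df <- data.frame(a = c(1.2, 2.7), b = c(3.5, 4.1))

# applicator that returns a two-column matrix per input column
f2 <- function(x) cbind(floor = floor(x), ceiling = ceiling(x))

cols <- lapply(df, f2)

# prefix each returned column name with the original feature name and a dot
out <- do.call(cbind, lapply(names(cols), function(nm) {
  res <- as.data.frame(cols[[nm]])
  names(res) <- paste(nm, names(res), sep = ".")
  res
}))

names(out)  # "a.floor" "a.ceiling" "b.floor" "b.ceiling"
```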
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
poca = po("colapply", applicator = as.character)
poca$train(list(task))[[1]]  # types are converted

# function that does not vectorize
f1 = function(x) {
  # we could use `ifelse` here, but that is not the point
  if (x > 1) {
    "a"
  } else {
    "b"
  }
}
poca$param_set$values$applicator = Vectorize(f1)
poca$train(list(task))[[1]]$data()

# only affect Petal.* columns
poca$param_set$values$affect_columns = selector_grep("^Petal")
poca$train(list(task))[[1]]$data()

# function returning multiple columns
f2 = function(x) {
  cbind(floor = floor(x), ceiling = ceiling(x))
}
poca$param_set$values$applicator = f2
poca$param_set$values$affect_columns = selector_all()
poca$train(list(task))[[1]]$data()
Collapses levels of factor and ordered features: the rarest levels in the training samples are collapsed until target_level_count
levels remain. However, levels whose prevalence is strictly above no_collapse_above_prevalence or whose absolute count is strictly above
no_collapse_above_absolute are retained. For factor features, rare levels are collapsed into the next larger level; for ordered features,
rare levels are collapsed into the neighbouring level, whichever has fewer samples.
In case both no_collapse_above_prevalence and no_collapse_above_absolute are given, the less strict threshold of the two will be used, i.e. if
no_collapse_above_prevalence is 1 and no_collapse_above_absolute is 10 for a task with 100 samples, levels that are seen more than 10 times
will not be collapsed.
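The interaction of the two thresholds can be traced with plain level counts. The following base R sketch only determines which levels are protected from collapsing; it does not reproduce the collapsing step itself, and the counts are example values.

```r
# Level counts of a factor with 100 observations.
counts <- c(A = 25, B = 30, C = 5, D = 15, E = 5, F = 20)
n <- sum(counts)

no_collapse_above_prevalence <- 1  # no level can have prevalence above 1
no_collapse_above_absolute <- 10   # levels seen more than 10 times are kept

# A level is protected if it strictly exceeds either threshold.
protected <- counts / n > no_collapse_above_prevalence |
  counts > no_collapse_above_absolute

kept <- names(counts)[protected]         # "A" "B" "D" "F"
candidates <- names(counts)[!protected]  # "C" "E"
```

Because a level only needs to exceed one of the two thresholds, the less strict one decides; here that is the absolute count of 10.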
Levels not seen during training are not touched during prediction; therefore it is useful to combine this with
PipeOpFixFactors.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpCollapseFactors$new(id = "collapsefactors", param_vals = list())
id :: character(1)
Identifier of resulting object, default "collapsefactors".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with rare affected factor and ordered feature levels collapsed.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
collapse_map :: named list of named list of character
List of factor level maps. For each factor, collapse_map contains a named list that indicates what levels
of the input task get mapped to what levels of the output task. If collapse_map has an entry feat_1 with
an entry a = c("x", "y"), it means that levels "x" and "y" get collapsed to level "a" in feature "feat_1".
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
no_collapse_above_prevalence :: numeric(1)
Fraction of samples below which factor levels get collapsed. Default is 1, which causes all levels
to be collapsed until target_level_count remain.
no_collapse_above_absolute :: integer(1)
Number of samples below which factor levels get collapsed. Default is Inf, which causes all levels
to be collapsed until target_level_count remain.
target_level_count :: integer(1)
Number of levels to retain. Default is 2.
Makes use of the fact that levels(fact_var) = list(target1 = c("source1", "source2"), target2 = "source3") causes
renaming of level "source1" and "source2" both to "target1", and also "source3" to "target2".
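The list form of the levels replacement function is part of base R and can be tried on a small factor. The level names below are placeholders.

```r
fact_var <- factor(c("source1", "source2", "source3", "source1"))

# Each list element names a target level and lists the source levels
# that are merged into it.
levels(fact_var) <- list(
  target1 = c("source1", "source2"),
  target2 = "source3"
)

levels(fact_var)        # "target1" "target2"
as.character(fact_var)  # "target1" "target1" "target2" "target1"
```

This list has the same shape as one entry of collapse_map: names are output levels, values are the input levels mapped to them.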
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

op = PipeOpCollapseFactors$new()

# Create example training task
df = data.frame(
  target = runif(100),
  fct = factor(rep(LETTERS[1:6], times = c(25, 30, 5, 15, 5, 20))),
  ord = factor(rep(1:6, times = c(20, 25, 30, 5, 5, 15)), ordered = TRUE)
)
task = TaskRegr$new(df, target = "target", id = "example_train")

# Training
train_task_collapsed = op$train(list(task))[[1]]
train_task_collapsed$levels(c("fct", "ord"))

# Create example prediction task
df_pred = data.frame(
  target = runif(7),
  fct = factor(LETTERS[1:7]),
  ord = factor(1:7, ordered = TRUE)
)
pred_task = TaskRegr$new(df_pred, target = "target", id = "example_pred")

# Prediction
pred_task_collapsed = op$predict(list(pred_task))[[1]]
pred_task_collapsed$levels(c("fct", "ord"))
Changes the column roles of the input Task according to new_role or its inverse new_role_direct.
Setting a new target variable or changing the role of an existing target variable is not supported.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpColRoles$new(id = "colroles", param_vals = list())
id :: character(1)
Identifier of resulting object, default "colroles".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise
be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with transformed column roles according to new_role or its inverse new_role_direct.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
new_role :: named list
Named list of new column roles by column. The names must match the column names of the input task that
will later be trained/predicted on. Each entry of the list must contain a character vector with
possible values of mlr_reflections$task_col_roles.
If the value is given as character() or NULL, the column will be dropped from the input task. Changing the role
of a column results in this column losing its previous role(s).
new_role_direct :: named list
Named list of new column roles by role. The names must match the possible column roles, i.e. values of
mlr_reflections$task_col_roles. Each entry of the list must contain a character
vector with column names of the input task that will later be trained/predicted on.
If the value is given as character() or NULL, all columns will be dropped from the role given in the element
name. The value given for a role overwrites the previous entry in task$col_roles for that role, completely.
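The two parameters describe roles from opposite directions: new_role is indexed by column, new_role_direct by role. The base R sketch below converts one layout into the other to show the relationship. It is an illustration only: the helper invert_role_list is hypothetical, and it does not model the differing side effects described above (new_role removes a column's previous roles, new_role_direct overwrites the complete entry of a role).

```r
# Hypothetical helper: turn a by-column role list into a by-role column list.
invert_role_list <- function(by_column) {
  cols <- rep(names(by_column), lengths(by_column))
  roles <- unlist(by_column, use.names = FALSE)
  split(cols, factor(roles, levels = unique(roles)))
}

# layout expected by new_role: names are columns, values are roles
by_column <- list(body_mass = c("order", "feature"), island = "group")

# layout expected by new_role_direct: names are roles, values are columns
by_role <- invert_role_list(by_column)
str(by_role)
```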
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("penguins")
pop = po("colroles", param_vals = list(
  new_role = list(body_mass = c("order", "feature"))
))

train_out1 = pop$train(list(task))[[1L]]
train_out1$col_roles

pop$param_set$set_values(
  new_role = NULL,
  new_role_direct = list(order = character(), group = "island")
)

# extract the Task from the output list before inspecting its roles
train_out2 = pop$train(list(train_out1))[[1L]]
train_out2$col_roles
Copies its input outnum times. This PipeOp is usually not needed, because copying happens automatically when one
PipeOp is followed by multiple different PipeOps. However, when constructing big Graphs using the
%>>%-operator, PipeOpCopy can be helpful to specify which PipeOp gets connected to which.
R6Class object inheriting from PipeOp.
PipeOpCopy$new(outnum, id = "copy", param_vals = list())
outnum :: numeric(1)
Number of output channels, and therefore number of copies being made.
id :: character(1)
Identifier of resulting object, default "copy".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
PipeOpCopy has one input channel named "input", taking any input ("*") both during training and prediction.
PipeOpCopy has multiple output channels depending on the outnum construction argument, named "output1", "output2", ...
All output channels produce the object given as input ("*").
The $state is left empty (list()).
PipeOpCopy has no parameters.
Note that copies are not clones, but only reference copies. This affects R6-objects: If R6 objects are copied using
PipeOpCopy, they must be cloned beforehand.
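The difference between a reference copy and a clone can be shown with a plain environment, since R6 objects are environments and share their reference semantics. This sketch uses no R6 class; with a real R6 object one would typically call its $clone() method (with deep = TRUE where nested R6 objects are involved) instead of the list2env() call used here.

```r
obj <- new.env()
obj$value <- 1

# Reference copies: both list entries point to the same object.
copies <- list(output1 = obj, output2 = obj)
copies$output1$value <- 99
copies$output2$value  # 99: the change is visible through the other reference

# An independent copy must be created explicitly.
independent <- list2env(as.list(obj))
independent$value <- 5
obj$value  # still 99
```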
Only fields inherited from PipeOp.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Placeholder Pipeops:
mlr_pipeops_nop
library("mlr3pipelines")

# The following copies the output of 'scale' automatically to both
# 'pca' and 'nop'
po("scale") %>>%
  gunion(list(
    po("pca"),
    po("nop")
  ))

# The following would not work: the '%>>%'-operator does not know
# which output to connect to which input
# > gunion(list(
# >   po("scale"),
# >   po("select")
# > )) %>>%
# >   gunion(list(
# >     po("pca"),
# >     po("nop"),
# >     po("imputemean")
# >   ))

# Instead, the 'copy' operator makes clear which output gets copied.
gunion(list(
  po("scale") %>>% po("copy", outnum = 2),
  po("select")
)) %>>%
  gunion(list(
    po("pca"),
    po("nop"),
    po("imputemean")
  ))
Based on POSIXct/Date columns of the data, a set of date related features is computed and
added to the feature set of the output task. If no POSIXct or Date column is found, the
original task is returned unaltered. This functionality is based on the add_datepart() and
add_cyclic_datepart() functions from the fastai package. If operation on only
particular POSIXct/Date columns is requested, use the affect_columns parameter inherited
from PipeOpTaskPreprocSimple.
For Date columns, the features "hour", "minute", "second", and "is_day" are skipped.
If cyclic = TRUE, cyclic features are computed for the features "month", "week_of_year",
"day_of_year", "day_of_month", "day_of_week", "hour", "minute" and "second". This
means that for each feature x, two additional features are computed, namely the sine and cosine
transformation of 2 * pi * x / max_x (here max_x is the largest possible value the feature
could take on + 1, assuming the lowest possible value is given by 0, e.g., for hours from 0 to
23, this is 24). This is useful to respect the cyclical nature of features such as seconds, i.e.,
second 21 and second 22 are one second apart, but so are second 59 and second 0 of the next
minute.
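The sine and cosine transformation can be written in a few lines of base R. The helper names cyclic_encode and dist_between are made up for this sketch; the formula is the one given above, applied to hours with max_x = 24.

```r
# Encode a cyclic feature x (starting at 0) as a point on the unit circle.
cyclic_encode <- function(x, max_x) {
  angle <- 2 * pi * x / max_x
  cbind(sin = sin(angle), cos = cos(angle))
}

hours <- c(0, 1, 23)
enc <- cyclic_encode(hours, max_x = 24)

# Euclidean distance between two encoded values
dist_between <- function(i, j) sqrt(sum((enc[i, ] - enc[j, ])^2))

d_0_1 <- dist_between(1, 2)   # hour 0 vs hour 1
d_0_23 <- dist_between(1, 3)  # hour 0 vs hour 23
all.equal(d_0_1, d_0_23)      # TRUE: both pairs are one hour apart
```

On the raw scale, hours 0 and 23 differ by 23, which is why the two-dimensional encoding is often preferred for distance-based or linear models.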
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpDateFeatures$new(id = "datefeatures", param_vals = list())
id :: character(1)
Identifier of resulting object, default "datefeatures".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise
be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with date-related features computed and added to the
feature set of the output task and the POSIXct columns of the data removed from the
feature set (depending on the value of keep_date_var).
The $state is a named list with the $state elements inherited from
PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
keep_date_var :: logical(1)
Should the POSIXct columns be kept as features? Default FALSE.
cyclic :: logical(1)
Should cyclic features be computed? See Internals. Default FALSE.
year :: logical(1)
Should the year be extracted as a feature? Default TRUE.
quarter :: logical(1)
Should the quarter be extracted as a feature? Default TRUE.
month :: logical(1)
Should the month be extracted as a feature? Default TRUE.
week_of_year :: logical(1)
Should the week of the year be extracted as a feature? Default TRUE.
day_of_year :: logical(1)
Should the day of the year be extracted as a feature? Default TRUE.
day_of_month :: logical(1)
Should the day of the month be extracted as a feature? Default TRUE.
day_of_week :: logical(1)
Should the day of the week (ISO 8601) be extracted as a feature? Default TRUE.
hour :: logical(1)
Should the hour be extracted as a feature? Default TRUE.
minute :: logical(1)
Should the minute be extracted as a feature? Default TRUE.
second :: logical(1)
Should the second be extracted as a feature? Default TRUE.
is_day :: logical(1)
Should a feature be extracted indicating whether it is day time (06:00am - 08:00pm)?
Default TRUE.
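As a rough illustration of what the extracted features look like, the following base R sketch derives a few of them from a POSIXct vector. It is not the package's implementation: the column names mirror the parameters above, and the exact boundary handling of is_day is an assumption.

```r
# Two timestamps on Saturday, 2020-02-01 (UTC).
ts = as.POSIXct(c("2020-02-01 05:30:00", "2020-02-01 14:00:00"), tz = "UTC")
lt = as.POSIXlt(ts, tz = "UTC")

features = data.frame(
  year = lt$year + 1900L,                            # POSIXlt counts years from 1900
  month = lt$mon + 1L,                               # POSIXlt months are 0-based
  day_of_month = lt$mday,
  day_of_week = ifelse(lt$wday == 0L, 7L, lt$wday),  # ISO 8601: Monday = 1, Sunday = 7
  hour = lt$hour,
  is_day = lt$hour >= 6L & lt$hour <= 20L            # assumed reading of "06:00am - 08:00pm"
)
features
```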
The cyclic feature transformation always assumes that values range from 0, so some values (e.g. day of the month) are shifted before sine/cosine transform.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

dat = iris
set.seed(1)
dat$date = sample(
  seq(as.POSIXct("2020-02-01"), to = as.POSIXct("2020-02-29"), by = "hour"),
  size = 150L
)
task = TaskClassif$new("iris_date", backend = dat, target = "Species")
pop = po("datefeatures", param_vals = list(cyclic = FALSE, minute = FALSE, second = FALSE))
pop$train(list(task))
pop$state
Reverses one-hot or treatment encoding of columns. It collapses multiple numeric or integer columns into one factor
column based on a pre-specified grouping pattern of column names.
May be applied to multiple groups of columns, grouped by matching a common naming pattern. The grouping pattern is
extracted to form the name of the newly derived factor column, and levels are constructed from the previous column
names, with parts matching the grouping pattern removed (see examples). The level per row of the new factor column is generally
determined as the name of the column with the maximum value in the group.
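The grouping and collapsing logic described above can be sketched in base R as follows. This is a simplified illustration, not the package's implementation; ties are resolved with "first" here only to keep the output deterministic.

```r
df = data.frame(
  x.1 = c(1, 0, 1, 0),
  x.2 = c(0, 1, 0, 1),
  y.a = c(0, 0, 1, 1),
  y.b = c(1, 1, 0, 0)
)
group_pattern = "^([^.]+)\\."

# New column name: the capturing group. Level: the column name with the match removed.
matched = grepl(group_pattern, names(df))
groups = sub(paste0(group_pattern, ".*"), "\\1", names(df)[matched])
levels_by_col = sub(group_pattern, "", names(df)[matched])

decoded = lapply(split(seq_along(groups), groups), function(idx) {
  block = as.matrix(df[, which(matched)[idx], drop = FALSE])
  # Level per row: name of the column holding the maximum value.
  winner = max.col(block, ties.method = "first")
  factor(levels_by_col[idx][winner], levels = levels_by_col[idx])
})
decoded = as.data.frame(decoded)
decoded
```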
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpDecode$new(id = "decode", param_vals = list())
id :: character(1)
Identifier of resulting object, default "decode".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with encoding columns collapsed into new decoded columns.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
colmaps :: named list
Named list of named character vectors. Each element is named according to the new column name extracted by
group_pattern. Each vector contains the level names for the new factor column that should be created, named by
the corresponding old column name. If treatment_encoding is TRUE, then each vector also contains ref_name as the
reference class with an empty string as name.
treatment_encoding :: logical(1)
Value of treatment_encoding hyperparameter.
cutoff :: numeric(1)
Value of treatment_cutoff hyperparameter, or 0 if that is not given.
ties_method :: character(1)
Value of ties_method hyperparameter.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
group_pattern :: character(1)
A regular expression to be applied to column names. Should contain a capturing group for the new
column name, and match everything that should not be interpreted as the new factor levels (which are constructed as
the difference between column names and what group_pattern matches).
If set to "", all columns matching the group_pattern are collapsed into one factor column called
pipeop.decoded. Use PipeOpRenameColumns to rename this column.
Initialized to "^([^.]+)\\.", which would extract everything up to the first dot as the new column name and
construct new levels as everything after the first dot.
treatment_encoding :: logical(1)
If TRUE, treatment encoding is assumed instead of one-hot encoding. Initialized to FALSE.
treatment_cutoff :: numeric(1)
If treatment_encoding is TRUE, specifies a cutoff value for identifying the reference level. The reference level
is set to ref_name in rows where the value is less than or equal to a specified cutoff value (e.g., 0) in all
columns in that group. Default is 0.
ref_name :: character(1)
If treatment_encoding is TRUE, specifies the name for reference levels. Default is "ref".
ties_method :: character(1)
Method for resolving ties if multiple columns have the same value. Specifies the value from which of the columns
with the same value is to be picked. Options are "first", "last", or "random". Initialized to "random".
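How treatment_cutoff and ref_name interact can be illustrated with a small base R sketch. It assumes a treatment-encoded factor whose reference level column was dropped; the level ordering of the result and the variable names are illustrative assumptions, not the package's internals.

```r
# Treatment encoding of a factor with levels a (reference), b, c:
# the column for "a" is omitted, so rows of level "a" are all zero.
enc = data.frame(fct.b = c(0, 1, 0), fct.c = c(0, 0, 1))
treatment_cutoff = 0
ref_name = "ref"

block = as.matrix(enc)
lvls = sub("^([^.]+)\\.", "", colnames(block))
winner = lvls[max.col(block, ties.method = "first")]

# Rows where every value is less than or equal to the cutoff get the reference level.
is_ref = rowSums(block > treatment_cutoff) == 0
decoded = factor(ifelse(is_ref, ref_name, winner), levels = c(ref_name, lvls))
decoded
```

The original name of the reference level cannot be recovered from the encoded columns, which is why a placeholder such as ref_name is needed.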
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

# Reverse one-hot encoding
df = data.frame(
  target = runif(4),
  x.1 = rep(c(1, 0), 2),
  x.2 = rep(c(0, 1), 2),
  y.1 = rep(c(1, 0), 2),
  y.2 = rep(c(0, 1), 2),
  a = runif(4)
)
task_one_hot = TaskRegr$new(id = "example", backend = df, target = "target")

pop = po("decode")

train_out = pop$train(list(task_one_hot))[[1]]
# x.1 and x.2 are collapsed into x, same for y; a is ignored.
train_out$data()

# Reverse treatment encoding from PipeOpEncode
df = data.frame(
  target = runif(6),
  fct = factor(rep(c("a", "b", "c"), 2))
)
task = TaskRegr$new(id = "example", backend = df, target = "target")

po_enc = po("encode", method = "treatment")
task_encoded = po_enc$train(list(task))[[1]]
task_encoded$data()

po_dec = po("decode", treatment_encoding = TRUE)
task_decoded = po_dec$train(list(task_encoded))[[1]]
# fct.b and fct.c are collapsed into fct. In all rows where all values
# are smaller than or equal to 0, the level is set to the reference level.
task_decoded$data()

# Different group_pattern
df = data.frame(
  target = runif(4),
  x_1 = rep(c(1, 0), 2),
  x_2 = rep(c(0, 1), 2),
  y_1 = rep(c(2, 0), 2),
  y_2 = rep(c(0, 1), 2)
)
task = TaskRegr$new(id = "example", backend = df, target = "target")

# Grouped by first underscore
pop = po("decode", group_pattern = "^([^_]+)\\_")
train_out = pop$train(list(task))[[1]]
# x_1 and x_2 are collapsed into x, same for y
train_out$data()

# Empty string to collapse all matches into one factor column.
pop$param_set$set_values(group_pattern = "")
train_out = pop$train(list(task))[[1]]
# All columns are combined into a single column.
# The level for each row is determined by the column with the largest value in that row.
# By default, ties are resolved randomly.
train_out$data()
Encodes columns of type factor and ordered.
Possible encodings are "one-hot" encoding, as well as encoding according to stats::contr.helmert(), stats::contr.poly(),
stats::contr.sum() and stats::contr.treatment().
Newly created columns are named via pattern [column-name].[x] where x is the respective factor level for "one-hot" and
"treatment" encoding, and an integer sequence otherwise.
Use the PipeOpTaskPreproc $affect_columns functionality to only encode a subset of columns, or only encode columns of a certain type.
character-type features can be encoded by converting them to factor features first, using ppl("convert_types", "character", "factor").
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpEncode$new(id = "encode", param_vals = list())
id :: character(1)
Identifier of resulting object, default "encode".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected factor and ordered columns encoded according to the method
parameter.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
contrasts :: named list of matrix
List of contrast matrices, one for each affected discrete feature. The rows of each matrix correspond to (training task) levels, the
columns to the new columns that replace the old discrete feature. See stats::contrasts.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
method :: character(1)
Initialized to "one-hot". One of:
"one-hot": create a new column for each factor level.
"treatment": create columns leaving out the first factor level of each factor variable (see stats::contr.treatment()).
"helmert": create columns according to Helmert contrasts (see stats::contr.helmert()).
"poly": create columns with contrasts based on orthogonal polynomials (see stats::contr.poly()).
"sum": create columns with contrasts summing to zero (see stats::contr.sum()).
Uses the stats::contrasts functions. This is relatively inefficient for features with a large number of levels.
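The relation between contrast matrices and encoded columns can be sketched in base R. The encode helper below is hypothetical and only illustrates the idea that encoding amounts to a row lookup in a contrast matrix; it is not the package's code.

```r
f = factor(c("a", "b", "c", "a"))
lv = levels(f)

# One-hot: an identity matrix with one column per level.
one_hot = diag(length(lv))
dimnames(one_hot) = list(lv, lv)

# Treatment: the first level is dropped, see stats::contr.treatment().
treatment = stats::contr.treatment(lv)

# Helmert: columns carry no level names, see stats::contr.helmert().
helmert = stats::contr.helmert(lv)

# Hypothetical helper: each observation picks the row of its level.
encode = function(f, contrasts) {
  out = contrasts[as.character(f), , drop = FALSE]
  rownames(out) = NULL
  out
}

encode(f, one_hot)
encode(f, treatment)
encode(f, helmert)
```

Because the contrast matrix has one row per level, its size grows with the number of levels, which is consistent with the efficiency note above.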
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

data = data.table::data.table(x = factor(letters[1:3]), y = factor(letters[1:3]))
task = TaskClassif$new("task", data, "x")

poe = po("encode")

# poe is initialized with encoding: "one-hot"
poe$train(list(task))[[1]]$data()

# other kinds of encoding:
poe$param_set$values$method = "treatment"
poe$train(list(task))[[1]]$data()

poe$param_set$values$method = "helmert"
poe$train(list(task))[[1]]$data()

poe$param_set$values$method = "poly"
poe$train(list(task))[[1]]$data()

poe$param_set$values$method = "sum"
poe$train(list(task))[[1]]$data()

# converting character-columns
data_chr = data.table::data.table(x = factor(letters[1:3]), y = letters[1:3])
task_chr = TaskClassif$new("task_chr", data_chr, "x")

goe = ppl("convert_types", "character", "factor") %>>% po("encode")
goe$train(task_chr)[[1]]$data()
Encodes columns of type factor, character and ordered.
Impact coding for classification Tasks converts factor levels of each (factorial) column to the difference between each target level's conditional log-likelihood given this level, and the target level's global log-likelihood.
Impact coding for regression Tasks converts factor levels of each (factorial) column to the difference between the target's conditional mean given this level, and the target's global mean.
Treats new levels during prediction like missing values.
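For the regression case, the idea can be sketched in base R as follows. This simplified illustration omits the smoothing term and is not the package's implementation; the variable names are illustrative.

```r
x = factor(c("a", "a", "b", "b", "c"))
y = c(1, 3, 10, 14, 7)

# Impact of each level: conditional mean of the target minus the global mean.
impact = vapply(split(y, x), mean, numeric(1)) - mean(y)
impact

# At prediction time, each level is replaced by its impact. The level "d" was not
# seen during training and is therefore treated like a missing value.
new_x = c("a", "c", "d")
encoded = unname(impact[new_x])
encoded

# Analogue of impute_zero = TRUE: missing impacts are replaced by 0.
encoded_zero = ifelse(is.na(encoded), 0, encoded)
encoded_zero
```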
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpEncodeImpact$new(id = "encodeimpact", param_vals = list())
id :: character(1)
Identifier of resulting object, default "encodeimpact".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would
otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskSupervised is used as input and output during training and prediction.
The output is the input Task with all affected factor, character or
ordered columns encoded.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
impact :: a named list
A list with an element for each affected feature:
For regression each element is a single column matrix of impact values for each level of that feature.
For classification, it is a list with an element for each feature level, which is a vector giving the impact of
this feature level on each outcome level.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
smoothing :: numeric(1)
A finite positive value used for smoothing. Mostly relevant for classification Tasks if
a factor does not coincide with a target factor level (and would otherwise give an infinite logit value).
Initialized to 1e-4.
impute_zero :: logical(1)
If TRUE, impute missing values as impact 0; otherwise the respective impact is coded as NA. Default FALSE.
Uses Laplace smoothing, mostly to avoid infinite values for classification Tasks.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

poe = po("encodeimpact")

task = TaskClassif$new("task",
  data.table::data.table(
    x = factor(c("a", "a", "a", "b", "b")),
    y = factor(c("a", "a", "b", "b", "b"))),
  "x")

poe$train(list(task))[[1]]$data()

poe$state
Encodes columns of type factor, character and ordered.
PipeOpEncodeLmer converts factor levels of each factorial column to the
estimated coefficients of a simple random intercept model.
Models are fitted with the glmer function of the lme4 package and are
of the type target ~ 1 + (1 | factor).
If the task is a regression task, the numeric target
variable is used as dependent variable and the factor is used for grouping.
If the task is a classification task, the target variable is used as dependent variable
and the factor is used for grouping.
If the target variable is multiclass, for each level of the multiclass target variable,
binary "one vs. rest" models are fitted.
For training, multiple models can be estimated in a cross-validation scheme to ensure that the same factor level does not always result in identical values in the converted numerical feature. For prediction, a global model (which was fitted on all observations during training) is used for each factor. New factor levels are converted to the value of the intercept coefficient of the global model for prediction. NAs are ignored by the PipeOp.
Use the PipeOpTaskPreproc $affect_columns functionality to only encode a subset of
columns, or only encode columns of a certain type.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpEncodeLmer$new(id = "encodelmer", param_vals = list())
id :: character(1)
Identifier of resulting object, default "encodelmer".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would
otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskSupervised is used as input and output during training and prediction.
The output is the input Task with all affected factor, character or
ordered columns encoded.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
target_levels :: character
Levels of the target columns.
control :: a named list
List of coefficients learned via glmer.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
fast_optim :: logical(1)
If fast_optim is TRUE (default), a faster (up to 50 percent) optimizer from the nloptr package is used when
fitting the lmer models. This uses additional stopping criteria which can give suboptimal results.
Initialized to TRUE.
Uses lme4::glmer(). This is relatively inefficient for features with a large number of levels.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

poe = po("encodelmer")

task = TaskClassif$new("task",
  data.table::data.table(
    x = factor(c("a", "a", "a", "b", "b")),
    y = factor(c("a", "a", "b", "b", "b"))),
  "x")

poe$train(list(task))[[1]]$data()

poe$state
Encodes numeric and integer feature columns using piecewise linear encoding. For details, see documentation of
PipeOpEncodePL or Gorishniy et al. (2022).
Bins are constructed by taking the quantiles of the respective feature column as bin boundaries. The first and
last boundaries are set to the minimum and maximum value of the feature, respectively. The number of bins can be
controlled with the numsplits hyperparameter.
Affected feature columns may contain NAs. These are ignored when calculating quantiles.
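The bin construction and the resulting encoding can be sketched in base R. The helper encode_pl is hypothetical; it follows the general idea of piecewise linear encoding from Gorishniy et al. (2022) and clamps every column to the range from 0 to 1, which is a simplifying assumption rather than a statement about the package's internals.

```r
x = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
numsplits = 2

# Bin boundaries: minimum, inner quantiles, maximum; NAs are ignored.
probs = seq_len(numsplits - 1) / numsplits
bins = unique(c(
  min(x, na.rm = TRUE),
  stats::quantile(x, probs, na.rm = TRUE, names = FALSE, type = 7),
  max(x, na.rm = TRUE)
))
bins

# Hypothetical helper: one column per bin, 0 below the bin, 1 above it,
# and linear interpolation inside the bin.
encode_pl = function(x, bins) {
  n_bins = length(bins) - 1
  out = vapply(seq_len(n_bins), function(t) {
    frac = (x - bins[t]) / (bins[t + 1] - bins[t])
    pmin(pmax(frac, 0), 1)
  }, numeric(length(x)))
  colnames(out) = paste0("bin", seq_len(n_bins))
  out
}

encoded = encode_pl(x, bins)
encoded
```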
R6Class object inheriting from PipeOpEncodePL/PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpEncodePLQuantiles$new(id = "encodeplquantiles", param_vals = list())
id :: character(1)
Identifier of resulting object, default "encodeplquantiles".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric and integer columns encoded using piecewise
linear encoding with bins being derived from the quantiles of the respective original feature column.
The $state is a named list with the $state elements inherited from PipeOpEncodePL/PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
numsplits :: integer(1)
Number of bins to create. Initialized to 2.
type :: integer(1)
Method used to calculate sample quantiles. See help of stats::quantile. Default is 7.
This overloads the private$.get_bins() method of PipeOpEncodePL and uses the stats::quantile function
to derive the bins used for piecewise linear encoding.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpEncodePL/PipeOpTaskPreproc/PipeOp.
Gorishniy Y, Rubachev I, Babenko A (2022). “On Embeddings for Numerical Features in Tabular Deep Learning.” In Advances in Neural Information Processing Systems, volume 35, 24991–25004. https://proceedings.neurips.cc/paper_files/paper/2022/hash/9e9f0ffc3d836836ca96cbf8fe14b105-Abstract-Conference.html.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Piecewise Linear Encoding PipeOps:
PipeOpEncodePL,
mlr_pipeops_encodepltree
library(mlr3)

task = tsk("iris")$select(c("Petal.Width", "Petal.Length"))
pop = po("encodeplquantiles")
train_out = pop$train(list(task))[[1L]]

# Calculated bin boundaries per feature
pop$state$bins
# Each feature was split into two encoded features using piecewise linear encoding
train_out$head()

# Prediction works the same as training, using the bins learned during training
predict_out = pop$predict(list(task))[[1L]]
predict_out$head()

# Binning into three bins per feature
# Using the nearest even order statistic for calculating quantiles
pop$param_set$set_values(numsplits = 4, type = 3)

train_out = pop$train(list(task))[[1L]]

# Calculated bin boundaries per feature
pop$state$bins
# Each feature was split into three encoded features using
# piecewise linear encoding
train_out$head()
Encodes numeric and integer feature columns using piecewise linear encoding. For details, see documentation of
PipeOpEncodePL or Gorishniy et al. (2022).
Bins are constructed by training one decision tree Learner per feature column, taking the target
column into account, and using decision boundaries as bin boundaries.
R6Class object inheriting from PipeOpEncodePL/PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpEncodePLTree$new(task_type, id = "encodepltree", param_vals = list())
task_type :: character(1)
The class of Task that should be accepted as input, given as a character(1). This is used to
construct the appropriate Learner to be used for obtaining the bins for piecewise linear
encoding. Supported options are "TaskClassif" for LearnerClassifRpart or
"TaskRegr" for LearnerRegrRpart.
id :: character(1)
Identifier of resulting object, default "encodepltree".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif or TaskRegr is used as input and output during training and
prediction, depending on the task_type construction argument.
The output is the input Task with all affected numeric and integer columns encoded using piecewise
linear encoding with bins being derived from a decision tree Learner trained on the respective feature column.
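For orientation, piecewise linear encoding of a single numeric vector with fixed bin boundaries can be sketched in base R as follows. encode_pl is a hypothetical helper, not a package function, and clipping every component to [0, 1] is a simplifying assumption; see PipeOpEncodePL for the exact behavior.

```r
# Hypothetical helper: encode x into one column per bin.
# A column is 0 below its bin, 1 above it, and linear inside it.
encode_pl = function(x, bins) {
  n_bins = length(bins) - 1L
  out = matrix(0, nrow = length(x), ncol = n_bins)
  for (t in seq_len(n_bins)) {
    # relative position of x within bin t
    frac = (x - bins[t]) / (bins[t + 1L] - bins[t])
    # assumption: clip to [0, 1] for values outside the bin
    out[, t] = pmin(pmax(frac, 0), 1)
  }
  out
}

enc = encode_pl(c(0, 1.5, 3), bins = c(0, 1, 2, 3))
enc
```

With boundaries at 0, 1, 2 and 3, the value 1.5 fills the first bin completely, half of the second, and none of the third.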
The $state is a named list with the $state elements inherited from PipeOpEncodePL/PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as the parameters of
the Learner used for obtaining the bins for piecewise linear encoding.
This overloads the private$.get_bins() method of PipeOpEncodePL. To derive the bins for each feature, the
Task is split into smaller Tasks with only the target and respective feature as columns.
On these Tasks either a LearnerClassifRpart or
LearnerRegrRpart gets trained and the respective splits extracted as bin boundaries used
for piecewise linear encodings.
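The procedure described above can be approximated outside of the package with rpart directly. This is a minimal sketch, assuming that the split points stored in the index column of the fitted tree's splits matrix serve as interior boundaries and that the observed minimum and maximum close the outer bins; tree_bins is a hypothetical helper and not the package's implementation.

```r
library(rpart)

# Hypothetical helper: fit a tree on target ~ single feature and use the
# split points as interior bin boundaries.
tree_bins = function(data, feature, target) {
  fit = rpart(stats::reformulate(feature, response = target), data = data)
  # fit$splits is NULL when the tree did not split at all
  splits = if (is.null(fit$splits)) numeric(0) else fit$splits[, "index"]
  x = data[[feature]]
  # assumption: outer boundaries are the observed minimum and maximum
  sort(unique(unname(c(min(x), splits, max(x)))))
}

bins = tree_bins(iris, "Petal.Length", "Species")
bins
```

Because only one feature enters each tree, every split in the fitted model refers to that feature, so no filtering by variable name is needed in this sketch.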
Only fields inherited from PipeOp.
Only methods inherited from PipeOpEncodePL/PipeOpTaskPreproc/PipeOp.
Gorishniy Y, Rubachev I, Babenko A (2022). “On Embeddings for Numerical Features in Tabular Deep Learning.” In Advances in Neural Information Processing Systems, volume 35, 24991–25004. https://proceedings.neurips.cc/paper_files/paper/2022/hash/9e9f0ffc3d836836ca96cbf8fe14b105-Abstract-Conference.html.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Piecewise Linear Encoding PipeOps:
PipeOpEncodePL,
mlr_pipeops_encodeplquantiles
library(mlr3)

# For classification task
task = tsk("iris")$select(c("Petal.Width", "Petal.Length"))
pop = po("encodepltree", task_type = "TaskClassif")
train_out = pop$train(list(task))[[1L]]

# Calculated bin boundaries per feature
pop$state$bins
# Each feature was split into three encoded features using piecewise linear encoding
train_out$head()

# Prediction works the same as training, using the bins learned during training
predict_out = pop$predict(list(task))[[1L]]
predict_out$head()

# Controlling behavior of the tree learner, here: setting minimum number of
# observations per node for a split to be attempted
pop$param_set$set_values(minsplit = 5)

train_out = pop$train(list(task))[[1L]]
# With a smaller minsplit, features may get split into more encoded features than before
pop$state$bins
train_out$head()

# For regression task
task = tsk("mtcars")$select(c("cyl", "hp"))
pop = po("encodepltree", task_type = "TaskRegr")
train_out = pop$train(list(task))[[1L]]

# Calculated bin boundaries per feature
pop$state$bins
# First feature was split into three encoded features,
# second into two, using piecewise linear encoding
train_out$head()
Aggregates features from all input tasks by cbind()ing them together into a single
Task.
DataBackend primary keys and Task targets have to be equal
across all Tasks. Only the target column(s) of the first Task
are kept.
If assert_targets_equal is TRUE then target column names are compared and an error is thrown
if they differ across inputs.
If input tasks share some feature names but these features are not identical, an error is thrown. This check is performed by first comparing the feature names and, if duplicates are found, also comparing the values of these possibly duplicated features. True duplicated features are only added a single time to the output task.
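The duplicate handling can be illustrated on plain data frames. This is a sketch under the assumption that "identical" means identical column contents; union_features is a hypothetical helper and does not reflect how the Task $cbind() method is implemented.

```r
# Hypothetical helper: column-bind two data frames, keeping shared
# columns once and failing if a shared name carries different values.
union_features = function(a, b) {
  shared = intersect(names(a), names(b))
  for (nm in shared) {
    if (!identical(a[[nm]], b[[nm]])) {
      stop("Feature '", nm, "' differs between inputs")
    }
  }
  # only columns of b that are not already present in a are added
  cbind(a, b[setdiff(names(b), shared)])
}

a = data.frame(x = 1:3, y = c(0.1, 0.2, 0.3))
b = data.frame(x = 1:3, z = c("u", "v", "w"))
out = union_features(a, b)
names(out)
```

Here the column x appears in both inputs with the same values, so it is kept once; a second input whose x differed would trigger the error instead.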
R6Class object inheriting from PipeOp.
PipeOpFeatureUnion$new(innum = 0, collect_multiplicity = FALSE, id = "featureunion", param_vals = list(), assert_targets_equal = TRUE)
innum :: numeric(1) | character
Determines the number of input channels.
If innum is 0 (default), a vararg input channel is created that can take an arbitrary number
of inputs. If innum is a character vector, the number of input channels is the length of
innum, and the columns of the result are prefixed with the values.
collect_multiplicity :: logical(1)
If TRUE, the input is a Multiplicity collecting channel. This means, a
Multiplicity input, instead of multiple normal inputs, is accepted and the members are aggregated. This requires innum to be 0.
Default is FALSE.
id :: character(1)
Identifier of the resulting object, default "featureunion".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise
be set during construction. Default list().
assert_targets_equal :: logical(1)
If assert_targets_equal is TRUE (Default), task target column names are checked for
agreement. Disagreeing target column names are usually a bug, so this should often be left at
the default.
PipeOpFeatureUnion has multiple input channels depending on the innum construction
argument, named "input1", "input2", ... if innum is nonzero; if innum is 0, there is
only one vararg input channel named "...". All input channels take a Task
both during training and prediction.
PipeOpFeatureUnion has one output channel named "output", producing a Task
both during training and prediction.
The output is a Task constructed by cbind()ing all features from all input
Tasks, both during training and prediction.
The $state is left empty (list()).
PipeOpFeatureUnion has no Parameters.
PipeOpFeatureUnion uses the Task $cbind() method to bind the input values
beyond the first input to the first Task. This means if the Tasks
are database-backed, all of them except the first will be fetched into R memory for this. This
behaviour may change in the future.
Only fields inherited from PipeOp.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
library("mlr3")

task1 = tsk("iris")
gr = gunion(list(
  po("nop"),
  po("pca")
)) %>>% po("featureunion")

gr$train(task1)

task2 = tsk("iris")
task3 = tsk("iris")
po = po("featureunion", innum = c("a", "b"))

po$train(list(task2, task3))
Feature filtering using a mlr3filters::Filter object, see the
mlr3filters package.
If a Filter can only operate on a subset of columns based on column type, then only these features are considered and filtered.
nfeat and frac will count for the features of the type that the Filter can operate on;
this means e.g. that setting nfeat to 0 will only remove features of the type that the Filter can work with.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpFilter$new(filter, id = filter$id, param_vals = list())
filter :: Filter
Filter used for feature filtering.
This argument is always cloned; to access the Filter inside PipeOpFilter by-reference, use $filter.
id :: character(1)
Identifier of the resulting object, defaulting to the id of the Filter being used.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with features removed that were filtered out.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
scores :: named numeric
Scores calculated for all features of the training Task which are being used
as cutoff for feature filtering. If frac or nfeat is given, the underlying Filter may choose to not calculate scores for
all features that are given. This only includes features on which the Filter can operate; e.g.
if the Filter can only operate on numeric features, then scores for factorial features will not be given.
features :: character
Names of features that are being kept. Features of types that the Filter can not operate on are always being kept.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as the parameters of the Filter
used by this object. Additionally, the following parameters are introduced:
filter.nfeat :: numeric(1)
Number of features to select.
Mutually exclusive with frac, cutoff, and permuted.
filter.frac :: numeric(1)
Fraction of features to keep.
Mutually exclusive with nfeat, cutoff, and permuted.
filter.cutoff :: numeric(1)
Minimum value of filter heuristic for which to keep features.
Mutually exclusive with nfeat, frac, and permuted.
filter.permuted :: integer(1)
If this parameter is set, a random permutation of each feature is added to the task before
applying the filter. All features that are selected before the permuted-th permuted feature is selected
are kept. This is similar to the approach in Wu (2007) and Thomas (2017).
Mutually exclusive with nfeat, frac, and cutoff.
Note that at least one of filter.nfeat, filter.frac, filter.cutoff, and filter.permuted must be given.
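How the three score-based criteria translate into a set of kept features can be sketched in base R. select_features is a hypothetical helper; in particular, rounding frac times the number of features to the nearest integer is an assumption, and the permuted criterion is left out for brevity.

```r
# Hypothetical helper: pick feature names from a named score vector
# using exactly one of nfeat, frac or cutoff.
select_features = function(scores, nfeat = NULL, frac = NULL, cutoff = NULL) {
  given = c(!is.null(nfeat), !is.null(frac), !is.null(cutoff))
  stopifnot(sum(given) == 1L)  # the criteria are mutually exclusive
  scores = sort(scores, decreasing = TRUE)
  if (!is.null(cutoff)) {
    return(names(scores)[scores >= cutoff])
  }
  if (!is.null(frac)) {
    nfeat = round(frac * length(scores))  # assumption: simple rounding
  }
  names(utils::head(scores, nfeat))
}

scores = c(a = 0.9, b = 0.2, c = 0.5, d = 0.7)
top2 = select_features(scores, nfeat = 2)
half = select_features(scores, frac = 0.5)
above = select_features(scores, cutoff = 0.5)
```

In PipeOpFilter itself, the corresponding settings are the filter.nfeat, filter.frac and filter.cutoff hyperparameters listed above.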
This does not use the $.select_cols feature of PipeOpTaskPreproc to select only features compatible with the Filter;
instead the whole Task is used by private$.get_state() and subset internally.
Fields inherited from PipeOp, as well as:
filter :: Filter
Filter that is being used for feature filtering. Do not use this slot to get to the feature filtering scores
after training; instead, use $state$scores. Read-only.
Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Wu Y, Boos DD, Stefanski LA (2007). “Controlling Variable Selection by the Addition of Pseudovariables.” Journal of the American Statistical Association, 102(477), 235–243. doi:10.1198/016214506000000843.
Thomas J, Hepp T, Mayr A, Bischl B (2017). “Probing for Sparse and Fast Variable Selection with Model-Based Boosting.” Computational and Mathematical Methods in Medicine, 2017, 1–8. doi:10.1155/2017/1421409.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3filters")

# setup PipeOpFilter to keep the 5 most important
# features of the spam task w.r.t. their AUC
task = tsk("spam")
filter = flt("auc")
po = po("filter", filter = filter)
po$param_set
po$param_set$values$filter.nfeat = 5

# filter the task
filtered_task = po$train(list(task))[[1]]

# filtered task + extracted AUC scores
filtered_task$feature_names
head(po$state$scores, 10)

# feature selection embedded in a holdout resampling
# keep 50% of features based on their AUC score
task = tsk("spam")
gr = po("filter", filter = flt("auc"), filter.frac = 0.5) %>>%
  po("learner", lrn("classif.rpart"))
learner = GraphLearner$new(gr)
rr = resample(task, learner, rsmp("holdout"), store_models = TRUE)
rr$learners[[1]]$model$auc$scores
Fixes factors of type factor, ordered: Makes sure the factor levels
during prediction are the same as during training; possibly dropping empty
training factor levels before.
Note this may introduce missing values during prediction if unseen factor levels are found.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpFixFactors$new(id = "fixfactors", param_vals = list())
id :: character(1)
Identifier of resulting object, default "fixfactors".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected factor and ordered feature levels fixed.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
levels :: named list of character
List of factor levels of each affected factor or ordered feature that will be fixed.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
droplevels :: logical(1)
Whether to drop empty factor levels of the training task. Default is TRUE.
Changes factor levels of columns and attaches them with a new data.table backend and the virtual cbind() backend.
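The effect on a single factor column can be sketched in base R. This only illustrates the level handling described above, with made-up data; it is not the package's implementation.

```r
# Training data: level "c" is declared but never observed
train_x = factor(c("a", "b", "a"), levels = c("a", "b", "c"))

# "Training": remember the levels, dropping empty ones (droplevels = TRUE)
state_levels = levels(droplevels(train_x))

# "Prediction": re-level new data to the training levels;
# the unseen level "c" becomes a missing value
predict_x = factor(c("a", "c", "b"))
fixed = factor(as.character(predict_x), levels = state_levels)
fixed
```

The missing value in the second position is the behavior the note above warns about, so an imputation step after this PipeOp may be needed.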
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")

# Reduced task with no entries for the installment_rate < 20 is defined
task = tsk("german_credit")
rows = task$row_ids[task$data()[, installment_rate != "< 20"]]
reduced_task = task$clone(deep = TRUE)$filter(rows)
levels(reduced_task$data()$installment_rate)

# PipeOp is trained on the reduced task
po = po("fixfactors")
processed_task = preproc(reduced_task, po)
levels(processed_task$data()$installment_rate)
summary(processed_task$data()$installment_rate)

predicted_task = preproc(task, po, predict = TRUE)
# Predictions are made on the task without any missing data
levels(predicted_task$data()$installment_rate)
summary(predicted_task$data()$installment_rate)
Splits numeric features into equally spaced bins.
See graphics::hist() for details.
Values that fall out of the training data range during prediction are
binned with the lowest / highest bin respectively.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpHistBin$new(id = "histbin", param_vals = list())
id :: character(1)
Identifier of resulting object, default "histbin".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their binned versions.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
breaks :: list
List of intervals representing the bins for each numeric feature.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
breaks :: character(1) | numeric | function
Either a character(1) string naming an algorithm to compute the number of cells,
a numeric(1) giving the number of breaks for the histogram,
a numeric vector giving the breakpoints between the histogram cells, or
a function to compute the vector of breakpoints or to compute the number
of cells. Default is algorithm "Sturges" (see grDevices::nclass.Sturges()).
For details see hist().
Uses the graphics::hist function.
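The binning can be reproduced approximately in base R. The sketch below assumes that out-of-range values are handled by opening the outermost bins to -Inf and Inf, which matches the behavior described above but is not necessarily how the package implements it.

```r
x_train = iris$Sepal.Length

# Breakpoints as computed by hist() with the default "Sturges" rule
breaks = graphics::hist(x_train, breaks = "Sturges", plot = FALSE)$breaks

# Assumption: extend the outer bins so that values outside the
# training range fall into the lowest / highest bin
breaks[1] = -Inf
breaks[length(breaks)] = Inf

# Values far below and far above the training range
x_new = c(-100, 5, 100)
binned = cut(x_new, breaks = breaks, ordered_result = TRUE)
binned
```

Passing a number, a numeric vector or a function as breaks works the same way, since the value is handed to hist() unchanged.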
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")

task = tsk("iris")
pop = po("histbin")

task$data()
pop$train(list(task))[[1]]$data()

pop$state
Extracts statistically independent components from data. Only affects numerical features. See fastICA::fastICA for details.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpICA$new(id = "ica", param_vals = list())
id :: character(1)
Identifier of resulting object, default "ica".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by independent components.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as the elements of the function fastICA::fastICA(),
with the exception of the $X and $S slots. These are in particular:
K :: matrix
Matrix that projects data onto the first n.comp principal components.
See fastICA().
W :: matrix
Estimated un-mixing matrix. See fastICA().
A :: matrix
Estimated mixing matrix. See fastICA().
center :: numeric
The mean of each numeric feature during training.
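To show how these state elements fit together, the following base R sketch projects data onto components by centering and then multiplying with K and W. The matrices are placeholders rather than output of fastICA(), and the assumption that new data is transformed as (X - center) %*% K %*% W follows the fastICA convention; it is meant as an illustration only.

```r
set.seed(1)
X = cbind(a = rnorm(5), b = rnorm(5))

# Placeholder state standing in for what training would store
state = list(
  center = colMeans(X),
  K = diag(2),                          # pre-whitening matrix (placeholder)
  W = matrix(c(0, 1, -1, 0), nrow = 2)  # un-mixing matrix (placeholder)
)

# Hypothetical helper: center with the training means, then project
project = function(newdata, state) {
  centered = sweep(newdata, 2, state$center)
  centered %*% state$K %*% state$W
}

S = project(X, state)
dim(S)
```

Storing center in the state is what allows prediction data to be shifted by the training means instead of its own means.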
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as the following parameters
based on fastICA():
n.comp :: numeric(1)
Number of components to extract. Default is NULL, which sets it
to the number of available numeric columns.
alg.typ :: character(1)
Algorithm type. One of "parallel" (default) or "deflation".
fun :: character(1)
One of "logcosh" (default) or "exp".
alpha :: numeric(1)
In range [1, 2]. Used for negentropy calculation when fun is "logcosh".
Default is 1.0.
method :: character(1)
Internal calculation method. "C" (default) or "R".
See fastICA().
row.norm :: logical(1)
Logical value indicating whether rows should be standardized beforehand.
Default is FALSE.
maxit :: numeric(1)
Maximum number of iterations. Default is 200.
tol :: numeric(1)
Tolerance for convergence, default is 1e-4.
verbose :: logical(1)
Logical value indicating the level of output during the run of the algorithm.
Default is FALSE.
w.init :: matrix
Initial un-mixing matrix. See fastICA().
Default is NULL.
Uses the fastICA() function.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("ica")

task$data()
pop$train(list(task))[[1]]$data()

pop$state
Impute features by a constant value.
R6Class object inheriting from PipeOpImpute/PipeOp.
PipeOpImputeConstant$new(id = "imputeconstant", param_vals = list())
id :: character(1)
Identifier of resulting object, default "imputeconstant".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise
be set during construction. Default list().
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values of all affected features imputed by
the value of the constant parameter.
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model contains the value of the constant parameter that is used for imputation.
The parameters are the parameters inherited from PipeOpImpute, as well as:
constant :: atomic(1)
The constant value that should be used for the imputation, atomic vector of length 1. The atomic mode must match
the type of the features that will be selected by the affect_columns parameter and this will be checked during
imputation. This is a required hyperparameter and needs to be set by the user.
check_levels :: logical(1)
Whether to check that the constant value is a valid level of factorial features (i.e., it already is a
level). Raises an error if the check fails. This check is only performed for factorial features (i.e., factor,
ordered; skipped for character). Initialized to TRUE.
Note that empty factor levels can be a problem for many Learners. Thus, PipeOpImputeOOR is
the preferred choice for creating new levels, since it is designed to impute out-of-range values and offers a more
explicit control for handling potentially problematic behavior.
The constructor is called with empty_level_control set to "always", to allow the creation of a new empty level
for factor and ordered (but not character) features during training, if constant is not an already existing
level and check_levels is set to FALSE. This has no impact if check_levels is TRUE, since in that case an
error would be raised before imputation.
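The interplay of constant and check_levels for factor features can be illustrated with a minimal base-R sketch. The helper impute_constant_sketch() is hypothetical and not part of mlr3pipelines; it only mimics the behavior described above (an error when the constant is not an existing level and check_levels is TRUE, a new level otherwise).

```r
# Hypothetical helper for illustration; not the package implementation.
impute_constant_sketch = function(x, constant, check_levels = TRUE) {
  if (is.factor(x) && !constant %in% levels(x)) {
    if (check_levels) {
      stop("Constant is not an existing level of the feature.")
    }
    # check_levels = FALSE: extend the level set before imputing
    levels(x) = c(levels(x), constant)
  }
  x[is.na(x)] = constant
  x
}

f = factor(c("a", NA, "b"))
impute_constant_sketch(f, "a")                          # uses the existing level "a"
impute_constant_sketch(f, "new", check_levels = FALSE)  # adds the level "new"
```

Calling impute_constant_sketch(f, "new") with the default check_levels = TRUE stops with an error, which corresponds to the check happening before any imputation.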
Only fields inherited from PipeOp.
Only methods inherited from PipeOpImpute/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")
task$missings()

# impute missing values of the numeric feature "glucose" by the constant value -999
po = po("imputeconstant", param_vals = list(
  constant = -999, affect_columns = selector_name("glucose"))
)
new_task = po$train(list(task = task))[[1]]
new_task$missings()
new_task$data(cols = "glucose")[[1]]
Impute numeric, integer, POSIXct or Date features by histogram.
During training, a histogram is fitted on each column using R's hist() function.
The fitted histogram is then sampled from for imputation. Sampling happens in a two-step process:
First, a bin is sampled from the histogram, then a value is sampled uniformly from the bin.
This is an approximation to sampling from the empirical training data distribution (i.e. sampling
from training data with replacement), but is much more memory efficient for large datasets, since the $state
does not need to save the training data.
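The two-step sampling described above can be sketched in base R as follows. This is an illustration under simplifying assumptions (a single numeric vector, default hist() breaks), not the package's implementation; the variable names are chosen for this example only.

```r
set.seed(1)
x = c(rnorm(100, mean = 50, sd = 10), rep(NA, 5))

# "Training": keep only the histogram, not the data itself
h = hist(x, plot = FALSE)  # hist() drops non-finite values such as NA
state = list(counts = h$counts, breaks = h$breaks)

# "Imputation", step 1: sample a bin with probability proportional to its count
n_missing = sum(is.na(x))
bin = sample.int(length(state$counts), n_missing, replace = TRUE, prob = state$counts)

# Step 2: sample uniformly between the lower and upper break of each chosen bin
draws = runif(n_missing, min = state$breaks[bin], max = state$breaks[bin + 1])

x_imputed = x
x_imputed[is.na(x)] = draws
```

Only the counts and breaks are stored, so the size of the state depends on the number of bins rather than on the number of training rows.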
R6Class object inheriting from PipeOpImpute/PipeOp.
PipeOpImputeHist$new(id = "imputehist", param_vals = list())
id :: character(1)
Identifier of resulting object, default "imputehist".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values of all affected numeric, integer, POSIXct or Date features imputed by (column-wise) histogram; see Description for details.
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model is a named list of lists containing elements $counts and $breaks.
The parameters are the parameters inherited from PipeOpImpute.
Uses the graphics::hist() function. Features that are entirely NA are imputed as 0.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpImpute/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")
task$missings()

po = po("imputehist")
new_task = po$train(list(task = task))[[1]]
new_task$missings()

po$state$model
Impute features by fitting a Learner for each feature.
Uses the features indicated by the context_columns parameter as features to train the imputation Learner.
Note this parameter is part of the PipeOpImpute base class and explained there.
Additionally, only features supported by the learner can be imputed; i.e. learners of type
regr can only impute features of type integer, numeric, POSIXct and Date, while classif can impute
features of type factor, ordered and logical.
The Learner used for imputation is trained on all context_columns; if these contain missing values,
the Learner typically either needs to be able to handle missing values itself, or needs to do its
own imputation (see examples).
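The idea of learner-based imputation can be sketched in base R, with stats::lm() standing in for an mlr3 Learner. This is an assumption-laden illustration rather than the package's code: the data are simulated, and the context columns x1 and x2 are fully observed, so the model does not need to handle missing values itself.

```r
set.seed(1)
df = data.frame(x1 = rnorm(50), x2 = rnorm(50))
df$y = 2 * df$x1 - df$x2 + rnorm(50, sd = 0.1)
df$y[c(3, 17, 42)] = NA  # the feature to impute

# Train on rows where the feature is observed, using the context columns
observed = !is.na(df$y)
fit = lm(y ~ x1 + x2, data = df[observed, ])

# Predict the missing entries from the context columns
imputed = df
imputed$y[!observed] = predict(fit, newdata = df[!observed, ])
```

If x1 or x2 contained missing values as well, the rows used for prediction could yield NA, which is why the model either has to handle missing values or be preceded by its own imputation step.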
R6Class object inheriting from PipeOpImpute/PipeOp.
PipeOpImputeLearner$new(learner, id = NULL, param_vals = list())
id :: character(1)
Identifier of resulting object, default "impute.", followed by the id of the Learner.
learner :: Learner | character(1)
Learner to wrap, or a string identifying a Learner in the mlr3::mlr_learners Dictionary.
The Learner usually needs to be able to handle missing values, i.e. have the missings property, unless care is taken
that context_columns do not contain missings; see examples.
This argument is always cloned; to access the Learner inside PipeOpImputeLearner by-reference, use $learner.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values from all affected features imputed by the trained model.
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model is a named list of models created by the Learner's $.train() function
for each column. If a column consists of missing values only during training, the model is 0 or the levels of the
feature; these are used for sampling during prediction.
This state is given the class "pipeop_impute_learner_state".
The parameters are the parameters inherited from PipeOpImpute, in addition to the parameters of the Learner
used for imputation.
Uses the $train and $predict functions of the provided learner. Features that are entirely NA are imputed as 0
or randomly sampled from available (factor / logical) levels.
The Learner does not necessarily need to handle missing values in cases
where context_columns is chosen well (or there is only one column with missing values present).
Fields inherited from PipeOpTaskPreproc/PipeOp, as well as:
learner_models :: list of Learner | NULL
Learner that is being wrapped. This list is named by the features for which a Learner was fitted, and
contains the same Learner, but with different respective models for each feature. If this PipeOp is not trained,
this is an empty list. For features that were entirely NA during training, the list contains NULL elements.
Only methods inherited from PipeOpImpute/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")
task$missings()

po = po("imputelearner", lrn("regr.rpart"))
new_task = po$train(list(task = task))[[1]]
new_task$missings()

# '$state' of the "regr.rpart" Learner, trained to predict the 'mass' column:
po$state$model$mass

library("mlr3learners")
# To use the "regr.lm" Learner, prefix it with its own imputation method!
# The "imputehist" PipeOp is used to train "regr.lm"; predictions of this
# trained Learner are then used to impute the missing values in the Task.
po = po("imputelearner",
  po("imputehist") %>>% lrn("regr.lm")
)
new_task = po$train(list(task = task))[[1]]
new_task$missings()
Impute numeric, integer, POSIXct or Date features by their mean.
R6Class object inheriting from PipeOpImpute/PipeOp.
PipeOpImputeMean$new(id = "imputemean", param_vals = list())
id :: character(1)
Identifier of resulting object, default "imputemean".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values of all affected numeric, integer, POSIXct and Date features imputed by (column-wise) mean.
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model is a named list of either numeric(1), integer(1), POSIXct(1) or Date(1) indicating the mean of the respective feature.
The parameters are the parameters inherited from PipeOpImpute.
Uses the mean() function. Features that are entirely NA are imputed as 0.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpImpute/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")
task$missings()

po = po("imputemean")
new_task = po$train(list(task = task))[[1]]
new_task$missings()

po$state$model
Impute numeric, integer, POSIXct or Date features by their median.
R6Class object inheriting from PipeOpImpute/PipeOp.
PipeOpImputeMedian$new(id = "imputemedian", param_vals = list())
id :: character(1)
Identifier of resulting object, default "imputemedian".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values of all affected numeric, integer, POSIXct and Date features imputed by (column-wise) median.
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model is a named list of numeric(1), integer(1), POSIXct(1) or Date(1) indicating the median of the respective feature.
The parameters are the parameters inherited from PipeOpImpute.
Uses the stats::median() function. Features that are entirely NA are imputed as 0.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpImpute/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")
task$missings()

po = po("imputemedian")
new_task = po$train(list(task = task))[[1]]
new_task$missings()

po$state$model
Impute features by their mode. Supports factors, logical, numerical, POSIXct and Date features. If multiple modes are present then imputed values are sampled randomly from them.
R6Class object inheriting from PipeOpImpute/PipeOp.
PipeOpImputeMode$new(id = "imputemode", param_vals = list())
id :: character(1)
Identifier of resulting object, default "imputemode".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values of all affected features imputed by (column-wise) mode.
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model is a named list of a vector of length one of the type of the feature, indicating the mode of the respective feature.
The parameters are the parameters inherited from PipeOpImpute.
Features that are entirely NA are imputed as follows:
For factor or ordered, levels are sampled uniformly at random.
For logicals, TRUE or FALSE are sampled uniformly at random.
Numerics and integers are imputed as 0.
Note that every random imputation is drawn independently, so different values may be imputed if multiple values are missing.
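Mode imputation with ties can be sketched in base R as follows. The helper compute_modes() is hypothetical and only illustrates the behavior described above: when several values share the highest frequency, each missing entry is drawn independently from among them.

```r
# Hypothetical helper: return all values that attain the maximal frequency.
compute_modes = function(x) {
  x = x[!is.na(x)]
  ux = unique(x)
  counts = tabulate(match(x, ux))
  ux[counts == max(counts)]
}

set.seed(1)
x = c(1, 2, 2, 3, 3, NA, NA)
modes = compute_modes(x)  # 2 and 3 both occur twice

# Draw each missing value independently from the set of modes.
# Indexing via sample.int() avoids the sample(n) pitfall for length-1 numerics.
n_missing = sum(is.na(x))
x_imputed = x
x_imputed[is.na(x)] = modes[sample.int(length(modes), n_missing, replace = TRUE)]
```

Because the draws are independent, the two imputed entries in this example may differ from each other.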
Only fields inherited from PipeOp.
Only methods inherited from PipeOpImpute/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")
task$missings()

po = po("imputemode")
new_task = po$train(list(task = task))[[1]]
new_task$missings()

po$state$model
Impute factorial features by adding a new level ".MISSING".
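Impute numeric, integer, POSIXct or Date features by constant values shifted below the minimum or above the maximum, using
min(x) - offset - multiplier * diff(range(x)) or max(x) + offset + multiplier * diff(range(x)), respectively.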
This type of imputation is especially sensible in the context of tree-based methods, see also Ding & Simonoff (2010).
Learners expect input Tasks to have the same factor (or ordered) levels during
training as well as prediction. This PipeOp modifies the levels of factor and ordered features,
and since it may occur that a factor or ordered feature contains missing values only during prediction, but not
during training, the output Task could also have different levels during the two stages.
To avoid problems with the Learners' expectations, it is necessary to control how the PipeOp handles this edge case.
For this, use the create_empty_level hyperparameter inherited from PipeOpImpute.
If create_empty_level is set to TRUE, then an unseen level ".MISSING" is added to the feature during
training and missing values are imputed as ".MISSING" during prediction.
However, empty factor levels during training can be a problem for many Learners.
If create_empty_level is set to FALSE, then no empty level is introduced during training, but columns that
have missing values only during prediction will not be imputed. This is why it may still be necessary to use
po("imputesample", affect_columns = selector_type(types = c("factor", "ordered")))
(or another imputation method) after this imputation method.
Note that setting create_empty_level to FALSE is the same as setting it to TRUE and using PipeOpFixFactors
after this PipeOp.
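The level mismatch discussed above can be made concrete with a small base-R sketch. It is an illustration only: add_missing_level() is a hypothetical helper that mimics adding ".MISSING" in both stages, as described for create_empty_level = TRUE.

```r
train = factor(c("a", "b", "a"))    # no missing values during training
newdata = factor(c("a", NA, "b"))   # a missing value appears only at prediction

# Hypothetical helper: add the ".MISSING" level and impute NAs with it.
add_missing_level = function(f) {
  if (!".MISSING" %in% levels(f)) {
    levels(f) = c(levels(f), ".MISSING")
  }
  f[is.na(f)] = ".MISSING"
  f
}

train_out = add_missing_level(train)
newdata_out = add_missing_level(newdata)

# The level sets agree across stages, but ".MISSING" is empty in the training data.
identical(levels(train_out), levels(newdata_out))
table(train_out)
```

The empty ".MISSING" level in train_out is the trade-off mentioned above: the level sets match, yet some Learners may not cope with a level that never occurs during training.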
R6Class object inheriting from PipeOpImpute/PipeOp.
PipeOpImputeOOR$new(id = "imputeoor", param_vals = list())
id :: character(1)
Identifier of resulting object, default "imputeoor".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with all affected features having missing values imputed as described above.
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model contains either ".MISSING" used for character and factor (also
ordered) features or numeric(1) indicating the constant value used for imputation of
integer, numeric, POSIXct or Date features.
The parameters are the parameters inherited from PipeOpImpute, as well as:
min :: logical(1)
Should integer and numeric features be shifted below the minimum? Initialized to TRUE. If FALSE
they are shifted above the maximum. See also the description above.
offset :: numeric(1)
Numerical non-negative offset as used in the description above for integer, numeric, POSIXct and Date features. Initialized to 1.
multiplier :: numeric(1)
Numerical non-negative multiplier as used in the description above for integer, numeric, POSIXct and Date features. Initialized to 1.
Adds an explicit new level() to factor and ordered features, but not to character features.
For integer and numeric features uses the min, max, diff and range functions.
integer and numeric features that are entirely NA are imputed as 0. factor and ordered features that are
entirely NA are imputed as ".MISSING". For POSIXct and Date features the value 0 is transformed into the respective data type.
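The numeric shift can be sketched in base R. The description referred to above lies outside this section, so the rule used here is an assumption: the constant is taken to be min(x) - offset - multiplier * diff(range(x)) when min is TRUE, and max(x) + offset + multiplier * diff(range(x)) otherwise. The helper oor_constant and its argument shift_below are hypothetical names for illustration and are not part of mlr3pipelines.

```r
# Hypothetical helper (not part of mlr3pipelines): computes an out-of-range
# constant for a numeric vector under the assumed shift rule.
oor_constant = function(x, shift_below = TRUE, offset = 1, multiplier = 1) {
  observed = x[!is.na(x)]
  if (length(observed) == 0L) {
    return(0)  # entirely-NA features are imputed as 0
  }
  spread = diff(range(observed))
  if (shift_below) {
    min(observed) - offset - multiplier * spread
  } else {
    max(observed) + offset + multiplier * spread
  }
}

x = c(2, 5, NA, 10)
below = oor_constant(x)                       # 2 - 1 - 1 * 8 = -7
above = oor_constant(x, shift_below = FALSE)  # 10 + 1 + 1 * 8 = 19

# Replace the missing entry with the constant below the observed range
x_imputed = ifelse(is.na(x), below, x)
```

Because the constant lies outside the observed range, a tree-based learner can separate formerly missing observations with a single split.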
Only fields inherited from PipeOp.
Only methods inherited from PipeOpImpute/PipeOp.
Ding Y, Simonoff JS (2010). “An Investigation of Missing Data Methods for Classification Trees Applied to Binary Response Data.” Journal of Machine Learning Research, 11(6), 131-170. https://jmlr.org/papers/v11/ding10a.html.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputesample
library("mlr3")
library("mlr3pipelines")
set.seed(2409)
data = tsk("pima")$data()
data$y = factor(c(NA, sample(letters, size = 766, replace = TRUE), NA))
data$z = ordered(c(NA, sample(1:10, size = 767, replace = TRUE)))
task = TaskClassif$new("task", backend = data, target = "diabetes")
task$missings()
po = po("imputeoor")
new_task = po$train(list(task = task))[[1]]
new_task$missings()
new_task$data()

# recommended use when missing values are expected during prediction on
# factor columns that had no missing values during training
gr = po("imputeoor", create_empty_level = FALSE) %>>%
  po("imputesample", affect_columns = selector_type(types = c("factor", "ordered")))
t1 = as_task_classif(data.frame(l = as.ordered(letters[1:3]), t = letters[1:3]), target = "t")
t2 = as_task_classif(data.frame(l = as.ordered(c("a", NA, NA)), t = letters[1:3]), target = "t")
gr$train(t1)[[1]]$data()

# missing values during prediction are sampled randomly
gr$predict(t2)[[1]]$data()
Impute features by sampling from non-missing training data.
R6Class object inheriting from PipeOpImpute/PipeOp.
PipeOpImputeSample$new(id = "imputesample", param_vals = list())
id :: character(1)
Identifier of resulting object, default "imputesample".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with missing values of all affected features imputed by values sampled (column-wise) from the training data.
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model is a named list of training data with missings removed.
The parameters are the parameters inherited from PipeOpImpute.
Uses the sample() function. Features that are entirely NA are imputed as follows:
for factor or ordered features, levels are sampled uniformly at random;
for logical features, TRUE or FALSE is sampled uniformly at random;
numeric and integer features are imputed as 0.
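The column-wise sampling idea can be illustrated with a minimal base R sketch. This is not the package's implementation; impute_by_sampling is a hypothetical helper, and the case of a column without any observed training values is deliberately not handled here.

```r
set.seed(1)

# Hypothetical helper: draw replacements for NA entries from the observed
# training values of the same column.
impute_by_sampling = function(train_col, new_col) {
  observed = train_col[!is.na(train_col)]
  missing = is.na(new_col)
  # Index-based sampling avoids the sample() pitfall where a single numeric
  # value n is interpreted as the range 1:n.
  idx = sample.int(length(observed), size = sum(missing), replace = TRUE)
  new_col[missing] = observed[idx]
  new_col
}

train_col = c(3, NA, 7, 7, NA, 1)
new_col = c(NA, 4, NA)
imputed = impute_by_sampling(train_col, new_col)
```

Values that occur more often in the training column are drawn more often, so the imputed values roughly follow the empirical training distribution.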
Only fields inherited from PipeOp.
Only methods inherited from PipeOpImpute/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor
library("mlr3")
library("mlr3pipelines")
task = tsk("pima")
task$missings()
po = po("imputesample")
new_task = po$train(list(task = task))[[1]]
new_task$missings()
PipeOpInfo prints its input to the console or a logger in a customizable way.
Users can define how specific object classes should be displayed using custom printer functions.
R6Class object inheriting from PipeOp
PipeOpInfo$new(id = "info", collect_multiplicity = FALSE, log_target = "lgr::mlr3/mlr3pipelines::info")
id :: character(1)
Identifier of resulting object, default "info"
printer :: list
Optional mapping from object classes to printer functions. Custom functions override default printer-functions.
collect_multiplicity :: logical(1)
If TRUE, the input is a Multiplicity collecting channel. Multiplicity input/output is accepted and the members are aggregated.
log_target :: character(1)
Specifies how the input object is printed to the console. By default it is
directed to a logger, whose address can be customized using the form
<output>::<argument1>::<argument2>. Otherwise it can be printed
as "message", "warning" or "cat". When set to "none", no customized
information about the object will be printed.
PipeOpInfo has one input channel called "input", which accepts any type of input (*).
PipeOpInfo has one output channel called "output", which can carry any type of output (*).
The $state is left empty (list()).
PipeOpInfo forwards its input unchanged, but prints information about it
depending on the printer and log_target settings.
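Since the input is forwarded unchanged, an info PipeOp can be placed between two steps of a Graph to inspect intermediate results. The following is a minimal sketch, assuming the info PipeOp connects to its neighbours like any other single-channel PipeOp; the choice of po("scale") and po("pca") is only for illustration.

```r
library("mlr3")
library("mlr3pipelines")

# Print the scaled Task via message() before it reaches the PCA step
graph = po("scale") %>>%
  po("info", log_target = "message") %>>%
  po("pca")

out = graph$train(tsk("iris"))[[1]]
out$feature_names
```

Setting log_target = "none" should silence the output without having to remove the PipeOp from the Graph.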
Fields inherited from PipeOp, as well as:
printer :: list
Mapping of object classes to printer functions. Includes printer specifications for Task, Prediction and NULL. Any other object is printed as is.
log_target :: character(1)
Specifies current output target.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")
poinfo = po("info")
poinfo$train(list(tsk("mtcars")))
poinfo$predict(list(tsk("mtcars")))

# Specify customized console output for Task-objects
poinfo = po("info", log_target = "cat",
  printer = list(Task = function(x) list(head_data = head(x$data()), nrow = nrow(x$data())))
)
poinfo$train(list(tsk("iris")))
poinfo$predict(list(tsk("iris")))
Reduces the dimensionality of the data of the input Task using the
Isomap algorithm from the dimRed-package, preserving geodesic distances
between observations. The number of neighbors (knn) and embedding
dimensions (ndim) control the transformation.
R6Class object inheriting from PipeOpTaskPreproc
PipeOpIsomap$new(id = "isomap", ...)
id :: character(1)
Identifier of resulting object, default "isomap"
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with the data projected to the lower-dimensional space.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
embed_result :: dimRedResult
The resulting object after applying the "Isomap"-method from the dimRed-package to the data.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
knn :: integer(1)
The number of nearest neighbors in the graph.
Initialized to 50.
ndim :: integer(1)
The number of embedding dimensions.
Initialized to 2.
get_geod :: logical(1)
Determines whether the distance matrix should be kept in the $state.
Initialized to FALSE.
.mute :: character
A character vector of elements to mute during training (e.g. c("message", "output")).
Initialized to NULL.
Applies the Isomap embedding from the dimRed-package.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")
po = po("isomap", .mute = c("message", "output"))
po$train(list(tsk("iris")))[[1]]$data()
po$predict(list(tsk("iris")))[[1]]$data()
Extracts kernel principal components from data. Only affects numerical features. See kernlab::kpca for details.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpKernelPCA$new(id = "kernelpca", param_vals = list())
id :: character(1)
Identifier of resulting object, default "kernelpca".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their principal components.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as the returned S4 object of the function kernlab::kpca().
The @rotated slot of the "kpca" object is overwritten with an empty matrix for memory efficiency.
The slots of the S4 object can be accessed by accessor functions. See kernlab::kpca.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
kernel :: character(1)
The kernel function used in training and predicting. See kpca().
kpar :: list
List of hyper-parameters that are used with the kernel function. See kpca().
features :: numeric(1)
Number of principal components to return. Default 0 means that all
principal components are returned. See kpca().
th :: numeric(1)
The value of eigenvalue under which principal components are ignored. Default is 0.0001. See kpca().
na.action :: function
Function to specify NA action. Default is na.omit. See kpca().
Uses the kpca() function.
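As a minimal sketch of how the kernel-related parameters fit together, the following selects the radial basis kernel and passes its sigma through kpar. The value sigma = 0.1 is an arbitrary choice for illustration, and the kernlab package is assumed to be installed.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# RBF kernel with an illustrative bandwidth, keeping two components
kpca_op = po("kernelpca",
  kernel = "rbfdot",
  kpar = list(sigma = 0.1),
  features = 2
)

train_out = kpca_op$train(list(task))[[1]]
# The trained state is reused to project data during prediction
predict_out = kpca_op$predict(list(task))[[1]]
train_out$feature_names
```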
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")
task = tsk("iris")
pop = po("kernelpca", features = 3)  # only keep top 3 components
task$data()
pop$train(list(task))[[1]]$data()
Wraps an mlr3::Learner into a PipeOp.
Inherits the $param_set (and therefore $param_set$values) from the Learner it is constructed from.
Using PipeOpLearner, it is possible to embed mlr3::Learners into Graphs, which themselves can be
turned into Learners using GraphLearner. This way, preprocessing and ensemble methods can be included
in a machine learning pipeline, which can then be handled as a single object for resampling, benchmarking
and tuning.
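A short sketch of this workflow: a preprocessing step is chained with a wrapped Learner, the Graph is converted with as_learner(), and the result is resampled like any other Learner. The choice of po("scale"), three folds and the classification error measure are only for illustration.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)

# Preprocessing followed by a wrapped Learner
graph = po("scale") %>>% po("learner", lrn("classif.rpart", cp = 0.1))

# Treat the whole Graph as a single Learner
glrn = as_learner(graph)

rr = resample(tsk("iris"), glrn, rsmp("cv", folds = 3))
err = rr$aggregate(msr("classif.ce"))
err
```

Because scaling is part of the GraphLearner, it is refit on the training rows of every resampling fold, which avoids leaking information from the test rows.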
R6Class object inheriting from PipeOp.
PipeOpLearner$new(learner, id = NULL, param_vals = list())
learner :: Learner | character(1)
Learner to wrap, or a string identifying a Learner in the mlr3::mlr_learners Dictionary.
This argument is always cloned; to access the Learner inside PipeOpLearner by-reference, use $learner.
id :: character(1)
Identifier of the resulting object, internally defaulting to the id of the Learner being wrapped.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
PipeOpLearner has one input channel named "input", taking a Task specific to the Learner
type given to learner during construction; both during training and prediction.
PipeOpLearner has one output channel named "output", producing NULL during training and a Prediction subclass
during prediction; this subclass is specific to the Learner type given to learner during construction.
The output during prediction is the Prediction on the prediction input data, produced by the Learner
trained on the training input data.
The $state is set to the $state slot of the Learner object. It is a named list with members:
model :: any
Model created by the Learner's $.train() function.
train_log :: data.table with columns class (character), msg (character)
Errors logged during training.
train_time :: numeric(1)
Training time, in seconds.
predict_log :: NULL | data.table with columns class (character), msg (character)
Errors logged during prediction.
predict_time :: NULL | numeric(1)
Prediction time, in seconds.
The parameters are exactly the parameters of the Learner wrapped by this object.
The $state is currently not updated by prediction, so the $state$predict_log and $state$predict_time will always be NULL.
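The channel behaviour and the $state described above can be inspected directly. This is a minimal sketch using classif.rpart, whose fitted model is assumed to be an object of class "rpart".

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
lrn_po = po("learner", lrn("classif.rpart"))

# Training produces NULL on the output channel and fills $state
train_out = lrn_po$train(list(task))
is.null(train_out[[1]])

# The fitted model is stored in the state
class(lrn_po$state$model)

# Prediction returns a Prediction object on the output channel
pred = lrn_po$predict(list(task))[[1]]
pred
```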
Fields inherited from PipeOp, as well as:
learner_model :: Learner
Learner that is being wrapped. This learner contains the model if the PipeOp is trained. Read-only.
validate :: "predefined" or NULL
This field can only be set for Learners that have the "validation" property.
Setting the field to "predefined" means that the wrapped Learner will use the internal validation task,
otherwise it will be ignored.
Note that specifying how the validation data is created is possible via the $validate field of the GraphLearner.
For each PipeOp it is then only possible to either use it ("predefined") or not use it (NULL).
Also see set_validate.GraphLearner for more information.
internal_tuned_values :: named list() or NULL
The internally tuned values if the wrapped Learner supports internal tuning, NULL otherwise.
internal_valid_scores :: named list() or NULL
The internal validation scores if the wrapped Learner supports internal validation, NULL otherwise.
Methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Meta PipeOps:
mlr_pipeops_learner_cv,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles
library("mlr3")
library("mlr3pipelines")
task = tsk("iris")
learner = lrn("classif.rpart", cp = 0.1)
lrn_po = mlr_pipeops$get("learner", learner)
lrn_po$train(list(task))
lrn_po$predict(list(task))
Wraps an mlr3::Learner into a PipeOp.
Returns cross-validated predictions during training as a Task and stores a model of the
Learner trained on the whole data in $state. This is used to create a similar
Task during prediction.
Optionally, the fitted models obtained during the resampling phase can be reused for prediction by averaging
their predictions, avoiding the need for an additional fit on the complete training data.
The Task gets features depending on the wrapped Learner's
$predict_type. If the Learner's $predict_type is "response", a feature <ID>.response is created,
for $predict_type "prob" the <ID>.prob.<CLASS> features are created, and for $predict_type "se" the new columns
are <ID>.response and <ID>.se. <ID> denotes the $id of the PipeOpLearnerCV object.
Inherits the $param_set (and therefore $param_set$values) from the Learner it is constructed from.
PipeOpLearnerCV can be used to create "stacking" or "super learning" Graphs that use the output of one Learner
as feature for another Learner. Because the PipeOpLearnerCV erases the original input features, it is often
useful to use PipeOpFeatureUnion to bind the prediction Task to the original input Task.
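A stacking Graph along these lines might look as follows. This is a sketch only: the ids "rpart_cv" and "rpart_final" are arbitrary names chosen to keep the two PipeOps distinct, and using rpart at both levels is merely convenient because it ships with mlr3.

```r
library("mlr3")
library("mlr3pipelines")

set.seed(1)
task = tsk("iris")

# Level 0: cross-validated class probabilities become new features
base = po("learner_cv", lrn("classif.rpart", predict_type = "prob"), id = "rpart_cv")

# Keep the original features next to the predictions, then fit the final Learner
graph = gunion(list(base, po("nop"))) %>>%
  po("featureunion") %>>%
  po("learner", lrn("classif.rpart"), id = "rpart_final")

stack = as_learner(graph)
stack$train(task)
pred = stack$predict(task)
pred
```

The final Learner is trained on out-of-sample predictions of the base Learner, which is what keeps the stacked model from simply trusting an overfitted level-0 model.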
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpLearnerCV$new(learner, id = NULL, param_vals = list())
learner :: Learner
Learner to use for cross validation / prediction, or a string identifying a
Learner in the mlr3::mlr_learners Dictionary.
This argument is always cloned; to access the Learner inside PipeOpLearnerCV by-reference, use $learner.
id :: character(1)
Identifier of the resulting object, internally defaulting to the id of the Learner being wrapped.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
PipeOpLearnerCV has one input channel named "input", taking a Task specific to the Learner
type given to learner during construction; both during training and prediction.
PipeOpLearnerCV has one output channel named "output", producing a Task specific to the Learner
type given to learner during construction; both during training and prediction.
The output is a task with the same target as the input task, with features replaced by predictions made by the Learner.
During training, this prediction is the out-of-sample prediction made by resample; during prediction, it is the
ordinary prediction made on the data by a Learner trained on the training phase data.
The $state is set to the $state slot of the Learner object, together with the $state elements inherited from the
PipeOpTaskPreproc. It is a named list with the inherited members, as well as:
model :: any
Model created by the Learner's $.train() function.
train_log :: data.table with columns class (character), msg (character)
Errors logged during training.
train_time :: numeric(1)
Training time, in seconds.
predict_log :: NULL | data.table with columns class (character), msg (character)
Errors logged during prediction.
predict_time :: NULL | numeric(1)
Prediction time, in seconds.
predict_method :: character(1)
"full" when prediction uses a learner fitted on all training data, "cv_ensemble" when predictions are averaged over
models trained on resampling folds.
cv_model_states :: NULL | list
Present for predict_method = "cv_ensemble". Contains the states of the learners trained on each resampling fold.
This state is given the class "pipeop_learner_cv_state".
The parameters are the parameters inherited from the PipeOpTaskPreproc, as well as the parameters of the Learner wrapped by this object.
Besides that, parameters introduced are:
resampling.method :: character(1)
Which resampling method to use. Currently only "cv" and "insample" are supported. "insample" generates
predictions with the model trained on all training data.
resampling.folds :: numeric(1)
Number of cross validation folds. Initialized to 3. Only used for resampling.method = "cv".
resampling.keep_response :: logical(1)
Only effective during "prob" prediction: Whether to keep response values, if available. Initialized to FALSE.
resampling.predict_method :: character(1)
Controls how predictions are produced after training. "full" (default) fits the wrapped learner on the entire training data.
"cv_ensemble" reuses the models fitted during resampling and averages their predictions. This option currently supports
classification and regression learners together with resampling.method = "cv".
resampling.prob_aggr :: character(1)
Probability aggregation used when "cv_ensemble" predictions are produced for classification learners that can emit class probabilities.
Shares the semantics with PipeOpClassifAvg: "mean" (linear opinion pool, default) and "log" (log opinion pool / product of experts).
Only present for learners that support "prob" predictions.
resampling.prob_aggr_eps :: numeric(1)
Stabilization constant applied when resampling.prob_aggr = "log" to clamp probabilities before taking logarithms.
Defaults to 1e-12. Only present for learners that support "prob" predictions.
resampling.se_aggr :: character(1)
Standard error aggregation used when "cv_ensemble" predictions are produced for regression learners with predict_type
containing "se". Shares the definitions with PipeOpRegrAvg, i.e. "predictive", "mean", "within", "between", "none".
Initialized to "predictive" (within-fold variance plus between-fold disagreement) when constructed with a Learner that has predict_type = "se";
otherwise to "none".
Only present for learners that support "se" predictions.
resampling.se_aggr_rho :: numeric(1)
Equicorrelation parameter for resampling.se_aggr = "mean", interpreted as in PipeOpRegrAvg. Ignored otherwise.
Defaults to 0 when resampling.se_aggr = "mean".
Only present for learners that support "se" predictions.
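The two resampling.prob_aggr options can be pictured with a small base-R sketch. The probability matrix below is made up, and the equal fold weights and the final renormalization are assumptions for illustration; the package's internal computation may differ in detail. The sketch only contrasts a linear opinion pool with a log opinion pool that clamps probabilities by a constant playing the role of resampling.prob_aggr_eps.

```r
# Hypothetical class probabilities from three fold models for one observation
probs <- rbind(
  fold1 = c(a = 0.7, b = 0.2, c = 0.1),
  fold2 = c(a = 0.6, b = 0.3, c = 0.1),
  fold3 = c(a = 0.8, b = 0.1, c = 0.1)
)

# "mean": linear opinion pool, the arithmetic mean of the probabilities per class
pool_mean <- colMeans(probs)

# "log": log opinion pool; clamp away from zero, average the logs,
# exponentiate and renormalize so the result sums to 1
eps <- 1e-12
log_avg <- colMeans(log(pmax(probs, eps)))
pool_log <- exp(log_avg) / sum(exp(log_avg))

pool_mean
pool_log
```

The log pool penalizes classes that any single fold model considers unlikely more strongly than the linear pool does, which is why clamping is needed when a model outputs a probability of exactly 0.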
The $state is currently not updated by prediction, so the $state$predict_log and $state$predict_time will always be NULL.
Fields inherited from PipeOp, as well as:
learner_model :: Learner
Learner that is being wrapped. This learner contains the model if the PipeOp is trained. Read-only.
Methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other Meta PipeOps:
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles
library("mlr3")
library("mlr3pipelines")
task = tsk("iris")
learner = lrn("classif.rpart")
lrncv_po = po("learner_cv", learner)
lrncv_po$learner$predict_type = "response"
nop = mlr_pipeops$get("nop")
graph = gunion(list(
  lrncv_po,
  nop
)) %>>% po("featureunion")
graph$train(task)
# switch to probability predictions and reuse the fold models at prediction time
graph$pipeops$classif.rpart$learner$predict_type = "prob"
graph$pipeops$classif.rpart$param_set$values$resampling.predict_method = "cv_ensemble"
graph$train(task)
Wraps an mlr3::Learner into a PipeOp.
Inherits the $param_set (and therefore $param_set$values) from the Learner it is constructed from.
Using PipeOpLearnerPICVPlus, it is possible to embed a mlr3::Learner into a Graph.
PipeOpLearnerPICVPlus can then be used to perform cross validation plus (or jackknife plus).
During training, PipeOpLearnerPICVPlus performs cross validation on the training data.
During prediction, the models from the training stage are used to construct predictive confidence intervals for the prediction data based on
out-of-fold residuals and out-of-fold predictions.
R6Class object inheriting from PipeOp.
PipeOpLearnerPICVPlus$new(learner, id = NULL, param_vals = list())
learner :: LearnerRegr
LearnerRegr to use for the cross validation models in the Cross Validation Plus method.
This argument is always cloned; to access the Learner inside PipeOpLearnerPICVPlus by-reference, use $learner.
id :: character(1)
Identifier of the resulting object, internally defaulting to the id of the Learner being wrapped.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction.
Default is list().
PipeOpLearnerPICVPlus has one input channel named "input", taking a Task specific to the Learner
type given to learner during construction; both during training and prediction.
PipeOpLearnerPICVPlus has one output channel named "output", producing NULL during training and a PredictionRegr
during prediction.
The output during prediction is a PredictionRegr with predict_type quantiles on the prediction input data.
The alpha and 1 - alpha quantiles are the quantiles of the prediction interval produced by the cross validation plus method.
The response is the median of the prediction of all cross validation models on the prediction data.
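The following base-R sketch illustrates the idea behind cross validation plus intervals as described in Barber et al. (2021). It is not the package's implementation: the use of lm(), the fold assignment, the helper name cvplus_interval and the rank-based quantile rule are assumptions made for illustration, and the quantile convention used by PipeOpLearnerPICVPlus for alpha may differ.

```r
set.seed(1)
dat <- mtcars[, c("mpg", "wt", "hp")]
n <- nrow(dat)
k <- 3        # number of folds
alpha <- 0.1  # miscoverage level used in this sketch
fold <- sample(rep_len(seq_len(k), n))

# one model per fold, each fitted on the data outside that fold
models <- lapply(seq_len(k), function(f) lm(mpg ~ wt + hp, data = dat[fold != f, ]))

# out-of-fold predictions and absolute out-of-fold residuals
oof_pred <- numeric(n)
for (f in seq_len(k)) {
  idx <- fold == f
  oof_pred[idx] <- predict(models[[f]], newdata = dat[idx, ])
}
resid <- abs(dat$mpg - oof_pred)

# hypothetical helper: CV+ interval for a single new observation
cvplus_interval <- function(newdata) {
  # prediction of every fold model for the new observation
  mu_fold <- vapply(models, function(m) unname(predict(m, newdata = newdata)), numeric(1))
  # pair each training observation with the model that did not see it
  mu <- mu_fold[fold]
  lower <- sort(mu - resid)[floor(alpha * (n + 1))]
  upper <- sort(mu + resid)[ceiling((1 - alpha) * (n + 1))]
  c(lower = lower, response = median(mu_fold), upper = upper)
}

interval <- cvplus_interval(data.frame(wt = 3, hp = 120))
interval
```

The interval combines out-of-fold residuals with the disagreement between fold models, so it widens both when the learner fits poorly and when the fold models differ strongly on the new observation.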
The $state is a named list with members:
cv_model_states :: list
List of the state of each cross validation model created by the Learner's $.train() function during resampling with method "cv".
residuals :: data.table
data.table with columns fold and residual. Lists the regression residuals for each observation and cross validation fold.
This state is given the class "pipeop_learner_cv_state".
The parameters of the Learner wrapped by this object, as well as:
folds :: numeric(1)
Number of cross validation folds. Initialized to 3.
alpha :: numeric(1)
Quantile to use for the cross validation plus prediction intervals. Initialized to 0.05.
The $state is updated during training.
Fields inherited from PipeOp, as well as:
learner_model :: Learner or list
If the PipeOpLearnerPICVPlus has been trained, this is a list containing the Learners of the cross validation models.
Otherwise, this contains the Learner that is being wrapped.
Read-only.
predict_type
Predict type of the PipeOpLearnerPICVPlus, which is always "response" and "quantiles".
This can be different to the predict type of the Learner that is being wrapped.
Methods inherited from PipeOp.
Barber RF, Candes EJ, Ramdas A, Tibshirani RJ (2021). “Predictive inference with the jackknife+.” Annals of Statistics, 49, 486–507. doi:10.1214/20-AOS1965.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Meta PipeOps:
mlr_pipeops_learner,
mlr_pipeops_learner_cv,
mlr_pipeops_learner_quantiles
library("mlr3")
library("mlr3pipelines")
task = tsk("mtcars")
learner = lrn("regr.rpart")
lrncvplus_po = mlr_pipeops$get("learner_pi_cvplus", learner)
lrncvplus_po$train(list(task))
lrncvplus_po$predict(list(task))
Wraps a LearnerRegr into a PipeOp to predict multiple quantiles.
PipeOpLearnerQuantiles only supports LearnerRegrs that have quantiles as a possible predict_type.
It produces quantile-based predictions for multiple quantiles in one PredictionRegr. This is especially helpful if the LearnerRegr can only predict one quantile (for example, LearnerRegrGBM in mlr3extralearners).
Inherits the $param_set (and therefore $param_set$values) from the Learner it is constructed from.
R6Class object inheriting from PipeOp.
PipeOpLearnerQuantiles$new(learner, id = NULL, param_vals = list())
learner :: Learner | character(1)
Learner to wrap, or a string identifying a Learner in the mlr3::mlr_learners Dictionary.
The Learner has to be a LearnerRegr with predict_type "quantiles".
This argument is always cloned; to access the Learner inside PipeOpLearnerQuantiles by-reference, use $learner.
id :: character(1)
Identifier of the resulting object, internally defaulting to the id of the Learner being wrapped.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
PipeOpLearnerQuantiles has one input channel named "input", taking a TaskRegr specific to the Learner
type given to learner during construction; both during training and prediction.
PipeOpLearnerQuantiles has one output channel named "output", producing NULL during training and a PredictionRegr object
during prediction.
The output during prediction is a PredictionRegr on the prediction input data that aggregates all results produced by the Learner,
trained on the training input data, for each quantile in q_vals.
The $state is set during training. It is a named list with the member:
model_states :: list
List of the states of all models created by the Learner's $.train() function.
The parameters are the parameters of the Learner wrapped by this object, as well as:
q_vals :: numeric
Quantiles to use for training and prediction.
Initialized to c(0.05, 0.5, 0.95).
q_response :: numeric(1)
Which quantile in q_vals to use as a response for the PredictionRegr during prediction.
Initialized to 0.5.
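A short sketch of how q_vals and q_response might be set at construction. It assumes that po() forwards these named arguments as hyperparameter values and that lrn("regr.debug") supports quantile prediction, as the package example for this PipeOp suggests; the task "mtcars" is chosen only for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task <- tsk("mtcars")

# request the 10%, 50% and 90% quantiles and report the median as response
po_q <- po("learner_quantiles", lrn("regr.debug"),
  q_vals = c(0.1, 0.5, 0.9), q_response = 0.5)

po_q$train(list(task))
pred <- po_q$predict(list(task))[[1]]
pred
```

Choosing q_response = 0.5 keeps the response of the resulting PredictionRegr aligned with the median, which is usually the natural point prediction when only quantiles are available.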
The $state is updated during training.
Fields inherited from PipeOp, as well as:
learner :: LearnerRegr
Learner that is being wrapped. Read-only.
learner_model :: Learner or list
If PipeOpLearnerQuantiles has been trained, this is a list containing the Learners for each quantile.
Otherwise, this contains the Learner that is being wrapped.
Read-only.
predict_type :: character(1)
Predict type of the PipeOpLearnerQuantiles, which is always "response" and "quantiles".
Methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Meta PipeOps:
mlr_pipeops_learner,
mlr_pipeops_learner_cv,
mlr_pipeops_learner_pi_cvplus
library("mlr3")
library("mlr3pipelines")
task = tsk("boston_housing")
learner = lrn("regr.debug")
po = mlr_pipeops$get("learner_quantiles", learner)
po$train(list(task))
po$predict(list(task))
Add missing indicator columns ("dummy columns") to the Task.
Drops original features; should probably be used in combination with PipeOpFeatureUnion and imputation PipeOps (see examples).
Note that affect_columns is initialized with selector_invert(selector_type(c("factor", "ordered", "character"))), since missing
values in factorial columns are often indicated by out-of-range imputation (PipeOpImputeOOR).
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpMissInd$new(id = "missind", param_vals = list())
id :: character(1)
Identifier of the resulting object, defaulting to "missind".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
$state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
indicand_cols :: character
Names of columns for which indicator columns are added. If the which parameter is "all", this is just the names of all features,
otherwise it is the names of all features that had missing values during training.
The parameters are the parameters inherited from the PipeOpTaskPreproc, as well as:
which :: character(1)
Determines for which features the indicator columns are added. Can either be "missing_train" (default), adding indicator columns
for each feature that actually has missing values, or "all", adding indicator columns for all features.
type :: character(1)
Determines the type of the newly created columns. Can be one of "factor" (default), "integer", "logical", "numeric".
This PipeOp should cover most cases where "dummy columns" or "missing indicators" are desired. Some edge cases:
If imputation for factorial features is performed and only numeric features should gain missing indicators, the affect_columns parameter
can be set to selector_type("numeric").
If missing indicators should only be added for features that have more than a fraction of x missing values, the
PipeOpRemoveConstants can be used with affect_columns = selector_grep("^missing_") and ratio = x.
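As a small illustration of the which and type parameters, the sketch below requests logical indicators for all selected features of the "pima" task. It assumes that po() forwards which and type as hyperparameter values; the variable names po_ind and ind_task are chosen only for this example.

```r
library("mlr3")
library("mlr3pipelines")

task <- tsk("pima")$select(c("insulin", "triceps"))

# indicators for every selected feature, stored as logical columns
po_ind <- po("missind", which = "all", type = "logical")
ind_task <- po_ind$train(list(task))[[1]]

# the original features are dropped; only indicator columns remain
ind_task$feature_names
po_ind$state$indicand_cols
```

With which = "missing_train" instead, only features that actually contain missing values during training would receive an indicator column.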
Fields inherited from PipeOp.
Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")
task = tsk("pima")$select(c("insulin", "triceps"))
sum(complete.cases(task$data()))
task$missings()
tail(task$data())
po = po("missind")
new_task = po$train(list(task))[[1]]
tail(new_task$data())
# proper imputation + missing indicators
impgraph = list(
  po("imputesample"),
  po("missind")
) %>>% po("featureunion")
tail(impgraph$train(task)[[1]]$data())
Transforms columns according to a given formula using the stats::model.matrix() function.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpModelMatrix$new(id = "modelmatrix", param_vals = list())
id :: character(1)
Identifier of resulting object, default "modelmatrix".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with transformed columns according to the used formula.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
formula :: formula
Formula to use. Higher order interactions can be created using constructs like ~ . ^ 2.
By default, an (Intercept) column of all 1s is created, which can be avoided by adding 0 + to the term.
See model.matrix().
Uses the model.matrix() function.
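The effect of the two formula variants mentioned above can be seen directly with stats::model.matrix() in base R. The three-column subset of iris is an arbitrary choice that keeps the output small.

```r
# small numeric data set so the resulting columns are easy to follow
dat <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]

# main effects plus all pairwise interactions, with the default intercept
mm_with <- stats::model.matrix(~ . ^ 2, data = dat)

# the same terms without the (Intercept) column
mm_without <- stats::model.matrix(~ 0 + . ^ 2, data = dat)

colnames(mm_with)
colnames(mm_without)
```

For purely numeric columns, dropping the intercept simply removes one column; with factor columns, model.matrix() would additionally change how the first factor is encoded.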
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")
task = tsk("iris")
pop = po("modelmatrix", formula = ~ . ^ 2)
task$data()
pop$train(list(task))[[1]]$data()
# same terms, but without the (Intercept) column
pop$param_set$values$formula = ~ 0 + . ^ 2
pop$train(list(task))[[1]]$data()
Explicate a Multiplicity by turning the input Multiplicity into multiple outputs.
This PipeOp has multiple output channels; the members of the input Multiplicity
are forwarded each along a single edge. Therefore, only multiplicities with exactly as many
members as outnum are accepted.
Note that Multiplicity is currently an experimental feature and the implementation or UI
may change.
R6Class object inheriting from PipeOp.
PipeOpMultiplicityExply$new(outnum, id = "multiplicityexply", param_vals = list())
outnum :: numeric(1) | character
Determines the number of output channels.
id :: character(1)
Identifier of the resulting object, default "multiplicityexply".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise
be set during construction. Default list().
PipeOpMultiplicityExply has a single input channel named "input", collecting a
Multiplicity of type any ("[*]") both during training and prediction.
PipeOpMultiplicityExply has multiple output channels depending on the outnum construction
argument, named "output1", "output2", ..., returning the elements of the unclassed input
Multiplicity.
The $state is left empty (list()).
PipeOpMultiplicityExply has no Parameters.
outnum should match the number of elements of the unclassed input Multiplicity.
Only fields inherited from PipeOp.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Other Experimental Features:
Multiplicity(),
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_replicate
library("mlr3")
library("mlr3pipelines")
task1 = tsk("iris")
task2 = tsk("mtcars")
po = po("multiplicityexply", outnum = 2)
po$train(list(Multiplicity(task1, task2)))
po$predict(list(Multiplicity(task1, task2)))
Implicate a Multiplicity by returning the input(s) converted to a Multiplicity.
This PipeOp has multiple input channels; all inputs are collected into a Multiplicity
and then are forwarded along a single edge, causing the following PipeOps to be called
multiple times, once for each Multiplicity member.
Note that Multiplicity is currently an experimental feature and the implementation or UI
may change.
R6Class object inheriting from PipeOp.
PipeOpMultiplicityImply$new(innum = 0, id = "multiplicityimply", param_vals = list())
innum :: numeric(1) | character
Determines the number of input channels.
If innum is 0 (default), a vararg input channel is created that can take an arbitrary number
of inputs. If innum is a character vector, the number of input channels is the length of
innum.
id :: character(1)
Identifier of the resulting object, default "multiplicityimply".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise
be set during construction. Default list().
PipeOpMultiplicityImply has multiple input channels depending on the innum construction
argument, named "input1", "input2", ... if innum is nonzero; if innum is 0, there is
only one vararg input channel named "...". All input channels take any input ("*") both
during training and prediction.
PipeOpMultiplicityImply has one output channel named "output", emitting a Multiplicity
of type any ("[*]"), i.e., returning the input(s) converted to a Multiplicity both during
training and prediction.
The $state is left empty (list()).
PipeOpMultiplicityImply has no Parameters.
If innum is not numeric, e.g., a character, the output Multiplicity will be named based
on the input channel names.
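A minimal sketch of the character variant of innum. It assumes that the input channels take their names from the entries of the character vector, so that the resulting Multiplicity carries those names; the channel names "first" and "second" are made up for this example.

```r
library("mlr3")
library("mlr3pipelines")

# character innum: one input channel per entry of the vector
po_imply <- po("multiplicityimply", innum = c("first", "second"))
out <- po_imply$train(list(tsk("iris"), tsk("mtcars")))[[1]]

# inspect the names carried by the resulting Multiplicity
names(out)
```

Named members can make it easier to tell downstream results apart when the Multiplicity is later explicated or inspected.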
Only fields inherited from PipeOp.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Other Experimental Features:
Multiplicity(),
mlr_pipeops_multiplicityexply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_replicate
library("mlr3")
library("mlr3pipelines")
task1 = tsk("iris")
task2 = tsk("mtcars")
po = po("multiplicityimply")
po$train(list(task1, task2))
po$predict(list(task1, task2))
Adds features according to expressions given as formulas that may depend on values of other features. This can add new features, or can change existing features.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpMutate$new(id = "mutate", param_vals = list())
id :: character(1)
Identifier of resulting object, default "mutate".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with added and/or mutated features according to the mutation parameter.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
mutation :: named list of formula
Expressions for new features to create (or present features to change), in the form of formula.
Each element of the list is a formula with the name of the element naming the feature to create or
change, and the formula expression determining the result. This expression may reference
other features, as well as variables visible at the creation of the formula (see examples).
Initialized to list().
delete_originals :: logical(1)
Whether to delete original features. Even when this is FALSE,
present features may still be overwritten. Initialized to FALSE.
A formula created using the ~ operator always contains a reference to the environment in which
the formula is created. This makes it possible for the ~-expressions to reference both
column names of the Task and variables from that environment.
Note that the formulas in mutation are evaluated sequentially. This allows for using
variables that were constructed during evaluation of a previous formula. However, if existing
features are changed, precedence is given to the original ones before the newly constructed ones.
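As a minimal sketch of the environment capture and the delete_originals setting described above, the following builds the formula inside a helper function. The helper make_mutation() and the feature name Sepal.Length_shifted are hypothetical names chosen for illustration; they are not part of the package.

```r
library("mlr3")
library("mlr3pipelines")

# Hypothetical helper: `offset` only exists inside this function's environment,
# but the formula created here keeps a reference to that environment.
make_mutation = function(offset) {
  list(Sepal.Length_shifted = ~ Sepal.Length + offset)
}

pom = po("mutate", mutation = make_mutation(10), delete_originals = TRUE)
out = pom$train(list(tsk("iris")))[[1]]

# With delete_originals = TRUE only the newly created feature should remain
out$feature_names
head(out$data())
```

Because the formula carries its own environment, the mutation can be defined far away from the place where the PipeOp is eventually trained.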
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

constant = 1
pom = po("mutate")
pom$param_set$values$mutation = list(
  Sepal.Length_plus_constant = ~ Sepal.Length + constant,
  Sepal.Area = ~ Sepal.Width * Sepal.Length,
  Petal.Area = ~ Petal.Width * Petal.Length,
  Sepal.Area_plus_Petal.Area = ~ Sepal.Area + Petal.Area
)

pom$train(list(tsk("iris")))[[1]]$data()
Generates a more balanced data set by down-sampling the instances of non-minority classes using the NEARMISS algorithm.
The algorithm down-samples by selecting instances from the non-minority classes that have the smallest mean distance
to their k nearest neighbors of different classes.
For this only numeric and integer features are taken into account. These must have no missing values.
This can only be applied to classification tasks. Multiclass classification is supported.
See themis::nearmiss for details.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpNearmiss$new(id = "nearmiss", param_vals = list())
id :: character(1)
Identifier of resulting object, default "nearmiss".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with the rows removed from the non-minority classes.
The output during prediction is the unchanged input.
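The difference between the training and prediction outputs can be checked directly. This is a small sketch using the wine task with default parameters; it assumes the 'themis' package is installed.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("wine")
pop = po("nearmiss")  # requires the 'themis' package to be installed

# Training removes rows from the non-minority classes
train_out = pop$train(list(task))[[1]]
# Prediction passes the task through unchanged
predict_out = pop$predict(list(task))[[1]]

c(input = task$nrow, train = train_out$nrow, predict = predict_out$nrow)
table(train_out$truth())
```

Down-sampling only during training means that resampling performance estimates are still computed on data with the original class distribution.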
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
k :: integer(1)
Number of nearest neighbors used for calculating the mean distances. Default is 5.
under_ratio :: numeric(1)
Ratio of the minority-to-majority frequencies. This specifies the level to which the number of instances
in the non-minority classes is down-sampled, relative to the number of instances of the minority class.
Default is 1. For details, see themis::nearmiss.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
Zhang, J., Mani, I. (2003). “KNN Approach to Unbalanced Data Distributions: A Case Study Involving Information Extraction.” In Proceedings of Workshop on Learning from Imbalanced Datasets (ICML).
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

# Create example task
task = tsk("wine")
task$head()
table(task$data(cols = "type"))

# Down-sample and balance data
pop = po("nearmiss")
nearmiss_result = pop$train(list(task))[[1]]$data()
nrow(nearmiss_result)
table(nearmiss_result$type)
Extracts non-negative components from data by performing non-negative matrix factorization. Only
affects non-negative numerical features. See nmf() for details.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpNMF$new(id = "nmf", param_vals = list())
id :: character(1)
Identifier of resulting object, default "nmf".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise
be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their
non-negative components.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as the elements of the object returned by nmf().
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
rank :: integer(1)
Factorization rank, i.e., number of components. Initialized to 2.
See nmf().
method :: character(1)
Specification of the NMF algorithm. Initialized to "brunet".
See nmf().
seed :: character(1) | integer(1) | list() | object of class NMF | function()
Specification of the starting point.
See nmf().
nrun :: integer(1)
Number of runs to perform. Default is 1.
More than a single run allows for the computation of a consensus matrix which will also be stored in the $state.
See nmf().
debug :: logical(1)
Whether to toggle debug mode. Default is FALSE.
See nmf().
keep.all :: logical(1)
Whether all factorizations are to be saved and returned. Default is FALSE.
Only has an effect if nrun > 1.
See nmf().
parallel :: character(1) | integer(1) | logical(1)
Specification of parallel handling if nrun > 1.
Initialized to FALSE, as it is recommended to use mlr3's future-based parallelization.
See nmf().
parallel.required :: character(1) | integer(1) | logical(1)
Same as parallel, but an error is thrown if the computation cannot be performed in parallel or
with the specified number of processors.
Initialized to FALSE, as it is recommended to use mlr3's future-based parallelization.
See nmf().
shared.memory :: logical(1)
Whether shared memory should be enabled.
See nmf().
simplifyCB :: logical(1)
Whether callback results should be simplified. Default is TRUE.
See nmf().
track :: logical(1)
Whether error tracking should be enabled. Default is FALSE.
See nmf().
verbose :: integer(1) | logical(1)
Specification of verbosity. Default is FALSE.
See nmf().
pbackend :: character(1) | integer(1) | NULL
Specification of the parallel backend.
It is recommended to use mlr3's future-based parallelization.
See nmf().
callback :: function()
Callback function that is called after each run (if nrun > 1).
See nmf().
Uses the nmf() function as well as basis(), coef() and
ginv().
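The effect of the rank parameter described above can be sketched as follows. The example assumes the 'NMF' package is installed and uses the iris task, whose four features are all non-negative numerics, so all of them should be affected.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("nmf", rank = 3)  # requires the 'NMF' package to be installed

out = pop$train(list(task))[[1]]
out$feature_names  # component features that replace the original features
head(out$data())

# New data is mapped onto the components learned during training
head(pop$predict(list(task))[[1]]$data())
```

Since the factorization starts from a random initialization unless seed is set, the component values may differ between runs.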
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("nmf")

task$data()
pop$train(list(task))[[1]]$data()

pop$state
Simply pushes the input forward.
Can be useful during Graph construction using the %>>%-operator to specify which PipeOp gets connected to which.
R6Class object inheriting from PipeOp.
PipeOpNOP$new(id = "nop", param_vals = list())
id :: character(1)
Identifier of resulting object, default "nop".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
PipeOpNOP has one input channel named "input", taking any input ("*") both during training and prediction.
PipeOpNOP has one output channel named "output", producing the object given as input ("*") without changes.
The $state is left empty (list()).
PipeOpNOP has no parameters.
PipeOpNOP is a useful "default" stand-in for a PipeOp/Graph that does nothing.
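One way to use PipeOpNOP as a "do nothing" stand-in is as one of several alternative paths. The following sketch combines it with the branch and unbranch PipeOps, so that the "no preprocessing" option is an explicit choice next to PCA; the option names "nop" and "pca" are chosen here for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Two alternative paths: do nothing ("nop") or apply PCA ("pca")
gr = po("branch", options = c("nop", "pca")) %>>%
  gunion(list(po("nop"), po("pca"))) %>>%
  po("unbranch")

# Select the "nop" path: features pass through unchanged
gr$param_set$values$branch.selection = "nop"
out_nop = gr$train(task)[[1]]
out_nop$feature_names

# Select the "pca" path: features are replaced by principal components
gr$param_set$values$branch.selection = "pca"
out_pca = gr$train(task)[[1]]
out_pca$feature_names
```

Because branch.selection is an ordinary hyperparameter, the choice between the two paths could also be left to a tuner.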
Only fields inherited from PipeOp.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Placeholder Pipeops:
mlr_pipeops_copy
library("mlr3")
library("mlr3pipelines")

nop = po("nop")

nop$train(list(1))

# use `gunion` and `%>>%` to create a "bypass"
# next to "pca"
gr = gunion(list(
  po("pca"),
  nop
)) %>>% po("featureunion")

gr$train(tsk("iris"))[[1]]$data()
Splits a classification Task into several binary classification Tasks to perform "One vs. Rest" classification. This works in combination
with PipeOpOVRUnite.
For each target level a new binary classification Task is constructed with
the respective target level being the positive class and all other target levels being the
new negative class "rest".
This PipeOp creates a Multiplicity, which means that subsequent PipeOps are executed
multiple times, once for each created binary Task, until a PipeOpOVRUnite
is reached.
Note that Multiplicity is currently an experimental feature and the implementation or UI
may change.
R6Class inheriting from PipeOp.
PipeOpOVRSplit$new(id = "ovrsplit", param_vals = list())
id :: character(1)
Identifier of the resulting object, default "ovrsplit".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
PipeOpOVRSplit has one input channel named "input" taking a TaskClassif
both during training and prediction.
PipeOpOVRSplit has one output channel named "output" returning a Multiplicity of
TaskClassifs both during training and prediction, i.e., the newly
constructed binary classification Tasks.
The $state contains the original target levels of the TaskClassif supplied
during training.
PipeOpOVRSplit has no parameters.
The original target levels stored in the $state are also used during prediction when creating the new
binary classification Tasks.
The names of the elements of the output Multiplicity are given by the levels of the target.
If a target level "rest" is present in the input TaskClassif, the
negative class will be labeled as "rest." (using as many "." postfixes as needed to yield a
valid label).
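The structure of the output described above can be inspected directly. This is a short sketch on the iris task; the variable names po_split and binary_tasks are chosen here for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
po_split = po("ovrsplit")
binary_tasks = po_split$train(list(task))[[1]]  # a Multiplicity of TaskClassif

names(binary_tasks)              # one element per original target level
binary_tasks$setosa$class_names  # the level itself plus the negative class "rest"
table(binary_tasks$setosa$truth())

po_split$state                   # original target levels, reused during prediction
```

Each element is an ordinary binary classification Task, so any classification Learner can be trained on it.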
Should be used in combination with PipeOpOVRUnite.
Only fields inherited from PipeOp.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Other Experimental Features:
Multiplicity(),
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrunite,
mlr_pipeops_replicate
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
po = po("ovrsplit")
po$train(list(task))
po$predict(list(task))
Perform "One vs. Rest" classification by (weighted) majority vote prediction from classification Predictions. This works in combination with PipeOpOVRSplit.
Weights can be set as a parameter; if none are provided, defaults to equal weights for each prediction.
Always returns a "prob" prediction, regardless of the incoming Learner's
$predict_type. The label of the class with the highest predicted probability is selected as the
"response" prediction.
Missing values during prediction are treated as each class label being equally likely.
This PipeOp uses a Multiplicity input, which is created by PipeOpOVRSplit and causes
PipeOps on the way to this PipeOp to be called once for each individual binary Task.
Note that Multiplicity is currently an experimental feature and the implementation or UI
may change.
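The combined prediction can be inspected as in the following sketch, which uses classif.rpart with predict_type "prob" as the binary learner on the iris task.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
gr = po("ovrsplit") %>>%
  lrn("classif.rpart", predict_type = "prob") %>>%
  po("ovrunite")

gr$train(task)
pred = gr$predict(task)[[1]]

pred$predict_types  # includes "prob"
head(pred$prob)     # one column per original class
head(pred$response) # label of the class with the highest combined probability
```

The binary learner is trained once per target level, so training cost grows with the number of classes.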
R6Class inheriting from PipeOpEnsemble/PipeOp.
PipeOpOVRUnite$new(id = "ovrunite", param_vals = list())
id :: character(1)
Identifier of the resulting object, default "ovrunite".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpEnsemble. Instead of a
Prediction, a PredictionClassif is used as
input and output during prediction and PipeOpEnsemble's collect parameter is initialized
with TRUE to allow for collecting a Multiplicity input.
The $state is left empty (list()).
The parameters are the parameters inherited from PipeOpEnsemble.
Inherits from PipeOpEnsemble by implementing the private$.predict() method.
Should be used in combination with PipeOpOVRSplit.
Only fields inherited from PipeOpEnsemble/PipeOp.
Only methods inherited from PipeOpEnsemble/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Ensembles:
PipeOpEnsemble,
mlr_learners_avg,
mlr_pipeops_classifavg,
mlr_pipeops_regravg
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Other Experimental Features:
Multiplicity(),
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_replicate
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
gr = po("ovrsplit") %>>% lrn("classif.rpart") %>>% po("ovrunite")
gr$train(task)
gr$predict(task)
gr$pipeops$classif.rpart$learner$predict_type = "prob"
gr$predict(task)
Extracts principal components from data. Only affects numerical features.
See stats::prcomp() for details.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpPCA$new(id = "pca", param_vals = list())
id :: character(1)
Identifier of resulting object, default "pca".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their principal components.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as the elements of the class stats::prcomp,
with the exception of the $x slot. These are in particular:
sdev :: numeric
The standard deviations of the principal components.
rotation :: matrix
The matrix of variable loadings.
center :: numeric | logical(1)
The centering used, or FALSE.
scale :: numeric | logical(1)
The scaling used, or FALSE.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
center :: logical(1)
Indicating whether the features should be centered. Default is TRUE. See prcomp().
scale. :: logical(1)
Whether to scale features to unit variance before analysis. Default is FALSE, but scaling is advisable. See prcomp().
rank. :: integer(1)
Maximal number of principal components to be used. Default is NULL: use all components. See prcomp().
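A short sketch of how these parameters interact with the $state: here the features are scaled and only the first two components are kept. The task and the chosen values are for illustration only.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
# Scale features to unit variance and keep only the first two components
pop = po("pca", scale. = TRUE, rank. = 2)

out = pop$train(list(task))[[1]]
out$feature_names        # the retained principal components
dim(pop$state$rotation)  # loadings: one row per original feature, one column per component
pop$state$sdev           # standard deviations of the principal components

# Prediction applies the rotation stored in $state to new data
head(pop$predict(list(task))[[1]]$data())
```

Setting rank. is a simple way to use PCA for dimensionality reduction; the value could also be treated as a tunable hyperparameter.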
Uses the prcomp() function.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("pca")

task$data()
pop$train(list(task))[[1]]$data()

pop$state
Wraps another PipeOp or Graph as determined by the content hyperparameter.
Input is routed through the content and the contents' output is returned.
The content hyperparameter can be changed during tuning; this is useful as an alternative to PipeOpBranch.
Abstract R6Class inheriting from PipeOp.
PipeOpProxy$new(innum = 0, outnum = 1, id = "proxy", param_vals = list())
innum :: numeric(1)
Determines the number of input channels. If innum is 0 (default), a vararg input channel is created that can take an arbitrary number of inputs.
outnum :: numeric(1)
Determines the number of output channels.
id :: character(1)
Identifier of resulting object. See $id slot of PipeOp.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise
be set during construction. Default list().
PipeOpProxy has multiple input channels depending on the innum construction argument, named
"input1", "input2", ... if innum is nonzero; if innum is 0, there is only one vararg
input channel named "...".
PipeOpProxy has multiple output channels depending on the outnum construction argument,
named "output1", "output2", ...
The output is determined by the output of the content operation (a PipeOp or Graph).
The $state is the trained content PipeOp or Graph.
content :: PipeOp | Graph
The PipeOp or Graph that is being proxied (or an object that is
converted to a Graph by as_graph()). Defaults to an instance of
PipeOpFeatureUnion (combines all input if they are Tasks).
The content will internally be coerced to a graph via
as_graph() prior to train and predict.
The default value for content is PipeOpFeatureUnion.
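Swapping the proxied operation can be sketched with a single stand-alone proxy. The example below assumes the default channel setup (one vararg input, one output) and uses the scale and pca PipeOps as interchangeable contents.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# One proxy, initially wrapping a scaling step
p = po("proxy", content = po("scale"))
out_scale = p$train(list(task))[[1]]
out_scale$feature_names

# Swap the wrapped operation without rebuilding anything around the proxy
p$param_set$values$content = po("pca")
out_pca = p$train(list(task))[[1]]
out_pca$feature_names
```

Since content is a regular hyperparameter, the same swap can be performed on a proxy that is embedded in a larger Graph.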
Fields inherited from PipeOp.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

set.seed(1234)
task = tsk("iris")

# use a proxy for preprocessing and a proxy for learning, i.e.,
# no preprocessing and classif.rpart
g = po("proxy", id = "preproc", param_vals = list(content = po("nop"))) %>>%
  po("proxy", id = "learner", param_vals = list(content = lrn("classif.rpart")))
rr_rpart = resample(task, learner = GraphLearner$new(g), resampling = rsmp("cv", folds = 3))
rr_rpart$aggregate(msr("classif.ce"))

# use pca for preprocessing and classif.rpart as the learner
g$param_set$values$preproc.content = po("pca")
g$param_set$values$learner.content = lrn("classif.rpart")
rr_pca_rpart = resample(task, learner = GraphLearner$new(g), resampling = rsmp("cv", folds = 3))
rr_pca_rpart$aggregate(msr("classif.ce"))
Splits numeric features into quantile bins.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpQuantileBin$new(id = "quantilebin", param_vals = list())
id :: character(1)
Identifier of resulting object, default "quantilebin".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their binned versions.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
bins :: list
List of intervals representing the bins for each numeric feature.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
numsplits :: integer(1)
Number of bins to create. Default is 2.
Uses the stats::quantile function.
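The binning step can be illustrated in base R. The following is a minimal sketch, not the package's implementation: it assumes that numsplits bins are delimited by the numsplits - 1 interior quantiles and that the two outer bins are open-ended. The helper name quantile_bin is hypothetical.

```r
# Hypothetical helper: bin a numeric vector into `numsplits` quantile bins.
quantile_bin <- function(x, numsplits = 2) {
  # interior cut points at the 1/k, 2/k, ..., (k-1)/k quantiles
  probs <- seq_len(numsplits - 1) / numsplits
  # open-ended outer bins; unique() guards against tied quantiles
  breaks <- unique(c(-Inf, stats::quantile(x, probs, na.rm = TRUE), Inf))
  cut(x, breaks = breaks)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
binned <- quantile_bin(x, numsplits = 2)
table(binned)
```

With numsplits = 2 the single cut point is the median, so each bin receives half of the observations.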
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("quantilebin")

task$data()
pop$train(list(task))[[1]]$data()

pop$state
Projects numeric features onto a randomly sampled subspace. All numeric features
(or the ones selected by affect_columns) are replaced by numeric features
PR1, PR2, ..., PRn.
Samples with features that contain missing values result in all PR1..PRn being
NA for that sample, so it is advised to do imputation before random projections
if missing values can be expected.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpRandomProjection$new(id = "randomprojection", param_vals = list())
id :: character(1)
Identifier of resulting object, default "randomprojection".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that
would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with affected numeric features
projected onto a random subspace.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as an element $projection, a matrix.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
rank :: integer(1)
The dimension of the subspace to project onto. Initialized to 1.
If there are n (affected) numeric features in the input Task,
then $state$projection is an n x rank matrix. The output is calculated as
input %*% state$projection.
The random projection matrix is obtained through Gram-Schmidt orthogonalization from a matrix with values standard normally distributed, which gives a distribution that is rotation invariant, as per Eaton: Multivariate Statistics, A Vector Space Approach, Pg. 234.
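The construction can be sketched in base R. This is an illustration under stated assumptions, not the package's code: it uses the QR decomposition (qr() and qr.Q()) as a stand-in for the orthogonalization step, and it builds the projection with one row per feature and rank columns so that the product input %*% projection is conformable.

```r
set.seed(1)

X <- as.matrix(iris[, 1:4])  # four numeric input features
rank <- 2                    # dimension of the target subspace

# matrix of independent standard-normal entries, one row per feature
G <- matrix(rnorm(ncol(X) * rank), nrow = ncol(X), ncol = rank)

# orthonormalize the columns (QR used here in place of Gram-Schmidt)
projection <- qr.Q(qr(G))
colnames(projection) <- paste0("PR", seq_len(rank))

# project every observation onto the random subspace
projected <- X %*% projection
head(projected)
```

Because the columns of the projection matrix are orthonormal, crossprod(projection) is the identity matrix up to numerical error.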
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("randomprojection", rank = 2)

task$data()
pop$train(list(task))[[1]]$data()

pop$state
Takes in a Prediction of predict_type "prob"
(for PredictionClassif) or "se"
(for PredictionRegr) and generates a randomized "response"
prediction.
For "prob", the responses are sampled according to
the probabilities of the input PredictionClassif. For "se",
responses are randomly drawn according to the rdistfun parameter (default is rnorm) by using
the original responses of the input PredictionRegr as the mean and the
original standard errors of the input PredictionRegr as the standard
deviation (sampling is done observation-wise).
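The two sampling schemes described above can be illustrated in base R. This is a minimal sketch on made-up numbers rather than on real Prediction objects; the variable names (prob, mean_pred, se_pred) are hypothetical and the values are chosen only for illustration.

```r
set.seed(42)

# "prob": one row of class probabilities per observation
prob <- matrix(
  c(0.9, 0.1, 0.0,
    0.2, 0.5, 0.3,
    0.0, 0.0, 1.0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("setosa", "versicolor", "virginica"))
)
# draw one class label per row according to that row's probabilities
response_classif <- apply(prob, 1, function(p) {
  sample(colnames(prob), size = 1, prob = p)
})

# "se": responses act as means, standard errors as standard deviations
mean_pred <- c(20.1, 15.3, 30.8)
se_pred <- c(1.2, 0.5, 2.0)
rdistfun <- stats::rnorm  # must vectorize over mean and sd
response_regr <- rdistfun(n = length(mean_pred), mean = mean_pred, sd = se_pred)

response_classif
response_regr
```

A class with probability 0 can never be drawn, and a class with probability 1 is always drawn, which is why the third observation is always labelled "virginica".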
R6Class object inheriting from PipeOp.
PipeOpRandomResponse$new(id = "randomresponse", param_vals = list(), packages = character(0))
id :: character(1)
Identifier of the resulting object, default "randomresponse".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise
be set during construction. Default list().
packages :: character
Set of all required packages for the private$.predict() methods related to the rdistfun
parameter. Default is character(0).
PipeOpRandomResponse has one input channel named "input", taking NULL during training and
a Prediction during prediction.
PipeOpRandomResponse has one output channel named "output", producing NULL during
training and a Prediction with random responses during prediction.
The $state is left empty (list()).
rdistfun :: function
A function for generating random responses when the predict type is "se". This function must
accept the arguments n (integerish number of responses), mean (numeric for the mean),
and sd (numeric for the standard deviation), and must vectorize over mean
and sd. Default is rnorm.
If the predict_type of the input Prediction does not match "prob" or
"se", the input Prediction will be returned unaltered.
Only fields inherited from PipeOp.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

# classification: responses are sampled from the predicted probabilities
task1 = tsk("iris")
g1 = LearnerClassifRpart$new() %>>% PipeOpRandomResponse$new()
g1$train(task1)
g1$pipeops$classif.rpart$learner$predict_type = "prob"
set.seed(2409)
g1$predict(task1)

# regression: responses are drawn around the predicted mean using the predicted se
task2 = tsk("mtcars")
g2 = LearnerRegrLM$new() %>>% PipeOpRandomResponse$new()
g2$train(task2)
g2$pipeops$regr.lm$learner$predict_type = "se"
set.seed(2906)
g2$predict(task2)
Perform (weighted) prediction averaging from regression Predictions by connecting
PipeOpRegrAvg to multiple PipeOpLearner outputs.
The resulting "response" prediction is a weighted average of the incoming "response" predictions.
Aggregation of "se" predictions is controlled by the se_aggr parameter (see below). When "se" is not requested
or se_aggr = "none", "se" is dropped.
R6Class inheriting from PipeOpEnsemble/PipeOp.
"se" Aggregation
Let there be K incoming predictions with weights w (sum to 1). For a given row j, denote
per-model means mu_i[j] and, if available, per-model standard errors se_i[j].
Define
mu_bar[j]      = sum_i w[i] * mu_i[j]
var_between[j] = sum_i w[i] * (mu_i[j] - mu_bar[j])^2  # weighted var of means
var_within[j]  = sum_i w[i] * se_i[j]^2                # weighted mean of SE^2s
The following aggregation methods are available:
se_aggr = "predictive" – Within + Between (mixture/predictive SD)
se[j] = sqrt(var_within[j] + var_between[j])
Interpretation. Treats each incoming se_i as that model's predictive SD at the point (or, if the learner
reports the SE of the conditional mean, as many mlr3 regression learners do, as that mean-SE). The returned se
is the SD of the mixture ensemble under weighted averaging: it increases when base models disagree (epistemic spread)
and when individual models are uncertain (aleatoric spread).
Notes. If se_i represents mean SE (common in predict.lm(se.fit=TRUE)-style learners), the result
aggregates those mean-SEs and still adds model disagreement correctly, but it will underestimate a true predictive SD
that would additionally include irreducible noise. Requires "se" to be present from all inputs.
se_aggr = "mean" – SE of the weighted average of means under equicorrelation
With a correlation parameter se_aggr_rho = rho, assume
Cov(mu_i_hat, mu_j_hat) = rho * se_i * se_j for all i != j. Then
# components:
a[j]        = sum_i (w[i]^2 * se_i[j]^2)
b[j]        = (sum_i w[i] * se_i[j])^2
var_mean[j] = (1 - rho) * a[j] + rho * b[j]
se[j]       = sqrt(var_mean[j])
Interpretation. Returns the standard error of the averaged estimator sum_i w[i] * mu_i, not a predictive SD.
Use when you specifically care about uncertainty of the averaged mean itself.
Notes. rho is clamped to the PSD range [-1/(K-1), 1] for K > 1. Typical settings:
rho = 0 (assume independence; often optimistic for CV/bagging) and rho = 1 (perfect correlation; conservative and
equal to the weighted arithmetic mean of SEs). Requires "se" from all inputs.
se_aggr = "within" – Within-model component only
se[j] = sqrt(var_within[j])
Interpretation. Aggregates only the average per-model uncertainty and ignores disagreement between models.
Useful as a diagnostic of the aleatoric component; not a full ensemble uncertainty.
Notes. Typically underestimates the uncertainty of the ensemble prediction when models disagree.
Requires "se" from all inputs.
se_aggr = "between" – Between-model component only (works without "se")
se[j] = sqrt(var_between[j])
Interpretation. Captures only the spread of the base means (epistemic/model disagreement).
Notes. This is the only method that does not use incoming "se". It is a lower bound on a full predictive SD,
because it omits within-model noise.
se_aggr = "none" – Do not return "se"
"se" is dropped from the output prediction.
Relationships and edge cases. For any row, se("predictive") >= max(se("within"), se("between")).
With a single input (K = 1), "predictive" and "within" return the input "se", "between" returns 0.
Methods "predictive", "mean", and "within" require all inputs to provide "se"; otherwise aggregation errors.
Weights can be set as a parameter; if none are provided, defaults to equal weights for each prediction.
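The aggregation formulas can be checked numerically in base R. The sketch below is a direct transcription of the definitions in this section, not the package's code, and the inputs (three models, two rows) are made up for illustration.

```r
# K = 3 models (rows), 2 observations (columns); weights sum to 1
w  <- c(0.5, 0.3, 0.2)
mu <- rbind(c(10, 20), c(12, 20), c(14, 23))  # per-model means
se <- rbind(c(1, 2), c(2, 2), c(1, 1))        # per-model standard errors

# w is recycled down the columns, so w[i] multiplies row i
mu_bar      <- colSums(w * mu)
var_between <- colSums(w * sweep(mu, 2, mu_bar)^2)  # weighted var of means
var_within  <- colSums(w * se^2)                    # weighted mean of SE^2s

se_predictive <- sqrt(var_within + var_between)
se_within     <- sqrt(var_within)
se_between    <- sqrt(var_between)

# SE of the weighted average of means under equicorrelation rho
se_mean <- function(rho) {
  a <- colSums(w^2 * se^2)
  b <- colSums(w * se)^2
  sqrt((1 - rho) * a + rho * b)
}

rbind(se_predictive, se_within, se_between,
      se_mean_rho0 = se_mean(0), se_mean_rho1 = se_mean(1))
```

The output illustrates the stated relationships: the "predictive" value is never smaller than the "within" or "between" value, and with rho = 1 the "mean" value equals the weighted arithmetic mean of the incoming standard errors.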
PipeOpRegrAvg$new(innum = 0, collect_multiplicity = FALSE, id = "regravg", param_vals = list())
innum :: numeric(1)
Determines the number of input channels.
If innum is 0 (default), a vararg input channel is created that can take an arbitrary number of inputs.
collect_multiplicity :: logical(1)
If TRUE, the input is a Multiplicity collecting channel. This means, a
Multiplicity input, instead of multiple normal inputs, is accepted and the members are aggregated. This requires innum to be 0.
Default is FALSE.
id :: character(1)
Identifier of the resulting object, default "regravg".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpEnsemble. Instead of a Prediction, a PredictionRegr
is used as input and output during prediction.
The $state is left empty (list()).
The parameters are the parameters inherited from the PipeOpEnsemble, as well as:
se_aggr :: character(1)
Controls how incoming "se" values are aggregated into an ensemble "se". One of
"predictive", "mean", "within", "between", "none". See the description above for definitions and interpretation.
se_aggr_rho :: numeric(1)
Equicorrelation parameter used only for se_aggr = "mean". Interpreted as the common correlation between
per-model mean estimators. Recommended range [0, 1]; values are clamped to [-1/(K-1), 1] for validity.
Inherits from PipeOpEnsemble by implementing the private$weighted_avg_predictions() method.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpEnsemble/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_replicate
Other Ensembles:
PipeOpEnsemble,
mlr_learners_avg,
mlr_pipeops_classifavg,
mlr_pipeops_ovrunite
library("mlr3")
library("mlr3pipelines")

# Simple Bagging for Regression
gr = ppl("greplicate",
  po("subsample") %>>%
  po("learner", lrn("regr.rpart")),
  n = 5
) %>>%
  po("regravg")

resample(tsk("mtcars"), GraphLearner$new(gr), rsmp("holdout"))
Removes constant features from a mlr3::Task. For each feature, calculates the ratio of values that differ from the feature's mode value. All features with a ratio below a settable threshold are removed from the task. Missing values can be ignored or treated as a regular value distinct from non-missing values.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpRemoveConstants$new(id = "removeconstants", param_vals = list())
id :: character(1)
Identifier of the resulting object, defaulting to "removeconstants".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
$state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
features :: character()
Names of features that are being kept. Features of types that this PipeOp cannot operate on are always kept.
The parameters are the parameters inherited from the PipeOpTaskPreproc, as well as:
ratio :: numeric(1)
Ratio of values which must be different from the mode value in order to keep a feature in the task.
Initialized to 0, which means only constant features with exactly one observed level are removed.
rel_tol :: numeric(1)
Relative tolerance within which to consider a numeric feature constant. Set to 0 to disregard relative tolerance. Initialized to 1e-8.
abs_tol :: numeric(1)
Absolute tolerance within which to consider a numeric feature constant. Set to 0 to disregard absolute tolerance. Initialized to 1e-8.
na_ignore :: logical(1)
If TRUE, the ratio is calculated after removing all missing values first, so a column can be "constant" even if some but not all values are NA.
Initialized to TRUE.
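The ratio criterion can be sketched in base R. This is an illustration only: the helper ratio_differing is hypothetical, numeric tolerances (rel_tol, abs_tol) are left out, and the boundary handling (keep a feature only if its ratio is strictly greater than the threshold) is an assumption chosen so that a threshold of 0 drops exactly the constant columns.

```r
# Hypothetical helper: share of values that differ from the most frequent value.
ratio_differing <- function(x, na_ignore = TRUE) {
  if (na_ignore) x <- x[!is.na(x)]
  if (length(x) == 0) return(0)
  counts <- table(x, useNA = "ifany")  # NA counts as its own value if present
  1 - max(counts) / length(x)
}

data <- data.frame(
  a = 1:10,                # all values distinct
  b = rep(1, 10),          # constant
  c = rep(1:2, each = 5),  # two values, evenly split
  d = c(rep(1, 9), NA)     # constant apart from one missing value
)

ratio <- 0  # threshold
ratios <- vapply(data, ratio_differing, numeric(1))
keep <- names(ratios)[ratios > ratio]
keep
```

Column d is dropped here because missing values are ignored; with na_ignore = FALSE the NA counts as a distinct value, the ratio becomes 0.1, and the column would be kept.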
Fields inherited from PipeOp.
Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

data = data.table::data.table(y = runif(10), a = 1:10, b = rep(1, 10), c = rep(1:2, each = 5))
task = TaskRegr$new("example", data, target = "y")

# named `pop` so that the po() constructor function is not shadowed
pop = po("removeconstants")
pop$train(list(task = task))[[1]]$data()

pop$state
Renames the columns of a Task both during training and prediction.
Uses the $rename() mutator of the Task.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpRenameColumns$new(id = "renamecolumns", param_vals = list())
id :: character(1)
Identifier of resulting object, default "renamecolumns".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with the old column names changed to the new ones.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
renaming :: named character | function
Takes the form of either a named character or a function.
For a named character vector, the names of the vector elements specify the
old column names and the corresponding element values give the new column names.
A function specifies how the old column names should be changed to the new column names.
The function must return a character vector with one entry per input column name so that each selected column receives a new name.
To choose columns use the affect_columns parameter.
Initialized to character(0).
ignore_missing :: logical(1)
Whether to ignore names in renaming that are not found in the input Task. If this is
FALSE, names in renaming that do not occur in the Task cause an error.
Initialized to FALSE.
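The two forms of renaming and the effect of ignore_missing can be mimicked on a plain character vector of column names. The helper rename_columns below is hypothetical and only mirrors the behaviour described above; it is not how the PipeOp is implemented.

```r
# Hypothetical helper operating on column names only.
rename_columns <- function(cols, renaming, ignore_missing = FALSE) {
  if (is.function(renaming)) {
    # a function must return one new name per input column name
    new_names <- renaming(cols)
    stopifnot(is.character(new_names), length(new_names) == length(cols))
    return(new_names)
  }
  # named character vector: names are old column names, values are new ones
  not_found <- setdiff(names(renaming), cols)
  if (length(not_found) > 0 && !ignore_missing) {
    stop("Columns not found: ", paste(not_found, collapse = ", "))
  }
  present <- renaming[names(renaming) %in% cols]
  cols[match(names(present), cols)] <- unname(present)
  cols
}

cols <- c("Petal.Length", "Petal.Width", "Sepal.Length")

by_vector   <- rename_columns(cols, c(Petal.Length = "PL"))
by_function <- rename_columns(cols, function(x) sub("Petal", "P", x))
lenient     <- rename_columns(cols, c(Unknown = "U"), ignore_missing = TRUE)
```

Calling rename_columns(cols, c(Unknown = "U")) without ignore_missing = TRUE stops with an error, which corresponds to the behaviour described for ignore_missing = FALSE.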
Uses the $rename() mutator of the Task to set new column names.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# rename a single column via a named character vector
pop = po("renamecolumns", param_vals = list(renaming = c("Petal.Length" = "PL")))
pop$train(list(task))

# rename columns via a function applied to the column names
pof = po("renamecolumns",
  param_vals = list(renaming = function(colnames) {
    sub("Petal", "P", colnames)
  }))
pof$train(list(task))
Replicate the input as a Multiplicity, causing subsequent PipeOps to be executed
reps times.
Note that Multiplicity is currently an experimental feature and the implementation or UI
may change.
R6Class object inheriting from PipeOp.
PipeOpReplicate$new(id = "replicate", param_vals = list())
id :: character(1)
Identifier of the resulting object, default "replicate".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise
be set during construction. Default list().
PipeOpReplicate has one input channel named "input", taking any input ("*") both during training and prediction.
PipeOpReplicate has one output channel named "output" returning the replicated input as a
Multiplicity of type any ("[*]") both during training and prediction.
The $state is left empty (list()).
reps :: numeric(1)
Integer indicating the number of times the input should be replicated.
Only fields inherited from PipeOp.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg
Other Experimental Features:
Multiplicity(),
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
# Named `por` rather than `po` so the po() constructor is not shadowed
por = po("replicate", param_vals = list(reps = 3))
por$train(list(task))
por$predict(list(task))
Applies a function to each row of a task. Use the affect_columns parameter inherited from
PipeOpTaskPreprocSimple to limit the columns this function should be applied to.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpRowApply$new(id = "rowapply", param_vals = list())
id :: character(1)
Identifier of resulting object, default "rowapply".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with the original affected columns replaced by the columns created by
applying applicator to each row.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
applicator :: function
Function to apply to each row in the affected columns of the task.
The return value should be a vector of the same length for every input.
Initialized as identity().
col_prefix :: character(1)
If specified, prefix to be prepended to the column names of affected columns, separated by a dot (.). Initialized as "".
Calls apply on the data, using the value of applicator as FUN.
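The row-wise call can be pictured with plain base R. The following is a minimal sketch, not the PipeOp's actual implementation; `row_shares` is a hypothetical applicator chosen for illustration. It shows why the applicator must return a vector of the same length for every row: the per-row results are stacked back into a rectangular table.

```r
# Hypothetical applicator: express each value as a share of its row total.
row_shares = function(r) r / sum(r)

m = as.matrix(iris[1:3, 1:4])

# apply() with MARGIN = 1 returns one column per input row,
# so the result is transposed back to the original orientation.
res = t(apply(m, 1, row_shares))

res
rowSums(res)
```

Because `row_shares` returns one value per input column, the result has the same shape as the input and each row sums to 1.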
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pora = po("rowapply", applicator = scale)
pora$train(list(task))[[1]]  # rows are standardized
Centers all numeric features to mean = 0 (if center parameter is TRUE) and scales them
by dividing them by their root-mean-square (if scale parameter is TRUE).
The root-mean-square here is defined as sqrt(sum(x^2)/(length(x)-1)). If the center parameter
is TRUE, this corresponds to the sd().
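The relation between the root-mean-square and sd() can be checked directly in base R. This is a small sketch with made-up numbers; `rms` is a helper defined here only for illustration.

```r
# Root-mean-square as defined above, with the (n - 1) denominator.
rms = function(v) sqrt(sum(v^2) / (length(v) - 1))

x = c(2, 4, 4, 4, 5, 5, 7, 9)

rms_centered = rms(x - mean(x))  # RMS after centering
rms_raw = rms(x)                 # RMS without centering

# After centering, the RMS coincides with the sample standard deviation.
isTRUE(all.equal(rms_centered, sd(x)))
```

Without centering, the RMS also absorbs the mean of the feature, so it is larger than sd() whenever the mean is non-zero.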
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpScale$new(id = "scale", param_vals = list())
id :: character(1)
Identifier of resulting object, default "scale".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features centered and/or scaled.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
center :: numeric
The mean / median (depending on robust) of each numeric feature during training, or 0 if center is FALSE. Will be subtracted during the predict phase.
scale :: numeric
The value by which features are divided. 1 if scale is FALSE.
If robust is FALSE, this is the root mean square, defined as sqrt(sum(x^2)/(length(x)-1)), of each feature, possibly after centering.
If robust is TRUE, this is the median absolute deviation multiplied by 1.4826 (see stats::mad) of each feature, possibly after centering.
This is 1 for features that are constant during training if center is TRUE, to avoid division-by-zero.
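How such a state could be computed during training and reused during prediction is sketched below in base R. The helper names `fit_scale_state` and `apply_scale_state` are hypothetical and the code is an illustration of the description above, not the package's implementation.

```r
# Compute center and scale for one numeric feature (illustrative sketch).
fit_scale_state = function(x, center = TRUE, scale = TRUE, robust = FALSE) {
  ctr = if (!center) 0 else if (robust) median(x) else mean(x)
  xc = x - ctr
  scl = if (!scale) {
    1
  } else if (robust) {
    stats::mad(xc, center = 0)  # mad() applies the 1.4826 constant by default
  } else {
    sqrt(sum(xc^2) / (length(xc) - 1))
  }
  if (scl == 0) scl = 1  # constant feature: avoid division by zero
  list(center = ctr, scale = scl)
}

# Apply a previously fitted state to new values.
apply_scale_state = function(x, state) (x - state$center) / state$scale

train = c(2, 4, 4, 4, 5, 5, 7, 9)
state = fit_scale_state(train)

apply_scale_state(train, state)  # training data, standardized
apply_scale_state(5, state)      # new value, transformed with the training state
```

The point of storing the state is that prediction data is transformed with the training mean and scale, not with statistics recomputed on the new data.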
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
center :: logical(1)
Whether to center features, i.e. subtract their mean() from them. Default TRUE.
scale :: logical(1)
Whether to scale features, i.e. divide them by sqrt(sum(x^2)/(length(x)-1)). Default TRUE.
robust :: logical(1)
Whether to use robust scaling; instead of scaling / centering with mean / standard deviation,
median and median absolute deviation mad are used.
Initialized to FALSE.
Imitates the scale() function for robust = FALSE and alternatively subtracts the
median and divides by mad for robust = TRUE.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pos = po("scale")

pos$train(list(task))[[1]]$data()

# Note: task$filter() modifies the task in place
one_line_of_iris = task$filter(13)

one_line_of_iris$data()

pos$predict(list(one_line_of_iris))[[1]]$data()
Scales the numeric data columns so their maximum absolute value is maxabs,
if possible. NA and Inf values are ignored, and features that are constant 0
are not scaled.
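The scaling rule can be sketched in base R as follows. `scale_max_abs` is a hypothetical helper written for illustration and is not the package's code; it assumes the feature contains at least one finite value.

```r
# Scale a numeric vector so that its largest absolute finite value equals maxabs.
scale_max_abs = function(x, maxabs = 1) {
  s = max(abs(x[is.finite(x)]))  # NA and Inf are ignored when finding the maximum
  if (s == 0) s = 1              # a constant-zero feature is left unscaled
  x / s * maxabs
}

x = c(-4, 2, NA, 1)
scaled = scale_max_abs(x)
scaled                        # -1.00 0.50 NA 0.25

scale_max_abs(c(0, 0, 0))     # stays all zero
```

In the PipeOp the maximum absolute values are stored in the state during training, so prediction data is divided by the training maxima.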
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpScaleMaxAbs$new(id = "scalemaxabs", param_vals = list())
id :: character(1)
Identifier of resulting object, default "scalemaxabs".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with scaled numeric features.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as the maximum absolute values of each numeric feature.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
maxabs :: numeric(1)
The maximum absolute value for each column after transformation. Default is 1.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("scalemaxabs")

task$data()
pop$train(list(task))[[1]]$data()

pop$state
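Linearly transforms numeric data columns so they are between lower
and upper. The formula for this is \(x' = a + x \cdot b\),
where \(b\) is \((\text{upper} - \text{lower}) / (\max(x) - \min(x))\) and
\(a\) is \(-\min(x) \cdot b + \text{lower}\). The same transformation is applied during training and
prediction.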
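A linear map that sends the training minimum to lower and the training maximum to upper can be sketched in base R as below. The helper names `fit_range` and `apply_range` and the coefficient names `a` (offset) and `b` (slope) are assumptions made for this illustration.

```r
# Fit offset a and slope b on training data so that min -> lower, max -> upper.
fit_range = function(x, lower = 0, upper = 1) {
  rng = range(x, na.rm = TRUE, finite = TRUE)
  b = (upper - lower) / (rng[2] - rng[1])
  a = lower - rng[1] * b
  c(a = a, b = b)
}

# Apply the fitted linear map to any values.
apply_range = function(x, trafo) trafo[["a"]] + x * trafo[["b"]]

train = c(10, 15, 20)
trafo = fit_range(train, lower = -1, upper = 1)

apply_range(train, trafo)  # -1 0 1
apply_range(25, trafo)     # 2
```

Since the coefficients come from the training data, values outside the training range are mapped outside the interval between lower and upper, as the last call shows.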
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpScaleRange$new(id = "scalerange", param_vals = list())
id :: character(1)
Identifier of resulting object, default "scalerange".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with scaled numeric features.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as the two transformation parameters (offset \(a\) and slope \(b\)) for each numeric
feature.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
lower :: numeric(1)
Target value of smallest item of input data. Initialized to 0.
upper :: numeric(1)
Target value of greatest item of input data. Initialized to 1.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("scalerange", param_vals = list(lower = -1, upper = 1))

task$data()
pop$train(list(task))[[1]]$data()

pop$state
Removes features from Task depending on a Selector function:
The selector parameter gives the features to keep.
See Selector for selectors that are provided and how to write custom Selectors.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpSelect$new(id = "select", param_vals = list())
id :: character(1)
Identifier of resulting object, default "select".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with features removed that were not selected by the Selector/function in selector.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
selection :: character
A vector of all feature names that are kept (i.e. not dropped) in the Task.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
selector :: function | Selector
Selector function, takes a Task as argument and returns a character vector
of features to keep.
See Selector for example functions. Defaults to selector_all().
Uses task$select().
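Since a selector is a function that takes a Task and returns a character vector of feature names, a custom one can be written by hand. The sketch below assumes a plain R function is accepted for the selector parameter, as the parameter type above suggests; `keep_petal` is a hypothetical name chosen for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Hypothetical custom selector: keep features whose name starts with "Petal".
keep_petal = function(task) {
  grep("^Petal", task$feature_names, value = TRUE)
}

pos = po("select", selector = keep_petal)
out = pos$train(list(task))[[1]]

out$feature_names    # the features that were kept
pos$state$selection  # the same names, stored in the state
```

For common cases, the provided helpers such as selector_grep("^Petal") express the same selection without a hand-written function.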
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Selectors:
Selector
library("mlr3")
library("mlr3pipelines")

task = tsk("boston_housing")
pos = po("select")

# Keep all features
pos$param_set$values$selector = selector_all()
pos$train(list(task))[[1]]$feature_names

# Keep only factor features
pos$param_set$values$selector = selector_type("factor")
pos$train(list(task))[[1]]$feature_names

# Keep everything except factor features
pos$param_set$values$selector = selector_invert(selector_type("factor"))
pos$train(list(task))[[1]]$feature_names

# Keep features whose name starts with "r"
pos$param_set$values$selector = selector_grep("^r")
pos$train(list(task))[[1]]$feature_names
Generates a more balanced data set by creating synthetic instances of the minority class using the SMOTE algorithm.
The algorithm samples for each minority instance a new data point based on the K nearest neighbors of that data point.
It can only be applied to tasks with purely numeric features. See smotefamily::SMOTE for details.
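The core SMOTE step described by Chawla et al. (2002) places each synthetic point on the line segment between a minority instance and one of its K nearest minority neighbors. The base R sketch below illustrates that idea on random data; `smote_one` is a hypothetical helper and this is not the code of smotefamily::SMOTE.

```r
set.seed(1)
minority = matrix(rnorm(20), ncol = 2)  # 10 minority points, 2 numeric features
K = 3

# Create one synthetic point for minority instance i (illustrative sketch).
smote_one = function(i, pts, K) {
  diffs = pts - matrix(pts[i, ], nrow(pts), ncol(pts), byrow = TRUE)
  d = sqrt(rowSums(diffs^2))  # Euclidean distance to every minority point
  d[i] = Inf                  # exclude the point itself
  nn = order(d)[seq_len(K)]   # indices of the K nearest neighbors
  j = nn[sample.int(K, 1)]    # pick one neighbor at random
  gap = runif(1)              # position along the segment, in (0, 1)
  pts[i, ] + gap * (pts[j, ] - pts[i, ])
}

synthetic = t(vapply(
  seq_len(nrow(minority)),
  function(i) smote_one(i, minority, K),
  numeric(2)
))

dim(synthetic)  # one synthetic point per minority instance
```

Because every synthetic point is an interpolation between two existing minority points, the method needs numeric features, which matches the restriction stated above.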
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpSmote$new(id = "smote", param_vals = list())
id :: character(1)
Identifier of resulting object, default "smote".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with added synthetic rows for the minority class.
The output during prediction is the unchanged input.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
K :: numeric(1)
The number of nearest neighbors used for sampling new values.
See SMOTE().
dup_size :: numeric
Desired times of synthetic minority instances over the original number of
majority instances. See SMOTE().
If a target level is unobserved during training, no synthetic data points will be generated for that class. No error is raised; the unobserved class is simply ignored.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002). “SMOTE: Synthetic Minority Over-sampling Technique.” Journal of Artificial Intelligence Research, 16, 321–357. doi:10.1613/jair.953.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

# Create example task
data = smotefamily::sample_generator(1000, ratio = 0.80)
data$result = factor(data$result)
task = TaskClassif$new(id = "example", backend = data, target = "result")
task$data()
table(task$data()$result)

# Generate synthetic data for minority class
pop = po("smote")
smotedata = pop$train(list(task))[[1]]$data()
table(smotedata$result)
Generates a more balanced data set by creating synthetic instances of the minority class for nominal and continuous data using the SMOTENC algorithm.
The algorithm generates for each minority instance a new data point based on the k nearest
neighbors of that data point.
It treats integer features as numeric. To not change feature types, the numeric, synthetic data
generated for these features are rounded back to integer.
Because of this, data generated through usage of this PipeOp is not exactly equal to data generated by
calling themis::smotenc directly on the same data set.
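The rounding step for integer features can be pictured with a tiny base R example. The values and the variable names below are made up for illustration; the sketch only shows why the result can differ from an unrounded interpolation.

```r
# Integer feature values of a minority instance and one of its neighbors.
a = 3L
b = 8L
gap = 0.37  # hypothetical interpolation weight

raw = a + gap * (b - a)        # numeric interpolation: 4.85
synth = as.integer(round(raw)) # rounded back to integer: 5

raw
synth
class(synth)
```

The feature keeps its integer type, at the cost of a small deviation from the value a purely numeric interpolation would produce.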
It can only be applied to classification tasks with factor (or ordered) features and at least one numeric (or integer) feature that have no missing values.
See themis::smotenc for details.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpSmoteNC$new(id = "smotenc", param_vals = list())
id :: character(1)
Identifier of resulting object, default "smotenc".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with added synthetic rows for the minority class.
The output during prediction is the unchanged input.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
k :: integer(1)
Number of nearest neighbors used for generating new values from the minority class. Default is 5.
over_ratio :: numeric(1)
Ratio of the majority to minority class. Default is 1. For details, see themis::smotenc.
If a target level is unobserved during training, no synthetic data points will be generated for that class. No error is raised; the unobserved class is simply ignored.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002). “SMOTE: Synthetic Minority Over-sampling Technique.” Journal of Artificial Intelligence Research, 16, 321–357. doi:10.1613/jair.953.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

# Create example task
data = data.frame(
  target = factor(sample(c("c1", "c2"), size = 200, replace = TRUE, prob = c(0.1, 0.9))),
  feature = rnorm(200)
)
task = TaskClassif$new(id = "example", backend = data, target = "target")
task$head()
table(task$data(cols = "target"))

# Generate synthetic data for minority class
pop = po("smotenc")
smotenc_result = pop$train(list(task))[[1]]$data()
nrow(smotenc_result)
table(smotenc_result$target)
Normalizes the data row-wise. This is a natural generalization of the "sign" function to higher dimensions.
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpSpatialSign$new(id = "spatialsign", param_vals = list())
id :: character(1)
Identifier of resulting object, default "spatialsign".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their normalized versions.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
length :: numeric(1)
Length to scale rows to. Default is 1.
norm :: numeric(1)
Norm to use. Rows are scaled to sum(x^norm)^(1/norm) == length for finite norm, or to max(abs(x)) == length
if norm is Inf. Default is 2.
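As an illustration of the scaling rule, the following base R sketch normalizes the rows of a small matrix. The helper name spatial_sign_rows is hypothetical and not part of mlr3pipelines. Taking abs(x) before exponentiation and leaving all-zero rows untouched are assumptions of this sketch, not documented package behavior.

```r
# Hypothetical helper (not the package implementation): scale each row of a
# numeric matrix so that its norm equals `length`.
spatial_sign_rows <- function(x, length = 1, norm = 2) {
  x <- as.matrix(x)
  row_norm <- if (is.infinite(norm)) {
    apply(abs(x), 1, max)               # maximum norm
  } else {
    rowSums(abs(x)^norm)^(1 / norm)     # p-norm for finite `norm`
  }
  row_norm[row_norm == 0] <- 1          # assumption: leave all-zero rows as they are
  x / row_norm * length                 # recycling divides row i by row_norm[i]
}

m <- matrix(c(3, 4, 0, -2), nrow = 2, byrow = TRUE)
s2 <- spatial_sign_rows(m)                              # Euclidean norm, length 1
s_inf <- spatial_sign_rows(m, length = 2, norm = Inf)   # maximum norm, length 2
print(s2)
print(s_inf)
```

With the default settings every non-zero row ends up on the unit sphere, which is why the operation reads as a multivariate version of sign().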
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")

task = tsk("iris")

task$data()

pop = po("spatialsign")

pop$train(list(task))[[1]]$data()
Replaces numeric features with columns representing spline basis expansions.
Depending on the type parameter, constructs polynomial B-splines (splines::bs()) or natural cubic splines (splines::ns()) for the respective column.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
po("splines", param_vals = list())
id :: character(1)
Identifier of resulting object, default "splines".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with the selected columns transformed according to the specified spline type.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
After training, the Boundary.knots are stored in the $state.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
type :: character(1)
Controls the type of splines that are to be created. Can be either "polynomial" (splines::bs())
or "natural" (splines::ns()). Initialized to "natural".
df :: integer(1)
Number of degrees of freedom for calculation of the spline basis matrix. Initialized to NULL.
Depending on type, see either splines::bs() or splines::ns().
knots :: named list
Internal breakpoints that define the spline, given as a named list of numeric vectors,
where each name corresponds to a feature and its value specifies the knots for that feature.
Default is NULL. Depending on type, see either splines::bs() or splines::ns().
intercept :: logical(1)
If TRUE, an intercept is included in the basis. Default is FALSE.
Depending on type, see either splines::bs() or splines::ns().
degree :: integer(1)
Degree of the polynomial used to compute polynomial splines. Only used if type is "polynomial".
Default is 3. See splines::bs().
Boundary.knots :: named list
Boundary points at which to anchor the spline basis, given as a named list of numeric vectors,
where each name corresponds to a feature and its value specifies the boundary points for that feature.
Default is NULL. Depending on type, see either splines::bs() or splines::ns().
Creates a spline basis using either splines::bs or splines::ns depending on the hyperparameter type.
After training, the Boundary.knots that were either provided by the user or calculated during training are
stored in the PipeOp's $state.
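The reason for keeping Boundary.knots in the state can be seen in a small sketch that uses splines::ns() directly. It is not the PipeOp's implementation: the `state` list is a hypothetical stand-in, and reusing the interior knots alongside the boundary knots is an assumption of this sketch.

```r
library(splines)  # ships with R

train_x <- iris$Petal.Length[1:100]
new_x <- iris$Petal.Length[101:150]

# "Training": build the basis and remember where it was anchored.
basis_train <- ns(train_x, df = 3)
state <- list(
  knots = attr(basis_train, "knots"),
  Boundary.knots = attr(basis_train, "Boundary.knots")
)

# "Prediction": rebuild the basis for new data with the stored knots, so that
# each basis column means the same thing as during training.
basis_new <- ns(new_x, knots = state$knots, Boundary.knots = state$Boundary.knots)

print(state$Boundary.knots)
print(dim(basis_new))
```

If the knots were recomputed from the new data instead, the columns of the two basis matrices would no longer be comparable.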
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")

task = tsk("iris")
pop = po("splines")
pop$train(list(task))[[1]]$data()

pobk = po("splines", Boundary.knots = list(
  Petal.Length = c(0, 4),
  Petal.Width = c(4, 7),
  Sepal.Length = c(1, 5),
  Sepal.Width = c(3, 6))
)
pobk$train(list(task))[[1]]$data()
Subsamples a Task to use a fraction of the rows.
Sampling happens only during training phase. Subsampling a Task may be
beneficial for training time at possibly (depending on original Task size)
negligible cost of predictive performance.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpSubsample$new(id = "subsample", param_vals = list())
id :: character(1)
Identifier of the resulting object, default "subsample".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output during training is the input Task with added or removed rows according to the sampling.
The output during prediction is the unchanged input.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc; however, the affect_columns parameter is not present. Further parameters are:
frac :: numeric(1)
Fraction of rows in the Task to keep. May only be greater than 1 if replace is TRUE. Initialized to (1 - exp(-1)) == 0.6321.
stratify :: logical(1)
Should the subsamples be stratified by target? Initialized to FALSE. May only be TRUE for TaskClassif input and if use_groups = FALSE.
use_groups :: logical(1)
If TRUE and if the Task has a column with role group, grouped observations are kept together during subsampling. In case of sampling with
replace :: logical(1)
Sample with replacement? Initialized to FALSE.
Uses task$filter() to remove rows. If replace is TRUE and identical rows are added, then the task$row_roles$use can not be used
to duplicate rows because of [inaudible]; instead the task$rbind() function is used, and
a new data.table is attached that contains all rows that are being duplicated exactly as many times as they are being added.
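The row selection controlled by frac, replace and stratify can be pictured with base R. The helper subsample_rows below is hypothetical and only returns row indices. Rounding the number of kept rows with round() is an assumption of this sketch, and the Task manipulation via task$filter() and task$rbind() described above is not reproduced.

```r
set.seed(1)

# Hypothetical helper: pick row indices for a subsample of `n` rows.
subsample_rows <- function(n, frac, replace = FALSE, strata = NULL) {
  stopifnot(frac <= 1 || replace)  # frac > 1 only makes sense with replacement
  if (is.null(strata)) {
    return(sample.int(n, size = round(n * frac), replace = replace))
  }
  # Stratified: sample the same fraction within each level of `strata`.
  idx <- split(seq_len(n), strata)
  unlist(lapply(idx, function(i) {
    i[sample.int(length(i), size = round(length(i) * frac), replace = replace)]
  }), use.names = FALSE)
}

rows <- subsample_rows(nrow(iris), frac = 0.7, strata = iris$Species)
counts <- table(iris$Species[rows])
print(counts)
```

Stratifying keeps the class proportions of the target identical in the subsample, which a plain random draw only achieves on average.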
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")

# Subsample with stratification
pop = po("subsample", frac = 0.7, stratify = TRUE, use_groups = FALSE)
pop$train(list(tsk("iris")))

# Subsample, respecting grouping
df = data.frame(
  target = runif(3000),
  x1 = runif(3000),
  x2 = runif(3000),
  grp = sample(paste0("g", 1:100), 3000, replace = TRUE)
)
task = TaskRegr$new(id = "example", backend = df, target = "target")
task$set_col_roles("grp", "group")

pop = po("subsample", frac = 0.7, use_groups = TRUE)
pop$train(list(task))
Inverts target-transformations done during training based on a supplied inversion
function. Typically should be used in combination with a subclass of PipeOpTargetTrafo.
During prediction phase the function supplied through "fun" is called with a list containing
the "prediction" as a single element, and should return a list with a single element
(a Prediction) that is returned by PipeOpTargetInvert.
R6Class object inheriting from PipeOp.
PipeOpTargetInvert$new(id = "targetinvert", param_vals = list())
id :: character(1)
Identifier of resulting object, default "targetinvert".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
PipeOpTargetInvert has two input channels named "fun" and "prediction". During
training, both take NULL as input. During prediction, "fun" takes a function and
"prediction" takes a Prediction.
PipeOpTargetInvert has one output channel named "output" and returns NULL during
training and a Prediction during prediction.
The $state is left empty (list()).
PipeOpTargetInvert has no parameters.
Should be used in combination with a subclass of PipeOpTargetTrafo.
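The prediction-phase contract (a function receives a list holding the prediction and returns a list holding the inverted prediction) can be mimicked with plain R objects. Everything in this sketch is a stand-in: the Prediction is replaced by an ordinary list, and invert_prediction is a hypothetical helper, not a function of mlr3pipelines.

```r
# Stand-in for a Prediction object (assumption: a plain list with two slots).
prediction <- list(row_ids = 1:3, response = c(1, 2, 3))

# An inversion function of the kind that could arrive on the "fun" channel:
# it takes a list with one element and returns a list with one element.
fun <- function(inputs) {
  p <- inputs[[1]]
  p$response <- 2^p$response  # undo a log2 transformation of the target
  list(p)
}

# Hypothetical helper showing what the prediction step amounts to.
invert_prediction <- function(fun, prediction) {
  out <- fun(list(prediction))
  stopifnot(is.list(out), length(out) == 1L)
  out[[1]]
}

result <- invert_prediction(fun, prediction)
print(result$response)
```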
Only fields inherited from PipeOp.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library(mlr3)
task = tsk("boston_housing")
po = PipeOpTargetMutate$new("logtrafo", param_vals = list(
  trafo = function(x) log(x, base = 2),
  inverter = function(x) list(response = 2 ^ x$response))
)
# Note that this example is ill-equipped to work with
# `predict_type == "se"` predictions.

po$train(list(task))
po$predict(list(task))

g = Graph$new()
g$add_pipeop(po)
g$add_pipeop(LearnerRegrRpart$new())
g$add_pipeop(PipeOpTargetInvert$new())
g$add_edge(src_id = "logtrafo", dst_id = "targetinvert",
  src_channel = 1, dst_channel = 1)
g$add_edge(src_id = "logtrafo", dst_id = "regr.rpart",
  src_channel = 2, dst_channel = 1)
g$add_edge(src_id = "regr.rpart", dst_id = "targetinvert",
  src_channel = 1, dst_channel = 2)

g$train(task)
g$predict(task)

# syntactic sugar using ppl():
tt = ppl("targettrafo", graph = PipeOpLearner$new(LearnerRegrRpart$new()))
tt$param_set$values$targetmutate.trafo = function(x) log(x, base = 2)
tt$param_set$values$targetmutate.inverter = function(x) list(response = 2 ^ x$response)
Changes the target of a Task according to a function given as hyperparameter.
An inverter-function that undoes the transformation during prediction must also be given.
R6Class object inheriting from PipeOpTargetTrafo/PipeOp.
PipeOpTargetMutate$new(id = "targetmutate", param_vals = list(), new_task_type = NULL)
id :: character(1)
Identifier of resulting object, default "targetmutate".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise
be set during construction. Default list().
new_task_type :: character(1) | NULL
The task type to which the output is converted, must be one of mlr_reflections$task_types$type.
Defaults to NULL: no change in task type.
Input and output channels are inherited from PipeOpTargetTrafo.
The $state is left empty (list()).
The parameters are the parameters inherited from PipeOpTargetTrafo, as well as:
trafo :: function data.table -> data.frame | data.table | matrix
Transformation function for the target. Should only be a function of the target, i.e., taking a
single data.table argument, typically with one column. The return value is used as the new
target of the resulting Task. To change target names, change the column name of the data
using e.g. setnames().
Note that this function also gets called during prediction and should thus gracefully handle NA values.
Initialized to identity().
inverter :: function data.table -> data.table | named list
Inversion of the transformation function for the target. Called on a data.table created from a Prediction
using as.data.table(), without the $row_ids and $truth columns,
and should return a data.table or named list that contains the new relevant slots of a
Prediction subclass (e.g., $response, $prob, $se, ...). Initialized to identity().
Overloads PipeOpTargetTrafo's .transform() and
.invert() functions. Should be used in combination with PipeOpTargetInvert.
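The interplay of trafo and inverter can be sketched without any pipeline machinery. The following base R example uses lm() on mtcars as a stand-in learner and a plain list as a stand-in for the prediction table. Both are assumptions made for illustration only.

```r
# Transformation of the target and its inverse on the prediction.
trafo <- function(x) log(x, base = 2)
inverter <- function(p) list(response = 2^p$response)

# Fit on the transformed target (lm() stands in for an arbitrary learner).
fit <- lm(trafo(mpg) ~ wt, data = mtcars)

# Predictions live on the log2 scale ...
pred <- list(response = unname(predict(fit, newdata = mtcars)))

# ... and the inverter maps them back to the original scale of mpg.
inverted <- inverter(pred)
print(head(inverted$response))
```

Because the model is fitted on the log2 scale, the back-transformed predictions are always positive, which a model fitted directly on mpg would not guarantee.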
Fields inherited from PipeOp, as well as:
new_task_type :: character(1)
new_task_type construction argument. Read-only.
Only methods inherited from PipeOpTargetTrafo/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library(mlr3)
task = tsk("boston_housing")
po = PipeOpTargetMutate$new("logtrafo", param_vals = list(
  trafo = function(x) log(x, base = 2),
  inverter = function(x) list(response = 2 ^ x$response))
)
# Note that this example is ill-equipped to work with
# `predict_type == "se"` predictions.

po$train(list(task))
po$predict(list(task))

g = Graph$new()
g$add_pipeop(po)
g$add_pipeop(LearnerRegrRpart$new())
g$add_pipeop(PipeOpTargetInvert$new())
g$add_edge(src_id = "logtrafo", dst_id = "targetinvert",
  src_channel = 1, dst_channel = 1)
g$add_edge(src_id = "logtrafo", dst_id = "regr.rpart",
  src_channel = 2, dst_channel = 1)
g$add_edge(src_id = "regr.rpart", dst_id = "targetinvert",
  src_channel = 1, dst_channel = 2)

g$train(task)
g$predict(task)

# syntactic sugar using ppl():
tt = ppl("targettrafo", graph = PipeOpLearner$new(LearnerRegrRpart$new()))
tt$param_set$values$targetmutate.trafo = function(x) log(x, base = 2)
tt$param_set$values$targetmutate.inverter = function(x) list(response = 2 ^ x$response)
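Linearly transforms a numeric target of a TaskRegr so it is between lower
and upper. The formula for this is $x' = \mathrm{offset} + x \cdot \mathrm{scale}$,
where $\mathrm{scale}$ is $(\mathrm{upper} - \mathrm{lower}) / (\max(x) - \min(x))$ and
$\mathrm{offset}$ is $-\min(x) \cdot \mathrm{scale} + \mathrm{lower}$. The same transformation is applied during training and
prediction.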
R6Class object inheriting from PipeOpTargetTrafo/PipeOp.
PipeOpTargetTrafoScaleRange$new(id = "targettrafoscalerange", param_vals = list())
id :: character(1)
Identifier of resulting object, default "targettrafoscalerange".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise
be set during construction. Default list().
Input and output channels are inherited from PipeOpTargetTrafo.
The $state is a named list containing the slots $offset and $scale.
The parameters are the parameters inherited from PipeOpTargetTrafo, as well as:
lower :: numeric(1)
Target value of smallest item of input target. Initialized to 0.
upper :: numeric(1)
Target value of greatest item of input target. Initialized to 1.
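A worked example may help to see how lower and upper determine the stored $offset and $scale. The sketch below is plain base R with hypothetical helper names. It computes the linear map that sends the smallest target value to lower and the largest to upper, and then inverts it again.

```r
# Hypothetical helpers, not the package implementation.
scale_range_state <- function(x, lower = 0, upper = 1) {
  scale <- (upper - lower) / (max(x) - min(x))
  offset <- lower - min(x) * scale
  list(offset = offset, scale = scale)
}
scale_target <- function(x, state) state$offset + x * state$scale
unscale_prediction <- function(y, state) (y - state$offset) / state$scale

y <- c(5, 10, 20, 50)
st <- scale_range_state(y, lower = 0, upper = 1)

z <- scale_target(y, st)            # target on the [0, 1] scale
back <- unscale_prediction(z, st)   # back on the original scale

print(st)
print(z)
```

The state is computed once from the training target. Applying the same offset and scale to new data is what keeps the inversion consistent.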
Overloads PipeOpTargetTrafo's .get_state(), .transform(), and
.invert(). Should be used in combination with PipeOpTargetInvert.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTargetTrafo/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library(mlr3)
task = tsk("boston_housing")
po = PipeOpTargetTrafoScaleRange$new()

po$train(list(task))
po$predict(list(task))

# syntactic sugar for a graph using ppl():
ttscalerange = ppl("targettrafo", trafo_pipeop = PipeOpTargetTrafoScaleRange$new(),
  graph = PipeOpLearner$new(LearnerRegrRpart$new()))
ttscalerange$train(task)
ttscalerange$predict(task)
ttscalerange$state$regr.rpart
Computes a bag-of-words representation from one or more columns.
Columns of type character are split up into words.
Uses the quanteda::dfm() and quanteda::dfm_trim() functions.
TF-IDF computation works similarly to quanteda::dfm_tfidf()
but has been adjusted for train/test data split using quanteda::docfreq()
and quanteda::dfm_weight().
In short:
Per default, produces a bag-of-words representation
If n is set to values > 1, ngrams are computed
If df_trim parameters are set, the bag-of-words is trimmed.
The scheme_tf parameter controls term-frequency (per-document, i.e. per-row) weighting
The scheme_df parameter controls the document-frequency (per token, i.e. per-column) weighting.
Parameters specify arguments to quanteda's dfm, dfm_trim, docfreq and dfm_weight.
Which function a parameter is passed to can be read from the parameter's tags; for example, the tag tokenizer marks
arguments passed on to quanteda::dfm().
Defaults to a bag-of-words representation with token counts as matrix entries.
In order to perform the default dfm_tfidf weighting, set the scheme_df parameter to "inverse".
The scheme_df parameter is initialized to "unary", which disables document frequency weighting.
The PipeOp works as follows:
Words are tokenized using quanteda::tokens.
Ngrams are computed using quanteda::tokens_ngrams.
A document-feature matrix is computed using quanteda::dfm.
The document-feature matrix is trimmed using quanteda::dfm_trim during train-time.
The document-feature matrix is re-weighted (similar to quanteda::dfm_tfidf) if scheme_df is not set to "unary".
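The point of adjusting the TF-IDF computation for a train/test split is that vocabulary and document frequencies come from the training data only and are then applied unchanged to new documents. The base R sketch below illustrates this idea without quanteda. The helpers tokenize and bow are hypothetical, the tokenizer is deliberately crude, and the plain log10(N / df) weighting is an assumption standing in for the configurable scheme_df options.

```r
# Crude tokenizer (assumption): lower-case and split on non-letters.
tokenize <- function(docs) strsplit(tolower(docs), "[^a-z]+")

# Count tokens per document against a fixed vocabulary; unseen tokens are dropped.
bow <- function(docs, vocab) {
  counts <- vapply(
    tokenize(docs),
    function(tk) as.numeric(table(factor(tk, levels = vocab))),
    numeric(length(vocab))
  )
  m <- t(counts)
  colnames(m) <- vocab
  m
}

train <- c("the cat sat", "the dog sat", "a cat ran")
test <- c("the cat ran fast")

# Train time: vocabulary and document frequencies are fixed here.
vocab <- setdiff(sort(unique(unlist(tokenize(train)))), "")
x_train <- bow(train, vocab)
docfreq <- colSums(x_train > 0)
idf <- log10(nrow(x_train) / docfreq)

# Predict time: same columns, same weights, no information from the test set.
x_test <- bow(test, vocab)
x_test_weighted <- sweep(x_test, 2, idf, `*`)
print(x_test_weighted)
```

The token "fast" never appears in the training documents, so it has no column and no weight at prediction time.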
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpTextVectorizer$new(id = "textvectorizer", param_vals = list())
id :: character(1)
Identifier of resulting object, default "textvectorizer".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected features converted to a bag-of-words
representation.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
colmodels :: named list
Named list with one entry per extracted column. Each entry has two further elements:
tdm: sparse document-feature matrix resulting from quanteda::dfm()
docfreq: (weighted) document frequency resulting from quanteda::docfreq()
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
return_type :: character(1)
Whether to return an integer representation ("integer_sequence"), a factor representation ("factor_sequence"), or a bag-of-words ("bow").
If set to "integer_sequence", tokens are replaced by an integer and padded/truncated to sequence_length.
If set to "factor_sequence", tokens are replaced by a factor and padded/truncated to sequence_length.
If set to "bow", a possibly weighted bag-of-words matrix is returned.
Defaults to "bow".
stopwords_language :: character(1)
Language to use for stopword filtering. Needs to be either "none", a language identifier listed in
stopwords::stopwords_getlanguages("snowball") ("de", "en", ...) or "smart".
"none" disables language-specific stopwords.
"smart" corresponds to stopwords::stopwords(source = "smart"), which
contains English stopwords and also removes one-character strings. Initialized to "smart".
extra_stopwords :: character
Extra stopwords to remove. Must be a character vector containing individual tokens to remove.
When n is set to values greater than 1, this can also contain stop-ngrams.
Initialized to character(0).
tolower :: logical(1)
Whether to convert to lower case. See quanteda::dfm. Default is TRUE.
stem :: logical(1)
Whether to perform stemming. See quanteda::dfm. Default is FALSE.
what :: character(1)
Tokenization splitter. See quanteda::tokens. Default is "word".
remove_punct :: logical(1)
See quanteda::tokens. Default is FALSE.
remove_url :: logical(1)
See quanteda::tokens. Default is FALSE.
remove_symbols :: logical(1)
See quanteda::tokens. Default is FALSE.
remove_numbers :: logical(1)
See quanteda::tokens. Default is FALSE.
remove_separators :: logical(1)
See quanteda::tokens. Default is TRUE.
split_hyphens :: logical(1)
See quanteda::tokens. Default is FALSE.
n :: integer
Vector of ngram lengths. See quanteda::tokens_ngrams. Initialized to 1, deviating from the base function's default.
Note that this can be a vector of multiple values, to construct ngrams of multiple orders.
skip :: integer
Vector of skips. See quanteda::tokens_ngrams. Default is 0. Note that this can be a vector of multiple values.
sparsity :: numeric(1)
Desired sparsity of the 'tfm' matrix. See quanteda::dfm_trim. Default is NULL.
max_termfreq :: numeric(1)
Maximum term frequency in the 'tfm' matrix. See quanteda::dfm_trim. Default is NULL.
min_termfreq :: numeric(1)
Minimum term frequency in the 'tfm' matrix. See quanteda::dfm_trim. Default is NULL.
termfreq_type :: character(1)
How to assess term frequency. See quanteda::dfm_trim. Default is "count".
scheme_df :: character(1)
Weighting scheme for document frequency: See quanteda::docfreq. Initialized to "unary" (1 for each document, deviating from base function default).
smoothing_df :: numeric(1)
See quanteda::docfreq. Default is 0.
k_df :: numeric(1)
k parameter given to quanteda::docfreq (see there). Default is 0.
threshold_df :: numeric(1)
See quanteda::docfreq. Default is 0. Only considered if scheme_df is set to "count".
base_df :: numeric(1)
The base for logarithms in quanteda::docfreq (see there). Default is 10.
scheme_tf :: character(1)
Weighting scheme for term frequency: See quanteda::dfm_weight. Default is "count".
k_tf :: numeric(1)
k parameter given to quanteda::dfm_weight (see there). Default is 0.5.
base_tf :: numeric(1)
The base for logarithms in quanteda::dfm_weight (see there). Default is 10.
sequence_length :: integer(1)
The length of the integer sequence. Defaults to Inf, i.e. all texts are padded to the length
of the longest text. Only relevant if return_type is set to "integer_sequence".
See Description. Internally uses the quanteda package. Calls quanteda::tokens, quanteda::tokens_ngrams and quanteda::dfm. During training,
quanteda::dfm_trim is also called. Tokens not seen during training are dropped during prediction.
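The effect of the n parameter on the bag-of-words output can be sketched as follows. This is a minimal illustration, assuming the quanteda and stopwords packages are installed; the toy texts and the variable names (po_uni, po_bi, feat_uni, feat_bi) are invented for this example.

```r
library("mlr3")
library("mlr3pipelines")
library("data.table")

# Toy corpus (hypothetical data): three short texts, repeated to match the 150 rows of iris
dt = data.table(
  txt = rep(c("the quick fox", "a lazy dog", "quick brown dog"), 50)
)
task = tsk("iris")$cbind(dt)

# Bag-of-words with unigrams only; stopword filtering disabled so no token is dropped
po_uni = po("textvectorizer", stopwords_language = "none", n = 1L)
# Bag-of-words with unigrams and bigrams
po_bi = po("textvectorizer", stopwords_language = "none", n = 1:2)

feat_uni = po_uni$train(list(task))[[1]]$feature_names
feat_bi = po_bi$train(list(task))[[1]]$feature_names

# Columns that replaced the text column
setdiff(feat_uni, tsk("iris")$feature_names)
# Number of additional columns contributed by bigrams
length(feat_bi) - length(feat_uni)
```

The original text column is replaced by the vectorized columns, while the other features of the Task are passed through unchanged.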
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")
library("data.table")

# create some text data
dt = data.table(
  txt = replicate(150, paste0(sample(letters, 3), collapse = " "))
)
task = tsk("iris")$cbind(dt)

pos = po("textvectorizer", param_vals = list(stopwords_language = "en"))

pos$train(list(task))[[1]]$data()

one_line_of_iris = task$filter(13)

one_line_of_iris$data()

pos$predict(list(one_line_of_iris))[[1]]$data()
Change the threshold of a Prediction during the predict step.
The incoming Learner's $predict_type needs to be "prob".
Internally calls PredictionClassif$set_threshold.
R6Class inheriting from PipeOp.
PipeOpThreshold$new(id = "threshold", param_vals = list())
id :: character(1)
Identifier of the resulting object, default "threshold".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction.
Default list().
During training, the input and output are NULL.
A PredictionClassif is required as input and returned as output during prediction.
The $state is left empty (list()).
thresholds :: numeric
A numeric vector of thresholds for the different class levels.
May have length 1 for binary classification predictions, must
otherwise have length of the number of target classes; see
PredictionClassif's $set_threshold() method.
Initialized to 0.5, i.e. thresholding for binary classification
at level 0.5.
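How the thresholds value changes the predicted response can be sketched on a binary task. This is a minimal example, assuming the rpart package is available; the task choice and the variable names (pred_hi, pred_lo, n_pos_hi, n_pos_lo) are illustrative only.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("sonar")  # binary classification task shipped with mlr3
learner = lrn("classif.rpart", predict_type = "prob")

graph = po("learner", learner) %>>% po("threshold", thresholds = 0.9)
graph$train(task)

# High threshold: the positive class is predicted only when its probability is high
pred_hi = graph$predict(task)[[1]]

# Lower the threshold on the trained graph and predict again
graph$param_set$values$threshold.thresholds = 0.1
pred_lo = graph$predict(task)[[1]]

n_pos_hi = sum(pred_hi$response == task$positive)
n_pos_lo = sum(pred_lo$response == task$positive)
c(high = n_pos_hi, low = n_pos_lo)
```

Because thresholding happens only during prediction, the threshold can be changed without retraining the Learner.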
Fields inherited from PipeOp, as well as:
predict_type :: character(1)
Type of prediction to return. Either "prob" (default) or "response".
Setting to "response" should rarely be used; it may potentially save some memory but has
no other benefits.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

t = tsk("german_credit")
gr = po(lrn("classif.rpart", predict_type = "prob")) %>>%
  po("threshold", param_vals = list(thresholds = 0.9))
gr$train(t)
gr$predict(t)
Generates a cleaner data set by removing all majority-minority Tomek links.
The algorithm down-samples the data by removing all pairs of observations that form a Tomek link, i.e. a pair of observations that are nearest neighbors and belong to different classes. For this only numeric and integer features are taken into account. These must have no missing values.
This can only be applied to classification tasks. Multiclass classification is supported.
See themis::tomek for details.
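The definition of a Tomek link given above can be illustrated in base R. This sketch is not the implementation used by themis::tomek; it assumes Euclidean distance and breaks ties by taking the first nearest neighbor, both of which are assumptions made for illustration.

```r
# Numeric features and class labels of the iris data
X = as.matrix(iris[, 1:4])
y = iris$Species

# Pairwise Euclidean distances; an observation is never its own neighbor
d = as.matrix(dist(X))
diag(d) = Inf

# Index of the nearest neighbor of every observation
nn = unname(apply(d, 1, which.min))
idx = seq_len(nrow(X))

# Tomek link: mutual nearest neighbors that belong to different classes
is_link = (nn[nn] == idx) & (y[nn] != y)
tomek_rows = idx[is_link]

# Removing all observations that take part in a Tomek link
cleaned = iris[!is_link, ]
nrow(cleaned)
```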
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpTomek$new(id = "tomek", param_vals = list())
id :: character(1)
Identifier of resulting object, default "tomek".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with removed rows for pairs of observations that form a Tomek link.
The output during prediction is the unchanged input.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc.
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
Tomek I (1976). “Two Modifications of CNN.” IEEE Transactions on Systems, Man and Cybernetics, 6(11), 769–772. doi:10.1109/TSMC.1976.4309452.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

# Create example task
task = tsk("iris")
task$head()
table(task$data(cols = "Species"))

# Down-sample data
pop = po("tomek")
tomek_result = pop$train(list(task))[[1]]$data()
nrow(tomek_result)
table(tomek_result$Species)
Tunes optimal probability thresholds over different PredictionClassifs.
The predict_type of the incoming mlr3::Learner must be "prob".
Thresholds for each learner are optimized using the Optimizer supplied via
the param_set, which defaults to GenSA.
Returns a single PredictionClassif.
This PipeOp should be used in conjunction with PipeOpLearnerCV in order to
optimize thresholds of cross-validated predictions.
In order to optimize thresholds without cross-validation, use PipeOpLearnerCV
in conjunction with ResamplingInsample.
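Threshold tuning without cross-validation, as described above, might look like the following sketch. It assumes that the bbotk, GenSA and rpart packages are installed and that PipeOpLearnerCV's resampling.method parameter accepts "insample"; the task choice and variable names are illustrative.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("sonar")
learner = lrn("classif.rpart", predict_type = "prob")

# "insample" makes PipeOpLearnerCV predict on the training data itself
graph = po("learner_cv", learner, resampling.method = "insample") %>>%
  po("tunethreshold")

graph$train(task)

# The learned thresholds are stored in the state of the tunethreshold PipeOp
graph$pipeops$tunethreshold$state

# During prediction, the learned thresholds are applied to the probabilities
pred = graph$predict(task)[[1]]
pred$score(msr("classif.ce"))
```

Thresholds tuned on in-sample predictions can be optimistic, which is why the cross-validated variant is usually preferable.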
R6Class object inheriting from PipeOp.
PipeOpTuneThreshold$new(id = "tunethreshold", param_vals = list())
id :: character(1)
Identifier of resulting object. Default: "tunethreshold".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings
that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOp.
The $state is a named list with elements:
thresholds :: numeric
Learned thresholds.
The parameters are the parameters inherited from PipeOp, as well as:
measure :: Measure | character
Measure to optimize for.
Will be converted to a Measure in case it is character.
Initialized to "classif.ce", i.e. misclassification error.
optimizer :: Optimizer | character(1)
Optimizer used to find optimal thresholds.
If character, converts to Optimizer via opt. Initialized to OptimizerGenSA.
log_level :: character(1) | integer(1)
Set a temporary log-level for lgr::get_logger("mlr3/bbotk"). Initialized to: "warn".
Uses the optimizer provided as a param_val in order to find an optimal threshold.
See the optimizer parameter for more info.
Fields inherited from PipeOp, as well as:
predict_type :: character(1)
Type of prediction to return. Either "prob" (default) or "response".
Setting to "response" should rarely be used; it may potentially save some memory but has
no other benefits.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("learner_cv", lrn("classif.rpart", predict_type = "prob")) %>>%
  po("tunethreshold")

task$data()

pop$train(task)

pop$state
Used to bring together different paths created by PipeOpBranch.
R6Class object inheriting from PipeOp.
PipeOpUnbranch$new(options, id = "unbranch", param_vals = list())
options :: numeric(1) | character
If options is 0, a vararg input channel is created that can take
any number of inputs.
If options is a nonzero integer number, it determines the number of
input channels / options that are created, named input1...input<n>.
If options is a character, it determines the names of channels directly.
The difference between these three is purely cosmetic if the user chooses
to produce channel names matching with the corresponding PipeOpBranch.
However, it is not necessary to have matching names and the vararg option
is always viable.
id :: character(1)
Identifier of resulting object, default "unbranch".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
PipeOpUnbranch has multiple input channels depending on the options construction argument, named "input1", "input2", ...
if options is a nonzero integer and named after each options value if options is a character; if options is 0, there is only one
vararg input channel named "...".
All input channels take any argument ("*") both during training and prediction.
PipeOpUnbranch has one output channel named "output", producing the only non-NO_OP object received as input ("*"),
both during training and prediction.
The $state is left empty (list()).
PipeOpUnbranch has no parameters.
See PipeOpBranch Internals on how alternative path branching works.
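A small branching graph shows how PipeOpUnbranch collects the result of whichever path was selected. This is a sketch for illustration; the option names "pca" and "nop" and the variable names are chosen here and are not prescribed by the package.

```r
library("mlr3")
library("mlr3pipelines")

# Two alternative paths: apply PCA, or pass the data through unchanged
graph = po("branch", c("pca", "nop")) %>>%
  gunion(list(po("pca"), po("nop"))) %>>%
  po("unbranch", c("pca", "nop"))

# Select the "nop" path: features stay as they are
graph$param_set$values$branch.selection = "nop"
out_nop = graph$train(tsk("iris"))[[1]]
out_nop$feature_names

# Select the "pca" path: features are replaced by principal components
graph$param_set$values$branch.selection = "pca"
out_pca = graph$train(tsk("iris"))[[1]]
out_pca$feature_names
```

In both cases the unselected path only carries a NO_OP, so PipeOpUnbranch forwards the single real result.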
Only fields inherited from PipeOp.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Path Branching:
NO_OP,
filter_noop(),
is_noop(),
mlr_pipeops_branch
# See PipeOpBranch for a complete branching example
library("mlr3pipelines")

pou = po("unbranch")

pou$train(list(NO_OP, NO_OP, "hello", NO_OP, NO_OP))
EXPERIMENTAL, API SUBJECT TO CHANGE
Handles target transformation operations that do not need explicit inversion.
In case the new target is required during predict, creates a vector of NA.
Works similarly to PipeOpTargetTrafo and PipeOpTargetMutate, but forgoes the
inversion step.
In case target after the trafo is a factor, levels are saved to $state.
During prediction: Sets all target values to NA before calling the trafo again.
In case target after the trafo is a factor, levels saved in the state are
set during prediction.
As a special case when trafo is identity and new_target_name matches an existing column
name of the data of the input Task, this column is set as the new target. Depending on
drop_original_target the original target is then either dropped or added to the features.
Abstract R6Class inheriting from PipeOp.
PipeOpUpdateTarget$new(id, param_set = ps(), param_vals = list(), packages = character(0))
id :: character(1)
Identifier of resulting object. See $id slot of PipeOp.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set.
The subclass should have its own param_vals parameter and pass it on to super$initialize().
Default list().
The parameters are the parameters inherited from PipeOpTargetTrafo, as well as:
trafo :: function
Transformation function for the target. Should only be a function of the target, i.e., taking a
single argument. Default is identity.
Note that the data passed to trafo is a data.table consisting of all target columns.
new_target_name :: character(1)
Optionally give the transformed target a new name. By default the original name is used.
new_task_type :: character(1)
Optionally a new task type can be set. Legal types are listed in
mlr_reflections$task_types$type.
drop_original_target :: logical(1)
Whether to drop the original target column. Default: TRUE.
The $state is a list of class levels for each target after trafo.
list() if none of the targets have levels.
Only fields inherited from PipeOp.
Only methods inherited from PipeOp.
https://mlr-org.com/pipeops.html
Other mlr3pipelines backend related:
Graph,
PipeOp,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_graphs,
mlr_pipeops
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
## Not run:
# Create a binary class task from iris
library(mlr3)
library(mlr3pipelines)
trafo_fun = function(x) {factor(ifelse(x$Species == "setosa", "setosa", "other"))}
po = PipeOpUpdateTarget$new(param_vals = list(trafo = trafo_fun, new_target_name = "setosa"))
po$train(list(tsk("iris")))
po$predict(list(tsk("iris")))
## End(Not run)
Provides an interface to the vtreat package.
PipeOpVtreat naturally works for classification tasks and regression tasks.
Internally, PipeOpVtreat follows the fit/prepare interface of vtreat, i.e., first creating a data treatment transform object via
vtreat::NumericOutcomeTreatment(), vtreat::BinomialOutcomeTreatment(), or vtreat::MultinomialOutcomeTreatment(), followed by calling
vtreat::fit_prepare() on the training data and vtreat::prepare() during prediction.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpVtreat$new(id = "vtreat", param_vals = list())
id :: character(1)
Identifier of resulting object, default "vtreat".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a
TaskSupervised is used as input and output during training and prediction.
The output is the input Task with all affected features "prepared" by vtreat.
If vtreat found "no usable vars", the input Task is returned unaltered.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
treatment_plan :: object of class vtreat_pipe_step | NULL
The treatment plan as constructed by vtreat based on the training data, i.e., an object of class treatment_plan.
If vtreat found "no usable vars" and designing the treatment would have failed, this is NULL.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
recommended :: logical(1)
Whether only the "recommended" prepared features should be returned, i.e., non constant
variables with a significance value smaller than vtreat's threshold. Initialized to TRUE.
cols_to_copy :: function | Selector
Selector function, takes a Task as argument and returns a character() of features to copy.
See Selector for example functions. Initialized to selector_none().
minFraction :: numeric(1)
Minimum frequency a categorical level must have to be converted to an indicator column.
smFactor :: numeric(1)
Smoothing factor for impact coding models.
rareCount :: integer(1)
Allow levels with this count or below to be pooled into a shared rare-level.
rareSig :: numeric(1)
Suppress levels from pooling at this significance value or greater.
collarProb :: numeric(1)
What fraction of the data (pseudo-probability) to collar data at if doCollar = TRUE.
doCollar :: logical(1)
If TRUE collar numeric variables by cutting off after a tail-probability specified by collarProb during treatment design.
codeRestriction :: character()
What types of variables to produce.
customCoders :: named list
Map from code names to custom categorical variable encoding functions.
splitFunction :: function
Function taking arguments nSplits, nRows, dframe, and y; returning a user desired split.
ncross :: integer(1)
Integer larger than one, number of cross-validation rounds to design.
forceSplit :: logical(1)
If TRUE force cross-validated significance calculations on all variables.
catScaling :: logical(1)
If TRUE use stats::glm() linkspace, if FALSE use stats::lm() for scaling.
verbose :: logical(1)
If TRUE print progress.
use_parallel :: logical(1)
If TRUE use parallel methods.
missingness_imputation :: function
Function of signature f(values: numeric, weights: numeric), simple missing value imputer.
Typically, an imputation via a PipeOp should be preferred, see PipeOpImpute.
pruneSig :: numeric(1)
Suppress variables with significance above this level.
Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
scale :: logical(1)
If TRUE replace numeric variables with single variable model regressions ("move to outcome-scale").
These have mean zero and (for variables with significance less than 1) slope 1 when regressed (lm for regression problems/glm for classification problems) against outcome.
varRestriction :: list()
List of treated variable names to restrict to.
Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
trackedValues :: named list()
Named list mapping variables to known values, allows warnings upon novel level appearances (see vtreat::track_values()).
Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
y_dependent_treatments :: character()
Character vector of treatment types to build per outcome level.
Only affects multiclass classification tasks.
imputation_map :: named list
Map from column names to functions of signature f(values: numeric, weights: numeric), simple missing value imputers.
Typically, an imputation via a PipeOp is to be preferred, see PipeOpImpute.
For more information, see vtreat::regression_parameters(), vtreat::classification_parameters(), or vtreat::multinomial_parameters().
Follows vtreat's fit/prepare interface. See vtreat::NumericOutcomeTreatment(), vtreat::BinomialOutcomeTreatment(),
vtreat::MultinomialOutcomeTreatment(), vtreat::fit_prepare() and vtreat::prepare().
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_yeojohnson
library("mlr3")

set.seed(2020)

make_data <- function(nrows) {
  d <- data.frame(x = 5 * rnorm(nrows))
  d["y"] = sin(d[["x"]]) + 0.01 * d[["x"]] + 0.1 * rnorm(nrows)
  d[4:10, "x"] = NA  # introduce NAs
  d["xc"] = paste0("level_", 5 * round(d$y / 5, 1))
  d["x2"] = rnorm(nrows)
  d[d["xc"] == "level_-1", "xc"] = NA  # introduce a NA level
  return(d)
}

task = TaskRegr$new("vtreat_regr", backend = make_data(100), target = "y")

pop = PipeOpVtreat$new()
pop$train(list(task))
Conducts a Yeo-Johnson transformation on numeric features. To do so, it estimates
the optimal value of lambda for the transformation.
See bestNormalize::yeojohnson() for details.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpYeoJohnson$new(id = "yeojohnson", param_vals = list())
id :: character(1)
Identifier of resulting object, default "yeojohnson".
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric features replaced by their transformed versions.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc,
as well as a list of class yeojohnson for each column that is transformed.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
eps :: numeric(1)
Tolerance parameter to identify the lambda parameter as zero.
For details see yeojohnson().
standardize :: logical
Whether to center and scale the transformed values to attempt a standard
normal distribution. For details see yeojohnson().
lower :: numeric(1)
Lower value for estimation of lambda parameter.
For details see yeojohnson().
upper :: numeric(1)
Upper value for estimation of lambda parameter.
For details see yeojohnson().
Uses the bestNormalize::yeojohnson function.
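As a rough illustration of what is being estimated, the Yeo-Johnson transformation for a fixed lambda can be sketched in base R. The helper name yeo_johnson_sketch is hypothetical and is not part of bestNormalize or mlr3pipelines. It leaves out lambda estimation, standardization and missing-value handling, and its eps argument only mimics the role of the eps parameter described above.

```r
# Hypothetical helper: Yeo-Johnson transformation for a given lambda (no NAs allowed).
yeo_johnson_sketch <- function(x, lambda, eps = 0.001) {
  stopifnot(is.numeric(x), !anyNA(x))
  out <- numeric(length(x))
  nonneg <- x >= 0
  # Non-negative values: Box-Cox-like branch applied to x + 1
  if (abs(lambda) < eps) {
    out[nonneg] <- log1p(x[nonneg])
  } else {
    out[nonneg] <- ((x[nonneg] + 1)^lambda - 1) / lambda
  }
  # Negative values: mirrored branch with exponent 2 - lambda
  if (abs(lambda - 2) < eps) {
    out[!nonneg] <- -log1p(-x[!nonneg])
  } else {
    out[!nonneg] <- -((1 - x[!nonneg])^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

x <- c(-2, -0.5, 0, 0.5, 2)
yj_identity <- yeo_johnson_sketch(x, lambda = 1)  # lambda = 1 leaves the values unchanged
yj_log <- yeo_johnson_sketch(x, lambda = 0)       # log(x + 1) on the non-negative part
```

The PipeOp itself delegates to bestNormalize::yeojohnson(), which additionally searches for lambda between lower and upper.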
Only fields inherited from PipeOp.
Only methods inherited from PipeOpTaskPreproc/PipeOp.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat
library("mlr3")

task = tsk("iris")
pop = po("yeojohnson")

task$data()
pop$train(list(task))[[1]]$data()

pop$state
Housing Data for 506 Census Tracts of Boston
R6Class object inheriting from TaskRegr.
The BostonHousing2 dataset, containing the corrected data from
Freeman AM III (1979).
“The Hedonic Price Approach to Measuring Demand for Neighborhood Characteristics.”
In The Economics of Neighborhood, 191–217.
Elsevier.
doi:10.1016/B978-0-12-636250-3.50015-5,
as provided by the mlbench package. See the data description there.
A Multiplicity class S3 object.
The function of multiplicities is to indicate that PipeOps should be executed
multiple times with multiple values.
A Multiplicity is a container, like a
list(), that contains multiple values. If the message that is passed along the
edge of a Graph is a Multiplicity-object, then the PipeOp that receives
this object will usually be called once for each contained value. The result of
each of these calls is then, again, packed in a Multiplicity and sent along the
outgoing edge(s) of that PipeOp. This means that a Multiplicity can cause
multiple PipeOps in a row to be run multiple times, where the run for each element
of the Multiplicity is independent from the others.
Most PipeOps only return a Multiplicity if their input was a Multiplicity
(and after having run their code multiple times, once for each entry). However,
there are a few special PipeOps that are "aware" of Multiplicity objects. These
may either create a Multiplicity even though not having a Multiplicity input
(e.g. PipeOpReplicate or PipeOpOVRSplit) – causing the subsequent PipeOps
to be run multiple times – or collect a Multiplicity, being called only once
even though their input is a Multiplicity (e.g. PipeOpOVRUnite or PipeOpFeatureUnion
if constructed with the collect_multiplicity argument set to TRUE). The combination
of these mechanisms makes it possible for parts of a Graph to be called variably
many times if "sandwiched" between Multiplicity creating and collecting PipeOps.
Whether a PipeOp creates or collects a Multiplicity is indicated by the $input
or $output slot (which indicate names and types of in/out channels). If the train and
predict types of an input or output are surrounded by square brackets ("[", "]"), then
this channel handles a Multiplicity explicitly. Depending on the function of the PipeOp,
it will usually collect (input channel) or create (output channel) a Multiplicity.
PipeOps without this indicator are Multiplicity agnostic and blindly execute their
function multiple times when given a Multiplicity.
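The bracket notation can be inspected directly on constructed PipeOps. The following is a minimal sketch that assumes mlr3pipelines is installed; the choice of po("pca"), po("replicate") and po("ovrunite") as representatives is only for illustration.

```r
library("mlr3pipelines")

# Multiplicity-agnostic PipeOp: plain type names, no brackets
agnostic_input <- po("pca")$input

# Multiplicity-creating PipeOp: bracketed types on the output channel
creating_output <- po("replicate")$output

# Multiplicity-collecting PipeOp: bracketed types on the input channel
collecting_input <- po("ovrunite")$input

print(agnostic_input)
print(creating_output)
print(collecting_input)
```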
If a PipeOp is trained on a Multiplicity, the $state slot is set to a Multiplicity
as well; this Multiplicity contains the "original" $state resulting from each individual
call of the PipeOp with the input Multiplicity's content. If a PipeOp was trained
with a Multiplicity, then the predict() argument must be a Multiplicity with the same
number of elements.
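The behavior described above can be tried out on a small scale. This sketch assumes mlr3 and mlr3pipelines are installed; the variable names are illustrative. It replicates a Task three times and passes the resulting Multiplicity to a Multiplicity-agnostic PipeOp.

```r
library("mlr3")
library("mlr3pipelines")

# A Multiplicity can hold arbitrary values, much like a list
m <- Multiplicity(1, "a", TRUE)

# po("replicate") creates a Multiplicity from a single input
task <- tsk("iris")
replicated <- po("replicate", reps = 3)$train(list(task))[[1]]

# po("scale") is Multiplicity-agnostic: it is run once per element,
# and both its output and its $state become Multiplicities
scale_op <- po("scale")
scaled <- scale_op$train(list(replicated))[[1]]

# Prediction needs a Multiplicity with the same number of elements
predicted <- scale_op$predict(list(replicated))[[1]]
```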
Multiplicity(...)
...
The values to be contained in the Multiplicity.
Other Special Graph Messages:
NO_OP
Other Experimental Features:
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_replicate
Other Multiplicity PipeOps:
PipeOpEnsemble,
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Special data type for no-ops. Distinct from NULL for easier debugging
and distinction from unintentional NULL returns.
NO_OP
R6 object.
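A short sketch of how NO_OP differs from NULL, using the is_noop() and filter_noop() helpers listed below. It assumes mlr3pipelines is installed; the list contents are made up for illustration.

```r
library("mlr3pipelines")

# NO_OP is recognized as a no-op, NULL is not
noop_check <- is_noop(NO_OP)
null_check <- is_noop(NULL)

# filter_noop() drops NO_OP entries from a list of messages
messages <- list(a = 1, b = NO_OP, c = "text")
kept <- filter_noop(messages)
```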
Other Path Branching:
filter_noop(),
is_noop(),
mlr_pipeops_branch,
mlr_pipeops_unbranch
Other Special Graph Messages:
Multiplicity()
A PipeOp represents a transformation of a given "input" into a given "output", with two stages: "training"
and "prediction". It can be understood as a generalized function that not only has multiple inputs, but
also multiple outputs (as well as two stages). The "training" stage is used when training a machine learning pipeline or
fitting a statistical model, and the "predicting" stage is then used for making predictions on new data.
To perform training, the $train() function is called which takes inputs and transforms them, while simultaneously storing information
in its $state slot. For prediction, the $predict() function is called, where the $state information can be used to influence the transformation
of the new data.
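The two stages can be sketched with a concrete PipeOp. This example assumes mlr3 and mlr3pipelines are installed and uses po("scale") purely as an illustration; the split into the first 100 and last 50 rows is arbitrary.

```r
library("mlr3")
library("mlr3pipelines")

task <- tsk("iris")
train_task <- task$clone()$filter(1:100)
new_task <- task$clone()$filter(101:150)

op <- po("scale")
trained_before <- op$is_trained  # no $state yet

# Training transforms the input and stores what was learned in $state
train_out <- op$train(list(train_task))[[1]]
state_names <- names(op$state)

# Prediction reuses the stored $state on new data
predict_out <- op$predict(list(new_task))[[1]]
```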
A PipeOp is usually used in a Graph object, a representation of a computational graph. It can have
multiple input channels (think of these as multiple arguments to a function, for example when averaging
different models) and multiple output channels (a transformation may
return different objects, for example different subsets of a Task). The purpose of the Graph is to
connect different outputs of some PipeOps to inputs of other PipeOps.
Input and output channel information of a PipeOp is defined in the $input and $output slots; each channel has a name, a required
type during training, and a required type during prediction. The $train() and $predict() functions are called with a list argument
that has one entry for each declared channel (with one exception, see next paragraph). The list is automatically type-checked
for each channel against $input and then passed on to the private$.train() or private$.predict() functions. There the data is processed and
a result list is created. This list is again type-checked for declared output types of each channel. The length and types of the result
list are as declared in $output.
A special input channel name is "...", which creates a vararg channel that takes arbitrarily many arguments, all of the same type. If the $input
table contains an "..."-entry, then the input given to $train() and $predict() may be longer than the number of declared input channels.
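Channel declarations and the vararg channel can be inspected on existing PipeOps. The sketch below assumes mlr3 and mlr3pipelines are installed and uses po("featureunion") as an example of a PipeOp with a "..." channel; the column subsets are arbitrary.

```r
library("mlr3")
library("mlr3pipelines")

union_op <- po("featureunion")
print(union_op$input)   # contains a "..." vararg row
print(union_op$output)

# The vararg channel accepts as many inputs of the declared type as needed
task <- tsk("iris")
sepal <- task$clone()$select(c("Sepal.Length", "Sepal.Width"))
petal <- task$clone()$select(c("Petal.Length", "Petal.Width"))
combined <- union_op$train(list(sepal, petal))[[1]]

# Inputs are type-checked against $input before private$.train() is reached
type_error <- tryCatch(
  po("scale")$train(list("not a task")),
  error = function(e) e
)
```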
This class is an abstract base class that all PipeOps being used in a Graph should inherit from, and
is not intended to be instantiated.
Abstract R6Class.
PipeOp$new(id, param_set = ps(), param_vals = list(), input, output, packages = character(0), tags = character(0))
id :: character(1)
Identifier of resulting object. See $id slot.
param_set :: ParamSet | list of expression
Parameter space description. This should be created by the subclass and given to super$initialize().
If this is a ParamSet, it is used as the PipeOp's ParamSet
directly. Otherwise it must be a list of expressions e.g. created by alist() that evaluate to ParamSets.
These ParamSets are combined using a ParamSetCollection.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The
subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
input :: data.table with columns name (character), train (character), predict (character)
Sets the $input slot of the resulting object; see description there.
output :: data.table with columns name (character), train (character), predict (character)
Sets the $output slot of the resulting object; see description there.
packages :: character
Set of all required packages for the PipeOp's $train and $predict methods. See $packages slot.
Default is character(0).
tags :: character
A set of tags associated with the PipeOp. Tags describe a PipeOp's purpose.
Can be used to filter as.data.table(mlr_pipeops). Default is "abstract", indicating an abstract PipeOp.
PipeOp is an abstract class with abstract functions private$.train() and private$.predict(). To create a functional
PipeOp class, these two methods must be implemented. Each of these functions receives a named list according to
the PipeOp's input channels, and must return a list (names are ignored) with values in the order of output
channels in $output. The private$.train() and private$.predict() functions should not be called by the user;
instead, $train() and $predict() should be used. The most convenient usage is to add the PipeOp
to a Graph (possibly as singleton in that Graph) and use the Graph's $train() / $predict() methods.
private$.train() and private$.predict() should treat their inputs as read-only. If they are R6 objects,
they should be cloned before being manipulated in-place. Objects, or parts of objects, that are not changed, do
not need to be cloned, and it is legal to return the same identical-by-reference objects to multiple outputs.
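Using a PipeOp through a Graph can be sketched as follows. The example assumes mlr3 and mlr3pipelines are installed; po("scale") and po("pca") are arbitrary choices.

```r
library("mlr3")
library("mlr3pipelines")

task <- tsk("iris")

# A single PipeOp wrapped as a singleton Graph
single <- as_graph(po("scale"))
single_out <- single$train(task)

# Several PipeOps connected with %>>%
graph <- po("scale") %>>% po("pca")
train_out <- graph$train(task)
predict_out <- graph$predict(task)

# Results are lists named after the Graph's output channels
print(names(train_out))
```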
id :: character
ID of the PipeOp. IDs are user-configurable, and IDs of PipeOps must be unique within a Graph. IDs of
PipeOps must not be changed once they are part of a Graph; instead, the Graph's $set_names() method
should be used.
packages :: character
Packages required for the PipeOp. Functions that are not in base R should still be called using ::
(or explicitly attached using require()) in private$.train() and private$.predict(), but
packages declared here are checked before any (possibly expensive) processing has started within a Graph.
param_set :: ParamSet
Parameters and parameter constraints. Parameter values that influence the functioning of $train and / or $predict are
in the $param_set$values slot; these are automatically checked against parameter constraints in $param_set.
state :: any | NULL
Method-dependent state obtained during training step, and usually required for the prediction step. This is NULL
if and only if the PipeOp has not been trained. The $state is the only slot that can be reliably modified during
$train(), because private$.train() may theoretically be executed in a different R-session (e.g. for parallelization).
$state should furthermore always be set to something with copy-semantics, since it is never cloned. This is a limitation
not of PipeOp or mlr3pipelines, but of the way the system as a whole works, together with GraphLearner and mlr3.
input :: data.table with columns name (character), train (character), predict (character)
Input channels of PipeOp. Column name gives the names (and order) of values in the list given to
$train() and $predict(). Column train is the (S3) class that an input object must conform to during
training, column predict is the (S3) class that an input object must conform to during prediction. Types
are checked by the PipeOp itself and do not need to be checked by private$.train() / private$.predict() code.
A special name is "...", which creates a vararg input channel that accepts a variable number of inputs.
If a row has both train and predict values enclosed by square brackets ("[", "]"), then this channel is
Multiplicity-aware. If the PipeOp receives a Multiplicity value on these channels, this Multiplicity
is given to the .train() and .predict() functions directly. Otherwise, the Multiplicity is transparently
unpacked and the .train() and .predict() functions are called multiple times, once for each Multiplicity element.
The type enclosed by square brackets indicates that only a Multiplicity containing values of this type are accepted.
See Multiplicity for more information.
output :: data.table with columns name (character), train (character), predict (character)
Output channels of PipeOp, in the order in which they will be given in the list returned by $train and
$predict functions. Column train is the (S3) class that an output object must conform to during training,
column predict is the (S3) class that an output object must conform to during prediction. The PipeOp checks
values returned by private$.train() and private$.predict() against these types specifications.
If a row has both train and predict values enclosed by square brackets ("[", "]"), then this signals that the channel
emits a Multiplicity of the indicated type. See Multiplicity for more information.
innum :: numeric(1)
Number of input channels. This equals nrow($input).
outnum :: numeric(1)
Number of output channels. This equals nrow($output).
is_trained :: logical(1)
Indicate whether the PipeOp was already trained and can therefore be used for prediction.
tags :: character
A set of tags associated with the PipeOp. Tags describe a PipeOp's purpose.
Can be used to filter as.data.table(mlr_pipeops).
PipeOp tags are inherited and child classes can introduce additional tags.
hash :: character(1)
Checksum calculated on the PipeOp, depending on the PipeOp's class and the slots $id and $param_set$values. If a
PipeOp's functionality may change depending on more than these values, it should inherit the $hash active
binding and calculate the hash as digest(list(super$hash, <OTHER THINGS>), algo = "xxhash64").
phash :: character(1)
Checksum calculated on the PipeOp, depending on the PipeOp's class and the slot $id, but ignoring $param_set$values. If a
PipeOp's functionality may change depending on more than these values, it should inherit the $hash active
binding and calculate the hash as digest(list(super$hash, <OTHER THINGS>), algo = "xxhash64").
.result :: list
If the Graph's $keep_results flag is set to TRUE, then the intermediate Results of $train() and $predict()
are saved to this slot, exactly as they are returned by these functions. This is mainly for debugging purposes
and done, if requested, by the Graph backend itself; it should not be done explicitly by private$.train() or private$.predict().
man :: character(1)
Identifying string of the help page that shows with help().
label :: character(1)
Description of the PipeOp's functionality. Derived from the title of its help page.
properties :: character()
The properties of the PipeOp.
Currently supported values are:
"validation": the PipeOp can make use of the $internal_valid_task of an mlr3::Task.
This is for example used for PipeOpLearners that wrap a Learner with this property, see mlr3::Learner.
PipeOps that have this property also have a $validate field, which controls whether to use the validation task,
as well as a $internal_valid_scores field, which allows access to the internal validation scores after training.
"internal_tuning": the PipeOp is able to internally optimize hyperparameters.
This works analogously to the internal tuning implementation for mlr3::Learner.
PipeOps with that property also implement the standardized accessor $internal_tuned_values and have at least one
parameter tagged with "internal_tuning".
An example for such a PipeOp is a PipeOpLearner that wraps a Learner with the "internal_tuning" property.
Programmatic access to all available properties is possible via mlr_reflections$pipeops$properties.
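Several of the fields above can be explored interactively. The following sketch assumes mlr3pipelines is installed and uses po("scale") as an arbitrary example; the new ID "scale_renamed" is made up.

```r
library("mlr3pipelines")

a <- po("scale")
b <- po("scale")

# Identical class, id and parameter values give identical hashes
same_hash <- a$hash == b$hash

# Changing a parameter value changes $hash but leaves $phash untouched
b$param_set$values$center <- FALSE
hash_changed <- a$hash != b$hash
phash_same <- a$phash == b$phash

# Parameter values are checked against the constraints in $param_set
invalid <- tryCatch({
  b$param_set$values$center <- "yes"
  NULL
}, error = function(e) e)

# Inside a Graph, IDs are changed through the Graph's $set_names()
graph <- as_graph(po("scale"))
graph$set_names("scale", "scale_renamed")
graph_ids <- graph$ids()

# Channel counts and training status
print(c(innum = a$innum, outnum = a$outnum, is_trained = a$is_trained))
```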
print()
() -> NULL
Prints the PipeOp's most salient information: $id, $is_trained, $param_set$values, $input and $output.
help(help_type)
(character(1)) -> help file
Displays the help file of the concrete PipeOp instance. help_type is one of "text", "html", "pdf" and behaves
as the help_type argument of R's help().
The following public $train() and $predict() methods are the primary user-facing functions intended for direct use:
train(input)
(list) -> named list
Train PipeOp on inputs, transform it to output and store the learned $state. If the PipeOp is already
trained, the existing $state is overwritten. Input list is typechecked against the $input train column.
Return value is a list with as many entries as $output has rows, with each entry named after the $output name
column and class according to the $output train column.
The workhorse function for training each PipeOp is the private$.train() function.
predict(input)
(list) -> named list
Predict on new data in input, possibly using the stored $state. Input and output are specified by $input and $output
in the same way as for $train(), except that the predict column is used for type checking.
The workhorse function for predicting in each PipeOp is the private$.predict() function.
To implement a PipeOp the following abstract private functions should be overloaded in the inheriting PipeOp.
Note that these should not be called by a user; instead the public $train() and $predict() method should be used.
.train(input)
(named list) -> list
Abstract function that must be implemented by concrete subclasses. private$.train() is called by $train() after
typechecking. It must change the $state value to something non-NULL and return a list of transformed data
according to the $output train column. Names of the returned list are ignored.
.predict(input)
(named list) -> list
Abstract function that must be implemented by concrete subclasses. private$.predict() is called by $predict()
after typechecking and works analogously to private$.train(). Unlike private$.train(), private$.predict()
should not modify the PipeOp in any way.
To create your own PipeOp, you need to overload the private$.train() and private$.predict() functions.
It is most likely also necessary to overload the $initialize() function to do additional initialization.
The $initialize() method should have at least the arguments id and param_vals, which should be passed on to super$initialize() unchanged.
id should have a useful default value, and param_vals should have the default value list(), meaning no initialization of hyperparameters.
If the $initialize() method has more arguments, then it is necessary to also overload the private$.additional_phash_input() function.
This function should return either all objects, or a hash of all objects, that can change the function or behavior of the PipeOp and are independent
of the class, the id, the $state, and the $param_set$values. The last point is particularly important: changing the $param_set$values should
not change the return value of private$.additional_phash_input().
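The following sketch illustrates the point about additional constructor arguments. PipeOpAddConstant and its constant argument are hypothetical and exist only for this example; mlr3pipelines, R6 and data.table are assumed to be installed.

```r
library("mlr3pipelines")

# Hypothetical PipeOp with an extra construction argument 'constant'
PipeOpAddConstant <- R6::R6Class("PipeOpAddConstant",
  inherit = PipeOp,
  public = list(
    initialize = function(constant, id = "addconstant", param_vals = list()) {
      private$.constant <- constant
      # id and param_vals are passed on to super$initialize() unchanged
      super$initialize(id, param_vals = param_vals,
        input = data.table::data.table(name = "input",
          train = "numeric", predict = "numeric"),
        output = data.table::data.table(name = "output",
          train = "numeric", predict = "numeric")
      )
    }
  ),
  private = list(
    .constant = NULL,
    .train = function(input) {
      self$state <- list()  # must be non-NULL after training
      list(input[[1]] + private$.constant)
    },
    .predict = function(input) {
      list(input[[1]] + private$.constant)
    },
    # 'constant' changes behavior but is neither id nor a hyperparameter,
    # so it has to enter the hash through this function
    .additional_phash_input = function() private$.constant
  )
)

add_one <- PipeOpAddConstant$new(constant = 1)
add_two <- PipeOpAddConstant$new(constant = 2)
phash_differs <- add_one$phash != add_two$phash
train_result <- add_one$train(list(5))[[1]]
```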
When you are implementing a PipeOp that operates on a task (and is not a PipeOpTaskPreproc), you also need to handle the
$internal_valid_task field of the input task, if there is one.
https://mlr-org.com/pipeops.html
Other mlr3pipelines backend related:
Graph,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_graphs,
mlr_pipeops,
mlr_pipeops_updatetarget
Other PipeOps:
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
# example (bogus) PipeOp that returns the sum of two numbers during $train()
# as well as a letter of the alphabet corresponding to that sum during $predict().

PipeOpSumLetter = R6::R6Class("sumletter",
  inherit = PipeOp,  # inherit from PipeOp
  public = list(
    initialize = function(id = "posum", param_vals = list()) {
      super$initialize(id, param_vals = param_vals,
        # declare "input" and "output" during construction here
        # training takes two 'numeric' and returns a 'numeric';
        # prediction takes 'NULL' and returns a 'character'.
        input = data.table::data.table(name = c("input1", "input2"),
          train = "numeric", predict = "NULL"),
        output = data.table::data.table(name = "output",
          train = "numeric", predict = "character")
      )
    }
  ),
  private = list(
    # PipeOp deriving classes must implement .train and
    # .predict; each taking an input list and returning
    # a list as output.
    .train = function(input) {
      sum = input[[1]] + input[[2]]
      self$state = sum
      list(sum)
    },
    .predict = function(input) {
      list(letters[self$state])
    }
  )
)
posum = PipeOpSumLetter$new()
print(posum)
posum$train(list(1, 2))
# note the name 'output' is the name of the output channel specified
# in the $output data.table.
posum$predict(list(NULL, NULL))
Abstract base class for piecewise linear encoding.
Piecewise linear encoding works by splitting values of features into distinct bins, through an algorithm implemented
in private$.get_bins(), and then creating new feature columns through a continuous alternative to one-hot encoding.
Here, one new feature per bin is constructed, with values being either
0, if the original value was below the lower bin boundary,
1, if the original value was above or equal to the upper bin boundary, or
a scaled value between 0 and 1, if the original value was inside the bin boundaries. Scaling is done by
offsetting the original value by the lower bin boundary and dividing by the bin width.
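The encoding rule above can be sketched in base R for a single feature. The function name encode_pl_sketch and the bin boundaries are hypothetical; this is not the package's internal implementation, which may differ in details such as the handling of values outside the outermost boundaries.

```r
# Hypothetical helper: piecewise linear encoding of one numeric vector,
# given sorted bin boundaries b_0 < b_1 < ... < b_k (k bins).
encode_pl_sketch <- function(x, boundaries) {
  lower <- head(boundaries, -1)
  upper <- tail(boundaries, -1)
  encoded <- vapply(seq_along(lower), function(i) {
    # offset by the lower boundary, divide by the bin width, clamp to [0, 1]
    pmin(pmax((x - lower[i]) / (upper[i] - lower[i]), 0), 1)
  }, numeric(length(x)))
  matrix(encoded, nrow = length(x),
    dimnames = list(NULL, paste0("bin", seq_along(lower))))
}

# Three bins: [0, 1], [1, 2], [2, 3]
pl_encoded <- encode_pl_sketch(c(0.5, 1.5, 3), boundaries = c(0, 1, 2, 3))
print(pl_encoded)
```

Each row has ones for all bins lying entirely below the value, a fractional entry for the bin containing it, and zeros for the bins above.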
PipeOps inheriting from this encode columns of type numeric and integer. Use the PipeOpTaskPreproc
$affect_columns functionality to only encode a subset of columns, or only encode columns of a certain type, etc.
Abstract R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpEncodePL$new(id = "encodepl", param_set = ps(), param_vals = list(), packages = character(0), task_type = "Task")
id :: character(1)
Identifier of resulting object. See $id slot of PipeOp.
param_set :: ParamSet
Parameter space description. This should be created by the subclass and given to super$initialize().
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The
subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
packages :: character
Set of all required packages for the PipeOp's private$.train() and private$.predict() methods. See $packages slot.
Default is character(0).
task_type :: character(1)
The class of Task that should be accepted as input and will be returned as output. This
should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or
"TaskRegr" (or another subclass introduced by other packages). Default is "Task".
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected numeric and integer columns encoded using piecewise linear encoding.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
bins :: named list
Named list of numeric vectors. Each element corresponds to and is named after one of the affected feature columns
and contains the bin boundaries derived through private$.get_bins().
The parameters are the parameters inherited from PipeOpTaskPreproc.
PipeOpEncodePL is an abstract class inheriting from PipeOpTaskPreprocSimple that allows easier implementation
of different binning algorithms for piecewise linear encoding. The respective binning algorithm should be implemented
as private$.get_bins().
Only fields inherited from PipeOp.
Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp as well as
.get_bins(task, cols)
(Task, character) -> named list
Abstract method for splitting the value range of a feature column into distinct bins. The argument cols should
give the names of the feature columns of the task for which bins should be derived. Returns a named list of
numeric vectors containing the bin boundaries for each affected feature column, named by that corresponding feature
column.
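A subclass only needs to supply the binning algorithm. The sketch below defines a hypothetical PipeOpEncodePLEqualWidth that splits each feature's range into three equal-width bins; the class name, its id and the fixed number of bins are assumptions made for illustration, and mlr3 and mlr3pipelines are assumed to be installed.

```r
library("mlr3")
library("mlr3pipelines")

# Hypothetical subclass: equal-width binning for piecewise linear encoding
PipeOpEncodePLEqualWidth <- R6::R6Class("PipeOpEncodePLEqualWidth",
  inherit = PipeOpEncodePL,
  public = list(
    initialize = function(id = "encodeplequalwidth", param_vals = list()) {
      super$initialize(id, param_vals = param_vals)
    }
  ),
  private = list(
    .get_bins = function(task, cols) {
      # one numeric vector of 4 boundaries (3 bins) per affected column,
      # named after that column
      lapply(task$data(cols = cols), function(values) {
        seq(min(values), max(values), length.out = 4)
      })
    }
  )
)

op <- PipeOpEncodePLEqualWidth$new()
encoded_task <- op$train(list(tsk("iris")))[[1]]
bins <- op$state$bins
```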
Gorishniy Y, Rubachev I, Babenko A (2022). “On Embeddings for Numerical Features in Tabular Deep Learning.” In Advances in Neural Information Processing Systems, volume 35, 24991–25004. https://proceedings.neurips.cc/paper_files/paper/2022/hash/9e9f0ffc3d836836ca96cbf8fe14b105-Abstract-Conference.html.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Piecewise Linear Encoding PipeOps:
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree
Parent class for PipeOps that aggregate predictions. Implements the private$.train() and private$.predict() methods necessary
for a PipeOp and requires deriving classes to create the private$weighted_avg_predictions() function.
Abstract R6Class inheriting from PipeOp.
Note: This object is typically constructed via a derived class, e.g. PipeOpClassifAvg or PipeOpRegrAvg.
PipeOpEnsemble$new(innum = 0, collect_multiplicity = FALSE, id, param_set = ps(), param_vals = list(), packages = character(0), prediction_type = "Prediction")
innum :: numeric(1)
Determines the number of input channels.
If innum is 0 (default), a vararg input channel is created that can take an arbitrary number of inputs.
collect_multiplicity :: logical(1)
If TRUE, the input is a Multiplicity collecting channel. This means that a single
Multiplicity input is accepted instead of multiple normal inputs, and its members are aggregated. This requires innum to be 0.
Default is FALSE.
id :: character(1)
Identifier of the resulting object.
param_set :: ParamSet
("Hyper"-)Parameters in form of a ParamSet for the resulting PipeOp.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
packages :: character
Set of packages required for this PipeOp. These packages are loaded during $train() and $predict(), but not attached.
Default character(0).
prediction_type :: character(1)
The predict entry of the $input and $output type specifications.
Should be "Prediction" (default) or one of its subclasses, e.g. "PredictionClassif", and correspond to the type accepted by
private$.train() and private$.predict().
PipeOpEnsemble has multiple input channels depending on the innum construction argument, named "input1", "input2", ...
if innum is nonzero; if innum is 0, there is only one vararg input channel named "...".
All input channels take only NULL during training and take a Prediction during prediction.
PipeOpEnsemble has one output channel named "output", producing NULL during training and a Prediction during prediction.
The output during prediction is in some way a weighted averaged representation of the input.
The $state is left empty (list()).
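The channel layout can be inspected on a concrete subclass. This is a small sketch that assumes PipeOpRegrAvg (dictionary key "regravg") passes innum through to PipeOpEnsemble unchanged.

```r
library(mlr3pipelines)

# Fixed number of inputs: one named channel per input.
po_fixed = po("regravg", innum = 2)
# Default innum = 0: a single vararg channel.
po_vararg = po("regravg")

fixed_names = po_fixed$input$name
vararg_names = po_vararg$input$name
output_name = po_vararg$output$name

fixed_names
vararg_names
output_name
```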
weights :: numeric
Relative weights of input predictions. If this has length 1, it is ignored and all inputs are weighted equally. Otherwise it must have
length equal to the number of connected inputs. Initialized to 1 (equal weights).
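The weight semantics can be illustrated in base R. The function weighted_avg() is a hypothetical stand-in and not the package's implementation; in particular, normalizing the weights so that they sum to 1 is an assumption based on the word "relative" above.

```r
# Hypothetical helper: weighted average of several numeric prediction vectors.
weighted_avg = function(responses, weights = 1) {
  n = length(responses)
  # A single weight carries no relative information: weigh all inputs equally.
  if (length(weights) == 1) weights = rep(1, n)
  stopifnot(length(weights) == n)
  # Assumption: relative weights are normalized to sum to 1.
  weights = weights / sum(weights)
  Reduce(`+`, Map(`*`, responses, weights))
}

responses = list(c(1, 2), c(3, 6))
avg_equal = weighted_avg(responses)                      # 2 4
avg_weighted = weighted_avg(responses, weights = c(1, 3)) # 2.5 5.0
```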
The commonality of ensemble methods using PipeOpEnsemble is that they take a NULL-input during training and save an empty $state. They can be
used following a set of PipeOpLearner PipeOps to perform (possibly weighted) prediction averaging. See e.g.
PipeOpClassifAvg and PipeOpRegrAvg which both inherit from this class.
Should it be necessary to use the output of preceding Learners
during the "training" phase, then PipeOpEnsemble should not be used. In fact, if training time behaviour of a Learner is important, then
one should use a PipeOpLearnerCV instead of a PipeOpLearner, and the ensemble can be created with a Learner encapsulated by a PipeOpLearner.
See LearnerClassifAvg and LearnerRegrAvg for examples.
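A typical use is to place a derived class after several PipeOpLearner objects. The following sketch uses PipeOpRegrAvg; the choice of task, learners, and weights is arbitrary and only serves as an illustration.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("mtcars")

# Two learners run in parallel; their predictions are averaged with
# relative weights 1 and 3.
graph = gunion(list(
  po("learner", lrn("regr.featureless")),
  po("learner", lrn("regr.rpart"))
)) %>>% po("regravg", innum = 2, weights = c(1, 3))

# The whole Graph behaves like a single Learner.
glrn = as_learner(graph)
glrn$train(task)
pred = glrn$predict(task)
```

During training the averaging PipeOp only receives NULL from the learners, so no information about training predictions reaches it.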
Only fields inherited from PipeOp.
Methods inherited from PipeOp as well as:
weighted_avg_predictions(inputs, weights, row_ids, truth)
(list of Prediction, numeric, integer | character, list) -> NULL
Create Predictions that correspond to the weighted average of incoming Predictions. This is
called by private$.predict() with cleaned and sanity-checked values: inputs are guaranteed to fit together,
row_ids and truth are guaranteed to be the same as each one in inputs, and weights is guaranteed to have the same length as inputs.
This method is abstract, it must be implemented by deriving classes.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Multiplicity PipeOps:
Multiplicity(),
mlr_pipeops_classifavg,
mlr_pipeops_featureunion,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg,
mlr_pipeops_replicate
Other Ensembles:
mlr_learners_avg,
mlr_pipeops_classifavg,
mlr_pipeops_ovrunite,
mlr_pipeops_regravg
Abstract base class for feature imputation.
Abstract R6Class object inheriting from PipeOp.
PipeOpImpute$new(id, param_set = ps(), param_vals = list(), whole_task_dependent = FALSE, empty_level_control = "never", packages = character(0), task_type = "Task")
id :: character(1)
Identifier of resulting object. See $id slot of PipeOp.
param_set :: ParamSet
Parameter space description. This should be created by the subclass and given to super$initialize().
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The
subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
whole_task_dependent :: logical(1)
Whether the context_columns parameter should be added which lets the user limit the columns that are
used for imputation inference. This should generally be FALSE if imputation depends only on individual features
(e.g. mode imputation), and TRUE if imputation depends on other features as well (e.g. kNN-imputation).
empty_level_control :: character(1)
Control how to handle edge cases where NAs occur in factor or ordered features only during prediction but not
during training. Can be one of "never", "always", or "param":
If set to "never", no empty level is introduced during training, but columns that have missing values only
during prediction will not be imputed.
If set to "always", an unseen level is added to the feature during training and missing values are imputed as
that value during prediction.
Finally, if set to "param", the hyperparameter create_empty_level is added and control over this behavior is
left to the user.
For implementation details, see Internals below. Default is "never".
packages :: character
Set of all required packages for the PipeOp's private$.train() and private$.predict() methods. See $packages slot.
Default is character(0).
task_type :: character(1)
The class of Task that should be accepted as input and will be returned as output. This
should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or
"TaskRegr" (or another subclass introduced by other packages). Default is "Task".
feature_types :: character
Feature types affected by the PipeOp. See private$.select_cols() for more information.
PipeOpImpute has one input channel named "input", taking a Task, or a subclass of
Task if the task_type construction argument is given as such; both during training and prediction.
PipeOpImpute has one output channel named "output", producing a Task, or a subclass;
the Task type is the same as for input; both during training and prediction.
The output Task is the modified input Task with features imputed according to the private$.impute() function.
The $state is a named list; besides members added by inheriting classes, the members are:
affected_cols :: character
Names of features being selected by the affect_columns parameter.
context_cols :: character
Names of features being selected by the context_columns parameter.
intasklayout :: data.table
Copy of the training Task's $feature_types slot. This is used during prediction to ensure that
the prediction Task has the same features, feature layout, and feature types as during training.
outtasklayout :: data.table
Copy of the trained Task's $feature_types slot. This is used during prediction to ensure that
the Task resulting from the prediction operation has the same features, feature layout, and feature types as after training.
model :: named list
Model used for imputation. This is a list named by Task features, containing the result of the private$.train_imputer() or
private$.train_nullmodel() function for each one.
imputed_train :: character
Names of features that were imputed during training. This is used to ensure that factor levels that were added during training are also added during prediction.
Note that features that are imputed during prediction but not during training will still have inconsistent factor levels.
affect_columns :: function | Selector | NULL
What columns the PipeOpImpute should operate on.
The parameter must be a Selector function, which takes a Task as argument and returns a character
of features to use.
See Selector for example functions. Defaults to NULL, which selects all features.
context_columns :: function | Selector | NULL
What columns the PipeOpImpute imputation may depend on. This parameter is only present if the constructor is called with
the whole_task_dependent argument set to TRUE.
The parameter must be a Selector function, which takes a Task as argument and returns a character
of features to use.
See Selector for example functions. Defaults to NULL, which selects all features.
create_empty_level :: logical(1)
Whether an empty level should always be created for factor or ordered columns during training. If FALSE,
columns that had no NAs during training but have NAs during prediction will not be imputed. This parameter is
only present if the constructor is called with the empty_level_control argument set to "param".
Initialized to FALSE.
PipeOpImpute is an abstract class inheriting from PipeOp that makes implementing imputer PipeOps simple.
Internally, the construction argument empty_level_control and the hyperparameter create_empty_level (should it
exist) modify the private$.create_empty_level field. Behavior then depends on whether this field is set to TRUE
or FALSE and works by controlling for which cases imputation is performed on factor or ordered columns. Its
setting has no impact on columns of other types.
If private$.create_empty_level is set to TRUE, private$.impute() is called for all factor or ordered
columns during training, regardless of whether they have any missing values. For this to lead to the creation of an
empty level for columns with no missing values, inheriting PipeOps must implement private$.train_imputer() in
such a way that it returns the name of the level to be created for the feature types factor and ordered.
If private$.create_empty_level is set to FALSE, private$.impute() is not called during prediction for factor
or ordered columns which were not modified during training. This means that NAs will not be imputed for these
columns.
See PipeOpImputeOOR for a detailed explanation of why these controls are necessary.
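A subclass usually only needs to provide private$.train_imputer(). The class PipeOpImputeMax below is a hypothetical example, not part of the package; it assumes that the feature_types construction argument documented above can be passed to super$initialize(), and it relies on the default private$.impute() to fill in the stored value.

```r
library(mlr3)
library(mlr3pipelines)

# Hypothetical imputer: replace missing values by the column maximum.
PipeOpImputeMax = R6::R6Class("PipeOpImputeMax",
  inherit = PipeOpImpute,
  public = list(
    initialize = function(id = "imputemax", param_vals = list()) {
      # Assumption: feature_types restricts the imputer to numeric columns.
      super$initialize(id, param_vals = param_vals,
        feature_types = c("numeric", "integer"))
    }
  ),
  private = list(
    # Called once per affected feature that has at least one non-missing value.
    .train_imputer = function(feature, type, context) {
      max(feature, na.rm = TRUE)
    }
  )
)

task = tsk("pima")  # contains missing values in several numeric features
op = PipeOpImputeMax$new()
imputed = op$train(list(task))[[1]]
n_missing = sum(imputed$missings())
```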
Fields inherited from PipeOp.
Methods inherited from PipeOp, as well as:
.select_cols(task)
(Task) -> character
Selects which columns the PipeOp operates on. In contrast to
the affect_columns parameter, private$.select_cols() is for the inheriting class to determine which columns
the operator should function on, e.g. based on feature type, while affect_columns is a way for the user
to limit the columns that a PipeOpImpute should operate on.
This method can optionally be overloaded when inheriting PipeOpImpute.
If this method is not overloaded, it defaults to selecting the columns of the types indicated by the feature_types construction argument.
.train_imputer(feature, type, context)
(atomic, character(1), data.table) -> any
Abstract function that must be overloaded when inheriting.
Called once for each feature selected by affect_columns to create the model entry to be used for private$.impute(). This function
is only called for features with at least one non-missing value.
.train_nullmodel(feature, type, context)
(atomic, character(1), data.table) -> any
Like .train_imputer(), but only called for each feature that only contains missing values. This is not an abstract function
and, if not overloaded, gives a default response of 0 (integer, numeric), c(TRUE, FALSE) (logical), all available levels (factor/ordered),
or the empty string (character).
.impute(feature, type, model, context)
(atomic, character(1), any, data.table) -> atomic
Imputes the features. model is the model created by private$.train_imputer(). Default behaviour is to assume model is an atomic vector
from which values are sampled to impute missing values of feature. model may have an attribute probabilities for non-uniform sampling.
If model has length zero, feature is returned unchanged.
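The default behaviour described for private$.impute() can be approximated in base R. The function default_impute() is a hypothetical re-implementation for illustration only and may differ from the package's code in details.

```r
# Hypothetical sketch of the default imputation behaviour.
default_impute = function(feature, model) {
  # A zero-length model means there is nothing to impute with.
  if (!length(model)) return(feature)
  missing = is.na(feature)
  # Optional attribute for non-uniform sampling; NULL means uniform.
  probs = attr(model, "probabilities")
  # Sample indices instead of values, so a length-1 numeric model is
  # not misread by sample() as the range 1:model.
  idx = sample.int(length(model), sum(missing), replace = TRUE, prob = probs)
  feature[missing] = model[idx]
  feature
}

x = c(1, NA, 3)
filled = default_impute(x, model = 2)
unchanged = default_impute(x, model = numeric(0))
```

Here filled is c(1, 2, 3), because the model holds a single value, and unchanged is identical to x.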
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample
Base class for handling target transformation operations. Target transformations are different
from feature transformation because they have to be "inverted" after prediction. The
target is transformed during the training phase and information to invert this transformation
is sent along to PipeOpTargetInvert which then inverts this transformation during the
prediction phase. This inversion may need info about both the training and the prediction data.
Users can overload up to four private$-functions: .get_state() (optional), .transform() (mandatory),
.train_invert() (optional), and .invert() (mandatory).
Abstract R6Class inheriting from PipeOp.
PipeOpTargetTrafo$new(id, param_set = ps(), param_vals = list(), packages = character(0), task_type_in = "Task", task_type_out = task_type_in, tags = NULL)
id :: character(1)
Identifier of resulting object. See $id slot of PipeOp.
param_set :: ParamSet
Parameter space description. This should be created by the subclass and given to super$initialize().
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set.
The subclass should have its own param_vals parameter and pass it on to super$initialize().
Default list().
task_type_in :: character(1)
The class of Task that should be accepted as input. This should generally be a character(1)
identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass
introduced by other packages). Default is "Task".
task_type_out :: character(1)
The class of Task that is produced as output. This should generally be a character(1)
identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass
introduced by other packages). Default is the value of task_type_in.
packages :: character
Set of all required packages for the PipeOp's methods. See $packages slot. Default is
character(0).
tags :: character | NULL
Tags of the resulting PipeOp. This is added to the tag "target transform". Default NULL.
PipeOpTargetTrafo has one input channel named "input" taking a Task (or whatever class
was specified by task_type_in during construction) both during training and prediction.
PipeOpTargetTrafo has two output channels named "fun" and "output". During training,
"fun" returns NULL and during prediction, "fun" returns a function that can later be used
to invert the transformation done during training according to the overloaded .train_invert()
and .invert() functions. "output" returns the modified input Task (or whatever class was specified
by task_type_out) according to the overloaded .transform() function both during training and prediction.
The $state is a named list and should be returned explicitly by the user in the overloaded
.get_state() function.
PipeOpTargetTrafo is an abstract class inheriting from PipeOp. It implements the
private$.train() and private$.predict() functions. These functions perform checks and go on
to call .get_state(), .transform(), and .train_invert(). .invert() is packaged and sent along
the "fun" output to be applied to a Prediction by PipeOpTargetInvert.
A subclass of PipeOpTargetTrafo should implement these functions and be used in combination
with PipeOpTargetInvert.
Fields inherited from PipeOp.
Methods inherited from PipeOp, as well as:
.get_state(task)
(Task) -> list
Called by PipeOpTargetTrafo's implementation of private$.train(). Takes a single
Task as input and returns a list to set the $state.
.get_state() will be called a single time during training right before
.transform() is called. The return value (i.e. the $state) should contain info needed in
.transform() as well as in .invert().
The base implementation returns list() and should be overloaded if setting the state is desired.
.transform(task, phase)
(Task, character(1)) -> Task
Called by PipeOpTargetTrafo's implementation of private$.train() and
private$.predict(). Takes a single Task as input and modifies it.
This should typically consist of calculating a new target and modifying the
Task by using the convert_task function. .transform() will be called during training and
prediction because the target (and if needed also type) of the input Task must be transformed
both times. Note that unlike private$.train(), the argument is not a list but a singular
Task, and the return object is also not a list but a singular Task.
The phase argument is "train" during training phase and "predict" during prediction phase
and can be used to enable different behaviour during training and prediction. When phase is
"train", the $state slot (as previously set by .get_state()) may also be modified, alternatively
or in addition to overloading .get_state().
The input should not be cloned and if possible should be changed in-place.
This function is abstract and should be overloaded by inheriting classes.
.train_invert(task)
(Task) -> any
Called by PipeOpTargetTrafo's implementation of private$.predict(). Takes a single
Task as input and returns an arbitrary value that will be given as
predict_phase_state to .invert(). This should not modify the input Task.
The base implementation returns a list with a single element, the $truth column of the Task,
and should be overloaded if a more training-phase-dependent state is desired.
.invert(prediction, predict_phase_state)
(Prediction, any) -> Prediction
Takes a Prediction and a predict_phase_state object as input and inverts the prediction.
This function is sent as "fun" to PipeOpTargetInvert.
This function is abstract and should be overloaded by inheriting classes. Care should be
taken that the predict_type of the Prediction being inverted is handled well.
.invert_help(predict_phase_state)
(predict_phase_state object) -> function
Helper function that packages .invert() into a function that can later be used for the inversion.
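The interplay of the "fun" and "output" channels with PipeOpTargetInvert can be seen with an existing subclass. The sketch below uses PipeOpTargetMutate (dictionary key "targetmutate") with a log transformation; the task and learner are arbitrary choices, and the channel names of PipeOpTargetInvert ("fun" and "prediction") are stated here as an assumption about that class.

```r
library(mlr3)
library(mlr3pipelines)

graph = Graph$new()
graph$add_pipeop(po("targetmutate", param_vals = list(
  trafo = function(x) log(x, base = 2),                  # applied to the target
  inverter = function(x) list(response = 2^x$response)   # applied to predictions
)))
graph$add_pipeop(po("learner", lrn("regr.rpart")))
graph$add_pipeop(po("targetinvert"))

# "fun" carries the inversion function past the learner.
graph$add_edge("targetmutate", "targetinvert",
  src_channel = "fun", dst_channel = "fun")
# "output" carries the Task with the transformed target to the learner.
graph$add_edge("targetmutate", "regr.rpart",
  src_channel = "output", dst_channel = "input")
# The learner's prediction is inverted back to the original scale.
graph$add_edge("regr.rpart", "targetinvert",
  src_channel = "output", dst_channel = "prediction")

task = tsk("mtcars")
glrn = as_learner(graph)
glrn$train(task)
pred = glrn$predict(task)
```

The learner is fitted on the log-scale target, while pred reports responses on the original scale.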
https://mlr-org.com/pipeops.html
Other mlr3pipelines backend related:
Graph,
PipeOp,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_graphs,
mlr_pipeops,
mlr_pipeops_updatetarget
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Base class for handling most "preprocessing" operations. These
are operations that have exactly one Task input and one Task output,
and expect the column layout of these Tasks during input and output
to be the same.
Prediction-behavior of preprocessing operations should always be independent for each row in the input-Task.
This means that the prediction-operation of preprocessing-PipeOps should commute with rbind(): Running prediction
on an n-row Task should result in the same result as rbind()-ing the prediction-result from n
1-row Tasks with the same content. In the large majority of cases, the number and order of rows
should also not be changed during prediction.
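The commutation property can be checked empirically for a concrete preprocessing PipeOp. This sketch uses PipeOpScale on the iris task as an arbitrary example and compares prediction on five rows at once with prediction on five 1-row Tasks.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")
op = po("scale")
op$train(list(task))

rows = 1:5

# Prediction on all five rows at once.
batch = as.data.frame(
  op$predict(list(task$clone()$filter(rows)))[[1]]$data()
)

# Prediction on five 1-row Tasks, combined row-wise afterwards.
rowwise = as.data.frame(data.table::rbindlist(lapply(rows, function(i) {
  op$predict(list(task$clone()$filter(i)))[[1]]$data()
})))

same_result = isTRUE(all.equal(batch, rowwise))
```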
Users must implement private$.train_task() and private$.predict_task(), which have a Task
input and should return that Task. The Task should, if possible, be
manipulated in-place, and should not be cloned.
Alternatively, the private$.train_dt() and private$.predict_dt() functions can be implemented, which operate on
data.table objects instead. This should generally only be done if all
data is in some way altered (e.g. PCA changing all columns to principal components) and not if only
a few columns are added or removed (e.g. feature selection) because this should be done at the Task-level
with private$.train_task(). The private$.select_cols() function can be overloaded for private$.train_dt() and private$.predict_dt()
to operate only on subsets of the Task's data, e.g. only on numerical columns.
If the can_subset_cols argument of the constructor is TRUE (the default), then the hyperparameter affect_columns
is added, which can limit the columns of the Task that are modified by the PipeOpTaskPreproc
using a Selector function. Note this functionality is entirely independent of the private$.select_cols() functionality.
PipeOpTaskPreproc is useful for operations that behave differently during training and prediction. For operations
that perform essentially the same operation and only need to perform extra work to build a $state during training,
the PipeOpTaskPreprocSimple class can be used instead.
Abstract R6Class inheriting from PipeOp.
PipeOpTaskPreproc$new(id, param_set = ps(), param_vals = list(), can_subset_cols = TRUE, packages = character(0), task_type = "Task", tags = NULL, feature_types = mlr_reflections$task_feature_types)
id :: character(1)
Identifier of resulting object. See $id slot of PipeOp.
param_set :: ParamSet
Parameter space description. This should be created by the subclass and given to super$initialize().
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The
subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
can_subset_cols :: logical(1)
Whether the affect_columns parameter should be added which lets the user limit the columns that are
modified by the PipeOpTaskPreproc. This should generally be FALSE if the operation adds or removes
rows from the Task, and TRUE otherwise. Default is TRUE.
packages :: character
Set of all required packages for the PipeOp's private$.train() and private$.predict() methods. See $packages slot.
Default is character(0).
task_type :: character(1)
The class of Task that should be accepted as input and will be returned as output. This
should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or
"TaskRegr" (or another subclass introduced by other packages). Default is "Task".
tags :: character | NULL
Tags of the resulting PipeOp. This is added to the tag "data transform". Default NULL.
feature_types :: character
Feature types affected by the PipeOp. See private$.select_cols() for more information.
Defaults to all available feature types.
PipeOpTaskPreproc has one input channel named "input", taking a Task, or a subclass of
Task if the task_type construction argument is given as such; both during training and prediction.
PipeOpTaskPreproc has one output channel named "output", producing a Task, or a subclass;
the Task type is the same as for input; both during training and prediction.
The output Task is the modified input Task according to the overloaded
private$.train_task()/private$.predict_task() or private$.train_dt()/private$.predict_dt() functions.
The $state is a named list; besides members added by inheriting classes, the members are:
affected_cols :: character
Names of features being selected by the affect_columns parameter, if present; names of all present features otherwise.
intasklayout :: data.table
Copy of the training Task's $feature_types slot. This is used during prediction to ensure that
the prediction Task has the same features, feature layout, and feature types as during training.
outtasklayout :: data.table
Copy of the trained Task's $feature_types slot. This is used during prediction to ensure that
the Task resulting from the prediction operation has the same features, feature layout, and feature types as after training.
dt_columns :: character
Names of features selected by the private$.select_cols() call during training. This is only present if the private$.train_dt() functionality is used,
and not present if the private$.train_task() function is overloaded instead.
feature_types :: character
Feature types affected by the PipeOp. See private$.select_cols() for more information.
affect_columns :: function | Selector | NULL
What columns the PipeOpTaskPreproc should operate on. This parameter is only present if the constructor is called with
the can_subset_cols argument set to TRUE (the default).
The parameter must be a Selector function, which takes a Task as argument and returns a character
of features to use.
See Selector for example functions. Defaults to NULL, which selects all features.
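The following sketch illustrates how a user might restrict a PipeOpTaskPreproc to a subset of columns through affect_columns. It assumes po("scale") and the iris task purely as examples; any PipeOp constructed with can_subset_cols = TRUE should behave analogously.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Scale only the features whose names start with "Petal";
# the "Sepal" features pass through unchanged.
op = po("scale", affect_columns = selector_grep("^Petal"))
out = op$train(list(task))[[1]]

out$data(cols = c("Petal.Length", "Sepal.Length"))[1:3]

# The selected features are recorded in the state
op$state$affect_cols
```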
PipeOpTaskPreproc is an abstract class inheriting from PipeOp. It implements the private$.train() and
private$.predict() functions. These functions perform checks and go on to call private$.train_task() and private$.predict_task().
A subclass of PipeOpTaskPreproc may implement these functions, or implement private$.train_dt() and private$.predict_dt() instead.
This works by having the default implementations of private$.train_task() and private$.predict_task() call private$.train_dt() and private$.predict_dt(),
respectively.
The affect_columns functionality works by temporarily removing the "col_role" of the columns that are not selected before
processing, and adding these columns back afterwards by setting their col_role to "feature".
Fields inherited from PipeOp.
Methods inherited from PipeOp, as well as:
.train_task(task)
(Task) -> Task
Called by the PipeOpTaskPreproc's implementation of private$.train(). Takes a single Task as input
and modifies it (ideally in-place without cloning) while storing information in the $state slot. Note that unlike
private$.train(), the argument is not a list but a singular Task, and the return object is also not a list but
a singular Task. Also, contrary to private$.train(), the $state being generated must be a list, which
the PipeOpTaskPreproc will add additional slots to (see Section State). Care should be taken to avoid name collisions between
$state elements added by private$.train_task() and PipeOpTaskPreproc.
By default this function calls the private$.train_dt() function, but it can be overloaded to perform operations on the Task
directly.
.predict_task(task)
(Task) -> Task
Called by the PipeOpTaskPreproc's implementation of private$.predict(). Takes a single Task as input
and modifies it (ideally in-place without cloning) while using information in the $state slot. Works analogously to
private$.train_task(). private$.predict_task() should only be overloaded if private$.train_task() is overloaded (i.e. private$.train_dt() is not used).
.train_dt(dt, levels, target)
(data.table, named list, any) -> data.table | data.frame | matrix
Train PipeOpTaskPreproc on dt, transform it and store a state in $state. A transformed object must be returned
that can be converted to a data.table using as.data.table. dt does not need to be copied deliberately, it
is possible and encouraged to change it in-place.
The levels argument is a named list of factor levels for factor or character features.
If the input Task inherits from TaskSupervised, the target argument
contains the $truth() information of the training Task; its type depends on the Task
type being trained on.
This method can be overloaded when inheriting from PipeOpTaskPreproc, together with private$.predict_dt() and optionally
private$.select_cols(); alternatively, private$.train_task() and private$.predict_task() can be overloaded.
.predict_dt(dt, levels)
(data.table, named list) -> data.table | data.frame | matrix
Predict on new data in dt, possibly using the stored $state. A transformed object must be returned
that can be converted to a data.table using as.data.table. dt does not need to be copied deliberately, it
is possible and encouraged to change it in-place.
The levels argument is a named list of factor levels for factor or character features.
This method can be overloaded when inheriting PipeOpTaskPreproc, together with private$.train_dt() and optionally
private$.select_cols(); alternatively, private$.train_task() and private$.predict_task() can be overloaded.
.select_cols(task)
(Task) -> character
Selects which columns the PipeOp operates on, if private$.train_dt() and private$.predict_dt() are overloaded. This function
is not called if private$.train_task() and private$.predict_task() are overloaded. In contrast to
the affect_columns parameter, private$.select_cols() is for the inheriting class to determine which columns
the operator should function on, e.g. based on feature type, while affect_columns is a way for the user
to limit the columns that a PipeOpTaskPreproc should operate on.
This method can optionally be overloaded when inheriting PipeOpTaskPreproc, together with private$.train_dt() and
private$.predict_dt(); alternatively, private$.train_task() and private$.predict_task() can be overloaded.
If this method is not overloaded, it defaults to selecting the features of the types indicated by the feature_types construction argument.
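To make the private$.train_dt()/private$.predict_dt() route concrete, here is a minimal sketch of a subclass. The class name PipeOpCenterSketch and its id are hypothetical and not part of the package; the class subtracts the training mean from each numeric feature and reuses the stored means during prediction.

```r
library("mlr3")
library("mlr3pipelines")
library("R6")

# Hypothetical example class: centers numeric features by their training means.
PipeOpCenterSketch = R6Class("PipeOpCenterSketch",
  inherit = PipeOpTaskPreproc,
  public = list(
    initialize = function(id = "centersketch", param_vals = list()) {
      # feature_types restricts the default private$.select_cols() to numeric columns
      super$initialize(id = id, param_vals = param_vals,
        feature_types = c("numeric", "integer"))
    }
  ),
  private = list(
    .train_dt = function(dt, levels, target) {
      means = vapply(dt, mean, numeric(1), na.rm = TRUE)
      # The state must be a list; PipeOpTaskPreproc adds its own members to it
      self$state = list(means = means)
      for (col in names(dt)) {
        data.table::set(dt, j = col, value = dt[[col]] - means[[col]])
      }
      dt
    },
    .predict_dt = function(dt, levels) {
      # Reuse the means computed during training
      means = self$state$means
      for (col in names(dt)) {
        data.table::set(dt, j = col, value = dt[[col]] - means[[col]])
      }
      dt
    }
  )
)

op = PipeOpCenterSketch$new()
task = tsk("iris")
train_out = op$train(list(task))[[1]]
predict_out = op$predict(list(task))[[1]]

# Column means of the centered training features are numerically zero
colMeans(train_out$data(cols = task$feature_names))
```

Because the class goes through private$.train_dt(), it inherits the affect_columns parameter and the dt_columns state member without any extra code.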
https://mlr-org.com/pipeops.html
Other mlr3pipelines backend related:
Graph,
PipeOp,
PipeOpTargetTrafo,
PipeOpTaskPreprocSimple,
mlr_graphs,
mlr_pipeops,
mlr_pipeops_updatetarget
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Base class for handling many "preprocessing" operations
that perform essentially the same operation during training and prediction.
Instead of implementing a private$.train_task() and a private$.predict_task() operation, only
a private$.get_state() and a private$.transform() operation need to be defined,
both of which take one argument: a Task.
Alternatively, analogously to the PipeOpTaskPreproc approach of offering private$.train_dt()/private$.predict_dt(),
the private$.get_state_dt() and private$.transform_dt() functions may be implemented.
private$.get_state() must not change its input value in-place and must return
something that will be written into $state
(which must not be NULL); private$.transform() should modify its argument in-place,
and it is called both during training and prediction.
This inherits from PipeOpTaskPreproc and behaves essentially the same.
Abstract R6Class inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpTaskPreprocSimple$new(id, param_set = ps(), param_vals = list(), can_subset_cols = TRUE, packages = character(0), task_type = "Task", tags = NULL, feature_types = mlr_reflections$task_feature_types)
(Construction is identical to PipeOpTaskPreproc.)
id :: character(1)
Identifier of resulting object. See $id slot of PipeOp.
param_set :: ParamSet
Parameter space description. This should be created by the subclass and given to super$initialize().
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The
subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
can_subset_cols :: logical(1)
Whether the affect_columns parameter should be added which lets the user limit the columns that are
modified by the PipeOpTaskPreprocSimple. This should generally be FALSE if the operation adds or removes
rows from the Task, and TRUE otherwise. Default is TRUE.
packages :: character
Set of all required packages for the PipeOp's private$.train() and private$.predict() methods. See $packages slot.
Default is character(0).
task_type :: character(1)
The class of Task that should be accepted as input and will be returned as output. This
should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or
"TaskRegr" (or another subclass introduced by other packages). Default is "Task".
tags :: character | NULL
Tags of the resulting PipeOp. This is added to the tag "data transform". Default NULL.
feature_types :: character
Feature types affected by the PipeOp. See private$.select_cols() for more information.
Defaults to all available feature types.
Input and output channels are inherited from PipeOpTaskPreproc.
The output during training and prediction is the Task, modified by private$.transform() or private$.transform_dt().
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The parameters are the parameters inherited from PipeOpTaskPreproc.
PipeOpTaskPreprocSimple is an abstract class inheriting from PipeOpTaskPreproc and implementing the
private$.train_task() and private$.predict_task() functions. A subclass of PipeOpTaskPreprocSimple may implement the
functions private$.get_state() and private$.transform(), or alternatively the functions private$.get_state_dt() and private$.transform_dt()
(as well as private$.select_cols(), in the latter case). This works by having the default implementations of
private$.get_state() and private$.transform() call private$.get_state_dt() and private$.transform_dt().
Fields inherited from PipeOp.
Methods inherited from PipeOpTaskPreproc, as well as:
.get_state(task)
(Task) -> named list
Create something that will be stored in $state during the training phase of PipeOpTaskPreprocSimple.
The state can then influence the private$.transform() function. Note that private$.get_state() must return the state, and
should not store it in $state. It is not strictly necessary to implement either private$.get_state() or private$.get_state_dt();
if they are not implemented, the state will be stored as list().
This method can optionally be overloaded when inheriting from PipeOpTaskPreprocSimple, together with private$.transform();
alternatively, private$.get_state_dt() (optional) and private$.transform_dt() (and possibly private$.select_cols(), from PipeOpTaskPreproc)
can be overloaded.
.transform(task)
(Task) -> Task
Predict on new data in task, possibly using the stored $state. task should not be cloned, instead it should be
changed in-place. This method is called both during training and prediction phase, and should essentially behave the
same independently of phase. (If this is incongruent with the functionality to be implemented, then it should inherit from
PipeOpTaskPreproc, not from PipeOpTaskPreprocSimple.)
This method can be overloaded when inheriting from PipeOpTaskPreprocSimple, optionally with private$.get_state();
alternatively, private$.get_state_dt() (optional) and private$.transform_dt() (and possibly private$.select_cols(), from PipeOpTaskPreproc)
can be overloaded.
.get_state_dt(dt)
(data.table) -> named list
Create something that will be stored in $state during training phase of PipeOpTaskPreprocSimple.
The state can then influence the private$.transform_dt() function. Note that private$.get_state_dt() must return the state, and
should not store it in $state. If neither private$.get_state() nor private$.get_state_dt() are overloaded, the state will
be stored as list().
This method can optionally be overloaded when inheriting from PipeOpTaskPreprocSimple, together with private$.transform_dt()
(and optionally private$.select_cols(), from PipeOpTaskPreproc); alternatively, private$.get_state() (optional) and private$.transform()
can be overloaded.
.transform_dt(dt)
(data.table) -> data.table | data.frame | matrix
Predict on new data in dt, possibly using the stored $state. A transformed object must be returned
that can be converted to a data.table using as.data.table. dt does not need to be copied deliberately, it
is possible and encouraged to change it in-place. This method is called both during training and prediction phase,
and should essentially behave the same independently of phase.
(If this is incongruent with the functionality to be implemented, then it should inherit from
PipeOpTaskPreproc, not from PipeOpTaskPreprocSimple.)
This method can be overloaded when inheriting from PipeOpTaskPreprocSimple, optionally together with private$.get_state_dt()
(and optionally private$.select_cols(), from PipeOpTaskPreproc); alternatively, private$.get_state() (optional) and private$.transform()
can be overloaded.
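The private$.get_state()/private$.transform() pair can be sketched as follows. The class PipeOpDropConstSketch is a hypothetical example, not part of the package: it records during training which features are non-constant and then keeps only those features, both during training and prediction.

```r
library("mlr3")
library("mlr3pipelines")
library("R6")

# Hypothetical example class: drops features that are constant in the training data.
PipeOpDropConstSketch = R6Class("PipeOpDropConstSketch",
  inherit = PipeOpTaskPreprocSimple,
  public = list(
    initialize = function(id = "dropconstsketch", param_vals = list()) {
      super$initialize(id = id, param_vals = param_vals)
    }
  ),
  private = list(
    .get_state = function(task) {
      data = task$data(cols = task$feature_names)
      nonconst = vapply(data, function(column) length(unique(column)) > 1, logical(1))
      # Return the state; do not assign it to self$state here
      list(keep = names(data)[nonconst])
    },
    .transform = function(task) {
      # Called during both training and prediction; modifies the task in-place
      task$select(self$state$keep)
    }
  )
)

train_data = data.frame(x = c(1, 2, 3, 4), const = c(5, 5, 5, 5), y = c(1.5, 2.5, 3.5, 4.5))
new_data = data.frame(x = c(9, 8), const = c(1, 2), y = c(0, 0))
train_task = as_task_regr(train_data, target = "y")
new_task = as_task_regr(new_data, target = "y")

op = PipeOpDropConstSketch$new()
train_out = op$train(list(train_task))[[1]]
predict_out = op$predict(list(new_task))[[1]]

train_out$feature_names
predict_out$feature_names
```

Note that "const" is dropped during prediction even though it varies in the new data, because the decision is taken from the state created during training.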
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEncodePL,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
mlr_pipeops,
mlr_pipeops_adas,
mlr_pipeops_blsmote,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_classweightsex,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_decode,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_encodeplquantiles,
mlr_pipeops_encodepltree,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_info,
mlr_pipeops_isomap,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_learner_pi_cvplus,
mlr_pipeops_learner_quantiles,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nearmiss,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_rowapply,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_smotenc,
mlr_pipeops_spatialsign,
mlr_pipeops_splines,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tomek,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other mlr3pipelines backend related:
Graph,
PipeOp,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
mlr_graphs,
mlr_pipeops,
mlr_pipeops_updatetarget
Create
a PipeOp from mlr_pipeops from given ID
a PipeOpLearner from a Learner object
a PipeOpFilter from a Filter object
a PipeOpSelect from a Selector object
a clone of a PipeOp from a given PipeOp (possibly with changed settings)
The object is initialized with given parameters and param_vals.
po() takes a single .obj (PipeOp id, Learner, ...) and converts
it to a PipeOp. pos() (with plural-s) takes either a character vector or a
list of objects, and creates a list of PipeOps.
po(.obj, ...)
pos(.objs, ...)
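| Argument | Description |
|---|---|
| .obj | The object from which to construct a PipeOp: a PipeOp ID, a Learner, a Filter, a Selector, or a PipeOp. |
| ... | Further arguments used to initialize the PipeOp, such as parameter values. |
| .objs | A character vector of PipeOp IDs, or a list of objects, from which to construct a list of PipeOps. |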
A PipeOp (for po()), or a list of PipeOps (for pos()).
library("mlr3")
library("mlr3pipelines")

po("learner", lrn("classif.rpart"), cp = 0.3)

po(lrn("classif.rpart"), cp = 0.3)

# is equivalent to:
mlr_pipeops$get("learner", lrn("classif.rpart"),
  param_vals = list(cp = 0.3))

mlr3pipelines::pos(c("pca", original = "nop"))
Creates a Graph from mlr_graphs from a given ID.
ppl() takes a character(1) and returns a Graph. ppls() takes a character
vector or a list and returns a list of possibly multiple Graphs.
ppl(.key, ...)
ppls(.keys, ...)
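| Argument | Description |
|---|---|
| .key | (character(1)) ID of the Graph in mlr_graphs. |
| ... | Further arguments used when constructing the Graph. |
| .keys | (character or list) IDs of the Graphs in mlr_graphs. |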
Graph (for ppl()) or list of Graphs (for ppls()).
library("mlr3")
library("mlr3pipelines")

gr = ppl("bagging", graph = po(lrn("regr.rpart")),
  averager = po("regravg", collect_multiplicity = TRUE))
Function that offers a simple and direct way to train or predict PipeOps and Graphs on Tasks,
data.frames or data.tables.
Training happens if predict is set to FALSE and no state is passed to this function.
Prediction happens if predict is set to TRUE and if the passed Graph or PipeOp is either trained or a state
is explicitly passed to this function.
The passed PipeOp or Graph gets modified by-reference.
preproc(indata, processor, state = NULL, predict = !is.null(state))
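| Argument | Description |
|---|---|
| indata | (Task, data.frame or data.table) Data to be processed. |
| processor | (Graph or PipeOp) Graph or PipeOp to train or predict with. It is modified by reference. |
| state | (named list or NULL) State to use for prediction. Default NULL. |
| predict | (logical(1)) Whether to predict (TRUE) or train (FALSE). Defaults to TRUE if state is given and to FALSE otherwise. |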
any | data.frame | data.table:
If indata is a Task, whatever is returned by the processor's single output channel is returned.
If indata is a data.frame or data.table, an object of the same class is returned, or
if the processor's output channel does not return a Task, an error is thrown.
If processor is a PipeOp, the S3 method preproc.PipeOp gets called first, converting the PipeOp into a
Graph and wrapping the state appropriately, before calling the S3 method preproc.Graph with the modified objects.
If indata is a data.frame or data.table, a
TaskUnsupervised is constructed internally. This implies that processors which only work on sub-classes
of TaskSupervised will not work with these input types for indata.
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pop = po("pca")

# Training
preproc(task, pop)
# Note that the PipeOp gets trained through this
pop$is_trained

# Predicting a trained PipeOp (trained through previous call to preproc)
preproc(task, pop, predict = TRUE)

# Predicting using a given state
# We use the state of the PipeOp from the last example and then reset it
state = pop$state
pop$state = NULL
preproc(task, pop, state)

# Note that the PipeOp's state may get overwritten inadvertently during
# training or if a state is given
pop$state$sdev
preproc(tsk("wine"), pop)
pop$state$sdev

# Piping multiple preproc() calls, using dictionary sugar to set parameters
tsk("penguins") |>
  preproc(po("imputemode", affect_columns = selector_name("sex"))) |>
  preproc(po("imputemean"))

# Use preproc with a Graph
gr = po("pca", rank. = 4) %>>% po("learner", learner = lrn("classif.rpart"))
preproc(tsk("sonar"), gr) # returns NULL because of the learner
preproc(tsk("sonar"), gr, predict = TRUE)

# Training with a data.table input
# Note that `$data()` drops the information that "Species" is the target.
# It gets handled like an ordinary feature here.
dt = tsk("iris")$data()
preproc(dt, pop)

# Predicting with a data.table input
preproc(dt, pop, predict = TRUE)
Add functions that perform conversion to a desired class.
Whenever a Graph or a PipeOp is called with an object
that does not conform to its declared input type, the "autoconvert
register" is queried for functions that may turn the object into
a desired type.
Conversion functions should try to avoid cloning.
register_autoconvert_function(cls, fun, packages = character(0))
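| Argument | Description |
|---|---|
| cls | Name of the class that fun converts to. |
| fun | The conversion function; it takes the object to convert as its argument. |
| packages | (character) Package names. Default character(0). |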
NULL.
Other class hierarchy operations:
add_class_hierarchy_cache(),
reset_autoconvert_register(),
reset_class_hierarchy_cache()
# This lets mlr3pipelines automatically try to convert a string into
# a PipeOp by querying the mlr_pipeops Dictionary (mlr3misc::Dictionary).
# This is an example and not necessary, because mlr3pipelines adds it by default.
register_autoconvert_function("PipeOp", function(x) as_pipeop(x), packages = "mlr3pipelines")
Reset autoconvert register to factory default, thereby undoing
any calls to register_autoconvert_function() by the user.
reset_autoconvert_register()
NULL
Other class hierarchy operations:
add_class_hierarchy_cache(),
register_autoconvert_function(),
reset_class_hierarchy_cache()
Reset the class hierarchy cache to factory default, thereby undoing
any calls to add_class_hierarchy_cache() by the user.
reset_class_hierarchy_cache()
NULL
Other class hierarchy operations:
add_class_hierarchy_cache(),
register_autoconvert_function(),
reset_autoconvert_register()
A Selector function is used by different PipeOps, most prominently PipeOpSelect and many PipeOps inheriting
from PipeOpTaskPreproc, to determine the subset of a Task's features to operate on.
Even though a Selector is an ordinary function that can be written by hand, it is preferable to use the Selector constructors
shown here. Each of these can be called with its arguments to create a Selector, which can then be given to the PipeOpSelect
selector parameter, or many PipeOpTaskPreprocs' affect_columns parameter. See there for examples of this usage.
selector_all()
selector_none()
selector_type(types)
selector_grep(pattern, ignore.case = FALSE, perl = FALSE, fixed = FALSE)
selector_name(feature_names, assert_present = FALSE)
selector_invert(selector)
selector_intersect(selector_x, selector_y)
selector_union(selector_x, selector_y)
selector_setdiff(selector_x, selector_y)
selector_missing()
selector_cardinality_greater_than(min_cardinality)
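| Argument | Description |
|---|---|
| types | (character) Feature types to select; legal types are listed in mlr_reflections$task_feature_types. |
| pattern | (character(1)) grep() pattern that feature names are matched against. |
| ignore.case | (logical(1)) Used with grep(). Default FALSE. |
| perl | (logical(1)) Used with grep(). Default FALSE. |
| fixed | (logical(1)) Used with grep(). Default FALSE. |
| feature_names | (character) Names of the features to select. |
| assert_present | (logical(1)) Default FALSE. |
| selector | The Selector to invert. |
| selector_x | First Selector to combine. |
| selector_y | Second Selector to combine. |
| min_cardinality | (integer) Cardinality threshold that a categorical feature must exceed to be selected. |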
function: A Selector function that takes a Task and returns the feature names to be processed.
selector_all(): selector_all selects all features.
selector_none(): selector_none selects none of the features.
selector_type(): selector_type selects features according to type. Legal types are listed in mlr_reflections$task_feature_types.
selector_grep(): selector_grep selects features with names matching the grep() pattern.
selector_name(): selector_name selects features with names matching exactly the names listed.
selector_invert(): selector_invert inverts a given Selector: It always selects the features
that would be dropped by the other Selector, and drops the features that
would be kept.
selector_intersect(): selector_intersect selects the intersection of two Selectors: Only features
selected by both Selectors are selected in the end.
selector_union(): selector_union selects the union of two Selectors: Features
selected by either Selector are selected in the end.
selector_setdiff(): selector_setdiff selects the setdiff of two Selectors: Features
selected by selector_x are selected, unless they are also selected
by selector_y.
selector_missing(): selector_missing selects features with missing values.
selector_cardinality_greater_than(): selector_cardinality_greater_than selects categorical features with cardinality
greater than a given threshold.
A Selector is a function
that has one input argument (commonly named task). The function is called with the Task that a PipeOp
is operating on. The return value of the function must be a character vector that is a subset of the feature names present
in the Task.
For example, a Selector that selects all columns is
function(task) {
task$feature_names
}
(this is the selector_all()-Selector.) A Selector that selects
all columns that have names shorter than four letters would be:
function(task) {
task$feature_names[
nchar(task$feature_names) < 4
]
}
A Selector that selects only the column "Sepal.Length" (as in the iris task), if present, is
function(task) {
intersect(task$feature_names, "Sepal.Length")
}
It is preferable to use the Selector construction functions like selector_type(), selector_grep() etc. if possible, instead of writing custom Selectors.
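As an illustration of the points above, the following sketch compares a hand-written Selector with an equivalent one assembled from the constructor functions, and passes the hand-written one to PipeOpSelect. The names sel_long and sel_built are made up for this example.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Hand-written Selector: features whose names are longer than 11 characters
sel_long = function(task) {
  task$feature_names[nchar(task$feature_names) > 11]
}
sel_long(task)

# The same selection on iris, composed from Selector constructors
sel_built = selector_grep("Length$")
sel_built(task)

# Selectors can be combined: only features matched by both patterns
sel_both = selector_intersect(selector_grep("^Petal"), selector_grep("Length$"))
sel_both(task)

# Use a Selector as the selector parameter of PipeOpSelect
op = po("select", selector = sel_long)
out = op$train(list(task))[[1]]
out$feature_names
```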
Other Selectors:
mlr_pipeops_select
library("mlr3")
library("mlr3pipelines")

iris_task = tsk("iris")
bh_task = tsk("boston_housing")

sela = selector_all()
sela(iris_task)
sela(bh_task)

self = selector_type("factor")
self(iris_task)
self(bh_task)

selg = selector_grep("a.*i")
selg(iris_task)
selg(bh_task)

selgi = selector_invert(selg)
selgi(iris_task)
selgi(bh_task)

selgf = selector_union(selg, self)
selgf(iris_task)
selgf(bh_task)
Configure validation for a graph learner.
In a GraphLearner, validation can be configured on two levels:
On the GraphLearner level, which specifies how the validation set is constructed before entering the graph.
On the level of the individual PipeOps (such as PipeOpLearner), which specifies
which pipeops actually make use of the validation data (set its $validate field to "predefined") or not (set it to NULL).
This can be specified via the argument ids.
## S3 method for class 'GraphLearner'
set_validate(
  learner,
  validate,
  ids = NULL,
  args_all = list(),
  args = list(),
  ...
)
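| Argument | Description |
|---|---|
| learner | (GraphLearner) The graph learner to configure. |
| validate | How the validation set is constructed on the GraphLearner level, or NULL for no validation. |
| ids | IDs of the PipeOps that should make use of the validation data. Default NULL. |
| args_all | (list) Default list(). |
| args | (named list) Default list(). |
| ... | (any) |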
library(mlr3)
library(mlr3pipelines)

glrn = as_learner(po("pca") %>>% lrn("classif.debug"))
set_validate(glrn, 0.3)
glrn$validate
glrn$graph$pipeops$classif.debug$learner$validate

set_validate(glrn, NULL)
glrn$validate
glrn$graph$pipeops$classif.debug$learner$validate

set_validate(glrn, 0.2, ids = "classif.debug")
glrn$validate
glrn$graph$pipeops$classif.debug$learner$validate