Title: | Extending 'mlr3' to Functional Data Analysis |
---|---|
Description: | Extends the 'mlr3' ecosystem to functional analysis by adding support for irregular and regular functional data as defined in the 'tf' package. The package provides 'PipeOps' for preprocessing functional columns and for extracting scalar features, thereby allowing standard machine learning algorithms to be applied afterwards. Available operations include simple functional features such as the mean or maximum, smoothing, interpolation, flattening, and functional 'PCA'. |
Authors: | Sebastian Fischer [aut, cre] , Maximilian Mücke [aut] , Fabian Scheipl [ctb] , Bernd Bischl [ctb] |
Maintainer: | Sebastian Fischer |
License: | LGPL-3 |
Version: | 0.2.0 |
Built: | 2024-11-18 05:39:51 UTC |
Source: | https://github.com/mlr-org/mlr3fda |
To extend mlr3 to functional data, two data types from the tf package are added:
- tfd_irreg: Irregular functional data, i.e. the functions are observed at potentially different inputs for each observation.
- tfd_reg: Regular functional data, i.e. the functions are observed at the same inputs for each observation.
Lang M, Binder M, Richter J, Schratz P, Pfisterer F, Coors S, Au Q, Casalicchio G, Kotthoff L, Bischl B (2019). “mlr3: A modern object-oriented machine learning framework in R.” Journal of Open Source Software. doi:10.21105/joss.01903, https://joss.theoj.org/papers/10.21105/joss.01903.
Maintainer: Sebastian Fischer (ORCID)
Authors:
Maximilian Mücke (ORCID)
Other contributors:
Fabian Scheipl (ORCID) [contributor]
Bernd Bischl (ORCID) [contributor]
Useful links:
Report bugs at https://github.com/mlr-org/mlr3fda/issues
Calculates the cross-correlation between two functional vectors using tf::tf_crosscor().
Note that it only operates on regular data and that the cross-correlation assumes that each column has the same domain.
To apply this PipeOp to irregular data, convert it to a regular grid first using PipeOpFDAInterpol.
If you need to change the domain of the columns, use PipeOpFDAScaleRange.
The parameters are the parameters inherited from PipeOpTaskPreprocSimple, as well as the following parameters:
arg :: numeric()
Grid to use for the cross-correlation.
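As a rough illustration of what a per-observation correlation between two functional columns means, the following base R sketch computes the plain Pearson correlation of the sampled values of two curves on a shared, equidistant grid. This is only a stand-in: tf::tf_crosscor() may weight or integrate the values differently, and the matrices x1 and x2 are hypothetical toy data, not part of the package.

```r
# Two sets of curves sampled on the same grid (rows = observations).
grid <- seq(0, 1, length.out = 11)
x1 <- rbind(grid, grid^2, sin(2 * pi * grid))
x2 <- rbind(2 * grid + 1, -grid^2, cos(2 * pi * grid))

# One correlation value per observation: curve i of x1 against curve i of x2.
curve_cor <- vapply(
  seq_len(nrow(x1)),
  function(i) cor(x1[i, ], x2[i, ]),
  numeric(1)
)
curve_cor
```

The result is one scalar feature per row, which is the shape of output a standard learner can consume.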
mlr3pipelines::PipeOp
-> mlr3pipelines::PipeOpTaskPreproc
-> mlr3pipelines::PipeOpTaskPreprocSimple
-> PipeOpFDACor
new()
Initializes a new instance of this Class.
PipeOpFDACor$new(id = "fda.cor", param_vals = list())
id (character(1))
Identifier of resulting object, default "fda.cor".
param_vals (named list)
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
clone()
The objects of this class are cloneable with this method.
PipeOpFDACor$clone(deep = FALSE)
deep
Whether to make a deep clone.
library(data.table)
library(mlr3)
library(mlr3pipelines)
library(mlr3fda)

set.seed(1234L)
# two regular functional columns drawn from a Gaussian process
dt = data.table(y = 1:100, x1 = tf::tf_rgp(100L), x2 = tf::tf_rgp(100L))
task = as_task_regr(dt, target = "y")
# compute the cross-correlation of x1 and x2
po_cor = po("fda.cor")
task_cor = po_cor$train(list(task))[[1L]]
task_cor
This is the class that extracts simple features from functional columns. Note that it only operates on values that were actually observed and does not interpolate.
The parameters are the parameters inherited from PipeOpTaskPreprocSimple, as well as the following parameters:
drop :: logical(1)
Whether to drop the original functional features and only keep the extracted features. Note that this does not remove the features from the backend, but only from the active column role feature. Initial value is TRUE.
features :: list() | character()
A list of features to extract. Each element can be either a function or a string. If the element is a function, it must take the arguments arg and value and return a numeric. For string elements, the following predefined features are available: "mean", "max", "min", "slope", "median", "var". Initial value is c("mean", "max", "min", "slope", "median", "var").
left :: numeric()
The left boundary of the window. Initial value is -Inf. The window is specified such that all values >= left and <= right are kept for the computations.
right :: numeric()
The right boundary of the window. Initial value is Inf.
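The window and feature logic described above can be sketched in base R for a single curve. The helper extract_features() is hypothetical and not part of mlr3fda. It assumes that "slope" means the least-squares slope of value on arg, which the text does not spell out.

```r
# Hypothetical helper: summarise one curve from its observed points only.
extract_features <- function(arg, value, left = -Inf, right = Inf) {
  keep <- arg >= left & arg <= right   # closed window [left, right]
  arg <- arg[keep]
  value <- value[keep]
  c(
    mean   = mean(value, na.rm = TRUE),
    max    = max(value, na.rm = TRUE),
    min    = min(value, na.rm = TRUE),
    median = median(value, na.rm = TRUE),
    var    = var(value, na.rm = TRUE),
    # Assumption: "slope" is the least-squares slope of value on arg.
    slope  = unname(coef(lm(value ~ arg))[2])
  )
}

arg <- c(0, 1, 2, 3, 4)
value <- c(1, 3, 5, 7, 100)
# The outlier at arg = 4 lies outside the window and is ignored.
feats <- extract_features(arg, value, left = 0, right = 3)
feats
```

No interpolation takes place: only the observed points inside the window contribute to each feature.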
The new names generally append _{feature} to the corresponding column name. However, this can lead to name clashes with existing columns. This is solved as follows: If a column was called "x" and the feature is "mean", the corresponding new column will be called "x_mean". In case of duplicates, unique names are obtained using make.unique() and a warning is given.
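To see what make.unique() does when a name clash occurs, consider this small base R sketch. The column names are hypothetical, and how exactly the PipeOp combines existing and new names internally is an assumption here.

```r
existing <- c("y", "x", "x_mean")       # columns already present in the task
new_col <- paste0("x", "_", "mean")     # name derived from column and feature

# make.unique() appends a numeric suffix to later duplicates.
all_names <- make.unique(c(existing, new_col))
all_names
```

The clashing new column ends up with a suffixed name, so no existing column is overwritten.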
mlr3pipelines::PipeOp
-> mlr3pipelines::PipeOpTaskPreproc
-> mlr3pipelines::PipeOpTaskPreprocSimple
-> PipeOpFDAExtract
new()
Initializes a new instance of this Class.
PipeOpFDAExtract$new(id = "fda.extract", param_vals = list())
id (character(1))
Identifier of resulting object, default is "fda.extract".
param_vals (named list)
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
clone()
The objects of this class are cloneable with this method.
PipeOpFDAExtract$clone(deep = FALSE)
deep
Whether to make a deep clone.
task = tsk("fuel")
po_fmean = po("fda.extract", features = "mean")
task_fmean = po_fmean$train(list(task))[[1L]]

# add more than one feature
pop = po("fda.extract", features = c("mean", "median", "var"))
task_features = pop$train(list(task))[[1L]]

# add a custom feature
po_custom = po("fda.extract",
  features = list(mean = function(arg, value) mean(value, na.rm = TRUE))
)
task_custom = po_custom$train(list(task))[[1L]]
task_custom
Convert regular functional features (i.e. all individuals are observed at the same time-points) to new columns, one for each input value to the function.
The parameters are the parameters inherited from PipeOpTaskPreprocSimple.
The new names generally append _1, ..., to the corresponding column name. However, this can lead to name clashes with existing columns. This is solved as follows: If a column was called "x", the corresponding new columns will be called "x_1", "x_2", and so on. In case of duplicates, unique names are obtained using make.unique() and a warning is given.
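Conceptually, flattening a regular functional column turns a matrix of curve values into one scalar column per grid point. A minimal base R sketch, using a hypothetical matrix x in place of a functional column:

```r
# Three curves observed on the same four-point grid (rows = observations).
x <- matrix(c(1, 2, 3, 4,
              2, 4, 6, 8,
              0, 1, 0, 1), nrow = 3, byrow = TRUE)

# One column per grid point, named after the original column plus an index.
flat <- as.data.frame(x)
names(flat) <- paste0("x_", seq_len(ncol(x)))
flat
```

This only works because every curve shares the same grid, which is why the operation is restricted to regular functional data.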
mlr3pipelines::PipeOp
-> mlr3pipelines::PipeOpTaskPreproc
-> mlr3pipelines::PipeOpTaskPreprocSimple
-> PipeOpFDAFlatten
new()
Initializes a new instance of this Class.
PipeOpFDAFlatten$new(id = "fda.flatten", param_vals = list())
id (character(1))
Identifier of resulting object, default "fda.flatten".
param_vals (named list)
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
clone()
The objects of this class are cloneable with this method.
PipeOpFDAFlatten$clone(deep = FALSE)
deep
Whether to make a deep clone.
task = tsk("fuel")
pop = po("fda.flatten")
task_flat = pop$train(list(task))
This PipeOp applies a functional principal component analysis (FPCA) to functional columns and then extracts the principal components as features. This is done using a (truncated) weighted SVD.
To apply this PipeOp to irregular data, convert it to a regular grid first using PipeOpFDAInterpol.
For more details, see tf::tfb_fpc(), which is called internally.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as the following parameters:
pve :: numeric(1)
The percentage of variance explained that should be retained. Default is 0.995.
n_components :: integer(1)
The number of principal components to extract. This parameter is initialized to Inf.
The new names generally append _pc_{number} to the corresponding column name. If a column was called "x" and there are three principal components, the corresponding new columns will be called "x_pc_1", "x_pc_2", "x_pc_3".
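The interplay of pve and n_components can be sketched with an ordinary SVD in base R. This is a simplified, unweighted version for curves on an equidistant grid, not the implementation in tf::tfb_fpc(). The helper fpca_scores() and the toy data are hypothetical.

```r
# Hypothetical toy data: 6 curves on a regular grid of 20 points,
# built from two basis functions, so two components explain everything.
t_grid <- seq(0, 1, length.out = 20)
a <- c(-2, -1, 0, 1, 2, 3)
b <- c(1, -1, 2, -2, 0.5, 0)
curves <- outer(a, sin(2 * pi * t_grid)) + outer(b, cos(2 * pi * t_grid))

fpca_scores <- function(x, pve = 0.995, n_components = Inf) {
  centered <- sweep(x, 2, colMeans(x))      # subtract the mean curve
  s <- svd(centered)
  explained <- cumsum(s$d^2) / sum(s$d^2)   # cumulative share of variance
  # Keep the fewest components reaching pve, capped by n_components.
  k <- min(which(explained >= pve)[1], n_components)
  scores <- s$u[, seq_len(k), drop = FALSE] %*% diag(s$d[seq_len(k)], k)
  colnames(scores) <- paste0("x_pc_", seq_len(k))
  scores
}

scores <- fpca_scores(curves)
scores_one <- fpca_scores(curves, n_components = 1)
```

Each row of scores holds the scalar features for one curve, named in the "x_pc_1", "x_pc_2" pattern described above.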
mlr3pipelines::PipeOp
-> mlr3pipelines::PipeOpTaskPreproc
-> PipeOpFPCA
new()
Initializes a new instance of this Class.
PipeOpFPCA$new(id = "fda.fpca", param_vals = list())
id (character(1))
Identifier of resulting object, default is "fda.fpca".
param_vals (named list)
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
clone()
The objects of this class are cloneable with this method.
PipeOpFPCA$clone(deep = FALSE)
deep
Whether to make a deep clone.
task = tsk("fuel")
po_fpca = po("fda.fpca", n_components = 3L)
task_fpca = po_fpca$train(list(task))[[1L]]
task_fpca$data()
Interpolate functional features (e.g. all individuals are observed at different time-points) to a common grid.
This is useful if you want to compare functional features across observations.
The interpolation is done using the tf package. See tfd() for details.
The parameters are the parameters inherited from PipeOpTaskPreprocSimple, as well as the following parameters:
grid :: character(1) | numeric()
The grid to use for interpolation.
If grid is numeric, it must be either a sequence of values to use for the grid or a single value that specifies the number of points to use for the grid. The latter case requires left and right to be specified.
If grid is a character, it must be one of:
"union": This option creates a grid based on the union of all argument points from the provided functional features. This means that if the argument points across features are \(t_1, t_2, ..., t_n\), then the grid will be the combined unique set of these points. This option is generally used when the argument points vary across observations and a common grid is needed for comparison or further analysis.
"intersect": Creates a grid using the intersection of all argument points of a feature. This grid includes only those points that are common across all functional features, facilitating direct comparison on a shared set of points.
"minmax": Generates a grid within the range from the maximum of the minimum argument points to the minimum of the maximum argument points across features. This bounded grid encapsulates the argument point range common to all features.
Note: For regular functional data this has no effect as all argument points are the same. Initial value is "union".
method :: character(1)
Defaults to "linear". One of:
"linear": applies linear interpolation without extrapolation (see tf::tf_approx_linear()).
"spline": applies cubic spline interpolation (see tf::tf_approx_spline()).
"fill_extend": applies linear interpolation with constant extrapolation (see tf::tf_approx_fill_extend()).
"locf": applies "last observation carried forward" interpolation (see tf::tf_approx_locf()).
"nocb": applies "next observation carried backward" interpolation (see tf::tf_approx_nocb()).
left :: numeric()
The left boundary of the window. The window is specified such that all values >= left and <= right are kept for the computations.
right :: numeric()
The right boundary of the window.
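The grid options and three of the interpolation methods can be illustrated with base R alone. The sketch below uses hypothetical argument points and stats::approx() as a stand-in for the tf interpolators. The reading of "minmax" as "union points inside the common range" is an assumption, since the text only describes the range.

```r
# Hypothetical argument points of three irregularly observed curves.
args <- list(c(0, 1, 2, 4), c(1, 2, 3, 4), c(0.5, 1, 2, 4, 5))

grid_union <- sort(unique(unlist(args)))      # every observed point
grid_intersect <- Reduce(intersect, args)     # points shared by all curves

# Assumption: "minmax" keeps the union points inside the common range.
lower <- max(sapply(args, min))
upper <- min(sapply(args, max))
grid_minmax <- grid_union[grid_union >= lower & grid_union <= upper]

# One curve evaluated on a new grid; arg = 3 was not observed.
arg <- c(1, 2, 4)
value <- c(10, 20, 40)
grid <- c(1, 3, 4)
linear <- approx(arg, value, xout = grid, method = "linear")$y
# f = 0 takes the value to the left (LOCF), f = 1 the value to the right (NOCB).
locf <- approx(arg, value, xout = grid, method = "constant", f = 0)$y
nocb <- approx(arg, value, xout = grid, method = "constant", f = 1)$y
```

At the unobserved point arg = 3, the three methods differ: linear interpolation averages its neighbours, LOCF repeats the value at 2, and NOCB takes the value at 4.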
mlr3pipelines::PipeOp
-> mlr3pipelines::PipeOpTaskPreproc
-> mlr3pipelines::PipeOpTaskPreprocSimple
-> PipeOpFDAInterpol
new()
Initializes a new instance of this Class.
PipeOpFDAInterpol$new(id = "fda.interpol", param_vals = list())
id (character(1))
Identifier of resulting object, default "fda.interpol".
param_vals (named list)
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
clone()
The objects of this class are cloneable with this method.
PipeOpFDAInterpol$clone(deep = FALSE)
deep
Whether to make a deep clone.
task = tsk("fuel")
pop = po("fda.interpol")
task_interpol = pop$train(list(task))[[1L]]
task_interpol$data()
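Linearly transform the domain of functional data so they are between lower and upper.
The formula for this is \(x' = offset + x \cdot scale\), where \(scale\) is \((upper - lower) / (\max(x) - \min(x))\) and \(offset\) is \(-\min(x) \cdot scale + lower\). The same transformation is applied during training and prediction.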
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as the following parameters:
lower :: numeric(1)
Target value of smallest item of input data. Initialized to 0.
upper :: numeric(1)
Target value of greatest item of input data. Initialized to 1.
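A linear map of a domain onto [lower, upper] can be written as a scale and an offset. The base R sketch below applies such a min-max rescaling to a hypothetical vector of argument points. The helper scale_range() is illustrative only. It takes the minimum and maximum from the vector itself, whereas the PipeOp is described as reusing the training transformation at prediction time.

```r
# Hypothetical helper: rescale argument points linearly to [lower, upper].
scale_range <- function(x, lower = 0, upper = 1) {
  scale <- (upper - lower) / (max(x) - min(x))
  offset <- lower - min(x) * scale
  offset + x * scale
}

arg <- c(400, 500, 600, 800)   # e.g. wavelengths of a spectrum
scaled <- scale_range(arg, lower = -1, upper = 1)
scaled
```

The smallest argument point maps to lower, the largest to upper, and relative spacing between points is preserved.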
mlr3pipelines::PipeOp
-> mlr3pipelines::PipeOpTaskPreproc
-> PipeOpFDAScaleRange
new()
Initializes a new instance of this Class.
PipeOpFDAScaleRange$new(id = "fda.scalerange", param_vals = list())
id (character(1))
Identifier of resulting object, default "fda.scalerange".
param_vals (named list)
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
clone()
The objects of this class are cloneable with this method.
PipeOpFDAScaleRange$clone(deep = FALSE)
deep
Whether to make a deep clone.
task = tsk("fuel")
po_scale = po("fda.scalerange", lower = -1, upper = 1)
task_scale = po_scale$train(list(task))[[1L]]
task_scale$data()
Smooths functional data using tf::tf_smooth().
This preprocessing operator is similar to PipeOpFDAInterpol; however, it does not interpolate to unobserved x-values, but rather smooths the observed values.
The parameters are the parameters inherited from PipeOpTaskPreprocSimple, as well as the following parameters:
method :: character(1)
One of:
"lowess": locally weighted scatterplot smoothing (default)
"rollmean": rolling mean
"rollmedian": rolling median
"savgol": Savitzky-Golay filtering
All methods but "lowess" ignore non-equidistant arg values.
args :: named list()
List of named arguments that is passed to tf_smooth(). See the help page of tf_smooth() for default values.
verbose :: logical(1)
Whether to print messages during the transformation. Is initialized to FALSE.
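To make the "rollmean" option concrete, here is a base R sketch of a centered rolling mean over the observed values of one curve. It uses stats::filter() rather than tf_smooth(), so edge handling and defaults may differ from the package. The helper name rollmean_centered() is hypothetical.

```r
# Hypothetical helper: centered rolling mean with window width k.
rollmean_centered <- function(value, k) {
  # Equal weights over k neighbouring points; positions without a
  # full window at either end come back as NA.
  as.numeric(stats::filter(value, rep(1 / k, k), sides = 2))
}

value <- c(1, 2, 3, 4, 5, 6, 7)
smoothed <- rollmean_centered(value, k = 3)
smoothed
```

The window counts neighbouring points rather than distances along arg, which illustrates why such methods ignore non-equidistant arg values.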
mlr3pipelines::PipeOp
-> mlr3pipelines::PipeOpTaskPreproc
-> mlr3pipelines::PipeOpTaskPreprocSimple
-> PipeOpFDASmooth
new()
Initializes a new instance of this Class.
PipeOpFDASmooth$new(id = "fda.smooth", param_vals = list())
id (character(1))
Identifier of resulting object, default "fda.smooth".
param_vals (named list)
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
clone()
The objects of this class are cloneable with this method.
PipeOpFDASmooth$clone(deep = FALSE)
deep
Whether to make a deep clone.
task = tsk("fuel")
po_smooth = po("fda.smooth", method = "rollmean", args = list(k = 5))
task_smooth = po_smooth$train(list(task))[[1L]]
task_smooth
task_smooth$data(cols = c("NIR", "UVVIS"))
This dataset contains two functional covariates and three scalar covariates. The goal is to predict the PASAT score.
pasat represents the PASAT score at each visit.
subject_id represents the subject ID.
cca represents the fractional anisotropy tract profiles from the corpus callosum.
sex indicates the subject's sex.
rcst represents the fractional anisotropy tract profiles from the right corticospinal tract.
Rows containing NAs are removed.
This is a subset of the full dataset, which is contained in the package refund.
R6::R6Class inheriting from mlr3::TaskRegr.
This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():
mlr_tasks$get("dti")
tsk("dti")
Task type: “regr”
Dimensions: 340x4
Properties: “groups”
Has Missings: FALSE
Target: “pasat”
Features: “cca”, “rcst”, “sex”
Goldsmith, Jeff, Bobb, Jennifer, Crainiceanu, M C, Caffo, Brian, Reich, Daniel (2011). “Penalized functional regression.” Journal of Computational and Graphical Statistics, 20(4), 830–851.
Brain dataset courtesy of Gordon Kindlmann at the Scientific Computing and Imaging Institute, University of Utah, and Andrew Alexander, W. M. Keck Laboratory for Functional Brain Imaging and Behavior, University of Wisconsin-Madison.
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html
Package mlr3data for more toy tasks.
Package mlr3oml for downloading tasks from https://www.openml.org.
Package mlr3viz for some generic visualizations.
Dictionary of Tasks: mlr_tasks
as.data.table(mlr_tasks) for a table of available Tasks in the running session (depending on the loaded packages).
mlr3fselect and mlr3filters for feature selection and feature filtering.
Extension packages for additional task types:
Unsupervised clustering: mlr3cluster
Probabilistic supervised regression and survival analysis: https://mlr3proba.mlr-org.com/.
Other Task: mlr_tasks_fuel, mlr_tasks_phoneme
This dataset contains two functional covariates and one scalar covariate. The goal is to predict the heat value of some fuel based on the ultraviolet radiation spectrum, the infrared radiation spectrum, and one scalar column called h2o.
This is a subset of the full dataset, which is contained in the package FDboost.
R6::R6Class inheriting from mlr3::TaskRegr.
This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():
mlr_tasks$get("fuel")
tsk("fuel")
Task type: “regr”
Dimensions: 129x4
Properties: -
Has Missings: FALSE
Target: “heatan”
Features: “NIR”, “UVVIS”, “h20”
Brockhaus, Sarah, Scheipl, Fabian, Hothorn, Torsten, Greven, Sonja (2015). “The functional linear array model.” Statistical Modelling, 15(3), 279–300.
Other Task: mlr_tasks_dti, mlr_tasks_phoneme
The task contains a single functional covariate and five equally sized classes (aa, ao, dcl, iy, sh).
The aim is to predict the class of the phoneme from the functional covariate, which is a log-periodogram.
This is a subset of the full dataset, which is contained in the package fda.usc.
R6::R6Class inheriting from mlr3::TaskClassif.
This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():
mlr_tasks$get("phoneme")
tsk("phoneme")
Task type: “classif”
Dimensions: 250x2
Properties: “multiclass”
Has Missings: FALSE
Target: “class”
Features: “X”
Ferraty, Frédéric, Vieu, Philippe (2003). “Curves discrimination: a nonparametric functional approach.” Computational Statistics & Data Analysis, 44(1-2), 161–173.
Other Task: mlr_tasks_dti, mlr_tasks_fuel