| Title: | Probabilistic Supervised Learning for 'mlr3' |
|---|---|
| Description: | Provides extensions for probabilistic supervised learning for 'mlr3'. This currently includes survival analysis, probabilistic regression and density estimation. |
| Authors: | Raphael Sonabend [aut] (ORCID: <https://orcid.org/0000-0001-9225-4654>), Franz Kiraly [aut], Michel Lang [aut] (ORCID: <https://orcid.org/0000-0001-9754-0393>), Nurul Ain Toha [ctb], Andreas Bender [ctb] (ORCID: <https://orcid.org/0000-0001-5628-8611>), John Zobolas [cre, aut] (ORCID: <https://orcid.org/0000-0002-3609-8674>), Lukas Burk [ctb] (ORCID: <https://orcid.org/0000-0001-7528-3795>), Philip Studener [aut], Maximilian Mücke [ctb] (ORCID: <https://orcid.org/0009-0000-9432-9795>), Lee Xingzhuo Li [ctb] (ORCID: <https://orcid.org/0000-0001-5259-5198>), Markus Goeswein [ctb] |
| Maintainer: | John Zobolas |
| License: | LGPL-3 |
| Version: | 0.8.10 |
| Built: | 2026-06-05 20:06:23 UTC |
| Source: | https://github.com/mlr-org/mlr3proba |
Maintainer: John Zobolas (ORCID)
Authors:
Raphael Sonabend (ORCID)
Franz Kiraly
Michel Lang (ORCID)
Philip Studener
Other contributors:
Nurul Ain Toha [contributor]
Andreas Bender (ORCID) [contributor]
Lukas Burk (ORCID) [contributor]
Maximilian Mücke (ORCID) [contributor]
Lee Xingzhuo Li (ORCID) [contributor]
Markus Goeswein [contributor]
Useful links:
Report bugs at https://github.com/mlr-org/mlr3proba/issues
actg dataset from Hosmer et al. (2008)
actg
Identification Code
Time to AIDS diagnosis or death (days).
Event indicator. 1 = AIDS defining diagnosis, 0 = Otherwise.
Time to death (days)
Event indicator for death (only). 1 = Death, 0 = Otherwise.
Treatment indicator. 1 = Treatment includes IDV, 0 = Control group.
Treatment group indicator. 1 = ZDV + 3TC. 2 = ZDV + 3TC + IDV. 3 = d4T + 3TC. 4 = d4T + 3TC + IDV.
CD4 stratum at screening. 0 = CD4 <= 50. 1 = CD4 > 50.
0 = Male. 1 = Female.
Race/Ethnicity. 1 = White Non-Hispanic. 2 = Black Non-Hispanic. 3 = Hispanic. 4 = Asian, Pacific Islander. 5 = American Indian, Alaskan Native. 6 = Other/unknown.
IV drug use history. 1 = Never. 2 = Currently. 3 = Previously.
Hemophiliac. 1 = Yes. 0 = No.
Karnofsky Performance Scale. 100 = Normal; no complaint; no evidence of disease. 90 = Normal activity possible; minor signs/symptoms of disease. 80 = Normal activity with effort; some signs/symptoms of disease. 70 = Cares for self; normal activity/active work not possible.
Baseline CD4 count (Cells/Milliliter).
Months of prior ZDV use (months).
Age at Enrollment (years).
https://onlinelibrary.wiley.com/doi/book/10.1002/9780470258019
Hosmer, D.W. and Lemeshow, S. and May, S. (2008) Applied Survival Analysis: Regression Modeling of Time to Event Data: Second Edition, John Wiley and Sons Inc., New York, NY
Convert object to a PredictionDens.
as_prediction_dens(x, ...)
## S3 method for class 'PredictionDens'
as_prediction_dens(x, ...)
## S3 method for class 'data.frame'
as_prediction_dens(x, ...)
x |
(any) |
... |
(any) |
library(mlr3)
library(mlr3proba)
task = tsk("precip")
learner = lrn("dens.hist")
learner$train(task)
p = learner$predict(task)
# convert to a data.table
tab = as.data.table(p)
# convert back to a Prediction
as_prediction_dens(tab)
Convert object to a PredictionSurv.
as_prediction_surv(x, ...)
## S3 method for class 'PredictionSurv'
as_prediction_surv(x, ...)
## S3 method for class 'data.frame'
as_prediction_surv(x, ...)
x |
(any) |
... |
(any) |
library(mlr3)
library(mlr3proba)
task = tsk("rats")
learner = lrn("surv.coxph")
learner$train(task)
p = learner$predict(task)
# convert to a data.table
tab = as.data.table(p)
# convert back to a Prediction
as_prediction_surv(tab)
Convert object to a density task (TaskDens).
as_task_dens(x, ...)
## S3 method for class 'TaskDens'
as_task_dens(x, clone = FALSE, ...)
## S3 method for class 'data.frame'
as_task_dens(x, id = deparse(substitute(x)), ...)
## S3 method for class 'DataBackend'
as_task_dens(x, id = deparse(substitute(x)), ...)
x |
( |
... |
( |
clone |
( |
id |
( |
Convert object to a survival task (TaskSurv).
as_task_surv(x, ...)
## S3 method for class 'TaskSurv'
as_task_surv(x, clone = FALSE, ...)
## S3 method for class 'data.frame'
as_task_surv(
  x,
  time = "time",
  event = "event",
  time2 = "time2",
  type = "right",
  id = deparse(substitute(x)),
  ...
)
## S3 method for class 'DataBackend'
as_task_surv(
  x,
  time = "time",
  event = "event",
  time2,
  type = "right",
  id = deparse(substitute(x)),
  ...
)
x |
( |
... |
( |
clone |
( |
time |
( |
event |
( |
time2 |
( |
type |
( |
id |
( |
Asserts that the given input matrix is a (discrete) survival probability matrix, using Rcpp. The following checks are performed:
All values are probabilities, i.e. $S(t) \in [0, 1]$.
Column names correspond to time points and should therefore be coercible to
numeric and increasing.
Per row/observation, the survival probabilities decrease non-strictly, i.e. $S(t) \geq S(t+1)$.
assert_surv_matrix(x)
x |
( |
If the assertion fails, an error is raised; otherwise NULL is returned
invisibly.
x = matrix(data = c(1, 0.6, 0.4, 0.8, 0.8, 0.7), nrow = 2, ncol = 3, byrow = TRUE)
colnames(x) = c(12, 34, 42)
x
assert_surv_matrix(x)
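For readers who want to see the three checks spelled out, here is a minimal base-R sketch. It is not the package's Rcpp implementation. The name check_surv_matrix_sketch is hypothetical. Instead of raising an error, it returns TRUE on success or a message string on failure. It also assumes that "increasing" column names means strictly increasing.

```r
# Hypothetical helper illustrating the checks of assert_surv_matrix(); not the mlr3proba code.
check_surv_matrix_sketch <- function(x) {
  if (!is.matrix(x) || !is.numeric(x)) return("must be a numeric matrix")
  # check 1: every entry is a probability
  if (anyNA(x) || any(x < 0 | x > 1)) return("all values must be probabilities in [0, 1]")
  # check 2: column names are numeric, (strictly) increasing time points
  if (is.null(colnames(x))) return("column names (time points) are missing")
  times <- suppressWarnings(as.numeric(colnames(x)))
  if (anyNA(times) || is.unsorted(times, strictly = TRUE))
    return("column names must be numeric, increasing time points")
  # check 3: each row is non-increasing over time, i.e. S(t) >= S(t+1)
  if (ncol(x) > 1 && any(x[, -1, drop = FALSE] > x[, -ncol(x), drop = FALSE]))
    return("survival probabilities must be non-increasing in each row")
  TRUE
}

x <- matrix(c(1, 0.6, 0.4, 0.8, 0.8, 0.7), nrow = 2, byrow = TRUE)
colnames(x) <- c(12, 34, 42)
check_surv_matrix_sketch(x)
```

Returning a message rather than stopping makes the sketch easy to test. A real assertion would call stop() with that message.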
Visualizations for LearnerSurvCoxPH.
The argument type controls what kind of plot is drawn.
Currently, the only possible choice is "ggforest", a forest plot
drawn with ggforest.
This plot displays the estimated hazard ratios (HRs) and their confidence
intervals (CIs) for different variables included in the trained model.
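The sketch below shows the quantities such a forest plot summarises. It fits survival::coxph directly and computes hazard ratios as exp(coef) with Wald-type 95% intervals, exp(coef +/- 1.96 * se). This is an illustration under assumptions: it requires the survival package and its lung data, uses age and sex only as example covariates, and does not call ggforest. Intervals reported by ggforest may be computed differently.

```r
library(survival)

# Cox model on the lung data (status is coded 1 = censored, 2 = dead, which Surv() accepts)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

beta <- coef(fit)               # log hazard ratios
se <- sqrt(diag(vcov(fit)))     # standard errors of the coefficients
z <- qnorm(0.975)               # ~1.96 for a 95% Wald interval

hr_table <- data.frame(
  HR = exp(beta),
  lower = exp(beta - z * se),
  upper = exp(beta + z * se)
)
print(hr_table)
```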
## S3 method for class 'LearnerSurvCoxPH'
autoplot(object, type = "ggforest", ...)
object |
|
type |
(character(1)) |
... |
Additional parameters passed down to |
library(ggplot2)
library(mlr3proba)
task = tsk("lung")
learner = lrn("surv.coxph")
learner$train(task)
autoplot(learner)
Generates plots for PredictionSurv, depending on argument type:
"calib" (default): Calibration plot comparing the average predicted
survival distribution (Pred) to a Kaplan-Meier prediction (KM). This is
not a comparison of a stratified crank or lp.
"dcalib": Distribution calibration plot.
A model is considered D-calibrated if, for any given quantile p, the
proportion of observed outcomes occurring before the predicted time quantile
matches p. For example, 50% of events should occur before the predicted
median survival time (i.e. the time corresponding to a predicted survival
probability of 0.5).
Good calibration means that the resulting line plot will lie close to the
straight line $y = x$.
Note that NAs from the predicted quantile function are imputed with the
maximum observed outcome time.
"scalib": Smoothed calibration plot at a specific time point.
For a range of probabilities of event occurrence in $[0, 1]$ (x-axis),
the y-axis shows the smoothed observed proportions, calculated using hazard
regression (the model is fitted using the predicted probabilities).
See Austin et al. (2020) and MeasureSurvICI for more details.
Good calibration means that the resulting line plot will lie close to the
straight line $y = x$.
"isd": Plot the predicted individual survival distributions
(survival curves) for the test set's observations.
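The D-calibration property can be restated another way. If a model is D-calibrated, the predicted survival probability at each observation's own outcome time, $S_i(T_i)$, should be uniformly distributed on $[0, 1]$, so equal-width probability bins should each hold about the same share of observations.

The base-R sketch below computes those bin proportions from a survival matrix. It is not the plotting code behind autoplot. It assumes that every observation is an event (no censoring); Haider et al. (2020) handle censored observations by spreading them across bins. The name dcalib_bins_sketch is hypothetical.

```r
# Hypothetical sketch: share of observations per probability bin of S_i(T_i).
# Assumes no censoring; column names of `surv` are the time points.
dcalib_bins_sketch <- function(surv, obs_times, cuts = 10) {
  times <- as.numeric(colnames(surv))
  # step-function lookup of S_i at the observed time (S = 1 before the first time point)
  idx <- findInterval(obs_times, times)
  s_at_t <- ifelse(idx == 0, 1, surv[cbind(seq_len(nrow(surv)), pmax(idx, 1))])
  # equal-width bins on [0, 1]; under D-calibration each share is close to 1 / cuts
  bins <- cut(s_at_t, breaks = seq(0, 1, length.out = cuts + 1), include.lowest = TRUE)
  as.numeric(table(bins)) / length(s_at_t)
}

surv <- matrix(c(0.9, 0.1, 0.9, 0.1), nrow = 2, byrow = TRUE)
colnames(surv) <- c(1, 2)
dcalib_bins_sketch(surv, obs_times = c(1, 2), cuts = 2)
```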
## S3 method for class 'PredictionSurv'
autoplot(
  object,
  type = "calib",
  times = NULL,
  row_ids = NULL,
  cuts = 11L,
  time = NULL,
  theme = theme_minimal(),
  ...
)
object |
|
type |
( |
times |
( |
row_ids |
( |
cuts |
( |
time |
( |
theme |
( |
... |
( |
object must have a distr prediction, as all plot types use the
predicted survival distribution/matrix.
type = "dcalib" is drawn slightly differently from Haider et al. (2020),
though it is conceptually the same.
Haider, Humza, Hoehn, Bret, Davis, Sarah, Greiner, Russell (2020). “Effective Ways to Build and Evaluate Individual Survival Distributions.” Journal of Machine Learning Research, 21(85), 1–63. https://jmlr.org/papers/v21/18-772.html.
Austin, C. P, Harrell, E. F, van Klaveren, David (2020). “Graphical calibration curves and the integrated calibration index (ICI) for survival models.” Statistics in Medicine, 39(21), 2714. ISSN 10970258, doi:10.1002/SIM.8570, https://pmc.ncbi.nlm.nih.gov/articles/PMC7497089/.
library(ggplot2)
library(mlr3proba)
learner = lrn("surv.coxph")
task = tsk("gbcs")
p = learner$train(task, row_ids = 1:600)$predict(task, row_ids = 601:686)
# calibration by comparison of average prediction to Kaplan-Meier
autoplot(p)
# same as above, use specific time points
autoplot(p, times = seq(1, 1000, 5))
# Distribution-calibration (D-Calibration)
autoplot(p, type = "dcalib")
# Smoothed Calibration (S-Calibration)
autoplot(p, type = "scalib", time = 1750)
# Predicted survival curves (all observations)
autoplot(p, type = "isd")
# Predicted survival curves (specific observations)
autoplot(p, type = "isd", row_ids = c(601, 651, 686))
Generates plots for TaskDens.
## S3 method for class 'TaskDens'
autoplot(object, type = "dens", theme = theme_minimal(), ...)
object |
(TaskDens). |
type |
( |
theme |
( |
... |
( |
ggplot2::ggplot() object.
library(ggplot2)
library(mlr3proba)
task = tsk("precip")
task$head()
autoplot(task, bins = 15)
autoplot(task, type = "freq", bins = 15)
autoplot(task, type = "overlay", bins = 15)
autoplot(task, type = "freqpoly", bins = 15)
Generates plots for TaskSurv, depending on argument type:
"target": Calls GGally::ggsurv() on a survival::survfit() object.
This computes the Kaplan-Meier survival curve for the observations of this task.
"duo": Passes data and additional arguments down to GGally::ggduo().
columnsX is the target, columnsY are the features.
"pairs": Passes data and additional arguments down to GGally::ggpairs().
Color is set to target column.
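The "target" plot shows a Kaplan-Meier curve, and the example further down uses reverse = TRUE to show the censoring distribution instead. The base-R sketch below computes the product-limit estimate by hand. It assumes that reversing simply swaps events and censorings. The function name km_sketch is hypothetical, and tie handling between events and censorings differs across implementations, so numbers may not match GGally/survival exactly.

```r
# Hypothetical Kaplan-Meier sketch; reverse = TRUE estimates the censoring distribution.
km_sketch <- function(time, status, reverse = FALSE) {
  if (reverse) status <- 1 - status  # treat censorings as the "events"
  ut <- sort(unique(time))
  n_risk <- vapply(ut, function(u) sum(time >= u), integer(1))               # at risk just before u
  n_event <- vapply(ut, function(u) sum(time == u & status == 1), integer(1)) # events at u
  data.frame(time = ut, surv = cumprod(1 - n_event / n_risk))
}

time <- c(1, 2, 3, 4)
status <- c(1, 0, 1, 1)
km_sketch(time, status)                  # survival curve
km_sketch(time, status, reverse = TRUE)  # censoring curve
```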
## S3 method for class 'TaskSurv'
autoplot(
  object,
  type = "target",
  theme = theme_minimal(),
  reverse = FALSE,
  ...
)
object |
(TaskSurv). |
type |
( |
theme |
( |
reverse |
( |
... |
( |
ggplot2::ggplot() object.
library(ggplot2)
library(mlr3proba)
task = tsk("lung")
task$head()
autoplot(task) # KM
autoplot(task, reverse = TRUE) # KM of the censoring distribution
autoplot(task, rhs = "sex")
autoplot(task, type = "duo")
Helper function to compose a survival distribution (or cumulative hazard)
from the relative risk predictions (linear predictors, lp) of a
proportional hazards model (e.g. a Cox-type model).
breslow(times, status, lp_train, lp_test, eval_times = NULL, type = "surv")
times |
( |
status |
( |
lp_train |
( |
lp_test |
( |
eval_times |
( |
type |
( |
We estimate the survival probability of individual $i$ (from the test set),
at time point $t$ as follows:
$$S_i(t) = e^{-H_i(t)} = e^{-\hat{H}_0(t) \times e^{lp_i}}$$
where:
$H_i(t)$ is the cumulative hazard function for individual $i$
$\hat{H}_0(t)$ is Breslow's estimator for the cumulative baseline
hazard. Estimation requires the training set's times and status as well as
the risk predictions (lp_train).
$lp_i$ is the risk prediction (linear predictor) of individual $i$
on the test set.
Breslow's approach uses a non-parametric maximum likelihood estimation of the cumulative baseline hazard function:
$$\hat{H}_0(t) = \sum_{i=1}^{n} \frac{I(T_i \le t)\,\delta_i}{\sum_{j \in R_i} e^{lp_j}}$$
where:
$t$ is the vector of time points (unique and sorted, from the train set)
$n$ is the number of events (train set)
$T$ is the vector of event times (train set)
$\delta$ is the status indicator (1 = event or 0 = censored)
$R_i$ is the risk set (the individuals at risk just before
event $i$)
$lp_j$ is the risk prediction (linear predictor) of individual $j$
(who is part of the risk set $R_i$) on the train set.
We employ constant interpolation to estimate the cumulative baseline hazards,
extending from the observed unique event times to the specified evaluation
times (eval_times).
Any values falling outside the range of the estimated times are assigned as
follows:
$$\hat{H}_0(\text{eval\_times} < \min(t)) = 0$$ and $$\hat{H}_0(\text{eval\_times} > \max(t)) = \hat{H}_0(\max(t))$$
Note that in the rare event of lp predictions being Inf or -Inf, the
resulting cumulative hazard values become NaN, which we substitute with
Inf (and the corresponding survival probabilities take the value $0$).
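The formulas above translate fairly directly into base R. This is a minimal sketch, not mlr3proba's own implementation. breslow_sketch is a hypothetical name. The sketch assumes that tied event times share a single risk-set denominator, so each unique time contributes $d_k / \sum_{j \in R_k} e^{lp_j}$. It uses step (constant) interpolation with 0 before the first time point and carries the last value forward after the final one, as described above.

```r
# Hypothetical sketch of the Breslow estimator described above (not the package code).
breslow_sketch <- function(times, status, lp_train, lp_test,
                           eval_times = NULL, type = c("surv", "cumhaz")) {
  type <- match.arg(type)
  utimes <- sort(unique(times))          # unique, sorted train time points
  risk <- exp(lp_train)
  # baseline hazard increment at each time: d_k / sum_{j in R_k} exp(lp_j)
  incr <- vapply(utimes, function(tk) {
    d <- sum(times == tk & status == 1)
    if (d == 0) return(0)
    d / sum(risk[times >= tk])
  }, numeric(1))
  H0 <- cumsum(incr)
  if (is.null(eval_times)) eval_times <- utimes
  # constant interpolation: 0 before min(t), last value after max(t)
  idx <- findInterval(eval_times, utimes)
  H0_eval <- ifelse(idx == 0, 0, H0[pmax(idx, 1)])
  # H_i(t) = H0(t) * exp(lp_i): rows = test observations, columns = eval_times
  cumhaz <- outer(exp(lp_test), H0_eval)
  cumhaz[is.nan(cumhaz)] <- Inf          # e.g. Inf * 0 from infinite lp
  colnames(cumhaz) <- eval_times
  if (type == "cumhaz") cumhaz else exp(-cumhaz)
}

# with all lp = 0 this reduces to the Nelson-Aalen estimator
breslow_sketch(c(1, 2, 3), c(1, 1, 1), c(0, 0, 0), 0, type = "cumhaz")
```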
For similar implementations, see gbm::basehaz.gbm(), C060::basesurv() and
xgboost.surv::sgb_bhaz().
a matrix (obs x times). The number of columns is equal to the length of eval_times,
and the number of rows is equal to the number of test observations (i.e. the
length of the lp_test vector). Depending on the type argument, the matrix
contains either survival probabilities (0-1) or cumulative hazard estimates
(0-Inf).
Breslow N (1972). “Discussion of 'Regression Models and Life-Tables' by D.R. Cox.” Journal of the Royal Statistical Society: Series B, 34(2), 216-217.
Lin, Y. D (2007). “On the Breslow estimator.” Lifetime Data Analysis, 13(4), 471-480. doi:10.1007/s10985-007-9048-y.
library(mlr3proba)
task = tsk("rats")
part = partition(task, ratio = 0.8)
learner = lrn("surv.coxph")
learner$train(task, part$train)
p_train = learner$predict(task, part$train)
p_test = learner$predict(task, part$test)
surv = breslow(times = task$times(part$train), status = task$status(part$train),
  lp_train = p_train$lp, lp_test = p_test$lp)
head(surv)
gbcs dataset from Hosmer et al. (2008)
gbcs
Identification Code
Date of diagnosis.
Date of recurrence free survival.
Date of death.
Age at diagnosis (years).
Menopausal status. 1 = Yes, 0 = No.
Hormone therapy. 1 = Yes. 0 = No.
Tumor size (mm).
Tumor grade (1-3).
Number of lymph nodes.
Number of progesterone receptors.
Number of estrogen receptors.
Time to recurrence (days).
Recurrence status. 1 = Recurrence. 0 = Censored.
Time to death (days).
Censoring status. 1 = Death. 0 = Censored.
https://onlinelibrary.wiley.com/doi/book/10.1002/9780470258019
Hosmer, D.W. and Lemeshow, S. and May, S. (2008) Applied Survival Analysis: Regression Modeling of Time to Event Data: Second Edition, John Wiley and Sons Inc., New York, NY
Many methods can be used to reduce a discrete survival distribution prediction (i.e. matrix) to a relative risk / ranking prediction, see Sonabend et al. (2022).
This function calculates a relative risk score as the sum of the predicted cumulative hazard function, also called ensemble/expected mortality. This risk score can be loosely interpreted as the expected number of deaths for patients with similar characteristics (see Ishwaran et al. (2008)), and it makes no model or survival distribution assumptions.
get_mortality(x)
x |
( |
a numeric vector of the mortality risk scores, one per row of the
input survival matrix.
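A minimal sketch of the idea is shown below. It assumes the cumulative hazard is recovered from the survival matrix as $H(t) = -\log S(t)$ and then summed over the time points of each row. mortality_sketch is a hypothetical name, and mlr3proba may treat survival probabilities of exactly 0 (which give $H = \infty$ here) differently.

```r
# Hypothetical sketch: ensemble mortality as the row-wise sum of H(t) = -log S(t).
mortality_sketch <- function(surv) {
  cumhaz <- -log(surv)  # cumulative hazard at each time point; S = 0 gives Inf
  rowSums(cumhaz)       # one risk score per observation (larger = more risk)
}

surv <- rbind(c(1, 0.5), c(1, 0.25))  # second observation has worse survival
mortality_sketch(surv)
```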
Sonabend, Raphael, Bender, Andreas, Vollmer, Sebastian (2022). “Avoiding C-hacking when evaluating survival distribution predictions with discrimination measures.” Bioinformatics. ISSN 1367-4803, doi:10.1093/BIOINFORMATICS/BTAC451, https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac451/6640155.
Ishwaran, Hemant, Kogalur, B U, Blackstone, H E, Lauer, S M, others (2008). “Random survival forests.” The Annals of applied statistics, 2(3), 841–860.
library(mlr3proba)
n = 10 # number of observations
k = 50 # time points
# Create the matrix with random values between 0 and 1
mat = matrix(runif(n * k, min = 0, max = 1), nrow = n, ncol = k)
# transform it to a survival matrix
surv_mat = t(apply(mat, 1L, function(row) sort(row, decreasing = TRUE)))
colnames(surv_mat) = 1:k # time points
# get mortality scores (the larger, the more risk)
mort = get_mortality(surv_mat)
mort
grace dataset from Hosmer et al. (2008)
grace
Identification Code
Follow up time.
Censoring indicator. 1 = Death. 0 = Censored.
Revascularization Performed. 1 = Yes. 0 = No.
Days to revascularization after admission.
Length of hospital stay (days).
Age at admission (years).
Systolic blood pressure on admission (mm Hg).
ST-segment deviation on index ECG. 1 = Yes. 0 = No.
https://onlinelibrary.wiley.com/doi/book/10.1002/9780470258019
Hosmer, D.W. and Lemeshow, S. and May, S. (2008) Applied Survival Analysis: Regression Modeling of Time to Event Data: Second Edition, John Wiley and Sons Inc., New York, NY
This Learner specializes Learner for density estimation problems:
task_type is set to "dens"
Creates Predictions of class PredictionDens.
Possible values for predict_types are:
"pdf": Evaluates estimated probability density function for each value in the test set.
"cdf": Evaluates estimated cumulative distribution function for each value in the test set.
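To make the "pdf" and "cdf" predict types concrete, here is a histogram-based sketch in base R that returns both functions from the same fit. It is only an illustration in the spirit of dens.hist, not that learner's implementation. hist_density_sketch is a hypothetical name. Bin membership is decided with findInterval, so values falling exactly on an interior break may be assigned to the neighbouring bin compared with hist().

```r
# Hypothetical histogram density estimator exposing a pdf and a cdf.
hist_density_sketch <- function(x, breaks = "Sturges") {
  h <- graphics::hist(x, breaks = breaks, plot = FALSE)
  b <- h$breaks
  d <- h$density
  mass <- c(0, cumsum(d * diff(b)))  # probability mass up to each break
  bin_of <- function(v) findInterval(v, b, rightmost.closed = TRUE)
  pdf <- function(v) {
    i <- bin_of(v)
    inside <- i >= 1 & i < length(b)
    out <- numeric(length(v))        # density is 0 outside the histogram range
    out[inside] <- d[i[inside]]
    out
  }
  cdf <- function(v) {
    i <- bin_of(v)
    inside <- i >= 1 & i < length(b)
    out <- numeric(length(v))        # 0 below the first break
    # mass of all earlier bins plus the linear share of the current bin
    out[inside] <- mass[i[inside]] + d[i[inside]] * (v[inside] - b[i[inside]])
    out[i >= length(b)] <- 1         # 1 above the last break
    out
  }
  list(pdf = pdf, cdf = cdf)
}

est <- hist_density_sketch(c(1, 2, 2, 3, 3, 3, 4, 4, 5))
est$pdf(c(0, 2.5, 10))
est$cdf(c(0, 2.5, 10))
```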
mlr3::Learner -> LearnerDens
new()
Creates a new instance of this R6 class.
LearnerDens$new( id, param_set = ps(), predict_types = "cdf", feature_types = character(), properties = character(), packages = character(), label = NA_character_, man = NA_character_ )
id(character(1))
Identifier for the new instance.
param_set(paradox::ParamSet)
Set of hyperparameters.
predict_types(character())
Supported predict types. Must be a subset of mlr_reflections$learner_predict_types.
feature_types(character())
Feature types the learner operates on. Must be a subset of mlr_reflections$task_feature_types.
properties(character())
Set of properties of the Learner (see initialization method $new()).
Must be a subset of mlr_reflections$learner_properties.
packages(character())
Set of required packages.
A warning is signaled by the constructor if at least one of the packages is not installed,
but loaded (not attached) later on-demand via requireNamespace().
label(character(1))
Label for the new instance.
man(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object.
The referenced help package can be opened via method $help().
clone()
The objects of this class are cloneable with this method.
LearnerDens$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other Learner:
LearnerSurv
library(mlr3)
library(mlr3proba)
# get all density learners from mlr_learners:
lrns = mlr_learners$mget(mlr_learners$keys("^dens"))
names(lrns)
# get a specific learner from mlr_learners:
mlr_learners$get("dens.hist")
lrn("dens.hist")
This Learner specializes Learner for survival problems:
task_type is set to "surv"
Creates Predictions of class PredictionSurv.
Possible values for predict_types are:
"distr": Predicts a probability distribution for each observation in the test set,
uses distr6.
"lp": Predicts a linear predictor for each observation in the test set.
"crank": Predicts a continuous ranking for each observation in the test set.
"response": Predicts a survival time for each observation in the test set.
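The predict types are related. For example, a survival time ("response") can be summarised from a "distr" prediction. One common summary is the restricted mean survival time (RMST), the area under the step survival curve up to a horizon tau. The sketch below computes it from a survival matrix. We do not claim this is how any particular mlr3proba learner or pipeline fills "response". rmst_sketch is hypothetical and assumes positive time points stored as column names.

```r
# Hypothetical sketch: RMST = area under the step survival curve on [0, tau].
rmst_sketch <- function(surv, tau = NULL) {
  times <- as.numeric(colnames(surv))
  if (is.null(tau)) tau <- max(times)
  keep <- times <= tau
  # interval widths: [0, t1), [t1, t2), ..., [tK, tau]
  widths <- diff(c(0, times[keep], tau))
  # survival is 1 before the first time point, then the matrix values
  vals <- cbind(1, surv[, keep, drop = FALSE])
  as.numeric(vals %*% widths)
}

surv <- rbind(c(0.5, 0), c(1, 1))
colnames(surv) <- c(1, 2)
rmst_sketch(surv)
```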
mlr3::Learner -> LearnerSurv
new()
Creates a new instance of this R6 class.
LearnerSurv$new( id, param_set = ps(), predict_types = "distr", feature_types = character(), properties = character(), packages = character(), label = NA_character_, man = NA_character_ )
id(character(1))
Identifier for the new instance.
param_set(paradox::ParamSet)
Set of hyperparameters.
predict_types(character())
Supported predict types. Must be a subset of mlr_reflections$learner_predict_types.
feature_types(character())
Feature types the learner operates on. Must be a subset of mlr_reflections$task_feature_types.
properties(character())
Set of properties of the Learner (see initialization method $new()).
Must be a subset of mlr_reflections$learner_properties.
packages(character())
Set of required packages.
A warning is signaled by the constructor if at least one of the packages is not installed,
but loaded (not attached) later on-demand via requireNamespace().
label(character(1))
Label for the new instance.
man(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object.
The referenced help package can be opened via method $help().
clone()
The objects of this class are cloneable with this method.
LearnerSurv$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other Learner:
LearnerDens
library(mlr3)
library(mlr3proba)
# get all survival learners from mlr_learners:
lrns = mlr_learners$mget(mlr_learners$keys("^surv"))
names(lrns)
# get a specific learner from mlr_learners:
mlr_learners$get("surv.coxph")
lrn("surv.coxph")
This measure specializes Measure for density estimation problems.
task_type is set to "dens".
Possible values for predict_type are "pdf" and "cdf".
Predefined measures can be found in the dictionary mlr3::mlr_measures.
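A density measure operating on the "pdf" predict type can be as simple as the default dens.logloss. The sketch below shows a log loss as the mean negative log of the predicted densities at the observed values. It assumes densities are clipped at a small eps to avoid log(0); the exact clipping used by dens.logloss is not confirmed here, and logloss_sketch is a hypothetical name.

```r
# Hypothetical log loss for density predictions: mean of -log(pdf), with clipping.
logloss_sketch <- function(pdf, eps = 1e-15) {
  mean(-log(pmax(pdf, eps)))  # lower is better
}

logloss_sketch(c(0.5, 0.25))  # (log(2) + log(4)) / 2
```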
mlr3::Measure -> MeasureDens
new()
Creates a new instance of this R6 class.
MeasureDens$new( id, param_set = ps(), range, minimize = NA, aggregator = NULL, properties = character(), predict_type = "pdf", task_properties = character(), packages = character(), label = NA_character_, man = NA_character_ )
id(character(1))
Identifier for the new instance.
param_set(paradox::ParamSet)
Set of hyperparameters.
range(numeric(2))
Feasible range for this measure as c(lower_bound, upper_bound).
Both bounds may be infinite.
minimize(logical(1))
Set to TRUE if good predictions correspond to small values,
and to FALSE if good predictions correspond to large values.
If set to NA (default), tuning this measure is not possible.
aggregator(function(x))
Function to aggregate individual performance scores x where x is a numeric vector.
If NULL, defaults to mean().
properties(character())
Properties of the measure.
Must be a subset of mlr_reflections$measure_properties.
Supported by mlr3:
"requires_task" (requires the complete Task),
"requires_learner" (requires the trained Learner),
"requires_train_set" (requires the training indices from the Resampling), and
"na_score" (the measure is expected to occasionally return NA or NaN).
predict_type(character(1))
Required predict type of the Learner.
Possible values are stored in mlr_reflections$learner_predict_types.
task_properties(character())
Required task properties, see Task.
packages(character())
Set of required packages.
A warning is signaled by the constructor if at least one of the packages is not installed,
but loaded (not attached) later on-demand via requireNamespace().
label(character(1))
Label for the new instance.
man(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object.
The referenced help package can be opened via method $help().
clone()
The objects of this class are cloneable with this method.
MeasureDens$clone(deep = FALSE)
deep
Whether to make a deep clone.
Default density measures: dens.logloss
Other Measure:
MeasureSurv
This measure specializes Measure for survival problems.
task_type is set to "surv".
Possible values for predict_type are "distr", "lp", "crank", and "response".
Predefined measures can be found in the dictionary mlr3::mlr_measures.
mlr3::Measure -> MeasureSurv
new()
Creates a new instance of this R6 class.
MeasureSurv$new( id, param_set = ps(), range, minimize = NA, average = "macro", aggregator = NULL, properties = character(), predict_type = "distr", predict_sets = "test", task_properties = character(), packages = character(), label = NA_character_, man = NA_character_, trafo = NULL )
id(character(1))
Identifier for the new instance.
param_set(paradox::ParamSet)
Set of hyperparameters.
range(numeric(2))
Feasible range for this measure as c(lower_bound, upper_bound).
Both bounds may be infinite.
minimize(logical(1))
Set to TRUE if good predictions correspond to small values,
and to FALSE if good predictions correspond to large values.
If set to NA (default), tuning this measure is not possible.
average(character(1))
How to average multiple Predictions from a
ResampleResult.
The default, "macro", calculates the individual performance scores for
each Prediction and then uses the function defined in
$aggregator to average them to a single number.
If set to "micro", the individual Prediction objects
are first combined into a single new Prediction object
which is then used to assess the performance.
The function in $aggregator is not used in this case.
aggregator(function(x))
Function to aggregate individual performance scores x where x is a numeric vector.
If NULL, defaults to mean().
properties(character())
Properties of the measure.
Must be a subset of mlr_reflections$measure_properties.
Supported by mlr3:
"requires_task" (requires the complete Task),
"requires_learner" (requires the trained Learner),
"requires_train_set" (requires the training indices from the Resampling), and
"na_score" (the measure is expected to occasionally return NA or NaN).
predict_type(character(1))
Required predict type of the Learner.
Possible values are stored in mlr_reflections$learner_predict_types.
predict_sets(character())
Prediction sets to operate on, used in aggregate() to extract the matching predict_sets from the ResampleResult.
Multiple predict sets are calculated by the respective Learner during resample()/benchmark().
Must be a non-empty subset of {"train", "test"}.
If multiple sets are provided, these are first combined to a single prediction object.
Default is "test".
task_properties(character())
Required task properties, see Task.
packages(character())
Set of required packages.
A warning is signaled by the constructor if at least one of the packages is not installed,
but loaded (not attached) later on-demand via requireNamespace().
label(character(1))
Label for the new instance.
man(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object.
The referenced help package can be opened via method $help().
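The difference between "macro" and "micro" averaging shows up as soon as resampling iterations differ in size. The toy sketch below uses hypothetical per-observation losses from two iterations and a simple mean-loss score. Non-decomposable survival measures such as the C-index cannot be pooled this simply, so this only illustrates the ordering of "score, then aggregate" versus "pool, then score".

```r
# Hypothetical per-observation losses from two resampling iterations of unequal size
fold_losses <- list(fold1 = c(0, 2), fold2 = c(4))

# "macro": score each Prediction, then aggregate (here with mean())
macro <- mean(vapply(fold_losses, mean, numeric(1)))

# "micro": combine all Predictions first, then score once
micro <- mean(unlist(fold_losses))

c(macro = macro, micro = micro)
```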
clone()
The objects of this class are cloneable with this method.
MeasureSurv$clone(deep = FALSE)
deep
Whether to make a deep clone.
Default survival measure: surv.cindex
Other Measure:
MeasureDens
This is an abstract class that should not be constructed directly.
integrated (logical(1))
If TRUE (default), returns the integrated score (e.g. across
time points); otherwise, the score is not integrated (e.g. at a single time point).
times (numeric())
If integrated == TRUE then a vector of time-points over which to integrate the score.
If integrated == FALSE then a single time point at which to return the score.
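To illustrate what "integrating a score across time points" can mean, the sketch below averages a time-dependent score with the trapezoidal rule, normalised by the length of the time range. This is one common choice and an assumption on our part; mlr3proba's AUC measures may weight or integrate differently. integrate_score_sketch is a hypothetical name.

```r
# Hypothetical sketch: trapezoidal time-average of a score evaluated at several time points.
integrate_score_sketch <- function(times, scores) {
  o <- order(times)
  tt <- times[o]
  s <- scores[o]
  if (length(tt) == 1) return(s)  # a single time point: nothing to integrate
  area <- sum(diff(tt) * (head(s, -1) + tail(s, -1)) / 2)
  area / (max(tt) - min(tt))      # normalise so a constant score maps to itself
}

integrate_score_sketch(times = c(0, 1), scores = c(0.6, 0.8))
```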
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvAUC
new()
Creates a new instance of this R6 class.
MeasureSurvAUC$new( id, properties = character(), label = NA_character_, man = NA_character_, param_set = ps() )
id(character(1))
Identifier for the new instance.
properties(character())
Properties of the measure.
Must be a subset of mlr_reflections$measure_properties.
Supported by mlr3:
"requires_task" (requires the complete Task),
"requires_learner" (requires the trained Learner),
"requires_train_set" (requires the training indices from the Resampling), and
"na_score" (the measure is expected to occasionally return NA or NaN).
label(character(1))
Label for the new instance.
man(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object.
The referenced help package can be opened via method $help().
param_set(paradox::ParamSet)
Set of hyperparameters.
clone()
The objects of this class are cloneable with this method.
MeasureSurvAUC$clone(deep = FALSE)
deep
Whether to make a deep clone.
Wrapper around PipeOpCrankCompositor to simplify Graph creation.
pipeline_crankcompositor(
  learner,
  method = c("mort"),
  overwrite = FALSE,
  graph_learner = FALSE
)
learner |
|
method |
( |
overwrite |
( |
graph_learner |
( |
mlr3pipelines::Graph or mlr3pipelines::GraphLearner
This Graph can be instantiated via the dictionary mlr_graphs or with the associated sugar function ppl():
mlr_graphs$get("crankcompositor")
ppl("crankcompositor")
Other pipelines:
mlr_graphs_distrcompositor,
mlr_graphs_probregr,
mlr_graphs_responsecompositor,
mlr_graphs_survaverager,
mlr_graphs_survbagging,
mlr_graphs_survtoclassif_IPCW,
mlr_graphs_survtoclassif_disctime,
mlr_graphs_survtoregr_pem
## Not run:
library(mlr3)
library(mlr3pipelines)
library(mlr3proba)
task = tsk("lung")
part = partition(task)
# change the crank prediction of a Cox model
grlrn = ppl(
  "crankcompositor",
  learner = lrn("surv.coxph"),
  method = "mort",
  overwrite = TRUE,
  graph_learner = TRUE
)
grlrn$train(task, part$train)
grlrn$predict(task, part$test)
## End(Not run)
Wrapper around PipeOpDistrCompositor or PipeOpBreslow to simplify Graph creation.
pipeline_distrcompositor(learner, estimator = "kaplan", form = "aft", overwrite = FALSE, scale_lp = FALSE, graph_learner = FALSE)
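Arguments:
learner | mlr3proba::LearnerSurv: survival learner whose predictions are composed.
estimator | (character(1)) Estimator of the baseline survival distribution; default "kaplan" (Kaplan-Meier).
form | (character(1)) Model form used to compose the distribution; default "aft".
overwrite | (logical(1)) If TRUE, an existing distr prediction is overwritten; default FALSE.
scale_lp | (logical(1)) Whether to scale the linear predictor; default FALSE.
graph_learner | (logical(1)) If TRUE, the Graph is wrapped in a GraphLearner; otherwise (default) the Graph is returned.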
Returns an mlr3pipelines::Graph, or an mlr3pipelines::GraphLearner if graph_learner = TRUE.
This Graph can be instantiated via the dictionary mlr_graphs or with the associated sugar function ppl():
mlr_graphs$get("distrcompositor")
ppl("distrcompositor")
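The form argument determines how a baseline survival curve S_0 (from the estimator) is combined with a linear predictor lp. The minimal sketch below assumes the textbook forms: proportional hazards, S(t) = S_0(t)^exp(lp), and accelerated failure time, S(t) = S_0(t / exp(lp)). The helper compose_surv and the toy baseline are hypothetical. mlr3proba's exact parameterisation (for example the sign convention of lp, or scaling via scale_lp) may differ.

```r
# Hypothetical sketch: compose a baseline survival curve with a linear predictor.
# Toy baseline survival S_0(t) as a right-continuous step function.
base_times = c(1, 2, 3, 4)
base_surv = c(0.9, 0.7, 0.5, 0.3)
S0 = stepfun(base_times, c(1, base_surv))  # S_0(t) = 1 before the first time

compose_surv = function(t, lp, form = c("ph", "aft")) {
  form = match.arg(form)
  if (form == "ph") {
    # Proportional hazards: S(t | x) = S_0(t)^exp(lp)
    S0(t)^exp(lp)
  } else {
    # Accelerated failure time: S(t | x) = S_0(t / exp(lp))
    S0(t / exp(lp))
  }
}

t_grid = c(0.5, 1, 2, 3, 4)
ph = compose_surv(t_grid, lp = 0.5, form = "ph")
aft = compose_surv(t_grid, lp = 0.5, form = "aft")
print(rbind(ph, aft))
```

In this convention, a positive lp lowers survival under the PH form but stretches the time axis (and so raises survival) under the AFT form.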
Other pipelines:
mlr_graphs_crankcompositor,
mlr_graphs_probregr,
mlr_graphs_responsecompositor,
mlr_graphs_survaverager,
mlr_graphs_survbagging,
mlr_graphs_survtoclassif_IPCW,
mlr_graphs_survtoclassif_disctime,
mlr_graphs_survtoregr_pem
## Not run:
library(mlr3)
library(mlr3proba)
library(mlr3pipelines)
# change the distribution prediction of Cox (Breslow-based) to an AFT form:
task = tsk("rats")
grlrn = ppl(
  "distrcompositor",
  learner = lrn("surv.coxph"),
  estimator = "kaplan",
  form = "aft",
  overwrite = TRUE,
  graph_learner = TRUE
)
grlrn$train(task)
grlrn$predict(task)
## End(Not run)
Wrapper around PipeOpProbregr to simplify Graph creation.
pipeline_probregr(learner, learner_se = NULL, dist = "Uniform", graph_learner = FALSE)
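Arguments:
learner | mlr3::LearnerRegr: regression learner providing the response, and also the standard error if learner_se is NULL.
learner_se | mlr3::LearnerRegr or NULL: optional separate learner providing the standard error; default NULL.
dist | (character(1)) Name of the distribution to construct, e.g. "Uniform" (default) or "Normal".
graph_learner | (logical(1)) If TRUE, the Graph is wrapped in a GraphLearner; otherwise (default) the Graph is returned.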
Returns an mlr3pipelines::Graph, or an mlr3pipelines::GraphLearner if graph_learner = TRUE.
This Graph can be instantiated via the dictionary mlr_graphs or with the associated sugar function ppl():
mlr_graphs$get("probregr")
ppl("probregr")
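Conceptually, this pipeline combines a point prediction (response) and a standard error (se) with the distribution named in dist. The base R sketch below assumes the distribution is matched by its mean and standard deviation. For the Normal this gives N(response, se^2). For the Uniform, matching mean μ and standard deviation σ gives the bounds μ ± √3 σ. This matching rule and the helper probregr_quantiles are assumptions for illustration only; PipeOpProbregr may instead treat se directly as a scale parameter.

```r
# Hypothetical helper: quantiles of a distribution built from (response, se),
# matching the distribution's mean and standard deviation (an assumption).
probregr_quantiles = function(response, se, p, dist = c("Normal", "Uniform")) {
  dist = match.arg(dist)
  if (dist == "Normal") {
    qnorm(p, mean = response, sd = se)
  } else {
    # Uniform(a, b) has mean (a + b) / 2 and sd (b - a) / sqrt(12),
    # so a = mean - sqrt(3) * sd and b = mean + sqrt(3) * sd.
    half_width = sqrt(3) * se
    qunif(p, min = response - half_width, max = response + half_width)
  }
}

response = c(20, 25)
se = c(2, 4)
# 90% central intervals for each observation under both distributions
lower_n = probregr_quantiles(response, se, 0.05, "Normal")
upper_n = probregr_quantiles(response, se, 0.95, "Normal")
lower_u = probregr_quantiles(response, se, 0.05, "Uniform")
upper_u = probregr_quantiles(response, se, 0.95, "Uniform")
print(cbind(lower_n, upper_n, lower_u, upper_u))
```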
Other pipelines:
mlr_graphs_crankcompositor,
mlr_graphs_distrcompositor,
mlr_graphs_responsecompositor,
mlr_graphs_survaverager,
mlr_graphs_survbagging,
mlr_graphs_survtoclassif_IPCW,
mlr_graphs_survtoclassif_disctime,
mlr_graphs_survtoregr_pem
## Not run:
library(mlr3)
library(mlr3proba)
library(mlr3pipelines)
task = tsk("boston_housing")
# method 1 - same learner for response and se
pipe = ppl(
  "probregr",
  learner = lrn("regr.featureless", predict_type = "se"),
  dist = "Uniform"
)
pipe$train(task)
pipe$predict(task)
# method 2 - different learners for response and se
pipe = ppl(
  "probregr",
  learner = lrn("regr.rpart"),
  learner_se = lrn("regr.featureless", predict_type = "se"),
  dist = "Normal"
)
pipe$train(task)
pipe$predict(task)
## End(Not run)
Wrapper around PipeOpResponseCompositor to simplify Graph creation.
pipeline_responsecompositor(learner, method = "rmst", tau = NULL, add_crank = FALSE, overwrite = FALSE, graph_learner = FALSE)
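Arguments:
learner | mlr3proba::LearnerSurv: survival learner whose predictions are composed.
method | (character(1)) Method used to compute the survival time response; default "rmst" (restricted mean survival time).
tau | (numeric(1)) Cutoff time point for the restricted mean; default NULL.
add_crank | (logical(1)) Whether to also add a crank prediction; default FALSE.
overwrite | (logical(1)) If TRUE, an existing response prediction is overwritten; default FALSE.
graph_learner | (logical(1)) If TRUE, the Graph is wrapped in a GraphLearner; otherwise (default) the Graph is returned.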
Returns an mlr3pipelines::Graph, or an mlr3pipelines::GraphLearner if graph_learner = TRUE.
This Graph can be instantiated via the dictionary mlr_graphs or with the associated sugar function ppl():
mlr_graphs$get("responsecompositor")
ppl("responsecompositor")
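With method = "rmst", the survival time response is the restricted mean survival time: the area under the survival curve up to a cutoff τ (tau), RMST(τ) = ∫_0^τ S(t) dt. For a step-function survival curve, this integral reduces to a finite sum of rectangles. The helper rmst below is a hypothetical sketch of that computation, not mlr3proba's implementation. For example, the package may choose a default cutoff when tau = NULL in a different way.

```r
# Hypothetical helper: RMST of a right-continuous step survival curve.
# times: increasing event times; surv: S(t) at (and after) each time; tau: cutoff.
rmst = function(times, surv, tau = max(times)) {
  keep = times < tau
  # Interval boundaries: 0, the event times before tau, and tau itself
  knots = c(0, times[keep], tau)
  # S(t) is 1 before the first event time and surv[i] on [times[i], times[i + 1])
  heights = c(1, surv[keep])
  # Sum of rectangle areas: width of each interval times its survival height
  sum(diff(knots) * heights)
}

times = c(2, 5, 8)
surv = c(0.8, 0.5, 0.2)
rmst_val = rmst(times, surv, tau = 6)
print(rmst_val)  # 2 * 1 + 3 * 0.8 + 1 * 0.5
```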
Other pipelines:
mlr_graphs_crankcompositor,
mlr_graphs_distrcompositor,
mlr_graphs_probregr,
mlr_graphs_survaverager,
mlr_graphs_survbagging,
mlr_graphs_survtoclassif_IPCW,
mlr_graphs_survtoclassif_disctime,
mlr_graphs_survtoregr_pem
## Not run:
library(mlr3)
library(mlr3proba)
library(mlr3pipelines)
task = tsk("lung")
part = partition(task)
# add survival time prediction type to the predictions of a Cox model
grlrn = ppl(
  "responsecompositor",
  learner = lrn("surv.coxph"),
  method = "rmst",
  overwrite = TRUE,
  graph_learner = TRUE
)
grlrn$train(task, part$train)
grlrn$predict(task, part$test)
## End(Not run)
Wrapper around PipeOpSurvAvg to simplify Graph creation.
pipeline_survaverager(learners, param_vals = list(), graph_learner = FALSE)
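Arguments:
learners | (list()) Survival learners whose predictions are averaged.
param_vals | (list()) Hyperparameter values passed to PipeOpSurvAvg, e.g. list(weights = c(0.1, 0.9)); default list().
graph_learner | (logical(1)) If TRUE, the Graph is wrapped in a GraphLearner; otherwise (default) the Graph is returned.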
Returns an mlr3pipelines::Graph, or an mlr3pipelines::GraphLearner if graph_learner = TRUE.
This Graph can be instantiated via the dictionary mlr_graphs or with the associated sugar function ppl():
mlr_graphs$get("survaverager")
ppl("survaverager")
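PipeOpSurvAvg averages the predictions of several survival learners, optionally with weights such as weights = c(0.1, 0.9). The following minimal sketch shows this aggregation for survival curves on a shared time grid. It assumes the weights are normalised to sum to one; the helper weighted_surv_avg is hypothetical. A weighted average of survival functions is itself the survival function of the corresponding mixture distribution.

```r
# Hypothetical helper: weighted average of survival curves from several learners.
# surv_list: list of matrices (rows = observations, columns = shared time points).
weighted_surv_avg = function(surv_list, weights = rep(1, length(surv_list))) {
  stopifnot(length(surv_list) == length(weights), all(weights >= 0))
  weights = weights / sum(weights)  # normalise so the result is a valid S(t)
  Reduce(`+`, Map(function(s, w) w * s, surv_list, weights))
}

# Two learners' predictions for two observations at three time points
surv_a = rbind(c(0.9, 0.7, 0.5), c(0.8, 0.6, 0.4))
surv_b = rbind(c(0.95, 0.85, 0.75), c(0.7, 0.5, 0.2))
avg = weighted_surv_avg(list(surv_a, surv_b), weights = c(0.1, 0.9))
print(avg)
```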
Other pipelines:
mlr_graphs_crankcompositor,
mlr_graphs_distrcompositor,
mlr_graphs_probregr,
mlr_graphs_responsecompositor,
mlr_graphs_survbagging,
mlr_graphs_survtoclassif_IPCW,
mlr_graphs_survtoclassif_disctime,
mlr_graphs_survtoregr_pem
## Not run:
library(mlr3)
library(mlr3proba)
library(mlr3pipelines)
task = tsk("rats")
pipe = ppl(
  "survaverager",
  learners = lrns(c("surv.kaplan", "surv.coxph")),
  param_vals = list(weights = c(0.1, 0.9)),
  graph_learner = FALSE
)
pipe$train(task)
pipe$predict(task)
## End(Not run)
Wrapper around PipeOpSubsample and PipeOpSurvAvg to simplify Graph creation.
pipeline_survbagging(learner, iterations = 10, frac = 0.7, avg = TRUE, weights = 1, graph_learner = FALSE)
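Arguments:
learner | mlr3proba::LearnerSurv: survival learner to bag.
iterations | (integer(1)) Number of subsamples B; default 10.
frac | (numeric(1)) Fraction of the rows in each subsample; default 0.7.
avg | (logical(1)) Whether to aggregate the predictions with PipeOpSurvAvg; default TRUE.
weights | (numeric()) Weights for the weighted average of the predictions; default 1.
graph_learner | (logical(1)) If TRUE, the Graph is wrapped in a GraphLearner; otherwise (default) the Graph is returned.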
Bagging (Bootstrap AGGregatING) is the process of bootstrapping data and aggregating the final predictions. Bootstrapping draws B = iterations subsamples of the data, each containing a fraction frac of the rows, and is performed with PipeOpSubsample. The learner is trained on each subsample. The resulting predictions are then aggregated: deterministic predictions by their sample mean, and distribution predictions as a MixtureDistribution. Supplying weights turns this into a weighted average.
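The bagging procedure described above can be sketched in base R. The sketch draws B subsamples, each containing a fraction frac of the rows, fits a model on each, and averages the predicted survival curves on a common time grid. It rests on three assumptions. First, subsampling is done without replacement, which we assume matches PipeOpSubsample's default. Second, a hand-written Kaplan-Meier estimator stands in for the bagged learner. Third, the helper names km_surv and bag_km are hypothetical.

```r
# Hypothetical Kaplan-Meier estimator evaluated on a fixed time grid.
km_surv = function(time, status, grid) {
  event_times = sort(unique(time[status == 1]))
  surv = 1
  s_at = numeric(length(event_times))
  for (k in seq_along(event_times)) {
    t_k = event_times[k]
    at_risk = sum(time >= t_k)
    deaths = sum(time == t_k & status == 1)
    surv = surv * (1 - deaths / at_risk)
    s_at[k] = surv
  }
  # Right-continuous step function: S(t) = 1 before the first event time
  c(1, s_at)[findInterval(grid, event_times) + 1]
}

# Hypothetical bagging: B subsamples of size frac * n, averaged with weights.
bag_km = function(time, status, grid, B = 10, frac = 0.7, weights = rep(1, B)) {
  n = length(time)
  curves = vapply(seq_len(B), function(b) {
    rows = sample.int(n, size = floor(frac * n))  # without replacement
    km_surv(time[rows], status[rows], grid)
  }, numeric(length(grid)))
  # curves has one column per subsample; take the weighted mean across columns
  drop(curves %*% (weights / sum(weights)))
}

set.seed(1)
time = rexp(50, rate = 0.1)
status = rbinom(50, 1, 0.7)
grid = c(1, 5, 10, 20)
bagged = bag_km(time, status, grid, B = 5)
print(bagged)
```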
Returns an mlr3pipelines::Graph, or an mlr3pipelines::GraphLearner if graph_learner = TRUE.
This Graph can be instantiated via the dictionary mlr_graphs or with the associated sugar function ppl():
mlr_graphs$get("survbagging")
ppl("survbagging")
Other pipelines:
mlr_graphs_crankcompositor,
mlr_graphs_distrcompositor,
mlr_graphs_probregr,
mlr_graphs_responsecompositor,
mlr_graphs_survaverager,
mlr_graphs_survtoclassif_IPCW,
mlr_graphs_survtoclassif_disctime,
mlr_graphs_survtoregr_pem
## Not run:
library(mlr3)
library(mlr3proba)
library(mlr3pipelines)
task = tsk("rats")
pipe = ppl(
  "survbagging",
  learner = lrn("surv.coxph"),
  iterations = 5,
  graph_learner = FALSE
)
pipe$train(task)
pipe$predict(task)
## End(Not run)
Wrapper around PipeOpTaskSurvClassifDiscTime and PipeOpPredClassifSurvDiscTime to simplify Graph creation.
pipeline_survtoclassif_disctime(learner, cut = NULL, max_time = NULL, graph_learner = FALSE)
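Arguments:
learner | LearnerClassif: classification learner fitted on the transformed task.
cut | (numeric()) Cut points of the time intervals, or a single number giving the number of equidistant intervals; default NULL.
max_time | (numeric(1)) Optional upper limit of the time range used to build the intervals; default NULL.
graph_learner | (logical(1)) If TRUE, the Graph is wrapped in a GraphLearner; otherwise (default) the Graph is returned.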
The pipeline consists of the following steps:
1. PipeOpTaskSurvClassifDiscTime converts the TaskSurv to a TaskClassif.
2. A LearnerClassif is fit and predicted on the new TaskClassif.
3. PipeOpPredClassifSurvDiscTime transforms the resulting PredictionClassif to a PredictionSurv.
Optionally, PipeOpModelMatrix is used to transform the formula of the task before the learner is fitted.
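The first step follows the discrete-time (person-period) approach of Tutz and Schmid (2016). Each subject contributes one row per time interval it enters. The binary target is 1 only in the interval where the event happens. The hypothetical helper to_person_period sketches this in base R. The column names and the handling of cut points are our assumptions, not the exact output of PipeOpTaskSurvClassifDiscTime.

```r
# Hypothetical sketch of the discrete-time (person-period) transformation.
# cuts: interval end points (0 is the implicit start); time/status: survival outcome.
to_person_period = function(id, time, status, cuts) {
  rows = lapply(seq_along(id), function(i) {
    # Interval k covers (cuts[k - 1], cuts[k]]; find the interval containing time[i]
    last = findInterval(time[i], cuts, left.open = TRUE) + 1
    last = min(last, length(cuts))  # times beyond the last cut go to the last interval
    k = seq_len(last)
    data.frame(
      id = id[i],
      interval = k,
      # The target is 1 only in the final interval, and only for observed events
      event = as.integer(k == last & status[i] == 1)
    )
  })
  do.call(rbind, rows)
}

pp = to_person_period(id = 1:3, time = c(2.5, 7, 4), status = c(1, 0, 1),
                      cuts = c(3, 6, 9))
print(pp)
```

A classifier trained on event, with interval as a feature, then estimates the discrete hazard P(T in interval k | T > previous intervals, x).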
Returns an mlr3pipelines::Graph, or an mlr3pipelines::GraphLearner if graph_learner = TRUE.
This Graph can be instantiated via the dictionary mlr_graphs or with the associated sugar function ppl():
mlr_graphs$get("survtoclassif_disctime")
ppl("survtoclassif_disctime")
Tutz, Gerhard, Schmid, Matthias (2016). Modeling Discrete Time-to-Event Data. Springer Series in Statistics. Springer International Publishing. ISBN 978-3-319-28156-8, 978-3-319-28158-2, http://link.springer.com/10.1007/978-3-319-28158-2.
Other pipelines:
mlr_graphs_crankcompositor,
mlr_graphs_distrcompositor,
mlr_graphs_probregr,
mlr_graphs_responsecompositor,
mlr_graphs_survaverager,
mlr_graphs_survbagging,
mlr_graphs_survtoclassif_IPCW,
mlr_graphs_survtoregr_pem
## Not run:
library(mlr3)
library(mlr3proba)
library(mlr3learners)
library(mlr3pipelines)
task = tsk("lung")
part = partition(task)
grlrn = ppl(
  "survtoclassif_disctime",
  learner = lrn("classif.log_reg"),
  cut = 4, # 4 equidistant time intervals
  graph_learner = TRUE
)
grlrn$train(task, row_ids = part$train)
grlrn$predict(task, row_ids = part$test)
## End(Not run)
Wrapper around PipeOpTaskSurvClassifIPCW and PipeOpPredClassifSurvIPCW to simplify Graph creation.
pipeline_survtoclassif_IPCW(learner, tau = NULL, eps = 0.001, graph_learner = FALSE)
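Arguments:
learner | LearnerClassif: classification learner fitted on the transformed task.
tau | (numeric(1)) Cutoff time point; observations after tau are censored; default NULL.
eps | (numeric(1)) Small positive value used to guard against zero censoring probabilities (infinite weights); default 0.001.
graph_learner | (logical(1)) If TRUE, the Graph is wrapped in a GraphLearner; otherwise (default) the Graph is returned.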
The pipeline consists of the following steps:
1. PipeOpTaskSurvClassifIPCW converts the TaskSurv to a TaskClassif.
2. A LearnerClassif is fit and predicted on the new TaskClassif.
3. PipeOpPredClassifSurvIPCW transforms the resulting PredictionClassif to a PredictionSurv.
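The first step is based on inverse probability of censoring weighting (Vock et al., 2016). For a cutoff τ (tau), the binary target indicates whether the event occurred by τ. Subjects censored before τ get weight 0, and all others are weighted by 1 / G(min(T_i, τ)-), where G is the Kaplan-Meier estimate of the censoring survival function. The helper ipcw_weights is a hypothetical base R sketch of this idea. PipeOpTaskSurvClassifIPCW may handle some details differently, such as ties. The floor g_min on G is our own guard against division by zero; whether it corresponds to the pipeline's eps argument is an assumption.

```r
# Hypothetical sketch of IPC weights for a binary "event by tau" target.
ipcw_weights = function(time, status, tau, g_min = 1e-3) {
  # Kaplan-Meier estimate of the censoring distribution G(t) = P(C > t):
  # censorings (status == 0) play the role of events here.
  cens_times = sort(unique(time[status == 0]))
  g = cumprod(vapply(cens_times, function(u) {
    1 - sum(time == u & status == 0) / sum(time >= u)
  }, numeric(1)))
  # G(t-) (left limit): product over censoring times strictly before t
  G_left = function(t) c(1, g)[findInterval(t, cens_times, left.open = TRUE) + 1]

  eval_time = pmin(time, tau)
  # Censored before tau means the status at tau is unknown, so the weight is 0
  observed = status == 1 | time >= tau
  w = ifelse(observed, 1 / pmax(G_left(eval_time), g_min), 0)
  # Binary target: event observed at or before tau
  y = as.integer(status == 1 & time <= tau)
  data.frame(time, status, y, w)
}

d = ipcw_weights(time = c(2, 3, 5, 7, 9), status = c(1, 0, 1, 0, 1), tau = 6)
print(d)
```

The weighted subjects stand in for those dropped because of early censoring, so a classifier fitted with these weights targets P(T ≤ τ | x).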
Returns an mlr3pipelines::Graph, or an mlr3pipelines::GraphLearner if graph_learner = TRUE.
This Graph can be instantiated via the dictionary mlr_graphs or with the associated sugar function ppl():
mlr_graphs$get("survtoclassif_IPCW")
ppl("survtoclassif_IPCW")
Additional alias id for pipeline construction:
ppl("survtoclassif_vock")
Vock, M D, Wolfson, Julian, Bandyopadhyay, Sunayan, Adomavicius, Gediminas, Johnson, E P, Vazquez-Benitez, Gabriela, O'Connor, J P (2016). “Adapting machine learning techniques to censored time-to-event health record data: A general-purpose approach using inverse probability of censoring weighting.” Journal of Biomedical Informatics, 61, 119–131. doi:10.1016/j.jbi.2016.03.009, https://www.sciencedirect.com/science/article/pii/S1532046416000496.
Other pipelines:
mlr_graphs_crankcompositor,
mlr_graphs_distrcompositor,
mlr_graphs_probregr,
mlr_graphs_responsecompositor,
mlr_graphs_survaverager,
mlr_graphs_survbagging,
mlr_graphs_survtoclassif_disctime,
mlr_graphs_survtoregr_pem
## Not run:
library(mlr3)
library(mlr3proba)
library(mlr3learners)
library(mlr3pipelines)
task = tsk("lung")
part = partition(task)
grlrn = ppl(
  "survtoclassif_IPCW",
  learner = lrn("classif.rpart"),
  tau = 500, # observations after 500 days are censored
  graph_learner = TRUE
)
grlrn$train(task, row_ids = part$train)
pred = grlrn$predict(task, row_ids = part$test)
pred # includes crank and distr at the cutoff time point
# score predictions
pred$score() # C-index
pred$score(msr("surv.brier", times = 500, integrated = FALSE)) # Brier score at tau
## End(Not run)
Wrapper around multiple PipeOps to help in creation of complex survival reduction methods.
pipeline_survtoregr_pem(learner, cut = NULL, max_time = NULL, graph_learner = FALSE)
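Arguments:
learner | LearnerRegr: regression learner fitted on the transformed task; it must be able to model a Poisson likelihood and accommodate an offset (see below).
cut | (numeric()) Cut points of the time intervals, or a single number giving the number of equidistant intervals; default NULL.
max_time | (numeric(1)) Optional upper limit of the time range used to build the intervals; default NULL.
graph_learner | (logical(1)) If TRUE, the Graph is wrapped in a GraphLearner; otherwise (default) the Graph is returned.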
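A brief mathematical summary of PEMs (see the referenced article for more detail):
PED Transformation: Survival data is converted into the piece-wise exponential data (PED) format. The key elements are as follows. Continuous time is divided into $J$ intervals for each subject $i = 1, \dots, n$. A status variable in each entry indicates whether an event or censoring occurred during that interval. For any subject, data entries are created only up to the interval that includes the event time. An offset column is introduced; it holds the logarithm of the time a subject spent in the given interval. For more details, see pammtools::as_ped().
Hazard Estimation with PEM: The PED transformation combined with the working assumption
$$\delta_{ij} \stackrel{\text{iid}}{\sim} \text{Poisson}\left(\mu_{ij} = \lambda_{ij} t_{ij}\right),$$
where $\delta_{ij}$ denotes the event or censoring indicator of subject $i$ in interval $j$ and $t_{ij}$ the time subject $i$ spent in interval $j$, allows framing piecewise constant hazard estimation as a Poisson regression with offset $\log(t_{ij})$.
Specifically, we want to estimate
$$\lambda(t \mid \mathbf{x}_i) := \exp\left(g(\mathbf{x}_i, t_j)\right), \quad \forall t \in [t_{j-1}, t_j), \quad i = 1, \dots, n,$$
where $g(\mathbf{x}_i, t_j)$ is a general function of the features $\mathbf{x}$ and time $t$, i.e. a learner, and may include non-linearity and complex feature interactions.
Two important prerequisites for the learner are the capacity to model a Poisson likelihood and the ability to accommodate the offset.
From Piecewise Hazards to Survival Probabilities: Lastly, the computed hazards are back-transformed to survival probabilities via the identity
$$S(t \mid \mathbf{x}) = \exp\left(-\int_0^t \lambda(s \mid \mathbf{x})\, ds\right) = \exp\left(-\sum_{j=1}^{J(t)} \lambda(j \mid \mathbf{x})\, d_j\right),$$
where $J(t)$ is the interval containing $t$ and $d_j$ specifies the duration of interval $j$ (for interval $J(t)$, the time elapsed up to $t$).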
The pipeline reflects these considerations and consists of the following steps:
1. PipeOpTaskSurvRegrPEM converts the TaskSurv to a TaskRegr.
2. A LearnerRegr is fit and predicted on the new TaskRegr.
3. PipeOpPredRegrSurvPEM transforms the resulting PredictionRegr to a PredictionSurv.
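The last step applies the survival identity $S(t) = \exp(-\sum_j \lambda_j d_j)$ from the summary above. The minimal base R sketch below assumes a single subject with hazards λ_j already predicted for each interval. The cut points, hazard values and the helper pem_survival are hypothetical. Each interval contributes its hazard multiplied by the time spent in it, and this per-interval time is also what the PED offset column stores on the log scale. Times beyond the last cut point are not extrapolated.

```r
# Hypothetical sketch: survival probabilities from piecewise-constant hazards.
# cuts: interval end points (starting after 0); hazards: lambda_j per interval.
pem_survival = function(t, cuts, hazards) {
  starts = c(0, head(cuts, -1))
  vapply(t, function(ti) {
    # Time spent in each interval up to ti (0 for intervals not yet reached);
    # log() of these durations is what the PED offset column holds.
    d = pmax(0, pmin(ti, cuts) - starts)
    exp(-sum(hazards * d))
  }, numeric(1))
}

cuts = c(1, 3, 6)
hazards = c(0.1, 0.2, 0.05)
s = pem_survival(c(0, 1, 2, 6), cuts, hazards)
print(s)
```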
Returns an mlr3pipelines::Graph, or an mlr3pipelines::GraphLearner if graph_learner = TRUE.
Bender, Andreas, Groll, Andreas, Scheipl, Fabian (2018). “A generalized additive model approach to time-to-event analysis.” Statistical Modelling, 18(3-4), 299–321. https://doi.org/10.1177/1471082X17748083.
Other pipelines:
mlr_graphs_crankcompositor,
mlr_graphs_distrcompositor,
mlr_graphs_probregr,
mlr_graphs_responsecompositor,
mlr_graphs_survaverager,
mlr_graphs_survbagging,
mlr_graphs_survtoclassif_IPCW,
mlr_graphs_survtoclassif_disctime
## Not run:
library(mlr3)
library(mlr3proba)
library(mlr3learners)
library(mlr3extralearners)
library(mlr3pipelines)
task = tsk("lung")
part = partition(task)
# typically the model formula and feature types are extracted from the task
learner = lrn("regr.gam", family = "poisson")
grlrn = ppl(
  "survtoregr_pem",
  learner = learner,
  graph_learner = TRUE
)
grlrn$train(task, row_ids = part$train)
grlrn$predict(task, row_ids = part$test)
# in some instances, special formulas can be specified in the learner
learner = lrn("regr.gam", family = "poisson",
  formula = pem_status ~ s(tend) + s(age) + meal.cal)
grlrn = ppl(
  "survtoregr_pem",
  learner = learner,
  graph_learner = TRUE
)
grlrn$train(task, row_ids = part$train)
grlrn$predict(task, row_ids = part$test)
# if necessary, encode the data before passing it to the learner with e.g. po("encode"),
# po("modelmatrix"), etc.
# with po("modelmatrix"), feature types and formula can be adjusted at the same time
cut = round(seq(0, max(task$data()$time), length.out = 20))
learner = as_learner(
  po("modelmatrix", formula = ~ as.factor(tend) + .) %>>%
    lrn("regr.glmnet", family = "poisson", lambda = 0)
)
grlrn = ppl(
  "survtoregr_pem",
  learner = learner,
  cut = cut,
  graph_learner = TRUE
)
grlrn$train(task, row_ids = part$train)
grlrn$predict(task, row_ids = part$test)
# xgboost regression learner
learner = as_learner(
  po("modelmatrix", formula = ~ .) %>>%
    lrn("regr.xgboost", objective = "count:poisson", nrounds = 100, eta = 0.1)
)
grlrn = ppl(
  "survtoregr_pem",
  learner = learner,
  graph_learner = TRUE
)
grlrn$train(task, row_ids = part$train)
grlrn$predict(task, row_ids = part$test)
## End(Not run)
Calls graphics::hist() and the result is coerced to a distr6::Distribution.
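A histogram estimate is a piecewise constant density. On each bin, the pdf equals the density value that graphics::hist() reports for that bin, and the cdf is its running integral. The base R sketch below illustrates this idea. The helper hist_pdf_cdf is hypothetical, and mlr3proba's conversion to a distr6::Distribution may differ, for example in bin choice or in behaviour outside the range of the data.

```r
# Hypothetical sketch: pdf and cdf implied by a histogram fit.
hist_pdf_cdf = function(x) {
  h = graphics::hist(x, plot = FALSE)
  breaks = h$breaks
  dens = h$density                   # density per bin; integrates to 1
  cum = c(0, cumsum(dens * diff(breaks)))  # cdf at each break point
  pdf = function(q) {
    # Bins are (a, b] with the first bin closed, matching hist()'s defaults
    bin = findInterval(q, breaks, left.open = TRUE, rightmost.closed = TRUE)
    out = numeric(length(q))
    inside = bin >= 1 & bin <= length(dens)
    out[inside] = dens[bin[inside]]
    out
  }
  cdf = function(q) {
    # The pdf is constant per bin, so the cdf is linear between break points
    approx(breaks, cum, xout = q, yleft = 0, yright = 1)$y
  }
  list(pdf = pdf, cdf = cdf, breaks = breaks)
}

fit = hist_pdf_cdf(faithful$eruptions)
fit$pdf(c(2, 4.5))
fit$cdf(c(2, 4.5))
```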
This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():
LearnerDensHistogram$new()
mlr_learners$get("dens.hist")
lrn("dens.hist")
Type: "dens"
Predict Types: pdf, cdf, distr
Feature Types: integer, numeric
Properties: -
mlr3::Learner -> mlr3proba::LearnerDens -> LearnerDensHistogram
new()
Creates a new instance of this R6 class.
LearnerDensHistogram$new()
clone()
The objects of this class are cloneable with this method.
LearnerDensHistogram$clone(deep = FALSE)
deep: Whether to make a deep clone.
Other density estimators:
mlr_learners_dens.kde,
mlr_learners_dens.kde_ks,
mlr_learners_dens.locfit,
mlr_learners_dens.logspline,
mlr_learners_dens.mixed,
mlr_learners_dens.nonpar,
mlr_learners_dens.plug,
mlr_learners_dens.spline
library(mlr3proba)
# Define the Learner
learner = lrn("dens.hist")
print(learner)
# Define a Task
task = tsk("faithful")
# Create train and test sets
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
Calls kernels implemented in distr6 and the result is coerced to a distr6::Distribution.
The default bandwidth uses Silverman's rule of thumb for Gaussian kernels; for non-Gaussian kernels, however, it is recommended to tune the bandwidth with cross-validation using mlr3tuning. Alternatively, other density learners that select the bandwidth automatically can be used. The default kernel is Epanechnikov (chosen to reduce dependencies).
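To make these defaults concrete, consider the two ingredients. Silverman's rule of thumb sets h = 0.9 · min(σ̂, IQR / 1.34) · n^(-1/5). An Epanechnikov kernel estimate is f̂(x) = (1 / (n h)) Σ_i K((x - x_i) / h), with K(u) = 3/4 (1 - u²) for |u| ≤ 1 and 0 otherwise. The base R sketch below assumes this common form of the rule, which is the same one stats::bw.nrd0() computes. Whether LearnerDensKDE uses exactly this variant is an assumption, as are the helper names.

```r
# Silverman's rule-of-thumb bandwidth (matches stats::bw.nrd0 for typical data).
silverman_bw = function(x) {
  0.9 * min(sd(x), IQR(x) / 1.34) * length(x)^(-1 / 5)
}

# Epanechnikov kernel: K(u) = 3/4 * (1 - u^2) on [-1, 1], 0 elsewhere.
epanechnikov = function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Hypothetical KDE helper returning a vectorised pdf function.
kde_pdf = function(x, h = silverman_bw(x), kernel = epanechnikov) {
  n = length(x)
  function(q) {
    vapply(q, function(qi) sum(kernel((qi - x) / h)) / (n * h), numeric(1))
  }
}

x = faithful$eruptions
h = silverman_bw(x)
f = kde_pdf(x, h)
f(c(2, 4.5))
```

Because the Epanechnikov kernel has compact support, f is exactly 0 farther than h from every observation. This is one reason its bandwidth matters more than a Gaussian kernel's, and why tuning is recommended.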
This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():
LearnerDensKDE$new()
mlr_learners$get("dens.kde")
lrn("dens.kde")
Type: "dens"
Predict Types: pdf, distr
Feature Types: integer, numeric
Properties: missings
mlr3::Learner -> mlr3proba::LearnerDens -> LearnerDensKDE
new()
Creates a new instance of this R6 class.
LearnerDensKDE$new()
clone()
The objects of this class are cloneable with this method.
LearnerDensKDE$clone(deep = FALSE)
deep: Whether to make a deep clone.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall, London.
Other density estimators:
mlr_learners_dens.hist,
mlr_learners_dens.kde_ks,
mlr_learners_dens.locfit,
mlr_learners_dens.logspline,
mlr_learners_dens.mixed,
mlr_learners_dens.nonpar,
mlr_learners_dens.plug,
mlr_learners_dens.spline
library(mlr3proba)
# Define the Learner
learner = lrn("dens.kde")
print(learner)
# Define a Task
task = tsk("faithful")
# Create train and test sets
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
Calls ks::kde() and the result is coerced to a distr6::Distribution.
This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():
LearnerDensKDEks$new()
mlr_learners$get("dens.kde_ks")
lrn("dens.kde_ks")
Type: "dens"
Predict Types: pdf
Feature Types: integer, numeric
Properties: -
mlr3::Learner -> mlr3proba::LearnerDens -> LearnerDensKDEks
new()
Creates a new instance of this R6 class.
LearnerDensKDEks$new()
clone()
The objects of this class are cloneable with this method.
LearnerDensKDEks$clone(deep = FALSE)
deep: Whether to make a deep clone.
Gramacki, Artur, Gramacki, Jarosław (2017). “FFT-based fast computation of multivariate kernel density estimators with unconstrained bandwidth matrices.” Journal of Computational and Graphical Statistics, 26(2), 459–462.
Other density estimators:
mlr_learners_dens.hist,
mlr_learners_dens.kde,
mlr_learners_dens.locfit,
mlr_learners_dens.logspline,
mlr_learners_dens.mixed,
mlr_learners_dens.nonpar,
mlr_learners_dens.plug,
mlr_learners_dens.spline
library(mlr3proba)
# Define the Learner
learner = lrn("dens.kde_ks")
print(learner)
# Define a Task
task = tsk("faithful")
# Create train and test sets
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
Calls locfit::density.lf() and the result is coerced to a distr6::Distribution.
This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():
LearnerDensLocfit$new()
mlr_learners$get("dens.locfit")
lrn("dens.locfit")
Type: "dens"
Predict Types: pdf
Feature Types: integer, numeric
Properties: -
mlr3::Learner -> mlr3proba::LearnerDens -> LearnerDensLocfit
new()
Creates a new instance of this R6 class.
LearnerDensLocfit$new()
clone()
The objects of this class are cloneable with this method.
LearnerDensLocfit$clone(deep = FALSE)
deep: Whether to make a deep clone.
Loader, Clive (2006). Local regression and likelihood. Springer Science & Business Media.
Other density estimators:
mlr_learners_dens.hist,
mlr_learners_dens.kde,
mlr_learners_dens.kde_ks,
mlr_learners_dens.logspline,
mlr_learners_dens.mixed,
mlr_learners_dens.nonpar,
mlr_learners_dens.plug,
mlr_learners_dens.spline
library(mlr3proba)
# Define the Learner
learner = lrn("dens.locfit")
print(learner)
# Define a Task
task = tsk("faithful")
# Create train and test sets
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
Calls logspline::logspline() and the result is coerced to a distr6::Distribution.
This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():
LearnerDensLogspline$new()
mlr_learners$get("dens.logspline")
lrn("dens.logspline")
Type: "dens"
Predict Types: pdf, cdf
Feature Types: integer, numeric
Properties: -
mlr3::Learner -> mlr3proba::LearnerDens -> LearnerDensLogspline
new()
Creates a new instance of this R6 class.
LearnerDensLogspline$new()
clone()
The objects of this class are cloneable with this method.
LearnerDensLogspline$clone(deep = FALSE)
deep: Whether to make a deep clone.
Kooperberg, Charles, Stone, Charles J. (1992). “Logspline density estimation for censored data.” Journal of Computational and Graphical Statistics, 1(4), 301–328.
Other density estimators:
mlr_learners_dens.hist,
mlr_learners_dens.kde,
mlr_learners_dens.kde_ks,
mlr_learners_dens.locfit,
mlr_learners_dens.mixed,
mlr_learners_dens.nonpar,
mlr_learners_dens.plug,
mlr_learners_dens.spline
library(mlr3proba)
# Define the Learner
learner = lrn("dens.logspline")
print(learner)
# Define a Task
task = tsk("faithful")
# Create train and test sets
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
Calls np::npudens() and the result is coerced to a distr6::Distribution.
This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():
LearnerDensMixed$new()
mlr_learners$get("dens.mixed")
lrn("dens.mixed")
Type: "dens"
Predict Types: pdf
Feature Types: integer, numeric
Properties: -
mlr3::Learner -> mlr3proba::LearnerDens -> LearnerDensMixed
new()
Creates a new instance of this R6 class.
LearnerDensMixed$new()
clone()
The objects of this class are cloneable with this method.
LearnerDensMixed$clone(deep = FALSE)
deep: Whether to make a deep clone.
Li, Qi, Racine, Jeff (2003). “Nonparametric estimation of distributions with categorical and continuous data.” Journal of Multivariate Analysis, 86(2), 266–292.
Other density estimators:
mlr_learners_dens.hist,
mlr_learners_dens.kde,
mlr_learners_dens.kde_ks,
mlr_learners_dens.locfit,
mlr_learners_dens.logspline,
mlr_learners_dens.nonpar,
mlr_learners_dens.plug,
mlr_learners_dens.spline
library(mlr3proba)
# Define the Learner
learner = lrn("dens.mixed")
print(learner)
# Define a Task
task = tsk("faithful")
# Create train and test sets
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
Calls sm::sm.density() and the result is coerced to a distr6::Distribution.
This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():
LearnerDensNonparametric$new()
mlr_learners$get("dens.nonpar")
lrn("dens.nonpar")
Type: "dens"
Predict Types: pdf
Feature Types: integer, numeric
Properties: weights
mlr3::Learner -> mlr3proba::LearnerDens -> LearnerDensNonparametric
new()
Creates a new instance of this R6 class.
LearnerDensNonparametric$new()
clone()
The objects of this class are cloneable with this method.
LearnerDensNonparametric$clone(deep = FALSE)
deep: Whether to make a deep clone.
Bowman, A.W., Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations. Oxford Statistical Science Series. OUP Oxford. ISBN 9780191545696, https://books.google.de/books?id=7WBMrZ9umRYC.
Other density estimators:
mlr_learners_dens.hist,
mlr_learners_dens.kde,
mlr_learners_dens.kde_ks,
mlr_learners_dens.locfit,
mlr_learners_dens.logspline,
mlr_learners_dens.mixed,
mlr_learners_dens.plug,
mlr_learners_dens.spline
library(mlr3proba)
# Define the Learner
learner = lrn("dens.nonpar")
print(learner)
# Define a Task
task = tsk("faithful")
# Create train and test sets
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
Calls plugdensity::plugin.density() and the result is coerced to a distr6::Distribution.
Kernel density estimation by "plug-in" bandwidth selection.
This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():
LearnerDensPlugin$new()
mlr_learners$get("dens.plug")
lrn("dens.plug")
Type: "dens"
Predict Types: pdf
Feature Types: numeric
Properties: missings
Packages: mlr3 mlr3proba plugdensity distr6
mlr3::Learner -> mlr3proba::LearnerDens -> LearnerDensPlugin
new()
Creates a new instance of this R6 class.
LearnerDensPlugin$new()
clone()
The objects of this class are cloneable with this method.
LearnerDensPlugin$clone(deep = FALSE)
deep: Whether to make a deep clone.
Engel, Joachim, Herrmann, Eva, Gasser, Theo (1994). “An iterative bandwidth selector for kernel estimation of densities and their derivatives.” Journal of Nonparametric Statistics, 4(1), 21–34.
Other density estimators:
mlr_learners_dens.hist,
mlr_learners_dens.kde,
mlr_learners_dens.kde_ks,
mlr_learners_dens.locfit,
mlr_learners_dens.logspline,
mlr_learners_dens.mixed,
mlr_learners_dens.nonpar,
mlr_learners_dens.spline
library(mlr3proba)
# Define the Learner
learner = lrn("dens.plug")
print(learner)
# Define a Task
task = tsk("faithful")
# Create train and test set
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
Calls gss::ssden() and the result is coerced to a distr6::Distribution.
This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():
LearnerDensSpline$new()
mlr_learners$get("dens.spline")
lrn("dens.spline")
Type: "dens"
Predict Types: pdf, cdf
Feature Types: integer, numeric
Properties: missings
mlr3::Learner -> mlr3proba::LearnerDens -> LearnerDensSpline
new()
Creates a new instance of this R6 class.
LearnerDensSpline$new()
clone()
The objects of this class are cloneable with this method.
LearnerDensSpline$clone(deep = FALSE)
deep: Whether to make a deep clone.
Gu, Chong, Wang, Jingyuan (2003). “Penalized likelihood density estimation: Direct cross-validation and scalable approximation.” Statistica Sinica, 811–826.
Other density estimators:
mlr_learners_dens.hist,
mlr_learners_dens.kde,
mlr_learners_dens.kde_ks,
mlr_learners_dens.locfit,
mlr_learners_dens.logspline,
mlr_learners_dens.mixed,
mlr_learners_dens.nonpar,
mlr_learners_dens.plug
library(mlr3proba)
# Define the Learner
learner = lrn("dens.spline")
print(learner)
# Define a Task
task = tsk("faithful")
# Create train and test set
ids = partition(task)
# Train the learner on the training ids
learner$train(task, row_ids = ids$train)
print(learner$model)
# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)
# Score the predictions
predictions$score()
Calls survival::coxph().
lp is predicted by survival::predict.coxph()
distr is predicted by survival::survfit.coxph()
crank is identical to lp
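To make the three prediction types concrete, here is a minimal sketch that uses the survival package directly (not the mlr3proba wrapper) on the lung data: predict.coxph() with type = "lp" gives the linear predictor, crank is simply a copy of it, and survfit() on the fitted model gives one survival curve per new observation. The covariate choice is an assumption for illustration, and note that predict.coxph() may center the linear predictor (see ?predict.coxph).

```r
library(survival)

# Fit a Cox PH model on a few illustrative covariates (assumed choice)
dat <- na.omit(lung[, c("time", "status", "age", "sex")])
fit <- coxph(Surv(time, status) ~ age + sex, data = dat)

new_obs <- dat[1:5, ]

# lp: linear predictor for new observations
lp <- predict(fit, newdata = new_obs, type = "lp")

# crank: identical to lp for this learner
crank <- lp

# distr: one predicted survival curve per new observation
sf <- survfit(fit, newdata = new_obs)
surv_matrix <- t(sf$surv)   # rows = observations, columns = time points
```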
This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():
LearnerSurvCoxPH$new()
mlr_learners$get("surv.coxph")
lrn("surv.coxph")
Task type: “surv”
Predict Types: “crank”, “distr”, “lp”
Feature Types: “logical”, “integer”, “numeric”, “factor”
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| ties | character | efron | efron, breslow, exact | - |
| singular.ok | logical | TRUE | TRUE, FALSE | - |
| type | character | efron | efron, aalen, kalbfleisch-prentice | - |
| stype | integer | 2 | - | $[1, 2]$ |
mlr3::Learner -> mlr3proba::LearnerSurv -> LearnerSurvCoxPH
new()
Creates a new instance of this R6 class.
LearnerSurvCoxPH$new()
clone()
The objects of this class are cloneable with this method.
LearnerSurvCoxPH$clone(deep = FALSE)
deep: Whether to make a deep clone.
Cox DR (1972). “Regression Models and Life-Tables.” Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187–202. doi:10.1111/j.2517-6161.1972.tb00899.x.
Other survival learners:
mlr_learners_surv.kaplan,
mlr_learners_surv.rpart
Calls survival::survfit().
distr is predicted by estimating the survival function with survival::survfit()
crank is predicted as the sum of the cumulative hazard function (expected mortality) derived from the survival distribution, distr.
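A minimal sketch of this crank computation with survival::survfit() on the lung data: the cumulative hazard is derived from the Kaplan-Meier survival probabilities as $H(t) = -\log S(t)$ and summed over the fitted time grid. The exact time grid and the small floor on $S(t)$ are assumptions, not necessarily what mlr3proba does internally. Because the Kaplan-Meier estimator ignores covariates, every test observation receives the same crank.

```r
library(survival)

# Kaplan-Meier fit without covariates: one curve shared by all observations
fit <- survfit(Surv(time, status) ~ 1, data = lung)

# Survival probabilities S(t) on the fitted time grid
S <- fit$surv

# Cumulative hazard derived from the survival distribution, H(t) = -log S(t).
# Assumption: floor S at a tiny value so that S(t) = 0 cannot give Inf.
H <- -log(pmax(S, 1e-15))

# Expected mortality: sum of the cumulative hazard over the time grid
crank <- sum(H)
```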
This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():
LearnerSurvKaplan$new()
mlr_learners$get("surv.kaplan")
lrn("surv.kaplan")
Task type: “surv”
Predict Types: “crank”, “distr”
Feature Types: “logical”, “integer”, “numeric”, “character”, “factor”, “ordered”
Empty ParamSet
mlr3::Learner -> mlr3proba::LearnerSurv -> LearnerSurvKaplan
new()
Creates a new instance of this R6 class.
LearnerSurvKaplan$new()
importance()
All features have a score of 0 for this learner.
This method exists solely for compatibility with the mlr3 ecosystem,
as this learner is used as a fallback for other survival learners that
require an importance() method.
LearnerSurvKaplan$importance()
Named numeric().
selected_features()
Selected features are always the empty set for this learner.
This method is implemented only for compatibility with the mlr3 API,
as this learner does not perform feature selection.
LearnerSurvKaplan$selected_features()
character(0).
clone()
The objects of this class are cloneable with this method.
LearnerSurvKaplan$clone(deep = FALSE)
deep: Whether to make a deep clone.
Kaplan EL, Meier P (1958). “Nonparametric Estimation from Incomplete Observations.” Journal of the American Statistical Association, 53(282), 457–481. doi:10.1080/01621459.1958.10501452.
Other survival learners:
mlr_learners_surv.coxph,
mlr_learners_surv.rpart
Calls rpart::rpart().
crank is predicted using rpart::predict.rpart()
This Learner can be instantiated via the dictionary mlr_learners or with the associated sugar function lrn():
LearnerSurvRpart$new()
mlr_learners$get("surv.rpart")
lrn("surv.rpart")
Task type: “surv”
Predict Types: “crank”
Feature Types: “logical”, “integer”, “numeric”, “character”, “factor”, “ordered”
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| parms | numeric | 1 | - | $(-\infty, \infty)$ |
| minbucket | integer | - | - | $[1, \infty)$ |
| minsplit | integer | 20 | - | $[1, \infty)$ |
| cp | numeric | 0.01 | - | $[0, 1]$ |
| maxcompete | integer | 4 | - | $[0, \infty)$ |
| maxsurrogate | integer | 5 | - | $[0, \infty)$ |
| maxdepth | integer | 30 | - | $[1, 30]$ |
| usesurrogate | integer | 2 | - | $[0, 2]$ |
| surrogatestyle | integer | 0 | - | $[0, 1]$ |
| xval | integer | 10 | - | $[0, \infty)$ |
| cost | untyped | - | - | - |
| keep_model | logical | FALSE | TRUE, FALSE | - |
xval is initialized to 0 (instead of the rpart default of 10) to save computation time.
The rpart parameter model has been renamed to keep_model.
mlr3::Learner -> mlr3proba::LearnerSurv -> LearnerSurvRpart
new()
Creates a new instance of this R6 class.
LearnerSurvRpart$new()
importance()
The importance scores are extracted from the model slot variable.importance.
LearnerSurvRpart$importance()
Named numeric().
selected_features()
Selected features are extracted from the model slot frame$var.
LearnerSurvRpart$selected_features()
character().
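To see where crank, the importance scores and the selected features come from, here is a sketch that calls rpart directly on the lung data. It assumes the survival tree is grown with rpart's exponential-scaling method (method = "exp") and uses an arbitrary covariate set; predict.rpart() then returns a relative event rate per observation, variable.importance holds the importance scores, and the non-leaf entries of frame$var are the variables used in splits.

```r
library(rpart)
library(survival)

# Assumption: survival tree via rpart's exponential-scaling method, no CV
fit <- rpart(Surv(time, status) ~ age + sex + ph.ecog, data = lung,
             method = "exp", control = rpart.control(xval = 0))

# crank: predicted relative event rate for new observations
crank <- predict(fit, newdata = lung[1:5, ])

# Importance scores (NULL if the tree has no splits)
imp <- fit$variable.importance

# Selected features: variables used in at least one split
sel <- unique(as.character(fit$frame$var[fit$frame$var != "<leaf>"]))
```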
clone()
The objects of this class are cloneable with this method.
LearnerSurvRpart$clone(deep = FALSE)
deep: Whether to make a deep clone.
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984). Classification And Regression Trees. Routledge. doi:10.1201/9781315139470.
Other survival learners:
mlr_learners_surv.coxph,
mlr_learners_surv.kaplan
Calculates the cross-entropy, or logarithmic (log), loss.
The Log Loss, in the context of probabilistic predictions, is defined as the negative log
probability density function, $f$, evaluated at the observed value, $y$,
$$L(f, y) = -\log(f(y))$$
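For a single estimated density, the loss at each observed value $y$ is $-\log f(y)$. A minimal base-R sketch, assuming a standard normal pdf as the "fitted" density, a floor of eps on the density via pmax() (how mlr3proba substitutes near-zero values internally is an assumption here), and averaging over observations:

```r
# Hypothetical fitted density: standard normal pdf
pdf_hat <- function(y) stats::dnorm(y, mean = 0, sd = 1)

y_obs <- c(-1, 0, 0.5, 40)   # 40 lies far in the tail: dnorm underflows to 0
eps <- 1e-15

# Floor tiny densities at eps so that log(0) cannot occur
loss_i <- -log(pmax(pdf_hat(y_obs), eps))

# Assumed aggregation: mean over observations
logloss <- mean(loss_i)
```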
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureDensLogloss$new()
mlr_measures$get("dens.logloss")
msr("dens.logloss")
| Id | Type | Default | Range |
|---|---|---|---|
| eps | numeric | 1e-15 | $[0, 1]$ |
Type: "density"
Range:
Minimize: TRUE
Required prediction: pdf
eps (numeric(1))
Very small number to substitute near-zero values in order to prevent errors
in e.g. log(0) and/or division-by-zero calculations.
Default value is 1e-15.
mlr3::Measure -> mlr3proba::MeasureDens -> MeasureDensLogloss
new()
Creates a new instance of this R6 class.
MeasureDensLogloss$new()
clone()
The objects of this class are cloneable with this method.
MeasureDensLogloss$clone(deep = FALSE)
deep: Whether to make a deep clone.
Calculates the cross-entropy, or logarithmic (log), loss.
The Log Loss, in the context of probabilistic predictions, is defined as the negative log
probability density function, $f$, evaluated at the observed value, $y$,
$$L(f, y) = -\log(f(y))$$
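In probabilistic regression each test observation has its own predicted distribution, so the loss is evaluated row by row. A sketch assuming hypothetical Normal predictive distributions with per-row means and standard deviations, and a mean over rows as the aggregate; the eps floor only matters when a density underflows, so both variants agree here:

```r
# Hypothetical probabilistic regression predictions: one Normal per row
mu    <- c(1.0, 2.0, 3.0)
sigma <- c(0.5, 1.0, 2.0)
y     <- c(1.2, 1.5, 6.0)   # observed responses
eps   <- 1e-15

# Negative log predictive density at each observed value
loss_i <- -stats::dnorm(y, mean = mu, sd = sigma, log = TRUE)

# Same quantity with an eps floor on the density scale
loss_i_eps <- -log(pmax(stats::dnorm(y, mean = mu, sd = sigma), eps))

# Assumed aggregation: mean over test rows
regr_logloss <- mean(loss_i)
```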
| Id | Type | Default | Range |
|---|---|---|---|
| eps | numeric | 1e-15 | $[0, 1]$ |
Type: "regr"
Range:
Minimize: TRUE
Required prediction: distr
eps (numeric(1))
Very small number to substitute near-zero values in order to prevent errors
in e.g. log(0) and/or division-by-zero calculations.
Default value is 1e-15.
mlr3::Measure -> mlr3::MeasureRegr -> MeasureRegrLogloss
new()
Creates a new instance of this R6 class.
MeasureRegrLogloss$new()
clone()
The objects of this class are cloneable with this method.
MeasureRegrLogloss$clone(deep = FALSE)
deep: Whether to make a deep clone.
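This calibration method is defined by estimating
$$\hat{\alpha} = \frac{\sum_i \delta_i}{\sum_i H_i(t_i)}$$
where $\delta_i$ is the observed censoring indicator from the test data (for all
observations), $H_i$ is the predicted cumulative hazard, and
$t_i$ is the observed survival time (event or censoring).
The standard error is given by
$$\widehat{se}(\hat{\alpha}) = \exp\left(1 / \sqrt{\textstyle\sum_i \delta_i}\right)$$
The model is well calibrated if the estimated coefficient $\hat{\alpha}$
(returned score) is equal to 1.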
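The estimator is the ratio of observed events to predicted cumulative hazard, $\hat{\alpha} = \sum_i \delta_i / \sum_i H_i(t_i)$. A base-R sketch on a hypothetical test set, assuming each model prediction is an exponential distribution so that $H_i(t) = \lambda_i t$; it also shows the standard error and the $|1 - \hat{\alpha}|$ value returned when method = "diff":

```r
# Hypothetical test set
time   <- c(2, 5, 3, 8, 4, 6)                 # observed times t_i
status <- c(1, 0, 1, 1, 0, 1)                 # event indicators delta_i
rate   <- c(0.3, 0.1, 0.2, 0.1, 0.25, 0.15)   # assumed exponential hazards

# Predicted cumulative hazard at each observed time: H_i(t_i) = rate_i * t_i
H <- rate * time

# Observed events divided by expected events
alpha_hat <- sum(status) / sum(H)

# Standard error, as defined above
se_alpha <- exp(1 / sqrt(sum(status)))

# method = "diff": distance from perfect calibration (alpha = 1)
alpha_diff <- abs(1 - alpha_hat)
```

Here four events are observed against 4.4 expected, so $\hat{\alpha} < 1$: the hypothetical model over-predicts the hazard.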
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvCalibrationAlpha$new()
mlr_measures$get("surv.calib_alpha")
msr("surv.calib_alpha")
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| eps | numeric | 0.001 | - | $[0, 1]$ |
| se | logical | FALSE | TRUE, FALSE | - |
| method | character | ratio | ratio, diff | - |
| truncate | numeric | Inf | - | |
Type: "surv"
Range:
Minimize: FALSE
Required prediction: distr
eps (numeric(1))
Very small number to substitute near-zero values in order to prevent errors
in e.g. log(0) and/or division-by-zero calculations.
Default value is 0.001.
se (logical(1))
If TRUE then return standard error of the measure, otherwise the score
itself (default).
method (character(1))
Returns $\hat{\alpha}$ if equal to ratio (default) and
$|1 - \hat{\alpha}|$ if equal to diff.
With diff, the output score can be minimized and for example be used for
tuning purposes. This parameter takes effect only if se is FALSE.
truncate (double(1))
This parameter controls the upper bound of the output score.
We use truncate = Inf by default (so no truncation) and it's up to the user
to set this up reasonably given the chosen method.
Note that truncation may severely limit automated tuning with this measure
using method = diff.
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvCalibrationAlpha
new()
Creates a new instance of this R6 class.
MeasureSurvCalibrationAlpha$new(method = "ratio")
method: defines which output score to return, see the "Parameter details" section.
clone()
The objects of this class are cloneable with this method.
MeasureSurvCalibrationAlpha$clone(deep = FALSE)
deep: Whether to make a deep clone.
Van Houwelingen, C. H (2000). “Validation, calibration, revision and combination of prognostic survival models.” Statistics in Medicine, 19(24), 3401–3415. doi:10.1002/1097-0258(20001230)19:24<3401::AID-SIM554>3.0.CO;2-2.
Other survival measures:
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other calibration survival measures:
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.dcalib
Other distr survival measures:
mlr_measures_surv.calib_index,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.rcll,
mlr_measures_surv.schmid
This calibration method fits the predicted linear predictor from a Cox PH model as the only predictor in a new Cox PH model with the test data as the response:
$$h(t \mid \eta) = h_0(t) \exp(\beta \eta)$$
where $\eta$ is the predicted linear predictor on the test data.
The model is well calibrated if the estimated coefficient $\hat{\beta}$
(returned score) is equal to 1.
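The refit is an ordinary Cox regression with the predicted linear predictor as the single covariate. A sketch using survival::coxph() on simulated test data, where the hypothetical linear predictor is perfectly calibrated by construction (true slope 1), so the estimated slope should land near 1; the simulation design is an assumption for illustration:

```r
library(survival)
set.seed(1)
n <- 200

# Hypothetical linear predictor from a previously fitted Cox model
lp <- rnorm(n)

# Simulate test outcomes whose true log hazard ratio per unit lp is 1
event_time  <- rexp(n, rate = 0.1 * exp(lp))
censor_time <- rexp(n, rate = 0.05)
time   <- pmin(event_time, censor_time)
status <- as.integer(event_time <= censor_time)

# Refit a Cox model on the test data with lp as the only covariate
fit <- coxph(Surv(time, status) ~ lp)

beta_hat  <- unname(coef(fit))             # returned score (se = FALSE)
se_beta   <- unname(sqrt(diag(vcov(fit)))) # returned when se = TRUE
beta_diff <- abs(1 - beta_hat)             # method = "diff"
```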
Note: Assumes fitted model is Cox PH (i.e. has an lp prediction type).
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvCalibrationBeta$new()
mlr_measures$get("surv.calib_beta")
msr("surv.calib_beta")
| Id | Type | Default | Levels |
|---|---|---|---|
| se | logical | FALSE | TRUE, FALSE |
| method | character | ratio | ratio, diff |
Type: "surv"
Range:
Minimize: FALSE
Required prediction: lp
se (logical(1))
If TRUE then return standard error of the measure which is the standard
error of the estimated coefficient from the Cox PH model.
If FALSE (default) then returns the estimated coefficient $\hat{\beta}$.
method (character(1))
Returns $\hat{\beta}$ if equal to ratio (default) and
$|1 - \hat{\beta}|$ if equal to diff.
With diff, the output score can be minimized and for example be used for
tuning purposes.
This parameter takes effect only if se is FALSE.
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvCalibrationBeta
new()
Creates a new instance of this R6 class.
MeasureSurvCalibrationBeta$new(method = "ratio")
method: defines which output score to return, see the "Parameter details" section.
clone()
The objects of this class are cloneable with this method.
MeasureSurvCalibrationBeta$clone(deep = FALSE)
deep: Whether to make a deep clone.
Van Houwelingen, C. H (2000). “Validation, calibration, revision and combination of prognostic survival models.” Statistics in Medicine, 19(24), 3401–3415. doi:10.1002/1097-0258(20001230)19:24<3401::AID-SIM554>3.0.CO;2-2.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other calibration survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_index,
mlr_measures_surv.dcalib
Other lp survival measures:
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Calculates the Integrated Calibration Index (ICI), which evaluates point-calibration (i.e. at a specific time point), see Austin et al. (2020).
Each individual $i$ from the test set has an observed survival outcome
$(t_i, \delta_i)$ (time and censoring indicator) and a predicted survival
function $S_i(t)$.
The predicted probability of an event occurring before a specific time point
$t_0$ is defined as $\hat{P}_i(t_0) = F_i(t_0) = 1 - S_i(t_0)$.
Using hazard regression (via the polspline R package), a smoothed calibration curve is estimated by fitting the following model:
$$\log(h(t)) = g(\log(-\log(1 - \hat{P}(t_0))), t)$$
Note that we substitute probabilities $\hat{P}(t_0) = 0$ with a small
number $\epsilon$ to avoid arithmetic issues ($\log(0)$). Similarly, for
$\hat{P}(t_0) = 1$ we use $1 - \epsilon$.
From this model, the smoothed probability of occurrence at $t_0$ for
observation $i$ is obtained as $\hat{P}_i^c(t_0)$.
The Integrated Calibration Index is then computed across the $N$
test set observations as:
$$ICI = \frac{1}{N} \sum_{i=1}^N \left| \hat{P}_i^c(t_0) - \hat{P}_i(t_0) \right|$$
Therefore, a perfect calibration (smoothed probabilities match predicted
probabilities for all observations) yields $ICI = 0$, while the worst
possible score is $ICI = 1$.
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvICI$new()
mlr_measures$get("surv.calib_index")
msr("surv.calib_index")
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| time | numeric | - | - | $[0, \infty)$ |
| eps | numeric | 1e-04 | - | $[0, 1]$ |
| method | character | ICI | ICI, E50, E90, Emax | - |
| na.rm | logical | TRUE | TRUE, FALSE | - |
Type: "surv"
Range:
Minimize: TRUE
Required prediction: distr
eps (numeric(1))
Very small number to substitute near-zero values in order to prevent errors
in e.g. log(0) and/or division-by-zero calculations.
Default value is 1e-04.
time (numeric(1))
The specific time point at which calibration is evaluated.
If NULL, the median observed time from the test set is used.
method (character(1))
Specifies the summary statistic used to calculate the final calibration score.
"ICI" (default): Uses the mean of absolute differences across all observations.
"E50": Uses the median of absolute differences instead of the mean.
"E90": Uses the 90th percentile of absolute differences, emphasizing higher deviations.
"Emax": Uses the maximum absolute difference, capturing the largest discrepancy between predicted and smoothed probabilities.
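Once the smoothed probabilities are available, the four summaries differ only in how the absolute differences are aggregated. A base-R sketch that assumes hypothetical predicted and smoothed probabilities at $t_0$ are already given (the polspline hazard-regression step is skipped) and that E90 uses R's default quantile type:

```r
# Hypothetical predicted event probabilities at t0, and smoothed
# (calibration-curve) probabilities for the same observations
p_pred   <- c(0.10, 0.25, 0.40, 0.55, 0.70)
p_smooth <- c(0.12, 0.20, 0.45, 0.50, 0.90)

abs_diff <- abs(p_smooth - p_pred)

ICI  <- mean(abs_diff)                          # method = "ICI"
E50  <- stats::median(abs_diff)                 # method = "E50"
E90  <- unname(stats::quantile(abs_diff, 0.9))  # method = "E90" (type 7 assumed)
Emax <- max(abs_diff)                           # method = "Emax"
```

With one large discrepancy (0.20) among small ones, Emax and E90 react strongly while E50 stays at 0.05.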
na.rm (logical(1))
If TRUE (default) then removes any NAs/NaNs in the smoothed probabilities
that may arise. A warning is issued nonetheless in such
cases.
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvICI
new()
Creates a new instance of this R6 class.
MeasureSurvICI$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvICI$clone(deep = FALSE)
deep: Whether to make a deep clone.
Austin, C. P, Harrell, E. F, van Klaveren, David (2020). “Graphical calibration curves and the integrated calibration index (ICI) for survival models.” Statistics in Medicine, 39(21), 2714. ISSN 10970258, doi:10.1002/SIM.8570, https://pmc.ncbi.nlm.nih.gov/articles/PMC7497089/.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other calibration survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.dcalib
Other distr survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.rcll,
mlr_measures_surv.schmid
library(mlr3)
library(mlr3proba)
# Define a survival Task
task = tsk("lung")
# Create train and test set
part = partition(task)
# Train Cox learner on the train set
cox = lrn("surv.coxph")
cox$train(task, row_ids = part$train)
# Make predictions for the test set
p = cox$predict(task, row_ids = part$test)
# ICI at median test set time
p$score(msr("surv.calib_index"))
# ICI at specific time point
p$score(msr("surv.calib_index", time = 365))
# E50 at specific time point
p$score(msr("surv.calib_index", method = "E50", time = 365))
Calls survAUC::AUC.cd().
Assumes Cox PH model specification.
All measures implemented from survAUC should be used with
care; we are aware of problems in the implementation that sometimes cause fatal
errors in R.
In future updates, some of these measures may be rewritten and implemented
directly in mlr3proba.
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvChamblessAUC$new()
mlr_measures$get("surv.chambless_auc")
msr("surv.chambless_auc")
| Id | Type | Default | Levels |
|---|---|---|---|
| integrated | logical | TRUE | TRUE, FALSE |
| times | untyped | - | - |
Type: "surv"
Range:
Minimize: FALSE
Required prediction: lp
integrated (logical(1))
If TRUE (default), returns the integrated score (e.g. across
time points); otherwise, the score is not integrated (e.g. at a single time point).
times (numeric())
If integrated == TRUE then a vector of time-points over which to integrate the score.
If integrated == FALSE then a single time point at which to return the score.
mlr3::Measure -> mlr3proba::MeasureSurv -> mlr3proba::MeasureSurvAUC -> MeasureSurvChamblessAUC
new()
Creates a new instance of this R6 class.
MeasureSurvChamblessAUC$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvChamblessAUC$clone(deep = FALSE)
deep: Whether to make a deep clone.
Chambless LE, Diao G (2006). “Estimation of time-dependent area under the ROC curve for long-term risk prediction.” Statistics in Medicine, 25(20), 3474–3486. doi:10.1002/sim.2299.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other AUC survival measures:
mlr_measures_surv.hung_auc,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr
Other lp survival measures:
mlr_measures_surv.calib_beta,
mlr_measures_surv.hung_auc,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
library(mlr3)
library(mlr3proba)
# Define a survival Task
task = tsk("lung")
# Create train and test set
part = partition(task)
# Train Cox learner on the train set
cox = lrn("surv.coxph")
cox$train(task, row_ids = part$train)
# Make predictions for the test set
p = cox$predict(task, row_ids = part$test)
# Integrated AUC score
p$score(msr("surv.chambless_auc"), task = task, train_set = part$train, learner = cox)
# AUC at specific time point
p$score(msr("surv.chambless_auc", times = 600), task = task, train_set = part$train, learner = cox)
# Integrated AUC at specific time points
p$score(msr("surv.chambless_auc", times = c(100, 200, 300, 400, 500)), task = task, train_set = part$train, learner = cox)
Calculates weighted concordance statistics, which, depending on the chosen
weighting method (weight_meth) and tied times parameter (tiex), are
equivalent to several proposed methods.
By default, no weighting is applied and this is equivalent to Harrell's C-index.
For the Kaplan-Meier estimate of the training survival distribution ($S$)
and the Kaplan-Meier estimate of the training censoring distribution ($G$),
we have the following options for time-independent concordance statistics
(C-indexes) given the weighting method:
weight_meth:
"I" = No weighting (Harrell).
"GH" = Gonen and Heller's Concordance Index.
"G" = Weights concordance by $1/G$.
"G2" = Weights concordance by $1/G^2$ (Uno et al.).
"SG" = Weights concordance by $S/G$ (Schemper et al.).
"S" = Weights concordance by $S$ (Peto and Peto).
The last three require training data. "GH" is only applicable to LearnerSurvCoxPH.
The implementation differs slightly from survival::concordance: firstly, this implementation is faster, and secondly, the weights are computed on the training dataset, whereas survival::concordance computes them on the same testing data.
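For intuition, the unweighted case (weight_meth = "I") can be written as a double loop over comparable pairs: an observed event $i$ is compared with every observation $j$ that survived longer, the pair counts as concordant when $i$ has the higher crank, and tied cranks contribute tiex. The helper harrell_c() below is hypothetical and O(n^2); how t_max restricts pairs (here: only events at or before t_max anchor a pair) and the exclusion of tied times are assumptions, not mlr3proba's exact rules.

```r
# Hypothetical helper: Harrell's C-index (no weighting)
harrell_c <- function(time, status, crank, tiex = 0.5, t_max = Inf) {
  num <- 0
  den <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    # Only observed events up to the cutoff can anchor a comparable pair
    if (status[i] != 1 || time[i] > t_max) next
    for (j in seq_len(n)) {
      if (time[j] > time[i]) {
        den <- den + 1
        if (crank[i] > crank[j]) {
          num <- num + 1        # higher risk failed earlier: concordant
        } else if (crank[i] == crank[j]) {
          num <- num + tiex     # tied risk predictions
        }
      }
    }
  }
  num / den
}

time   <- c(1, 2, 3, 4, 5)
status <- c(1, 1, 0, 1, 0)
crank  <- c(5, 3, 4, 2, 2)

c_index    <- harrell_c(time, status, crank)             # 6.5 / 8
c_index_t2 <- harrell_c(time, status, crank, t_max = 2)  # 6 / 7
```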
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvCindex$new()
mlr_measures$get("surv.cindex")
msr("surv.cindex")
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| t_max | numeric | - | - | $[0, \infty)$ |
| p_max | numeric | - | - | $[0, 1]$ |
| weight_meth | character | I | I, G, G2, SG, S, GH | - |
| tiex | numeric | 0.5 | - | $[0, 1]$ |
| eps | numeric | 0.001 | - | $[0, 1]$ |
Type: "surv"
Range:
Minimize: FALSE
Required prediction: crank
eps (numeric(1))
Very small number to substitute near-zero values in order to prevent errors
in e.g. log(0) and/or division-by-zero calculations.
Default value is 0.001.
t_max (numeric(1))
Cutoff time (i.e. time horizon) to evaluate concordance up to.
p_max (numeric(1))
The proportion of censoring to evaluate concordance up to in the given dataset.
When t_max is specified, this parameter is ignored.
weight_meth (character(1))
Method for weighting concordance. Default "I" is Harrell's C. See details.
tiex (numeric(1))
Weighting applied to tied rankings, default is to give them half (0.5) weighting.
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvCindex
new()
Creates a new instance of this R6 class.
MeasureSurvCindex$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvCindex$clone(deep = FALSE)
deep: Whether to make a deep clone.
Peto, Richard, Peto, Julian (1972). “Asymptotically efficient rank invariant test procedures.” Journal of the Royal Statistical Society: Series A (General), 135(2), 185–198.
Harrell, E F, Califf, M R, Pryor, B D, Lee, L K, Rosati, A R (1982). “Evaluating the yield of medical tests.” JAMA, 247(18), 2543–2546.
Gonen M, Heller G (2005). “Concordance probability and discriminatory power in proportional hazards regression.” Biometrika, 92(4), 965–970. doi:10.1093/biomet/92.4.965.
Schemper, Michael, Wakounig, Samo, Heinze, Georg (2009). “The estimation of average hazard ratios by weighted Cox regression.” Statistics in Medicine, 28(19), 2473–2489. doi:10.1002/sim.3623.
Uno H, Cai T, Pencina MJ, D'Agostino RB, Wei LJ (2011). “On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data.” Statistics in Medicine, 30(10), 1105–1117. doi:10.1002/sim.4154.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
library(mlr3)
library(mlr3proba)
task = tsk("rats")
learner = lrn("surv.coxph")
part = partition(task) # train/test split
learner$train(task, part$train)
p = learner$predict(task, part$test)
# Harrell's C-index
p$score(msr("surv.cindex")) # same as `p$score()`
# Uno's C-index
p$score(msr("surv.cindex", weight_meth = "G2"), task = task, train_set = part$train)
# Harrell's C-index evaluated up to a specific time horizon
p$score(msr("surv.cindex", t_max = 97))
# Harrell's C-index evaluated up to the time corresponding to 30% of censoring
p$score(msr("surv.cindex", p_max = 0.3))
This calibration method is defined by calculating the following statistic:
$$s = \frac{B}{n} \sum_{b=1}^{B} \left(P_b - \frac{n}{B}\right)^2$$
where $B$ is the number of 'buckets' (that equally divide $[0, 1]$ into intervals),
$n$ is the number of predictions, and $P_b$ is the observed number
of observations in the $b$th interval. An observation is assigned to the
$b$th bucket if its predicted survival probability at the time of event
falls within the corresponding interval.
This statistic assumes that censoring time is independent of death time.
A model is well D-calibrated if the observations are uniformly distributed across the $B$ buckets, which is tested with chisq.test
($p > 0.05$ if well-calibrated, i.e. higher p-values are preferred).
Model $j$ is better calibrated than model $k$ if $s_j < s_k$,
meaning that lower values of this measure are preferred.
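The statistic is a chi-squared goodness-of-fit statistic of the bucket counts against the uniform expectation $n/B$. A base-R sketch with hypothetical predicted survival probabilities at each observation's own event time; for simplicity it ignores censored observations (Haider et al. describe how they can be spread across buckets, which is not reproduced here):

```r
# Hypothetical S_i(t_i) at each observation's event time (events only)
surv_at_event <- c(0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95,
                   0.08, 0.18, 0.28, 0.38, 0.48, 0.58, 0.68, 0.78, 0.88, 0.98)
B <- 10
n <- length(surv_at_event)

# Assign each observation to one of B equal-width buckets on [0, 1]
bucket <- cut(surv_at_event, breaks = seq(0, 1, length.out = B + 1),
              include.lowest = TRUE)
P <- as.vector(table(bucket))   # observed count per bucket

# D-calibration statistic and its chi-squared p-value (B - 1 df)
s <- B / n * sum((P - n / B)^2)
p_value <- stats::pchisq(s, df = B - 1, lower.tail = FALSE)

# A badly calibrated model: every observation lands in the first bucket
P_bad <- c(n, rep(0, B - 1))
s_bad <- B / n * sum((P_bad - n / B)^2)
```

The first set of predictions fills every bucket equally, giving $s = 0$ and a p-value of 1; piling all observations into one bucket gives $s = 180$.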
This measure can either return the test statistic or the p-value from the chisq.test.
The former is useful for model comparison whereas the latter is useful for determining if a model
is well-calibrated. If chisq = FALSE and s is the returned statistic, you can manually
compute the p-value with pchisq(s, B - 1, lower.tail = FALSE).
NOTE: This measure is still experimental both theoretically and in implementation. Results should therefore only be taken as an indicator of performance and not for conclusive judgements about model calibration.
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvDCalibration$new()
mlr_measures$get("surv.dcalib")
msr("surv.dcalib")
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| B | integer | 10 | - | $[1, \infty)$ |
| chisq | logical | FALSE | TRUE, FALSE | - |
| truncate | numeric | Inf | - | |
Type: "surv"
Range:
Minimize: TRUE
Required prediction: distr
B (integer(1))
Number of buckets to test for uniform predictions over.
Default of 10 is recommended by Haider et al. (2020).
Changing this parameter affects truncate.
chisq (logical(1))
If TRUE returns the p-value of the corresponding chisq.test instead of the measure.
Default is FALSE and returns the statistic s.
You can manually get the p-value by executing pchisq(s, B - 1, lower.tail = FALSE).
The null hypothesis is that the model is D-calibrated.
truncate (double(1))
This parameter controls the upper bound of the output statistic, when chisq is FALSE.
We use truncate = Inf by default but values between are sufficient
for most purposes, which correspond to p-values of for the chisq.test using
the default buckets.
Values translate to even lower p-values and thus less D-calibrated models.
If the number of buckets changes, you probably will want to
change the truncate value as well to correspond to the same p-value significance.
Note that truncation may severely limit automated tuning with this measure.
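Because the statistic is compared against a chi-squared distribution with B - 1 degrees of freedom (the pchisq() call given above), a truncate value that matches a chosen p-value can be derived with qchisq(), including for a non-default number of buckets. A sketch; the significance levels and B_new = 20 are arbitrary illustrative choices, not recommendations:

```r
B <- 10

# Statistic values corresponding to chosen significance levels:
# upper critical values of a chi-squared distribution with B - 1 df
alpha_levels <- c(0.10, 0.05, 0.01)
s_critical <- stats::qchisq(alpha_levels, df = B - 1, lower.tail = FALSE)

# Round trip: the p-value at each critical value recovers the level
p_back <- stats::pchisq(s_critical, df = B - 1, lower.tail = FALSE)

# Hypothetical: matching threshold at p = 0.05 for a different bucket count
B_new <- 20
truncate_new <- stats::qchisq(0.05, df = B_new - 1, lower.tail = FALSE)
```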
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvDCalibration
new()
Creates a new instance of this R6 class.
MeasureSurvDCalibration$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvDCalibration$clone(deep = FALSE)
deep: Whether to make a deep clone.
Haider, Humza, Hoehn, Bret, Davis, Sarah, Greiner, Russell (2020). “Effective Ways to Build and Evaluate Individual Survival Distributions.” Journal of Machine Learning Research, 21(85), 1–63. https://jmlr.org/papers/v21/18-772.html.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other calibration survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index
Other distr survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_index,
mlr_measures_surv.graf,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.rcll,
mlr_measures_surv.schmid
Calculates the Integrated Survival Brier Score (ISBS), Integrated Graf Score or squared survival loss.
This measure has two dimensions: (test set) observations and time points.
For a specific individual $i$ from the test set, with observed survival
outcome $(t_i, \delta_i)$ (time and censoring indicator) and predicted
survival function $S_i(t)$, the observation-wise estimator of the loss,
integrated across the time dimension up to the time cutoff $\tau^*$, is:
$$L_{ISBS}(S_i, t_i, \delta_i) = \int_0^{\tau^*} \left[ \frac{S_i^2(\tau)\, \mathrm{I}(t_i \leq \tau, \delta_i = 1)}{G(t_i)} + \frac{(1 - S_i(\tau))^2\, \mathrm{I}(t_i > \tau)}{G(\tau)} \right] d\tau$$
where $G$ is the Kaplan-Meier estimate of the censoring distribution.
The implementation uses the trapezoidal rule to approximate the integral over
time, and the integral is normalized by the range of available evaluation times
($\tau_{max} - \tau_{min}$).
To get a single score across all $N$ observations of the test set, we
return the average of the time-integrated observation-wise scores:
$$L_{ISBS}(S, t, \delta) = \frac{1}{N} \sum_{i=1}^N L_{ISBS}(S_i, t_i, \delta_i)$$
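The base-R sketch below implements the ISBS formula on toy data. It makes each piece visible: the two indicator terms, the inverse-probability-of-censoring weights, the eps guard, and the normalized trapezoidal integration. It is not the mlr3proba implementation. The helper names km_censoring() and isbs_sketch() are hypothetical. The sketch evaluates $G$ at $t_i$ itself rather than at a left limit, and its Kaplan-Meier estimate ignores tie-handling subtleties.

```r
# Illustrative sketch of the ISBS formula (not the mlr3proba implementation).

# Kaplan-Meier estimate of the censoring distribution G(t): censoring
# (status == 0) plays the role of the event.
km_censoring <- function(time, status) {
  ut <- sort(unique(time))
  factors <- vapply(ut, function(u) {
    1 - sum(time == u & status == 0) / sum(time >= u)
  }, numeric(1))
  # right-continuous step function with G(t) = 1 before the first time point
  stepfun(ut, c(1, cumprod(factors)))
}

# surv_mat: n x m matrix with S_i(tau_j) for test observation i at eval_times[j]
isbs_sketch <- function(surv_mat, eval_times, time, status, G, eps = 1e-3) {
  n <- length(time)
  m <- length(eval_times)
  loss <- matrix(0, nrow = n, ncol = m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      tau <- eval_times[j]
      if (time[i] <= tau && status[i] == 1) {
        # event observed by tau: weight by 1 / G(t_i)
        loss[i, j] <- surv_mat[i, j]^2 / max(G(time[i]), eps)
      } else if (time[i] > tau) {
        # still at risk at tau: weight by 1 / G(tau)
        loss[i, j] <- (1 - surv_mat[i, j])^2 / max(G(tau), eps)
      }
      # censored by tau: contributes 0
    }
  }
  # trapezoidal rule over time, normalized by the range of evaluation times
  widths <- diff(eval_times)
  trapz <- ((loss[, -m, drop = FALSE] + loss[, -1, drop = FALSE]) / 2) %*% widths
  obs_scores <- as.vector(trapz) / (max(eval_times) - min(eval_times))
  mean(obs_scores)  # average over the n test observations
}

# Usage: uncensored toy data and a constant prediction S_i(t) = 0.5
time <- c(1, 2, 3)
status <- c(1, 1, 1)
eval_times <- sort(unique(time))
G <- km_censoring(time, status)
S_half <- matrix(0.5, nrow = 3, ncol = 3)
isbs_sketch(S_half, eval_times, time, status, G)
```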
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvGraf$new()
mlr_measures$get("surv.graf")
msr("surv.graf")
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| integrated | logical | TRUE | TRUE, FALSE | - |
| times | untyped | - | - | |
| t_max | numeric | - | | |
| p_max | numeric | - | | |
| eps | numeric | 0.001 | | |
| ERV | logical | FALSE | TRUE, FALSE | - |
Type: "surv"
Range:
Minimize: TRUE
Required prediction: distr
integrated (logical(1))
If TRUE (default), returns the integrated score (e.g. across
time points); otherwise, the score is not integrated (e.g. at a single time point).
times (numeric())
If integrated == TRUE then a vector of time-points over which to integrate the score.
If integrated == FALSE then a single time point at which to return the score.
t_max (numeric(1))
Cutoff time $\tau^*$ (i.e. time horizon) to evaluate the measure up to
(truncate $S_i(t)$ at $\tau^*$).
Mutually exclusive with p_max or times.
It's recommended to set t_max to avoid division by eps, see "Time Cutoff Details" section.
If t_max is not specified, an Inf time horizon is assumed.
p_max (numeric(1))
The proportion of censoring to integrate up to in the given dataset.
Mutually exclusive with times or t_max.
eps (numeric(1))
Very small number to substitute near-zero values in order to prevent errors
in e.g. log(0) and/or division-by-zero calculations.
Default value is 0.001.
ERV (logical(1))
If TRUE then the Explained Residual Variation method is applied, which
means the score is standardized against a Kaplan-Meier baseline.
Default is FALSE.
ISBS is not a proper scoring rule, see Sonabend et al. (2024) for more details.
The assumptions for consistent estimation of the loss are that the censoring
distribution is independent of the survival distribution and that
$G(t)$ is fit on a sufficiently large dataset.
If the times argument is not specified (NULL), then the sorted unique
time points from the test set are used for evaluation of the
time-integrated score.
This design decision was made because different predicted survival
distributions usually have discretized time domains that may
differ, e.g. when the survival predictions come from different survival
learners.
Essentially, using the same set of time points for the calculation of the score
minimizes the bias that would come from using different time points.
We note that we perform constant interpolation of $S_i(t)$ for time points
that fall outside its discretized time domain.
Naturally, if the times argument is specified, then exactly these time
points are used for evaluation.
A warning is given to the user in case some of the specified times fall outside
of the time point range of the test set.
The assumption here is that if the test set is large enough, it should have a
time domain/range similar to the one from the train set, and therefore time
points outside that domain might lead to unwanted extrapolation of $S_i(t)$.
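The base-R sketch below puts one prediction on a common evaluation grid. The grid is the sorted unique test-set times, and the predicted curve is carried onto it with constant (step) interpolation via approx(method = "constant", rule = 2). All numbers are made up. Two choices are assumptions rather than statements from the text above: treating the curve as a step function inside its own domain, and carrying the first predicted value to the left of that domain. The text only states that constant interpolation is used outside the domain.

```r
# Hypothetical predicted survival values of one observation on the learner's own grid
pred_times <- c(50, 100, 200, 400)
pred_surv  <- c(0.95, 0.80, 0.60, 0.30)

# Common evaluation grid: sorted unique time points of the (toy) test set
test_times <- c(30, 100, 150, 400, 600, 100)
eval_times <- sort(unique(test_times))

# Constant interpolation: rule = 2 carries the nearest value outside the
# predicted time domain, f = 0 keeps the left value between grid points
surv_on_grid <- approx(pred_times, pred_surv, xout = eval_times,
                       method = "constant", f = 0, rule = 2)$y
surv_on_grid
```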
If task and train_set are passed to $score, then $G(t)$ is fit using
all observations from the train set; otherwise the test set is used.
Using the train set is likely to reduce any bias caused by calculating parts of the
measure on the test data it is evaluating.
It also usually means that more data is used for fitting the censoring
distribution via the Kaplan-Meier estimator.
The training data is automatically used when scoring resamplings.
If t_max or p_max is given, then the predicted survival function is
truncated at the time cutoff for all observations. This helps mitigate
inflation of the score which can occur when an observation is censored
at the last observed time. In such cases, $G(t) = 0$, triggering the use
of a small constant eps instead, see Kvamme et al. (2023).
Not using a t_max can lead to misleading evaluation, violations of properness
and poor optimization outcomes when using this score for model tuning, see
Sonabend et al. (2024).
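The sketch below shows why a cutoff helps, using hypothetical toy data. The last observation is censored, so the Kaplan-Meier censoring estimate drops to $G(t) = 0$ and the inverse weight 1 / G(t) falls back to 1 / eps. Restricting evaluation to time points up to a hypothetical t_max = 7 keeps the weights bounded. This is illustrative base R, not the package code.

```r
# Toy data (hypothetical): the last observation is censored at the last time
time   <- c(2, 3, 5, 7, 9)
status <- c(1, 0, 1, 1, 0)

# Kaplan-Meier estimate of the censoring distribution at each unique time
ut <- sort(unique(time))
G <- cumprod(vapply(ut, function(u) {
  1 - sum(time == u & status == 0) / sum(time >= u)
}, numeric(1)))
G  # drops to 0 at the last time point

# Inverse-probability weights 1 / G(tau), with eps substituted where G = 0
eps <- 1e-3
weights <- 1 / pmax(G, eps)

# Without a cutoff the weight at tau = 9 is 1 / eps; a cutoff t_max = 7
# keeps only time points where G(tau) stays away from zero
t_max <- 7
max(weights)
max(weights[ut <= t_max])
```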
If comparing the integrated Graf score to other packages, e.g. pec,
results may be very slightly different as this package uses survfit to estimate
the censoring distribution, in line with the Graf 1999 paper; whereas some
other packages use prodlim with reverse = TRUE (meaning Kaplan-Meier is
not used).
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvGraf
new()
Creates a new instance of this R6 class.
MeasureSurvGraf$new(ERV = FALSE)
ERV (logical(1))
Standardize measure against a Kaplan-Meier baseline
(Explained Residual Variation)
clone()
The objects of this class are cloneable with this method.
MeasureSurvGraf$clone(deep = FALSE)
deep
Whether to make a deep clone.
Graf E, Schmoor C, Sauerbrei W, Schumacher M (1999). “Assessment and comparison of prognostic classification schemes for survival data.” Statistics in Medicine, 18(17-18), 2529–2545. doi:10.1002/(sici)1097-0258(19990915/30)18:17/18<2529::aid-sim274>3.0.co;2-5.
Sonabend, Raphael, Zobolas, John, Kopper, Philipp, Burk, Lukas, Bender, Andreas (2024). “Examining properness in the external validation of survival models with squared and logarithmic losses.” https://arxiv.org/abs/2212.05260v3.
Kvamme, Havard, Borgan, Ornulf (2023). “The Brier Score under Administrative Censoring: Problems and a Solution.” Journal of Machine Learning Research, 24(2), 1–26. ISSN 1533-7928, http://jmlr.org/papers/v24/19-1030.html.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other Probabilistic survival measures:
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.rcll,
mlr_measures_surv.schmid
Other distr survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_index,
mlr_measures_surv.dcalib,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.rcll,
mlr_measures_surv.schmid
```r
library(mlr3)
library(mlr3proba)
# Define a survival Task
task = tsk("lung")
# Create train and test set
part = partition(task)
# Train Cox learner on the train set
cox = lrn("surv.coxph")
cox$train(task, row_ids = part$train)
# Make predictions for the test set
p = cox$predict(task, row_ids = part$test)
# ISBS, G(t) calculated using the test set
p$score(msr("surv.graf"))
# ISBS, G(t) calculated using the train set (always recommended)
p$score(msr("surv.graf"), task = task, train_set = part$train)
# ISBS, ERV score (comparing with KM baseline)
p$score(msr("surv.graf", ERV = TRUE), task = task, train_set = part$train)
# ISBS at specific time point
p$score(msr("surv.graf", times = 365), task = task, train_set = part$train)
# ISBS at multiple time points (integrated)
p$score(msr("surv.graf", times = c(125, 365, 450), integrated = TRUE), task = task, train_set = part$train)
# ISBS, use time cutoff
p$score(msr("surv.graf", t_max = 700), task = task, train_set = part$train)
# ISBS, use time cutoff corresponding to specific proportion of censoring on the test set
p$score(msr("surv.graf", p_max = 0.8), task = task, train_set = part$train)
```
Calls survAUC::AUC.hc().
Assumes random censoring.
All measures implemented from survAUC should be used with
care; we are aware of implementation problems that sometimes cause fatal
errors in R.
In future updates some of these measures may be re-written and implemented
directly in mlr3proba.
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvHungAUC$new()
mlr_measures$get("surv.hung_auc")
msr("surv.hung_auc")
| Id | Type | Default | Levels |
|---|---|---|---|
| integrated | logical | TRUE | TRUE, FALSE |
| times | untyped | - | |
Type: "surv"
Range:
Minimize: FALSE
Required prediction: lp
integrated (logical(1))
If TRUE (default), returns the integrated score (e.g. across
time points); otherwise, the score is not integrated (e.g. at a single time point).
times (numeric())
If integrated == TRUE then a vector of time-points over which to integrate the score.
If integrated == FALSE then a single time point at which to return the score.
mlr3::Measure -> mlr3proba::MeasureSurv -> mlr3proba::MeasureSurvAUC -> MeasureSurvHungAUC
new()
Creates a new instance of this R6 class.
MeasureSurvHungAUC$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvHungAUC$clone(deep = FALSE)
deep
Whether to make a deep clone.
Hung H, Chiang C (2010). “Estimation methods for time-dependent AUC models with survival data.” The Canadian Journal of Statistics / La Revue Canadienne de Statistique, 38(1), 8–26. https://www.jstor.org/stable/27805213.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other AUC survival measures:
mlr_measures_surv.chambless_auc,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr
Other lp survival measures:
mlr_measures_surv.calib_beta,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
```r
library(mlr3)
library(mlr3proba)
# Define a survival Task
task = tsk("lung")
# Create train and test set
part = partition(task)
# Train Cox learner on the train set
cox = lrn("surv.coxph")
cox$train(task, row_ids = part$train)
# Make predictions for the test set
p = cox$predict(task, row_ids = part$test)
# Integrated AUC score
p$score(msr("surv.hung_auc"), task = task, train_set = part$train, learner = cox)
# AUC at specific time point
p$score(msr("surv.hung_auc", times = 600), task = task, train_set = part$train, learner = cox)
# Integrated AUC at specific time points
p$score(msr("surv.hung_auc", times = c(100, 200, 300, 400, 500)), task = task, train_set = part$train, learner = cox)
```
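Calculates the Integrated Survival Log-Likelihood (ISLL) or Integrated Logarithmic (log) Loss, also known as integrated cross-entropy.
This measure has two dimensions: (test set) observations and time points.
For a specific individual $i$ from the test set, with observed survival
outcome $(t_i, \delta_i)$ (time and censoring indicator) and predicted
survival function $S_i(t)$, the observation-wise estimator of the loss,
integrated across the time dimension up to the time cutoff $\tau^*$, is:
$$L_{ISLL}(S_i, t_i, \delta_i) = -\int_0^{\tau^*} \left[ \frac{\log(1 - S_i(\tau))\, \mathrm{I}(t_i \leq \tau, \delta_i = 1)}{G(t_i)} + \frac{\log(S_i(\tau))\, \mathrm{I}(t_i > \tau)}{G(\tau)} \right] d\tau$$
where $G$ is the Kaplan-Meier estimate of the censoring distribution.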
The implementation uses the trapezoidal rule to approximate the integral over
time, and the integral is normalized by the range of available evaluation times
($\tau_{max} - \tau_{min}$).
To get a single score across all $N$ observations of the test set, we
return the average of the time-integrated observation-wise scores:
$$L_{ISLL}(S, t, \delta) = \frac{1}{N} \sum_{i=1}^N L_{ISLL}(S_i, t_i, \delta_i)$$
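The base-R sketch below implements the ISLL formula under stated simplifications. The censoring distribution G is passed in as a function, for example a Kaplan-Meier step function fitted elsewhere. eps guards both log(0) and division by zero. The helper isll_sketch() is hypothetical and is not the package's implementation.

```r
# Illustrative sketch of the ISLL formula (not the mlr3proba implementation).
# surv_mat: n x m matrix of S_i(tau_j); G: function returning the censoring
# survival probability at a given time.
isll_sketch <- function(surv_mat, eval_times, time, status, G, eps = 1e-3) {
  n <- length(time)
  m <- length(eval_times)
  loss <- matrix(0, nrow = n, ncol = m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      tau <- eval_times[j]
      s <- surv_mat[i, j]
      if (time[i] <= tau && status[i] == 1) {
        # event observed by tau: -log(1 - S) weighted by 1 / G(t_i)
        loss[i, j] <- -log(max(1 - s, eps)) / max(G(time[i]), eps)
      } else if (time[i] > tau) {
        # still at risk at tau: -log(S) weighted by 1 / G(tau)
        loss[i, j] <- -log(max(s, eps)) / max(G(tau), eps)
      }
    }
  }
  # trapezoidal rule, normalized by the range of evaluation times, then averaged
  widths <- diff(eval_times)
  trapz <- ((loss[, -m, drop = FALSE] + loss[, -1, drop = FALSE]) / 2) %*% widths
  mean(as.vector(trapz) / (max(eval_times) - min(eval_times)))
}

# Usage: no censoring (G = 1 everywhere) and a constant prediction S_i(t) = 0.5
time <- c(1, 2, 3)
status <- c(1, 1, 1)
eval_times <- c(1, 2, 3)
G_none <- function(t) rep(1, length(t))
isll_sketch(matrix(0.5, 3, 3), eval_times, time, status, G_none)
```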
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvIntLogloss$new()
mlr_measures$get("surv.intlogloss")
msr("surv.intlogloss")
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| integrated | logical | TRUE | TRUE, FALSE | - |
| times | untyped | - | - | |
| t_max | numeric | - | | |
| p_max | numeric | - | | |
| eps | numeric | 0.001 | | |
| ERV | logical | FALSE | TRUE, FALSE | - |
Type: "surv"
Range:
Minimize: TRUE
Required prediction: distr
integrated (logical(1))
If TRUE (default), returns the integrated score (e.g. across
time points); otherwise, the score is not integrated (e.g. at a single time point).
times (numeric())
If integrated == TRUE then a vector of time-points over which to integrate the score.
If integrated == FALSE then a single time point at which to return the score.
t_max (numeric(1))
Cutoff time $\tau^*$ (i.e. time horizon) to evaluate the measure up to
(truncate $S_i(t)$ at $\tau^*$).
Mutually exclusive with p_max or times.
It's recommended to set t_max to avoid division by eps, see "Time Cutoff Details" section.
If t_max is not specified, an Inf time horizon is assumed.
p_max (numeric(1))
The proportion of censoring to integrate up to in the given dataset.
Mutually exclusive with times or t_max.
eps (numeric(1))
Very small number to substitute near-zero values in order to prevent errors
in e.g. log(0) and/or division-by-zero calculations.
Default value is 0.001.
ERV (logical(1))
If TRUE then the Explained Residual Variation method is applied, which
means the score is standardized against a Kaplan-Meier baseline.
Default is FALSE.
ISLL is not a proper scoring rule, see Sonabend et al. (2024) for more details.
The assumptions for consistent estimation of the loss are that the censoring
distribution is independent of the survival distribution and that
$G(t)$ is fit on a sufficiently large dataset.
If the times argument is not specified (NULL), then the sorted unique
time points from the test set are used for evaluation of the
time-integrated score.
This design decision was made because different predicted survival
distributions usually have discretized time domains that may
differ, e.g. when the survival predictions come from different survival
learners.
Essentially, using the same set of time points for the calculation of the score
minimizes the bias that would come from using different time points.
We note that we perform constant interpolation of $S_i(t)$ for time points
that fall outside its discretized time domain.
Naturally, if the times argument is specified, then exactly these time
points are used for evaluation.
A warning is given to the user in case some of the specified times fall outside
of the time point range of the test set.
The assumption here is that if the test set is large enough, it should have a
time domain/range similar to the one from the train set, and therefore time
points outside that domain might lead to unwanted extrapolation of $S_i(t)$.
If task and train_set are passed to $score, then $G(t)$ is fit using
all observations from the train set; otherwise the test set is used.
Using the train set is likely to reduce any bias caused by calculating parts of the
measure on the test data it is evaluating.
It also usually means that more data is used for fitting the censoring
distribution via the Kaplan-Meier estimator.
The training data is automatically used when scoring resamplings.
If t_max or p_max is given, then the predicted survival function is
truncated at the time cutoff for all observations. This helps mitigate
inflation of the score which can occur when an observation is censored
at the last observed time. In such cases, $G(t) = 0$, triggering the use
of a small constant eps instead, see Kvamme et al. (2023).
Not using a t_max can lead to misleading evaluation, violations of properness
and poor optimization outcomes when using this score for model tuning, see
Sonabend et al. (2024).
If comparing the integrated Graf score to other packages, e.g. pec,
results may be very slightly different as this package uses survfit to estimate
the censoring distribution, in line with the Graf 1999 paper; whereas some
other packages use prodlim with reverse = TRUE (meaning Kaplan-Meier is
not used).
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvIntLogloss
new()
Creates a new instance of this R6 class.
MeasureSurvIntLogloss$new(ERV = FALSE)
ERV (logical(1))
Standardize measure against a Kaplan-Meier baseline
(Explained Residual Variation)
clone()
The objects of this class are cloneable with this method.
MeasureSurvIntLogloss$clone(deep = FALSE)
deep
Whether to make a deep clone.
Graf E, Schmoor C, Sauerbrei W, Schumacher M (1999). “Assessment and comparison of prognostic classification schemes for survival data.” Statistics in Medicine, 18(17-18), 2529–2545. doi:10.1002/(sici)1097-0258(19990915/30)18:17/18<2529::aid-sim274>3.0.co;2-5.
Sonabend, Raphael, Zobolas, John, Kopper, Philipp, Burk, Lukas, Bender, Andreas (2024). “Examining properness in the external validation of survival models with squared and logarithmic losses.” https://arxiv.org/abs/2212.05260v3.
Kvamme, Havard, Borgan, Ornulf (2023). “The Brier Score under Administrative Censoring: Problems and a Solution.” Journal of Machine Learning Research, 24(2), 1–26. ISSN 1533-7928, http://jmlr.org/papers/v24/19-1030.html.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other Probabilistic survival measures:
mlr_measures_surv.graf,
mlr_measures_surv.logloss,
mlr_measures_surv.rcll,
mlr_measures_surv.schmid
Other distr survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_index,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.logloss,
mlr_measures_surv.rcll,
mlr_measures_surv.schmid
```r
library(mlr3)
library(mlr3proba)
# Define a survival Task
task = tsk("lung")
# Create train and test set
part = partition(task)
# Train Cox learner on the train set
cox = lrn("surv.coxph")
cox$train(task, row_ids = part$train)
# Make predictions for the test set
p = cox$predict(task, row_ids = part$test)
# ISLL, G(t) calculated using the test set
p$score(msr("surv.intlogloss"))
# ISLL, G(t) calculated using the train set (always recommended)
p$score(msr("surv.intlogloss"), task = task, train_set = part$train)
# ISLL, ERV score (comparing with KM baseline)
p$score(msr("surv.intlogloss", ERV = TRUE), task = task, train_set = part$train)
# ISLL at specific time point
p$score(msr("surv.intlogloss", times = 365), task = task, train_set = part$train)
# ISLL at multiple time points (integrated)
p$score(msr("surv.intlogloss", times = c(125, 365, 450), integrated = TRUE), task = task, train_set = part$train)
# ISLL, use time cutoff
p$score(msr("surv.intlogloss", t_max = 700), task = task, train_set = part$train)
# ISLL, use time cutoff corresponding to specific proportion of censoring on the test set
p$score(msr("surv.intlogloss", p_max = 0.8), task = task, train_set = part$train)
```
Calculates the cross-entropy, or negative log-likelihood (NLL) or logarithmic (log) loss.
The (observation-wise) negative log-likelihood is defined as the negative logarithm of
the predicted probability density function $f_i(t)$, evaluated at the
observation time $t_i$ (event or censoring):
$$L_{NLL}(f_i, t_i) = -\log[f_i(t_i)]$$
This loss does not take into account the censoring status of an observation, treating all outcomes as events. It is also an improper scoring rule, see Sonabend et al. (2024). See section Interpolation for implementation details.
To get a single score across all $N$ observations of the test set, we
return the average of the observation-wise scores:
$$L_{NLL}(f, t) = \frac{1}{N} \sum_{i=1}^N L_{NLL}(f_i, t_i)$$
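The sketch below shows the averaged negative log-likelihood in base R. It assumes the predicted densities have already been evaluated at each observed time. Near-zero densities are replaced by eps before taking the log, mirroring the eps parameter. The helper nll_sketch() is hypothetical and is not the package's implementation.

```r
# Minimal sketch (not the mlr3proba implementation): average negative log of
# predicted densities f_i(t_i), assumed already evaluated at the observed times.
nll_sketch <- function(dens_at_obs, eps = 1e-6) {
  # substitute eps for (near-)zero densities to avoid log(0)
  mean(-log(pmax(dens_at_obs, eps)))
}

nll_sketch(c(1, exp(-1)))  # densities of 1 and exp(-1) at the observed times
nll_sketch(c(0, 1))        # the zero density is replaced by eps
```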
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvLogloss$new()
mlr_measures$get("surv.logloss")
msr("surv.logloss")
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| eps | numeric | 1e-06 | | |
| ERV | logical | FALSE | TRUE, FALSE | - |
Type: "surv"
Range:
Minimize: TRUE
Required prediction: distr
eps (numeric(1))
Very small number to substitute near-zero values in order to prevent errors
in e.g. log(0) and/or division-by-zero calculations.
Default value is 1e-06.
ERV (logical(1))
If TRUE then the Explained Residual Variation method is applied, which
means the score is standardized against a Kaplan-Meier baseline.
Default is FALSE.
To evaluate scores involving subject-specific survival functions
$S_i(t)$, we perform linear interpolation on the discrete survival
values provided in the prediction.
Duplicate survival values are removed prior to interpolation to ensure strict
monotonicity and non-negative density values.
Therefore we are left with the distinct survival time points $\tau_1 < \tau_2 < \dots < \tau_k$
and the corresponding survival values $S_i(\tau_1) > S_i(\tau_2) > \dots > S_i(\tau_k)$.
Interpolation is performed using base R's approx() with method = "linear"
and rule = 2, ensuring:
Left extrapolation (for $t < \tau_1$) assumes $S_i(0) = 1$ and uses
the slope from $(0, 1)$ to $(\tau_1, S_i(\tau_1))$.
Right extrapolation (for $t > \tau_k$) uses the slope from the last
interval $(\tau_{k-1}, S_i(\tau_{k-1}))$ to $(\tau_k, S_i(\tau_k))$, with results
truncated at 0 to preserve non-negativity.
This ensures a continuous, piecewise-linear survival function that
satisfies $S_i(0) = 1$ and remains non-increasing and non-negative across
the entire domain.
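The density at a time point $t$ is estimated as follows:
$$f_i(t) = \frac{S_i(t) - S_i(\tau^{+})}{\tau^{+} - t}$$
where $\tau^{+}$ is the closest grid point after $t$.
This corresponds to the (negative) slope of $S_i(t)$ between the closest
grid point after $t$ and $t$ itself.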
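The base-R sketch below follows the interpolation and density rules. It assumes the prediction times are strictly increasing and positive. interp_surv() and density_at() are hypothetical helpers. The package's internal code may differ, for instance in how it treats points at or beyond the last grid point, where this sketch returns NA for the density.

```r
# Hypothetical helpers illustrating the interpolation rules (not package code).
interp_surv <- function(times, surv) {
  # anchor S(0) = 1, then drop duplicate survival values (keeps the earliest time)
  x <- c(0, times)
  y <- c(1, surv)
  keep <- !duplicated(y)
  x <- x[keep]
  y <- y[keep]
  stopifnot(length(x) >= 2)
  k <- length(x)
  S <- function(t) {
    # linear interpolation inside [0, x[k]]
    out <- approx(x, y, xout = t, method = "linear", rule = 2)$y
    # right extrapolation with the slope of the last interval, truncated at 0
    right <- t > x[k]
    if (any(right)) {
      slope <- (y[k] - y[k - 1]) / (x[k] - x[k - 1])
      out[right] <- pmax(y[k] + slope * (t[right] - x[k]), 0)
    }
    out
  }
  list(S = S, knots = x)
}

# Density as the negative slope of S between t and the next grid point
density_at <- function(fit, t) {
  vapply(t, function(u) {
    after <- fit$knots[fit$knots > u]
    if (length(after) == 0) return(NA_real_)  # no grid point after t
    nxt <- min(after)
    (fit$S(u) - fit$S(nxt)) / (nxt - u)
  }, numeric(1))
}

fit <- interp_surv(times = c(1, 2, 3, 4), surv = c(0.8, 0.8, 0.4, 0.2))
fit$S(c(0.5, 2, 5, 6))
density_at(fit, c(0.5, 2))
```

Removing the duplicate value 0.8 at time 2 leaves the knots 0, 1, 3 and 4, so the curve is linear between times 1 and 3.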
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvLogloss
new()
Creates a new instance of this R6 class.
MeasureSurvLogloss$new(ERV = FALSE)
ERV (logical(1))
Standardize measure against a Kaplan-Meier baseline
(Explained Residual Variation)
clone()
The objects of this class are cloneable with this method.
MeasureSurvLogloss$clone(deep = FALSE)
deep
Whether to make a deep clone.
Sonabend, Raphael, Zobolas, John, Kopper, Philipp, Burk, Lukas, Bender, Andreas (2024). “Examining properness in the external validation of survival models with squared and logarithmic losses.” https://arxiv.org/abs/2212.05260v3.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other Probabilistic survival measures:
mlr_measures_surv.graf,
mlr_measures_surv.intlogloss,
mlr_measures_surv.rcll,
mlr_measures_surv.schmid
Other distr survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_index,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.intlogloss,
mlr_measures_surv.rcll,
mlr_measures_surv.schmid
Calculates the mean absolute error (MAE).
The MAE is defined by
$$MAE(t, \hat{t}) = \frac{1}{n} \sum_{i=1}^n |t_i - \hat{t}_i|$$
where $t$ is the true value and $\hat{t}$ is the prediction.
Censored observations in the test set are ignored.
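The base-R sketch below computes the MAE on uncensored observations only, with made-up survival times and predictions. The helper mae_uncensored() is hypothetical and is not part of mlr3proba.

```r
# Minimal sketch (not the mlr3proba implementation): MAE on uncensored observations
mae_uncensored <- function(time, status, response) {
  keep <- status == 1  # censored observations are ignored
  mean(abs(time[keep] - response[keep]))
}

time     <- c(2, 4, 6)
status   <- c(1, 0, 1)
response <- c(3, 10, 4)  # predicted survival times
mae_uncensored(time, status, response)
```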
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvMAE$new()
mlr_measures$get("surv.mae")
msr("surv.mae")
Empty ParamSet
Type: "surv"
Range:
Minimize: TRUE
Required prediction: response
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvMAE
new()
Creates a new instance of this R6 class.
MeasureSurvMAE$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvMAE$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other response survival measures:
mlr_measures_surv.mse,
mlr_measures_surv.rmse
Calculates the mean squared error (MSE).
The MSE is defined by
$$MSE(t, \hat{t}) = \frac{1}{n} \sum_{i=1}^n (t_i - \hat{t}_i)^2$$
where $t$ is the true value and $\hat{t}$ is the prediction.
Censored observations in the test set are ignored.
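The base-R sketch below computes the MSE in the same way, dropping censored observations before averaging the squared errors. The data are made up, and mse_uncensored() is a hypothetical helper, not part of mlr3proba.

```r
# Minimal sketch (not the mlr3proba implementation): MSE on uncensored observations
mse_uncensored <- function(time, status, response) {
  keep <- status == 1  # censored observations are ignored
  mean((time[keep] - response[keep])^2)
}

time     <- c(2, 4, 6)
status   <- c(1, 0, 1)
response <- c(3, 10, 4)  # predicted survival times
mse_uncensored(time, status, response)
```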
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvMSE$new()
mlr_measures$get("surv.mse")
msr("surv.mse")
Empty ParamSet
Type: "surv"
Range:
Minimize: TRUE
Required prediction: response
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvMSE
new()
Creates a new instance of this R6 class.
MeasureSurvMSE$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvMSE$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other response survival measures:
mlr_measures_surv.mae,
mlr_measures_surv.rmse
Calls survAUC::Nagelk().
Assumes Cox PH model specification.
All measures implemented from survAUC should be used with
care; we are aware of implementation problems that sometimes cause fatal
errors in R.
In future updates some of these measures may be re-written and implemented
directly in mlr3proba.
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvNagelkR2$new()
mlr_measures$get("surv.nagelk_r2")
msr("surv.nagelk_r2")
Empty ParamSet
Type: "surv"
Range:
Minimize: FALSE
Required prediction: lp
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvNagelkR2
new()
Creates a new instance of this R6 class.
MeasureSurvNagelkR2$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvNagelkR2$clone(deep = FALSE)
deep
Whether to make a deep clone.
Nagelkerke NJD (1991). “A note on a general definition of the coefficient of determination.” Biometrika, 78(3), 691–692.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other R2 survival measures:
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.xu_r2
Other lp survival measures:
mlr_measures_surv.calib_beta,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Calls survAUC::OXS().
Assumes Cox PH model specification.
All measures implemented from survAUC should be used with
care; we are aware of implementation problems that sometimes cause fatal
errors in R.
In future updates some of these measures may be rewritten and implemented
directly in mlr3proba.
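O'Quigley, Xu and Stare (2005) scale the log-likelihood gain by the number of events k rather than by the number of observations n: R² = 1 − exp{−(2/k)(ℓ(β̂) − ℓ(0))}. A minimal sketch of that formula on an in-sample survival::coxph() fit follows; `oxs_r2()` is a hypothetical helper, and survAUC::OXS() may differ in details, for example by computing the likelihoods from test-set linear predictors.

```r
library(survival)

# OXS-type R^2: 1 - exp(-(2 / k) * (l(beta_hat) - l(0))), k = number of events
oxs_r2 <- function(fit) {
  ll_null <- fit$loglik[1]   # log partial likelihood at beta = 0 (null model)
  ll_full <- fit$loglik[2]   # log partial likelihood at the fitted coefficients
  1 - exp(-2 / fit$nevent * (ll_full - ll_null))
}

# Illustrative in-sample fit on the lung data shipped with 'survival'
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
r2_oxs <- oxs_r2(fit)
print(r2_oxs)
```

Because k ≤ n, this value is never smaller than the n-scaled version computed from the same fit.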
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvOQuigleyR2$new()
mlr_measures$get("surv.oquigley_r2")
msr("surv.oquigley_r2")
Empty ParamSet
Type: "surv"
Range:
Minimize: FALSE
Required prediction: lp
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvOQuigleyR2
new()
Creates a new instance of this R6 class.
MeasureSurvOQuigleyR2$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvOQuigleyR2$clone(deep = FALSE)
deep
Whether to make a deep clone.
O'Quigley J, Xu R, Stare J (2005). “Explained randomness in proportional hazards models.” Statistics in Medicine, 24(3), 479–489. doi:10.1002/sim.1946.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other R2 survival measures:
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.xu_r2
Other lp survival measures:
mlr_measures_surv.calib_beta,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Calculates the right-censored log-likelihood (RCLL) or logarithmic loss, introduced by Avati et al. (2020).
The observation-wise RCLL is defined by:
$$L_{RCLL}(S_i, t_i, \delta_i) = -\log[\delta_i f_i(t_i) + (1 - \delta_i) S_i(t_i)]$$
where $\delta_i$ is the censoring indicator, $f_i$ the predicted probability
density function and $S_i$ the predicted survival function for observation $i$.
RCLL is proper provided that the censoring and survival distributions are independent, see Rindt et al. (2022).
Simulation studies by Sonabend et al. (2024) provide strong empirical evidence
supporting the properness of this score.
See section Interpolation for implementation details.
To get a single score across all $N$ observations of the test set, we
return the average of the observation-wise scores:
$$\frac{1}{N} \sum_{i=1}^N L_{RCLL}(S_i, t_i, \delta_i)$$
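A small worked example of the RCLL formula: events contribute −log f_i(t_i) and censored observations contribute −log S_i(t_i), and the test-set score is their mean. The helper `rcll_score()` is hypothetical, and clamping near-zero likelihoods at `eps` is an assumption that mirrors the purpose of the `eps` parameter rather than its exact implementation.

```r
# Observation-wise RCLL: -log(delta_i * f_i(t_i) + (1 - delta_i) * S_i(t_i)).
# Assumption: near-zero values are replaced by eps before taking the log.
rcll_score <- function(delta, dens, surv, eps = 1e-6) {
  lik <- ifelse(delta == 1, dens, surv)  # density for events, survival for censored
  obs <- -log(pmax(lik, eps))            # guard against log(0)
  mean(obs)                              # average over the N test observations
}

delta <- c(1, 0)      # first observation had the event, second was censored
dens  <- c(0.2, NA)   # predicted f_i(t_i); only used for events
surv  <- c(NA, 0.5)   # predicted S_i(t_i); only used for censored observations
score <- rcll_score(delta, dens, surv)
print(score)          # (-log(0.2) - log(0.5)) / 2, about 1.151
```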
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvRCLL$new()
mlr_measures$get("surv.rcll")
msr("surv.rcll")
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| eps | numeric | 1e-06 | | |
| ERV | logical | FALSE | TRUE, FALSE | - |
Type: "surv"
Range:
Minimize: TRUE
Required prediction: distr
eps (numeric(1))
Very small number to substitute near-zero values in order to prevent errors
in e.g. log(0) and/or division-by-zero calculations.
Default value is 1e-06.
ERV (logical(1))
If TRUE then the Explained Residual Variation method is applied, which
means the score is standardized against a Kaplan-Meier baseline.
Default is FALSE.
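The ERV option standardizes the score against a Kaplan-Meier baseline. A minimal sketch, assuming the standardization takes the form 1 − L_model / L_KM (an assumption here; `erv()` is a hypothetical helper, not part of mlr3proba). Under this form, positive values mean the model improves on the Kaplan-Meier baseline and negative values mean it does worse, which is why an ERV score is maximized rather than minimized.

```r
# Hypothetical helper: standardize a model's loss against a Kaplan-Meier baseline loss.
# Assumed form: ERV = 1 - L_model / L_KM (positive values mean the model beats KM).
erv <- function(model_loss, km_loss) 1 - model_loss / km_loss

better <- erv(0.1, 0.2)  # 0.5: the model halves the baseline loss
worse  <- erv(0.4, 0.2)  # -1: the model loss is twice the baseline loss
print(c(better, worse))
```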
To evaluate scores involving subject-specific survival functions
$S_i(t)$, we perform linear interpolation on the discrete survival
values provided in the prediction.
Duplicate survival values are removed prior to interpolation to ensure strict
monotonicity and non-negative density values.
Therefore we are left with the distinct survival time points $t_1 < t_2 < \dots < t_k$
and the corresponding survival values $S_i(t_1), \dots, S_i(t_k)$.
Interpolation is performed using base R’s approx() with method = "linear"
and rule = 2, ensuring:
Left extrapolation (for $t < t_1$) assumes $S_i(0) = 1$ and uses
the slope from $(0, 1)$ to $(t_1, S_i(t_1))$.
Right extrapolation (for $t > t_k$) uses the slope from the last
interval $(t_{k-1}, S_i(t_{k-1}))$ to $(t_k, S_i(t_k))$, with results
truncated at 0 to preserve non-negativity.
This ensures a continuous, piecewise-linear survival function that
satisfies $S_i(0) = 1$ and remains non-increasing and non-negative across
the entire domain.
The density at time point $t$, with $t_j \le t < t_{j+1}$, is
estimated as follows:
$$f_i(t) = \frac{S_i(t) - S_i(t_{j+1})}{t_{j+1} - t}$$
This corresponds to the (negative) slope of $S_i$ between the closest
grid point after $t$, i.e. $t_{j+1}$, and $t$ itself.
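The interpolation rules above can be sketched in base R as follows. `prep_grid()`, `interp_surv()` and `interp_dens()` are hypothetical helpers, not mlr3proba internals. Assumptions: all grid times are positive, and when survival values repeat, the earliest time point is kept. In base R, `rule = 2` alone returns the value at the nearest end of the grid, so the sketch applies the described right extrapolation explicitly; the density beyond the last grid point reuses the last-interval slope, which is also an assumption.

```r
# Hypothetical helpers sketching the interpolation rules (not mlr3proba internals).
prep_grid <- function(times, surv) {
  keep <- !duplicated(surv)                           # drop repeated survival values
  list(x = c(0, times[keep]), y = c(1, surv[keep]))   # anchor S_i(0) = 1
}

interp_surv <- function(times, surv, t) {
  g <- prep_grid(times, surv)
  k <- length(g$x)
  # linear interpolation inside the grid (rule = 2 clamps outside it)
  s <- approx(g$x, g$y, xout = t, method = "linear", rule = 2)$y
  right <- t > g$x[k]
  if (any(right)) {
    # right extrapolation with the slope of the last interval, truncated at 0
    slope <- (g$y[k] - g$y[k - 1]) / (g$x[k] - g$x[k - 1])
    s[right] <- pmax(g$y[k] + slope * (t[right] - g$x[k]), 0)
  }
  s
}

interp_dens <- function(times, surv, t) {
  g <- prep_grid(times, surv)
  k <- length(g$x)
  vapply(t, function(ti) {
    j <- which(g$x > ti)[1]                           # closest grid point after ti
    if (is.na(j)) {
      # assumption: beyond the last grid point, reuse the last-interval slope
      return((g$y[k - 1] - g$y[k]) / (g$x[k] - g$x[k - 1]))
    }
    (interp_surv(times, surv, ti) - g$y[j]) / (g$x[j] - ti)
  }, numeric(1))
}

times <- c(1, 2, 3)
surv  <- c(0.8, 0.5, 0.2)
interp_surv(times, surv, c(0.5, 2.5, 3.5, 4))  # approx. 0.90 0.35 0.05 0.00
interp_dens(times, surv, c(0.5, 1.5))          # approx. 0.2 0.3
```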
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvRCLL
new()
Creates a new instance of this R6 class.
MeasureSurvRCLL$new(ERV = FALSE)
ERV (logical(1))
Standardize measure against a Kaplan-Meier baseline
(Explained Residual Variation)
clone()
The objects of this class are cloneable with this method.
MeasureSurvRCLL$clone(deep = FALSE)
deep
Whether to make a deep clone.
Avati, Anand, Duan, Tony, Zhou, Sharon, Jung, Kenneth, Shah, Nigam H, Ng, Andrew Y (2020). “Countdown Regression: Sharp and Calibrated Survival Predictions.” Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, 115(4), 145–155. https://proceedings.mlr.press/v115/avati20a.html.
Rindt, David, Hu, Robert, Steinsaltz, David, Sejdinovic, Dino (2022). “Survival regression with proper scoring rules and monotonic neural networks.” Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, 151(4), 1190–1205. https://proceedings.mlr.press/v151/rindt22a.html.
Sonabend, Raphael, Zobolas, John, Kopper, Philipp, Burk, Lukas, Bender, Andreas (2024). “Examining properness in the external validation of survival models with squared and logarithmic losses.” https://arxiv.org/abs/2212.05260v3.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other Probabilistic survival measures:
mlr_measures_surv.graf,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.schmid
Other distr survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_index,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.schmid
Calculates the root mean squared error (RMSE).
The RMSE is defined by
$$\sqrt{\frac{1}{n} \sum_{i=1}^n (t_i - \hat{t}_i)^2}$$
where $t_i$ is the true value and $\hat{t}_i$ is the prediction for observation $i$.
Censored observations in the test set are ignored.
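A small base-R illustration of the RMSE with censored test observations dropped before averaging. `rmse_uncensored()` is a hypothetical helper, and the data are made up for illustration.

```r
# RMSE over uncensored test observations only (censored rows are dropped).
rmse_uncensored <- function(time, status, pred) {
  ev <- status == 1                       # keep observations with an observed event
  sqrt(mean((time[ev] - pred[ev])^2))
}

time   <- c(10, 20, 30, 40)
status <- c(1, 0, 1, 1)                   # the second observation is censored
pred   <- c(12, 5, 27, 40)
rmse_uncensored(time, status, pred)       # sqrt((4 + 9 + 0) / 3), about 2.082
```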
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvRMSE$new()
mlr_measures$get("surv.rmse")
msr("surv.rmse")
Empty ParamSet
Type: "surv"
Range:
Minimize: TRUE
Required prediction: response
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvRMSE
new()
Creates a new instance of this R6 class.
MeasureSurvRMSE$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvRMSE$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other response survival measures:
mlr_measures_surv.mae,
mlr_measures_surv.mse
Calculates the Integrated Schmid Score (ISS), also known as the integrated absolute loss.
This measure has two dimensions: (test set) observations and time points.
For a specific individual $i$ from the test set, with observed survival
outcome $(t_i, \delta_i)$ (time and censoring indicator) and predicted
survival function $S_i(t)$, the observation-wise estimator of the loss,
integrated across the time dimension up to the time cutoff $\tau^*$, is:
$$L_{ISS}(S_i, t_i, \delta_i) = \int_0^{\tau^*} \frac{S_i(\tau) \, I(t_i \le \tau, \delta_i = 1)}{G(t_i)} + \frac{(1 - S_i(\tau)) \, I(t_i > \tau)}{G(\tau)} \, d\tau$$
where $G$ is the Kaplan-Meier estimate of the censoring distribution.
The implementation uses the trapezoidal rule to approximate the integral over
time, and the integral is normalized by the range of available evaluation times
($\tau_{\max} - \tau_{\min}$).
To get a single score across all $N$ observations of the test set, we
return the average of the time-integrated observation-wise scores:
$$\frac{1}{N} \sum_{i=1}^N L_{ISS}(S_i, t_i, \delta_i)$$
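The estimator above can be sketched in a few lines of R. `iss_sketch()` is a hypothetical helper, not the mlr3proba code. Assumptions: G is the Kaplan-Meier estimate of the censoring distribution fitted on the same data, near-zero G values are replaced by `eps`, and `surv_mat` holds the predicted S_i(τ) with one row per observation and one column per evaluation time. The example data and predictions are made up.

```r
library(survival)

iss_sketch <- function(time, status, surv_mat, grid, eps = 1e-3) {
  # Kaplan-Meier estimate of the censoring distribution (censoring as the "event")
  km <- survfit(Surv(time, 1 - status) ~ 1)
  G  <- stepfun(km$time, c(1, km$surv))        # right-continuous step function
  obs_scores <- vapply(seq_along(time), function(i) {
    s  <- surv_mat[i, ]                          # S_i(tau) on the evaluation grid
    w1 <- (time[i] <= grid & status[i] == 1) / max(G(time[i]), eps)
    w2 <- (time[i] > grid) / pmax(G(grid), eps)
    loss <- s * w1 + (1 - s) * w2                # weighted absolute loss per tau
    # trapezoidal rule, normalized by the range of evaluation times
    area <- sum(diff(grid) * (head(loss, -1) + tail(loss, -1)) / 2)
    area / (max(grid) - min(grid))
  }, numeric(1))
  mean(obs_scores)                               # average over the N observations
}

time     <- c(2, 4, 6, 8, 10)
status   <- c(1, 0, 1, 1, 0)                     # 1 = event, 0 = censored
grid     <- sort(unique(time))
# illustrative predictions: the same exponential survival curve for everyone
surv_mat <- matrix(exp(-0.1 * grid), nrow = length(time), ncol = length(grid), byrow = TRUE)
iss <- iss_sketch(time, status, surv_mat, grid)
print(iss)
```

In this toy data the last observation is censored at the largest time, so G drops to 0 there and `eps` is used in its place.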
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvSchmid$new()
mlr_measures$get("surv.schmid")
msr("surv.schmid")
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| integrated | logical | TRUE | TRUE, FALSE | - |
| times | untyped | - | - | |
| t_max | numeric | - | | |
| p_max | numeric | - | | |
| eps | numeric | 0.001 | | |
| ERV | logical | FALSE | TRUE, FALSE | - |
Type: "surv"
Range:
Minimize: TRUE
Required prediction: distr
integrated (logical(1))
If TRUE (default), returns the integrated score (e.g. across
time points); otherwise, not integrated (e.g. at a single time point).
times (numeric())
If integrated == TRUE then a vector of time-points over which to integrate the score.
If integrated == FALSE then a single time point at which to return the score.
t_max (numeric(1))
Cutoff time $\tau^*$ (i.e. time horizon) to evaluate the measure up to
(truncate $S_i(t)$).
Mutually exclusive with p_max or times.
It's recommended to set t_max to avoid division by eps, see "Time Cutoff Details" section.
If t_max is not specified, an Inf time horizon is assumed.
p_max (numeric(1))
The proportion of censoring to integrate up to in the given dataset.
Mutually exclusive with times or t_max.
eps (numeric(1))
Very small number to substitute near-zero values in order to prevent errors
in e.g. log(0) and/or division-by-zero calculations.
Default value is 0.001.
ERV (logical(1))
If TRUE then the Explained Residual Variation method is applied, which
means the score is standardized against a Kaplan-Meier baseline.
Default is FALSE.
ISS is not a proper scoring rule, see Sonabend et al. (2024) for more details.
The assumptions for consistent estimation of the loss are that the censoring
distribution $G(t)$ is independent of the survival distribution and that $G(t)$
is fit on a sufficiently large dataset.
If the times argument is not specified (NULL), then the sorted unique
time points from the test set are used for evaluation of the
time-integrated score.
This was a design decision because different predicted survival
distributions usually have a discretized time domain, which may
differ, e.g. when the survival predictions come from different survival
learners.
Essentially, using the same set of time points for the calculation of the score
minimizes the bias that would come from using different time points.
We note that we perform constant interpolation of $S_i(t)$ for time points
that fall outside its discretized time domain.
Naturally, if the times argument is specified, then exactly these time
points are used for evaluation.
A warning is given to the user in case some of the specified times fall outside
of the time point range of the test set.
The assumption here is that if the test set is large enough, it should have a
time domain/range similar to the one from the train set, and therefore time
points outside that domain might lead to unwanted extrapolation of $S_i(t)$.
If task and train_set are passed to $score, then $G(t)$ is fit using
all observations from the train set; otherwise the test set is used.
Using the train set is likely to reduce any bias caused by calculating parts of the
measure on the test data it is evaluating.
It also usually means that more data is used for fitting the censoring
distribution via the Kaplan-Meier estimator.
The training data is automatically used in scoring resamplings.
If t_max or p_max is given, then the predicted survival function is
truncated at the time cutoff for all observations. This helps mitigate
inflation of the score which can occur when an observation is censored
at the last observed time. In such cases, $G(t) = 0$, triggering the use
of a small constant eps instead, see Kvamme et al. (2023).
Not using a t_max can lead to misleading evaluation, violations of properness
and poor optimization outcomes when using this score for model tuning, see
Sonabend et al. (2024).
If comparing the integrated Schmid score to other packages, e.g. pec,
results may be very slightly different, as this package uses survfit to estimate
the censoring distribution, in line with the Graf 1999 paper, whereas some
other packages use prodlim with reverse = TRUE (meaning Kaplan-Meier is
not used).
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvSchmid
new()
Creates a new instance of this R6 class.
MeasureSurvSchmid$new(ERV = FALSE)
ERV (logical(1))
Standardize measure against a Kaplan-Meier baseline
(Explained Residual Variation)
clone()
The objects of this class are cloneable with this method.
MeasureSurvSchmid$clone(deep = FALSE)
deep
Whether to make a deep clone.
Schemper, Michael, Henderson, Robin (2000). “Predictive Accuracy and Explained Variation in Cox Regression.” Biometrics, 56, 249–255. doi:10.1002/sim.1486.
Schmid, Matthias, Hielscher, Thomas, Augustin, Thomas, Gefeller, Olaf (2011). “A Robust Alternative to the Schemper-Henderson Estimator of Prediction Error.” Biometrics, 67(2), 524–535. doi:10.1111/j.1541-0420.2010.01459.x.
Sonabend, Raphael, Zobolas, John, Kopper, Philipp, Burk, Lukas, Bender, Andreas (2024). “Examining properness in the external validation of survival models with squared and logarithmic losses.” https://arxiv.org/abs/2212.05260v3.
Kvamme, Havard, Borgan, Ornulf (2023). “The Brier Score under Administrative Censoring: Problems and a Solution.” Journal of Machine Learning Research, 24(2), 1–26. ISSN 1533-7928, http://jmlr.org/papers/v24/19-1030.html.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other Probabilistic survival measures:
mlr_measures_surv.graf,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.rcll
Other distr survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_index,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.rcll
library(mlr3)
library(mlr3proba)
# Define a survival Task
task = tsk("lung")
# Create train and test set
part = partition(task)
# Train Cox learner on the train set
cox = lrn("surv.coxph")
cox$train(task, row_ids = part$train)
# Make predictions for the test set
p = cox$predict(task, row_ids = part$test)
# ISS, G(t) calculated using the test set
p$score(msr("surv.schmid"))
# ISS, G(t) calculated using the train set (always recommended)
p$score(msr("surv.schmid"), task = task, train_set = part$train)
# ISS, ERV score (comparing with KM baseline)
p$score(msr("surv.schmid", ERV = TRUE), task = task, train_set = part$train)
# ISS at specific time point
p$score(msr("surv.schmid", times = 365), task = task, train_set = part$train)
# ISS at multiple time points (integrated)
p$score(msr("surv.schmid", times = c(125, 365, 450), integrated = TRUE), task = task, train_set = part$train)
# ISS, use time cutoff
p$score(msr("surv.schmid", t_max = 700), task = task, train_set = part$train)
# ISS, use time cutoff corresponding to specific proportion of censoring on the test set
p$score(msr("surv.schmid", p_max = 0.8), task = task, train_set = part$train)
Calls survAUC::AUC.sh().
Assumes Cox PH model specification.
All measures implemented from survAUC should be used with
care; we are aware of implementation problems that sometimes cause fatal
errors in R.
In future updates some of these measures may be rewritten and implemented
directly in mlr3proba.
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvSongAUC$new()
mlr_measures$get("surv.song_auc")
msr("surv.song_auc")
| Id | Type | Default | Levels |
|---|---|---|---|
| times | untyped | - | |
| integrated | logical | TRUE | TRUE, FALSE |
| type | character | incident | incident, cumulative |
Type: "surv"
Range:
Minimize: FALSE
Required prediction: lp
times (numeric())
If integrated == TRUE then a vector of time-points over which to integrate the score.
If integrated == FALSE then a single time point at which to return the score.
integrated (logical(1))
If TRUE (default), returns the integrated score (e.g. across
time points); otherwise, not integrated (e.g. at a single time point).
type (character(1))
A string defining the type of true positive rate (TPR): incident refers to
incident TPR, cumulative refers to cumulative TPR.
mlr3::Measure -> mlr3proba::MeasureSurv -> mlr3proba::MeasureSurvAUC -> MeasureSurvSongAUC
new()
Creates a new instance of this R6 class.
MeasureSurvSongAUC$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvSongAUC$clone(deep = FALSE)
deep
Whether to make a deep clone.
Song, Xiao, Zhou, Xiao-Hua (2008). “A semiparametric approach for the covariate specific ROC curve with survival outcome.” Statistica Sinica, 18(3), 947–65. https://www.jstor.org/stable/24308524.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other AUC survival measures:
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr
Other lp survival measures:
mlr_measures_surv.calib_beta,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
library(mlr3)
library(mlr3proba)
# Define a survival Task
task = tsk("lung")
# Create train and test set
part = partition(task)
# Train Cox learner on the train set
cox = lrn("surv.coxph")
cox$train(task, row_ids = part$train)
# Make predictions for the test set
p = cox$predict(task, row_ids = part$test)
# Integrated AUC score
p$score(msr("surv.song_auc"), task = task, train_set = part$train, learner = cox)
# AUC at specific time point
p$score(msr("surv.song_auc", times = 600), task = task, train_set = part$train, learner = cox)
# Integrated AUC at specific time points
p$score(msr("surv.song_auc", times = c(100, 200, 300, 400, 500)), task = task, train_set = part$train, learner = cox)
Calls survAUC::spec.sh().
Assumes Cox PH model specification.
times and lp_thresh are arbitrarily set to 0 to prevent crashing; these
should be set explicitly by the user.
All measures implemented from survAUC should be used with
care; we are aware of implementation problems that sometimes cause fatal
errors in R.
In future updates some of these measures may be rewritten and implemented
directly in mlr3proba.
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvSongTNR$new()
mlr_measures$get("surv.song_tnr")
msr("surv.song_tnr")
| Id | Type | Default | Range |
|---|---|---|---|
| times | numeric | - | |
| lp_thresh | numeric | 0 | |
Type: "surv"
Range:
Minimize: FALSE
Required prediction: lp
times (numeric())
A vector of time-points at which we calculate the TPR/TNR scores.
lp_thresh (numeric(1))
Determines the cutoff threshold of the linear predictor in the
calculation of the TPR/TNR scores.
mlr3::Measure -> mlr3proba::MeasureSurv -> mlr3proba::MeasureSurvAUC -> MeasureSurvSongTNR
new()
Creates a new instance of this R6 class.
MeasureSurvSongTNR$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvSongTNR$clone(deep = FALSE)
deep
Whether to make a deep clone.
Song, Xiao, Zhou, Xiao-Hua (2008). “A semiparametric approach for the covariate specific ROC curve with survival outcome.” Statistica Sinica, 18(3), 947–65. https://www.jstor.org/stable/24308524.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other AUC survival measures:
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr
Other lp survival measures:
mlr_measures_surv.calib_beta,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Calls survAUC::sens.sh().
Assumes Cox PH model specification.
times and lp_thresh are arbitrarily set to 0 to prevent crashing; these
should be set explicitly by the user.
All measures implemented from survAUC should be used with
care; we are aware of implementation problems that sometimes cause fatal
errors in R.
In future updates some of these measures may be rewritten and implemented
directly in mlr3proba.
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvSongTPR$new()
mlr_measures$get("surv.song_tpr")
msr("surv.song_tpr")
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| times | numeric | - | | |
| lp_thresh | numeric | 0 | | |
| type | character | incident | incident, cumulative | - |
Type: "surv"
Range:
Minimize: FALSE
Required prediction: lp
times (numeric())
A vector of time-points at which we calculate the TPR/TNR scores.
lp_thresh (numeric(1))
Determines the cutoff threshold of the linear predictor in the
calculation of the TPR/TNR scores.
mlr3::Measure -> mlr3proba::MeasureSurv -> mlr3proba::MeasureSurvAUC -> MeasureSurvSongTPR
new()
Creates a new instance of this R6 class.
MeasureSurvSongTPR$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvSongTPR$clone(deep = FALSE)
deep
Whether to make a deep clone.
Song, Xiao, Zhou, Xiao-Hua (2008). “A semiparametric approach for the covariate specific ROC curve with survival outcome.” Statistica Sinica, 18(3), 947–65. https://www.jstor.org/stable/24308524.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other AUC survival measures:
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr
Other lp survival measures:
mlr_measures_surv.calib_beta,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Calls survAUC::AUC.uno().
Assumes random censoring.
All measures implemented from survAUC should be used with
care; we are aware of implementation problems that sometimes cause fatal
errors in R.
In future updates some of these measures may be rewritten and implemented
directly in mlr3proba.
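To show what a cumulative/dynamic AUC at a single time t measures under random censoring, here is a sketch with inverse-probability-of-censoring weights in the spirit of Uno et al. (2007). It compares events observed by t (cases, weighted by 1/G(t_i)) with subjects still event-free after t (controls). `auc_ipcw()` is a hypothetical helper: the weighting scheme and the treatment of ties (ignored here) are assumptions, and survAUC::AUC.uno() may differ in these details and in how scores are integrated over several times.

```r
library(survival)

auc_ipcw <- function(time, status, lp, t) {
  # Kaplan-Meier estimate of the censoring distribution G
  km <- survfit(Surv(time, 1 - status) ~ 1)
  G  <- stepfun(km$time, c(1, km$surv))
  cases    <- which(time <= t & status == 1)  # events observed by time t
  controls <- which(time > t)                 # subjects still event-free after t
  w <- 1 / G(time[cases])                     # inverse-probability-of-censoring weights
  # concordant pairs: the case has a higher linear predictor (risk) than the control
  conc <- outer(lp[cases], lp[controls], ">")
  sum(w * rowSums(conc)) / (sum(w) * length(controls))
}

time   <- c(1, 2, 3, 4, 5, 6)
status <- c(1, 1, 0, 1, 1, 0)
auc_ipcw(time, status, lp = -time, t = 3.5)  # 1: every early event outranks every control
auc_ipcw(time, status, lp = time, t = 3.5)   # 0: the ranking is completely reversed
```

The sketch assumes at least one case and one control at t; otherwise the ratio is undefined.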
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvUnoAUC$new()
mlr_measures$get("surv.uno_auc")
msr("surv.uno_auc")
| Id | Type | Default | Levels |
|---|---|---|---|
| integrated | logical | TRUE | TRUE, FALSE |
| times | untyped | - | |
Type: "surv"
Range:
Minimize: FALSE
Required prediction: lp
integrated (logical(1))
If TRUE (default), returns the integrated score (e.g. across
time points); otherwise, not integrated (e.g. at a single time point).
times (numeric())
If integrated == TRUE then a vector of time-points over which to integrate the score.
If integrated == FALSE then a single time point at which to return the score.
mlr3::Measure -> mlr3proba::MeasureSurv -> mlr3proba::MeasureSurvAUC -> MeasureSurvUnoAUC
new()
Creates a new instance of this R6 class.
MeasureSurvUnoAUC$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvUnoAUC$clone(deep = FALSE)
deep
Whether to make a deep clone.
Uno H, Cai T, Tian L, Wei LJ (2007). “Evaluating Prediction Rules for t-Year Survivors With Censored Regression Models.” Journal of the American Statistical Association, 102(478), 527–537. doi:10.1198/016214507000000149.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other AUC survival measures:
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr
Other lp survival measures:
mlr_measures_surv.calib_beta,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
library(mlr3)
library(mlr3proba)
# Define a survival Task
task = tsk("lung")
# Create train and test set
part = partition(task)
# Train Cox learner on the train set
cox = lrn("surv.coxph")
cox$train(task, row_ids = part$train)
# Make predictions for the test set
p = cox$predict(task, row_ids = part$test)
# Integrated AUC score
p$score(msr("surv.uno_auc"), task = task, train_set = part$train, learner = cox)
# AUC at specific time point
p$score(msr("surv.uno_auc", times = 600), task = task, train_set = part$train, learner = cox)
# Integrated AUC at specific time points
p$score(msr("surv.uno_auc", times = c(100, 200, 300, 400, 500)), task = task, train_set = part$train, learner = cox)
Calls survAUC::spec.uno().
Assumes random censoring.
times and lp_thresh are arbitrarily set to 0 to prevent crashing; both
should be specified explicitly.
All measures implemented from survAUC should be used with
care; we are aware of implementation problems that sometimes cause fatal
errors in R.
In future updates, some of these measures may be rewritten and implemented
directly in mlr3proba.
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvUnoTNR$new()
mlr_measures$get("surv.uno_tnr")
msr("surv.uno_tnr")
| Id | Type | Default | Range |
|---|---|---|---|
| times | numeric | - | |
| lp_thresh | numeric | 0 | |
Type: "surv"
Range: [0, 1]
Minimize: FALSE
Required prediction: lp
times (numeric())
A vector of time-points at which we calculate the TPR/TNR scores.
lp_thresh (numeric(1))
Determines the cutoff threshold of the linear predictor in the
calculation of the TPR/TNR scores.
mlr3::Measure -> mlr3proba::MeasureSurv -> mlr3proba::MeasureSurvAUC -> MeasureSurvUnoTNR
new()
Creates a new instance of this R6 class.
MeasureSurvUnoTNR$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvUnoTNR$clone(deep = FALSE)
deep
Whether to make a deep clone.
Uno H, Cai T, Tian L, Wei LJ (2007). “Evaluating Prediction Rules for t-Year Survivors With Censored Regression Models.” Journal of the American Statistical Association, 102(478), 527–537. doi:10.1198/016214507000000149.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Other AUC survival measures:
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tpr
Other lp survival measures:
mlr_measures_surv.calib_beta,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tpr,
mlr_measures_surv.xu_r2
Calls survAUC::sens.uno().
Assumes random censoring.
times and lp_thresh are arbitrarily set to 0 to prevent crashing; both
should be specified explicitly.
All measures implemented from survAUC should be used with
care; we are aware of implementation problems that sometimes cause fatal
errors in R.
In future updates, some of these measures may be rewritten and implemented
directly in mlr3proba.
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvUnoTPR$new()
mlr_measures$get("surv.uno_tpr")
msr("surv.uno_tpr")
| Id | Type | Default | Range |
|---|---|---|---|
| times | numeric | - | |
| lp_thresh | numeric | 0 | |
Type: "surv"
Range: [0, 1]
Minimize: FALSE
Required prediction: lp
times (numeric())
A vector of time-points at which we calculate the TPR/TNR scores.
lp_thresh (numeric(1))
Determines the cutoff threshold of the linear predictor in the
calculation of the TPR/TNR scores.
mlr3::Measure -> mlr3proba::MeasureSurv -> mlr3proba::MeasureSurvAUC -> MeasureSurvUnoTPR
new()
Creates a new instance of this R6 class.
MeasureSurvUnoTPR$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvUnoTPR$clone(deep = FALSE)
deep
Whether to make a deep clone.
Uno H, Cai T, Tian L, Wei LJ (2007). “Evaluating Prediction Rules for t-Year Survivors With Censored Regression Models.” Journal of the American Statistical Association, 102(478), 527–537. doi:10.1198/016214507000000149.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.xu_r2
Other AUC survival measures:
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr
Other lp survival measures:
mlr_measures_surv.calib_beta,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.xu_r2
Calls survAUC::XO().
Assumes Cox PH model specification.
All measures implemented from survAUC should be used with
care; we are aware of implementation problems that sometimes cause fatal
errors in R.
In future updates, some of these measures may be rewritten and implemented
directly in mlr3proba.
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvXuR2$new()
mlr_measures$get("surv.xu_r2")
msr("surv.xu_r2")
Empty ParamSet
Type: "surv"
Range:
Minimize: FALSE
Required prediction: lp
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvXuR2
new()
Creates a new instance of this R6 class.
MeasureSurvXuR2$new()
clone()
The objects of this class are cloneable with this method.
MeasureSurvXuR2$clone(deep = FALSE)
deep
Whether to make a deep clone.
Xu R, O'Quigley J (1999). “A R2 type measure of dependence for proportional hazards models.” Journal of Nonparametric Statistics, 12(1), 83–107. doi:10.1080/10485259908832799.
Other survival measures:
mlr_measures_surv.calib_alpha,
mlr_measures_surv.calib_beta,
mlr_measures_surv.calib_index,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.cindex,
mlr_measures_surv.dcalib,
mlr_measures_surv.graf,
mlr_measures_surv.hung_auc,
mlr_measures_surv.intlogloss,
mlr_measures_surv.logloss,
mlr_measures_surv.mae,
mlr_measures_surv.mse,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.rcll,
mlr_measures_surv.rmse,
mlr_measures_surv.schmid,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr
Other R2 survival measures:
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2
Other lp survival measures:
mlr_measures_surv.calib_beta,
mlr_measures_surv.chambless_auc,
mlr_measures_surv.hung_auc,
mlr_measures_surv.nagelk_r2,
mlr_measures_surv.oquigley_r2,
mlr_measures_surv.song_auc,
mlr_measures_surv.song_tnr,
mlr_measures_surv.song_tpr,
mlr_measures_surv.uno_auc,
mlr_measures_surv.uno_tnr,
mlr_measures_surv.uno_tpr
Composes a survival distribution (distr) from the linear predictor
predictions (lp) of a given LearnerSurv during training and prediction,
using the Breslow estimator. The specified learner must be
capable of generating lp-type predictions (e.g., a Cox-type model).
This PipeOp can be instantiated via the Dictionary mlr_pipeops or with the associated sugar function po():
PipeOpBreslow$new(learner)
mlr_pipeops$get("breslowcompose", learner)
po("breslowcompose", learner, breslow.overwrite = TRUE)
PipeOpBreslow is like a LearnerSurv.
It has one input channel, named "input", that takes a TaskSurv during training
and another TaskSurv during prediction.
PipeOpBreslow has one output channel named output, producing NULL during
training and a PredictionSurv during prediction.
The $state slot stores the times and status survival target variables of
the train TaskSurv as well as the lp predictions on the train set.
The parameters are:
breslow.overwrite :: logical(1)
If FALSE (default) then the compositor does nothing and returns the
input learner's PredictionSurv.
If TRUE, or if the input learner doesn't have distr
predictions, then the distr is overwritten with the distr composed
from lp and the train set information using the Breslow estimator.
This is useful for changing the prediction distr from one model form to
another.
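For intuition, the Breslow estimator can be sketched in base R. This is a minimal illustration, not mlr3proba's internal implementation; the function breslow_surv() and its arguments are hypothetical. It assumes the standard Breslow form: the baseline cumulative hazard is $H_0(t) = \sum_{t_i \le t} d_i / \sum_{j \in R(t_i)} \exp(\text{lp}_j)$, summed over the unique train-set event times $t_i$ with $d_i$ events and risk set $R(t_i)$, and the composed survival is $S(t \mid \text{lp}) = \exp(-H_0(t)\exp(\text{lp}))$.

```r
# Minimal sketch of the Breslow estimator (hypothetical helper, not mlr3proba code).
# times, status, lp_train: train-set survival times, event indicators and linear predictors
# lp_test: linear predictors of the observations to predict
breslow_surv = function(times, status, lp_train, lp_test,
                        eval_times = sort(unique(times[status == 1]))) {
  event_times = sort(unique(times[status == 1]))
  # increment of the baseline cumulative hazard at each unique event time
  dH = vapply(event_times, function(tt) {
    d = sum(times == tt & status == 1)       # number of events at tt
    risk = sum(exp(lp_train[times >= tt]))   # lp-weighted size of the risk set at tt
    d / risk
  }, numeric(1))
  H0 = cumsum(dH)
  # step-function lookup of H0 at the requested time points (0 before the first event)
  H0_eval = vapply(eval_times, function(tt) {
    idx = which(event_times <= tt)
    if (length(idx) == 0) 0 else H0[max(idx)]
  }, numeric(1))
  # survival matrix: rows = observations in lp_test, columns = eval_times
  S = exp(-outer(exp(lp_test), H0_eval))
  colnames(S) = eval_times
  S
}

times  = c(1, 2, 3, 4)
status = c(1, 1, 0, 1)
S = breslow_surv(times, status, lp_train = rep(0, 4), lp_test = c(0, 1))
S
```

With all train-set lp equal to 0 the baseline reduces to the Nelson-Aalen estimator, and a test observation with lp = 1 gets the baseline curve raised to the power exp(1). Tied event times are handled by counting all events at a time against the full risk set (the Breslow convention).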
mlr3pipelines::PipeOp -> PipeOpBreslow
learner(mlr3::Learner)
The input survival learner.
new()
Creates a new instance of this R6 class.
PipeOpBreslow$new(learner, id = NULL, param_vals = list())
learner(LearnerSurv)
Survival learner which must provide lp-type predictions
id(character(1))
Identifier of the resulting object. If NULL (default), it will be set
as the id of the input learner.
param_vals(list())
List of hyperparameter settings, overwriting the hyperparameter settings that would
otherwise be set during construction.
clone()
The objects of this class are cloneable with this method.
PipeOpBreslow$clone(deep = FALSE)
deep
Whether to make a deep clone.
Breslow N (1972). “Discussion of 'Regression Models and Life-Tables' by D.R. Cox.” Journal of the Royal Statistical Society: Series B, 34(2), 216-217.
Lin, Y. D (2007). “On the Breslow estimator.” Lifetime Data Analysis, 13(4), 471-480. doi:10.1007/s10985-007-9048-y.
Other survival compositors:
mlr_pipeops_crankcompose,
mlr_pipeops_distrcompose,
mlr_pipeops_responsecompose
```r
## Not run:
library(mlr3)
library(mlr3proba)
library(mlr3pipelines)

task = tsk("rats")
part = partition(task, ratio = 0.8)
train_task = task$clone()$filter(part$train)
test_task = task$clone()$filter(part$test)

learner = lrn("surv.coxph") # learner with lp predictions
b = po("breslowcompose", learner = learner, breslow.overwrite = TRUE)

b$train(list(train_task))
p = b$predict(list(test_task))[[1L]]
## End(Not run)
```
Combines a predicted response and se from PredictionRegr
with a specified probability distribution to estimate (or 'compose') a distr prediction.
This PipeOp can be instantiated via the
dictionary mlr3pipelines::mlr_pipeops or with the associated sugar
function mlr3pipelines::po():
PipeOpProbregr$new()
mlr_pipeops$get("compose_probregr")
po("compose_probregr")
PipeOpProbregr has two input channels named "input_response" and "input_se",
which take NULL during training and two PredictionRegrs
during prediction. These should contain the response and se predict types
respectively; the same object can be passed to both channels.
The output during prediction is a PredictionRegr with
the "response" from input_response, the "se" from input_se and a "distr"
created from combining the two.
The $state is left empty (list()).
dist :: character(1)
Location-scale distribution to use for composition. Current choices are
"Uniform" (default), "Normal", "Cauchy", "Gumbel", "Laplace",
"Logistic". All implemented via distr6.
The composition is created by substituting the response and se predictions into the
distribution location and scale parameters respectively.
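The substitution can be illustrated in base R for three of the listed families. This sketch uses R's own pnorm(), pcauchy() and plogis() instead of distr6, and the helper compose_cdf() is hypothetical. It assumes that se maps directly to the sd of the Normal and to the scale parameter of the Cauchy and Logistic; Uniform, Gumbel and Laplace are omitted because the exact mapping for them is not stated here.

```r
# Hypothetical helper: builds a vectorised CDF q -> P(Y <= q), one entry per observation,
# by plugging response into the location and se into the scale parameter
compose_cdf = function(response, se, dist = c("Normal", "Cauchy", "Logistic")) {
  dist = match.arg(dist)
  force(response)  # capture the current values inside the returned closure
  force(se)
  switch(dist,
    Normal   = function(q) pnorm(q, mean = response, sd = se),
    Cauchy   = function(q) pcauchy(q, location = response, scale = se),
    Logistic = function(q) plogis(q, location = response, scale = se)
  )
}

response = c(20, 25)  # predicted responses (from "input_response")
se       = c(2, 5)    # predicted standard errors (from "input_se")
cdf_norm = compose_cdf(response, se, "Normal")
cdf_norm(c(22, 30))   # both points lie one se above their location
```

Because all three families are symmetric location-scale families, each composed CDF equals 0.5 at its own response, whatever the se.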
mlr3pipelines::PipeOp -> PipeOpProbregr
new()
Creates a new instance of this R6 class.
PipeOpProbregr$new(id = "compose_probregr", param_vals = list())
id(character(1))
Identifier of the resulting object.
param_vals(list())
List of hyperparameter settings, overwriting the hyperparameter settings that would
otherwise be set during construction.
clone()
The objects of this class are cloneable with this method.
PipeOpProbregr$clone(deep = FALSE)
deep
Whether to make a deep clone.
```r
## Not run:
library(mlr3)
library(mlr3proba)
library(mlr3pipelines)
set.seed(1)
task = tsk("boston_housing")

# Option 1: Use a learner that can predict se
learn = lrn("regr.featureless", predict_type = "se")
pred = learn$train(task)$predict(task)
poc = po("compose_probregr")
poc$train(list(NULL, NULL))
poc$predict(list(pred, pred))[[1]]

# Option 2: Use two learners, one for response and the other for se
learn_response = lrn("regr.rpart")
learn_se = lrn("regr.featureless", predict_type = "se")
pred_response = learn_response$train(task)$predict(task)
pred_se = learn_se$train(task)$predict(task)
poc = po("compose_probregr")
poc$train(list(NULL, NULL))
poc$predict(list(pred_response, pred_se))[[1]]
## End(Not run)
```
Uses a predicted distr in a PredictionSurv to estimate (or 'compose') a crank prediction.
This PipeOp can be instantiated via the
dictionary mlr3pipelines::mlr_pipeops or with the associated sugar
function mlr3pipelines::po():
PipeOpCrankCompositor$new()
mlr_pipeops$get("crankcompose")
po("crankcompose")
PipeOpCrankCompositor has one input channel named "input", which takes NULL during training and PredictionSurv during prediction.
PipeOpCrankCompositor has one output channel named "output", producing NULL during training and a PredictionSurv during prediction.
The output during prediction is the PredictionSurv from the input but with the crank predict type overwritten by the given estimation method.
The $state is left empty (list()).
method :: character(1)
Determines what method should be used to produce a continuous ranking from the distribution.
Currently only mort is supported, which is the sum of the cumulative hazard, also called expected/ensemble mortality, see Ishwaran et al. (2008).
For more details, see get_mortality().
overwrite :: logical(1)
If FALSE (default) and the prediction already has a crank prediction, then the compositor returns the input prediction unchanged.
If TRUE, then the crank will be overwritten.
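As a rough illustration of the "mort" idea, the sketch below computes, for each observation, the sum of the cumulative hazard $H(t) = -\log S(t)$ over a common grid of time points. The helper mortality() is hypothetical and is not get_mortality(); it does not guard against survival probabilities of exactly 0 (which give an infinite cumulative hazard), an edge case the package function may treat differently.

```r
# Hypothetical sketch of expected (ensemble) mortality, not mlr3proba::get_mortality()
# surv: matrix of survival probabilities, rows = observations, columns = sorted time points
mortality = function(surv) {
  cumhaz = -log(surv)   # cumulative hazard H(t) = -log S(t)
  rowSums(cumhaz)       # sum of H(t) over all time points
}

surv = rbind(c(0.90, 0.70, 0.50),   # observation 1: lower survival (higher risk)
             c(0.95, 0.90, 0.80))   # observation 2: higher survival (lower risk)
m = mortality(surv)
m
```

Observation 1 has the lower survival curve and therefore the larger mortality, so it is ranked as the higher-risk observation; this ordering is what makes mortality usable as a crank.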
mlr3pipelines::PipeOp -> PipeOpCrankCompositor
new()
Creates a new instance of this R6 class.
PipeOpCrankCompositor$new(id = "crankcompose", param_vals = list())
id(character(1))
Identifier of the resulting object.
param_vals(list())
List of hyperparameter settings, overwriting the hyperparameter settings that would
otherwise be set during construction.
clone()
The objects of this class are cloneable with this method.
PipeOpCrankCompositor$clone(deep = FALSE)
deep
Whether to make a deep clone.
Sonabend, Raphael, Bender, Andreas, Vollmer, Sebastian (2022). “Avoiding C-hacking when evaluating survival distribution predictions with discrimination measures.” Bioinformatics. ISSN 1367-4803, doi:10.1093/BIOINFORMATICS/BTAC451, https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac451/6640155.
Ishwaran, Hemant, Kogalur, B U, Blackstone, H E, Lauer, S M, others (2008). “Random survival forests.” The Annals of applied statistics, 2(3), 841–860.
Other survival compositors:
mlr_pipeops_compose_breslow_distr,
mlr_pipeops_distrcompose,
mlr_pipeops_responsecompose
```r
## Not run:
library(mlr3)
library(mlr3proba)
library(mlr3pipelines)

task = tsk("rats")

# overwrite the crank predictions of a Cox model
pred = lrn("surv.coxph")$train(task)$predict(task)
poc = po("crankcompose", param_vals = list(overwrite = TRUE))
poc$train(list(NULL)) # need to train first, even if nothing happens
poc$predict(list(pred))[[1L]]
## End(Not run)
```
Estimates (or 'composes') a survival distribution from a predicted baseline
survival distribution (distr) and a linear predictor (lp) from two PredictionSurvs.
Compositor Assumptions:
The baseline distr is a discrete estimator, e.g. surv.kaplan.
The composed distr is of a linear form.
This PipeOp can be instantiated via the
dictionary mlr3pipelines::mlr_pipeops or with the associated sugar
function mlr3pipelines::po():
PipeOpDistrCompositor$new()
mlr_pipeops$get("distrcompose")
po("distrcompose")
PipeOpDistrCompositor has two input channels, "base" and "pred".
Both input channels take NULL during training and PredictionSurv during prediction.
PipeOpDistrCompositor has one output channel named "output", producing
NULL during training and a PredictionSurv during prediction.
The output during prediction is the PredictionSurv from the "pred" input
but with an extra (or overwritten) column for the distr predict type; which
is composed from the distr of "base" and the lp of "pred".
If no lp predictions have been made or exist, then the "pred" is returned unchanged.
The $state is left empty (list()).
The parameters are:
form :: character(1)
Determines the form that the predicted linear survival model should take. This is one of
accelerated failure time ("aft"), proportional hazards ("ph"), or proportional odds ("po").
Default is "aft".
overwrite :: logical(1)
If FALSE (default) then if the "pred" input already has a distr, the compositor does
nothing and returns the given PredictionSurv. If TRUE, then the distr is overwritten
with the distr composed from lp - this is useful for changing the prediction
distr from one model form to another.
scale_lp :: logical(1)
This option is only applicable to form equal to "aft". If TRUE, it
min-max scales the linear prediction scores to be in the interval $[0, 1]$,
avoiding extrapolation of the baseline $S_0(t)$ on the transformed time
points $t / \exp(\eta)$ (with $\eta$ the scaled linear predictor), as these will be in $[t / e, t]$,
and so never larger than the maximum time point for which we have estimated
$S_0(t)$.
Note that this is just a heuristic to get reasonable results in the
case you observe survival predictions to be e.g. constant after the AFT
composition and it definitely provides no guarantee for creating calibrated
distribution predictions (as none of these methods do). Therefore, it is
set to FALSE by default.
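The respective forms above have respective survival distributions:
$$\text{aft:} \quad S(t) = S_0\left(\frac{t}{\exp(\eta)}\right)$$
$$\text{ph:} \quad S(t) = S_0(t)^{\exp(\eta)}$$
$$\text{po:} \quad S(t) = \frac{S_0(t)}{\exp(-\eta) + (1 - \exp(-\eta)) S_0(t)}$$
where $S_0$ is the estimated baseline survival distribution, and $\eta$ is the
predicted linear predictor.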
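The three forms, and the effect of scale_lp, can be sketched in base R for a baseline survival curve given at discrete time points. This is an illustration under stated assumptions, not the PipeOp's code: compose_surv() is a hypothetical helper, the baseline is treated as a right-continuous step function equal to 1 before the first time point, and the composed curves are evaluated on the baseline's own time grid.

```r
# Hypothetical sketch: compose survival curves from a discrete baseline S0 and linear predictors
# S0: baseline survival at `times`; lp: one linear predictor per observation
compose_surv = function(S0, times, lp, form = c("aft", "ph", "po"), scale_lp = FALSE) {
  form = match.arg(form)
  if (form == "aft" && scale_lp) {
    lp = (lp - min(lp)) / (max(lp) - min(lp))   # min-max scaling to [0, 1]
  }
  # right-continuous step function: 1 before times[1], S0[k] on [times[k], times[k + 1])
  S0_fun = stepfun(times, c(1, S0))
  rows = lapply(lp, function(eta) {
    switch(form,
      aft = S0_fun(times / exp(eta)),                    # S0(t / exp(eta))
      ph  = S0^exp(eta),                                 # S0(t)^exp(eta)
      po  = S0 / (exp(-eta) + (1 - exp(-eta)) * S0)      # proportional odds
    )
  })
  out = do.call(rbind, rows)   # rows = observations, columns = times
  colnames(out) = times
  out
}

times = c(1, 3, 5, 7)
S0 = c(0.9, 0.8, 0.6, 0.5)
s_aft = compose_surv(S0, times, lp = c(0, log(2)), form = "aft")
s_ph = compose_surv(S0, times, lp = log(2), form = "ph")
s_po = compose_surv(S0, times, lp = 0, form = "po")
s_aft_scaled = compose_surv(S0, times, lp = c(0, log(2)), form = "aft", scale_lp = TRUE)
```

With lp = 0 every form returns the baseline. A positive lp shifts the AFT curve up (time is decelerated) and the PH curve down. When all lp values are equal, the min-max scaling divides by zero, so the scale_lp branch of this sketch needs at least two distinct lp values.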
For an example use of the "aft" composition using Kaplan-Meier as a baseline
distribution, see Norman et al. (2024).
mlr3pipelines::PipeOp -> PipeOpDistrCompositor
new()
Creates a new instance of this R6 class.
PipeOpDistrCompositor$new(id = "distrcompose", param_vals = list())
id(character(1))
Identifier of the resulting object.
param_vals(list())
List of hyperparameter settings, overwriting the hyperparameter settings that would
otherwise be set during construction.
clone()
The objects of this class are cloneable with this method.
PipeOpDistrCompositor$clone(deep = FALSE)
deep
Whether to make a deep clone.
Norman, A P, Li, Wanlu, Jiang, Wenyu, Chen, E B (2024). “deepAFT: A nonlinear accelerated failure time model with artificial neural network.” Statistics in Medicine. doi:10.1002/sim.10152.
Other survival compositors:
mlr_pipeops_compose_breslow_distr,
mlr_pipeops_crankcompose,
mlr_pipeops_responsecompose
```r
## Not run:
library(mlr3)
library(mlr3proba)
library(mlr3pipelines)

task = tsk("rats")
base = lrn("surv.kaplan")$train(task)$predict(task)
pred = lrn("surv.coxph")$train(task)$predict(task)

# change the distribution prediction of Cox (Breslow-based) to an AFT form
pod = po("distrcompose", param_vals = list(form = "aft", overwrite = TRUE))
pod$train(list(NULL, NULL)) # need to train first, even if nothing happens
pod$predict(list(base = base, pred = pred))[[1]]
## End(Not run)
```
Uses a predicted survival distribution (distr) in a PredictionSurv to estimate (or 'compose') an expected survival time (response) prediction.
Practically, this PipeOp summarizes an observation's survival curve/distribution to a single number which can be either the restricted mean survival time or the median survival time.
This PipeOp can be instantiated via the
dictionary mlr3pipelines::mlr_pipeops or with the associated sugar
function mlr3pipelines::po():
PipeOpResponseCompositor$new()
mlr_pipeops$get("responsecompose")
po("responsecompose")
PipeOpResponseCompositor has one input channel named "input", which takes
NULL during training and PredictionSurv during prediction.
PipeOpResponseCompositor has one output channel named "output", producing NULL during training
and a PredictionSurv during prediction.
The output during prediction is the PredictionSurv from the input but with the response
predict type overwritten by the given method.
The $state is left empty (list()).
method :: character(1)
Determines what method should be used to produce a survival time (response) from the survival distribution.
Available methods are "rmst" and "median", corresponding to the restricted mean survival time and the median survival time respectively.
tau :: numeric(1)
Determines the time point up to which we calculate the restricted mean survival time (works only for the "rmst" method).
If NULL (default), all the available time points in the predicted survival distribution will be used.
add_crank :: logical(1)
If TRUE then crank predict type will be set as -response (as higher survival times correspond to lower risk).
Works only if overwrite is TRUE.
overwrite :: logical(1)
If FALSE (default) and the prediction already has a response prediction, then the compositor returns the input prediction unchanged.
If TRUE, then the response (and the crank, if add_crank is TRUE) will be overwritten.
The restricted mean survival time is the default/preferred method and is calculated as follows:
$$T_{i,\text{rmst}} \approx \sum_{t_j \in [0, \tau]} (t_j - t_{j-1}) S_i(t_j)$$
where $T$ is the expected survival time, $\tau$ is the time cutoff/horizon and $S_i(t_j)$ are the predicted survival probabilities of observation $i$ for all the $t_j$ time points (with $t_0 = 0$).
The median survival time is the first time point for which the survival probability is less than $0.5$.
If no such time point exists (e.g. when the survival distribution is not proper due to high censoring), the last time point is returned.
This is not a good estimate to use in general, only a reasonable substitute in such cases.
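Both summaries can be sketched in base R for a matrix of survival probabilities on a common time grid. The helpers rmst() and median_time() are hypothetical: rmst() computes $\sum_j (t_j - t_{j-1}) S_i(t_j)$ over the time points up to tau, with $t_0 = 0$, and median_time() falls back to the last time point when the curve never drops below 0.5, as described above.

```r
# Hypothetical helpers; surv: rows = observations, columns = sorted time points in `times`
rmst = function(surv, times, tau = NULL) {
  if (!is.null(tau)) {
    keep = times <= tau                     # restrict to time points in [0, tau]
    surv = surv[, keep, drop = FALSE]
    times = times[keep]
  }
  dt = diff(c(0, times))                    # t_j - t_{j-1}, with t_0 = 0
  as.numeric(surv %*% dt)                   # sum_j (t_j - t_{j-1}) * S_i(t_j)
}

median_time = function(surv, times) {
  apply(surv, 1, function(s) {
    below = which(s < 0.5)                  # time points with S(t) < 0.5
    if (length(below) == 0) times[length(times)] else times[below[1]]
  })
}

times = c(1, 2, 4)
surv = rbind(c(0.90, 0.40, 0.30),   # drops below 0.5 at t = 2
             c(0.95, 0.90, 0.80))   # never drops below 0.5
rmst(surv, times)            # 1.9 and 3.45
rmst(surv, times, tau = 2)   # 1.3 and 1.85
median_time(surv, times)     # 2, and 4 (the last time point) as fallback
```

The second observation shows why the median fallback is only a substitute: its curve is improper on this grid, so its "median" is simply the horizon.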
mlr3pipelines::PipeOp -> PipeOpResponseCompositor
new()
Creates a new instance of this R6 class.
PipeOpResponseCompositor$new(id = "responsecompose", param_vals = list())
id(character(1))
Identifier of the resulting object.
param_vals(list())
List of hyperparameter settings, overwriting the hyperparameter settings that would
otherwise be set during construction.
clone()
The objects of this class are cloneable with this method.
PipeOpResponseCompositor$clone(deep = FALSE)
deep
Whether to make a deep clone.
Zhao, Lihui, Claggett, Brian, Tian, Lu, Uno, Hajime, Pfeffer, A. M, Solomon, D. S, Trippa, Lorenzo, Wei, J. L (2016). “On the restricted mean survival time curve in survival analysis.” Biometrics, 72(1), 215–221. ISSN 1541-0420, doi:10.1111/BIOM.12384, https://onlinelibrary.wiley.com/doi/full/10.1111/biom.12384.
Other survival compositors:
mlr_pipeops_compose_breslow_distr,
mlr_pipeops_crankcompose,
mlr_pipeops_distrcompose
```r
## Not run:
library(mlr3)
library(mlr3proba)
library(mlr3pipelines)

task = tsk("rats")

# add survival time prediction type to the predictions of a Cox model
# Median survival time as response
pred = lrn("surv.coxph")$train(task)$predict(task)
por = po("responsecompose", param_vals = list(method = "median", overwrite = TRUE))
por$train(list(NULL)) # need to train first, even if nothing happens
por$predict(list(pred))[[1L]]
# for mostly improper survival distributions, "median" sets the survival time
# to the last time point

# RMST (default) as response, while also changing the `crank` to `-response`
por = po("responsecompose", param_vals = list(overwrite = TRUE, add_crank = TRUE))
por$train(list(NULL))
por$predict(list(pred))[[1L]]
## End(Not run)
```
Perform (weighted) prediction averaging from survival PredictionSurvs by connecting
PipeOpSurvAvg to multiple PipeOpLearner outputs.
The resulting prediction will aggregate any predict types that are contained within all inputs.
Any predict types missing from at least one input will be set to NULL. These are aggregated
as follows:
"response", "crank", and "lp" are all a weighted average from the incoming predictions.
"distr" is a distr6::VectorDistribution containing distr6::MixtureDistributions.
Weights can be set as a parameter; if none are provided, equal weights are used for each prediction.
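The weighted averaging can be sketched in base R. The helper avg_predictions() is hypothetical, and normalising the weights to sum to one is an assumption of this sketch. Applied to survival matrices on a shared time grid, the same weighted sum gives the survival function of the corresponding mixture distribution, since a mixture's CDF (and hence its survival function) is the weighted sum of its components' CDFs.

```r
# Hypothetical sketch of weighted prediction averaging (not mlr3pipelines code)
avg_predictions = function(preds, weights = rep(1, length(preds))) {
  w = weights / sum(weights)                          # normalise weights to sum to 1
  Reduce(`+`, Map(function(p, wk) wk * p, preds, w))  # element-wise weighted sum
}

# crank predictions of two models for the same two observations
crank_avg = avg_predictions(list(c(1.2, -0.3), c(0.5, 0.1)), weights = c(0.2, 0.8))

# survival matrices (rows = observations, columns = shared time points)
S1 = rbind(c(0.9, 0.7), c(0.8, 0.6))
S2 = rbind(c(0.7, 0.5), c(0.6, 0.2))
S_mix = avg_predictions(list(S1, S2))   # equal weights by default
```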
Input and output channels are inherited from PipeOpEnsemble with a PredictionSurv for inputs and outputs.
The $state is left empty (list()).
The parameters are the parameters inherited from the PipeOpEnsemble.
Inherits from PipeOpEnsemble by implementing the
private$weighted_avg_predictions() method.
mlr3pipelines::PipeOp -> mlr3pipelines::PipeOpEnsemble -> PipeOpSurvAvg
new()
Creates a new instance of this R6 class.
PipeOpSurvAvg$new(innum = 0, id = "survavg", param_vals = list(), ...)
innum(numeric(1))
Determines the number of input channels.
If innum is 0 (default), a vararg input channel is created that can take an arbitrary
number of inputs.
id(character(1))
Identifier of the resulting object.
param_vals(list())
List of hyperparameter settings, overwriting the hyperparameter settings that would
otherwise be set during construction.
...(ANY)
Additional arguments passed to mlr3pipelines::PipeOpEnsemble.
clone()
The objects of this class are cloneable with this method.
PipeOpSurvAvg$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other PipeOps:
mlr_pipeops_trafopred_regrsurv_pem,
mlr_pipeops_trafotask_survregr_pem
```r
## Not run:
library(mlr3)
library(mlr3proba)
library(mlr3pipelines)

task = tsk("rats")
p1 = lrn("surv.coxph")$train(task)$predict(task)
p2 = lrn("surv.kaplan")$train(task)$predict(task)
poc = po("survavg", param_vals = list(weights = c(0.2, 0.8)))
poc$train(list(NULL)) # need to train first, even if nothing happens
poc$predict(list(p1, p2))
## End(Not run)
```
Transform PredictionClassif to PredictionSurv by converting event probabilities of a pseudo status variable (discrete time hazards) to survival probabilities using the product rule (Tutz et al. 2016):
$$S_k = P(T > t_k) = \prod_{j=1}^{k} (1 - h_j)$$
Where:
We assume that continuous time is divided into time intervals $[0, t_1), [t_1, t_2), \ldots$
$S_k$ is the survival probability at time $t_k$
$h_k$ is the discrete-time hazard (classifier prediction), i.e. the
conditional probability for an event in the $k$-th interval.
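The product rule can be sketched in base R for a matrix of predicted discrete-time hazards. The helper hazards_to_surv() is hypothetical and assumes one column per time interval, in temporal order.

```r
# Hypothetical helper; h: rows = observations, columns = time intervals in temporal order
hazards_to_surv = function(h) {
  surv = 1 - h                                    # S_1 = 1 - h_1
  if (ncol(h) > 1) {
    for (k in 2:ncol(h)) {
      surv[, k] = surv[, k - 1] * (1 - h[, k])    # S_k = S_{k-1} * (1 - h_k)
    }
  }
  surv
}

h = rbind(c(0.10, 0.20, 0.50),
          c(0.05, 0.05, 0.10))
S = hazards_to_surv(h)
S   # first row: 0.90, 0.72, 0.36
```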
This PipeOp can be instantiated via the
dictionary mlr3pipelines::mlr_pipeops
or with the associated sugar function mlr3pipelines::po():
PipeOpPredClassifSurvDiscTime$new()
mlr_pipeops$get("trafopred_classifsurv_disctime")
po("trafopred_classifsurv_disctime")
The input is a PredictionClassif, produced by an mlr3::LearnerClassif, and a data.table with the transformed data, generated by PipeOpTaskSurvClassifDiscTime. The output is the input PredictionClassif transformed to a PredictionSurv. Only works during the prediction phase.
mlr3pipelines::PipeOp -> PipeOpPredClassifSurvDiscTime
predict_type(character(1))
Returns the active predict type of this PipeOp, which is "crank".
new()
Creates a new instance of this R6 class.
PipeOpPredClassifSurvDiscTime$new(id = "trafopred_classifsurv_disctime")
id(character(1))
Identifier of the resulting object.
clone()
The objects of this class are cloneable with this method.
PipeOpPredClassifSurvDiscTime$clone(deep = FALSE)
deep
Whether to make a deep clone.
Tutz, Gerhard, Schmid, Matthias (2016). Modeling Discrete Time-to-Event Data, series Springer Series in Statistics. Springer International Publishing. ISBN 978-3-319-28156-8 978-3-319-28158-2, http://link.springer.com/10.1007/978-3-319-28158-2.
pipeline_survtoclassif_disctime
Other Transformation PipeOps:
mlr_pipeops_trafopred_classifsurv_IPCW,
mlr_pipeops_trafopred_regrsurv_pem,
mlr_pipeops_trafotask_survclassif_IPCW,
mlr_pipeops_trafotask_survclassif_disctime,
mlr_pipeops_trafotask_survregr_pem
Transform PredictionClassif to PredictionSurv using the Inverse Probability of Censoring Weights (IPCW) method by Vock et al. (2016).
This PipeOp can be instantiated via the
dictionary mlr3pipelines::mlr_pipeops
or with the associated sugar function mlr3pipelines::po():
PipeOpPredClassifSurvIPCW$new()
mlr_pipeops$get("trafopred_classifsurv_IPCW")
po("trafopred_classifsurv_IPCW")
The input is a PredictionClassif and a data.table containing observed times, censoring indicators and row ids, all generated by PipeOpTaskSurvClassifIPCW during the prediction phase.
The output is the input PredictionClassif transformed
to a PredictionSurv.
Each input classification probability prediction corresponds to the
probability of having the event up to the specified cutoff time
$\tau$, i.e. $\hat{\pi}(\mathbf{X}_i) = P(T_i \le \tau \mid \mathbf{X}_i)$,
see Vock et al. (2016) and PipeOpTaskSurvClassifIPCW.
Therefore, these predictions serve as continuous risk scores that can be
directly interpreted as crank predictions in the right-censored survival
setting. We also map them to the survival distribution prediction distr
at the specified cutoff time point $\tau$, i.e. as
$S_i(\tau) = 1 - \hat{\pi}(\mathbf{X}_i)$.
Survival measures that use the survival distribution (e.g. ISBS)
should be evaluated exactly at the cutoff time point $\tau$.
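The mapping can be illustrated with a few numbers in base R. The probabilities and the cutoff tau below are made up for the example, and the one-column matrix merely mimics a survival prediction that is defined only at tau; it is not the PredictionSurv object the PipeOp returns.

```r
pi_hat = c(0.15, 0.60, 0.35)   # hypothetical classifier probabilities P(T <= tau | X)
tau = 365                      # hypothetical cutoff time

crank = pi_hat                 # higher event probability = higher risk
# survival "distribution" known only at tau: S(tau) = 1 - pi_hat
surv = matrix(1 - pi_hat, ncol = 1, dimnames = list(NULL, as.character(tau)))
surv
```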
mlr3pipelines::PipeOp -> PipeOpPredClassifSurvIPCW
predict_type(character(1))
Returns the active predict type of this PipeOp, which is "crank".
new()
Creates a new instance of this R6 class.
PipeOpPredClassifSurvIPCW$new(id = "trafopred_classifsurv_IPCW")
id(character(1))
Identifier of the resulting object.
clone()
The objects of this class are cloneable with this method.
PipeOpPredClassifSurvIPCW$clone(deep = FALSE)
deep
Whether to make a deep clone.
Vock, M D, Wolfson, Julian, Bandyopadhyay, Sunayan, Adomavicius, Gediminas, Johnson, E P, Vazquez-Benitez, Gabriela, O'Connor, J P (2016). “Adapting machine learning techniques to censored time-to-event health record data: A general-purpose approach using inverse probability of censoring weighting.” Journal of Biomedical Informatics, 61, 119–131. doi:10.1016/j.jbi.2016.03.009, https://www.sciencedirect.com/science/article/pii/S1532046416000496.
Other Transformation PipeOps:
mlr_pipeops_trafopred_classifsurv_disctime,
mlr_pipeops_trafopred_regrsurv_pem,
mlr_pipeops_trafotask_survclassif_IPCW,
mlr_pipeops_trafotask_survclassif_disctime,
mlr_pipeops_trafotask_survregr_pem
Transform PredictionRegr to PredictionSurv. The predicted piece-wise constant hazards contained in PredictionRegr are transformed into survival probabilities and wrapped in a PredictionSurv object.
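We compute the survival probability from the predicted hazards using the following relation:
$$S(t \mid \mathbf{x}) = \exp\left(-\int_0^t \lambda(s \mid \mathbf{x})\,ds\right) = \exp\left(-\sum_{j=1}^{J} \lambda(j \mid \mathbf{x})\, d_j\right),$$
where $t$ is the time (taken as the end point of interval $J$), $j \in \{1, \dots, J\}$ denotes the interval, and $d_j$ the duration of interval $j$.
For a more detailed description of PEM, refer to pipeline_survtoregr_pem or the referenced article (Bender et al. 2018).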
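The relation above can be checked numerically. The sketch below is plain R under assumed inputs: hypothetical interval end points `tend` and a matrix of predicted piece-wise constant hazards (one row per observation, one column per interval). It multiplies each hazard by the interval duration $d_j$, accumulates over intervals and exponentiates; it illustrates the formula rather than reproducing PipeOpPredRegrSurvPEM.

```r
# Hypothetical interval end points ("tend") and predicted hazards:
# one row per observation, one column per interval
tend = c(100, 200, 365)
hazard = matrix(c(0.001, 0.002, 0.0015,
                  0.003, 0.001, 0.0020),
                nrow = 2, byrow = TRUE)

# Duration d_j of each interval; the first interval starts at 0
d = diff(c(0, tend))

# Cumulative hazard at each interval end: running sum of lambda_j * d_j
cumhaz = t(apply(sweep(hazard, 2, d, `*`), 1, cumsum))

# Survival probabilities S(tend | x) = exp(-cumulative hazard)
surv = exp(-cumhaz)
colnames(surv) = tend
print(round(surv, 4))
```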
This PipeOp can be instantiated via the
dictionary mlr3pipelines::mlr_pipeops
or with the associated sugar function mlr3pipelines::po():
PipeOpPredRegrSurvPEM$new()
mlr_pipeops$get("trafopred_regrsurv_pem")
po("trafopred_regrsurv_pem")
The input consists of a PredictionRegr and a data.table containing the transformed data. The PredictionRegr is provided by the mlr3::LearnerRegr, while the data.table is generated by PipeOpTaskSurvRegrPEM. The output is the input PredictionRegr transformed to a PredictionSurv. The transformation is only performed during the prediction phase.
mlr3pipelines::PipeOp -> PipeOpPredRegrSurvPEM
predict_type(character(1))
Returns the active predict type of this PipeOp, which is "crank".
new()
Creates a new instance of this R6 class.
PipeOpPredRegrSurvPEM$new(id = "trafopred_regrsurv_pem")
id(character(1))
Identifier of the resulting object.
clone()
The objects of this class are cloneable with this method.
PipeOpPredRegrSurvPEM$clone(deep = FALSE)
deep: Whether to make a deep clone.
Bender, Andreas, Groll, Andreas, Scheipl, Fabian (2018). “A generalized additive model approach to time-to-event analysis.” Statistical Modelling, 18(3-4), 299–321. https://doi.org/10.1177/1471082X17748083.
Other PipeOps:
mlr_pipeops_survavg,
mlr_pipeops_trafotask_survregr_pem
Other Transformation PipeOps:
mlr_pipeops_trafopred_classifsurv_IPCW,
mlr_pipeops_trafopred_classifsurv_disctime,
mlr_pipeops_trafotask_survclassif_IPCW,
mlr_pipeops_trafotask_survclassif_disctime,
mlr_pipeops_trafotask_survregr_pem
Transform TaskSurv to TaskClassif by dividing continuous
time into multiple time intervals for each observation.
This transformation creates a new target variable disc_status that indicates
whether an event occurred within each time interval.
This approach facilitates survival analysis within a classification framework
using discrete time intervals (Tutz et al. 2016).
Note that this data transformation is compatible with learners that support
the "validation" property and can track performance on holdout data during
training, enabling early stopping and logging.
This PipeOp can be instantiated via the
dictionary mlr3pipelines::mlr_pipeops
or with the associated sugar function mlr3pipelines::po():
PipeOpTaskSurvClassifDiscTime$new()
mlr_pipeops$get("trafotask_survclassif_disctime")
po("trafotask_survclassif_disctime")
PipeOpTaskSurvClassifDiscTime has one input channel named "input", and two output channels, one named "output" and the other "transformed_data".
During training, the "output" is the "input" TaskSurv transformed to a
TaskClassif.
The target column is named "disc_status" and indicates whether an event occurred
in each time interval.
An additional numeric feature named "tend" contains the end time point of each interval.
Lastly, the "output" task has a column with the original observation ids,
under the role "original_ids".
The "transformed_data" is an empty data.table.
During prediction, the "input" TaskSurv is transformed to the "output"
TaskClassif with "disc_status" as target and the "tend"
feature included.
The "transformed_data" is a data.table with the columns "disc_status"
(the target of the "output" task), "id" (original observation ids),
"obs_times" (observed times per "id") and "tend" (end time of each interval).
This "transformed_data" is only meant to be used with the PipeOpPredClassifSurvDiscTime.
The $state contains information about the cut parameter used.
The parameters are
cut :: numeric()
Split points, used to partition the data into intervals based on the time column.
If unspecified, all unique event times will be used.
If cut is a single integer, it will be interpreted as the number of equidistant
intervals from 0 until the maximum event time.
max_time :: numeric(1)
If cut is unspecified, this will be the last possible event time.
All event times after max_time will be administratively censored at max_time.
Needs to be greater than the minimum event time in the given task.
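To make the long-format ("person-period") layout concrete, here is a hypothetical base-R function `expand_disctime()` (not part of mlr3proba) that splits toy survival data at given cut points. It assumes intervals $(0, c_1], (c_1, c_2], \dots$, keeps every interval an observation enters, and sets `disc_status` to 1 only in the interval containing an observed event; the PipeOp's exact handling of censoring within an interval may differ.

```r
# Hypothetical toy survival data (positive observed times assumed)
surv_data = data.frame(id = 1:3, time = c(50, 120, 300), status = c(1, 0, 1))
cuts = c(100, 200, 300)  # interval end points, i.e. the values of "tend"

expand_disctime = function(data, cuts) {
  starts = c(0, head(cuts, -1))  # interval start points
  rows = lapply(seq_len(nrow(data)), function(i) {
    # keep every interval that starts before the observed time
    k = which(starts < data$time[i])
    tend = cuts[k]
    # 1 only in the interval in which an observed event falls;
    # observations beyond the last cut end up censored there
    disc_status = as.integer(data$status[i] == 1 & data$time[i] <= tend)
    data.frame(id = data$id[i], tend = tend, disc_status = disc_status,
               obs_times = data$time[i])
  })
  do.call(rbind, rows)
}

long = expand_disctime(surv_data, cuts)
print(long)
```

Any binary classifier trained on `disc_status` with `tend` as a feature then estimates the discrete-time hazard of each interval.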
mlr3pipelines::PipeOp -> PipeOpTaskSurvClassifDiscTime
new()
Creates a new instance of this R6 class.
PipeOpTaskSurvClassifDiscTime$new(id = "trafotask_survclassif_disctime")
id(character(1))
Identifier of the resulting object.
clone()
The objects of this class are cloneable with this method.
PipeOpTaskSurvClassifDiscTime$clone(deep = FALSE)
deep: Whether to make a deep clone.
Tutz, Gerhard, Schmid, Matthias (2016). Modeling Discrete Time-to-Event Data, series Springer Series in Statistics. Springer International Publishing. ISBN 978-3-319-28156-8 978-3-319-28158-2, http://link.springer.com/10.1007/978-3-319-28158-2.
pipeline_survtoclassif_disctime
Other Transformation PipeOps:
mlr_pipeops_trafopred_classifsurv_IPCW,
mlr_pipeops_trafopred_classifsurv_disctime,
mlr_pipeops_trafopred_regrsurv_pem,
mlr_pipeops_trafotask_survclassif_IPCW,
mlr_pipeops_trafotask_survregr_pem
## Not run:
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(mlr3proba)

task = tsk("lung")

# transform the survival task to a classification task
# all unique event times are used as cutpoints
po_disc = po("trafotask_survclassif_disctime")
task_classif = po_disc$train(list(task))[[1L]]

# the end time points of the discrete time intervals
unique(task_classif$data(cols = "tend"))[[1L]]

# train a classification learner
learner = lrn("classif.log_reg", predict_type = "prob")
learner$train(task_classif)
## End(Not run)
Transform TaskSurv to TaskClassif using the Inverse Probability of Censoring Weights (IPCW) method by Vock et al. (2016).
Let $T_i$ be the observed times (event or censoring) and $\delta_i$
the censoring indicators for each observation $i$ in the training set.
The IPCW technique consists of two steps: first, we estimate the censoring
distribution $\hat{G}(t)$ using the Kaplan-Meier estimator on the
training data. Then we calculate the observation weights given a cutoff time
$\tau$ as:
$$\omega_i = 1 / \hat{G}(\min(T_i, \tau))$$
Observations that are censored prior to $\tau$ are assigned zero weights, i.e.
$\omega_i = 0$.
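The two IPCW steps can be sketched in base R as follows. `km_censoring()` and `ipcw_weights()` are hypothetical helpers, not mlr3proba functions: the first computes the Kaplan-Meier estimate $\hat{G}(t)$ of the censoring distribution by treating censorings as events, the second evaluates $\hat{G}$ at $\min(T_i, \tau)$ and zeroes the weights of observations censored before $\tau$. Flooring $\hat{G}$ at `eps` is an assumed rule for near-zero probabilities, and tied event and censoring times are not treated specially.

```r
# Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t):
# the roles are swapped, so status == 0 (censoring) counts as the "event"
km_censoring = function(time, status) {
  ut = sort(unique(time[status == 0]))
  if (length(ut) == 0) return(function(t) rep(1, length(t)))  # no censoring: G(t) = 1
  n_risk = vapply(ut, function(u) sum(time >= u), integer(1))
  n_cens = vapply(ut, function(u) sum(time == u & status == 0), integer(1))
  # right-continuous step function with G(t) = 1 before the first censoring
  stepfun(ut, c(1, cumprod(1 - n_cens / n_risk)))
}

ipcw_weights = function(time, status, tau, eps = 1e-3) {
  G = km_censoring(time, status)
  # evaluate G at min(T_i, tau); flooring at eps (assumed rule) avoids 1 / 0
  g = pmax(G(pmin(time, tau)), eps)
  w = 1 / g
  # censored before tau: event status at tau is unknown, so the weight is 0
  w[time < tau & status == 0] = 0
  w
}

# Hypothetical toy training data
time = c(2, 3, 4, 5, 7, 8)
status = c(1, 0, 1, 0, 1, 1)
tau = 6
w = ipcw_weights(time, status, tau)
# binary classification target: did the event happen by tau?
target = as.integer(time <= tau & status == 1)
print(data.frame(time, status, target, w))
```

Observations whose event or censoring happens after $\tau$ keep a positive weight $1/\hat{G}(\tau)$ and enter the classification task with target 0.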
This PipeOp can be instantiated via the
dictionary mlr3pipelines::mlr_pipeops
or with the associated sugar function mlr3pipelines::po():
PipeOpTaskSurvClassifIPCW$new()
mlr_pipeops$get("trafotask_survclassif_IPCW")
po("trafotask_survclassif_IPCW")
PipeOpTaskSurvClassifIPCW has one input channel named "input", and two output channels, one named "output" and the other "data".
Training transforms the "input" TaskSurv to a TaskClassif,
which is the "output".
The target column is named "status" and indicates whether an event occurred
before the cutoff time (1 = yes, 0 = no).
The observed times column is removed from the "output" task.
The transformed task has the property "weights_learner" (the weights $\omega_i$).
The "data" is NULL.
During prediction, the "input" TaskSurv is transformed to the "output"
TaskClassif with "status" as target (again indicating
if the event occurred before the cutoff time).
The "data" is a data.table containing the observed times and
censoring indicators/status of each subject as well as the corresponding
row_ids.
This "data" is only meant to be used with the PipeOpPredClassifSurvIPCW.
The parameters are
tau :: numeric()
Predefined time point $\tau$ for IPCW. Observations with time larger than $\tau$ are censored.
Must be less than or equal to the maximum event time.
eps :: numeric()
Small value used to replace zero censoring probabilities ($\hat{G}(t) = 0$) to prevent
infinite weights (a warning is triggered if this happens).
mlr3pipelines::PipeOp -> PipeOpTaskSurvClassifIPCW
new()
Creates a new instance of this R6 class.
PipeOpTaskSurvClassifIPCW$new(id = "trafotask_survclassif_IPCW")
id(character(1))
Identifier of the resulting object.
clone()
The objects of this class are cloneable with this method.
PipeOpTaskSurvClassifIPCW$clone(deep = FALSE)
deep: Whether to make a deep clone.
Vock, M D, Wolfson, Julian, Bandyopadhyay, Sunayan, Adomavicius, Gediminas, Johnson, E P, Vazquez-Benitez, Gabriela, O'Connor, J P (2016). “Adapting machine learning techniques to censored time-to-event health record data: A general-purpose approach using inverse probability of censoring weighting.” Journal of Biomedical Informatics, 61, 119–131. doi:10.1016/j.jbi.2016.03.009, https://www.sciencedirect.com/science/article/pii/S1532046416000496.
Other Transformation PipeOps:
mlr_pipeops_trafopred_classifsurv_IPCW,
mlr_pipeops_trafopred_classifsurv_disctime,
mlr_pipeops_trafopred_regrsurv_pem,
mlr_pipeops_trafotask_survclassif_disctime,
mlr_pipeops_trafotask_survregr_pem
## Not run:
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(mlr3proba)

task = tsk("lung")

# split task to train and test subtasks
part = partition(task)
task_train = task$clone()$filter(part$train)
task_test = task$clone()$filter(part$test)

# define IPCW pipeop
po_ipcw = po("trafotask_survclassif_IPCW", tau = 365)

# during training, output is a classification task with weights
task_classif_train = po_ipcw$train(list(task_train))[[1]]
task_classif_train

# during prediction, output is a classification task (no weights)
task_classif_test = po_ipcw$predict(list(task_test))[[1]]
task_classif_test

# train classif learner on the train task with weights
learner = lrn("classif.rpart", predict_type = "prob")
learner$train(task_classif_train)

# predict using the test output task
p = learner$predict(task_classif_test)

# use classif measures for evaluation
p$confusion
p$score()
p$score(msr("classif.auc"))
## End(Not run)
Transform TaskSurv to TaskRegr by dividing continuous
time into multiple time intervals for each observation.
This transformation creates a new target variable pem_status that indicates
whether an event occurred within each time interval.
The piece-wise exponential modeling approach (PEM) facilitates survival analysis
within a regression framework using discrete time intervals (Bender et al. 2018).
Note that this data transformation is compatible with learners that support
the "validation" property and can track performance on holdout data during
training, enabling early stopping and logging.
This PipeOp can be instantiated via the
dictionary mlr3pipelines::mlr_pipeops
or with the associated sugar function mlr3pipelines::po():
PipeOpTaskSurvRegrPEM$new()
mlr_pipeops$get("trafotask_survregr_pem")
po("trafotask_survregr_pem")
PipeOpTaskSurvRegrPEM has one input channel named "input", and two output channels, one named "output" and the other "transformed_data".
During training, the "output" is the "input" TaskSurv transformed to a
TaskRegr.
The target column is named "pem_status" and indicates whether an event occurred
in each time interval.
An additional numeric feature named "tend" contains the end time point of each interval.
Lastly, the "output" task has an offset column "offset".
The offset (based on the exposure) is the logarithm of the time spent in interval $j$, i.e. $\log(t_j)$.
The "transformed_data" is an empty data.table.
During prediction, the "input" TaskSurv is transformed to the "output"
TaskRegr with "pem_status" as target, "tend" included as a feature,
and the "offset" column, which is assigned the "offset" column role (col_role).
The "transformed_data" is a data.table with the columns "pem_status"
(the target of the "output" task), "id" (original observation ids),
"obs_times" (observed times per "id") and "tend" (end time of each interval).
This "transformed_data" is only meant to be used with the PipeOpPredRegrSurvPEM.
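A hypothetical base-R helper `expand_pem()` (not the PipeOp's implementation) shows how the exposure and the offset arise: each observation contributes one row per interval it enters, the exposure is the full interval width except in the last interval, where it is cut at the observed time, and `offset` is its logarithm. The toy data and cut points are assumptions.

```r
# Hypothetical toy survival data and interval end points ("tend")
surv_data = data.frame(id = 1:2, time = c(150, 320), status = c(1, 0))
cuts = c(100, 200, 365)

expand_pem = function(data, cuts) {
  starts = c(0, head(cuts, -1))  # interval start points
  rows = lapply(seq_len(nrow(data)), function(i) {
    k = which(starts < data$time[i])  # intervals entered by observation i
    tend = cuts[k]
    # time at risk in each interval: full width, except in the last one
    exposure = pmin(data$time[i], tend) - starts[k]
    pem_status = as.integer(data$status[i] == 1 & data$time[i] <= tend)
    data.frame(id = data$id[i], tend = tend, pem_status = pem_status,
               offset = log(exposure))
  })
  do.call(rbind, rows)
}

ped = expand_pem(surv_data, cuts)
print(ped)
```

With this layout, a Poisson model such as `glm(pem_status ~ factor(tend), family = poisson(), offset = offset, data = ped)` estimates one piece-wise constant log-hazard per interval; on real data the features would enter the formula as well.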
The $state contains information about the cut parameter used.
The parameters are
cut :: numeric()
Split points, used to partition the data into intervals based on the time column.
If unspecified, all unique event times will be used.
If cut is a single integer, it will be interpreted as the number of equidistant
intervals from 0 until the maximum event time.
max_time :: numeric(1)
If cut is unspecified, this will be the last possible event time.
All event times after max_time will be administratively censored at max_time.
Needs to be greater than the minimum event time in the given task.
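The rules for `cut` and `max_time` listed above can be sketched as a hypothetical helper `resolve_cuts()`. This is an illustration under assumptions, not mlr3proba code: in particular, whether `max_time` is itself added as a cut point is not stated here, so the sketch only applies it as administrative censoring.

```r
resolve_cuts = function(time, status, cut = NULL, max_time = NULL) {
  if (!is.null(max_time)) {
    # administrative censoring: events after max_time become censored at max_time
    status[time > max_time] = 0
    time = pmin(time, max_time)
  }
  event_times = time[status == 1]
  if (is.null(cut)) {
    # default: every unique event time is a cut point
    sort(unique(event_times))
  } else if (length(cut) == 1L) {
    # a single integer: that many equidistant intervals from 0 to the max event time
    seq(0, max(event_times), length.out = cut + 1)[-1]
  } else {
    # explicit split points
    sort(cut)
  }
}

# Hypothetical toy data
time = c(5, 12, 20, 31, 40)
status = c(1, 1, 0, 1, 1)
print(resolve_cuts(time, status))                # 5 12 31 40
print(resolve_cuts(time, status, cut = 4))       # 10 20 30 40
print(resolve_cuts(time, status, max_time = 30)) # 5 12
```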
mlr3pipelines::PipeOp -> PipeOpTaskSurvRegrPEM
new()
Creates a new instance of this R6 class.
PipeOpTaskSurvRegrPEM$new(id = "trafotask_survregr_pem")
id(character(1))
Identifier of the resulting object.
clone()
The objects of this class are cloneable with this method.
PipeOpTaskSurvRegrPEM$clone(deep = FALSE)
deep: Whether to make a deep clone.
Bender, Andreas, Groll, Andreas, Scheipl, Fabian (2018). “A generalized additive model approach to time-to-event analysis.” Statistical Modelling, 18(3-4), 299–321. https://doi.org/10.1177/1471082X17748083.
Other PipeOps:
mlr_pipeops_survavg,
mlr_pipeops_trafopred_regrsurv_pem
Other Transformation PipeOps:
mlr_pipeops_trafopred_classifsurv_IPCW,
mlr_pipeops_trafopred_classifsurv_disctime,
mlr_pipeops_trafopred_regrsurv_pem,
mlr_pipeops_trafotask_survclassif_IPCW,
mlr_pipeops_trafotask_survclassif_disctime
## Not run:
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(mlr3proba)

task = tsk("lung")

# transform the survival task to a regression task
# all unique event times are used as cutpoints
po_pem = po("trafotask_survregr_pem")
task_regr = po_pem$train(list(task))[[1L]]

# the end time points of the discrete time intervals
unique(task_regr$data(cols = "tend")[[1L]])

# train a regression learner that supports poisson regression
# e.g. regr.gam (from mlr3extralearners); won't run unless the learner
# can accept the offset column role
learner = lrn("regr.gam", formula = pem_status ~ s(age) + s(tend),
  family = "poisson")
learner$train(task_regr)

# e.g. regr.xgboost, note the prior data processing steps
learner = as_learner(
  po("modelmatrix", formula = ~ as.factor(tend) + .) %>>%
  lrn("regr.xgboost", objective = "count:poisson", nrounds = 100, eta = 0.1)
)
learner$train(task_regr)
## End(Not run)
A TaskGenerator calling coxed::sim.survdata().
This generator creates a survival dataset using coxed, and exposes
some parameters from the coxed::sim.survdata() function.
The parameters X (user-specified variables), covariate,
low, high, compare, beta and hazard.fun are not exposed by this generator.
Since hazard.fun is excluded, no user-specified hazard function can be used, and the
generated datasets always use the package's flexible-hazard method.
This TaskGenerator can be instantiated via the dictionary mlr_task_generators or with the associated sugar function tgen():
mlr_task_generators$get("coxed")
tgen("coxed")
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| T | numeric | 100 | | |
| type | character | none | none, tvbeta | - |
| knots | integer | 8 | | |
| spline | logical | TRUE | TRUE, FALSE | - |
| xvars | integer | 3 | | |
| mu | untyped | 0 | - | |
| sd | untyped | 0.5 | - | |
| censor | numeric | 0.1 | | |
| censor.cond | logical | FALSE | TRUE, FALSE | - |
mlr3::TaskGenerator -> TaskGeneratorCoxed
new()
Creates a new instance of this R6 class.
TaskGeneratorCoxed$new()
help()
Opens the corresponding help page referenced by field $man.
TaskGeneratorCoxed$help()
clone()
The objects of this class are cloneable with this method.
TaskGeneratorCoxed$clone(deep = FALSE)
deep: Whether to make a deep clone.
Harden, J. J, Kropko, Jonathan (2019). “Simulating Duration Data for the Cox Model.” Political Science Research and Methods, 7(4), 921–928. doi:10.1017/PSRM.2018.19.
as.data.table(mlr_task_generators) for a table of available TaskGenerators in the running session
Other TaskGenerator:
mlr_task_generators_simdens,
mlr_task_generators_simsurv
library(mlr3)
library(mlr3proba)

# time horizon = 365 days, censoring proportion = 60%, 6 covariates normally
# distributed with mean = 1 and sd = 2, independent censoring, no time-varying
# effects
gen = tgen("coxed", T = 365, type = "none", censor = 0.6, xvars = 6,
  mu = 1, sd = 2, censor.cond = FALSE)
gen$generate(50)

# same as above, but with time-varying coefficients
gen$param_set$set_values(type = "tvbeta")
gen$generate(50)
A mlr3::TaskGenerator calling distr6::distrSimulate().
See distr6::distrSimulate() for an explanation of the hyperparameters.
See distr6::listDistributions() for the names of the available distributions.
This TaskGenerator can be instantiated via the dictionary mlr_task_generators or with the associated sugar function tgen():
mlr_task_generators$get("simdens")
tgen("simdens")
| Id | Type | Default | Levels |
|---|---|---|---|
| distribution | character | Normal | Arcsine, Arrdist, Bernoulli, Beta, BetaNoncentral, Binomial, Categorical, Cauchy, ChiSquared, ChiSquaredNoncentral, ... |
| pars | untyped | - | |
mlr3::TaskGenerator -> TaskGeneratorSimdens
new()
Creates a new instance of this R6 class.
TaskGeneratorSimdens$new()
help()
Opens the corresponding help page referenced by field $man.
TaskGeneratorSimdens$help()
clone()
The objects of this class are cloneable with this method.
TaskGeneratorSimdens$clone(deep = FALSE)
deep: Whether to make a deep clone.
as.data.table(mlr_task_generators) for a table of available TaskGenerators in the running session
Other TaskGenerator:
mlr_task_generators_coxed,
mlr_task_generators_simsurv
library(mlr3)
library(mlr3proba)

# generate 20 samples from a standard Normal distribution
dens_gen = tgen("simdens")
dens_gen$param_set
task = dens_gen$generate(20)
head(task)

# generate 50 samples from a Bernoulli distribution with success probability 0.8
dens_gen = tgen("simdens", distribution = "Bernoulli", pars = list(prob = 0.8))
task = dens_gen$generate(50)
task$data()[["x"]]
A mlr3::TaskGenerator calling simsurv::simsurv() from package simsurv.
This generator currently only exposes a small subset of the flexibility of simsurv, and just creates a small dataset with the following numerical covariates:
treatment: Bernoulli distributed with hazard ratio 0.5.
height: Normally distributed with hazard ratio 1.
weight: Normally distributed with hazard ratio 1.
See simsurv::simsurv() for an explanation of the hyperparameters.
Initial values for hyperparameters are lambdas = 0.1, gammas = 1.5 and maxt = 5.
With the default maxt = 5, samples are administratively censored at $\tau = 5$, so increase maxt if you want a longer follow-up.
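For intuition about what the generator produces, the following base-R sketch draws Weibull proportional-hazards survival times by inverse transform sampling and applies administrative censoring at `maxt`. It assumes the parameterization $S(t \mid \eta) = \exp(-\lambda t^{\gamma} e^{\eta})$; the treatment probability of 0.5 and the means and standard deviations of `height` and `weight` are hypothetical choices, and `simulate_weibull_ph()` is not part of simsurv or mlr3proba.

```r
simulate_weibull_ph = function(n, lambdas = 0.1, gammas = 1.5, maxt = 5) {
  treatment = rbinom(n, 1, 0.5)   # hypothetical P(treatment = 1) = 0.5
  height = rnorm(n, 170, 10)      # hypothetical moments; hazard ratio 1,
  weight = rnorm(n, 70, 10)       # so neither enters the linear predictor
  eta = log(0.5) * treatment      # log hazard ratio 0.5 for treatment
  u = runif(n)
  # inverse transform: solve S(t | eta) = u for t
  event_time = (-log(u) / (lambdas * exp(eta)))^(1 / gammas)
  data.frame(
    time = pmin(event_time, maxt),           # administrative censoring at maxt
    status = as.integer(event_time <= maxt),
    treatment, height, weight
  )
}

set.seed(42)
sim = simulate_weibull_ph(100)
print(head(sim))
```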
This TaskGenerator can be instantiated via the dictionary mlr_task_generators or with the associated sugar function tgen():
mlr_task_generators$get("simsurv")
tgen("simsurv")
| Id | Type | Default | Levels | Range |
|---|---|---|---|---|
| dist | character | weibull | weibull, exponential, gompertz | - |
| lambdas | numeric | - | | |
| gammas | numeric | - | | |
| maxt | numeric | - | | |
mlr3::TaskGenerator -> TaskGeneratorSimsurv
new()
Creates a new instance of this R6 class.
TaskGeneratorSimsurv$new()
help()
Opens the corresponding help page referenced by field $man.
TaskGeneratorSimsurv$help()
clone()
The objects of this class are cloneable with this method.
TaskGeneratorSimsurv$clone(deep = FALSE)
deep: Whether to make a deep clone.
Brilleman, L. S, Wolfe, Rory, Moreno-Betancur, Margarita, Crowther, J. M (2021). “Simulating Survival Data Using the simsurv R Package.” Journal of Statistical Software, 97(3), 1–27. doi:10.18637/JSS.V097.I03.
as.data.table(mlr_task_generators) for a table of available TaskGenerators in the running session
Other TaskGenerator:
mlr_task_generators_coxed,
mlr_task_generators_simdens
library(mlr3)
library(mlr3proba)

# generate 20 samples with Weibull survival distribution
gen = tgen("simsurv")
task = gen$generate(20)
head(task)

# generate 100 samples with exponential survival distribution,
# administratively censored at maxt = 40
gen = tgen("simsurv", dist = "exponential", gammas = NULL, maxt = 40)
task = gen$generate(100)
head(task)
A survival task for the actg data set.
R6::R6Class inheriting from TaskSurv.
This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():
mlr_tasks$get("actg")
tsk("actg")
Task type: “surv”
Dimensions: 1151x13
Properties: -
Has Missings: FALSE
Target: “time”, “status”
Features: “age”, “cd4”, “hemophil”, “ivdrug”, “karnof”, “priorzdv”, “raceth”, “sexF”, “strat2”, “tx”, “txgrp”
Column sex has been renamed to sexF and censor has been renamed to status.
Columns id, time_d, and censor_d have been removed, so the target is the time
to AIDS diagnosis (in days).
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html
as.data.table(mlr_tasks) for a table of available Tasks in the running session
Other Task:
TaskDens,
TaskSurv,
mlr_tasks_faithful,
mlr_tasks_gbcs,
mlr_tasks_gbsg,
mlr_tasks_grace,
mlr_tasks_lung,
mlr_tasks_mgus,
mlr_tasks_precip,
mlr_tasks_rats,
mlr_tasks_veteran,
mlr_tasks_whas
A density task for the faithful data set.
R6::R6Class inheriting from TaskDens.
This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():
mlr_tasks$get("faithful")
tsk("faithful")
Task type: “dens”
Dimensions: 272x1
Properties: -
Has Missings: FALSE
Target: -
Features: “eruptions”
Only the eruptions column is kept in this task.
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html
as.data.table(mlr_tasks) for a table of available Tasks in the running session
Other Task:
TaskDens,
TaskSurv,
mlr_tasks_actg,
mlr_tasks_gbcs,
mlr_tasks_gbsg,
mlr_tasks_grace,
mlr_tasks_lung,
mlr_tasks_mgus,
mlr_tasks_precip,
mlr_tasks_rats,
mlr_tasks_veteran,
mlr_tasks_whas
A survival task for the gbcs data set.
R6::R6Class inheriting from TaskSurv.
This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():
mlr_tasks$get("gbcs")
tsk("gbcs")
Task type: “surv”
Dimensions: 686x10
Properties: -
Has Missings: FALSE
Target: “time”, “status”
Features: “age”, “estrg_recp”, “grade”, “hormone”, “menopause”, “nodes”, “prog_recp”, “size”
Column id and all date columns have been removed, as well as rectime
and censrec.
Target columns (survtime, censdead) have been renamed to (time, status).
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html
as.data.table(mlr_tasks) for a table of available Tasks in the running session
Other Task:
TaskDens,
TaskSurv,
mlr_tasks_actg,
mlr_tasks_faithful,
mlr_tasks_gbsg,
mlr_tasks_grace,
mlr_tasks_lung,
mlr_tasks_mgus,
mlr_tasks_precip,
mlr_tasks_rats,
mlr_tasks_veteran,
mlr_tasks_whas
A survival task for the gbsg data set.
R6::R6Class inheriting from TaskSurv.
This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():
mlr_tasks$get("gbsg")
tsk("gbsg")
Task type: “surv”
Dimensions: 686x10
Properties: -
Has Missings: FALSE
Target: “time”, “status”
Features: “age”, “er”, “grade”, “hormon”, “meno”, “nodes”, “pgr”, “size”
Removed column pid.
Column meno has been converted to factor and 0/1 values have been
replaced with premenopausal and postmenopausal respectively.
Column hormon has been converted to factor and 0/1 values have been
replaced with no and yes respectively.
Column grade has been converted to factor.
Renamed target column rfstime to time.
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html
as.data.table(mlr_tasks) for a table of available Tasks in the running session
Other Task:
TaskDens,
TaskSurv,
mlr_tasks_actg,
mlr_tasks_faithful,
mlr_tasks_gbcs,
mlr_tasks_grace,
mlr_tasks_lung,
mlr_tasks_mgus,
mlr_tasks_precip,
mlr_tasks_rats,
mlr_tasks_veteran,
mlr_tasks_whas
A survival task for the grace data set.
R6::R6Class inheriting from TaskSurv.
This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():
mlr_tasks$get("grace")
tsk("grace")
Task type: “surv”
Dimensions: 1000x8
Properties: -
Has Missings: FALSE
Target: “time”, “status”
Features: “age”, “los”, “revasc”, “revascdays”, “stchange”, “sysbp”
Column id is removed.
Target columns (days, death) have been renamed to (time, status).
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html
as.data.table(mlr_tasks) for a table of available Tasks in the running session
Other Task:
TaskDens,
TaskSurv,
mlr_tasks_actg,
mlr_tasks_faithful,
mlr_tasks_gbcs,
mlr_tasks_gbsg,
mlr_tasks_lung,
mlr_tasks_mgus,
mlr_tasks_precip,
mlr_tasks_rats,
mlr_tasks_veteran,
mlr_tasks_whas
A survival task for the lung data set.
R6::R6Class inheriting from TaskSurv.
This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():
mlr_tasks$get("lung")
tsk("lung")
Task type: “surv”
Dimensions: 168x9
Properties: -
Has Missings: FALSE
Target: “time”, “status”
Features: “age”, “meal.cal”, “pat.karno”, “ph.ecog”, “ph.karno”, “sex”, “wt.loss”
Column inst has been removed.
Column sex has been converted to a factor; all others have been
converted to integer.
Kept only complete cases (no missing values).
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html
as.data.table(mlr_tasks) for a table of available Tasks in the running session
Other Task:
TaskDens,
TaskSurv,
mlr_tasks_actg,
mlr_tasks_faithful,
mlr_tasks_gbcs,
mlr_tasks_gbsg,
mlr_tasks_grace,
mlr_tasks_mgus,
mlr_tasks_precip,
mlr_tasks_rats,
mlr_tasks_veteran,
mlr_tasks_whas
A survival task for the mgus data set.
R6::R6Class inheriting from TaskSurv.
This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():
mlr_tasks$get("mgus")
tsk("mgus")
Task type: “surv”
Dimensions: 176x9
Properties: -
Has Missings: FALSE
Target: “time”, “status”
Features: “age”, “alb”, “creat”, “dxyr”, “hgb”, “mspike”, “sex”
Removed columns id, pcdx and pctime.
Renamed target columns from (fultime, death) to (time, status).
Kept only complete cases (no missing values).
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html
as.data.table(mlr_tasks) for a table of available Tasks in the running session
Other Task:
TaskDens,
TaskSurv,
mlr_tasks_actg,
mlr_tasks_faithful,
mlr_tasks_gbcs,
mlr_tasks_gbsg,
mlr_tasks_grace,
mlr_tasks_lung,
mlr_tasks_precip,
mlr_tasks_rats,
mlr_tasks_veteran,
mlr_tasks_whas
A density task for the precip data set.
R6::R6Class inheriting from TaskDens.
This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():
mlr_tasks$get("precip")
tsk("precip")
Task type: “dens”
Dimensions: 70x1
Properties: -
Has Missings: FALSE
Target: -
Features: “precip”
Only the precip column is kept in this task.
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html
as.data.table(mlr_tasks) for a table of available Tasks in the running session
Other Task:
TaskDens,
TaskSurv,
mlr_tasks_actg,
mlr_tasks_faithful,
mlr_tasks_gbcs,
mlr_tasks_gbsg,
mlr_tasks_grace,
mlr_tasks_lung,
mlr_tasks_mgus,
mlr_tasks_rats,
mlr_tasks_veteran,
mlr_tasks_whas
A survival task for the rats data set.
R6::R6Class inheriting from TaskSurv.
This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():
mlr_tasks$get("rats")
tsk("rats")
Task type: “surv”
Dimensions: 300x5
Properties: -
Has Missings: FALSE
Target: “time”, “status”
Features: “litter”, “rx”, “sex”
Column sex has been converted to a factor; all others have been
converted to integer.
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html
as.data.table(mlr_tasks) for a table of available Tasks in the running session
Other Task:
TaskDens,
TaskSurv,
mlr_tasks_actg,
mlr_tasks_faithful,
mlr_tasks_gbcs,
mlr_tasks_gbsg,
mlr_tasks_grace,
mlr_tasks_lung,
mlr_tasks_mgus,
mlr_tasks_precip,
mlr_tasks_veteran,
mlr_tasks_whas
A survival task for the veteran data set.
R6::R6Class inheriting from TaskSurv.
This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():
mlr_tasks$get("veteran")
tsk("veteran")
Task type: “surv”
Dimensions: 137x8
Properties: -
Has Missings: FALSE
Target: “time”, “status”
Features: “age”, “celltype”, “diagtime”, “karno”, “prior”, “trt”
Columns age, time, status, diagtime and karno have been converted
to integer.
Columns trt, prior have been converted to factors. Prior therapy
values are no/yes instead of 0/10.
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html
as.data.table(mlr_tasks) for a table of available Tasks in the running session
Other Task:
TaskDens,
TaskSurv,
mlr_tasks_actg,
mlr_tasks_faithful,
mlr_tasks_gbcs,
mlr_tasks_gbsg,
mlr_tasks_grace,
mlr_tasks_lung,
mlr_tasks_mgus,
mlr_tasks_precip,
mlr_tasks_rats,
mlr_tasks_whas
A survival task for the whas data set.
R6::R6Class inheriting from TaskSurv.
This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():
mlr_tasks$get("whas")
tsk("whas")
Task type: “surv”
Dimensions: 481x11
Properties: -
Has Missings: FALSE
Target: “time”, “status”
Features: “age”, “chf”, “cpk”, “lenstay”, “miord”, “mitype”, “sexF”, “sho”, “year”
Columns id, yrgrp, and dstat are removed.
Column sex is renamed to sexF, lenfol to time, and fstat to status.
Target is total follow-up time from hospital admission.
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html
as.data.table(mlr_tasks) for a table of available Tasks in the running session
Other Task:
TaskDens,
TaskSurv,
mlr_tasks_actg,
mlr_tasks_faithful,
mlr_tasks_gbcs,
mlr_tasks_gbsg,
mlr_tasks_grace,
mlr_tasks_lung,
mlr_tasks_mgus,
mlr_tasks_precip,
mlr_tasks_rats,
mlr_tasks_veteran
Methods to plot prediction error curves (pecs) for either a PredictionSurv object or a list of trained LearnerSurvs.
pecs(x, measure = c("graf", "logloss"), times, n, eps = NULL, ...)

## S3 method for class 'list'
pecs(
  x,
  measure = c("graf", "logloss"),
  times,
  n,
  eps = 0.001,
  task = NULL,
  row_ids = NULL,
  newdata = NULL,
  train_task = NULL,
  train_set = NULL,
  ...
)

## S3 method for class 'PredictionSurv'
pecs(
  x,
  measure = c("graf", "logloss"),
  times,
  n,
  eps = 0.001,
  train_task = NULL,
  train_set = NULL,
  ...
)
| Argument | Description |
|---|---|
| x | (PredictionSurv or |
| measure | ( |
| times | ( |
| n | ( |
| eps | ( |
| ... | Additional arguments. |
| task | (TaskSurv) |
| row_ids | ( |
| newdata | ( |
| train_task | (TaskSurv) |
| train_set | ( |
If times and n are both missing, measure is evaluated at all observed time points of the PredictionSurv or TaskSurv object. If a range is provided for times without n, all time points within that range are evaluated.
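The time-point rules above can be sketched with a small hypothetical helper, pec_eval_times(), which is not part of mlr3proba. The handling of n (n evenly spaced points over the range) is an assumption inferred from the examples, which pass times = c(0, 1000) together with n = 100.

```r
# Hypothetical helper, not the mlr3proba implementation.
# observed: observed outcome times; times: vector or 2-element range; n: count
pec_eval_times <- function(observed, times, n) {
  observed <- sort(unique(observed))
  # times and n both missing: all observed time points
  if (missing(times) && missing(n)) {
    return(observed)
  }
  if (missing(n)) {
    # a range (length 2) without n: observed time points inside the range
    if (length(times) == 2) {
      return(observed[observed >= times[1] & observed <= times[2]])
    }
    # otherwise the given time points are used as they are (sorted)
    return(sort(times))
  }
  # assumption: with n, use n evenly spaced points over the range
  rng <- if (missing(times)) range(observed) else range(times)
  seq(rng[1], rng[2], length.out = n)
}

obs <- c(5, 1, 3, 3, 10)
pec_eval_times(obs)                # 1 3 5 10
pec_eval_times(obs, c(2, 6))       # 3 5
pec_eval_times(obs, c(0, 100), 11) # 0 10 20 ... 100
```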
library(mlr3proba)

# Prediction Error Curves for prediction object
task = tsk("lung")
learner = lrn("surv.coxph")
p = learner$train(task)$predict(task)
pecs(p)
pecs(p, measure = "logloss", times = seq(0, 1000, 50)) +
  ggplot2::geom_point() +
  ggplot2::labs(title = "Prediction Error Curve for Cox PH", y = "ISLL")

# Access underlying data
x = pecs(p)
x$data

# Prediction Error Curves for fitted learners
learners = lrns(c("surv.kaplan", "surv.coxph"))
lapply(learners, function(x) x$train(task))
pecs(learners, task = task, measure = "logloss", times = c(0, 1000), n = 100) +
  ggplot2::labs(y = "ISLL")
pecs(learners, task = task, measure = "graf", times = c(0, 1000), n = 100) +
  ggplot2::labs(y = "ISBS")
Plots probability density functions from n predicted probability distributions.
plot_probregr(
  p,
  n,
  type = c("point", "line", "both", "none"),
  which_plot = c("random", "top"),
  rm_zero = TRUE,
  ...
)
| Argument | Description |
|---|---|
| p | (PredictionRegr) |
| n | ( |
| type | ( |
| which_plot | ( |
| rm_zero | ( |
| ... | Unused |
type:
"point" (default) - Truth plotted as a point at (truth, predicted_pdf(truth)).
"line" - Truth plotted as a vertical line intersecting the x-axis at the truth.
"both" - Plots both of the above.
"none" - Truth not plotted (default if p$truth is missing).
which_plot:
"random" (default) - A random selection of n distributions is plotted.
"top" - The top n distributions are plotted.
The plot is unlikely to be interpretable when n >> 5.
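The "point" option can be made concrete with a base-R sketch that computes the coordinates at which each truth value would be drawn. The Normal predictive distributions and truth values below are made-up illustration values, not output of plot_probregr().

```r
# Made-up Normal predictive distributions for three test observations
pred_mean <- c(20, 25, 30)
pred_sd <- c(2, 3, 4)
truth <- c(21, 25, 35)

# type = "point": each truth is drawn at (truth, predicted_pdf(truth))
point_coords <- data.frame(
  x = truth,
  y = dnorm(truth, mean = pred_mean, sd = pred_sd)
)
point_coords

# type = "line" would instead draw a vertical line at x = truth
```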
## Not run:
library(mlr3verse)
library(mlr3proba)
task = tsk("boston_housing")
pipe = as_learner(ppl("probregr", lrn("regr.ranger"), dist = "Normal"))
p = pipe$train(task)$predict(task)
plot_probregr(p, 10, "point", "top")
## End(Not run)
This object stores the predictions returned by a learner of class LearnerDens.
The task_type is set to "dens".
mlr3::Prediction -> PredictionDens
pdf(numeric())
Access the stored predicted probability density function.
cdf(numeric())
Access the stored predicted cumulative distribution function.
distr(Distribution)
Access the stored estimated distribution.
new()
Creates a new instance of this R6 class.
PredictionDens$new( task = NULL, row_ids = task$row_ids, pdf = NULL, cdf = NULL, distr = NULL, check = TRUE )
task(TaskDens)
Task, used to extract defaults for row_ids.
row_ids(integer())
Row ids of the predicted observations, i.e. the row ids of the test set.
pdf(numeric())
Numeric vector of estimated probability density function, evaluated at values in test set.
One element for each observation in the test set.
cdf(numeric())
Numeric vector of estimated cumulative distribution function, evaluated at values in test
set. One element for each observation in the test set.
distr(Distribution)
Distribution from distr6.
The distribution from which pdf and cdf are derived.
check(logical(1))
If TRUE, performs argument checks and predict type conversions.
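The shape of the pdf and cdf arguments (one value per test observation) can be illustrated in base R. This sketch fits a Normal distribution to a training split of the precip data as a stand-in density estimator; that choice is an assumption for illustration and is not what dens.hist does.

```r
# precip (datasets package): annual precipitation for 70 US cities
x <- as.numeric(precip)
train <- x[1:50]
test <- x[51:70]

# Stand-in estimator (assumption): Normal with the training mean and sd
mu <- mean(train)
s <- sd(train)

# One element per test observation, as PredictionDens expects
pdf <- dnorm(test, mean = mu, sd = s)
cdf <- pnorm(test, mean = mu, sd = s)
head(data.frame(row_ids = 51:70, pdf = pdf, cdf = cdf))
```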
clone()
The objects of this class are cloneable with this method.
PredictionDens$clone(deep = FALSE)
deep: Whether to make a deep clone.
Other Prediction:
PredictionSurv
library(mlr3)
library(mlr3proba)
task = mlr_tasks$get("precip")
learner = mlr_learners$get("dens.hist")
p = learner$train(task)$predict(task)
head(as.data.table(p))
This object stores the predictions returned by a learner of class LearnerSurv.
The task_type is set to "surv".
For accessing survival and hazard functions, as well as other complex methods
from a PredictionSurv object, see the public methods of distr6::ExoticStatistics()
and the example below.
mlr3::Prediction -> PredictionSurv
truth(Surv)
True (observed) outcome.
crank(numeric())
Access the stored predicted continuous ranking.
distr(distr6::Matdist|distr6::Arrdist|distr6::VectorDistribution)
Convert the stored survival array or matrix to a survival distribution.
lp(numeric())
Access the stored predicted linear predictor.
response(numeric())
Access the stored predicted survival time.
new()
Creates a new instance of this R6 class.
PredictionSurv$new( task = NULL, row_ids = task$row_ids, truth = task$truth(), crank = NULL, distr = NULL, lp = NULL, response = NULL, check = TRUE )
task(TaskSurv)
Task, used to extract defaults for row_ids and truth.
row_ids(integer())
Row ids of the predicted observations, i.e. the row ids of the test set.
truth(survival::Surv())
True (observed) response.
crank(numeric())
Numeric vector of predicted continuous rankings (or relative risks). One element for each
observation in the test set. For a pair of continuous ranks, a higher rank indicates that
the observation is more likely to experience the event.
distr(matrix()|distr6::Arrdist|distr6::Matdist|distr6::VectorDistribution)
Either a matrix of predicted survival probabilities, a distr6::VectorDistribution,
a distr6::Matdist or a distr6::Arrdist.
If a matrix/array, column names must be given and must correspond to survival times.
Rows of the matrix correspond to individual predictions. It is advised that the
first column corresponds to time 0 with all entries equal to 1, and that the last
column has all entries equal to 0. If a VectorDistribution, each distribution in the
vector should correspond to a predicted survival distribution.
lp(numeric())
Numeric vector of linear predictor scores. One element for each observation in the test
set. $lp = X\beta$, where $X$ is a matrix of covariates and $\beta$ is a vector
of estimated coefficients.
response(numeric())
Numeric vector of predicted survival times.
One element for each observation in the test set.
check(logical(1))
If TRUE, performs argument checks and predict type conversions.
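The lp and distr arguments above can be illustrated with toy numbers. This sketch computes lp = X %*% beta for a made-up covariate matrix and coefficient vector, then builds a survival matrix in the advised layout under an assumed exponential proportional hazards model with baseline hazard 0.01; none of these values come from a fitted learner.

```r
# Made-up covariates (3 observations x 2 covariates) and coefficients
X <- matrix(c(1, 0,
              0, 1,
              1, 1), nrow = 3, byrow = TRUE)
beta <- c(0.5, -0.2)

# Linear predictor: one score per observation
lp <- drop(X %*% beta)
lp

# Survival matrix: rows = observations, columns = time points.
# Assumed model: S_i(t) = exp(-0.01 * exp(lp_i) * t)
times <- c(0, 10, 50, 100, 500)
surv <- t(sapply(lp, function(eta) exp(-0.01 * exp(eta) * times)))
colnames(surv) <- times  # column names must be the survival times

# Advised layout: first column is time 0 with all entries equal to 1
# (the last column here is close to, but not exactly, 0)
all(surv[, "0"] == 1)
# Each row should be non-increasing over time
all(apply(surv, 1, function(s) all(diff(s) <= 0)))
```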
Upon initialization, if the distr input is a Distribution,
we try to coerce it either to a survival matrix or a survival array and store it
in the $data$distr slot for internal use.
If the stored $data$distr is a Distribution object,
the active field $distr (external user API) returns it without modification.
Otherwise, if $data$distr is a survival matrix or array, $distr
constructs a distribution out of the $data$distr object, which will be a
Matdist or Arrdist respectively.
Note that if a survival 3d array is stored in $data$distr, the $distr
field returns an Arrdist initialized with which.curve = 0.5
by default (i.e. the median curve). This means that measures that require
a distr prediction like MeasureSurvGraf, MeasureSurvRCLL, etc.
will use the median survival probabilities.
Note that it is possible to manually change which.curve after construction
of the predicted distribution but we advise against this as it may lead to
inconsistent results.
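The idea of reducing a 3d survival array to its median curve can be sketched in base R. This assumes the median is taken pointwise across the third dimension for every observation and time point; distr6's Arrdist may differ in details. The curves are simulated, made-up values.

```r
set.seed(42)
n_obs <- 4; n_times <- 6; n_curves <- 25
times <- seq(0, 50, length.out = n_times)

# Array of survival curves: observations x time points x curves.
# Each curve is exponential with a randomly drawn rate (made-up values).
arr <- array(NA_real_, dim = c(n_obs, n_times, n_curves))
for (b in seq_len(n_curves)) {
  rates <- runif(n_obs, 0.01, 0.05)
  arr[, , b] <- exp(-outer(rates, times))
}

# Assumption: pointwise median over the 3rd dimension (cf. which.curve = 0.5)
median_curve <- apply(arr, c(1, 2), median)
dim(median_curve)
```

Because each simulated curve is non-increasing, the pointwise median is also non-increasing, so the result is still a valid survival matrix.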
clone()
The objects of this class are cloneable with this method.
PredictionSurv$clone(deep = FALSE)
deep: Whether to make a deep clone.
Other Prediction:
PredictionDens
library(mlr3)
library(mlr3proba)
task = tsk("rats")
learner = lrn("surv.kaplan")
p = learner$train(task, row_ids = 1:26)$predict(task, row_ids = 27:30)
head(as.data.table(p))
p$distr # distr6::Matdist class (test obs x time points)
# survival probabilities of the 4 test rats at two time points
p$distr$survival(c(20, 100))
Internal helper function to easily return the correct survival predict types.
surv_return(
  times = NULL,
  surv = NULL,
  crank = NULL,
  lp = NULL,
  response = NULL,
  which.curve = NULL
)
| Argument | Description |
|---|---|
| times | ( |
| surv | ( |
| crank | ( |
| lp | ( |
| response | ( |
| which.curve | Which curve (3rd dimension) should the |
Sonabend, Raphael, Bender, Andreas, Vollmer, Sebastian (2022). “Avoiding C-hacking when evaluating survival distribution predictions with discrimination measures.” Bioinformatics. ISSN 1367-4803, doi:10.1093/BIOINFORMATICS/BTAC451, https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac451/6640155.
library(mlr3proba)
n = 10 # number of observations
k = 50 # time points
# Create the matrix with random values between 0 and 1
mat = matrix(runif(n * k, min = 0, max = 1), nrow = n, ncol = k)
# transform it to a survival matrix
surv_mat = t(apply(mat, 1L, function(row) sort(row, decreasing = TRUE)))
# crank is expected mortality, distr is the survival matrix
surv_return(times = 1:k, surv = surv_mat)
# if crank is set, it's not overwritten
surv_return(times = 1:k, surv = surv_mat, crank = rnorm(n))
# if only lp is set, crank = lp
surv_return(lp = rnorm(n))
# if response is set and no crank, crank = -response
surv_return(response = sample(1:100, n))
# if both are set, they are not overwritten
surv_return(crank = rnorm(n), response = sample(1:100, n))
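The crank rules shown in the example above can be mirrored by a hypothetical function, toy_surv_return(), which is not the mlr3proba implementation. Two parts are assumptions: "expected mortality" is taken as the row sum of the cumulative hazard -log(S(t)) over the given time points, and surv is checked before lp, which is checked before response.

```r
# Hypothetical re-implementation of the crank rules, for illustration only
toy_surv_return <- function(times = NULL, surv = NULL, crank = NULL,
                            lp = NULL, response = NULL) {
  if (!is.null(surv)) colnames(surv) <- times
  if (is.null(crank)) {
    if (!is.null(surv)) {
      # assumed "expected mortality": sum of cumulative hazards -log(S(t));
      # rows containing S(t) = 0 would give Inf here
      crank <- unname(rowSums(-log(surv)))
    } else if (!is.null(lp)) {
      crank <- lp           # lp is used as the ranking
    } else if (!is.null(response)) {
      crank <- -response    # longer predicted survival = lower risk
    }
  }
  list(distr = surv, crank = crank, lp = lp, response = response)
}

surv_mat <- matrix(c(1, 0.5,
                     1, 0.25), nrow = 2, byrow = TRUE)
toy_surv_return(times = c(0, 10), surv = surv_mat)$crank  # log(2), log(4)
toy_surv_return(lp = c(0.3, -1))$crank                    # 0.3 -1
toy_surv_return(response = c(10, 20))$crank               # -10 -20
```

The second row has lower survival at time 10 and therefore receives the higher crank, matching the convention that a higher rank means a higher risk.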
This task specializes TaskUnsupervised for density estimation problems.
The data in backend should be a numeric vector or a one column matrix-like object.
The task_type is set to "dens".
Predefined tasks are stored in the dictionary mlr_tasks.
mlr3::Task -> mlr3::TaskUnsupervised -> TaskDens
Inherited methods: mlr3::Task$add_strata(), mlr3::Task$cbind(), mlr3::Task$data(), mlr3::Task$divide(), mlr3::Task$droplevels(), mlr3::Task$filter(), mlr3::Task$format(), mlr3::Task$formula(), mlr3::Task$head(), mlr3::Task$help(), mlr3::Task$levels(), mlr3::Task$materialize_view(), mlr3::Task$missings(), mlr3::Task$print(), mlr3::Task$rbind(), mlr3::Task$rename(), mlr3::Task$select(), mlr3::Task$set_col_roles(), mlr3::Task$set_levels(), mlr3::Task$set_row_roles()
new()
Creates a new instance of this R6 class.
TaskDens$new(id, backend, label = NA_character_)
id(character(1))
Identifier for the new instance.
backend(mlr3::DataBackend)
Either a DataBackend, a matrix-like object, or a numeric vector.
If weights are used, two columns are expected; otherwise one column. The weight column
must be explicitly specified (via Task$col_roles), otherwise the learners will fail.
label(character(1))
Label for the new instance.
clone()
The objects of this class are cloneable with this method.
TaskDens$clone(deep = FALSE)
deep: Whether to make a deep clone.
Other Task:
TaskSurv,
mlr_tasks_actg,
mlr_tasks_faithful,
mlr_tasks_gbcs,
mlr_tasks_gbsg,
mlr_tasks_grace,
mlr_tasks_lung,
mlr_tasks_mgus,
mlr_tasks_precip,
mlr_tasks_rats,
mlr_tasks_veteran,
mlr_tasks_whas
library(mlr3proba)
task = TaskDens$new("precip", backend = precip)
task$task_type
This task specializes mlr3::Task and mlr3::TaskSupervised for
single-event survival problems.
The target consists of survival times and an event indicator (0 represents censored observations, 1 represents observations that had the event).
Every row corresponds to one subject/observation.
Predefined tasks are stored in mlr3::mlr_tasks.
The task_type is set to "surv".
Note: Currently only right-censoring is supported, though it is possible to create tasks with left and interval censoring using the Surv interface.
mlr3::Task -> mlr3::TaskSupervised -> TaskSurv
cens_type(character(1))
Returns the type of censoring, one of "right", "left" or "interval".
Currently, only the "right" censoring type is fully supported, the rest
are experimental and the API might change in the future.
Inherited methods: mlr3::Task$add_strata(), mlr3::Task$cbind(), mlr3::Task$data(), mlr3::Task$divide(), mlr3::Task$droplevels(), mlr3::Task$filter(), mlr3::Task$format(), mlr3::Task$head(), mlr3::Task$help(), mlr3::Task$levels(), mlr3::Task$materialize_view(), mlr3::Task$missings(), mlr3::Task$print(), mlr3::Task$rbind(), mlr3::Task$rename(), mlr3::Task$select(), mlr3::Task$set_col_roles(), mlr3::Task$set_levels(), mlr3::Task$set_row_roles()
new()
Creates a new instance of this R6 class.
TaskSurv$new( id, backend, time = "time", event = "event", time2 = "time2", type = "right", label = NA_character_ )
id(character(1))
Identifier for the new instance.
backend(mlr3::DataBackend)
Either a DataBackend, or any object which is convertible to a DataBackend with as_data_backend().
E.g., a data.frame() will be converted to a DataBackendDataTable.
time(character(1))
Name of the column for event time if data is right censored, otherwise starting time if
interval censored.
event(character(1))
Name of the column giving the event indicator.
If data is right censored then 0 means alive (no event) and 1 means dead (event).
If type is "interval" then event is ignored.
time2(character(1))
Name of the column for ending time of the interval for interval censored data,
otherwise ignored.
type(character(1))
The type of censoring. Can be "right" (default), "left" or "interval" censoring.
label(character(1))
Label for the new instance.
Depending on the censoring type (type), the $target_names field of a survival task is a character() vector with the names of the target columns, in the following order:
For type = "right" or "left": ("time", "event")
For type = "interval": ("time", "time2")
truth()
True response for specified row_ids. This is the survival outcome
using the Surv format and depends on the censoring
type. Defaults to all rows with role "use".
For censoring type:
"right|left": Surv(time, event, type = "right|left")
"interval": Surv(time, time2, type = "interval2")
TaskSurv$truth(rows = NULL)
rows(integer())
Row indices.
formula()
Creates a formula for survival models with survival::Surv() on the LHS
(left hand side).
TaskSurv$formula(rhs = NULL, reverse = FALSE)
rhs: If NULL, the RHS (right hand side) is ".", otherwise the RHS is rhs.
reverse: If TRUE, the formula is calculated with 1 - status. Only applicable to "right"
or "left" censoring.
times()
Returns the (unsorted) outcome times.
TaskSurv$times(rows = NULL)
rows(integer())
Row indices.
Returns: numeric()
status()
Returns the event indicator (aka censoring/survival indicator).
If censoring type is "right" or "left" then 1 is event and 0 is censored.
If censoring type is "interval" then 0 means right-censored, 1 is
event, 2 is left-censored and 3 is interval-censored.
See survival::Surv().
TaskSurv$status(rows = NULL)
rows(integer())
Row indices.
Returns: integer()
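The event codes above follow survival::Surv(). The sketch below calls the survival package directly (not mlr3proba) to show the right-censored and "interval2" forms listed under truth() and the status codes they produce; the times are made-up values.

```r
library(survival)

# Right censoring: 1 = event, 0 = censored (printed with a trailing "+")
y_right <- Surv(time = c(5, 8, 3), event = c(1, 0, 1))
y_right

# Interval censoring via type = "interval2": NA marks an open bound
#   (1, 1)  -> exact event       (status 1)
#   (NA, 4) -> left-censored     (status 2)
#   (2, NA) -> right-censored    (status 0)
#   (3, 5)  -> interval-censored (status 3)
y_int <- Surv(time = c(1, NA, 2, 3), time2 = c(1, 4, NA, 5), type = "interval2")
y_int
unclass(y_int)[, "status"]
```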
unique_times()
Returns the sorted unique outcome times.
TaskSurv$unique_times(rows = NULL)
rows(integer())
Row indices.
Returns: numeric()
unique_event_times()
Returns the sorted unique event (or failure) outcome times.
TaskSurv$unique_event_times(rows = NULL)
rows(integer())
Row indices.
Returns: numeric()
kaplan()
Calls survival::survfit() to calculate the Kaplan-Meier estimator.
TaskSurv$kaplan(strata = NULL, rows = NULL, reverse = FALSE, ...)
strata(character())
Stratification variables to use.
rows(integer())
Subset of row indices.
reverse(logical())
If TRUE calculates Kaplan-Meier of censoring distribution (1-status). Default FALSE.
...(any)
Additional arguments passed down to survival::survfit.formula().
reverse()
Returns the same task with the status variable reversed, i.e., 1 - status.
TaskSurv$reverse()
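TaskSurv$kaplan() calls survival::survfit(); the sketch below shows equivalent direct calls on survival::veteran, including the reversed status 1 - status used for the censoring distribution. It uses the survival package only and is not the mlr3proba code.

```r
library(survival)
vet <- survival::veteran  # status: 1 = death, 0 = censored

# Kaplan-Meier estimate of the survival distribution
km <- survfit(Surv(time, status) ~ 1, data = vet)

# Stratified Kaplan-Meier (one curve per cell type)
km_strata <- survfit(Surv(time, status) ~ celltype, data = vet)

# "Reverse" Kaplan-Meier: the censoring distribution, using 1 - status
km_cens <- survfit(Surv(time, 1 - status) ~ 1, data = vet)

summary(km_cens, times = c(100, 200))
```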
cens_prop()
Returns the proportion of censoring for this survival task.
This is the proportion of censored observations in case of "right" or
"left" censoring, otherwise the proportion of left (2), right (0) and
interval censored (3) observations when censoring type is "interval".
By default, this is returned for all observations, otherwise only the
specified ones (rows).
TaskSurv$cens_prop(rows = NULL)
rows(integer())
Row indices.
Returns: numeric()
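The censoring proportions described above can be sketched with two hypothetical base-R helpers (not mlr3proba functions). For the interval case this assumes a single combined proportion over the codes 0, 2 and 3, using the survival::Surv() status coding.

```r
# Hypothetical helper: share of censored rows for right/left censoring
# (status 0 = censored, 1 = event)
cens_prop_right <- function(status) mean(status == 0)

# Hypothetical helper for interval censoring, assuming one combined value:
# Surv() codes 0 = right, 1 = event, 2 = left, 3 = interval censored,
# so every code except 1 counts as censored
cens_prop_interval <- function(status) mean(status %in% c(0, 2, 3))

cens_prop_right(survival::veteran$status)
cens_prop_interval(c(1, 2, 0, 3))  # 0.75
```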
admin_cens_prop()
Returns an estimated proportion of administratively censored observations (i.e. censored at or after a user-specified time point). Our main assumption here is that in an administratively censored dataset, the maximum censoring time is likely close to the maximum event time, so we expect a higher proportion of censored subjects near the study end date.
Only designed for "right" censoring.
TaskSurv$admin_cens_prop(rows = NULL, admin_time = NULL, quantile_prob = 0.99)
rows(integer())
Row indices.
admin_time(numeric(1))
Administrative censoring time (in case it is known a priori).
quantile_prob(numeric(1))
Quantile probability value with which we calculate the cutoff time for
administrative censoring. Ignored, if admin_time is given.
By default, quantile_prob is equal to 0.99, which translates to a
time point very close to the maximum outcome time in the dataset.
A lower value will result in an earlier time point and therefore in a more
relaxed definition (i.e. higher proportion) of administrative censoring.
Returns: numeric()
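The cutoff logic above can be sketched with a hypothetical function, admin_cens_prop_sketch(), which is not the mlr3proba method. The cutoff is the quantile_prob quantile of all outcome times unless admin_time is given; using the number of censored observations as the denominator is an assumption, since the description does not state it.

```r
# Hypothetical sketch; the denominator (number of censored observations)
# is an assumption, not confirmed by the documentation
admin_cens_prop_sketch <- function(time, status, admin_time = NULL,
                                   quantile_prob = 0.99) {
  if (is.null(admin_time)) {
    # cutoff: quantile of all outcome times
    admin_time <- quantile(time, probs = quantile_prob, names = FALSE)
  }
  cens_time <- time[status == 0]
  if (length(cens_time) == 0) return(0)
  # share of censored observations at or after the cutoff
  mean(cens_time >= admin_time)
}

time <- c(1, 2, 3, 4, 10)
status <- c(1, 0, 1, 0, 0)
admin_cens_prop_sketch(time, status, admin_time = 4)  # 2/3
admin_cens_prop_sketch(time, status)                  # cutoff 9.76 -> 1/3
```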
dep_cens_prop()
Returns the proportion of covariates (task features) that are found to be significantly associated with censoring. This function fits a logistic regression model via glm with the censoring status as the response and using all features as predictors. If a covariate is significantly associated with the censoring status, it suggests that censoring may be informative (dependent) rather than random (non-informative). This methodology is more suitable for low-dimensional datasets where the number of features is relatively small compared to the number of observations.
Only designed for "right" censoring.
TaskSurv$dep_cens_prop(rows = NULL, method = "holm", sign_level = 0.05)
rows(integer())
Row indices.
method(character(1))
Method to adjust p-values for multiple comparisons, see p.adjust.methods.
Default is "holm".
sign_level(numeric(1))
Significance level for each coefficient's p-value from the logistic
regression model. Default is 0.05.
Returns: numeric()
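The procedure described for dep_cens_prop() can be sketched directly with glm() and p.adjust(). This is an illustration on three numeric covariates of survival::veteran (age, karno, diagtime), chosen as an assumption for simplicity; the task method uses all task features, and how it counts factor features is not covered here.

```r
vet <- survival::veteran
vet$cens <- 1 - vet$status  # 1 = censored, 0 = event

# Logistic regression of the censoring indicator on the covariates
fit <- glm(cens ~ age + karno + diagtime, data = vet, family = binomial())

# Coefficient p-values (intercept dropped), adjusted with Holm's method
pvals <- summary(fit)$coefficients[-1, "Pr(>|z|)"]
adj <- p.adjust(pvals, method = "holm")

# Proportion of covariates significant at the 0.05 level
dep_prop <- mean(adj < 0.05)
dep_prop
```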
prop_haz()
Checks if the data satisfy the proportional hazards (PH) assumption using the Grambsch-Therneau test (Grambsch and Therneau, 1994). Uses cox.zph. This method should be used only for low-dimensional datasets where the number of features is relatively small compared to the number of observations.
Only designed for "right" censoring.
TaskSurv$prop_haz()
Returns: numeric()
If no errors, the p-value of the global chi-square test.
A p-value < 0.05 is an indication of possible PH violation.
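The global test can be reproduced with survival::cox.zph(). This sketch fits a Cox model on all covariates of survival::veteran (with trt and prior left numeric, unlike the task) and reads the p-value from the "GLOBAL" row of the test table; it is an illustration, not the mlr3proba method.

```r
library(survival)
vet <- survival::veteran

# Cox model on all covariates
fit <- coxph(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
             data = vet)

# Grambsch-Therneau test; the "GLOBAL" row holds the global chi-square test
zph <- cox.zph(fit)
p_global <- zph$table["GLOBAL", "p"]
p_global
```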
clone()
The objects of this class are cloneable with this method.
TaskSurv$clone(deep = FALSE)
deep: Whether to make a deep clone.
Grambsch, Patricia, Therneau, Terry (1994). “Proportional hazards tests and diagnostics based on weighted residuals.” Biometrika, 81(3), 515–526. doi:10.1093/biomet/81.3.515, https://doi.org/10.1093/biomet/81.3.515.
Other Task:
TaskDens,
mlr_tasks_actg,
mlr_tasks_faithful,
mlr_tasks_gbcs,
mlr_tasks_gbsg,
mlr_tasks_grace,
mlr_tasks_lung,
mlr_tasks_mgus,
mlr_tasks_precip,
mlr_tasks_rats,
mlr_tasks_veteran,
mlr_tasks_whas
library(mlr3)
library(mlr3proba)
task = tsk("lung")

# meta data
task$target_names # target is always (time, status) for right-censoring tasks
task$feature_names
task$formula()

# survival data
task$truth() # survival::Surv() object
task$times() # (unsorted) times
task$status() # event indicators (1 = death, 0 = censored)
task$unique_times() # sorted unique times
task$unique_event_times() # sorted unique event times
task$kaplan(strata = "sex") # stratified Kaplan-Meier
task$kaplan(reverse = TRUE) # Kaplan-Meier of the censoring distribution

# proportion of censored observations across the whole dataset
task$cens_prop()
# proportion of censored observations at or after the 95% time quantile
task$admin_cens_prop(quantile_prob = 0.95)
# proportion of variables that are significantly associated with the
# censoring status via a logistic regression model
task$dep_cens_prop() # 0 indicates independent censoring
# data barely satisfies proportional hazards assumption (p > 0.05)
task$prop_haz()
# veteran data is definitely non-PH (p << 0.05)
tsk("veteran")$prop_haz()
whas dataset from Hosmer et al. (2008)
whas
Identification Code
Age (per chart) (years).
Sex. 0 = Male. 1 = Female.
Peak cardiac enzyme (iu).
Cardiogenic shock complications. 1 = Yes. 0 = No.
Left heart failure complications. 1 = Yes. 0 = No.
MI Order. 1 = Recurrent. 0 = First.
MI Type. 1 = Q-wave. 2 = Not Q-wave. 3 = Indeterminate.
Cohort year.
Grouped cohort year.
Days in hospital.
Discharge status from hospital. 1 = Dead. 0 = Alive.
Total length of follow-up from hospital admission (days).
Status as of last follow-up. 1 = Dead. 0 = Alive.
https://onlinelibrary.wiley.com/doi/book/10.1002/9780470258019
Hosmer, D.W. and Lemeshow, S. and May, S. (2008) Applied Survival Analysis: Regression Modeling of Time to Event Data: Second Edition, John Wiley and Sons Inc., New York, NY