Title: | Cox Models by Likelihood Based Boosting for a Single Survival Endpoint or Competing Risks |
---|---|
Description: | Provides routines for fitting Cox models by likelihood based boosting for a single endpoint or in presence of competing risks. |
Authors: | Harald Binder <[email protected]> |
Maintainer: | Harald Binder <[email protected]> |
License: | GPL-2 |
Version: | 1.5 |
Built: | 2024-11-03 03:50:08 UTC |
Source: | https://github.com/binderh/CoxBoost |
Extracts the coefficients from the specified boosting steps of a CoxBoost
object fitted by CoxBoost
.
## S3 method for class 'CoxBoost' coef(object, at.step = NULL, scaled = TRUE, ...)
## S3 method for class 'CoxBoost' coef(object, at.step = NULL, scaled = TRUE, ...)
object |
fitted CoxBoost object from a |
at.step |
scalar or vector of boosting step(s) at which prediction is wanted. If no step is given, the final boosting step is used. |
scaled |
logical value indicating whether coefficients should be
returned scaled to be at the level of the original covariate values, or
unscaled as used internally when |
... |
miscellaneous arguments, none of which is used at the moment. |
For a vector of length p
(number of covariates)
(at.step
being a scalar) or a length(at.step) * p
matrix
(at.step
being a vector).
Harald Binder [email protected]
CoxBoost
is used to fit a Cox proportional hazards model by
componentwise likelihood based boosting. It is especially suited for models
with a large number of predictors and allows for mandatory covariates with
unpenalized parameter estimates.
CoxBoost( time, status, x, unpen.index = NULL, standardize = TRUE, subset = 1:length(time), weights = NULL, stratum = NULL, stepno = 100, penalty = 9 * sum(status[subset] == 1), criterion = c("pscore", "score", "hpscore", "hscore"), cmprsk = c("sh", "csh", "ccsh"), coupled.strata = TRUE, stepsize.factor = 1, sf.scheme = c("sigmoid", "linear"), pendistmat = NULL, connected.index = NULL, x.is.01 = FALSE, return.score = TRUE, trace = FALSE )
CoxBoost( time, status, x, unpen.index = NULL, standardize = TRUE, subset = 1:length(time), weights = NULL, stratum = NULL, stepno = 100, penalty = 9 * sum(status[subset] == 1), criterion = c("pscore", "score", "hpscore", "hscore"), cmprsk = c("sh", "csh", "ccsh"), coupled.strata = TRUE, stepsize.factor = 1, sf.scheme = c("sigmoid", "linear"), pendistmat = NULL, connected.index = NULL, x.is.01 = FALSE, return.score = TRUE, trace = FALSE )
time |
vector of length |
status |
censoring indicator, i.e., vector of length |
x |
|
unpen.index |
vector of length |
standardize |
logical value indicating whether covariates should be standardized for estimation. This does not apply for mandatory covariates, i.e., these are not standardized. |
subset |
a vector specifying a subset of observations to be used in the fitting process. |
weights |
optional vector of length |
stratum |
vector specifying different groups of individuals for a
stratified Cox regression. In |
stepno |
number of boosting steps ( |
penalty |
penalty value for the update of an individual element of the parameter vector in each boosting step. |
criterion |
indicates the criterion to be used for selection in each
boosting step. |
cmprsk |
type of competing risk, specific hazards or cause-specific |
stepsize.factor |
determines the step-size modification factor by which
the natural step size of boosting steps should be changed after a covariate
has been selected in a boosting step. The default (value |
sf.scheme |
scheme for changing step sizes (via
|
pendistmat |
connection matrix with entries ranging between 0 and 1,
with entry |
connected.index |
indices of the |
x.is.01 |
logical value indicating whether (the non-mandatory part of)
|
return.score |
logical value indicating whether the value of the score
statistic (or penalized score statistic, depending on |
trace |
logical value indicating whether progress in estimation should be indicated by printing the name of the covariate updated. |
In contrast to gradient boosting (implemented e.g. in the glmboost
routine in the R package mboost
, using the CoxPH
loss
function), CoxBoost
is not based on gradients of loss functions, but
adapts the offset-based boosting approach from Tutz and Binder (2007) for
estimating Cox proportional hazards models. In each boosting step the
previous boosting steps are incorporated as an offset in penalized partial
likelihood estimation, which is employed for obtain an update for one single
parameter, i.e., one covariate, in every boosting step. This results in
sparse fits similar to Lasso-like approaches, with many estimated
coefficients being zero. The main model complexity parameter, which has to
be selected (e.g. by cross-validation using cv.CoxBoost
), is
the number of boosting steps stepno
. The penalty parameter
penalty
can be chosen rather coarsely, either by hand or using
optimCoxBoostPenalty
.
The advantage of the offset-based approach compared to gradient boosting is
that the penalty structure is very flexible. In the present implementation
this is used for allowing for unpenalized mandatory covariates, which
receive a very fast coefficient build-up in the course of the boosting
steps, while the other (optional) covariates are subjected to penalization.
For example in a microarray setting, the (many) microarray features would be
taken to be optional covariates, and the (few) potential clinical covariates
would be taken to be mandatory, by including their indices in
unpen.index
.
If a group of correlated covariates has influence on the response, e.g.
genes from the same pathway, componentwise boosting will often result in a
non-zero estimate for only one member of this group. To avoid this,
information on the connection between covariates can be provided in
pendistmat
. If then, in addition, a penalty updating scheme with
stepsize.factor
< 1 is chosen, connected covariates are more likely
to be chosen in future boosting steps, if a directly connected covariate has
been chosen in an earlier boosting step (see Binder and Schumacher, 2009b).
CoxBoost
returns an object of class CoxBoost
.
n , p
|
number of observations and number of covariates. |
stepno |
number of boosting steps. |
xnames |
vector of length
|
are used.
coefficients |
|
.
scoremat |
|
meanx , sdx
|
vector of mean values and standard deviations used for standardizing the covariates. |
unpen.index |
indices of the mandatory
covariates in the original covariate matrix |
penalty |
If
|
time |
observed times given in the
|
status |
censoring indicator given in the
|
event.times |
vector with event times from the
data given in the |
linear.predictors |
|
Lambda |
matrix with the Breslow estimate for the
cumulative baseline hazard for boosting steps |
logplik |
partial log-likelihood of the fitted model in the final boosting step. |
Written by Harald Binder [email protected].
Binder, H., Benner, A., Bullinger, L., and Schumacher, M. (2013). Tailoring sparse multivariable regression techniques for prognostic single-nucleotide polymorphism signatures. Statistics in Medicine, doi: 10.1002/sim.5490.
Binder, H., Allignol, A., Schumacher, M., and Beyersmann, J. (2009). Boosting for high-dimensional time-to-event data with competing risks. Bioinformatics, 25:890-896.
Binder, H. and Schumacher, M. (2009). Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinformatics. 10:18.
Binder, H. and Schumacher, M. (2008). Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models. BMC Bioinformatics. 9:14.
Tutz, G. and Binder, H. (2007) Boosting ridge regression. Computational Statistics \& Data Analysis, 51(12):6044-6059.
Fine, J. P. and Gray, R. J. (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association. 94:496-509.
predict.CoxBoost
, cv.CoxBoost
.
# Generate some survival data with 10 informative covariates n <- 200; p <- 100 beta <- c(rep(1,10),rep(0,p-10)) x <- matrix(rnorm(n*p),n,p) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) status <- ifelse(real.time <= cens.time,1,0) obs.time <- ifelse(real.time <= cens.time,real.time,cens.time) # Fit a Cox proportional hazards model by CoxBoost cbfit <- CoxBoost(time=obs.time,status=status,x=x,stepno=100,penalty=100) summary(cbfit) # ... with covariates 1 and 2 being mandatory cbfit.mand <- CoxBoost(time=obs.time,status=status,x=x,unpen.index=c(1,2), stepno=100,penalty=100) summary(cbfit.mand)
# Generate some survival data with 10 informative covariates n <- 200; p <- 100 beta <- c(rep(1,10),rep(0,p-10)) x <- matrix(rnorm(n*p),n,p) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) status <- ifelse(real.time <= cens.time,1,0) obs.time <- ifelse(real.time <= cens.time,real.time,cens.time) # Fit a Cox proportional hazards model by CoxBoost cbfit <- CoxBoost(time=obs.time,status=status,x=x,stepno=100,penalty=100) summary(cbfit) # ... with covariates 1 and 2 being mandatory cbfit.mand <- CoxBoost(time=obs.time,status=status,x=x,unpen.index=c(1,2), stepno=100,penalty=100) summary(cbfit.mand)
Performs a K-fold cross-validation for CoxBoost
in search for
the optimal number of boosting steps.
cv.CoxBoost( time, status, x, subset = 1:length(time), weights = NULL, stratum = NULL, maxstepno = 100, K = 10, type = c("verweij", "naive"), cmprsk = c("sh", "csh", "ccsh"), parallel = FALSE, upload.x = TRUE, multicore = FALSE, folds = NULL, trace = FALSE, ... )
cv.CoxBoost( time, status, x, subset = 1:length(time), weights = NULL, stratum = NULL, maxstepno = 100, K = 10, type = c("verweij", "naive"), cmprsk = c("sh", "csh", "ccsh"), parallel = FALSE, upload.x = TRUE, multicore = FALSE, folds = NULL, trace = FALSE, ... )
time |
vector of length |
status |
censoring indicator, i.e., vector of length |
x |
|
subset |
a vector specifying a subset of observations to be used in the fitting process. |
weights |
weights to be passed to |
stratum |
vector specifying different groups of individuals for a
stratified Cox regression. In |
maxstepno |
maximum number of boosting steps to evaluate, i.e, the
returned “optimal” number of boosting steps will be in the range
|
K |
number of folds to be used for cross-validation. If |
type |
way of calculating the partial likelihood contribution of the
observation in the hold-out folds: |
cmprsk |
type of competing risk, specific hazards or cause-specific |
parallel |
logical value indicating whether computations in the
cross-validation folds should be performed in parallel on a compute cluster,
using package |
upload.x |
logical value indicating whether |
multicore |
indicates whether computations in the cross-validation
folds should be performed in parallel, using package |
folds |
if not |
trace |
logical value indicating whether progress in estimation should be indicated by printing the number of the cross-validation fold and the index of the covariate updated. |
... |
miscellaneous parameters for the calls to
|
List with the following components:
mean.logplik |
vector of
length |
se.logplik |
vector with standard error estimates for the mean partial log-likelihood criterion for each boosting step. |
optimal.step |
optimal boosting step number, i.e., with minimum mean partial log-likelihood. |
folds |
list of length |
Harald Binder [email protected]
Verweij, P. J. M. and van Houwelingen, H. C. (1993). Cross-validation in survival analysis. Statistics in Medicine, 12(24):2305-2314.
CoxBoost
, optimCoxBoostPenalty
## Not run: # Generate some survival data with 10 informative covariates n <- 200; p <- 100 beta <- c(rep(1,10),rep(0,p-10)) x <- matrix(rnorm(n*p),n,p) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) status <- ifelse(real.time <= cens.time,1,0) obs.time <- ifelse(real.time <= cens.time,real.time,cens.time) # 10-fold cross-validation cv.res <- cv.CoxBoost(time=obs.time,status=status,x=x,maxstepno=500, K=10,type="verweij",penalty=100) # examine mean partial log-likelihood in the course of the boosting steps plot(cv.res$mean.logplik) # Fit with optimal number of boosting steps cbfit <- CoxBoost(time=obs.time,status=status,x=x,stepno=cv.res$optimal.step, penalty=100) summary(cbfit) ## End(Not run)
## Not run: # Generate some survival data with 10 informative covariates n <- 200; p <- 100 beta <- c(rep(1,10),rep(0,p-10)) x <- matrix(rnorm(n*p),n,p) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) status <- ifelse(real.time <= cens.time,1,0) obs.time <- ifelse(real.time <= cens.time,real.time,cens.time) # 10-fold cross-validation cv.res <- cv.CoxBoost(time=obs.time,status=status,x=x,maxstepno=500, K=10,type="verweij",penalty=100) # examine mean partial log-likelihood in the course of the boosting steps plot(cv.res$mean.logplik) # Fit with optimal number of boosting steps cbfit <- CoxBoost(time=obs.time,status=status,x=x,stepno=cv.res$optimal.step, penalty=100) summary(cbfit) ## End(Not run)
iCoxBoost
This function allows to set the control parameters for cross-validation to
be passed into a call to iCoxBoost
.
cvcb.control( K = 10, type = c("verweij", "naive"), parallel = FALSE, upload.x = TRUE, multicore = FALSE, folds = NULL )
cvcb.control( K = 10, type = c("verweij", "naive"), parallel = FALSE, upload.x = TRUE, multicore = FALSE, folds = NULL )
K |
number of folds to be used for cross-validation. If |
type |
way of calculating the partial likelihood contribution of the
observation in the hold-out folds: |
parallel |
logical value indicating whether computations in the
cross-validation folds should be performed in parallel on a compute cluster,
using package |
upload.x |
logical value indicating whether |
multicore |
indicates whether computations in the cross-validation
folds should be performed in parallel, using package |
folds |
if not |
List with elements corresponding to the call arguments.
Written by Harald Binder [email protected].
Verweij, P. J. M. and van Houwelingen, H. C. (1993). Cross-validation in survival analysis. Statistics in Medicine, 12(24):2305-2314.
Performs permutation-based p-value estimation for the optional covariates in
a fit from CoxBoost
.
estimPVal( object, x, permute.n = 10, per.covariate = FALSE, parallel = FALSE, multicore = FALSE, trace = FALSE, ... )
estimPVal( object, x, permute.n = 10, per.covariate = FALSE, parallel = FALSE, multicore = FALSE, trace = FALSE, ... )
object |
fit object obtained from |
x |
|
permute.n |
number of permutations employed for obtaining a null distribution. |
per.covariate |
logical value indicating whether a separate null distribution should be considered for each covariate. A larger number of permutations will be needed if this is wanted. |
parallel |
logical value indicating whether computations for obtaining
a null distribution via permutation should be performed in parallel on a
compute cluster. Parallelization is performed via the package
|
multicore |
indicates whether computations in the permuted data sets
should be performed in parallel, using package |
trace |
logical value indicating whether progress in estimation should be indicated by printing the number of the permutation that is currently being evaluated. |
... |
miscellaneous parameters for the calls to
|
As p-value estimates are based on permutations, random numbers are drawn for
determining permutation indices. Therfore, the results depend on the state
of the random number generator. This can be used to explore the variability
due to random variation and help to determine an adequate value for
permute.n
. A value of 100 should be sufficient, but this can be quite
slow. If there is a considerable number of covariates, e.g., larger than
100, a much smaller number of permutations, e.g., 10, might already work
well. The estimates might also be negatively affected, if only a small
number of boosting steps (say <50) was employed for the original fit.
Vector with p-value estimates, one value for each optional covariate
specificed in the original call to CoxBoost
.
Harald Binder [email protected]
Binder, H., Porzelius, C. and Schumacher, M. (2009). Rank-based p-values for sparse high-dimensional risk prediction models fitted by componentwise boosting. FDM-Preprint Nr. 101, University of Freiburg, Germany.
## Not run: # Generate some survival data with 10 informative covariates n <- 200; p <- 100 beta <- c(rep(1,10),rep(0,p-10)) x <- matrix(rnorm(n*p),n,p) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) status <- ifelse(real.time <= cens.time,1,0) obs.time <- ifelse(real.time <= cens.time,real.time,cens.time) # Fit a Cox proportional hazards model by CoxBoost cbfit <- CoxBoost(time=obs.time,status=status,x=x,stepno=100, penalty=100) # estimate p-values p1 <- estimPVal(cbfit,x,permute.n=10) # get a second vector of estimates for checking how large # random variation is p2 <- estimPVal(cbfit,x,permute.n=10) plot(p1,p2,xlim=c(0,1),ylim=c(0,1),xlab="permute 1",ylab="permute 2") ## End(Not run)
## Not run: # Generate some survival data with 10 informative covariates n <- 200; p <- 100 beta <- c(rep(1,10),rep(0,p-10)) x <- matrix(rnorm(n*p),n,p) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) status <- ifelse(real.time <= cens.time,1,0) obs.time <- ifelse(real.time <= cens.time,real.time,cens.time) # Fit a Cox proportional hazards model by CoxBoost cbfit <- CoxBoost(time=obs.time,status=status,x=x,stepno=100, penalty=100) # estimate p-values p1 <- estimPVal(cbfit,x,permute.n=10) # get a second vector of estimates for checking how large # random variation is p2 <- estimPVal(cbfit,x,permute.n=10) plot(p1,p2,xlim=c(0,1),ylim=c(0,1),xlab="permute 1",ylab="permute 2") ## End(Not run)
Formula interface for fitting a Cox proportional hazards model by
componentwise likelihood based boosting (via a call to
CoxBoost
), where cross-validation can be performed
automatically for determining the number of boosting steps (via a call to
cv.CoxBoost
).
iCoxBoost( formula, data = NULL, weights = NULL, subset = NULL, mandatory = NULL, cause = 1, cmprsk = c("sh", "csh", "ccsh"), standardize = TRUE, stepno = 200, criterion = c("pscore", "score", "hpscore", "hscore"), nu = 0.1, stepsize.factor = 1, varlink = NULL, cv = cvcb.control(), trace = FALSE, ... )
iCoxBoost( formula, data = NULL, weights = NULL, subset = NULL, mandatory = NULL, cause = 1, cmprsk = c("sh", "csh", "ccsh"), standardize = TRUE, stepno = 200, criterion = c("pscore", "score", "hpscore", "hscore"), nu = 0.1, stepsize.factor = 1, varlink = NULL, cv = cvcb.control(), trace = FALSE, ... )
formula |
A formula describing the model to be fitted, similar to a
call to |
data |
data frame containing the variables described in the formula. |
weights |
optional vector, for specifying weights for the individual observations. |
subset |
a vector specifying a subset of observations to be used in the fitting process. |
mandatory |
vector containing the names of the covariates whose effect is to be estimated un-regularized. |
cause |
cause of interest in a competing risks setting, when the
response is specified by |
cmprsk |
type of competing risk, specific hazards or cause-specific |
standardize |
logical value indicating whether covariates should be standardized for estimation. This does not apply for mandatory covariates, i.e., these are not standardized. |
stepno |
maximum number of boosting steps to be evaluated when determining the number of boosting steps by cross-validation, otherwise the number of boosting seps itself. |
criterion |
indicates the criterion to be used for selection in each
boosting step. |
nu |
(roughly) the fraction of the partial maximum likelihood estimate
used for the update in each boosting step. This is converted into a penalty
for the call to |
stepsize.factor |
determines the step-size modification factor by which
the natural step size of boosting steps should be changed after a covariate
has been selected in a boosting step. The default (value |
varlink |
list for specifying links between covariates, used to
re-distribute step sizes when |
cv |
|
trace |
logical value indicating whether progress in estimation should be indicated by printing the name of the covariate updated. |
... |
miscellaneous arguments, passed to the call to
|
In contrast to gradient boosting (implemented e.g. in the glmboost
routine in the R package mboost
, using the CoxPH
loss
function), CoxBoost
is not based on gradients of loss functions, but
adapts the offset-based boosting approach from Tutz and Binder (2007) for
estimating Cox proportional hazards models. In each boosting step the
previous boosting steps are incorporated as an offset in penalized partial
likelihood estimation, which is employed for obtain an update for one single
parameter, i.e., one covariate, in every boosting step. This results in
sparse fits similar to Lasso-like approaches, with many estimated
coefficients being zero. The main model complexity parameter, the number of
boosting steps, is automatically selected by cross-validation using a call
to cv.CoxBoost
). Note that this will introduce random
variation when repeatedly calling iCoxBoost
, i.e. it is advised to
set/save the random number generator state for reproducible results.
The advantage of the offset-based approach compared to gradient boosting is
that the penalty structure is very flexible. In the present implementation
this is used for allowing for unpenalized mandatory covariates, which
receive a very fast coefficient build-up in the course of the boosting
steps, while the other (optional) covariates are subjected to penalization.
For example in a microarray setting, the (many) microarray features would be
taken to be optional covariates, and the (few) potential clinical covariates
would be taken to be mandatory, by including their names in
mandatory
.
If a group of correlated covariates has influence on the response, e.g.
genes from the same pathway, componentwise boosting will often result in a
non-zero estimate for only one member of this group. To avoid this,
information on the connection between covariates can be provided in
varlink
. If then, in addition, a penalty updating scheme with
stepsize.factor
< 1 is chosen, connected covariates are more likely
to be chosen in future boosting steps, if a directly connected covariate has
been chosen in an earlier boosting step (see Binder and Schumacher, 2009b).
iCoxBoost
returns an object of class iCoxBoost
, which
also has class CoxBoost
. In addition to the elements from
CoxBoost
it has the following elements:
call , formula , terms
|
call, formula and terms from the formula interface. |
cause |
cause of interest. |
cv.res |
result from
|
Written by Harald Binder [email protected].
Binder, H., Benner, A., Bullinger, L., and Schumacher, M. (2013). Tailoring sparse multivariable regression techniques for prognostic single-nucleotide polymorphism signatures. Statistics in Medicine, doi: 10.1002/sim.5490.
Binder, H., Allignol, A., Schumacher, M., and Beyersmann, J. (2009). Boosting for high-dimensional time-to-event data with competing risks. Bioinformatics, 25:890-896.
Binder, H. and Schumacher, M. (2009). Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinformatics. 10:18.
Binder, H. and Schumacher, M. (2008). Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models. BMC Bioinformatics. 9:14.
Tutz, G. and Binder, H. (2007) Boosting ridge regression. Computational Statistics \& Data Analysis, 51(12):6044-6059.
Fine, J. P. and Gray, R. J. (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association. 94:496-509.
predict.iCoxBoost
, CoxBoost
,
cv.CoxBoost
.
# Generate some survival data with 10 informative covariates n <- 200; p <- 100 beta <- c(rep(1,2),rep(0,p-2)) x <- matrix(rnorm(n*p),n,p) actual.data <- as.data.frame(x) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) actual.data$status <- ifelse(real.time <= cens.time,1,0) actual.data$time <- ifelse(real.time <= cens.time,real.time,cens.time) # Fit a Cox proportional hazards model by iCoxBoost cbfit <- iCoxBoost(Surv(time,status) ~ .,data=actual.data) summary(cbfit) plot(cbfit) # ... with covariates 1 and 2 being mandatory cbfit.mand <- iCoxBoost(Surv(time,status) ~ .,data=actual.data,mandatory=c("V1")) summary(cbfit.mand) plot(cbfit.mand)
# Generate some survival data with 10 informative covariates n <- 200; p <- 100 beta <- c(rep(1,2),rep(0,p-2)) x <- matrix(rnorm(n*p),n,p) actual.data <- as.data.frame(x) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) actual.data$status <- ifelse(real.time <= cens.time,1,0) actual.data$time <- ifelse(real.time <= cens.time,real.time,cens.time) # Fit a Cox proportional hazards model by iCoxBoost cbfit <- iCoxBoost(Surv(time,status) ~ .,data=actual.data) summary(cbfit) plot(cbfit) # ... with covariates 1 and 2 being mandatory cbfit.mand <- iCoxBoost(Surv(time,status) ~ .,data=actual.data,mandatory=c("V1")) summary(cbfit.mand) plot(cbfit.mand)
This routine helps in finding a penalty value that leads to an “optimal” number of boosting steps for CoxBoost, determined by cross-validation, that is not too small/in a specified range.
optimCoxBoostPenalty( time, status, x, minstepno = 50, maxstepno = 200, start.penalty = 9 * sum(status == 1), iter.max = 10, upper.margin = 0.05, parallel = FALSE, trace = FALSE, ... )
optimCoxBoostPenalty( time, status, x, minstepno = 50, maxstepno = 200, start.penalty = 9 * sum(status == 1), iter.max = 10, upper.margin = 0.05, parallel = FALSE, trace = FALSE, ... )
time |
vector of length |
status |
censoring indicator, i.e., vector of length |
x |
|
minstepno , maxstepno
|
range of boosting steps in which the “optimal” number of boosting steps is wanted to be. |
start.penalty |
start value for the search for the appropriate penalty. |
iter.max |
maximum number of search iterations. |
upper.margin |
specifies the fraction of |
parallel |
logical value indicating whether computations in the
cross-validation folds should be performed in parallel on a compute cluster.
Parallelization is performed via the package |
trace |
logical value indicating whether information on progress should be printed. |
... |
miscellaneous parameters for |
The penalty parameter for CoxBoost
has to be chosen only very
coarsely. In Tutz and Binder (2006) it is suggested for likelihood based
boosting just to make sure, that the optimal number of boosting steps,
according to some criterion such as cross-validation, is larger or equal to
50. With a smaller number of steps, boosting may become too “greedy” and
show sub-optimal performance. This procedure uses a very coarse line search
and so one should specify a rather large range of boosting steps.
List with element penalty
containing the “optimal” penalty
and cv.res
containing the corresponding result of cv.CoxBoost
.
Written by Harald Binder [email protected].
Tutz, G. and Binder, H. (2006) Generalized additive modelling with implicit variable selection by likelihood based boosting. Biometrics, 62:961-971.
## Not run: # Generate some survival data with 10 informative covariates n <- 200; p <- 100 beta <- c(rep(1,10),rep(0,p-10)) x <- matrix(rnorm(n*p),n,p) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) status <- ifelse(real.time <= cens.time,1,0) obs.time <- ifelse(real.time <= cens.time,real.time,cens.time) # determine penalty parameter optim.res <- optimCoxBoostPenalty(time=obs.time,status=status,x=x, trace=TRUE,start.penalty=500) # Fit with obtained penalty parameter and optimal number of boosting # steps obtained by cross-validation cbfit <- CoxBoost(time=obs.time,status=status,x=x, stepno=optim.res$cv.res$optimal.step, penalty=optim.res$penalty) summary(cbfit) ## End(Not run)
## Not run: # Generate some survival data with 10 informative covariates n <- 200; p <- 100 beta <- c(rep(1,10),rep(0,p-10)) x <- matrix(rnorm(n*p),n,p) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) status <- ifelse(real.time <= cens.time,1,0) obs.time <- ifelse(real.time <= cens.time,real.time,cens.time) # determine penalty parameter optim.res <- optimCoxBoostPenalty(time=obs.time,status=status,x=x, trace=TRUE,start.penalty=500) # Fit with obtained penalty parameter and optimal number of boosting # steps obtained by cross-validation cbfit <- CoxBoost(time=obs.time,status=status,x=x, stepno=optim.res$cv.res$optimal.step, penalty=optim.res$penalty) summary(cbfit) ## End(Not run)
This routine helps in finding an optimum step-size modification factor for
CoxBoost
, i.e., that results in an optimum in terms of
cross-validated partial log-likelihood.
optimStepSizeFactor( time, status, x, direction = c("down", "up", "both"), start.stepsize = 0.1, iter.max = 10, constant.cv.res = NULL, parallel = FALSE, trace = FALSE, ... )
optimStepSizeFactor( time, status, x, direction = c("down", "up", "both"), start.stepsize = 0.1, iter.max = 10, constant.cv.res = NULL, parallel = FALSE, trace = FALSE, ... )
time |
vector of length |
status |
censoring indicator, i.e., vector of length |
x |
|
direction |
direction of line search for an optimal step-size modification factor (starting from value 1). |
start.stepsize |
step size used for the line search. A final step is performed using half this size. |
iter.max |
maximum number of search iterations. |
constant.cv.res |
result of |
parallel |
logical value indicating whether computations in the
cross-validation folds should be performed in parallel on a compute cluster.
Parallelization is performed via the package |
trace |
logical value indicating whether information on progress should be printed. |
... |
miscellaneous parameters for |
A coarse line search is performed for finding the best parameter
stepsize.factor
for CoxBoost
. If an pendistmat
argument is provided (which is passed on to CoxBoost
), a
search for factors smaller than 1 is sensible (corresponding to
direction="down"
). If no connection information is provided, it is
reasonable to employ direction="both"
, for avoiding restrictions
without subject matter knowledge.
List with the following components:
factor.list |
array with the evaluated step-size modification factors. |
critmat |
matrix with the mean partial log-likelihood for each step-size modification factor in the course of the boosting steps. |
optimal.factor.index |
index of the optimal step-size modification factor. |
optimal.factor |
optimal step-size modification factor. |
optimal.step |
optimal boosting step
number, i.e., with minimum mean partial log-likelihood, for step-size
modification factor |
Written by Harald Binder [email protected].
Binder, H. and Schumacher, M. (2009). Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinformatics. 10:18.
## Not run: # Generate some survival data with 10 informative covariates n <- 200; p <- 100 beta <- c(rep(1,10),rep(0,p-10)) x <- matrix(rnorm(n*p),n,p) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) status <- ifelse(real.time <= cens.time,1,0) obs.time <- ifelse(real.time <= cens.time,real.time,cens.time) # Determine step-size modification factor. As there is no connection matrix, # perform search into both directions optim.res <- optimStepSizeFactor(direction="both", time=obs.time,status=status,x=x, trace=TRUE) # Fit with obtained step-size modification parameter and optimal number of boosting # steps obtained by cross-validation cbfit <- CoxBoost(time=obs.time,status=status,x=x, stepno=optim.res$optimal.step, stepsize.factor=optim.res$optimal.factor) summary(cbfit) ## End(Not run)
## Not run: # Generate some survival data with 10 informative covariates n <- 200; p <- 100 beta <- c(rep(1,10),rep(0,p-10)) x <- matrix(rnorm(n*p),n,p) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) status <- ifelse(real.time <= cens.time,1,0) obs.time <- ifelse(real.time <= cens.time,real.time,cens.time) # Determine step-size modification factor. As there is no connection matrix, # perform search into both directions optim.res <- optimStepSizeFactor(direction="both", time=obs.time,status=status,x=x, trace=TRUE) # Fit with obtained step-size modification parameter and optimal number of boosting # steps obtained by cross-validation cbfit <- CoxBoost(time=obs.time,status=status,x=x, stepno=optim.res$optimal.step, stepsize.factor=optim.res$optimal.factor) summary(cbfit) ## End(Not run)
Plots coefficient paths, i.e. the parameter estimates plotted against the
boosting steps as obtained from a CoxBoost object fitted by
CoxBoost
.
## S3 method for class 'CoxBoost' plot( x, line.col = "dark grey", label.cex = 0.6, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, main = NULL, ... )
## S3 method for class 'CoxBoost' plot( x, line.col = "dark grey", label.cex = 0.6, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, main = NULL, ... )
x |
fitted CoxBoost object from a |
line.col |
color of the lines of the coefficient path |
label.cex |
scaling factor for the variable labels. |
xlab |
label for the x axis, default label when |
ylab |
label for the y axis, default label when |
xlim , ylim
|
plotting ranges, default label when |
main |
a main title for the plot |
... |
miscellaneous arguments, passed to the plot routine. |
No value is returned, but a plot is generated.
Harald Binder [email protected]
Obtains predictions at specified boosting steps from a CoxBoost object
fitted by CoxBoost
.
## S3 method for class 'CoxBoost' predict( object, newdata = NULL, newtime = NULL, newstatus = NULL, subset = NULL, weights = NULL, stratum = NULL, at.step = NULL, times = NULL, type = c("lp", "logplik", "risk", "CIF"), ... )
## S3 method for class 'CoxBoost' predict( object, newdata = NULL, newtime = NULL, newstatus = NULL, subset = NULL, weights = NULL, stratum = NULL, at.step = NULL, times = NULL, type = c("lp", "logplik", "risk", "CIF"), ... )
object |
fitted CoxBoost object from a |
newdata |
|
newtime , newstatus
|
vectors with observed time and censoring indicator
(0 for censoring, 1 for no censoring, and any other values for competing
events in a competing risks setting) for new observations, where prediction
is wanted. Only required if predicted partial log-likelihood is wanted,
i.e., if |
subset |
an optional vector specifying a subset of observations to be used for evaluation. |
weights |
weights for each time. |
stratum |
vector specifying different groups of individuals for a
stratified Cox regression. In |
at.step |
scalar or vector of boosting step(s) at which prediction is
wanted. If |
times |
vector with |
type |
type of prediction to be returned: |
... |
miscellaneous arguments, none of which is used at the moment. |
For type="lp"
and type="logplik"
a vector of length
n.new
(at.step
being a scalar) or a n.new *
length(at.step)
matrix (at.step
being a vector) with predictions is
returned. For type="risk"
or type="CIF"
a n.new * T
matrix with predicted probabilities at the specific time points is returned.
Harald Binder [email protected]
# Generate some survival data with 10 informative covariates n <- 200; p <- 100 beta <- c(rep(1,10),rep(0,p-10)) x <- matrix(rnorm(n*p),n,p) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) status <- ifelse(real.time <= cens.time,1,0) obs.time <- ifelse(real.time <= cens.time,real.time,cens.time) # define training and test set train.index <- 1:100 test.index <- 101:200 # Fit CoxBoost to the training data cbfit <- CoxBoost(time=obs.time[train.index],status=status[train.index], x=x[train.index,],stepno=300,penalty=100) # mean partial log-likelihood for test set in every boosting step step.logplik <- predict(cbfit,newdata=x[test.index,], newtime=obs.time[test.index], newstatus=status[test.index], at.step=0:300,type="logplik") plot(step.logplik) # names of covariates with non-zero coefficients at boosting step # with maximal test set partial log-likelihood print(cbfit$xnames[cbfit$coefficients[which.max(step.logplik),] != 0])
# Generate some survival data with 10 informative covariates n <- 200; p <- 100 beta <- c(rep(1,10),rep(0,p-10)) x <- matrix(rnorm(n*p),n,p) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) status <- ifelse(real.time <= cens.time,1,0) obs.time <- ifelse(real.time <= cens.time,real.time,cens.time) # define training and test set train.index <- 1:100 test.index <- 101:200 # Fit CoxBoost to the training data cbfit <- CoxBoost(time=obs.time[train.index],status=status[train.index], x=x[train.index,],stepno=300,penalty=100) # mean partial log-likelihood for test set in every boosting step step.logplik <- predict(cbfit,newdata=x[test.index,], newtime=obs.time[test.index], newstatus=status[test.index], at.step=0:300,type="logplik") plot(step.logplik) # names of covariates with non-zero coefficients at boosting step # with maximal test set partial log-likelihood print(cbfit$xnames[cbfit$coefficients[which.max(step.logplik),] != 0])
Obtains predictions at specified boosting steps from a iCoxBoost object
fitted by iCoxBoost
.
## S3 method for class 'iCoxBoost' predict( object, newdata = NULL, subset = NULL, at.step = NULL, times = NULL, type = c("lp", "logplik", "risk", "CIF"), ... )
## S3 method for class 'iCoxBoost' predict( object, newdata = NULL, subset = NULL, at.step = NULL, times = NULL, type = c("lp", "logplik", "risk", "CIF"), ... )
object |
fitted CoxBoost object from a |
newdata |
data frame with new covariate values (for |
subset |
an optional vector specifying a subset of observations to be used for evaluation. |
at.step |
scalar or vector of boosting step(s) at which prediction is
wanted. If |
times |
vector with |
type |
type of prediction to be returned: |
... |
miscellaneous arguments, none of which is used at the moment. |
For type="lp"
and type="logplik"
a vector of length
n.new
(at.step
being a scalar) or a n.new *
length(at.step)
matrix (at.step
being a vector) with predictions is
returned. For type="risk"
or type="CIF"
a n.new * T
matrix with predicted probabilities at the specific time points is returned.
Harald Binder [email protected]
n <- 200; p <- 100 beta <- c(rep(1,2),rep(0,p-2)) x <- matrix(rnorm(n*p),n,p) actual.data <- as.data.frame(x) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) actual.data$status <- ifelse(real.time <= cens.time,1,0) actual.data$time <- ifelse(real.time <= cens.time,real.time,cens.time) # define training and test set train.index <- 1:100 test.index <- 101:200 # Fit a Cox proportional hazards model by iCoxBoost cbfit <- iCoxBoost(Surv(time,status) ~ .,data=actual.data[train.index,], stepno=300,cv=FALSE) # mean partial log-likelihood for test set in every boosting step step.logplik <- predict(cbfit,newdata=actual.data[test.index,], at.step=0:300,type="logplik") plot(step.logplik) # names of covariates with non-zero coefficients at boosting step # with maximal test set partial log-likelihood print(coef(cbfit,at.step=which.max(step.logplik)-1))
n <- 200; p <- 100 beta <- c(rep(1,2),rep(0,p-2)) x <- matrix(rnorm(n*p),n,p) actual.data <- as.data.frame(x) real.time <- -(log(runif(n)))/(10*exp(drop(x %*% beta))) cens.time <- rexp(n,rate=1/10) actual.data$status <- ifelse(real.time <= cens.time,1,0) actual.data$time <- ifelse(real.time <= cens.time,real.time,cens.time) # define training and test set train.index <- 1:100 test.index <- 101:200 # Fit a Cox proportional hazards model by iCoxBoost cbfit <- iCoxBoost(Surv(time,status) ~ .,data=actual.data[train.index,], stepno=300,cv=FALSE) # mean partial log-likelihood for test set in every boosting step step.logplik <- predict(cbfit,newdata=actual.data[test.index,], at.step=0:300,type="logplik") plot(step.logplik) # names of covariates with non-zero coefficients at boosting step # with maximal test set partial log-likelihood print(coef(cbfit,at.step=which.max(step.logplik)-1))
Print a CoxBoost type
## S3 method for class 'CoxBoost' print(x, ...)
## S3 method for class 'CoxBoost' print(x, ...)
x |
a CoxBoost object |
... |
not used |
resample.CoxBoost
is used to perform resampling for stratified Cox
proportional hazards models by componentwise likelihood based boosting with
weights for developing subgroup covariate signatures. Specifically, this
weighted Cox regression approach is for performing a subgroup analysis, not
for the standard subgroup analysis but for a weighted form of it where the
observations of the not analyzed subgroup are down-weighted.
resample.CoxBoost
calls cv.CoxBoost
and CoxBoost
.
resample.CoxBoost( time, status, x, rep = 100, maxstepno = 200, multicore = TRUE, mix.list = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99), stratum, stratnotinfocus = 0, penalty = sum(status) * (1/0.02 - 1), criterion = "hscore", unpen.index = NULL )
resample.CoxBoost( time, status, x, rep = 100, maxstepno = 200, multicore = TRUE, mix.list = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99), stratum, stratnotinfocus = 0, penalty = sum(status) * (1/0.02 - 1), criterion = "hscore", unpen.index = NULL )
time |
vector of length |
status |
vector of length |
x |
|
rep |
number of resampling data sets used for the evaluation of model
stability. Should be equal to |
maxstepno |
maximum number of boosting steps needed for
cross-validation in |
multicore |
logical value: If |
mix.list |
vector of different relative weights between zero and one, for specifying weights for the individual observations for the weighted regression approach. For each weight, a stratified Cox regression by componentwise boosting is fitted. |
stratum |
vector specifying different groups of individuals for a
stratified Cox regression. In |
stratnotinfocus |
value for the group which is not analyzed but should
be weighted down in the analysis with the different weights element of
|
penalty |
penalty value for the covariates for the update of an
individual element in each boosting step. The mandatory covariates should
get a penalty value of zero, which can be realized with |
criterion |
indicates the criterion to be used for selection in each
boosting step in |
unpen.index |
index for mandatory covariates, which should get no penalty value in the log-partial likelihood function. |
The CoxBoost
function can be used for performing variable selection
for time-to-event data based on componentwise likelihood based boosting
(Binder and Schumacher, 2009; Binder et al., 2013) for obtaining signatures
for high dimensional data, such as gene expression or high throughput
methylation measurements. If there is any heterogeneity due to known
patient subgroups, it would be interesting for developing subgroup
signatures to account for this heterogeneity. To account for heterogeneity
in the data a stratified Cox model via stratum
can be performed,
which allows for different baseline hazards in each subgroup. For performing
a stratified CoxBoost
for such heterogeneous subgroups the
stratum
can be set to a subgroup variable, containing also the
subgroup for which a subgroup signature should be developed. A standard
subgroup analysis may lead to a decreased statistical power. With a
weighted approach based on componentwise likelihood-based boosting one can
develop subgroup signatures without losing statistical power as it is the
case in standard subgroup analysis. This approach focuses on building a risk
prediction signature for a specific stratum by down-weighting the
observations (Simon, 2002) from the other strata using a range of weights.
The not analyzed subgroup has to be indicated in stratnotinfocus
, the
observations of this subgroup, which is not in focus but should be retained
in the analysis, are weighted down by different relative weights. Binder et
al. (2012) propose such a weighting approach for high dimensional data but
not for time-to-event endpoints. resample.CoxBoost
performs a
weighted regression approach for time-to-event data while using resampling
over rep
resampling data sets, specifically for evaluating the
results. The different weights can be entered by mix.list
which
should be from a range between zero and one. For examining the variable
selection stability for specific covariates as a function of the weights the
beta coefficients are an output value of resample.CoxBoost
, so that
resampling inclusion frequencies (Sauerbrei et. al, 2011) can be calculated
and visualized via the functions stabtrajec
and weightfreqmap
.
For mandatory variables (Binder and Schumacher, 2008), which are only
included for adjusting the model no penalty should be added, these can be
indicated by the indices in unpen.index
.
resample.CoxBoost
returns a list of length rep
list
elements with each list element being further a list of the following two
objects:
CV.opt |
number of optimal boosting steps obtained from
|
beta |
beta coefficients obtained
from |
Written by Veronika Weyer [email protected] and Harald Binder [email protected].
Binder, H., Benner, A., Bullinger, L., Schumacher, M. (2013). Tailoring sparse multivariable regression techniques for prognostic single-nucleotide polymorphism signatures. Statistics in medicine 32(10), 1778-1791
Binder, H., Müller, T., Schwender, H., Golka, K., Steffens, M., G., H.J., Ickstadt, K., and Schumacher, M (2012). Cluster-Localized Sparse Logistic Regression for SNP Data. Statistical applications in genetics and molecular biology, 11(4), 1-31
Binder, H. and Schumacher, M. (2009). Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinformatics. 10:18.
Binder, H. and Schumacher, M. (2008). Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models. BMC Bioinformatics. 9:14.
Sauerbrei, W., Boulesteix, A.-L., Binder, H. (2011). Stability investigations of multivariable regression models derived from low- and high-dimensional data. Journal of biopharmaceutical statistics 21(6), 1206-31
Simon, R. (2002). Bayesian subset analysis: application to studying treatment-by-gender interactions. Statistics in medicine 21(19), 2909-16
## Not run: # Generate survival data with five informative covariates for one subgroup n <- 400; p <- 1000 set.seed(129) group<-rbinom(n,1,0.5) x <- matrix(rnorm(n*p,0,1),n,p) beta.vec1 <- c(c(1,1,1,1,1),rep(0,p-5)) beta.vec0 <- c(c(0,0,0,0,0),rep(0,p-5)) linpred<-ifelse(group==1,x %*% beta.vec1,x %*% beta.vec0) set.seed(1234) real.time<- (-(log(runif(n)))/(1/20*exp(linpred))) cens.time <- rexp(n,rate=1/20) obs.status <- ifelse(real.time <= cens.time,1,0) obs.time <- ifelse(real.time <= cens.time,real.time,cens.time) # Fit a stratified weighted Cox proportional hazards model # by \code{resample.CoxBoost} RIF <- resample.CoxBoost( time=obs.time,status=obs.status,x=x,rep=100, maxstepno=200,multicore=FALSE, mix.list=c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99), stratum=group,stratnotinfocus=0,penalty=sum(obs.status)*(1/0.02-1), criterion="hscore",unpen.index=NULL) # RIF is a list with number of resampling data sets list objects, each list # object has further two list objects, the beta coefficients for all # weights and the selected number of boosting steps. # For each list object i \code{RIF[[i]]$beta} contains the beta # coefficients with length mix.list*p for p covariates and with # \code{RIF[[i]]$CV.opt} # For getting an insight in the RIF Distribution of different weights # use the following code: RIF1<-c() for (i in 1: length(RIF)){RIF1<-c(RIF1,RIF[[i]][[1]])} freqmat <-matrix(apply(matrix(unlist(RIF1), ncol=length(RIF))!=0,1,mean), ncol=length(mix.list)) # freqmat is a matrix with p rows and \code{length(mix.list)} columns # which contains resampling inclusion frequencies for the different # covariates and different weights. ## End(Not run)
## Not run: # Generate survival data with five informative covariates for one subgroup n <- 400; p <- 1000 set.seed(129) group<-rbinom(n,1,0.5) x <- matrix(rnorm(n*p,0,1),n,p) beta.vec1 <- c(c(1,1,1,1,1),rep(0,p-5)) beta.vec0 <- c(c(0,0,0,0,0),rep(0,p-5)) linpred<-ifelse(group==1,x %*% beta.vec1,x %*% beta.vec0) set.seed(1234) real.time<- (-(log(runif(n)))/(1/20*exp(linpred))) cens.time <- rexp(n,rate=1/20) obs.status <- ifelse(real.time <= cens.time,1,0) obs.time <- ifelse(real.time <= cens.time,real.time,cens.time) # Fit a stratified weighted Cox proportional hazards model # by \code{resample.CoxBoost} RIF <- resample.CoxBoost( time=obs.time,status=obs.status,x=x,rep=100, maxstepno=200,multicore=FALSE, mix.list=c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99), stratum=group,stratnotinfocus=0,penalty=sum(obs.status)*(1/0.02-1), criterion="hscore",unpen.index=NULL) # RIF is a list with number of resampling data sets list objects, each list # object has further two list objects, the beta coefficients for all # weights and the selected number of boosting steps. # For each list object i \code{RIF[[i]]$beta} contains the beta # coefficients with length mix.list*p for p covariates and with # \code{RIF[[i]]$CV.opt} # For getting an insight in the RIF Distribution of different weights # use the following code: RIF1<-c() for (i in 1: length(RIF)){RIF1<-c(RIF1,RIF[[i]][[1]])} freqmat <-matrix(apply(matrix(unlist(RIF1), ncol=length(RIF))!=0,1,mean), ncol=length(mix.list)) # freqmat is a matrix with p rows and \code{length(mix.list)} columns # which contains resampling inclusion frequencies for the different # covariates and different weights. ## End(Not run)
Plots the stability trajectories, i.e. the variable selection stability is
plotted in form of resampling inclusion frequencies for different weights
and a selection of different stably selected genes as obtained from a
resCoxBoost object fitted by resample.CoxBoost
.
stabtrajec( RIF, mix.list = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99), plotmix = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99), my.colors = grDevices::gray(seq(0.99, 0, len = 10)), yupperlim = 1, huge = 0.6, lowerRIFlimit = 0.6, legendval = 4.5 ) weightfreqmap( RIF, mix.list = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99), plotmix = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99), lowerRIFlimit = 0.5, method = "complete" )
stabtrajec( RIF, mix.list = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99), plotmix = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99), my.colors = grDevices::gray(seq(0.99, 0, len = 10)), yupperlim = 1, huge = 0.6, lowerRIFlimit = 0.6, legendval = 4.5 ) weightfreqmap( RIF, mix.list = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99), plotmix = c(0.001, 0.01, 0.05, 0.1, 0.25, 0.35, 0.5, 0.7, 0.9, 0.99), lowerRIFlimit = 0.5, method = "complete" )
RIF |
list obtained from a |
mix.list |
vector of weights which were also entered in the
|
plotmix |
vector of weights which should be plotted in the stability trajectory plot. |
my.colors |
vector with length(plotmix) of different colors for plotting the different weights, default are gray shades. |
yupperlim |
value for the upper y coordinate. |
huge |
size of the labels plottet on the x-axis. |
lowerRIFlimit |
cutoff RIF value: All covariates which have an resampling inclusion frequency (RIF) greater or equal this lowerRIFlimit value at any weight are presented in the stability trajectory plot. |
legendval |
space between the last plotted covariate and the legend on the right side of the plot. |
method |
passed to |
The stabtrajec
function produces a visualization tool (stability
trajectory plot), which shows the RIFs as a function of weights for stably
selected covariates selected via resample.CoxBoost
. With this tool it
can be shown that with weights between zero and one and so the additional
information from the observations of the not analyzed subgroup in comparison
to the standard subgroup analysis (weight near zero) can improve the
variable selection stability. The number of plotted covariates in this plot
depends on lowerRIFlimit
. How many weights are plotted which are
element of mix.list
can be changed with plotmix
.
No value is returned, but the stability trajectory plot is generated.
Veronika Weyer [email protected] and Harald Binder [email protected]
Summary of a CoxBoost type
## S3 method for class 'CoxBoost' summary(object, ...)
## S3 method for class 'CoxBoost' summary(object, ...)
object |
a CoxBoost object |
... |
not used |