The predictor (or risk score) will often be the result of a Cox model or other regression” and notes that: “For continuous covariates concordance is equivalent to Kendall’s tau, and for logistic regression is equivalent to the area under the ROC curve.”

To demonstrate, I’ll fit Aalen’s additive regression model for censored data to the veteran data.

The documentation states: “The Aalen model assumes that the cumulative hazard H(t) for a subject can be expressed as a(t) + X B(t), where a(t) is a time-dependent intercept term, X is the vector of covariates for the subject (possibly time-dependent), and B(t) is a time-dependent matrix of coefficients.”

The plots show how the effects of the covariates change over time.
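As a minimal sketch of such a fit, the code below uses `aareg()` from the survival package on the `veteran` data set that ships with it. The choice of covariates is my assumption for illustration, and I use the base `plot()` method rather than any particular plotting package.

```r
library(survival)  # provides Surv(), aareg(), and the veteran data set

# Assumption: every available covariate enters the model additively.
aa_fit <- aareg(
  Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
  data = veteran
)

# Summary of the estimated slope and a test statistic for each term.
print(aa_fit)

# Draws the cumulative coefficient estimate for each term against time;
# a curve that bends or flattens suggests an effect that changes over time.
plot(aa_fit)
```

A roughly straight line in one of these plots is consistent with a constant effect, while a change in slope hints that the covariate's influence varies over the follow-up period.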

Today, survival analysis models are important in Engineering, Insurance, Marketing, Medicine, and many more application areas.

So, it is not surprising that R should be rich in survival analysis functions.

Finally, to provide an “eyeball comparison” of the three survival curves, I’ll plot them on the same graph.

The following code pulls out the survival data from the three model objects and puts them into a data frame for plotting.

For this data set, I would put my money on a carefully constructed Cox model that takes into account the time-varying coefficients.
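To give a rough idea of how curves from different fits can be gathered into one data frame and overlaid, here is a sketch that uses only two stand-in models, a Kaplan-Meier fit and a Cox fit on the `veteran` data. The model choices, covariates, and the names `km_fit`, `cox_fit`, and `curves` are my assumptions for illustration, not the three model objects referred to above.

```r
library(survival)

# Stand-in model 1: Kaplan-Meier estimate for the whole cohort.
km_fit <- survfit(Surv(time, status) ~ 1, data = veteran)

# Stand-in model 2: Cox model; survfit() on it returns the predicted
# curve for a subject with average covariate values.
cox_fit <- coxph(
  Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
  data = veteran
)
cox_surv <- survfit(cox_fit)

# Stack the curves into one long data frame with a model label.
curves <- rbind(
  data.frame(time = km_fit$time, surv = km_fit$surv, model = "Kaplan-Meier"),
  data.frame(time = cox_surv$time, surv = cox_surv$surv, model = "Cox")
)

# Overlay the curves as step functions on a single set of axes.
models <- unique(curves$model)
plot(NULL, xlim = range(curves$time), ylim = c(0, 1),
     xlab = "Days", ylab = "Survival probability")
for (i in seq_along(models)) {
  d <- curves[curves$model == models[i], ]
  lines(d$time, d$surv, type = "s", col = i)
}
legend("topright", legend = models, col = seq_along(models), lty = 1)
```

A long data frame with one row per time point and model is convenient because adding another curve only means binding on more rows with a new label.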

The package, the numerous online resources, and the statistics such as concordance and Harrell’s c-index packed into the objects produced by fitting the models give some idea of the statistical depth that underlies almost everything in R.

For a very nice, basic tutorial on survival analysis, have a look at the Survival Analysis in R [5] and the OIsurv package produced by the folks at Open Intro.

Data scientists who are accustomed to computing ROC curves to assess model performance should be interested in the Concordance statistic.

The documentation for the package defines concordance as “the probability of agreement for any two randomly chosen observations, where in this case agreement means that the observation with the shorter survival time of the two also has the larger risk score.

This is a generalization of the ROC curve, which reduces to the Wilcoxon-Mann-Whitney statistic for binary variables, which, in turn, is equivalent to computing the area under the ROC curve.
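The binary case can be checked by hand. The sketch below uses simulated data (the outcome `y` and the `score` are made up for illustration) and base R only: it computes the proportion of concordant positive/negative pairs directly, then the area under the ROC curve from the Wilcoxon-Mann-Whitney rank-sum formula, and compares the two.

```r
set.seed(42)

# Simulated binary outcome and a noisy risk score that tends to be
# higher when y == 1. Both are illustrative assumptions.
n <- 200
y <- rbinom(n, 1, 0.5)
score <- y + rnorm(n)

pos <- score[y == 1]
neg <- score[y == 0]

# Concordance: fraction of (positive, negative) pairs in which the
# positive case has the larger score; ties count as one half.
diffs <- outer(pos, neg, "-")
concordance_manual <- mean((diffs > 0) + 0.5 * (diffs == 0))

# AUC via the Wilcoxon-Mann-Whitney statistic: rank sum of the
# positives minus its minimum possible value, scaled by the pair count.
n_pos <- length(pos)
n_neg <- length(neg)
u_stat <- sum(rank(score)[y == 1]) - n_pos * (n_pos + 1) / 2
auc_wmw <- u_stat / (n_pos * n_neg)

# The two quantities agree up to floating-point error.
all.equal(concordance_manual, auc_wmw)
```

With censored survival times the same pairwise idea applies, except that pairs whose ordering cannot be determined because of censoring are left out of the comparison.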

