This document shows all code, intermediate outcomes, and final outcomes belonging to the manuscript “Simulating Progression-Free and Overall Survival for First-Line Doublet Chemotherapy With or Without Bevacizumab in Metastatic Colorectal Cancer Patients Based on Real-World Registry Data” by K Degeling et al. in PharmacoEconomics (available here). Using real world data from the Treatment of Recurrent and Advanced Colorectal Cancer registry, two parametric survival models and a logistic regression model are estimated to populate a discrete event simulation model. Please read the manuscript before reading through this document, as the manuscript contains information essential for understanding decisions made in the code that is included in this document.
Please appropriately cite the work above if you use any code provided in this document for your own analyses.
This document contains the following sections:
Please note that the number of bootstrap iterations and simulation runs performed is set to a lower number for illustration purposes due to the required runtime to perform the 10,000 iterations/simulations that are performed for the final analyses presented in the manuscript. Hence, results and plots in this document may be different from those reported in the manuscript. Results presented in the manuscript should be considered as the final and definitive results at all times.
To intialize the data analysis, several actions are performed. Most importantly, the data are loaded from an encrypted hard drive, as are the input matrices defined by clinicians to recode the treatments as observed in the unprocessed data.
# Clear Workspace and plots, and clear console
rm(list=ls()); gc(); if(!is.null(dev.list())) dev.off(); cat("\014");
# Load packages
library(dplyr);
library(readxl);
library(stringr);
# Set the locale so R handles dates as English regardless of location
Sys.setlocale("LC_TIME", "English");
# Load the raw TRACC data for the patient characteristics and first line treatment
df <- read_excel("G:/Files/mCRC Targeting Melbourne/CRC MODELING DATA 20181105.xlsx",
col_types = c("numeric", "date", "text", "text",
"text", "text", "text", "date", "date",
"date", "date", "numeric", "text",
"text", "text", "text", "text", "text",
"text", "text", "text", "text", "text",
"text", "text", "text", "text", "text",
"text", "text", "numeric", "text",
"text", "text", "text", "text", "text",
"text", "text", "text", "text", "text",
"text", "text", "text", "text", "text",
"text", "text", "text", "numeric",
"text", "text", "text", "text", "date",
"text", "text", "text", "date", "text",
"text", "text", "text", "text", "date",
"numeric","text",
"text", "text", "text", "text", "date",
"date", "date", "text", "text", "text",
"text", "text", "text", "text", "numeric",
"text", "numeric", "text", "numeric",
"text", "text", "text", "numeric",
"text", "numeric", "text", "numeric",
"text", "numeric", "text", "numeric",
"text", "numeric", "text", "numeric",
"text", "numeric", "text", "numeric",
"text", "numeric", "text", "text",
"numeric", "text", "text"));
# Replace spaces in column names with dots/points to handle columns more easily
colnames(df) <- str_replace_all(string=colnames(df), pattern=" ", replace=".");
# Load the chemo and biologic classification by Hui-Li & Koen
chemo.class <- read_excel("G:/Files/mCRC Targeting Melbourne/Chemotherapy Classification 20181123.xlsx");
biolo.class <- read_excel("G:/Files/mCRC Targeting Melbourne/Biological Classification 20181123.xlsx");
Before all models can be fitted, the data need to be cleaned and the appropriate patient population is to be selected. To this end, several steps are performed:
In the first step of the data cleaning, the patient selection is performed. All data required for this selection are recoded and cleaned in this step as well. Patients included are those …
# Inclusion is based on whether the first line of palliative treatment included
# doublet chemotherapy with or without bevacizumab. Patients for whom treatment
# intent was indicated as curative or potentially curative, are excluded.
# Only patients with recorded first-line chemotherapy treatment are considered for inclusion
df <- df %>%
filter(!is.na(REGIMEN.1L)) %>%
filter(TREATMENTINTENT=="Palliative only")
# Recode first-line treatment information for inclusion
df <- df %>%
mutate(L1.Regimen = REGIMEN.1L) %>%
mutate(L1.Biologic = BIOLOGIC.1L) %>%
mutate(L1.Start.Date = as.character(START.DATE.1L))
# In the TRACC registry, all regimen in a treatment line need to be recoded first.
# This has been manually done in an Excel file, which has been loaded in the global
# environment as object 'chemo.class' and will now be used to recode the regimen.
# The 'chemo.class' object contains a column 'Recorded.Regimen' with all uncoded
# treatments and a columns 'Classification.G0' with the recoded treatments. The
# column 'L1.Chemo.Eligible' is added to indicate whether a patient received doublet
# chemotherapy and, hence, is eligible for inclusion.
df <- df %>%
left_join(select(chemo.class, Recorded.Regimen, Classification.G0), by=c("L1.Regimen" = "Recorded.Regimen")) %>%
rename(L1.Chemo.Class = Classification.G0) %>%
mutate(L1.Chemo.Class = ifelse(is.na(L1.Chemo.Class), "None", L1.Chemo.Class)) %>%
mutate(L1.Chemo.Eligible = ifelse(L1.Chemo.Class=="Doublet", T, F))
# Since only patients who had a doublet chemotherapy with or without bevacizumab,
# so not with another additional agent like cetuximab, are include, it need to be
# checked whether patients had bevacizumab yes or no, and whether they are
# eligible based on any other biologicals. This is done based on the first
# reported biological
df <- df %>%
left_join(select(biolo.class, Recorded.Biologicals, Classification.G0), by=c("L1.Biologic" = "Recorded.Biologicals")) %>%
rename(L1.Biologic.Class=Classification.G0) %>%
mutate(L1.Biologic.Class = ifelse(is.na(L1.Biologic.Class), "None", L1.Biologic.Class)) %>%
mutate(L1.Biologic.Eligible = ifelse(L1.Biologic.Class %in% c("None", "Bevacizumab"), T, F)) %>%
mutate(L1.Bevacizumab = ifelse(L1.Biologic.Class=="Bevacizumab", T, F))
# Final selection of patients is performed based on whether they had ...
# 1) doublet chemotherapy with or without bevacizumab and no other biological
# 2) treatment start date after 2009-01-01
# 3) treatment start date before 2017-06-01
df<- df %>%
filter(L1.Chemo.Eligible) %>%
filter(L1.Biologic.Eligible) %>%
filter(L1.Start.Date >= as.Date("2009-01-01")) %>%
filter(L1.Start.Date < as.Date("2017-06-01"))
# Allocate patients to the appropriate treatment arm
df <- df %>%
mutate(L1.Arm.ChemBev = ifelse(L1.Bevacizumab, T, F))
The second step in cleaning the data is making sure all variables are recoded in relevant categories and all observations are categorized in one of these defined categories. The following variables are extracted from the data:
df <- df %>%
# Age in years at treatment start
mutate(Age = floor(as.numeric(difftime(L1.Start.Date, DOB, units="weeks")/52))) %>%
# Gender is female or not
mutate(Gender.Female = ifelse(GENDER=="F", T, F)) %>%
# Hospital type is private or not
mutate(Hospital.Private = ifelse(TLOCATIONTYPE=="Private", T, F)) %>%
# Hospital is located in Victoria or not
mutate(Hospital.Victoria = ifelse(STATE=="VIC", T, F)) %>%
# Stage of disease at initial diagnosis
mutate(Stage.Diagnosis = ifelse(STAGE.AT.DIAGNOSIS=="Unknown", NA, NA))%>%
mutate(Stage.Diagnosis = ifelse(STAGE.AT.DIAGNOSIS=="Locally advanced (rectal only)", 0, Stage.Diagnosis)) %>%
mutate(Stage.Diagnosis = ifelse(STAGE.AT.DIAGNOSIS=="Stage I", 1, Stage.Diagnosis)) %>%
mutate(Stage.Diagnosis = ifelse(STAGE.AT.DIAGNOSIS=="Stage II", 2, Stage.Diagnosis)) %>%
mutate(Stage.Diagnosis = ifelse(STAGE.AT.DIAGNOSIS=="Stage III", 3, Stage.Diagnosis)) %>%
mutate(Stage.Diagnosis = ifelse(STAGE.AT.DIAGNOSIS=="Stage IV", 4, Stage.Diagnosis)) %>%
# Primary tumour site
mutate(Primary.Site = ifelse(PRIMARYSITE=="Occult", NA, NA)) %>%
mutate(Primary.Site = ifelse(PRIMARYSITE=="Colon", "Colon", Primary.Site)) %>%
mutate(Primary.Site = ifelse(PRIMARYSITE=="Colon NOS", "Colon", Primary.Site)) %>%
mutate(Primary.Site = ifelse(PRIMARYSITE=="Left colon", "Left", Primary.Site)) %>%
mutate(Primary.Site = ifelse(PRIMARYSITE=="Right colon", "Right", Primary.Site)) %>%
mutate(Primary.Site = ifelse(PRIMARYSITE=="Rectum", "Rectum", Primary.Site)) %>%
mutate(Primary.Site = ifelse(PRIMARYSITE=="Multiple primaries", "Colon", Primary.Site)) %>%
# Specific metastatic sites
mutate(Metastases.Liver = ifelse(is.na(LIVER.METS), F, T)) %>%
mutate(Metastases.Peritoneum = ifelse(is.na(PERITONEUM.OMENTUM.METS), F, T)) %>%
# Number of metastatic sites
mutate(Metastases.Number = NUMBER.OF.METS) %>%
# ECOG Performance Status
mutate(L1.ECOG = as.numeric(str_sub(ECOG.First.Line, 1, 1))) %>%
# Whether the patient has comorbidities or not
mutate(Comorbidities = ifelse(NO.COMORBIDITIES=="Yes", FALSE, TRUE)) %>%
# Recode the comorbidities
mutate(MYOCARDIAL.INFARCTION = ifelse(is.na(MYOCARDIAL.INFARCTION) | MYOCARDIAL.INFARCTION=="No", 0, 1)) %>%
mutate(CONGESTIVE.HEART.FAILURE = ifelse(is.na(CONGESTIVE.HEART.FAILURE) | CONGESTIVE.HEART.FAILURE=="No", 0, 1)) %>%
mutate(PERIPHERAL.VASCULAR.DISEASE = ifelse(is.na(PERIPHERAL.VASCULAR.DISEASE) | PERIPHERAL.VASCULAR.DISEASE=="No", 0, 1)) %>%
mutate(CEREBROVASCULAR.DISEASE = ifelse(is.na(CEREBROVASCULAR.DISEASE) | CEREBROVASCULAR.DISEASE=="No", 0, 1)) %>%
mutate(DEMENTIA = ifelse(is.na(DEMENTIA) | DEMENTIA=="No", 0, 1)) %>%
mutate(COPD = ifelse(is.na(COPD) | COPD=="No", 0, 1)) %>%
mutate(CONNECTIVE.TISSUE.DISEASE = ifelse(is.na(CONNECTIVE.TISSUE.DISEASE) | CONNECTIVE.TISSUE.DISEASE=="No", 0, 1)) %>%
mutate(PEPTIC.ULCER.DISEASE = ifelse(is.na(PEPTIC.ULCER.DISEASE) | PEPTIC.ULCER.DISEASE=="No", 0, 1)) %>%
mutate(MILD.LIVER.DISEASE = ifelse(is.na(MILD.LIVER.DISEASE) | MILD.LIVER.DISEASE=="No", 0, 1)) %>%
mutate(DIABETES.NO.COMPS = ifelse(is.na(DIABETES.NO.COMPS) | DIABETES.NO.COMPS=="No", 0, 1)) %>%
mutate(DIABETES.COMPS = ifelse(is.na(DIABETES.COMPS) | DIABETES.COMPS=="No", 0, 1)) %>%
mutate(HEMIPLEGIA.STROKE = ifelse(is.na(HEMIPLEGIA.STROKE) | HEMIPLEGIA.STROKE=="No", 0, 1)) %>%
mutate(RENAL.DISEASE.MOD.SEVERE = ifelse(is.na(RENAL.DISEASE.MOD.SEVERE) | RENAL.DISEASE.MOD.SEVERE=="No", 0, 1)) %>%
mutate(SECOND.TUMOUR.NO.METS = ifelse(is.na(SECOND.TUMOUR.NO.METS) | SECOND.TUMOUR.NO.METS=="No", 0, 1)) %>%
mutate(LYMPHOMA.MYELOMA = ifelse(is.na(LYMPHOMA.MYELOMA) | LYMPHOMA.MYELOMA=="No", 0, 1)) %>%
mutate(LEUKAEMIA = ifelse(is.na(LEUKAEMIA) | LEUKAEMIA=="No", 0, 1)) %>%
mutate(LIVER.DIS.MOD.SEVERE = ifelse(is.na(LIVER.DIS.MOD.SEVERE) | LIVER.DIS.MOD.SEVERE=="No", 0, 1)) %>%
mutate(IMMUNE.DEFICIENCY = 0) %>%
mutate(SECOND.MET.SOLID.TUMOUR = ifelse(is.na(SECOND.MET.SOLID.TUMOUR) | SECOND.MET.SOLID.TUMOUR=="No", 0, 1)) %>%
# Charlson comorbidity score (NOT CORRECTED FOR AGE!!!)
mutate(Charlson =
(1 * MYOCARDIAL.INFARCTION) +
(1 * CONGESTIVE.HEART.FAILURE) +
(1 * PERIPHERAL.VASCULAR.DISEASE) +
(1 * CEREBROVASCULAR.DISEASE) +
(1 * DEMENTIA) +
(1 * COPD) +
(1 * CONNECTIVE.TISSUE.DISEASE) +
(1 * PEPTIC.ULCER.DISEASE) +
(1 * MILD.LIVER.DISEASE) +
(1 * DIABETES.NO.COMPS) +
(2 * DIABETES.COMPS) +
(2 * HEMIPLEGIA.STROKE) +
(2 * RENAL.DISEASE.MOD.SEVERE) +
(2 * SECOND.TUMOUR.NO.METS) +
(2 * LYMPHOMA.MYELOMA) +
(2 * LEUKAEMIA) +
(3 * LIVER.DIS.MOD.SEVERE) +
(6 * IMMUNE.DEFICIENCY) +
(6 * SECOND.MET.SOLID.TUMOUR)
) %>%
# CEA Level
mutate(CEA.Level = CEALEVEL) %>%
# RAS status
mutate(RAS.Mutated = ifelse(RAS.STATUS=="Pending", NA, NA)) %>%
mutate(RAS.Mutated = ifelse(RAS.STATUS=="Unknown", NA, RAS.Mutated)) %>%
mutate(RAS.Mutated = ifelse(RAS.STATUS=="Mutated", T, RAS.Mutated)) %>%
mutate(RAS.Mutated = ifelse(RAS.STATUS=="Wild-type", F, RAS.Mutated)) %>%
# BRAF status
mutate(BRAF.Mutated = ifelse(BRAFSTATUS=="Pending", NA, NA)) %>%
mutate(BRAF.Mutated = ifelse(BRAFSTATUS=="Unknown", NA, BRAF.Mutated)) %>%
mutate(BRAF.Mutated = ifelse(BRAFSTATUS=="Mutated", T, BRAF.Mutated)) %>%
mutate(BRAF.Mutated = ifelse(BRAFSTATUS=="Wild-type", F, BRAF.Mutated)) %>%
# Treatment intent
mutate(Treatment.Intent = ifelse(TREATMENTINTENT=="Curative resection occurred", "Curative", NA)) %>%
mutate(Treatment.Intent = ifelse(TREATMENTINTENT=="Curative resection planned", "Curative", Treatment.Intent)) %>%
mutate(Treatment.Intent = ifelse(TREATMENTINTENT=="Potentially curative resection (if good response to treatment)", "Potentially", Treatment.Intent)) %>%
mutate(Treatment.Intent = ifelse(TREATMENTINTENT=="Palliative only", "Palliative", Treatment.Intent)) %>%
# Treatment start
mutate(Treatment.Start = format(as.Date(L1.Start.Date), "%Y"))
Clinical outcomes for the first line of treatment are extrated from the data in this third step of the data cleaning. In the manucript, only progression-free survival is considered as clinical outcome for first-line treatment. First-line progression is defined as physician-defined progression (so a patient will be able to receive further treatment or follow up) or death.
# First, the treatment stop date, date of progression, and date of death need to be extraced from the data
df<- df %>%
mutate(L1.Stop.Date = as.character(STOP.DATE.1L)) %>%
mutate(L1.Progression.Date = as.character(PROGRESSION.DATE.First.Line)) %>%
mutate(Death.Date = as.character(DATEOFDEATH))
# The progression-free survival (PFS) and overall survival (OS) are calculated using the
# treatment start date as origin
df <- df %>%
mutate(L1.PFS.Days = as.numeric(difftime(L1.Progression.Date, L1.Start.Date, units="days"))) %>%
mutate(L1.OS.Days = as.numeric(difftime(Death.Date, L1.Start.Date, units="days"))) %>%
mutate(L1.CENS.Days = as.numeric(difftime(L1.Stop.Date, L1.Start.Date, units="days"))) %>%
mutate(L1.CENS.Days = ifelse(is.na(L1.CENS.Days), as.numeric(difftime(DATEOFREVIEW, L1.Start.Date, units="days")), L1.CENS.Days))
# Determine which of the events occured to the patient with regard to first line progression:
# 0) Censored: when neither progression or death is recorded
# 1) Progression: when progression is recorded
# 2) Death: when progression is not recorded but death is
df <- df %>%
mutate(L1.PFS.Events = ifelse(!is.na(L1.OS.Days) & (L1.OS.Days<=L1.PFS.Days | is.na(L1.PFS.Days)), "Death", NA)) %>%
mutate(L1.PFS.Events = ifelse(!is.na(L1.PFS.Days) & (L1.PFS.Days<L1.OS.Days | is.na(L1.OS.Days)), "Progression", L1.PFS.Events)) %>%
mutate(L1.PFS.Events = ifelse(is.na(L1.PFS.Days) & is.na(L1.OS.Days), "Censored", L1.PFS.Events))
# Determine whether an event occured or the patient was censored and the correspoding time
df <- df %>%
mutate(L1.PFS.Event = ifelse(L1.PFS.Events=="Censored", 0, 1)) %>%
mutate(L1.PFS.Time = ifelse(L1.PFS.Events=="Progression", L1.PFS.Days, NA)) %>%
mutate(L1.PFS.Time = ifelse(L1.PFS.Events=="Death", L1.OS.Days, L1.PFS.Time)) %>%
mutate(L1.PFS.Time = ifelse(L1.PFS.Events=="Censored", L1.CENS.Days, L1.PFS.Time))
# Since a negative PFS or OS is not possible, patients with a non-positive time-to-event
# are excluded from the analysis, as their data is assumed to be incorrect
df <- df %>%
mutate(L1.PFS.Include = ifelse(!is.na(L1.PFS.Time) & L1.PFS.Time > 0, T, F));
With regard to the clinical outcomes of treatment lines or follow up after first-line progression, only overall survival is considered. Overall survival is extracted from the data in this fourth step of the data cleaning.
# Time till death after progression
df <- df %>%
mutate(LX.OS.Days = as.numeric(difftime(DATEOFDEATH, L1.Progression.Date, units="days"))) %>%
mutate(LX.CENS.Days = as.numeric(difftime(DATEOFREVIEW, L1.Progression.Date, units="days")))
# Determine whether the patient was censored:
# 0) Censored: when death is not recorded
# 1) Death: when death is recorded
df <- df %>%
mutate(LX.OS.Events = ifelse(!is.na(LX.OS.Days), "Death", NA)) %>%
mutate(LX.OS.Events = ifelse(is.na(LX.OS.Days) & !is.na(LX.CENS.Days), "Censored", LX.OS.Events)) %>%
mutate(LX.OS.Events = ifelse(L1.PFS.Events=="Progression", LX.OS.Events, NA))
# Determine whether death occured or the patient was censored and the correspoding time
df <- df %>%
mutate(LX.OS.Event = ifelse(LX.OS.Events=="Censored", 0, 1)) %>%
mutate(LX.OS.Time = ifelse(LX.OS.Events=="Death", LX.OS.Days, NA)) %>%
mutate(LX.OS.Time = ifelse(LX.OS.Events=="Censored", LX.CENS.Days, LX.OS.Time))
# Since a negative PFS or OS is not possible, patients with a non-positive time-to-event
# are excluded from the analysis, as their data is assumed to be incorrect
df <- df %>%
mutate(LX.OS.Include = ifelse(L1.PFS.Include & !is.na(LX.OS.Time) & LX.OS.Time > 0, T, F))
Patients with a negative progression-free survival or overall survival are excluded from the sample, as it is assumed that these are data entry errors. Finally, for the included patients, an overview table of their characteristics is generated.
# Only include patients who at least are included for first-line treatment
df <- df %>%
filter(L1.PFS.Include)
# Only the new colomns need to be maintained
df <- df %>%
select(L1.Regimen:LX.OS.Include)
# General function to get an overview table of the patient characteristics
getTable <- function(data, column.class=NULL, columns.names=NULL, columns.discrete=NULL, chisq.simulate=T, chisq.nsim=10000, export=F, export.file=NULL) {
if(is.null(columns.names)) {
columns.names <- c("Age", "Gender.Female", "Hospital.Private", "Stage.Diagnosis", "Primary.Site",
"Metastases.Number", "Metastases.Liver", "Metastases.Peritoneum", "L1.ECOG",
"Charlson", "CEA.Level", "RAS.Mutated", "BRAF.Mutated");
columns.discrete <- c(F, T, T, T, T, T, T, T, T, T, F, T, T);
}
if(is.null(column.class)) {
column.class <- "L1.Arm.ChemBev";
}
if(length(columns.names)!=length(columns.discrete)) stop("OBJECTS 'columns.names' AND 'columns.discrete' SHOULD HAVE EQUAL LENGTHS!!!");
classification <- unlist(data[, column.class]);
if(length(unique(classification))!=2) stop("NEEDS A CLASSIFICATION THAT RESULTS IN TWO CLASSES!!!")
out.discrete <- matrix(c(table(classification), NA, NA, NA), nrow=1);
rownames(out.discrete) <- "n";
colnames(out.discrete) <- c(rep(names(table(classification)),2 ), "p-value");
for(column in columns.names) {
variable <- unlist(data[, column]);
if(columns.discrete[which(columns.names==column)[1]]) {
variable[is.na(variable)] <- "Missings";
counts <- table(variable, classification);
rownames(counts) <- str_c(column, "-", rownames(counts));
percentages <- t(t(counts) / out.discrete[1, 1:2])*100;
chisq <- chisq.test(counts, simulate.p.value = chisq.simulate, B = chisq.nsim)
chisq.p <- c(chisq$p.value, rep(NA, nrow(counts)-1));
out.discrete <- rbind(out.discrete, cbind(counts, percentages, chisq.p));
} else {
means <- tapply(variable, classification, mean, na.rm=T);
sd <- tapply(variable, classification, sd, na.rm=T);
NA.counts <- tapply(is.na(variable), classification, sum);
NA.percentages <- tapply(is.na(variable), classification, mean)*100;
ttest <- t.test(x = variable[classification==unique(classification[1])],
y = variable[classification!=unique(classification[1])],
alteranative = "two-sided", paired = F);
out <- rbind(c(means, sd, ttest$p.value),
c(NA.counts, NA.percentages, NA));
rownames(out) <- str_c(column, c("-MeanSD", "-Missings"));
out.discrete <- rbind(out.discrete, out);
}
}
colnames(out.discrete)[1:2] <- str_c(str_c(column.class, "==", colnames(out.discrete)[1:2]), " (#/Mean)");
colnames(out.discrete)[3:4] <- str_c(str_c(column.class, "==", colnames(out.discrete)[3:4]), " (%/SD)");
if(export) write.table(out.discrete, file=export.file, sep=";");
return(out.discrete)
}
# Export Characteristics Tables
Table1.L1.PFS <- getTable(data=filter(df, L1.PFS.Include), export=F);
Table1.LX.OS <- getTable(data=filter(df, LX.OS.Include), export=F);
L1.Arm.ChemBev==FALSE (#/Mean) | L1.Arm.ChemBev==TRUE (#/Mean) | L1.Arm.ChemBev==FALSE (%/SD) | L1.Arm.ChemBev==TRUE (%/SD) | p-value | |
---|---|---|---|---|---|
n | 191.00000 | 676.00000 | NA | NA | NA |
Age-MeanSD | 63.17801 | 62.54882 | 13.9234652 | 12.145134 | 0.5714513 |
Age-Missings | 0.00000 | 0.00000 | 0.0000000 | 0.000000 | NA |
Gender.Female-FALSE | 118.00000 | 407.00000 | 61.7801047 | 60.207101 | 0.7425257 |
Gender.Female-TRUE | 73.00000 | 269.00000 | 38.2198953 | 39.792899 | NA |
Hospital.Private-FALSE | 115.00000 | 337.00000 | 60.2094241 | 49.852071 | 0.0121988 |
Hospital.Private-TRUE | 76.00000 | 339.00000 | 39.7905759 | 50.147929 | NA |
Stage.Diagnosis-0 | 8.00000 | 35.00000 | 4.1884817 | 5.177515 | 0.2915708 |
Stage.Diagnosis-1 | 0.00000 | 11.00000 | 0.0000000 | 1.627219 | NA |
Stage.Diagnosis-2 | 10.00000 | 35.00000 | 5.2356021 | 5.177515 | NA |
Stage.Diagnosis-3 | 21.00000 | 95.00000 | 10.9947644 | 14.053254 | NA |
Stage.Diagnosis-4 | 151.00000 | 491.00000 | 79.0575916 | 72.633136 | NA |
Stage.Diagnosis-Missings | 1.00000 | 9.00000 | 0.5235602 | 1.331361 | NA |
Primary.Site-Colon | 13.00000 | 26.00000 | 6.8062827 | 3.846154 | 0.3540646 |
Primary.Site-Left | 73.00000 | 261.00000 | 38.2198953 | 38.609468 | NA |
Primary.Site-Missings | 3.00000 | 11.00000 | 1.5706806 | 1.627219 | NA |
Primary.Site-Rectum | 51.00000 | 165.00000 | 26.7015707 | 24.408284 | NA |
Primary.Site-Right | 51.00000 | 213.00000 | 26.7015707 | 31.508876 | NA |
Metastases.Number-1 | 91.00000 | 256.00000 | 47.6439791 | 37.869822 | 0.1606839 |
Metastases.Number-2 | 55.00000 | 254.00000 | 28.7958115 | 37.573965 | NA |
Metastases.Number-3 | 33.00000 | 123.00000 | 17.2774869 | 18.195266 | NA |
Metastases.Number-4 | 9.00000 | 36.00000 | 4.7120419 | 5.325444 | NA |
Metastases.Number-5 | 3.00000 | 6.00000 | 1.5706806 | 0.887574 | NA |
Metastases.Number-6 | 0.00000 | 1.00000 | 0.0000000 | 0.147929 | NA |
Metastases.Liver-FALSE | 64.00000 | 211.00000 | 33.5078534 | 31.213018 | 0.5971403 |
Metastases.Liver-TRUE | 127.00000 | 465.00000 | 66.4921466 | 68.786982 | NA |
Metastases.Peritoneum-FALSE | 138.00000 | 491.00000 | 72.2513089 | 72.633136 | 0.9280072 |
Metastases.Peritoneum-TRUE | 53.00000 | 185.00000 | 27.7486911 | 27.366864 | NA |
L1.ECOG-0 | 51.00000 | 284.00000 | 26.7015707 | 42.011834 | 0.0001000 |
L1.ECOG-1 | 95.00000 | 344.00000 | 49.7382199 | 50.887574 | NA |
L1.ECOG-2 | 34.00000 | 43.00000 | 17.8010471 | 6.360947 | NA |
L1.ECOG-3 | 11.00000 | 5.00000 | 5.7591623 | 0.739645 | NA |
Charlson-0 | 127.00000 | 487.00000 | 66.4921466 | 72.041420 | 0.0014999 |
Charlson-1 | 27.00000 | 125.00000 | 14.1361257 | 18.491124 | NA |
Charlson-2 | 22.00000 | 41.00000 | 11.5183246 | 6.065089 | NA |
Charlson-3 | 7.00000 | 18.00000 | 3.6649215 | 2.662722 | NA |
Charlson-4 | 2.00000 | 2.00000 | 1.0471204 | 0.295858 | NA |
Charlson-5 | 3.00000 | 1.00000 | 1.5706806 | 0.147929 | NA |
Charlson-6 | 1.00000 | 1.00000 | 0.5235602 | 0.147929 | NA |
Charlson-7 | 0.00000 | 1.00000 | 0.0000000 | 0.147929 | NA |
Charlson-8 | 2.00000 | 0.00000 | 1.0471204 | 0.000000 | NA |
CEA.Level-MeanSD | 725.91907 | 478.48598 | 5254.2478968 | 2048.838683 | 0.5459979 |
CEA.Level-Missings | 19.00000 | 61.00000 | 9.9476440 | 9.023669 | NA |
RAS.Mutated-FALSE | 72.00000 | 317.00000 | 37.6963351 | 46.893491 | 0.0001000 |
RAS.Mutated-Missings | 63.00000 | 112.00000 | 32.9842932 | 16.568047 | NA |
RAS.Mutated-TRUE | 56.00000 | 247.00000 | 29.3193717 | 36.538461 | NA |
BRAF.Mutated-FALSE | 52.00000 | 266.00000 | 27.2251309 | 39.349112 | 0.0085991 |
BRAF.Mutated-Missings | 127.00000 | 369.00000 | 66.4921466 | 54.585799 | NA |
BRAF.Mutated-TRUE | 12.00000 | 41.00000 | 6.2827225 | 6.065089 | NA |
L1.Arm.ChemBev==FALSE (#/Mean) | L1.Arm.ChemBev==TRUE (#/Mean) | L1.Arm.ChemBev==FALSE (%/SD) | L1.Arm.ChemBev==TRUE (%/SD) | p-value | |
---|---|---|---|---|---|
n | 156.00000 | 562.00000 | NA | NA | NA |
Age-MeanSD | 62.87179 | 62.32028 | 14.0789103 | 12.2110342 | 0.6567444 |
Age-Missings | 0.00000 | 0.00000 | 0.0000000 | 0.0000000 | NA |
Gender.Female-FALSE | 96.00000 | 341.00000 | 61.5384615 | 60.6761566 | 0.8505149 |
Gender.Female-TRUE | 60.00000 | 221.00000 | 38.4615385 | 39.3238434 | NA |
Hospital.Private-FALSE | 93.00000 | 272.00000 | 59.6153846 | 48.3985765 | 0.0153985 |
Hospital.Private-TRUE | 63.00000 | 290.00000 | 40.3846154 | 51.6014235 | NA |
Stage.Diagnosis-0 | 4.00000 | 28.00000 | 2.5641026 | 4.9822064 | 0.2659734 |
Stage.Diagnosis-1 | 0.00000 | 8.00000 | 0.0000000 | 1.4234875 | NA |
Stage.Diagnosis-2 | 9.00000 | 28.00000 | 5.7692308 | 4.9822064 | NA |
Stage.Diagnosis-3 | 18.00000 | 82.00000 | 11.5384615 | 14.5907473 | NA |
Stage.Diagnosis-4 | 124.00000 | 407.00000 | 79.4871795 | 72.4199288 | NA |
Stage.Diagnosis-Missings | 1.00000 | 9.00000 | 0.6410256 | 1.6014235 | NA |
Primary.Site-Colon | 10.00000 | 18.00000 | 6.4102564 | 3.2028470 | 0.2894711 |
Primary.Site-Left | 61.00000 | 216.00000 | 39.1025641 | 38.4341637 | NA |
Primary.Site-Missings | 1.00000 | 11.00000 | 0.6410256 | 1.9572954 | NA |
Primary.Site-Rectum | 38.00000 | 133.00000 | 24.3589744 | 23.6654804 | NA |
Primary.Site-Right | 46.00000 | 184.00000 | 29.4871795 | 32.7402135 | NA |
Metastases.Number-1 | 70.00000 | 218.00000 | 44.8717949 | 38.7900356 | 0.6216378 |
Metastases.Number-2 | 50.00000 | 214.00000 | 32.0512821 | 38.0782918 | NA |
Metastases.Number-3 | 28.00000 | 95.00000 | 17.9487179 | 16.9039146 | NA |
Metastases.Number-4 | 7.00000 | 31.00000 | 4.4871795 | 5.5160142 | NA |
Metastases.Number-5 | 1.00000 | 4.00000 | 0.6410256 | 0.7117438 | NA |
Metastases.Liver-FALSE | 53.00000 | 172.00000 | 33.9743590 | 30.6049822 | 0.4386561 |
Metastases.Liver-TRUE | 103.00000 | 390.00000 | 66.0256410 | 69.3950178 | NA |
Metastases.Peritoneum-FALSE | 109.00000 | 402.00000 | 69.8717949 | 71.5302491 | 0.6921308 |
Metastases.Peritoneum-TRUE | 47.00000 | 160.00000 | 30.1282051 | 28.4697509 | NA |
L1.ECOG-0 | 39.00000 | 230.00000 | 25.0000000 | 40.9252669 | 0.0001000 |
L1.ECOG-1 | 83.00000 | 291.00000 | 53.2051282 | 51.7793594 | NA |
L1.ECOG-2 | 28.00000 | 37.00000 | 17.9487179 | 6.5836299 | NA |
L1.ECOG-3 | 6.00000 | 4.00000 | 3.8461538 | 0.7117438 | NA |
Charlson-0 | 106.00000 | 406.00000 | 67.9487179 | 72.2419929 | 0.0057994 |
Charlson-1 | 22.00000 | 106.00000 | 14.1025641 | 18.8612100 | NA |
Charlson-2 | 16.00000 | 32.00000 | 10.2564103 | 5.6939502 | NA |
Charlson-3 | 5.00000 | 14.00000 | 3.2051282 | 2.4911032 | NA |
Charlson-4 | 2.00000 | 1.00000 | 1.2820513 | 0.1779359 | NA |
Charlson-5 | 2.00000 | 1.00000 | 1.2820513 | 0.1779359 | NA |
Charlson-6 | 1.00000 | 1.00000 | 0.6410256 | 0.1779359 | NA |
Charlson-7 | 0.00000 | 1.00000 | 0.0000000 | 0.1779359 | NA |
Charlson-8 | 2.00000 | 0.00000 | 1.2820513 | 0.0000000 | NA |
CEA.Level-MeanSD | 844.29064 | 488.46878 | 5796.9883585 | 2031.4895195 | 0.4746236 |
CEA.Level-Missings | 15.00000 | 52.00000 | 9.6153846 | 9.2526690 | NA |
RAS.Mutated-FALSE | 63.00000 | 269.00000 | 40.3846154 | 47.8647687 | 0.0001000 |
RAS.Mutated-Missings | 50.00000 | 75.00000 | 32.0512821 | 13.3451957 | NA |
RAS.Mutated-TRUE | 43.00000 | 218.00000 | 27.5641026 | 38.7900356 | NA |
BRAF.Mutated-FALSE | 41.00000 | 217.00000 | 26.2820513 | 38.6120996 | 0.0146985 |
BRAF.Mutated-Missings | 105.00000 | 309.00000 | 67.3076923 | 54.9822064 | NA |
BRAF.Mutated-TRUE | 10.00000 | 36.00000 | 6.4102564 | 6.4056940 | NA |
After cleaning the data in Step 1, the missing data are addressed in Step 2. Two steps are performed in order to do so:
As explained in the manuscript, missing data for the RAS mutation status and BRAF mutation status may be partly addressed by assuming these mutations to be mutatual exclusive. For example, this indicates that if a RAS mutation is known for a patient with a missing BRAF status, the BRAF status can be assumed to be ‘wild-type’.
# Address missing values in Mutations
# Potentially 175 missing values can be addressed:
# - 1 for RAS
# - 142 for BRAF
table(list(RAS=df$RAS.Mutated, BRAF=df$BRAF.Mutated), useNA="always");
## BRAF
## RAS FALSE TRUE <NA>
## FALSE 158 51 180
## TRUE 160 1 142
## <NA> 0 1 174
# If patients have a KRAS mutation and the RAS status is unknown,
# we assume that they will not have a RAS mutation. Similarly, if
# the BRAF status is unknown and the patient has a RAS mutation,
# we assume that the patient does not have a BRAF mutation.
df <- df %>%
mutate(RAS.Mutated = ifelse(is.na(RAS.Mutated) & BRAF.Mutated, F, RAS.Mutated)) %>%
mutate(BRAF.Mutated = ifelse(is.na(BRAF.Mutated) & RAS.Mutated, F, BRAF.Mutated))
# Oberve new distribution of mutations and missing values
table(list(RAS=df$RAS.Mutated, BRAF=df$BRAF.Mutated), useNA="always");
## BRAF
## RAS FALSE TRUE <NA>
## FALSE 158 52 180
## TRUE 302 1 0
## <NA> 0 0 174
After addressing part of the missing values for the RAS mutation status and BRAF mutation status, remaining missing data for these and the other variables are addressed by performing multiple imputation by chained equations. Missing values are addressed for the following variables (imputation method used):
The following variables were additionally used as predictors:
After performing the multiple imputation, post-processing is performed while extracting the imputed datasets to make sure RAS mutation status and BRAF mutation status remain mutually exclusive.
# Load required package
library(mice);
# List the included columns
mice.cols.included <- c("Age", "Gender.Female", "Hospital.Private", "Metastases.Liver", "Metastases.Peritoneum", "Metastases.Number", "L1.ECOG", "Charlson", "Primary.Site", "Stage.Diagnosis", "CEA.Level", "RAS.Mutated", "BRAF.Mutated");
# Select the data used for the multiple imputation
df.mice <- df %>%
select(mice.cols.included)
# Make sure the type of all columns is correct
df.mice <- df.mice %>%
mutate(Age = as.numeric(Age)) %>%
mutate(Gender.Female = as.logical(Gender.Female)) %>%
mutate(Hospital.Private = as.logical(Hospital.Private)) %>%
mutate(Stage.Diagnosis = as.factor(Stage.Diagnosis)) %>%
mutate(Primary.Site = as.factor(Primary.Site)) %>%
mutate(Metastases.Liver = as.logical(Metastases.Liver)) %>%
mutate(Metastases.Peritoneum = as.logical(Metastases.Peritoneum)) %>%
mutate(Metastases.Number = as.numeric(Metastases.Number)) %>%
mutate(Charlson = as.numeric(Charlson)) %>%
mutate(CEA.Level = as.numeric(CEA.Level)) %>%
mutate(RAS.Mutated = as.logical(RAS.Mutated)) %>%
mutate(BRAF.Mutated = as.logical(BRAF.Mutated)) %>%
mutate(L1.ECOG = as.factor(L1.ECOG))
# Define predictor matrix
predictor.matrix <- matrix(1, length(mice.cols.included), length(mice.cols.included)) - diag(1, length(mice.cols.included), length(mice.cols.included));
colnames(predictor.matrix) <- rownames(predictor.matrix) <- mice.cols.included;
# Perform multiple imputation
m <- 25; # Number of imputations
set.seed(20181211); # Random seed for reproduction
mice.out <- mice(data=df.mice, m=m, maxit=20, predictorMatrix=predictor.matrix, printFlag=F,
method=c(Age="",
Gender.Female="",
Hospital.Private="",
Metastases.Liver="",
Metastases.Peritoneum="",
Metastases.Number="",
L1.ECOG="",
Charlson="",
Primary.Site="polyreg",
Stage.Diagnosis="polyreg",
CEA.Level="norm",
RAS.Mutated="logreg",
BRAF.Mutated="logreg")
);
# Extract the imputed datasets
count.BRAF.and.RAS <- 0; # Counter to track how much post-processing is required
datasets <- lapply(1:m, function(i.dataset) {
# Get the imputed columns
df.imputed <- complete(mice.out, action=i.dataset);
cols.imputed <- colnames(df.imputed);
# Replace the colums in the original dataset by the imputed columns
df.new <- df;
df.new[, cols.imputed] <- df.imputed;
# Correct when RAS and BRAF mutations conflict
count.BRAF.and.RAS <<- count.BRAF.and.RAS + sum(df.new$BRAF.Mutated & df.new$RAS.Mutated);
df.new$BRAF.Mutated <- ifelse(df.new$RAS.Mutated==1 & df.new$BRAF.Mutated==1, 0, df.new$BRAF.Mutated);
# Return the newly completed datasets
df.new;
});
# Check number of post-processed mutations
prob.BRAF.and.RAS <- (count.BRAF.and.RAS/m) / nrow(datasets[[1]]) * 100;
prob.BRAF.and.RAS;
## [1] 0.3460208
Initial analyses were performed to determine which distribution types would be used to parameterize the survival models and which transformations would be used for each variable in the multivariable modeling. To this end, the following steps were performed as part of this univariable modeling:
The following transformations were considered:
In modeling first-line progression-free survival, there are two competing events that need to be accounted for: physician-defined progression and death. To determine which modeling approach will be implemented to account for these two competing events, the overlap of the corresponding time-to-event distributions is important to know. It was decided that competing events will be handled by using the ‘Unimodal Distribution and Regression model (UDR)’ approach based on the strategy of selecting the time-to-event first based on a joint time-to-event distribution, and then select the event corresponding to that time.
# function for calculating the overlap between the distribution for time-to-progression and time-to-death
getOverlap <- function(data, n.bins=512, digits=3) {
library(flux);
# Get events
events <- sort(unique(data$event));
n.events <- length(events);
# Get event-specific times
times <- lapply(events, function(e) subset(data, event==e)$time );
# Density settings
min <- min(data$time);
max <- max(data$time);
n <- n.bins;
x <- seq(from=min, to=max, length.out=n);
# Get event-specific densities
y <- sapply(1:n.events, function(i1) density(times[[i1]], from=min, to=max, n=n)$y );
# Get event=specific density overlap
yo <- sapply(1:n.events, function(i2) {
sapply(1:n, function(j){
own <- y[j,i2];
other <- max(y[j,which(c(1:n.events)!=i2)]);
if(own<=0) {
return(0)
} else
return(min(own,other))
})
});
# Calculate event-specific overlap
o <- sapply(1:n.events, function(i3) {
(auc(x=x, y=yo[,i3]) / auc(x=x, y=y[,i3])) *100
});
return(round(mean(o), digits=digits));
}
# Calculate the distribution overlap
getOverlap(data=data.frame(time = df$L1.PFS.Time[df$L1.PFS.Events!="Censored"], event = df$L1.PFS.Events[df$L1.PFS.Events!="Censored"]))
## [1] 72.457
# Visualize distribution overlap
plot(density(df$L1.PFS.Time[df$L1.PFS.Events!="Censored"]), xlab="Time-to-Event in Days", ylab=NA, main="Density Plot", las=1, lwd=2, xlim=c(0,2500), ylim=c(0,0.0025))
lines(density(df$L1.PFS.Time[df$L1.PFS.Events=="Progression"]), lwd=2, col=3)
lines(density(df$L1.PFS.Time[df$L1.PFS.Events=="Death"]), lwd=2, col=2)
lines(density(df$L1.PFS.Time[df$L1.PFS.Events=="Censored"]), lwd=2, col=4)
legend("topright", lty=1, lwd=2, col=c(1,3,2, 4), c(str_c("Combined [excl. censored patients](n=", sum(df$L1.PFS.Events!="Censored"),")"), str_c("Progression (n=", sum(df$L1.PFS.Events=="Progression"),")"), str_c("Death (n=", sum(df$L1.PFS.Events=="Death"),")"), str_c("Censored (n = ", sum(df$L1.PFS.Events=="Censored"),")")))
A distribution type is selected for survival Model L1-Time that represents the two competing events applicable to first-line progression-free survival (physcian-defined progression and death) in one unimodal survival model. Selection was based on the likelihood and assessment of quantile-quantile plots and Kaplan-Meier plots. The following distributions were considered:
# Load packages
library(survival); # Fit Survival Models
library(EnvStats); # Q-Q Plot for censored data
# Distributions considered
dist.considered <- c("exponential",
"weibull",
"logistic",
"lognormal");
# Data for first-line
df.L1 <- df;
# Select distribution
dist.performance.L1 <- sapply(dist.considered, function(dist) {
# Fit survival model for distribution
model <- survreg(formula=Surv(time=L1.PFS.Time, event=L1.PFS.Event) ~ 1, data=df.L1, dist=dist);
# Return the likelihood
return(model$loglik[1]);
});
# Print the results and which distribution would be selected based on the likelihood
dist.selected.L1 <- dist.considered[which.max(dist.performance.L1)];
print(dist.performance.L1);
## exponential weibull logistic lognormal
## -5522.443 -5483.372 -5646.513 -5512.667
print(dist.selected.L1);
## [1] "weibull"
# Plot the survival curves
for(i in 1:length(dist.considered)) {
# Get distribution type
dist <- dist.considered[i];
# Fit survival model for distribution
model <- survreg(formula=Surv(time=L1.PFS.Time, event=L1.PFS.Event) ~ 1, data=df.L1, dist=dist);
# If it is the first distribution considered, plot kaplan-meier
if(i==1) plot(survfit(Surv(time=L1.PFS.Time, event=L1.PFS.Event) ~ 1, data=df.L1), conf.int=F, xlab="Time to Death or Progression (days)", lwd=2);
# Plot fitted distribution
lines(y=seq(.99, .01, by=-.01), x=predict(model, type="quantile", p=seq(.01, .99, by=.01))[1,], col=rainbow(length(dist.considered))[i], lwd=2)
# Add the legend if it is the last distribution
if(i==length(dist.considered)) legend("right", legend=dist.considered, col=rainbow(length(dist.considered)), lty=1, lwd=2);
}
# Plot the Q-Q Plots curves
par(mfrow=c(2,2));
for(i in 1:length(dist.considered)) {
# Get distribution type
dist <- dist.considered[i];
# Fit survival model for distribution
model <- survreg(formula=Surv(time=L1.PFS.Time, event=L1.PFS.Event) ~ 1, data=df.L1, dist=dist);
# Plot the Q-Q Plot
if(dist=="exponential") qqPlotCensored(x=df.L1$L1.PFS.Time, censored=df.L1$L1.PFS.Event==0, censoring.side="right",
equal.axes=T, main=paste0("Q-Q Plot for the ", dist, " distribution"),
distribution="exp", param.list=list(rate=1/exp(model$coefficients)));
if(dist=="weibull") qqPlotCensored(x=df.L1$L1.PFS.Time, censored=df.L1$L1.PFS.Event==0, censoring.side="right",
equal.axes=T, main=paste0("Q-Q Plot for the ", dist, " distribution"),
distribution="weibull", param.list=list(shape=1/model$scale, scale=exp(model$coefficients)));
if(dist=="logistic") qqPlotCensored(x=df.L1$L1.PFS.Time, censored=df.L1$L1.PFS.Event==0, censoring.side="right",
equal.axes=T, main=paste0("Q-Q Plot for the ", dist, " distribution"),
distribution="logis", param.list=list(location=model$coefficients, scale=model$scale));
if(dist=="lognormal") qqPlotCensored(x=df.L1$L1.PFS.Time, censored=df.L1$L1.PFS.Event==0, censoring.side="right",
equal.axes=T, main=paste0("Q-Q Plot for the ", dist, " distribution"),
distribution="lnorm", param.list=list(meanlog=model$coefficients, sdlog=model$scale));
}
A distribution type is selected for survival Model LX-Time that represents overall survival after first-line progression. Selection was based on the likelihood and assessment of quantile-quantile plots and Kaplan-Meier plots. As for Model L1-Time, the following distributions were considered:
# Data for first-line
df.LX <- filter(df, LX.OS.Include);
# Select distribution
dist.performance.LX <- sapply(dist.considered, function(dist) {
# Fit survival model for distribution
model <- survreg(formula=Surv(time=LX.OS.Time, event=LX.OS.Event) ~ 1, data=df.LX, dist=dist);
# Return the likelihood
return(model$loglik[1]);
});
# Print the results and which distribution would be selected based on the likelihood
dist.selected.LX <- dist.considered[which.max(dist.performance.LX)];
print(dist.performance.LX);
## exponential weibull logistic lognormal
## -4207.651 -4207.648 -4430.506 -4245.090
print(dist.selected.LX);
## [1] "weibull"
# Plot the survival curves
for(i in 1:length(dist.considered)) {
# Get distribution type
dist <- dist.considered[i];
# Fit survival model for distribution
model <- survreg(formula=Surv(time=LX.OS.Time, event=LX.OS.Event) ~ 1, data=df.LX, dist=dist);
# If it is the first distribution considered, plot kaplan-meier
if(i==1) plot(survfit(Surv(time=LX.OS.Time, event=LX.OS.Event) ~ 1, data=df.LX), conf.int=F, xlab="Overall Survival (days)", lwd=2);
# Plot fitted distribution
lines(y=seq(.99, .01, by=-.01), x=predict(model, type="quantile", p=seq(.01, .99, by=.01))[1,], col=rainbow(length(dist.considered))[i], lwd=2)
# Add the legend if it is the last distribution
if(i==length(dist.considered)) legend("right", legend=dist.considered, col=rainbow(length(dist.considered)), lty=1, lwd=2);
}
# Plot the Q-Q Plots curves
par(mfrow=c(2,2));
for(i in 1:length(dist.considered)) {
# Get distribution type
dist <- dist.considered[i];
# Fit survival model for distribution
model <- survreg(formula=Surv(time=LX.OS.Time, event=LX.OS.Event) ~ 1, data=df.LX, dist=dist);
# Plot the Q-Q Plot
if(dist=="exponential") qqPlotCensored(x=df.LX$LX.OS.Time, censored=df.LX$LX.OS.Event==0, censoring.side="right",
equal.axes=T, main=paste0("Q-Q Plot for the ", dist, " distribution"),
distribution="exp", param.list=list(rate=1/exp(model$coefficients)));
if(dist=="weibull") qqPlotCensored(x=df.LX$LX.OS.Time, censored=df.LX$LX.OS.Event==0, censoring.side="right",
equal.axes=T, main=paste0("Q-Q Plot for the ", dist, " distribution"),
distribution="weibull", param.list=list(shape=1/model$scale, scale=exp(model$coefficients)));
if(dist=="logistic") qqPlotCensored(x=df.LX$LX.OS.Time, censored=df.LX$LX.OS.Event==0, censoring.side="right",
equal.axes=T, main=paste0("Q-Q Plot for the ", dist, " distribution"),
distribution="logis", param.list=list(location=model$coefficients, scale=model$scale));
if(dist=="lognormal") qqPlotCensored(x=df.LX$LX.OS.Time, censored=df.LX$LX.OS.Event==0, censoring.side="right",
equal.axes=T, main=paste0("Q-Q Plot for the ", dist, " distribution"),
distribution="lnorm", param.list=list(meanlog=model$coefficients, sdlog=model$scale));
}
The best-performing transformation was selected for each variable and survival model separately by fitting individual models for each candidate transformation and comparing performance based on Wald test p-values. Univariable models were also fitted for variables without multiple candidate transformations to record p-values. All p-values were calculated for each of 25 imputed datasets, and for variables with multiple candidate transformations, transformations’ p-value ranks were determined for each imputed dataset. Final selection of the transformation to be used was based on the average transformation rank over all 25 imputed datasets.
# Load package
library(aod); # Wald-test
# Candidate variables, including their importance and potential transformations
vars.information.L1 <- list(
# Age at treatment start
Age = list(Transformations = c(Log = "log(Age)", SQRT = "sqrt(Age)", Poly1 = "poly(Age, 1)", Poly2 = "poly(Age, 2)", Poly3 = "poly(Age, 3)", Poly4 = "poly(Age, 4)")),
# Gender
Gender.Female = list(Transformations = c(Binary.FemaleYN = "Gender.Female")),
# Hospital Type
Hospital.Private = list(Transformations = c(Binary.PrivateYN = "Hospital.Private")),
# Stage at Initial Diagnosis
Stage.Diagnosis = list(Transformations = c(Binary.Stage4YN = "as.factor(as.character(Stage.Diagnosis)=='4')")),
# Primary Tumour Site
Primary.Site = list(Transformations = c(Factor.GroupColMul = "as.factor(ifelse(Primary.Site %in% c('Colon','Multiple'), 'ColonMultiple', Primary.Site))", Factor.GroupColMul.GroupLefRec = "as.factor(ifelse(Primary.Site %in% c('Colon','Multiple'), 'ColonMultiple', ifelse(Primary.Site %in% c('Left','Rectum'), 'LeftRectum', Primary.Site)))")),
# Number of Metastases
Metastases.Number = list(Transformations = c(Factor.Group3plus = "as.factor(ifelse(Metastases.Number>=3, 3, Metastases.Number))")),
# Liver Metastases
Metastases.Liver = list(Transformations = c(Binary = "Metastases.Liver")),
# Peritoneum Metastases
Metastases.Peritoneum = list(Transformations = c(Binary = "Metastases.Peritoneum")),
# ECOG Performance Status
L1.ECOG = list(Transformations = c(Factor.Group23 = "as.factor(ifelse(L1.ECOG==0, 0, ifelse(L1.ECOG==1, 1, 2)))", Binary.2plusYN = "as.factor(ifelse(L1.ECOG %in% c(0,1), F, T))")),
# Charlson Score
Charlson = list(Transformations = c(Factor.Group2plus = "as.factor(ifelse(Charlson>=2, 2, Charlson))", Factor.Group01.Group2plus = "as.factor(ifelse(Charlson %in% c(0,1), 1, ifelse(Charlson>=2, 2, Charlson)))", Binary.1plusYN = "as.factor(ifelse(Charlson>=1, T, F))")),
# CEA Level
CEA.Level = list(Transformations = c(Binary.200plus = "as.factor(ifelse(Metastases.Number<=2, T, F) * ifelse(CEA.Level>=200, T, F))")),
# RAS Status
RAS.Mutated = list(Transformations = c(Binary.MutatedYN = "RAS.Mutated")),
# BRAF Status
BRAF.Mutated = list(Transformations = c(Binary.MutatedYN = "BRAF.Mutated"))
);
# Set three levels of significance
significance.levels <- c(0.1, 0.05, 0.01);
# For each imputed dataset, select distribution and perform univariable analysis
univariable.L1 <- lapply(1:length(datasets), function(dataset) {
# Retrieve datasets
data.surv.L1 <- datasets[[dataset]];
# For each variable
vars.univariable <- lapply(vars.information.L1, function(var) {
# Consider all potential transformations
sapply(var$Transformations, function(var.form) {
# Define formula to be fitted
f <- as.formula(str_c("Surv(time=L1.PFS.Time, event=L1.PFS.Event) ~ ", var.form, collapse=""));
# Fit model
fit <- survreg(formula=f, data=data.surv.L1, dist=dist.selected.L1);
# Perform Wald-test for significance
test <- wald.test(Sigma=fit$var, b=c(fit$coefficients, fit$scale), Terms=c(2:length(fit$coefficients)))
# Determine significance
pvalue <- test$result$chi2["P"];
significance <- "";
if(pvalue<max(significance.levels)) {
significance <- str_c(rep("*", tail(which(pvalue<significance.levels), n=1)), collapse="");
}
# Return transformation and model performance
c(trans=f, ll=fit$loglik[2], pvalue=pvalue, sign=significance)
})
});
# Return outcomes
return(list(vars.univariable=vars.univariable));
})
# Add summary for variables to vars.information object
for(var in names(vars.information.L1)) {
# Get names of transformations
transformation.names <- names(vars.information.L1[[var]]$Transformations);
# Extract data from univariable object
loglik <- as.data.frame(
sapply(transformation.names, function(trans)
sapply(univariable.L1, function(univar)
univar$vars.univariable[[var]][2,trans]
)
)
)
pvalues <- as.data.frame(
sapply(transformation.names, function(trans)
sapply(univariable.L1, function(univar)
univar$vars.univariable[[var]][3,trans]
)
)
)
significance <- as.data.frame(
sapply(transformation.names, function(trans)
sapply(univariable.L1, function(univar)
univar$vars.univariable[[var]][4,trans]
)
)
)
# Transform List object to Matrix
pvalues <- matrix(unlist(pvalues), ncol=ncol(pvalues), byrow=F)
colnames(pvalues) <- transformation.names;
# Rank based on pvalues
ranks <- matrix(apply(pvalues, 1, rank, ties.method="average"), ncol=ncol(pvalues), byrow=T);
colnames(ranks) <- colnames(pvalues);
avgrank <- apply(ranks, 2, mean);
# Select best performing transformation
selected <- names(avgrank)[which.min(avgrank)];
selected.transformation <- vars.information.L1[[var]]$Transformations[selected];
selected.avgpvalue <- mean(pvalues[, selected]);
selected.significance <- "";
if(selected.avgpvalue<max(significance.levels)){
selected.significance <- str_c(rep("*", tail(which(selected.avgpvalue<significance.levels), n=1)), collapse="");
}
# Save to vars.information object
vars.information.L1[[var]]$LogLik <- loglik;
vars.information.L1[[var]]$Pvalues <- pvalues;
vars.information.L1[[var]]$Significance <- significance;
vars.information.L1[[var]]$Ranks <- ranks;
vars.information.L1[[var]]$AvgRank <- avgrank;
vars.information.L1[[var]]$SelectedTransformation <- selected.transformation;
vars.information.L1[[var]]$SelectedPvalue <- selected.avgpvalue;
vars.information.L1[[var]]$SelectedSignificance <- selected.significance;
}
# Generate summary Table
vars.univariable.L1 <- t(sapply(vars.information.L1, function(var) c(
trans = names(var$SelectedTransformation),
pvalue = var$SelectedPvalue,
sign = var$SelectedSignificance
)));
# View results
kable(vars.univariable.L1);
trans | pvalue | sign | |
---|---|---|---|
Age | Poly2 | 0.00543870740945696 | *** |
Gender.Female | Binary.FemaleYN | 0.0413078627543801 | ** |
Hospital.Private | Binary.PrivateYN | 0.132221927655926 | |
Stage.Diagnosis | Binary.Stage4YN | 0.0180820041944674 | ** |
Primary.Site | Factor.GroupColMul.GroupLefRec | 0.00371107490900566 | *** |
Metastases.Number | Factor.Group3plus | 0.40180927502595 | |
Metastases.Liver | Binary | 0.0548355494244619 | * |
Metastases.Peritoneum | Binary | 0.108781784235471 | |
L1.ECOG | Factor.Group23 | 8.5890183854076e-12 | *** |
Charlson | Binary.1plusYN | 0.164886584017105 | |
CEA.Level | Binary.200plus | 0.00551021390960015 | *** |
RAS.Mutated | Binary.MutatedYN | 0.208343809880177 | |
BRAF.Mutated | Binary.MutatedYN | 0.152022282845375 |
To distinguish between the two competing events of physisician-defined progression death for first-line progression-free survival, a logistic regression model is needed (Model L1-Event). For this model, all best-performing transformations indentified for corresponding Model L1-Time were used, so only the best transformation for the additional variable relating to the time-to-event needs to be selected.
# Define a binary variable 'Death' to represent whether first-line progression refered to Death (1), Physician-defined progression (0), or the patient was censored (NA)
Death <- ifelse(datasets[[1]]$L1.PFS.Events=="Death", 1, ifelse(datasets[[1]]$L1.PFS.Events=="Progression", 0, NA));
# Fit the different transformations
summary(glm(Death ~ 1))
##
## Call:
## glm(formula = Death ~ 1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.1023 -0.1023 -0.1023 -0.1023 0.8977
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.10234 0.01065 9.61 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.09198216)
##
## Null deviance: 74.506 on 810 degrees of freedom
## Residual deviance: 74.506 on 810 degrees of freedom
## (56 observations deleted due to missingness)
## AIC: 369.34
##
## Number of Fisher Scoring iterations: 2
summary(glm(Death ~ sqrt(datasets[[1]]$L1.PFS.Time)))
##
## Call:
## glm(formula = Death ~ sqrt(datasets[[1]]$L1.PFS.Time))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.21793 -0.11542 -0.09612 -0.07337 0.95884
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.029326 0.027825 1.054 0.29222
## sqrt(datasets[[1]]$L1.PFS.Time) 0.004473 0.001576 2.838 0.00465 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.09118778)
##
## Null deviance: 74.506 on 810 degrees of freedom
## Residual deviance: 73.771 on 809 degrees of freedom
## (56 observations deleted due to missingness)
## AIC: 363.31
##
## Number of Fisher Scoring iterations: 2
summary(glm(Death ~ log(datasets[[1]]$L1.PFS.Time)))
##
## Call:
## glm(formula = Death ~ log(datasets[[1]]$L1.PFS.Time))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.12443 -0.10784 -0.10247 -0.09366 0.93415
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.04526 0.06323 0.716 0.474
## log(datasets[[1]]$L1.PFS.Time) 0.01058 0.01155 0.916 0.360
##
## (Dispersion parameter for gaussian family taken to be 0.09200045)
##
## Null deviance: 74.506 on 810 degrees of freedom
## Residual deviance: 74.428 on 809 degrees of freedom
## (56 observations deleted due to missingness)
## AIC: 370.5
##
## Number of Fisher Scoring iterations: 2
summary(glm(Death ~ poly(datasets[[1]]$L1.PFS.Time, 1)))
##
## Call:
## glm(formula = Death ~ poly(datasets[[1]]$L1.PFS.Time, 1))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.35633 -0.11251 -0.08696 -0.06513 0.95045
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.10236 0.01054 9.710 < 2e-16
## poly(datasets[[1]]$L1.PFS.Time, 1) 1.31589 0.31269 4.208 2.86e-05
##
## (Intercept) ***
## poly(datasets[[1]]$L1.PFS.Time, 1) ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.09012303)
##
## Null deviance: 74.506 on 810 degrees of freedom
## Residual deviance: 72.910 on 809 degrees of freedom
## (56 observations deleted due to missingness)
## AIC: 353.78
##
## Number of Fisher Scoring iterations: 2
summary(glm(Death ~ poly(datasets[[1]]$L1.PFS.Time, 2)))
##
## Call:
## glm(formula = Death ~ poly(datasets[[1]]$L1.PFS.Time, 2))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.62677 -0.09385 -0.08514 -0.08335 0.91693
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1026 0.0105 9.764 < 2e-16
## poly(datasets[[1]]$L1.PFS.Time, 2)1 1.3072 0.3115 4.196 3.02e-05
## poly(datasets[[1]]$L1.PFS.Time, 2)2 0.8118 0.3048 2.664 0.00789
##
## (Intercept) ***
## poly(datasets[[1]]$L1.PFS.Time, 2)1 ***
## poly(datasets[[1]]$L1.PFS.Time, 2)2 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.08944915)
##
## Null deviance: 74.506 on 810 degrees of freedom
## Residual deviance: 72.275 on 808 degrees of freedom
## (56 observations deleted due to missingness)
## AIC: 348.69
##
## Number of Fisher Scoring iterations: 2
summary(glm(Death ~ poly(datasets[[1]]$L1.PFS.Time, 3)))
##
## Call:
## glm(formula = Death ~ poly(datasets[[1]]$L1.PFS.Time, 3))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.48879 -0.10220 -0.07718 -0.06743 0.93404
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.10303 0.01045 9.856 < 2e-16
## poly(datasets[[1]]$L1.PFS.Time, 3)1 1.33858 0.31025 4.315 1.8e-05
## poly(datasets[[1]]$L1.PFS.Time, 3)2 0.83827 0.30347 2.762 0.00587
## poly(datasets[[1]]$L1.PFS.Time, 3)3 -0.90747 0.30755 -2.951 0.00326
##
## (Intercept) ***
## poly(datasets[[1]]$L1.PFS.Time, 3)1 ***
## poly(datasets[[1]]$L1.PFS.Time, 3)2 **
## poly(datasets[[1]]$L1.PFS.Time, 3)3 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.08860409)
##
## Null deviance: 74.506 on 810 degrees of freedom
## Residual deviance: 71.504 on 807 degrees of freedom
## (56 observations deleted due to missingness)
## AIC: 341.99
##
## Number of Fisher Scoring iterations: 2
Selection of the best-performing transformation for Model LX-Time was done in the same way as for Model L1-Tim. The only difference is that an additional variable representing the first-line progression-free survival was included.
# Candidate variables, including their importance and potential transformations
vars.information.LX <- list(
# Progression-free survival
PFS = list(Transformations = c(Log = "log(L1.PFS.Time)", SQRT = "sqrt(L1.PFS.Time)", Poly1 = "poly(L1.PFS.Time, 1)", Poly2 = "poly(L1.PFS.Time, 2)", Poly3 = "poly(L1.PFS.Time, 3)")),
# Age at treatment start
Age = list(Transformations = c(Log = "log(Age)", SQRT = "sqrt(Age)", Poly1 = "poly(Age, 1)", Poly2 = "poly(Age, 2)", Poly3 = "poly(Age, 3)")),
# Gender
Gender.Female = list(Transformations = c(Binary.FemaleYN = "Gender.Female")),
# Hospital Type
Hospital.Private = list(Transformations = c(Binary.PrivateYN = "Hospital.Private")),
# Stage at Initial Diagnosis
Stage.Diagnosis = list(Transformations = c(Binary.Stage4YN = "as.factor(as.character(Stage.Diagnosis)=='4')")),
# Primary Tumour Site
Primary.Site = list(Transformations = c(Factor.GroupColMul = "as.factor(ifelse(Primary.Site %in% c('Colon','Multiple'), 'ColonMultiple', Primary.Site))", Factor.GroupColMul.GroupLefRec = "as.factor(ifelse(Primary.Site %in% c('Colon','Multiple'), 'ColonMultiple', ifelse(Primary.Site %in% c('Left','Rectum'), 'LeftRectum', Primary.Site)))")),
# Number of Metastases
Metastases.Number = list(Transformations = c(Factor.Group3plus = "as.factor(ifelse(Metastases.Number>=3, 3, Metastases.Number))")),
# Liver Metastases
Metastases.Liver = list(Transformations = c(Binary = "Metastases.Liver")),
# Peritoneum Metastases
Metastases.Peritoneum = list(Transformations = c(Binary = "Metastases.Peritoneum")),
# ECOG Performance Status
L1.ECOG = list(Transformations = c(Factor.Group23 = "as.factor(ifelse(L1.ECOG==0, 0, ifelse(L1.ECOG==1, 1, 2)))", Binary.2plusYN = "as.factor(ifelse(L1.ECOG %in% c(0,1), F, T))")),
# Charlson Score
Charlson = list(Transformations = c(Factor.Group2plus = "as.factor(ifelse(Charlson>=2, 2, Charlson))", Factor.Group01.Group2plus = "as.factor(ifelse(Charlson %in% c(0,1), 1, ifelse(Charlson>=2, 2, Charlson)))", Binary.1plusYN = "as.factor(ifelse(Charlson>=1, T, F))")),
# CEA Level
CEA.Level = list(Transformations = c(Binary.200plus = "as.factor(ifelse(Metastases.Number<=2, T, F) * ifelse(CEA.Level>=200, T, F))")),
# RAS Status
RAS.Mutated = list(Transformations = c(Binary.MutatedYN = "RAS.Mutated")),
# BRAF Status
BRAF.Mutated = list(Transformations = c(Binary.MutatedYN = "BRAF.Mutated"))
);
# For each imputed dataset, select distribution and perform univariable analysis
univariable.LX <- lapply(1:length(datasets), function(dataset) {
# Retrieve datasets
data.surv <- datasets[[dataset]];
data.surv.LX <- filter(data.surv, LX.OS.Include);
# For each variable
vars.univariable <- lapply(vars.information.LX, function(var) {
# Consider all potential transformations
sapply(var$Transformations, function(var.form) {
# Define formula to be fitted
f <- as.formula(str_c("Surv(time=LX.OS.Time, event=LX.OS.Event) ~ ", var.form, collapse=""));
# Fit model
fit <- survreg(formula=f, data=data.surv.LX, dist=dist.selected.LX);
# Perform Wald-test for significance
test <- wald.test(Sigma=fit$var, b=c(fit$coefficients, fit$scale), Terms=c(2:length(fit$coefficients)))
# Determine significance
pvalue <- test$result$chi2["P"];
significance <- "";
if(pvalue<max(significance.levels)) {
significance <- str_c(rep("*", tail(which(pvalue<significance.levels), n=1)), collapse="");
}
# Return transformation and model performance
c(trans=f, ll=fit$loglik[2], pvalue=pvalue, sign=significance)
})
});
# Return outcomes
return(list(vars.univariable=vars.univariable));
})
# Add summary for variables to vars.information object
for(var in names(vars.information.LX)) {
# Get names of transformations
transformation.names <- names(vars.information.LX[[var]]$Transformations);
# Extract data from univariable object
loglik <- as.data.frame(
sapply(transformation.names, function(trans)
sapply(univariable.LX, function(univar)
univar$vars.univariable[[var]][2,trans]
)
)
)
pvalues <- as.data.frame(
sapply(transformation.names, function(trans)
sapply(univariable.LX, function(univar)
univar$vars.univariable[[var]][3,trans]
)
)
)
significance <- as.data.frame(
sapply(transformation.names, function(trans)
sapply(univariable.LX, function(univar)
univar$vars.univariable[[var]][4,trans]
)
)
)
# Transform List object to Matrix
pvalues <- matrix(unlist(pvalues), ncol=ncol(pvalues), byrow=F)
colnames(pvalues) <- transformation.names;
# Rank based on pvalues
ranks <- matrix(apply(pvalues, 1, rank, ties.method="average"), ncol=ncol(pvalues), byrow=T);
colnames(ranks) <- colnames(pvalues);
avgrank <- apply(ranks, 2, mean);
# Select best performing transformation
selected <- names(avgrank)[which.min(avgrank)];
selected.transformation <- vars.information.LX[[var]]$Transformations[selected];
selected.avgpvalue <- mean(pvalues[, selected]);
selected.significance <- "";
if(selected.avgpvalue<max(significance.levels)){
selected.significance <- str_c(rep("*", tail(which(selected.avgpvalue<significance.levels), n=1)), collapse="");
}
# Save to vars.information object
vars.information.LX[[var]]$LogLik <- loglik;
vars.information.LX[[var]]$Pvalues <- pvalues;
vars.information.LX[[var]]$Significance <- significance;
vars.information.LX[[var]]$Ranks <- ranks;
vars.information.LX[[var]]$AvgRank <- avgrank;
vars.information.LX[[var]]$SelectedTransformation <- selected.transformation;
vars.information.LX[[var]]$SelectedPvalue <- selected.avgpvalue;
vars.information.LX[[var]]$SelectedSignificance <- selected.significance;
}
# Generate summary Table
vars.univariable.LX <- t(sapply(vars.information.LX, function(var) c(
trans = names(var$SelectedTransformation),
pvalue = var$SelectedPvalue,
sign = var$SelectedSignificance
)));
# View results
kable(vars.univariable.LX);
trans | pvalue | sign | |
---|---|---|---|
PFS | Log | 0 | *** |
Age | Poly2 | 1.47682807410954e-05 | *** |
Gender.Female | Binary.FemaleYN | 0.0293104401172259 | ** |
Hospital.Private | Binary.PrivateYN | 0.00854917648739073 | *** |
Stage.Diagnosis | Binary.Stage4YN | 0.464603901073672 | |
Primary.Site | Factor.GroupColMul.GroupLefRec | 0.0240492505778438 | ** |
Metastases.Number | Factor.Group3plus | 0.678845101916153 | |
Metastases.Liver | Binary | 0.719065221171095 | |
Metastases.Peritoneum | Binary | 0.0120556554881923 | ** |
L1.ECOG | Factor.Group23 | 0 | *** |
Charlson | Binary.1plusYN | 0.912530732728782 | |
CEA.Level | Binary.200plus | 0.267061210740859 | |
RAS.Mutated | Binary.MutatedYN | 0.0110552668651976 | ** |
BRAF.Mutated | Binary.MutatedYN | 0.0330294095990516 | ** |
Multivariable (survival) models are fitted to reflect patient heterogeneity. A variable selection procedure is implemented that considers both clinical relevance and statistical performance. In total, the multivariable modeling exists of three steps:
Several custom functions are defined to perform the variable selection procedure that considers both clinical relevance and statistical performance. Please find an explanation of the functions in the R code below.
# The 'getSurvDataset.L1' and 'getSurvDataset.LX' functions take the cleaned data and transform these in a data.frame
# dat can be used to estimate (survival) models. This is necessary, because the cleaned data needs to be transformed
# according to the selected transformations and treatment interaction terms need to be calculated accordingly
getSurvDataset.L1 <- function(data) {
data <- data %>%
## Single variables
# Event
mutate(Event = L1.PFS.Event) %>%
# Events
mutate(Events = L1.PFS.Events) %>%
# Events.01
mutate(Events.01 = ifelse(Events=="Death", 1, NA)) %>%
mutate(Events.01 = ifelse(Events=="Progression", 0, Events.01)) %>%
# L1.PFS.Time = "Time",
mutate(Time = L1.PFS.Time) %>%
mutate(Time1 = L1.PFS.Time) %>%
mutate(Time2 = L1.PFS.Time^2) %>%
mutate(Time3 = L1.PFS.Time^3) %>%
mutate(Time4 = L1.PFS.Time^4) %>%
# Arm.ChemBev = "ChemBev",
mutate(ChemBev = as.numeric(as.logical(L1.Arm.ChemBev))) %>%
# Age = "Age1 + Age2",
mutate(Age1 = Age) %>%
mutate(Age2 = Age1^2) %>%
# Gender.Female = "Female.YN",
mutate(Female.YN = as.numeric(as.logical(Gender.Female))) %>%
# Hospital.Private = "Private.YN",
mutate(Private.YN = as.numeric(as.logical(Hospital.Private))) %>%
# Stage.Diagnosis = "Stage4.YN",
mutate(Stage4.YN = ifelse(as.character(Stage.Diagnosis)=="4", 1, 0)) %>%
# Primary.Site = "Site.Left + Site.Right + Site.Rectum",
mutate(Site.LeftRectum = ifelse(Primary.Site %in% c("Left","Rectum"), 1, 0)) %>%
mutate(Site.Right = ifelse(Primary.Site=="Right", 1, 0)) %>%
# Metastases.Number = "Metastases.2 + Metastases.3plus",
mutate(Metastases.2 = ifelse(as.numeric(as.character(Metastases.Number))==2, 1, 0)) %>%
mutate(Metastases.3plus = ifelse(as.numeric(as.character(Metastases.Number))>=3, 1, 0)) %>%
# Metastases.Liver = "Liver.YN",
mutate(Liver.YN = as.numeric(as.logical(Metastases.Liver))) %>%
# Metastases.Peritoneum = "Peritoneum.YN",
mutate(Peritoneum.YN = as.numeric(as.logical(Metastases.Peritoneum))) %>%
# L1.ECOG = "ECOG.1 + ECOG.23",
mutate(ECOG.1 = ifelse(as.numeric(as.character(L1.ECOG))==1, 1, 0)) %>%
mutate(ECOG.23 = ifelse(as.numeric(as.character(L1.ECOG))>=2, 1, 0)) %>%
# Charlson = "Charlson.1 + Charlson.2 + Charlson.3plus",
mutate(Charlson.1plus = ifelse(as.numeric(as.character(Charlson))>=1, 1, 0)) %>%
# CEA.Level = "CEA.200plus",
mutate(CEA.200plus = ifelse(as.numeric(as.character(Metastases.Number))<=2, 1, 0) * ifelse(as.numeric(CEA.Level)>=200, 1, 0)) %>%
# RAS.Mutated = "MutantRAS.YN",
mutate(MutantRAS.YN = as.numeric(as.logical(RAS.Mutated))) %>%
# BRAF.Mutated = "MutantBRAF.YN",
mutate(MutantBRAF.YN = as.numeric(as.logical(BRAF.Mutated))) %>%
## Interactions
mutate(Time1.Int = Time1 * ChemBev) %>%
mutate(Time2.Int = Time2 * ChemBev) %>%
mutate(Time3.Int = Time3 * ChemBev) %>%
mutate(Time4.Int = Time4 * ChemBev) %>%
mutate(Age1.Int = Age1 * ChemBev) %>%
mutate(Age2.Int = Age2 * ChemBev) %>%
mutate(Female.YN.Int = Female.YN * ChemBev) %>%
mutate(Private.YN.Int = Private.YN * ChemBev) %>%
mutate(Stage4.YN.Int = Stage4.YN * ChemBev) %>%
mutate(Site.LeftRectum.Int = Site.LeftRectum * ChemBev) %>%
mutate(Site.Right.Int = Site.Right * ChemBev) %>%
mutate(Metastases.2.Int = Metastases.2 * ChemBev) %>%
mutate(Metastases.3plus.Int = Metastases.3plus * ChemBev) %>%
mutate(Liver.YN.Int = Liver.YN * ChemBev) %>%
mutate(Peritoneum.YN.Int = Peritoneum.YN * ChemBev) %>%
mutate(ECOG.1.Int = ECOG.1 * ChemBev) %>%
mutate(ECOG.23.Int = ECOG.23 * ChemBev) %>%
mutate(Charlson.1plus.Int = Charlson.1plus * ChemBev) %>%
mutate(CEA.200plus.Int = CEA.200plus * ChemBev) %>%
mutate(MutantRAS.YN.Int = MutantRAS.YN * ChemBev) %>%
mutate(MutantBRAF.YN.Int = MutantBRAF.YN * ChemBev)
# Only the newly created columns need to be returned
return(select(data, Event:MutantBRAF.YN.Int))
}
getSurvDataset.LX <- function(data) {
data <- data %>%
## Single variables
# L1.OS.Event = Event,
mutate(Event = LX.OS.Event) %>%
# L1.OS.Time = "Time",
mutate(Time = LX.OS.Time) %>%
# L1.PFS.Time = "PFS.Time",
mutate(PFS.Time = log(L1.PFS.Time)) %>%
# Arm.ChemBev = "ChemBev",
mutate(ChemBev = as.numeric(as.logical(L1.Arm.ChemBev))) %>%
# Age = "Age1 + Age2",
mutate(Age1 = as.numeric(Age)-mean(Age)) %>%
mutate(Age2 = as.numeric(Age1^2)) %>%
# Gender.Female = "Female.YN",
mutate(Female.YN = as.numeric(as.logical(Gender.Female))) %>%
# Hospital.Private = "Private.YN",
mutate(Private.YN = as.numeric(as.logical(Hospital.Private))) %>%
# Stage.Diagnosis = "Stage4.YN",
mutate(Stage4.YN = ifelse(as.numeric(Stage.Diagnosis)==4, 1, 0)) %>%
# Primary.Site = "Site.Left + Site.Right + Site.Rectum",
mutate(Site.LeftRectum = ifelse(Primary.Site %in% c("Left","Rectum"), 1, 0)) %>%
mutate(Site.Right = ifelse(Primary.Site=="Right", 1, 0)) %>%
# Metastases.Number = "Metastases.2 + Metastases.3plus",
mutate(Metastases.2 = ifelse(as.numeric(Metastases.Number)==2, 1, 0)) %>%
mutate(Metastases.3plus = ifelse(as.numeric(Metastases.Number)>=3, 1, 0)) %>%
# Metastases.Liver = "Liver.YN",
mutate(Liver.YN = as.numeric(as.logical(Metastases.Liver))) %>%
# Metastases.Peritoneum = "Peritoneum.YN",
mutate(Peritoneum.YN = as.numeric(as.logical(Metastases.Peritoneum))) %>%
# L1.ECOG = "ECOG.1 + ECOG.23",
mutate(ECOG.1 = ifelse(as.numeric(as.character(L1.ECOG))==1, 1, 0)) %>%
mutate(ECOG.23 = ifelse(as.numeric(as.character(L1.ECOG))>=2, 1, 0)) %>%
# Charlson = "Charlson.3plus.YN",
mutate(Charlson.1plus.YN = ifelse(as.numeric(Charlson)>=1, 1, 0)) %>%
# CEA.Level = "CEA.200plus",
mutate(CEA.200plus = ifelse(as.numeric(Metastases.Number)<=2, 1, 0) * ifelse(as.numeric(CEA.Level)>=200, 1, 0)) %>%
# RAS.Mutated = "MutantRAS.YN",
mutate(MutantRAS.YN = as.numeric(as.logical(RAS.Mutated))) %>%
# BRAF.Mutated = "MutantBRAF.YN",
mutate(MutantBRAF.YN = as.numeric(as.logical(BRAF.Mutated))) %>%
## Interactions
mutate(PFS.Time.Int = PFS.Time * ChemBev) %>%
mutate(Age1.Int = Age1 * ChemBev) %>%
mutate(Age2.Int = Age2 * ChemBev) %>%
mutate(Female.YN.Int = Female.YN * ChemBev) %>%
mutate(Private.YN.Int = Private.YN * ChemBev) %>%
mutate(Stage4.YN.Int = Stage4.YN * ChemBev) %>%
mutate(Site.LeftRectum.Int = Site.LeftRectum * ChemBev) %>%
mutate(Site.Right.Int = Site.Right * ChemBev) %>%
mutate(Metastases.2.Int = Metastases.2 * ChemBev) %>%
mutate(Metastases.3plus.Int = Metastases.3plus * ChemBev) %>%
mutate(Liver.YN.Int = Liver.YN * ChemBev) %>%
mutate(Peritoneum.YN.Int = Peritoneum.YN * ChemBev) %>%
mutate(ECOG.1.Int = ECOG.1 * ChemBev) %>%
mutate(ECOG.23.Int = ECOG.23 * ChemBev) %>%
mutate(Charlson.1plus.YN.Int = Charlson.1plus.YN * ChemBev) %>%
mutate(CEA.200plus.Int = CEA.200plus * ChemBev) %>%
mutate(MutantRAS.YN.Int = MutantRAS.YN * ChemBev) %>%
mutate(MutantBRAF.YN.Int = MutantBRAF.YN * ChemBev)
# Only the newly created columns need to be returned
return(select(data, Event:MutantBRAF.YN.Int))
}
# The 'avgSurvivalModel' and 'avgLogisticModel' functions estimate (survival) models based on a number of datasets and
# combine these separate models into one model using Rubin's rule so that a p-value can be calculated for all models
# combined in the variable selection procedure
avgSurvivalModel <- function(formula, datasets, distribution, conf.int=F) {
# Fit models for each dataset
models <- lapply(datasets, function(dataset) {
model <- survreg(formula=as.formula(formula), data=dataset, dist=distribution);
list(coef=c(model$coefficients, 'Log(Scale)'=log(model$scale)), sigma=model$var);
})
# Coefficient Estimates
m <- length(models);
Q <- sapply(models, function(model) model$coef);
Qm <- matrix(rowMeans(Q), nrow=1);
# Within Variance
U <- lapply(models, function(model) model$sigma);
Um <- Reduce('+', U) / m;
# Between variance
B <- lapply(1:m, function(i) {
Q <- matrix(Q[,i], nrow=1);
t((Q-Qm)) %*% (Q-Qm) / (m-1)
});
Bm <- Reduce('+', B);
# Total Variance
Tm <- Um + (1+(1/m))*Bm;
# Save the Total Variance-Covariance matrix
S <- Tm;
# Provide proper format and naming
Qm <- as.numeric(Qm);
names(Qm) <- rownames(Q);
# Confidence intervals for coefficients
if(conf.int) {
# Extract diagonals of variance matrices to get variance vectors
Um <- diag(Um);
Bm <- diag(Bm);
Tm <- diag(Tm);
# Degrees of freedom
rm <- (1+(1/m)) * Bm / Um;
v <- (m-1)*(1+(1/rm))^2;
# Determine 95% confidence interval bounds
lower <- Qm + qt(p=0.025, df=v)*sqrt(Tm);
upper <- Qm + qt(p=0.975, df=v)*sqrt(Tm);
}
if(conf.int) {
return(list(coef=Qm, sigma=S, lower=lower, upper=upper))
} else {
return(list(coef=Qm, sigma=S))
}
}
avgLogisticModel <- function(formula, datasets, conf.int=F) {
# Fit models for each dataset
models <- lapply(datasets, function(dataset) {
model <- glm(formula=as.formula(formula), data=dataset, family=binomial(link="logit"));
list(coef=c(model$coefficients), sigma=vcov(model));
})
# Coefficient Estimates
m <- length(models);
Q <- sapply(models, function(model) model$coef);
Qm <- matrix(rowMeans(Q), nrow=1);
# Within Variance
U <- lapply(models, function(model) model$sigma);
Um <- Reduce('+', U) / m;
# Between variance
B <- lapply(1:m, function(i) {
Q <- matrix(Q[,i], nrow=1);
t((Q-Qm)) %*% (Q-Qm) / (m-1)
});
Bm <- Reduce('+', B);
# Total Variance
Tm <- Um + (1+(1/m))*Bm;
# Save the Total Variance-Covariance matrix
S <- Tm;
# Provide proper format and naming
Qm <- as.numeric(Qm);
names(Qm) <- rownames(Q);
# Confidence intervals for coefficients
if(conf.int) {
# Extract diagonals of variance matrices to get variance vectors
Um <- diag(Um);
Bm <- diag(Bm);
Tm <- diag(Tm);
# Degrees of freedom
rm <- (1+(1/m)) * Bm / Um;
v <- (m-1)*(1+(1/rm))^2;
# Determine 95% confidence interval bounds
lower <- Qm + qt(p=0.025, df=v)*sqrt(Tm);
upper <- Qm + qt(p=0.975, df=v)*sqrt(Tm);
}
if(conf.int) {
return(list(coef=Qm, sigma=S, lower=lower, upper=upper))
} else {
return(list(coef=Qm, sigma=S))
}
}
# The 'survivalBackward', 'survivalForward', and 'logisticForward' functions perform a backward or forward selection
# procedure based on a set of basis variables and a set of candidate variables
survivalBackward <- function(datasets, time, event, vars.include, vars.selection, vars.formulas, pvalue, dist) {
# Extract all used columns from the provided input
columns.used <- sapply(c(vars.include, vars.selection), function(var) {
formula.complete <- vars.formulas[var];
formula.splitted <- str_split(string=formula.complete, pattern=fixed("+"), simplify=T);
formula.nospaces <- str_trim(string=formula.splitted);
});
# Objects to record process
vars.unselected <- c();
continue <- TRUE;
# Backward selection
while(continue) {
# Get formula according to the remaining backward selection candidates
formula.current <- str_c(
str_c(c("Surv(", time, ", ", event, ") ~ "), collapse=""),
str_c(vars.formulas[c(vars.include, vars.selection)], collapse=" + ")
);
# Fit model with backward candidates
model.current <- avgSurvivalModel(formula=formula.current, datasets=datasets, distribution=dist);
# Get pvalue for each variable
coefficient.indexes <- lapply(vars.selection, function(var) which(names(model.current$coef) %in% columns.used[[var]]));
names(coefficient.indexes) <- vars.selection;
pvalues <- sapply(vars.selection, function(var) {
testresult <- wald.test(Sigma = model.current$sigma,
b = model.current$coef,
Terms = coefficient.indexes[[var]]
)$result$chi2["P"];
unname(testresult)
}); # sapply pvalues
# Get highes p value
pvalue.max <- max(pvalues);
# If the highest p value is higher than the threshold, remove this one
if(pvalue.max > pvalue) {
# Get the name of the variable with the highest pvalue
pvalue.index <- which.max(pvalues);
# Temporary list of selection candidates without the selected variable
vars.selection.tmp <- vars.selection[-pvalue.index];
# Store unselected candidates
vars.unselected <- c(vars.unselected, vars.selection[pvalue.index]);
# If no backward selection candidates remain, stop backward selection
if(length(vars.selection.tmp)<1) {
vars.selection <- vars.selection.tmp;
continue <- F;
} else {
# Update backward selection candidates and unselected candidates from temporary vectors
vars.selection <- vars.selection.tmp;
}
} else {
# If no pvalue higher than thershold, stop backward selection
continue <- F;
}
} # while continue
# Return outcomes
return(list(included = c(vars.include, vars.selection),
excluded = vars.unselected))
}
survivalForward <- function(datasets, time, event, vars.include, vars.selection, vars.formulas, pvalue, dist) {
# Extract all used columns from the provided input
columns.used <- sapply(c(vars.include, vars.selection), function(var) {
formula.complete <- vars.formulas[var];
formula.splitted <- str_split(string=formula.complete, pattern=fixed("+"), simplify=T);
formula.nospaces <- str_trim(string=formula.splitted);
});
# Objects to record process
continue <- TRUE;
# Backward selection
while(continue) {
# Get pvalue for each variable
pvalues <- sapply(vars.selection, function(var) {
# Get formula according to the current forward selection candidates
formula.current <- str_c(
str_c(c("Surv(", time, ", ", event, ") ~ "), collapse=""),
str_c(vars.formulas[c(vars.include, var)], collapse=" + ")
);
# Fit model with backward candidates
model.current <- avgSurvivalModel(formula=formula.current, datasets=datasets, distribution=dist);
# Get pvalue for this variable
coefficient.indexes <- which(names(model.current$coef) %in% columns.used[[var]])
test.result <- wald.test(Sigma = model.current$sigma,
b = model.current$coef,
Terms = coefficient.indexes
)$result$chi2["P"]
unname(test.result)
})
# Get lowest p value
pvalue.min <- min(pvalues);
# If the highest p value is lower than the threshold, add this one
if(pvalue.min < pvalue) {
# Get the name of the variable with the lowest pvalue
pvalue.index <- which.min(pvalues);
# Update variable lists
vars.include <- c(vars.include, vars.selection[pvalue.index]);
vars.selection <- vars.selection[-pvalue.index];
# If no forward selection candidates remain, stop backward selection
if(length(vars.selection)<1) {
continue <- F;
}
} else {
# If no pvalue higher than thershold, stop forward selection
continue <- F;
}
} # while !stop
# Return outcomes
return(list(included = vars.include,
excluded = vars.selection))
}
logisticForward <- function(datasets, events, vars.include, vars.selection, vars.formulas, pvalue) {
# Extract all used columns from the provided input
columns.used <- sapply(c(vars.include, vars.selection), function(var) {
formula.complete <- vars.formulas[var];
formula.splitted <- str_split(string=formula.complete, pattern=fixed("+"), simplify=T);
formula.nospaces <- str_trim(string=formula.splitted);
});
# Objects to record process
continue <- TRUE;
# Backward selection
while(continue) {
# Get pvalue for each variable
pvalues <- sapply(vars.selection, function(var) {
# Get formula according to the current forward selection candidates
formula.current <- str_c(
str_c(c(events, " ~ "), collapse=""),
str_c(vars.formulas[c(vars.include, var)], collapse=" + ")
);
# Fit model with backward candidates
model.current <- avgLogisticModel(formula=formula.current, datasets=datasets);
# Get pvalue for this variable
coefficient.indexes <- which(names(model.current$coef) %in% columns.used[[var]])
test.result <- wald.test(Sigma = model.current$sigma,
b = model.current$coef,
Terms = coefficient.indexes
)$result$chi2["P"]
unname(test.result)
})
# Get lowest p value
pvalue.min <- min(pvalues);
# If the highest p value is lower than the threshold, add this one
if(pvalue.min < pvalue) {
# Get the name of the variable with the lowest pvalue
pvalue.index <- which.min(pvalues);
# Update variable lists
vars.include <- c(vars.include, vars.selection[pvalue.index]);
vars.selection <- vars.selection[-pvalue.index];
# If no forward selection candidates remain, stop backward selection
if(length(vars.selection)<1) {
continue <- F;
}
} else {
# If no pvalue higher than thershold, stop forward selection
continue <- F;
}
} # while !stop
# Return outcomes
return(list(included = vars.include,
excluded = vars.selection))
}
# The 'fitLogisticModel' and 'fitSurvivalModel' functions are high-level functions that call the different selection
# procedures and return the final model based on the selected set of variables
fitLogisticModel <- function(datasets, events, pvalue, vars.include, vars.selection, vars.formulas) {
result.forward <- logisticForward(datasets = datasets, events = events, vars.include = vars.include, vars.selection = vars.selection, vars.formulas = vars.formulas, pvalue = pvalue);
formula.final <- str_c(
str_c(c(events, " ~ "), collapse=""),
str_c(vars.formulas[result.forward$included], collapse=" + ")
);
model.final <- avgLogisticModel(formula = formula.final, datasets = datasets, conf.int = T)
return(list(coef = model.final$coef, sigma = model.final$sigma, lower = model.final$lower, upper = model.final$upper, included = result.forward$included))
}
fitSurvivalModel <- function(datasets, time, event, dist, vars.include, vars.backward, vars.forward, vars.formulas, pvalue.backward, pvalue.forward, pvalue.interaction) {
result.backward <- survivalBackward(datasets = datasets, time = time, event = event, dist = dist, vars.include = vars.include, vars.selection = vars.backward, vars.formulas = vars.formulas, pvalue = pvalue.backward)
result.forward <- survivalForward(datasets = datasets, time = time, event = event, dist = dist, vars.include = result.backward$included, vars.selection = c(vars.forward, result.backward$excluded), vars.formulas = vars.formulas, pvalue = pvalue.forward)
vars.include.interaction <- result.forward$included;
vars.backward.interaction <- str_c(vars.include.interaction, ".Int");
vars.backward.interaction <- vars.backward.interaction[-which(vars.backward.interaction=="Arm.ChemBev.Int")];
result.interaction <- survivalBackward(datasets = datasets, time = time, event = event, dist = dist, vars.include = vars.include.interaction, vars.selection = vars.backward.interaction, vars.formulas = vars.formulas, pvalue = pvalue.interaction)
formula.final <- str_c(
str_c(c("Surv(", time, ", ", event, ") ~ "), collapse=""),
str_c(vars.formulas[result.interaction$included], collapse=" + ")
);
model.final <- avgSurvivalModel(formula = formula.final, datasets = datasets, distribution = "weibull", conf.int = T)
return(list(coef = model.final$coef, sigma = model.final$sigma, lower = model.final$lower, upper = model.final$upper, included = result.interaction$included))
}
The selection procedures require specific input, which is defined here:
# List object that contains an overview of all variables and the columns in the data that
# they use for first-line progression-free survival (L1)
vars.formulas.L1 <- list(
# Single variables
L1.PFS.Time = "Time1 + Time2",
Arm.ChemBev = "ChemBev",
Age = "Age1 + Age2",
Gender.Female = "Female.YN",
Hospital.Private = "Private.YN",
Stage.Diagnosis = "Stage4.YN",
Primary.Site = "Site.LeftRectum + Site.Right",
Metastases.Number = "Metastases.2 + Metastases.3plus",
Metastases.Liver = "Liver.YN",
Metastases.Peritoneum = "Peritoneum.YN",
L1.ECOG = "ECOG.1 + ECOG.23",
Charlson = "Charlson.1plus",
CEA.Level = "CEA.200plus",
RAS.Mutated = "MutantRAS.YN",
BRAF.Mutated = "MutantBRAF.YN",
# Interactions with treatment
L1.PFS.Time.Int = "Time1.Int + Time2.Int",
Age.Int = "Age1.Int + Age2.Int",
Gender.Female.Int = "Female.YN.Int",
Hospital.Private.Int = "Private.YN.Int",
Stage.Diagnosis.Int = "Stage4.YN.Int",
Primary.Site.Int = "Site.LeftRectum.Int + Site.Right.Int",
Metastases.Number.Int = "Metastases.2.Int + Metastases.3plus.Int",
Metastases.Liver.Int = "Liver.YN.Int",
Metastases.Peritoneum.Int = "Peritoneum.YN.Int",
L1.ECOG.Int = "ECOG.1.Int + ECOG.23.Int",
Charlson.Int = "Charlson.1plus.Int",
CEA.Level.Int = "CEA.200plus.Int",
RAS.Mutated.Int = "MutantRAS.YN.Int",
BRAF.Mutated.Int = "MutantBRAF.YN.Int"
);
# List object that contains an overview of all variables and the columns in the data that
# they use for overall survival after first-line progression (LX)
vars.formulas.LX <- list(
# Single variables
PFS = "PFS.Time",
Arm.ChemBev = "ChemBev",
Age = "Age1 + Age2",
Gender.Female = "Female.YN",
Hospital.Private = "Private.YN",
Stage.Diagnosis = "Stage4.YN",
Primary.Site = "Site.LeftRectum + Site.Right",
Metastases.Number = "Metastases.2 + Metastases.3plus",
Metastases.Liver = "Liver.YN",
Metastases.Peritoneum = "Peritoneum.YN",
L1.ECOG = "ECOG.1 + ECOG.23",
Charlson = "Charlson.1plus.YN",
CEA.Level = "CEA.200plus",
RAS.Mutated = "MutantRAS.YN",
BRAF.Mutated = "MutantBRAF.YN",
# Interactions with treatment
PFS.Int = "PFS.Time.Int",
Age.Int = "Age1.Int + Age2.Int",
Gender.Female.Int = "Female.YN.Int",
Hospital.Private.Int = "Private.YN.Int",
Stage.Diagnosis.Int = "Stage4.YN.Int",
Primary.Site.Int = "Site.LeftRectum.Int + Site.Right.Int",
Metastases.Number.Int = "Metastases.2.Int + Metastases.3plus.Int",
Metastases.Liver.Int = "Liver.YN.Int",
Metastases.Peritoneum.Int = "Peritoneum.YN.Int",
L1.ECOG.Int = "ECOG.1.Int + ECOG.23.Int",
Charlson.Int = "Charlson.1plus.YN.Int",
CEA.Level.Int = "CEA.200plus.Int",
RAS.Mutated.Int = "MutantRAS.YN.Int",
BRAF.Mutated.Int = "MutantBRAF.YN.Int"
);
# Indicate which variables should be included and which are subject to selection (backward vs. forward)
vars.include.L1 <- c("Arm.ChemBev", "Age", "L1.ECOG", "BRAF.Mutated");
vars.backward.L1 <- c("Stage.Diagnosis", "Primary.Site", "Metastases.Number", "Metastases.Liver",
"Metastases.Peritoneum", "Charlson");
vars.forward.L1 <- c("Gender.Female", "Hospital.Private", "CEA.Level", "RAS.Mutated");
# Indicate which variables should be included and which are subject to selection (backward vs. forward)
vars.include.LX <- c("Arm.ChemBev", "Age", "Primary.Site", "Metastases.Peritoneum", "L1.ECOG", "BRAF.Mutated");
vars.backward.LX <- c("PFS","Stage.Diagnosis", "Metastases.Number", "Metastases.Liver", "Charlson", "CEA.Level");
vars.forward.LX <- c("Gender.Female", "Hospital.Private", "RAS.Mutated");
# Transform cleaned datasets into appropriate format
datasets.surv.L1 <- lapply(datasets, function(dataset) getSurvDataset.L1(dataset));
datasets.surv.LX <- lapply(datasets, function(dataset) getSurvDataset.LX(filter(dataset, LX.OS.Include)));
After defining all functions and input required for fitting the (survival) models, all procedures can be performed.
# Fit Model L1-Time
Model.L1.Time <- fitSurvivalModel(datasets = datasets.surv.L1, time = "Time", event = "Event", dist = "weibull", vars.include = vars.include.L1, vars.backward = vars.backward.L1, vars.forward = vars.forward.L1, vars.formulas = vars.formulas.L1, pvalue.backward = 0.3, pvalue.forward = 0.2, pvalue.interaction = 0.3);
# Save and print Model L1-Time
save(x=Model.L1.Time, file="object Model.L1.Time 20190201.RData");
write.table(cbind(Coef=Model.L1.Time$coef, Lower=Model.L1.Time$lower, Upper=Model.L1.Time$upper), file="Model L1-Time 20190201.txt", sep=";");
cbind(Coefficient=Model.L1.Time$coef, Lower.Bound=Model.L1.Time$lower, Upper.Bound=Model.L1.Time$upper);
## Coefficient Lower.Bound Upper.Bound
## (Intercept) 4.5352456627 3.6607544949 5.409736831
## ChemBev 0.4903821144 0.2176428890 0.763121340
## Age1 0.0467445556 0.0193879515 0.074101160
## Age2 -0.0003818097 -0.0006122064 -0.000151413
## ECOG.1 -0.4420102362 -0.7086076563 -0.175412816
## ECOG.23 -0.4466416971 -0.7601591259 -0.133124268
## MutantBRAF.YN -0.1586860032 -0.3946605642 0.077288558
## Site.LeftRectum -0.0320105967 -0.2807102482 0.216689055
## Site.Right -0.2048020078 -0.4594543905 0.049850375
## Liver.YN -0.1006873497 -0.2196800253 0.018305326
## Peritoneum.YN -0.0967462377 -0.2154643846 0.021971909
## CEA.200plus -0.0349843979 -0.3386554012 0.268686605
## Private.YN 0.3951723972 0.1640512674 0.626293527
## MutantRAS.YN -0.0958525980 -0.2360036442 0.044298448
## ECOG.1.Int 0.2336980952 -0.0577437169 0.525139907
## ECOG.23.Int -0.0886445670 -0.4773567723 0.300067638
## CEA.200plus.Int -0.2021634520 -0.5296258343 0.125298930
## Private.YN.Int -0.4110088123 -0.6653954483 -0.156622176
## Log(Scale) -0.3324565000 -0.3850521961 -0.279860804
# Fit Model L1-Event
vars.all.L1 <- c(vars.include.L1, vars.backward.L1, vars.forward.L1);
vars.all.L1 <- vars.all.L1[vars.all.L1!="Arm.ChemBev"];
Model.L1.Event <- fitLogisticModel(datasets = datasets.surv.L1, events = "Events.01", vars.include = c("L1.PFS.Time", "Arm.ChemBev"), vars.selection = c(vars.all.L1, str_c(vars.all.L1, ".Int"), "L1.PFS.Time.Int"), vars.formulas = vars.formulas.L1, pvalue = 0.20);
# Save and print Model L1-Event
save(x=Model.L1.Event, file="object Model.L1.Event 20190201.RData")
write.table(cbind(Coef=Model.L1.Event$coef, Lower=Model.L1.Event$lower, Upper=Model.L1.Event$upper), file="Model L1-Event 20190201.txt", sep=";");
cbind(Coefficient=Model.L1.Event$coef, Lower.Bound=Model.L1.Event$lower, Upper.Bound=Model.L1.Event$upper);
## Coefficient Lower.Bound Upper.Bound
## (Intercept) -2.069855e+00 -3.011333e+00 -1.128376e+00
## Time1 -1.871496e-04 -8.169673e-03 7.795374e-03
## Time2 -2.561821e-06 -1.577255e-05 1.064891e-05
## ChemBev 1.738231e-01 -1.229186e+00 1.576832e+00
## Charlson.1plus 6.461158e-01 1.593857e-01 1.132846e+00
## Site.LeftRectum.Int -1.462309e+00 -2.459083e+00 -4.655346e-01
## Site.Right.Int -1.469024e+00 -2.529350e+00 -4.086988e-01
## Time1.Int 2.267294e-03 -6.025885e-03 1.056047e-02
## Time2.Int 2.566552e-06 -1.072477e-05 1.585787e-05
# Fit Model LX-Time
Model.LX.Time <- fitSurvivalModel(datasets = datasets.surv.LX, time = "Time", event = "Event", dist = "weibull", vars.include = vars.include.LX, vars.backward = vars.backward.LX, vars.forward = vars.forward.LX, vars.formulas = vars.formulas.LX, pvalue.backward = 0.3, pvalue.forward = 0.2, pvalue.interaction = 0.3)
# Save and print Model LX-Time
save(x=Model.LX.Time, file="object Model.LX.Time 20190201.RData")
write.table(cbind(Coef=Model.LX.Time$coef, Lower=Model.LX.Time$lower, Upper=Model.LX.Time$upper), file="Model LX-Time 20190201.txt", sep=";");
cbind(Coefficient=Model.LX.Time$coef, Lower.Bound=Model.LX.Time$lower, Upper.Bound=Model.LX.Time$upper);
## Coefficient Lower.Bound Upper.Bound
## (Intercept) 3.2891795888 2.2620579298 4.316301e+00
## ChemBev 0.8818385546 -0.2812537861 2.044931e+00
## Age1 -0.0251070916 -0.0370240389 -1.319014e-02
## Age2 -0.0005670460 -0.0011710130 3.692101e-05
## Site.LeftRectum -0.1579432364 -0.5383176339 2.224312e-01
## Site.Right -0.2365318069 -0.6211753023 1.481117e-01
## Peritoneum.YN 0.1513874151 -0.1773574308 4.801323e-01
## ECOG.1 -0.2762104282 -0.6659711458 1.135503e-01
## ECOG.23 -1.2022780055 -1.6308052807 -7.737507e-01
## MutantBRAF.YN 0.2745640877 -0.3312597292 8.803879e-01
## PFS.Time 0.5751723494 0.4054450540 7.448996e-01
## Stage4.YN -0.5243148245 -0.9757145591 -7.291509e-02
## Charlson.1plus.YN 0.1044981510 -0.0565984893 2.655948e-01
## MutantRAS.YN 0.1091776800 -0.3022287934 5.205842e-01
## Private.YN 0.1053037877 -0.0458828408 2.564904e-01
## Age1.Int 0.0174056852 0.0034559505 3.135542e-02
## Age2.Int 0.0004030007 -0.0003232005 1.129202e-03
## Peritoneum.YN.Int -0.2386910341 -0.6130196513 1.356376e-01
## ECOG.1.Int -0.0930659837 -0.5165916222 3.304597e-01
## ECOG.23.Int 0.3115761369 -0.2210811948 8.442335e-01
## MutantBRAF.YN.Int -0.6795392885 -1.3768905224 1.781195e-02
## PFS.Time.Int -0.1486793160 -0.3474887968 5.013016e-02
## Stage4.YN.Int 0.4904170757 -0.0221555921 1.002990e+00
## MutantRAS.YN.Int -0.4071664687 -0.8555996074 4.126667e-02
## Log(Scale) -0.1525148212 -0.2153760094 -8.965363e-02
Internal validation is performed to assess the performanc of the models. This exists of five steps:
For survival models Model L1-Time and Model LX-Time, several validation plots were assessed and model discrimination was assessed by Uno et al.’s modification of Harrel’s C-statistic to account for censoring. Two measures were used to assess survival model calibration: 1) the regression slope of the predicted Weibull scale parameter in a Cox-regression model, and 2) Demler et al.’s Greenwood-Nam-D’Agostino statistic (GND), which is a modification of Hosmer-Lemeshow’s statistic by Nam and D’Agostino and accounts for censoring using Greenwood’s variance estimator. Both C-statistic and GND are assessed at specific timepoints, which were selected to be at 0.5, 1, and 2 years. Additionally, GND requires a number of groups to be specified in which patients are stratified according to their predicted risk, for which 5 and 10 were chosen to assess impact of this arbitrary decision.
Internal validity of logistic regression Model L1-Event was assessed by excluding patients for which no event was observed and using Harrel’s C-statistic for model discrimination. To assess calibration, the regression slope of the predicted response of Model L1-Event in a logistic regression model and the Brier score were used.
Bootstrapping was performed to determine the optimism and obtain sets of models to perform the probabilistic sensitivity analysis for the discrete event simulation model.
Several custom functions are defined to make predictions using the fitted models and to calculate the statistics used. Please find an explanation of the functions in the R code below.
# The 'predictSurvmod', 'predictSurvmod.response', 'predictSurvmod.lp', 'predictSurvmod.p', 'predictLogreg',
# and 'predictLogreg.lp' are custom functions to make predictions using the fitted models
predictSurvmod <- function(model, newdata) {
# Add a dummy column for the intercept
newdata <- cbind(newdata, '(Intercept)'=1);
# Ge the Weibull Shape parameter from the model
Shape <- 1/exp(model["Log(Scale)"]);
model <- model[-which(names(model) == "Log(Scale)")];
# Subset a dataframe to enable matrix multiplication and check
newdata.sub <- as.matrix(newdata[, names(model), drop=F]);
# Get the scale parameter for the survival model
Scale <- exp(newdata.sub %*% model);
# Predict the times
p <- matrix(runif(nrow(newdata.sub)), ncol=1);
t <- qweibull(p=p, shape=Shape, scale=Scale);
# Return predicted times
return(t);
}
predictSurvmod.response <- function(model, newdata) {
# Add a dummy column for the intercept
newdata <- cbind(newdata, '(Intercept)'=1);
# Ge the Weibull Shape parameter from the model
Shape <- 1/exp(model["Log(Scale)"]);
model <- model[-which(names(model) == "Log(Scale)")];
# Subset a dataframe to enable matrix multiplication and check
newdata.sub <- as.matrix(newdata[, names(model), drop=F]);
# Get the scale parameter for the survival model
Scale <- exp(newdata.sub %*% model);
# Return predicted times
return(Scale);
}
predictSurvmod.lp <- function(model, newdata) {
# Add a dummy column for the intercept
newdata <- cbind(newdata, '(Intercept)'=1);
# Ge the Weibull Shape parameter from the model
Shape <- 1/exp(model["Log(Scale)"]);
model <- model[-which(names(model) == "Log(Scale)")];
# Subset a dataframe to enable matrix multiplication and check
newdata.sub <- as.matrix(newdata[, names(model), drop=F]);
# Get the scale parameter for the survival model
LP <- newdata.sub %*% model;
# Return predicted times
return(LP);
}
predictSurvmod.p <- function(model, newdata, q) {
# Add a dummy column for the intercept
newdata <- cbind(newdata, '(Intercept)'=1);
# Ge the Weibull Shape parameter from the model
Shape <- 1/exp(model["Log(Scale)"]);
model <- model[-which(names(model) == "Log(Scale)")];
# Subset a dataframe to enable matrix multiplication and check
newdata.sub <- as.matrix(newdata[, names(model), drop=F]);
# Get the scale parameter for the survival model
Scale <- exp(newdata.sub %*% model);
# Predict the times
q <- matrix(q, nrow=nrow(newdata), ncol=length(q), byrow=T);
p <- pweibull(q=q, shape=Shape, scale=Scale)
# Return predicted times
return(p);
}
predictLogreg <- function(model, newdata) {
# Add a dummy column for the intercept
newdata <- cbind(newdata, '(Intercept)'=1);
# Subset a dataframe to enable matrix multiplication and check
newdata.sub <- as.matrix(newdata[, names(model), drop=F]);
# Calculate the probabilities
p.tmp <- exp(newdata.sub %*% model);
p <- p.tmp / (p.tmp + 1);
# Return probabilities
return(as.vector(p));
}
predictLogreg.lp <- function(model, newdata) {
# Add a dummy column for the intercept
newdata <- cbind(newdata, '(Intercept)'=1);
# Subset a dataframe to enable matrix multiplication and check
newdata.sub <- as.matrix(newdata[, names(model), drop=F]);
# Calculate the probabilities
p.tmp <- newdata.sub %*% model;
# Return probabilities
return(p.tmp);
}
# The 'testGND' is used to calculate the Greenwood-Nam-D'Agostino statistic as proposed by the authors and
# with some additional error handlers included and an option to plot an illustration
testGND <- function(model, data, time, n.bin=5, min.event=5, plot.graph=F, y.max=1) {
time.in.days <- time*365.25;
# Predict event probability for each individual
pt <- predictSurvmod.p(model=model, newdata=data, q=time.in.days);
# Bin individuals according to risk of event
pt.q <- quantile(pt, seq(0, 1, length.out=n.bin+1));
b <- sapply(pt, function(p) ifelse(p==max(pt.q), n.bin, which(p<pt.q)[1]-1));
# Calculate average probability for each group
ptg <- tapply(pt, b, mean);
# Determine Kaplan-Meier estimates
KMg <- t(sapply(1:n.bin, function(bin) {
# Get subset of data for the bin
data.sub <- data[b==bin, ];
# Check whether sufficient events are observed
if(sum(data.sub$Event==1)<min.event) stop("Insufficient events observed!");
# Estimate Kaplan-Meier for the bin
KM <- survfit(Surv(Time, Event) ~ 1, data=data.sub, type="kaplan-meier");
# Extract the estimates:
# p event probability (1 - km survival estimate)
# var variance of km survival estimate
# nrisk number at risk
if(time.in.days<min(KM$time)) {
p <- 1;
var <- 0;
} else {
p <- KM$surv[tail(which(KM$time<=time.in.days), n=1)];
p <- ifelse(p==0, KM$surv[tail(which(KM$time<=time.in.days), n=1)-1], p);
err <- KM$std.err[tail(which(KM$time<=time.in.days), n=1)];
err <- ifelse(err==Inf, KM$std.err[tail(which(KM$time<=time.in.days), n=1)-1], err);
var <- (err * p)^2;
}
# Return estimates
c(p=1-p, var=var)
}))
# Calculate test statistic
GND.chisq.sep <- (KMg[,"p"]-ptg)^2 / KMg[,"var"];
GND.chisq <- sum(GND.chisq.sep);
GND.df <- n.bin - 1;
GND.p <- 1 - pchisq(q=GND.chisq, df=GND.df);
if(plot.graph) {
barplot(rbind(ptg, KMg[, "p"]), beside=T, ylim=c(0,y.max), main=paste0("Event before ",time, "-year (p = ",round(GND.p,3),")"), ylab="Probability", names.arg=str_c("n=",table(b)), las=1, legend.text=c("Survival Model", "Kaplan-Meier"), args.legend=list(x="top", inset=c(0,-0.0), ncol=2, bty='n'))
}
# Return results
return(c(chisq=GND.chisq, df=GND.df, p=GND.p));
}
# The 'getPerformanceSurvmod' and 'getPerformanceLogreg' functions are high-level functions that call all
# functions used to calculate statistics used to assess the models' performance
getPerformanceSurvmod <- function(model, data.train, data.test, timepoints=c(0.5, 1, 2)) {
library(survAUC);
# c-statistic
c.statistic <- sapply(timepoints, function(tt) {
# Make negative because a higher scale indicates a higher time-to-event, i.e. lower risk
pred.lp <- -predictSurvmod.lp(model=model, newdata=data.test);
surv.train <- Surv(time=data.train$Time, event=data.train$Event);
surv.test <- Surv(time=data.test$Time, event=data.test$Event);
UnoC(Surv.rsp=surv.train, Surv.rsp.new=surv.test, lpnew=pred.lp, time=tt*365.25);
});
names(c.statistic) <- str_c("c.statistic_", timepoints, "-year");
# Regression Slope
data.test$scale <- predictSurvmod.response(model, data.test);
slope.scale <- exp(coxph(Surv(Time, Event) ~ scale, data=data.test)$coefficients);
# Hosmer-Lemeshow
HL.5bins <- sapply(timepoints, function(tt) {
testGND(model=model, data=data.test, time=tt, n.bin=5, plot=F)["chisq"]
});
HL.10bins <- sapply(timepoints, function(tt) {
testGND(model=model, data=data.test, time=tt, n.bin=10, plot=F)["chisq"]
});
names(HL.5bins) <- str_c("HL.5bins_", timepoints, "-year");
names(HL.10bins) <- str_c("HL.10bins_", timepoints, "-year");
# Return all performance measures
return(c(c.statistic, Slope.Scale = unname(slope.scale), HL.5bins, HL.10bins))
}
getPerformanceLogreg <- function(model, data) {
data.sub <- data[data$Events!="Censored", ];
obs <- ifelse(data.sub$Events=="Death", 1,
ifelse(data.sub$Events=="Progression", 0, NA));
pred.prob <- predictLogreg(model=model, newdata=data.sub);
pred.lp <- predictLogreg.lp(model=model, newdata=data.sub);
# c-statistic
pairs <- expand.grid(1:length(obs),
1:length(obs));
c_n <- sum(ifelse(obs[pairs[,1]]==1 & obs[pairs[,2]]==0, 1, 0) * ifelse(pred.prob[pairs[,1]]>pred.prob[pairs[,2]], 1, 0));
c_d <- sum(ifelse(obs[pairs[,1]]==1 & obs[pairs[,2]]==0, 1, 0));
c.statistic <- c_n/c_d;
# Regression slope
tmp.model <- glm(obs ~ pred.lp, family=binomial(link="logit"));
regression.slope <- tmp.model$coefficients["pred.lp"];
# Brier score
Brier.score <- sum((pred.prob-obs)^2)/length(obs);
# Return all performance measures
return(c(C.Statistic = c.statistic, Regression.Slope = unname(regression.slope), Brier.Score = Brier.score))
}
Both Kaplan-Meier plots for different patient subgroups and illustrations of the Greenwood-Nam-D’Agostino statistic were used to graphically assess the performance of the survival models.
# Custom function to plot predited vs. observed Kaplan-Meier curves for both treatment strategies
plotKM <- function(model, data, class.column, time.max=1500, time.step=10, add.main=NULL) {
km.fit <- survfit(as.formula(str_c("Surv(Time, Event) ~ ",class.column)), data=data, type="kaplan-meier")
classes <- sort(unlist(unique(data[, class.column])));
n.classes <- length(classes);
n.obs.classes <- table(data[, class.column]);
q.values <- seq(1, time.max, time.step);
sim <- sapply(classes, function(class) {
surv.mat <- 1 - predictSurvmod.p(model, data[data[, class.column]==as.character(class), ], q=q.values);
surv.mean <- apply(surv.mat, 2, mean);
surv.mean
})
cols <- rainbow(n.classes);
if(is.null(add.main)) {
main.text <- class.column;
} else {
main.text <- str_c(class.column, "\n(", add.main,")")
}
plot(km.fit, col=cols, lty=1, lwd=2, xlim=c(0,time.max), main=main.text, xlab="Time to Progression or Death (Days)")
for(i in 1:n.classes) {
lines(x=q.values, y=sim[,i], col=cols[i], lty=2, lwd=2)
}
legend("right", legend=c(str_c(classes, " (obs) [n=",n.obs.classes,"]"), str_c(classes, " (sim)")),
col=c(cols, cols), lty=c(rep(1, n.classes), rep(2, n.classes)), lwd=2)
}
# Model L1-Time
set.seed(20181211);
random.dataset <- sample(1:length(datasets), 1);
data <- datasets.surv.L1[[random.dataset]];
# Kaplan-Meier curves for the following groups
# - The complete cohort
# - Patient with a left-sided primary tumour or rectum primary tumour
# - Patient with a right-sided primary tumour
# - Patients with an primary tumour in their colon but not otherwise specified (NOS) or unkown primary tumour site
par(mfrow=c(2,2));
plotKM(model=Model.L1.Time$coef, data=data, class.column="ChemBev")
plotKM(model=Model.L1.Time$coef, data=filter(data, Site.LeftRectum==1), class.column="ChemBev", add.main="Left or Rectum")
plotKM(model=Model.L1.Time$coef, data=filter(data, Site.Right==1), class.column="ChemBev", add.main="Right")
plotKM(model=Model.L1.Time$coef, data=filter(data, Site.LeftRectum==0, Site.Right==0), class.column="ChemBev", add.main="Other")
# Kaplan-Meier curves for the following groups
# - Female patients
# - Male patients
# - Patients with metastatic disease at diagnosis
# - Patients without metastatic disease at diagnosis
par(mfrow=c(2,2));
plotKM(model=Model.L1.Time$coef, data=filter(data, Female.YN==1), class.column="ChemBev", add.main="Female")
plotKM(model=Model.L1.Time$coef, data=filter(data, Female.YN==0), class.column="ChemBev", add.main="Male")
plotKM(model=Model.L1.Time$coef, data=filter(data, Stage4.YN==1), class.column="ChemBev", add.main="Stage 4 at Diagnosis")
plotKM(model=Model.L1.Time$coef, data=filter(data, Stage4.YN==0), class.column="ChemBev", add.main="NOT Stage 4 at Diagnosis")
# Kaplan-Meier curves for the following groups
# - Patients with an ECOG Performance Status of 2 or 3
# - Patients with an ECOG Performance Status of 1
# - Patients with an ECOG Performance Status of 0
par(mfrow=c(2,2));
plotKM(model=Model.L1.Time$coef, data=filter(data, ECOG.23==1), class.column="ChemBev", add.main="ECOG 23")
plotKM(model=Model.L1.Time$coef, data=filter(data, ECOG.1==1), class.column="ChemBev", add.main="ECOG 1")
plotKM(model=Model.L1.Time$coef, data=filter(data, ECOG.23==0, ECOG.1==0), class.column="ChemBev", add.main="ECOG 0")
# Graphical representation of the Greenwood-Nam-D'Agostino statistic with 5 risk groups
par(mfrow=c(2,2));
testGND(model=Model.L1.Time$coef, data=data, time=0.5, n.bin=5, plot.graph=T);
## chisq df p
## 4.4991559 4.0000000 0.3426476
testGND(model=Model.L1.Time$coef, data=data, time=1, n.bin=5, plot.graph=T);
## chisq df p
## 2.4457211 4.0000000 0.6543809
testGND(model=Model.L1.Time$coef, data=data, time=2, n.bin=5, plot.graph=T);
## chisq df p
## 2.0226615 4.0000000 0.7315906
# Graphical representation of the Greenwood-Nam-D'Agostino statistic with 10 risk groups
par(mfrow=c(2,2));
testGND(model=Model.L1.Time$coef, data=data, time=0.5, n.bin=10, plot.graph=T);
## chisq df p
## 7.1396005 9.0000000 0.6225876
testGND(model=Model.L1.Time$coef, data=data, time=1, n.bin=10, plot.graph=T);
## chisq df p
## 8.1382488 9.0000000 0.5202764
testGND(model=Model.L1.Time$coef, data=data, time=2, n.bin=10, plot.graph=T);
## chisq df p
## 5.2879510 9.0000000 0.8085192
# Graphical representation of the Greenwood-Nam-D'Agostino statistic as included in the manuscript
png(file="GND Model L1-Time 20190201.png", width=12, height=6, units="in", res=96);
par(mfrow=c(1,2));
testGND(model=Model.L1.Time$coef, data=data, time=1, n.bin=5, plot.graph=T);
## chisq df p
## 2.4457211 4.0000000 0.6543809
testGND(model=Model.L1.Time$coef, data=data, time=1, n.bin=10, plot.graph=T);
## chisq df p
## 8.1382488 9.0000000 0.5202764
dev.off();
## quartz_off_screen
## 2
# Model LX-Time
data <- datasets.surv.LX[[random.dataset]];
# Kaplan-Meier curves for the following groups
# - The complete cohort
# - Patient with a left-sided primary tumour or rectum primary tumour
# - Patient with a right-sided primary tumour
# - Patients with an primary tumour in their colon but not otherwise specified (NOS) or unkown primary tumour site
par(mfrow=c(2,2));
plotKM(model=Model.LX.Time$coef, data=data, class.column="ChemBev")
plotKM(model=Model.LX.Time$coef, data=filter(data, Site.LeftRectum==1), class.column="ChemBev", add.main="Left or Rectum")
plotKM(model=Model.LX.Time$coef, data=filter(data, Site.Right==1), class.column="ChemBev", add.main="Right")
plotKM(model=Model.LX.Time$coef, data=filter(data, Site.LeftRectum==0, Site.Right==0), class.column="ChemBev", add.main="Other")
# Kaplan-Meier curves for the following groups
# - Female patients
# - Male patients
# - Patients with metastatic disease at diagnosis
# - Patients without metastatic disease at diagnosis
par(mfrow=c(2,2));
plotKM(model=Model.LX.Time$coef, data=filter(data, Female.YN==1), class.column="ChemBev", add.main="Female")
plotKM(model=Model.LX.Time$coef, data=filter(data, Female.YN==0), class.column="ChemBev", add.main="Male")
plotKM(model=Model.LX.Time$coef, data=filter(data, Stage4.YN==1), class.column="ChemBev", add.main="Stage 4 at Diagnosis")
plotKM(model=Model.LX.Time$coef, data=filter(data, Stage4.YN==0), class.column="ChemBev", add.main="NOT Stage 4 at Diagnosis")
# Kaplan-Meier curves for the following groups
# - Patients with an ECOG Performance Status of 2 or 3
# - Patients with an ECOG Performance Status of 1
# - Patients with an ECOG Performance Status of 0
par(mfrow=c(2,2));
plotKM(model=Model.LX.Time$coef, data=filter(data, ECOG.23==1), class.column="ChemBev", add.main="ECOG 23")
plotKM(model=Model.LX.Time$coef, data=filter(data, ECOG.1==1), class.column="ChemBev", add.main="ECOG 1")
plotKM(model=Model.LX.Time$coef, data=filter(data, ECOG.23==0, ECOG.1==0), class.column="ChemBev", add.main="ECOG 0")
# Graphical representation of the Greenwood-Nam-D'Agostino statistic with 5 risk groups
par(mfrow=c(2,2));
testGND(model=Model.LX.Time$coef, data=data, time=0.5, n.bin=5, plot.graph=T);
## chisq df p
## 8.62654164 4.00000000 0.07114301
testGND(model=Model.LX.Time$coef, data=data, time=1, n.bin=5, plot.graph=T);
## chisq df p
## 2.8534511 4.0000000 0.5826418
testGND(model=Model.LX.Time$coef, data=data, time=2, n.bin=5, plot.graph=T);
## chisq df p
## 7.4139126 4.0000000 0.1155658
# Graphical representation of the Greenwood-Nam-D'Agostino statistic with 10 risk groups
par(mfrow=c(2,2));
testGND(model=Model.LX.Time$coef, data=data, time=0.5, n.bin=10, plot.graph=T);
## chisq df p
## 16.73744515 9.00000000 0.05298987
testGND(model=Model.LX.Time$coef, data=data, time=1, n.bin=10, plot.graph=T);
## chisq df p
## 4.2555980 9.0000000 0.8937987
testGND(model=Model.LX.Time$coef, data=data, time=2, n.bin=10, plot.graph=T);
## chisq df p
## 9.5251554 9.0000000 0.3902742
# Graphical representation of the Greenwood-Nam-D'Agostino statistic as included in the manuscript
png(file="GND Model LX-Time 20190201.png", width=12, height=6, units="in", res=96);
par(mfrow=c(1,2));
testGND(model=Model.LX.Time$coef, data=data, time=1, n.bin=5, plot.graph=T);
## chisq df p
## 2.8534511 4.0000000 0.5826418
testGND(model=Model.LX.Time$coef, data=data, time=1, n.bin=10, plot.graph=T);
## chisq df p
## 4.2555980 9.0000000 0.8937987
dev.off();
## quartz_off_screen
## 2
Using the previously defined functions, the uncorrected, biased apparent performance of the models is calculated, so that these can be corrected for the optimism as estimated from the bootstrapping later on. The apparent performance is calculated for each imputed datasets separately and averaged to obtain a single apparent performance outcome.
# General settings
timepoints <- c(0.5, 1, 2);
# Model L1-Event
Dapp.Model.L1.Time <- rowMeans(
sapply(datasets.surv.L1, function(dataset) {
getPerformanceSurvmod(model = Model.L1.Time$coef, data.train = dataset, data.test = dataset, timepoints = timepoints)
})
);
# Observe apparent performance
Dapp.Model.L1.Time;
## c.statistic_0.5-year c.statistic_1-year c.statistic_2-year
## 0.6591775 0.6348088 0.6287258
## Slope.Scale HL.5bins_0.5-year HL.5bins_1-year
## 0.9958962 4.0929190 4.3369164
## HL.5bins_2-year HL.10bins_0.5-year HL.10bins_1-year
## 0.9944051 7.4060601 10.6726626
## HL.10bins_2-year
## 5.5885418
# Model L1-Event
Dapp.Model.L1.Event <- rowMeans(
sapply(datasets.surv.L1, function(dataset) {
getPerformanceLogreg(model = Model.L1.Event$coef, data = dataset)
})
);
# Observe apparent performance
Dapp.Model.L1.Event;
## C.Statistic Regression.Slope Brier.Score
## 0.68958824 0.99974239 0.08674998
# Model LX-Event
Dapp.Model.LX.Time <- rowMeans(
sapply(datasets.surv.LX, function(dataset) {
getPerformanceSurvmod(model = Model.LX.Time$coef, data.train = dataset, data.test = dataset, timepoints = timepoints)
})
);
# Observe apparent performance
Dapp.Model.LX.Time;
## c.statistic_0.5-year c.statistic_1-year c.statistic_2-year
## 0.7174054 0.6933501 0.6844447
## Slope.Scale HL.5bins_0.5-year HL.5bins_1-year
## 0.9971058 8.1409540 1.5974814
## HL.5bins_2-year HL.10bins_0.5-year HL.10bins_1-year
## 4.9123090 15.9033123 6.6292335
## HL.10bins_2-year
## 7.6692321
Bootstrapping was performed to assess the optimism of the (survival) models and to obtain sets of correlared model parameters to be used to perform a probabilistic sensitivity analysis for the discrete event simulation. In each bootstrap iteration, the following actions are performed:
# Setting for the bootstrapping
n.boot <- 10000; # number of bootstrap iterations
set.seed(20181211); # random number seed for reproduction
# Define the model structure that needs to be fitted for the models in each bootstrap iteration
Model.L1.Time.formula <- str_c(
str_c(c("Surv(Time, Event) ~ "), collapse=""),
str_c(vars.formulas.L1[Model.L1.Time$included], collapse=" + ")
);
Model.L1.Event.formula <- str_c(
str_c(c("Events.01 ~ "), collapse=""),
str_c(vars.formulas.L1[Model.L1.Event$included], collapse=" + ")
);
Model.LX.Time.formula <- str_c(
str_c(c("Surv(Time, Event) ~ "), collapse=""),
str_c(vars.formulas.LX[Model.LX.Time$included], collapse=" + ")
);
# Set up a multithreaded computation cluster
library(parallel);
library(doSNOW);
cl <- makeCluster(detectCores());
clusterExport(cl, ls());
# Perform the bootstrap
out.bootstrap <- t(parSapply(cl, 1:n.boot, function(i.boot) {
# Load packages
library(dplyr);
library(stringr);
library(survival);
library(aod)
library(survAUC);
# Initialize an output object and an object to track whether any errors occur
out <- NULL;
count.errors <- -1;
while(is.null(out)) {
# An error handler is used to make sure that an a set of outcomes is obtained for each bootstrap iteration
# If an error occurs, this is recorded, so that it is known whether this occurs and, if so, how often
out <- tryCatch({
# Increase the error counter (starts at -1 so if no errors occur the counter will be zero)
count.errors <- count.errors + 1;
# Bootstrap sample of patients
boot.sample <- sample(x=1:nrow(datasets[[1]]), size=nrow(datasets[[1]]), replace=T);
# For each imputed dataset
out <- t(sapply(1:length(datasets), function(i.dataset) {
## Get bootstrap sample
# Get datasets from imputed dataset
dataset <- datasets[[i.dataset]];
dataset.L1 <- dataset;
dataset.LX <- filter(dataset, LX.OS.Include);
boot.dataset.L1 <- dataset[boot.sample, , drop=F];
boot.dataset.LX <- filter(boot.dataset.L1, LX.OS.Include);
# Transform data in appropriate format
dataset.surv.L1 <- getSurvDataset.L1(data=dataset.L1);
dataset.surv.LX <- getSurvDataset.LX(data=dataset.LX);
boot.dataset.surv.L1 <- getSurvDataset.L1(data=boot.dataset.L1);
boot.dataset.surv.LX <- getSurvDataset.LX(data=boot.dataset.LX);
## Fit the models
# Model L1-Time
boot.Model.L1.Time <- survreg(formula = as.formula(Model.L1.Time.formula), data=boot.dataset.surv.L1, dist="weibull");
boot.survmod.L1 <- c(boot.Model.L1.Time$coefficients, 'Log(Scale)'=log(boot.Model.L1.Time$scale));
# Model L1-Event
boot.Model.L1.Event <- glm(formula = as.formula(Model.L1.Event.formula), data=boot.dataset.surv.L1, family=binomial(link="logit"));
boot.logreg.L1 <- boot.Model.L1.Event$coefficients;
# Model LX-Time
boot.Model.LX.Time <- survreg(formula = as.formula(Model.LX.Time.formula), data=boot.dataset.surv.LX, dist="weibull");
boot.survmod.LX <- c(boot.Model.LX.Time$coefficients, 'Log(Scale)'=log(boot.Model.LX.Time$scale));
## Performance Model L1-Time
# Performance in bootstrap sample
perf.surv.L1.boot <- getPerformanceSurvmod(model = boot.survmod.L1, data.train = boot.dataset.surv.L1, data.test = boot.dataset.surv.L1, timepoints = timepoints)
# Performance in original (imputed) dataset
perf.surv.L1.origin <- getPerformanceSurvmod(model = boot.survmod.L1, data.train = boot.dataset.surv.L1, data.test = dataset.surv.L1, timepoints = timepoints);
# Calculate the optimism
optimism.surv.L1 <- perf.surv.L1.boot - perf.surv.L1.origin;
# Set names for later analyses
names(perf.surv.L1.boot) <- str_c("BOOT_SURVMOD_L1_", names(perf.surv.L1.boot));
names(perf.surv.L1.origin) <- str_c("ORG_SURVMOD_L1_", names(perf.surv.L1.origin));
names(optimism.surv.L1) <- str_c("OPT_SURVMOD_L1_", names(optimism.surv.L1));
names(boot.survmod.L1) <- str_c("SURVMOD_L1_", names(boot.survmod.L1));
## Performance Model L1-Event
# Performance in bootstrap sample
perf.logreg.L1.boot <- getPerformanceLogreg(model = boot.logreg.L1, data = boot.dataset.surv.L1)
# Performance in original (imputed) dataset
perf.logreg.L1.origin <- getPerformanceLogreg(model = boot.logreg.L1, data = dataset.surv.L1);
# Calculate the optimism
optimism.logreg.L1 <- perf.logreg.L1.boot - perf.logreg.L1.origin;
# Set names for later analyses
names(perf.logreg.L1.boot) <- str_c("BOOT_EVENTMOD_L1_", names(perf.logreg.L1.boot));
names(perf.logreg.L1.origin) <- str_c("ORG_EVENTMOD_L1_", names(perf.logreg.L1.origin));
names(optimism.logreg.L1) <- str_c("OPT_EVENTMOD_L1_", names(optimism.logreg.L1));
names(boot.logreg.L1) <- str_c("EVENTMOD_L1_", names(boot.logreg.L1));
## Performance Model LX-Time
# Performance in bootstrap sample
perf.surv.LX.boot <- getPerformanceSurvmod(model = boot.survmod.LX, data.train = boot.dataset.surv.LX, data.test = boot.dataset.surv.LX, timepoints = timepoints)
# Performance in original (imputed) dataset
perf.surv.LX.origin <- getPerformanceSurvmod(model = boot.survmod.LX, data.train = boot.dataset.surv.LX, data.test = dataset.surv.LX, timepoints = timepoints);
# Calculate the optimism
optimism.surv.LX <- perf.surv.LX.boot - perf.surv.LX.origin;
# Set names for later analyses
names(perf.surv.LX.boot) <- str_c("BOOT_SURVMOD_LX_", names(perf.surv.LX.boot));
names(perf.surv.LX.origin) <- str_c("ORG_SURVMOD_LX_", names(perf.surv.LX.origin));
names(optimism.surv.LX) <- str_c("OPT_SURVMOD_LX_", names(optimism.surv.LX));
names(boot.survmod.LX) <- str_c("SURVMOD_LX_", names(boot.survmod.LX));
## Return models and performance
c(boot.survmod.L1, perf.surv.L1.boot, perf.surv.L1.origin, optimism.surv.L1, boot.logreg.L1, perf.logreg.L1.boot, perf.logreg.L1.origin, optimism.logreg.L1, boot.survmod.LX, perf.surv.LX.boot, perf.surv.LX.origin, optimism.surv.LX, Errors = count.errors)
})) # sapply datasets
# Apply Rubin's rule to obtain average models and outcomes
out <- colMeans(out);
# Check whether there are infeasible results for any of the GND tests and, if so, give
# an error so that another bootstrap sample is performed
#if(any(out[str_detect(string=names(out), pattern="HL")]==Inf)) {
if(any(out==Inf)) stop("An error occured in the analysis of the bootstrap sample!");
#}
# Return the 'object'
out;
# Error handler for the tryCatch function
}, error=function(e) NULL)
} # While
# Return the outcomes for the bootstrap iteration
out;
})); # parSapply nboot
# Stop the defined computation cluster
stopCluster(cl);
# Print five iteration of the bootstrap
head(out.bootstrap, n=5);
## SURVMOD_L1_(Intercept) SURVMOD_L1_ChemBev SURVMOD_L1_Age1
## [1,] 3.981999 0.7133633 0.04901094
## [2,] 4.700113 0.4052946 0.04082126
## [3,] 4.620920 0.6174053 0.03743284
## [4,] 4.518186 0.5050754 0.04693260
## [5,] 4.863937 0.4679972 0.03160581
## SURVMOD_L1_Age2 SURVMOD_L1_ECOG.1 SURVMOD_L1_ECOG.23
## [1,] -0.0003953975 -0.4172791 -0.7096706
## [2,] -0.0003497074 -0.5041710 -0.2891771
## [3,] -0.0003100127 -0.3154448 -0.6469685
## [4,] -0.0003717848 -0.1261577 -0.3197438
## [5,] -0.0002592091 -0.4779499 -0.2849362
## SURVMOD_L1_MutantBRAF.YN SURVMOD_L1_Site.LeftRectum
## [1,] -0.18668733 0.2851306
## [2,] -0.10763923 0.1840829
## [3,] -0.10634250 0.1204314
## [4,] -0.08786323 -0.2165154
## [5,] -0.18082350 0.2201003
## SURVMOD_L1_Site.Right SURVMOD_L1_Liver.YN SURVMOD_L1_Peritoneum.YN
## [1,] 0.05037495 -0.08705661 -0.1502843
## [2,] -0.04295241 -0.20918879 -0.0965782
## [3,] -0.02865888 -0.14946688 -0.1496728
## [4,] -0.35799352 -0.16156255 -0.1867249
## [5,] 0.07600494 -0.20936823 -0.1616124
## SURVMOD_L1_CEA.200plus SURVMOD_L1_Private.YN SURVMOD_L1_MutantRAS.YN
## [1,] -0.02250040 0.5601233 -0.06317179
## [2,] 0.02959098 0.4218272 -0.05558826
## [3,] 0.04848758 0.4890554 -0.02016390
## [4,] -0.37707645 0.3980278 -0.12519125
## [5,] -0.03467023 0.4405518 -0.10982494
## SURVMOD_L1_ECOG.1.Int SURVMOD_L1_ECOG.23.Int
## [1,] 0.268692508 0.17451063
## [2,] 0.403546316 0.07013651
## [3,] 0.057770700 0.11115989
## [4,] -0.001833973 -0.14401697
## [5,] 0.244993040 -0.32615021
## SURVMOD_L1_CEA.200plus.Int SURVMOD_L1_Private.YN.Int
## [1,] -0.2969168 -0.6098874
## [2,] -0.3712344 -0.4717239
## [3,] -0.3673050 -0.4517613
## [4,] 0.3443316 -0.3426054
## [5,] -0.1190150 -0.4710576
## SURVMOD_L1_Log(Scale) BOOT_SURVMOD_L1_c.statistic_0.5-year
## [1,] -0.3352906 0.6971802
## [2,] -0.3162845 0.6075005
## [3,] -0.3872189 0.6787228
## [4,] -0.3924422 0.6314717
## [5,] -0.4041236 0.6488232
## BOOT_SURVMOD_L1_c.statistic_1-year BOOT_SURVMOD_L1_c.statistic_2-year
## [1,] 0.6630135 0.6527020
## [2,] 0.6154472 0.6124971
## [3,] 0.6543885 0.6424432
## [4,] 0.6148917 0.6109928
## [5,] 0.6363253 0.6308419
## BOOT_SURVMOD_L1_Slope.Scale BOOT_SURVMOD_L1_HL.5bins_0.5-year
## [1,] 0.9958290 5.211445
## [2,] 0.9962760 5.708340
## [3,] 0.9959867 10.338772
## [4,] 0.9953009 4.618231
## [5,] 0.9958087 6.050799
## BOOT_SURVMOD_L1_HL.5bins_1-year BOOT_SURVMOD_L1_HL.5bins_2-year
## [1,] 6.985530 11.005641
## [2,] 4.814476 5.175752
## [3,] 14.221859 5.976124
## [4,] 9.888893 7.126047
## [5,] 6.660566 3.508903
## BOOT_SURVMOD_L1_HL.10bins_0.5-year BOOT_SURVMOD_L1_HL.10bins_1-year
## [1,] 23.43496 28.94024
## [2,] 17.57629 17.19086
## [3,] 24.89114 24.99794
## [4,] 13.89146 24.25595
## [5,] 15.40319 15.69896
## BOOT_SURVMOD_L1_HL.10bins_2-year ORG_SURVMOD_L1_c.statistic_0.5-year
## [1,] 20.76857 0.6603587
## [2,] 13.96084 0.6362329
## [3,] 15.23128 0.6589743
## [4,] 14.40160 0.6437835
## [5,] 10.52725 0.6513928
## ORG_SURVMOD_L1_c.statistic_1-year ORG_SURVMOD_L1_c.statistic_2-year
## [1,] 0.6318032 0.6247304
## [2,] 0.6160703 0.6114792
## [3,] 0.6359020 0.6303478
## [4,] 0.6172235 0.6116596
## [5,] 0.6269510 0.6210518
## ORG_SURVMOD_L1_Slope.Scale ORG_SURVMOD_L1_HL.5bins_0.5-year
## [1,] 0.9968068 3.085940
## [2,] 0.9965162 2.079462
## [3,] 0.9966184 4.145054
## [4,] 0.9959796 4.621218
## [5,] 0.9964070 4.844827
## ORG_SURVMOD_L1_HL.5bins_1-year ORG_SURVMOD_L1_HL.5bins_2-year
## [1,] 10.950854 7.054199
## [2,] 7.846517 2.283721
## [3,] 13.404311 3.649663
## [4,] 4.739600 9.011875
## [5,] 6.753159 3.259901
## ORG_SURVMOD_L1_HL.10bins_0.5-year ORG_SURVMOD_L1_HL.10bins_1-year
## [1,] 7.978417 17.86005
## [2,] 6.379822 11.97190
## [3,] 8.151804 20.61193
## [4,] 9.711431 10.76515
## [5,] 8.845025 10.56049
## ORG_SURVMOD_L1_HL.10bins_2-year OPT_SURVMOD_L1_c.statistic_0.5-year
## [1,] 10.072965 0.036821541
## [2,] 7.287994 -0.028732415
## [3,] 7.852532 0.019748480
## [4,] 12.752327 -0.012311846
## [5,] 6.122629 -0.002569534
## OPT_SURVMOD_L1_c.statistic_1-year OPT_SURVMOD_L1_c.statistic_2-year
## [1,] 0.0312102343 0.027971551
## [2,] -0.0006230617 0.001017893
## [3,] 0.0184864365 0.012095387
## [4,] -0.0023317608 -0.000666834
## [5,] 0.0093743284 0.009790143
## OPT_SURVMOD_L1_Slope.Scale OPT_SURVMOD_L1_HL.5bins_0.5-year
## [1,] -0.0009777518 2.125504929
## [2,] -0.0002402122 3.628878019
## [3,] -0.0006317983 6.193717317
## [4,] -0.0006787829 -0.002987206
## [5,] -0.0005982845 1.205971312
## OPT_SURVMOD_L1_HL.5bins_1-year OPT_SURVMOD_L1_HL.5bins_2-year
## [1,] -3.96532395 3.9514412
## [2,] -3.03204090 2.8920317
## [3,] 0.81754863 2.3264606
## [4,] 5.14929344 -1.8858279
## [5,] -0.09259345 0.2490013
## OPT_SURVMOD_L1_HL.10bins_0.5-year OPT_SURVMOD_L1_HL.10bins_1-year
## [1,] 15.456548 11.080198
## [2,] 11.196472 5.218963
## [3,] 16.739334 4.386004
## [4,] 4.180025 13.490804
## [5,] 6.558168 5.138470
## OPT_SURVMOD_L1_HL.10bins_2-year EVENTMOD_L1_(Intercept)
## [1,] 10.695601 -1.222529
## [2,] 6.672845 -3.429277
## [3,] 7.378746 -1.896369
## [4,] 1.649275 -1.982959
## [5,] 4.404618 -1.695852
## EVENTMOD_L1_Time1 EVENTMOD_L1_Time2 EVENTMOD_L1_ChemBev
## [1,] -0.0127019951 9.717204e-06 -1.1545663
## [2,] 0.0040775680 -8.194758e-06 1.4111030
## [3,] -0.0007586047 -1.140798e-06 -0.1703071
## [4,] -0.0007956200 -1.050032e-07 0.8586870
## [5,] -0.0025616104 -9.145091e-07 -0.4207733
## EVENTMOD_L1_Charlson.1plus EVENTMOD_L1_Site.LeftRectum.Int
## [1,] 1.0387673 -1.5291757
## [2,] 0.7663630 -1.5997984
## [3,] 0.5972719 -0.7651645
## [4,] 0.4055901 -2.3030617
## [5,] 0.7016559 -1.6592355
## EVENTMOD_L1_Site.Right.Int EVENTMOD_L1_Time1.Int
## [1,] -1.0183295 0.015737858
## [2,] -1.8408813 -0.001153155
## [3,] -0.7035978 0.001664369
## [4,] -2.0915155 0.002074461
## [5,] -2.0244781 0.005974437
## EVENTMOD_L1_Time2.Int BOOT_EVENTMOD_L1_C.Statistic
## [1,] -1.017203e-05 0.7513534
## [2,] 7.602548e-06 0.7160946
## [3,] 1.754595e-06 0.6314291
## [4,] 1.165385e-06 0.7175743
## [5,] 7.297080e-08 0.7066423
## BOOT_EVENTMOD_L1_Regression.Slope BOOT_EVENTMOD_L1_Brier.Score
## [1,] 1 0.08621135
## [2,] 1 0.07125345
## [3,] 1 0.09863529
## [4,] 1 0.07825127
## [5,] 1 0.08003173
## ORG_EVENTMOD_L1_C.Statistic ORG_EVENTMOD_L1_Regression.Slope
## [1,] 0.6742480 0.6229088
## [2,] 0.6632729 0.7894793
## [3,] 0.6854978 1.1844405
## [4,] 0.6894995 0.7169080
## [5,] 0.6806799 0.8191654
## ORG_EVENTMOD_L1_Brier.Score OPT_EVENTMOD_L1_C.Statistic
## [1,] 0.08796597 0.07710542
## [2,] 0.08816621 0.05282169
## [3,] 0.08741610 -0.05406867
## [4,] 0.08821141 0.02807471
## [5,] 0.08674404 0.02596248
## OPT_EVENTMOD_L1_Regression.Slope OPT_EVENTMOD_L1_Brier.Score
## [1,] 0.3770911 -0.001754618
## [2,] 0.2105207 -0.016912762
## [3,] -0.1844404 0.011219198
## [4,] 0.2830920 -0.009960139
## [5,] 0.1808346 -0.006712313
## SURVMOD_LX_(Intercept) SURVMOD_LX_ChemBev SURVMOD_LX_Age1
## [1,] 3.562227 0.7824871 -0.02828989
## [2,] 3.998186 0.2554781 -0.02341365
## [3,] 3.569794 0.4655880 -0.02833868
## [4,] 2.313700 2.0443733 -0.02091674
## [5,] 2.598641 0.6808260 -0.03057557
## SURVMOD_LX_Age2 SURVMOD_LX_Site.LeftRectum SURVMOD_LX_Site.Right
## [1,] -0.0011449823 -0.1087073 -0.1413447
## [2,] -0.0002210010 -0.2235832 -0.3898415
## [3,] -0.0011554675 -0.3559879 -0.3880295
## [4,] -0.0009223096 -0.1158416 -0.2002665
## [5,] -0.0006766529 0.2135378 0.1598420
## SURVMOD_LX_Peritoneum.YN SURVMOD_LX_ECOG.1 SURVMOD_LX_ECOG.23
## [1,] -0.0249536 0.1549188 -1.0220102
## [2,] 0.2214340 -0.5808265 -1.4354497
## [3,] 0.1059064 -0.4217989 -1.3702082
## [4,] 0.5151359 -0.1775224 -1.3387347
## [5,] 0.0503954 -0.1520670 -0.9341271
## SURVMOD_LX_MutantBRAF.YN SURVMOD_LX_PFS.Time SURVMOD_LX_Stage4.YN
## [1,] 0.70546169 0.4707842 -0.3944380
## [2,] 0.19919089 0.5095575 -0.5856043
## [3,] -0.32472287 0.6052857 -0.6042184
## [4,] 0.03480724 0.7434121 -0.3734194
## [5,] 0.14462485 0.6508931 -0.9527968
## SURVMOD_LX_Charlson.1plus.YN SURVMOD_LX_MutantRAS.YN
## [1,] 0.09992622 0.17053839
## [2,] 0.04779290 0.03353796
## [3,] -0.07911071 0.09179549
## [4,] 0.13386652 0.22966125
## [5,] 0.07126643 0.05809591
## SURVMOD_LX_Private.YN SURVMOD_LX_Age1.Int SURVMOD_LX_Age2.Int
## [1,] 0.15158631 0.01786526 7.877169e-04
## [2,] 0.07598565 0.01288963 -3.442888e-05
## [3,] 0.03818174 0.02456055 1.056388e-03
## [4,] 0.20289242 0.01560396 1.133192e-03
## [5,] 0.14046066 0.02742750 6.063385e-04
## SURVMOD_LX_Peritoneum.YN.Int SURVMOD_LX_ECOG.1.Int
## [1,] -0.12031487 -0.46908724
## [2,] -0.21624553 0.27065371
## [3,] -0.15013080 -0.03996639
## [4,] -0.57248756 -0.03035211
## [5,] -0.01832955 -0.21184432
## SURVMOD_LX_ECOG.23.Int SURVMOD_LX_MutantBRAF.YN.Int
## [1,] 0.2905964 -1.2068901
## [2,] 0.5655256 -0.5433808
## [3,] 0.3297124 -0.3063589
## [4,] 0.4628835 -0.3725404
## [5,] -0.1904882 -0.4076959
## SURVMOD_LX_PFS.Time.Int SURVMOD_LX_Stage4.YN.Int
## [1,] -0.09350060 0.3643138
## [2,] -0.09965267 0.7427876
## [3,] -0.09376370 0.5268879
## [4,] -0.40790633 0.3279707
## [5,] -0.13930390 0.7655971
## SURVMOD_LX_MutantRAS.YN.Int SURVMOD_LX_Log(Scale)
## [1,] -0.3972561 -0.1596703
## [2,] -0.2979609 -0.1807637
## [3,] -0.4433107 -0.2072936
## [4,] -0.4738601 -0.2232146
## [5,] -0.2130537 -0.1556238
## BOOT_SURVMOD_LX_c.statistic_0.5-year
## [1,] 0.7175332
## [2,] 0.7182717
## [3,] 0.7255471
## [4,] 0.6871814
## [5,] 0.7255819
## BOOT_SURVMOD_LX_c.statistic_1-year BOOT_SURVMOD_LX_c.statistic_2-year
## [1,] 0.6836472 0.6742734
## [2,] 0.6979305 0.6871862
## [3,] 0.7094436 0.6946134
## [4,] 0.6827085 0.6717903
## [5,] 0.6989814 0.6822703
## BOOT_SURVMOD_LX_Slope.Scale BOOT_SURVMOD_LX_HL.5bins_0.5-year
## [1,] 0.9970078 11.326984
## [2,] 0.9970459 14.218835
## [3,] 0.9971341 7.908079
## [4,] 0.9965562 5.302841
## [5,] 0.9973610 10.901772
## BOOT_SURVMOD_LX_HL.5bins_1-year BOOT_SURVMOD_LX_HL.5bins_2-year
## [1,] 12.103363 5.984211
## [2,] 6.049089 8.935838
## [3,] 9.671642 4.928537
## [4,] 8.257890 8.517921
## [5,] 5.580362 6.485760
## BOOT_SURVMOD_LX_HL.10bins_0.5-year BOOT_SURVMOD_LX_HL.10bins_1-year
## [1,] 28.80884 23.00415
## [2,] 29.92394 18.69299
## [3,] 33.21597 18.72455
## [4,] 15.67308 17.15724
## [5,] 18.91509 15.45737
## BOOT_SURVMOD_LX_HL.10bins_2-year ORG_SURVMOD_LX_c.statistic_0.5-year
## [1,] 14.52087 0.7113699
## [2,] 27.31791 0.7036708
## [3,] 17.52691 0.7093891
## [4,] 17.04953 0.7103136
## [5,] 23.34100 0.7129121
## ORG_SURVMOD_LX_c.statistic_1-year ORG_SURVMOD_LX_c.statistic_2-year
## [1,] 0.6827227 0.6739350
## [2,] 0.6817569 0.6723853
## [3,] 0.6838889 0.6729765
## [4,] 0.6869212 0.6750528
## [5,] 0.6827392 0.6700449
## ORG_SURVMOD_LX_Slope.Scale ORG_SURVMOD_LX_HL.5bins_0.5-year
## [1,] 0.9971391 6.048096
## [2,] 0.9973047 4.313220
## [3,] 0.9977269 11.834210
## [4,] 0.9973201 6.336112
## [5,] 0.9975677 2.847844
## ORG_SURVMOD_LX_HL.5bins_1-year ORG_SURVMOD_LX_HL.5bins_2-year
## [1,] 3.723671 3.430626
## [2,] 3.913788 3.848451
## [3,] 13.579960 6.704817
## [4,] 6.530022 5.451594
## [5,] 9.312232 10.132961
## ORG_SURVMOD_LX_HL.10bins_0.5-year ORG_SURVMOD_LX_HL.10bins_1-year
## [1,] 12.759155 8.371971
## [2,] 10.075726 8.675773
## [3,] 19.169984 17.097972
## [4,] 15.517688 15.131077
## [5,] 7.252339 12.878907
## ORG_SURVMOD_LX_HL.10bins_2-year OPT_SURVMOD_LX_c.statistic_0.5-year
## [1,] 10.21783 0.006163227
## [2,] 10.92730 0.014600836
## [3,] 16.42205 0.016158051
## [4,] 11.36277 -0.023132143
## [5,] 19.21853 0.012669796
## OPT_SURVMOD_LX_c.statistic_1-year OPT_SURVMOD_LX_c.statistic_2-year
## [1,] 0.0009244992 0.0003384089
## [2,] 0.0161735932 0.0148009260
## [3,] 0.0255546730 0.0216369479
## [4,] -0.0042126971 -0.0032624571
## [5,] 0.0162421948 0.0122253280
## OPT_SURVMOD_LX_Slope.Scale OPT_SURVMOD_LX_HL.5bins_0.5-year
## [1,] -0.0001312901 5.278888
## [2,] -0.0002588117 9.905615
## [3,] -0.0005928405 -3.926131
## [4,] -0.0007639492 -1.033271
## [5,] -0.0002066949 8.053928
## OPT_SURVMOD_LX_HL.5bins_1-year OPT_SURVMOD_LX_HL.5bins_2-year
## [1,] 8.379692 2.553585
## [2,] 2.135301 5.087387
## [3,] -3.908318 -1.776280
## [4,] 1.727868 3.066327
## [5,] -3.731871 -3.647200
## OPT_SURVMOD_LX_HL.10bins_0.5-year OPT_SURVMOD_LX_HL.10bins_1-year
## [1,] 16.0496825 14.632183
## [2,] 19.8482130 10.017212
## [3,] 14.0459844 1.626581
## [4,] 0.1553928 2.026166
## [5,] 11.6627504 2.578462
## OPT_SURVMOD_LX_HL.10bins_2-year Errors
## [1,] 4.303047 0
## [2,] 16.390608 0
## [3,] 1.104856 0
## [4,] 5.686761 0
## [5,] 4.122471 0
After performing the bootstrapping, the apparent performance is corrected for the optimism. To do so, the following actions are performed for each statistic:
## Model L1-Time
# Extract all relevant columns from the 'out.bootstrap' object and calculate the mean over all iterations
O.Model.L1.Time <- colMeans(out.bootstrap[, str_detect(string=colnames(out.bootstrap), pattern="OPT_SURVMOD_L1")]);
# C-statistic
Dapp.Model.L1.Time.c <- Dapp.Model.L1.Time[str_detect(string=names(Dapp.Model.L1.Time), pattern="c.statistic")];
O.Model.L1.Time.c <- O.Model.L1.Time[str_detect(string=names(O.Model.L1.Time), pattern="c.statistic")];
rbind(Dapp.Model.L1.Time.c,
O.Model.L1.Time.c,
corrected = Dapp.Model.L1.Time.c - O.Model.L1.Time.c);
## c.statistic_0.5-year c.statistic_1-year
## Dapp.Model.L1.Time.c 0.65917752 0.63480879
## O.Model.L1.Time.c 0.01175686 0.01173835
## corrected 0.64742065 0.62307044
## c.statistic_2-year
## Dapp.Model.L1.Time.c 0.62872577
## O.Model.L1.Time.c 0.01208758
## corrected 0.61663820
# Slope
Dapp.Model.L1.Time.slope <- Dapp.Model.L1.Time[str_detect(string=names(Dapp.Model.L1.Time), pattern="Slope")];
O.Model.L1.Time.slope <- O.Model.L1.Time[str_detect(string=names(O.Model.L1.Time), pattern="Slope")];
rbind(Dapp.Model.L1.Time.slope, O.Model.L1.Time.slope, corrected = Dapp.Model.L1.Time.slope - O.Model.L1.Time.slope);
## Slope.Scale
## Dapp.Model.L1.Time.slope 0.9958962063
## O.Model.L1.Time.slope -0.0005810449
## corrected 0.9964772512
# GND HL
Dapp.Model.L1.Time.hl <- Dapp.Model.L1.Time[str_detect(string=names(Dapp.Model.L1.Time), pattern="HL")];
O.Model.L1.Time.hl <- O.Model.L1.Time[str_detect(string=names(O.Model.L1.Time), pattern="HL")];
rbind(Dapp.Model.L1.Time.hl, O.Model.L1.Time.hl, corrected = Dapp.Model.L1.Time.hl + O.Model.L1.Time.hl);
## HL.5bins_0.5-year HL.5bins_1-year HL.5bins_2-year
## Dapp.Model.L1.Time.hl 4.092919 4.336916 0.9944051
## O.Model.L1.Time.hl 4.637893 1.256029 2.9860626
## corrected 8.730812 5.592946 3.9804677
## HL.10bins_0.5-year HL.10bins_1-year HL.10bins_2-year
## Dapp.Model.L1.Time.hl 7.40606 10.672663 5.588542
## O.Model.L1.Time.hl 11.21292 6.928614 7.157294
## corrected 18.61898 17.601276 12.745836
1-c(pchisq(q=c(Dapp.Model.L1.Time.hl + O.Model.L1.Time.hl)[1:3], df=4), pchisq(q=c(Dapp.Model.L1.Time.hl + O.Model.L1.Time.hl)[4:6], df=9))
## HL.5bins_0.5-year HL.5bins_1-year HL.5bins_2-year
## 0.06819159 0.23167947 0.40865571
## HL.10bins_0.5-year HL.10bins_1-year HL.10bins_2-year
## 0.02863475 0.04009156 0.17444819
## Model L1-Event
# Extract all relevant columns from the 'out.bootstrap' object and calculate the mean over all iterations
O.Model.L1.Event <- colMeans(out.bootstrap[, str_detect(string=colnames(out.bootstrap), pattern="OPT_EVENTMOD_L1")]);
# C-statistic
Dapp.Model.L1.Event.c <- Dapp.Model.L1.Event[str_detect(string=names(Dapp.Model.L1.Event), pattern="C.Statistic")];
O.Model.L1.Event.c <- O.Model.L1.Event[str_detect(string=names(O.Model.L1.Event), pattern="C.Statistic")];
rbind(Dapp.Model.L1.Event.c, O.Model.L1.Event.c, corrected = Dapp.Model.L1.Event.c - O.Model.L1.Event.c);
## C.Statistic
## Dapp.Model.L1.Event.c 0.68958824
## O.Model.L1.Event.c 0.03202666
## corrected 0.65756159
# Slope
Dapp.Model.L1.Event.slope <- Dapp.Model.L1.Event[str_detect(string=names(Dapp.Model.L1.Event), pattern="Slope")];
O.Model.L1.Event.slope <- O.Model.L1.Event[str_detect(string=names(O.Model.L1.Event), pattern="Slope")];
rbind(Dapp.Model.L1.Event.slope, O.Model.L1.Event.slope, corrected = Dapp.Model.L1.Event.slope - O.Model.L1.Event.slope);
## Regression.Slope
## Dapp.Model.L1.Event.slope 0.9997424
## O.Model.L1.Event.slope 0.1796652
## corrected 0.8200772
# Brier
Dapp.Model.L1.Event.brier <- Dapp.Model.L1.Event[str_detect(string=names(Dapp.Model.L1.Event), pattern="Brier")];
O.Model.L1.Event.brier <- O.Model.L1.Event[str_detect(string=names(O.Model.L1.Event), pattern="Brier")];
rbind(Dapp.Model.L1.Event.brier, O.Model.L1.Event.brier, corrected = Dapp.Model.L1.Event.brier - O.Model.L1.Event.brier);
## Brier.Score
## Dapp.Model.L1.Event.brier 0.086749976
## O.Model.L1.Event.brier -0.002987089
## corrected 0.089737065
## Model LX-Time
# Extract all relevant columns from the 'out.bootstrap' object and calculate the mean over all iterations
O.Model.LX.Time <- colMeans(out.bootstrap[, str_detect(string=colnames(out.bootstrap), pattern="OPT_SURVMOD_LX")]);
# C-statistic
Dapp.Model.LX.Time.c <- Dapp.Model.LX.Time[str_detect(string=names(Dapp.Model.LX.Time), pattern="c.statistic")];
O.Model.LX.Time.c <- O.Model.LX.Time[str_detect(string=names(O.Model.LX.Time), pattern="c.statistic")];
rbind(Dapp.Model.LX.Time.c, O.Model.LX.Time.c, corrected = Dapp.Model.LX.Time.c - O.Model.LX.Time.c);
## c.statistic_0.5-year c.statistic_1-year
## Dapp.Model.LX.Time.c 0.71740538 0.69335005
## O.Model.LX.Time.c 0.01264523 0.01364425
## corrected 0.70476016 0.67970580
## c.statistic_2-year
## Dapp.Model.LX.Time.c 0.68444465
## O.Model.LX.Time.c 0.01450578
## corrected 0.66993887
# Slope
Dapp.Model.LX.Time.slope <- Dapp.Model.LX.Time[str_detect(string=names(Dapp.Model.LX.Time), pattern="Slope")];
O.Model.LX.Time.slope <- O.Model.LX.Time[str_detect(string=names(O.Model.LX.Time), pattern="Slope")];
rbind(Dapp.Model.LX.Time.slope, O.Model.LX.Time.slope, corrected = Dapp.Model.LX.Time.slope - O.Model.LX.Time.slope);
## Slope.Scale
## Dapp.Model.LX.Time.slope 0.9971058368
## O.Model.LX.Time.slope -0.0004747626
## corrected 0.9975805993
# GND HL
Dapp.Model.LX.Time.hl <- Dapp.Model.LX.Time[str_detect(string=names(Dapp.Model.LX.Time), pattern="HL")];
O.Model.LX.Time.hl <- O.Model.LX.Time[str_detect(string=names(O.Model.LX.Time), pattern="HL")];
rbind(Dapp.Model.LX.Time.hl, O.Model.LX.Time.hl, corrected = Dapp.Model.LX.Time.hl + O.Model.LX.Time.hl);
## HL.5bins_0.5-year HL.5bins_1-year HL.5bins_2-year
## Dapp.Model.LX.Time.hl 8.140954 1.5974814 4.9123090
## O.Model.LX.Time.hl 3.811204 -0.5324031 0.7049291
## corrected 11.952158 1.0650783 5.6172380
## HL.10bins_0.5-year HL.10bins_1-year HL.10bins_2-year
## Dapp.Model.LX.Time.hl 15.90331 6.629234 7.669232
## O.Model.LX.Time.hl 10.96412 4.661620 5.729678
## corrected 26.86743 11.290854 13.398910
1-c(pchisq(q=c(Dapp.Model.LX.Time.hl + O.Model.LX.Time.hl)[1:3], df=4), pchisq(q=c(Dapp.Model.LX.Time.hl + O.Model.LX.Time.hl)[4:6], df=9))
## HL.5bins_0.5-year HL.5bins_1-year HL.5bins_2-year
## 0.017710603 0.899772603 0.229614753
## HL.10bins_0.5-year HL.10bins_1-year HL.10bins_2-year
## 0.001471154 0.256298079 0.145371007
Now that all (survival) models have been fitted and correlated sets of model coefficients (parameters) are obtained by performing the boostrapping procedure, a probabilistic sensitivity analysis can be performed with the discrete event simulation. A base-case analysis can be performed by using the models that are estimated from the imputed datasets (Step 4: Multivariable Modeling of Survival). In order to validate the discrete event simulation, first some custom functions will be defined in Step 6.1, after which the validation is perfomed in Step 6.2. Hence, Step 6 exists of the following steps:
Several custom functions are defined to run the simulation and validate it. Please find an explanation of the functions in the R code below.
# The 'getSurvDataset.L1' and 'getSurvDataset.LX' functions do no longer need to also
# include the outcomes of the patients, as these are unknown for the simulation model
# since it needs to simulate them, so the shorter 'getData.L1' and 'getData.LX'
# functions are defined
getData.L1 <- function(data) {
data <- data %>%
## Single variables
# Arm.ChemBev = "ChemBev",
mutate(ChemBev = as.numeric(as.logical(L1.Arm.ChemBev))) %>%
# Age = "Age1 + Age2",
mutate(Age1 = Age) %>%
mutate(Age2 = Age1^2) %>%
# Hospital.Private = "Private.YN",
mutate(Private.YN = as.numeric(as.logical(Hospital.Private))) %>%
# Stage.Diagnosis = "Stage4.YN",
mutate(Stage4.YN = ifelse(as.character(Stage.Diagnosis)=="4", 1, 0)) %>%
# Primary.Site = "Site.Left + Site.Right + Site.Rectum",
mutate(Site.LeftRectum = ifelse(Primary.Site %in% c("Left","Rectum"), 1, 0)) %>%
mutate(Site.Right = ifelse(Primary.Site=="Right", 1, 0)) %>%
# Metastases.Number = "Metastases.2 + Metastases.3plus",
mutate(Metastases.2 = ifelse(as.numeric(as.character(Metastases.Number))==2, 1, 0)) %>%
mutate(Metastases.3plus = ifelse(as.numeric(as.character(Metastases.Number))>=3, 1, 0)) %>%
# Metastases.Liver = "Liver.YN",
mutate(Liver.YN = as.numeric(as.logical(Metastases.Liver))) %>%
# Metastases.Peritoneum = "Peritoneum.YN",
mutate(Peritoneum.YN = as.numeric(as.logical(Metastases.Peritoneum))) %>%
# L1.ECOG = "ECOG.1 + ECOG.23",
mutate(ECOG.1 = ifelse(as.numeric(as.character(L1.ECOG))==1, 1, 0)) %>%
mutate(ECOG.23 = ifelse(as.numeric(as.character(L1.ECOG))>=2, 1, 0)) %>%
# Charlson = "Charlson.1 + Charlson.2 + Charlson.3plus",
mutate(Charlson.1plus = ifelse(as.numeric(as.character(Charlson))>=1, 1, 0)) %>%
# CEA.Level = "CEA1 + CEA2",
mutate(CEA.200plus = ifelse(as.numeric(as.character(Metastases.Number))<=2, 1, 0) * ifelse(as.numeric(CEA.Level)>=200, 1, 0)) %>%
# RAS.Mutated = "MutantRAS.YN",
mutate(MutantRAS.YN = as.numeric(as.logical(RAS.Mutated))) %>%
# BRAF.Mutated = "MutantBRAF.YN",
mutate(MutantBRAF.YN = as.numeric(as.logical(BRAF.Mutated))) %>%
## Interactions
mutate(Private.YN.Int = Private.YN * ChemBev) %>%
mutate(Site.LeftRectum.Int = Site.LeftRectum * ChemBev) %>%
mutate(Site.Right.Int = Site.Right * ChemBev) %>%
mutate(ECOG.1.Int = ECOG.1 * ChemBev) %>%
mutate(ECOG.23.Int = ECOG.23 * ChemBev) %>%
mutate(CEA.200plus.Int = CEA.200plus * ChemBev)
# Only the newly created columns need to be returned
return(select(data, ChemBev:CEA.200plus.Int))
}
getData.LX <- function(data) {
data <- data %>%
## Single variables
# L1.PFS.Time = "PFS.Time",
mutate(PFS.Time = log(L1.PFS.Time)) %>%
# Arm.ChemBev = "ChemBev",
mutate(ChemBev = as.numeric(as.logical(L1.Arm.ChemBev))) %>%
# Age = "Age1 + Age2",
mutate(Age1 = as.numeric(Age)-mean(Age)) %>%
mutate(Age2 = as.numeric(Age1^2)) %>%
# Hospital.Private = "Private.YN",
mutate(Private.YN = as.numeric(as.logical(Hospital.Private))) %>%
# Stage.Diagnosis = "Stage4.YN",
mutate(Stage4.YN = ifelse(as.numeric(Stage.Diagnosis)==4, 1, 0)) %>%
# Primary.Site = "Site.Left + Site.Right + Site.Rectum",
mutate(Site.LeftRectum = ifelse(Primary.Site %in% c("Left","Rectum"), 1, 0)) %>%
mutate(Site.Right = ifelse(Primary.Site=="Right", 1, 0)) %>%
# Metastases.Peritoneum = "Peritoneum.YN",
mutate(Peritoneum.YN = as.numeric(as.logical(Metastases.Peritoneum))) %>%
# L1.ECOG = "ECOG.1 + ECOG.23",
mutate(ECOG.1 = ifelse(as.numeric(as.character(L1.ECOG))==1, 1, 0)) %>%
mutate(ECOG.23 = ifelse(as.numeric(as.character(L1.ECOG))>=2, 1, 0)) %>%
# Charlson = "Charlson.3plus.YN",
mutate(Charlson.1plus.YN = ifelse(as.numeric(Charlson)>=1, 1, 0)) %>%
# RAS.Mutated = "MutantRAS.YN",
mutate(MutantRAS.YN = as.numeric(as.logical(RAS.Mutated))) %>%
# BRAF.Mutated = "MutantBRAF.YN",
mutate(MutantBRAF.YN = as.numeric(as.logical(BRAF.Mutated))) %>%
## Interactions
mutate(PFS.Time.Int = PFS.Time * ChemBev) %>%
mutate(Age1.Int = Age1 * ChemBev) %>%
mutate(Age2.Int = Age2 * ChemBev) %>%
mutate(Stage4.YN.Int = Stage4.YN * ChemBev) %>%
mutate(Peritoneum.YN.Int = Peritoneum.YN * ChemBev) %>%
mutate(ECOG.1.Int = ECOG.1 * ChemBev) %>%
mutate(ECOG.23.Int = ECOG.23 * ChemBev) %>%
mutate(MutantRAS.YN.Int = MutantRAS.YN * ChemBev) %>%
mutate(MutantBRAF.YN.Int = MutantBRAF.YN * ChemBev)
# Only the newly created columns need to be returned
return(select(data, PFS.Time:MutantBRAF.YN.Int))
}
# The 'predictSurvmod' and 'predictLogreg' functions are re-written so that they can
# predict the outcomes in different formats in stead of having different functions for
# different formats. The new functions are 'predSurv' and 'predLogr'.
predSurv <- function(model, newdata, type="random") {
# Add a dummy column for the intercept
newdata <- cbind(newdata, '(Intercept)'=1);
# Ge the Weibull Shape parameter from the model
Shape <- 1/exp(model["Log(Scale)"]);
model <- model[-which(names(model) == "Log(Scale)")];
# Subset a dataframe to enable matrix multiplication and check
newdata.sub <- as.matrix(newdata[, names(model), drop=F]);
# Get the scale parameter for the survival model
Scale <- exp(newdata.sub %*% model);
if(type=="random") {
# Predict the times
p <- matrix(runif(nrow(newdata.sub)), ncol=1);
t <- qweibull(p=p, shape=Shape, scale=Scale);
# Return predicted times
return(as.numeric(t));
} else if(type=="response") {
# Return scale parameter
return(Scale)
}
}
predLogr <- function(model, newdata) {
# Add a dummy column for the intercept
newdata <- cbind(newdata, '(Intercept)'=1);
# Subset a dataframe to enable matrix multiplication and check
newdata.sub <- as.matrix(newdata[, names(model), drop=F]);
# Calculate the probabilities
p.tmp <- exp(newdata.sub %*% model);
p <- p.tmp / (p.tmp + 1);
# Return probabilities
return(as.vector(p));
}
# The 'runDES' function performs one run with the DES based on a specific patient
# (sub)population and a set of model parameters for each (survival) model
runDES <- function(patients, nsim, PFS.Time.Model, PFS.Event.Model, OS.Time.Model) {
## INITIALIZATION
# Sample from patients
df <- sample_n(patients, size=nsim, replace=T);
## PART 1: SIMULATE WHETHER PROGRESSION OR DEATH OCCURS
# Put data in right format
df.PFS <- getData.L1(data=df);
# Predict time to progression or death
df.PFS <- df.PFS %>% mutate(Time = predSurv(model=PFS.Time.Model, newdata=df.PFS))%>%
mutate(Time1 = Time) %>%
mutate(Time2 = Time^2) %>%
mutate(Time1.Int = Time1 * ChemBev) %>%
mutate(Time2.Int = Time2 * ChemBev);
# Predict which event is associated with the time
df.PFS <- df.PFS %>%
mutate(PFS.Prob = predLogr(model=PFS.Event.Model, newdata=df.PFS)) %>%
mutate(L1.PFS.Event = ifelse(runif(nrow(df.PFS)) < PFS.Prob, "Death", "Progression"));
# Save to original patient object
df.PFS <- mutate(df.PFS, L1.PFS.Time = Time);
df.PFS <- select(df.PFS, L1.PFS.Time, L1.PFS.Event);
df <- cbind(df, df.PFS);
## PART 2: SIMULATE TIME TILL DEATH FOR THOSE WHO PROGRESSED
# Put data in right format
df.OS <- getData.LX(data=df);
# Predict time to death
df <- df %>% mutate(OS.Time = ifelse(L1.PFS.Event=="Death", NA, predSurv(model=OS.Time.Model, newdata=df.OS)))
# Rename variables
df <- df %>%
mutate(OS.Time = as.numeric(OS.Time)) %>%
mutate(PFS.Time = L1.PFS.Time) %>%
mutate(PFS.Event = L1.PFS.Event) %>%
mutate(PFS = PFS.Time) %>%
mutate(OS = PFS.Time + ifelse(is.na(OS.Time), 0, OS.Time));
# Return simulation data.frame
return(df)
}
# The 'predictKM' extracts a survival probabilty at a specific timepoint 't' from a fitted Kaplan-Meier
# object 'km' using a stepwise function
predictKM <- function(km, t) {
km.fun <- stepfun(km$time, c(1, km$surv));
pred <- km.fun(t);
return(pred)
}
# The 'validateDES' function performs a probabilistic sensitivity analysis for the DES and a specific
# patient (sub)population, and plots the validation plots and median time-to-event outcomes
validateDES <- function(output=NULL, patients=NULL, patients.times=NULL, nsim=NULL, nrun=NULL, models=NULL, timemax=c(2000,2500,2000), timestep=10, freecores=0, seed=20181212) {
# If the result of a previous 'validateDES' call is provide, the probabilistic sensitivity analysis
# is not performed again, but only the plots etc. are generated.
runsim <- ifelse(is.null(output), T, F);
# The 'timemax' and 'timestep' inputs are used to determine for what time horizon the plots should ne generated
times <- list(seq(from=0, to=timemax[1], by=timestep),
seq(from=0, to=timemax[2], by=timestep),
seq(from=0, to=timemax[3], by=timestep));
# Only if a simulation is to be performed, this will be done
if(runsim) {
# Check whether sufficient sets of parameters are provided to perform the requested number of runs
if(nrun>nrow(models)) stop(str_c("nrun can be max ",nrow(models), "!"));
# Set the random number seed for reproduction
set.seed(seed);
# Set up a multithreaded computation cluster
library(parallel);
library(doSNOW);
cl <- makeCluster(detectCores()-freecores);
clusterExport(cl, c("times", "patients", "patients.times",
"runDES", "predSurv", "predLogr",
"getData.L1", "getData.LX",
"predictKM"));
# Perform the simulation
sim <- parLapply(cl, 1:nrun, function(i.run) {
# Load the required packages
library(dplyr);
library(tidyr);
library(stringr);
library(survival);
# Obtain the coefficients for the (survival) models
coefs <- models[i.run, ];
# Model L1-Time
survmod.L1 <- coefs[str_detect(names(coefs), "SURVMOD_L1_")];
names(survmod.L1) <- str_remove(names(survmod.L1), "SURVMOD_L1_");
# Model L1-Event
eventmod.L1 <- coefs[str_detect(names(coefs), "EVENTMOD_L1_")];
names(eventmod.L1) <- str_remove(names(eventmod.L1), "EVENTMOD_L1_");
# Model LX-Time
survmod.LX <- coefs[str_detect(names(coefs), "SURVMOD_LX_")];
names(survmod.LX) <- str_remove(names(survmod.LX), "SURVMOD_LX_");
# Simulation for the ChemOnly (control) strategy
sim.chemonly <- runDES(patients = filter(patients, L1.Arm.ChemBev==F), nsim = nsim, PFS.Time.Model = survmod.L1, PFS.Event.Model = eventmod.L1, OS.Time.Model = survmod.LX);
# Simulation for the ChemBev (experimental) strategy
sim.chembev <- runDES(patients = filter(patients, L1.Arm.ChemBev==T), nsim = nsim, PFS.Time.Model = survmod.L1, PFS.Event.Model = eventmod.L1, OS.Time.Model = survmod.LX);
# Fit Kaplan-Meier objects for all time-to-events and both strategies to generate the plots at a later stage
km.L1.PFS.sim.chemonly <- survfit(Surv(PFS.Time, PFS.Event=="Progression") ~ 1, data=sim.chemonly, type="kaplan-meier");
km.L1.PFS.sim.chembev <- survfit(Surv(PFS.Time, PFS.Event=="Progression") ~ 1, data=sim.chembev, type="kaplan-meier");
km.L1.OS.sim.chemonly <- survfit(Surv(PFS.Time, PFS.Event=="Death") ~ 1, data=sim.chemonly, type="kaplan-meier");
km.L1.OS.sim.chembev <- survfit(Surv(PFS.Time, PFS.Event=="Death") ~ 1, data=sim.chembev, type="kaplan-meier");
km.LX.sim.chemonly <- survfit(Surv(OS.Time) ~ 1, data=sim.chemonly, type="kaplan-meier");
km.LX.sim.chembev <- survfit(Surv(OS.Time) ~ 1, data=sim.chembev, type="kaplan-meier");
# Use the custom 'predictKM' function to get the curves for the defined timeframe
plot.L1.PFS.sim.chemonly <- predictKM(km = km.L1.PFS.sim.chemonly, t = times[[1]]);
plot.L1.PFS.sim.chembev <- predictKM(km = km.L1.PFS.sim.chembev, t = times[[1]]);
plot.L1.OS.sim.chemonly <- predictKM(km = km.L1.OS.sim.chemonly, t = times[[2]]);
plot.L1.OS.sim.chembev <- predictKM(km = km.L1.OS.sim.chembev, t = times[[2]]);
plot.LX.sim.chemonly <- predictKM(km = km.LX.sim.chemonly, t = times[[3]]);
plot.LX.sim.chembev <- predictKM(km = km.LX.sim.chembev, t = times[[3]]);
# Extract the median time-to-events
median.L1.PFS.sim.chemonly <- summary(km.L1.PFS.sim.chemonly)$table['median'];
median.L1.PFS.sim.chembev <- summary(km.L1.PFS.sim.chembev)$table['median'];
median.L1.OS.sim.chemonly <- summary(km.L1.OS.sim.chemonly)$table['median'];
median.L1.OS.sim.chembev <- summary(km.L1.OS.sim.chembev)$table['median'];
median.LX.sim.chemonly <- summary(km.LX.sim.chemonly)$table['median'];
median.LX.sim.chembev <- summary(km.LX.sim.chembev)$table['median'];
# Extract the event incidences from the simulation objects
prob.L1.death.chemonly <- mean(sim.chemonly$PFS.Event=="Death");
prob.L1.death.chembev <- mean(sim.chembev$PFS.Event=="Death");
# Return all objects in a list because of the different formats
list(km = list(km.L1.PFS.sim.chemonly = plot.L1.PFS.sim.chemonly, km.L1.PFS.sim.chembev = plot.L1.PFS.sim.chembev, km.L1.OS.sim.chemonly = plot.L1.OS.sim.chemonly, km.L1.OS.sim.chembev = plot.L1.OS.sim.chembev, km.LX.sim.chemonly = plot.LX.sim.chemonly, km.LX.sim.chembev = plot.LX.sim.chembev), median = c(median.L1.PFS.sim.chemonly = median.L1.PFS.sim.chemonly, median.L1.PFS.sim.chembev = median.L1.PFS.sim.chembev, median.L1.OS.sim.chemonly = median.L1.OS.sim.chemonly, median.L1.OS.sim.chembev = median.L1.OS.sim.chembev, median.LX.sim.chemonly = median.LX.sim.chemonly, median.LX.sim.chembev = median.LX.sim.chembev), p = c(prob.L1.death.chemonly = prob.L1.death.chemonly, prob.L1.death.chembev = prob.L1.death.chembev))
})
# Stop the computation cluster
stopCluster(cl);
# If no simulation is to be performed because a previous outcome object is provide,
# the correct objects need to be extracted from the object.
} else {
# Kaplan-Meier curves etc.
sim <- output$sim;
# Observed data
patients <- output$patients;
patients.times <- output$patients.times;
}
# Transform all Kaplan-Meier curves into one matrix
km.L1.PFS.sim.chemonly <- sapply(sim, function(s) s$km$km.L1.PFS.sim.chemonly);
km.L1.PFS.sim.chembev <- sapply(sim, function(s) s$km$km.L1.PFS.sim.chembev);
km.L1.OS.sim.chemonly <- sapply(sim, function(s) s$km$km.L1.OS.sim.chemonly);
km.L1.OS.sim.chembev <- sapply(sim, function(s) s$km$km.L1.OS.sim.chembev);
km.LX.sim.chemonly <- sapply(sim, function(s) s$km$km.LX.sim.chemonly);
km.LX.sim.chembev <- sapply(sim, function(s) s$km$km.LX.sim.chembev);
# Calculate the mean curves and the corresponding 95% Confidence Intervals
km.L1.PFS.sim.chemonly.curves <- apply(km.L1.PFS.sim.chemonly, 1, function(row) c(low = quantile(row, 0.025), mean = mean(row), high = quantile(row, 0.975)))
km.L1.PFS.sim.chembev.curves <- apply(km.L1.PFS.sim.chembev, 1, function(row) c(low = quantile(row, 0.025), mean = mean(row), high = quantile(row, 0.975)))
km.L1.OS.sim.chemonly.curves <- apply(km.L1.OS.sim.chemonly, 1, function(row) c(low = quantile(row, 0.025), mean = mean(row), high = quantile(row, 0.975)))
km.L1.OS.sim.chembev.curves <- apply(km.L1.OS.sim.chembev, 1, function(row) c(low = quantile(row, 0.025), mean = mean(row), high = quantile(row, 0.975)))
km.LX.sim.chemonly.curves <- apply(km.LX.sim.chemonly, 1, function(row) c(low = quantile(row, 0.025), mean = mean(row), high = quantile(row, 0.975)))
km.LX.sim.chembev.curves <- apply(km.LX.sim.chembev, 1, function(row) c(low = quantile(row, 0.025), mean = mean(row), high = quantile(row, 0.975)))
# Kaplan-Meier curves for the observed data from the 'patients' input object
km.L1.PFS.obs.chemonly <- survfit(Surv(L1.PFS.Time, L1.PFS.Events=="Progression")~1, data=filter(patients.times, L1.Arm.ChemBev==F), type="kaplan-meier");
km.L1.PFS.obs.chembev <- survfit(Surv(L1.PFS.Time, L1.PFS.Events=="Progression")~1, data=filter(patients.times, L1.Arm.ChemBev==T), type="kaplan-meier");
km.L1.OS.obs.chemonly <- survfit(Surv(L1.PFS.Time, L1.PFS.Events=="Death")~1, data=filter(patients.times, L1.Arm.ChemBev==F), type="kaplan-meier");
km.L1.OS.obs.chembev <- survfit(Surv(L1.PFS.Time, L1.PFS.Events=="Death")~1, data=filter(patients.times, L1.Arm.ChemBev==T), type="kaplan-meier");
km.LX.obs.chemonly <- survfit(Surv(LX.OS.Time, LX.OS.Events=="Death")~1, data=filter(patients.times, L1.Arm.ChemBev==F), type="kaplan-meier");
km.LX.obs.chembev <- survfit(Surv(LX.OS.Time, LX.OS.Events=="Death")~1, data=filter(patients.times, L1.Arm.ChemBev==T), type="kaplan-meier");
# Plots
par(mfrow=c(1,3));
lwds <- c(2,2,2,2);
cols <- c("black", "red", "green", "blue");
plot(main=str_c("First-Line Progression (Event is Progression)\n", "(ChemOnly=",sum(km.L1.PFS.obs.chemonly$n.event),", ChemBev=",sum(km.L1.PFS.obs.chembev$n.event),")"), xlab="Time to Progression (Days)", km.L1.PFS.obs.chemonly, col=cols[1], lwd=1, conf.int=T, las=1, xlim=c(0,timemax[[1]]));
lines(km.L1.PFS.obs.chemonly, col=cols[1], lwd=lwds[1], conf.int=F);
lines(km.L1.PFS.obs.chembev, col=cols[2], lwd=1, conf.int=T);
lines(km.L1.PFS.obs.chembev, col=cols[2], lwd=lwds[2], conf.int=F);
lines(x=times[[1]], y=km.L1.PFS.sim.chemonly.curves[2,], col=cols[3], lwd=lwds[3]);
lines(x=times[[1]], y=km.L1.PFS.sim.chemonly.curves[1,], col=cols[3], lwd=1, lty=2);
lines(x=times[[1]], y=km.L1.PFS.sim.chemonly.curves[3,], col=cols[3], lwd=1, lty=2);
lines(x=times[[1]], y=km.L1.PFS.sim.chembev.curves[2,], col=cols[4], lwd=lwds[4]);
lines(x=times[[1]], y=km.L1.PFS.sim.chembev.curves[1,], col=cols[4], lwd=1, lty=2);
lines(x=times[[1]], y=km.L1.PFS.sim.chembev.curves[3,], col=cols[4], lwd=1, lty=2);
legend("topright", c("ChemOnly (obs)", "ChemBev (obs)", "ChemOnly (sim)", "ChemBev (sim)"), col=cols, lwd=lwds, lty=1, bty="n")
plot(main=str_c("First-Line Progression (Event is Death)\n", "(ChemOnly=",sum(km.L1.OS.obs.chemonly$n.event),", ChemBev=",sum(km.L1.OS.obs.chembev$n.event),")"), xlab="Time to Death (Days)", km.L1.OS.obs.chemonly, col=cols[1], lwd=1, conf.int=T, las=1, xlim=c(0,timemax[[2]]));
lines(km.L1.OS.obs.chemonly, col=cols[1], lwd=lwds[1], conf.int=F);
lines(km.L1.OS.obs.chembev, col=cols[2], lwd=1, conf.int=T);
lines(km.L1.OS.obs.chembev, col=cols[2], lwd=lwds[2], conf.int=F);
lines(x=times[[2]], y=km.L1.OS.sim.chemonly.curves[2,], col=cols[3], lwd=lwds[3]);
lines(x=times[[2]], y=km.L1.OS.sim.chemonly.curves[1,], col=cols[3], lwd=1, lty=2);
lines(x=times[[2]], y=km.L1.OS.sim.chemonly.curves[3,], col=cols[3], lwd=1, lty=2);
lines(x=times[[2]], y=km.L1.OS.sim.chembev.curves[2,], col=cols[4], lwd=lwds[4]);
lines(x=times[[2]], y=km.L1.OS.sim.chembev.curves[1,], col=cols[4], lwd=1, lty=2);
lines(x=times[[2]], y=km.L1.OS.sim.chembev.curves[3,], col=cols[4], lwd=1, lty=2);
legend("topright", c("ChemOnly (obs)", "ChemBev (obs)", "ChemOnly (sim)", "ChemBev (sim)"), col=cols, lwd=lwds, lty=1, bty="n")
plot(main=str_c("Death after Progression\n", "(ChemOnly=",sum(km.LX.obs.chemonly$n.event),", ChemBev=",sum(km.LX.obs.chembev$n.event),")"), xlab="Time to Death (Days)", km.LX.obs.chemonly, col=cols[1], lwd=1, conf.int=T, las=1, xlim=c(0,timemax[[3]]));
lines(km.LX.obs.chemonly, col=cols[1], lwd=lwds[1], conf.int=F);
lines(km.LX.obs.chembev, col=cols[2], lwd=1, conf.int=T);
lines(km.LX.obs.chembev, col=cols[2], lwd=lwds[2], conf.int=F);
lines(x=times[[3]], y=km.LX.sim.chemonly.curves[2,], col=cols[3], lwd=lwds[3]);
lines(x=times[[3]], y=km.LX.sim.chemonly.curves[1,], col=cols[3], lwd=1, lty=2);
lines(x=times[[3]], y=km.LX.sim.chemonly.curves[3,], col=cols[3], lwd=1, lty=2);
lines(x=times[[3]], y=km.LX.sim.chembev.curves[2,], col=cols[4], lwd=lwds[4]);
lines(x=times[[3]], y=km.LX.sim.chembev.curves[1,], col=cols[4], lwd=1, lty=2);
lines(x=times[[3]], y=km.LX.sim.chembev.curves[3,], col=cols[4], lwd=1, lty=2);
legend("topright", c("ChemOnly (obs)", "ChemBev (obs)", "ChemOnly (sim)", "ChemBev (sim)"), col=cols, lwd=lwds, lty=1, bty="n")
# Extract Median
medians.sim <- sapply(sim, function(s) s$median);
medians.sim.low <- apply(medians.sim, 1, quantile, probs=0.025, na.rm=T);
medians.sim.mean <- apply(medians.sim, 1, mean, na.rm=T);
medians.sim.high <- apply(medians.sim, 1, quantile, probs=0.975, na.rm=T);
print(cbind(medians.sim.low, medians.sim.mean, medians.sim.high))
print(km.L1.PFS.obs.chemonly);
print(km.L1.PFS.obs.chembev);
print(km.L1.OS.obs.chemonly);
print(km.L1.OS.obs.chembev);
print(km.LX.obs.chemonly);
print(km.LX.obs.chembev);
# Probability of death
prob.sim <- sapply(sim, function(s) s$p);
prob.sim.low <- apply(prob.sim, 1, quantile, probs=0.025);
prob.sim.mean <- apply(prob.sim, 1, mean);
prob.sim.high <- apply(prob.sim, 1, quantile, probs=0.975);
print(cbind(prob.sim.low, prob.sim.mean, prob.sim.high))
count.obs <- table(patients.times$L1.PFS.Events, patients.times$L1.Arm.ChemBev);
prob.obs <- apply(count.obs, 2, function(col) col/sum(col));
print(prob.obs)
# Only return the results if a new simulation is performed
if(runsim) return(list(sim=sim, patients=patients, patients.times=patients.times));
}
After the custom functions have been defined, the validation of the discrete event simulation can be performed. Unfortunately, the numerical results of the validation cannot be presented in a nicer way because they are printed from without the ‘validateDES’ function.
# The sets of correlated model parameters need to be extracted from the 'out.bootstrap' object
psa.models <- out.bootstrap[, !str_detect(string=colnames(out.bootstrap), pattern="BOOT")];
psa.models <- psa.models[, !str_detect(string=colnames(psa.models), pattern="ORG")];
psa.models <- psa.models[, !str_detect(string=colnames(psa.models), pattern="OPT")];
# Only select the columns that are required to perform a simulation
dataset <- datasets[[1]];
patients <- dataset %>%
select(L1.Arm.ChemBev, Age, Hospital.Private, Primary.Site, Stage.Diagnosis, Metastases.Number, Metastases.Liver, Metastases.Peritoneum, L1.ECOG, Charlson, CEA.Level, RAS.Mutated, BRAF.Mutated);
# Additionally provide the observed time-to-events for validation purposes
patients.times <- datasets[[1]] %>%
select(L1.PFS.Time, L1.PFS.Events, LX.OS.Time, LX.OS.Events, L1.Arm.ChemBev, Age, Hospital.Private, Primary.Site, Stage.Diagnosis, Metastases.Number, Metastases.Liver, Metastases.Peritoneum, L1.ECOG, Charlson, CEA.Level, RAS.Mutated, BRAF.Mutated);
# Call the 'validateDES' function to perform the simulations and generate all plots
png(file="Validation DES 20190201.png", width=12, height=6, units="in", res=96);
val.cohort <- validateDES(patients = patients, patients.times = patients.times, nsim = 20000, nrun = 10000, models = psa.models);
## medians.sim.low medians.sim.mean
## median.L1.PFS.sim.chemonly.median 171.4950 198.8957
## median.L1.PFS.sim.chembev.median 289.1615 307.8619
## median.L1.OS.sim.chemonly.median 724.7450 1606.9122
## median.L1.OS.sim.chembev.median 1013.3768 1171.1054
## median.LX.sim.chemonly.median 161.1872 199.6297
## median.LX.sim.chembev.median 242.6702 266.4501
## medians.sim.high
## median.L1.PFS.sim.chemonly.median 228.2406
## median.L1.PFS.sim.chembev.median 327.7214
## median.L1.OS.sim.chemonly.median 2623.2981
## median.L1.OS.sim.chembev.median 1392.2564
## median.LX.sim.chemonly.median 241.8250
## median.LX.sim.chembev.median 291.9795
## Call: survfit(formula = Surv(L1.PFS.Time, L1.PFS.Events == "Progression") ~
## 1, data = filter(patients.times, L1.Arm.ChemBev == F), type = "kaplan-meier")
##
## n events median 0.95LCL 0.95UCL
## 191 157 171 155 210
## Call: survfit(formula = Surv(L1.PFS.Time, L1.PFS.Events == "Progression") ~
## 1, data = filter(patients.times, L1.Arm.ChemBev == T), type = "kaplan-meier")
##
## n events median 0.95LCL 0.95UCL
## 676 571 304 280 329
## Call: survfit(formula = Surv(L1.PFS.Time, L1.PFS.Events == "Death") ~
## 1, data = filter(patients.times, L1.Arm.ChemBev == F), type = "kaplan-meier")
##
## n events median 0.95LCL 0.95UCL
## 191 21 NA NA NA
## Call: survfit(formula = Surv(L1.PFS.Time, L1.PFS.Events == "Death") ~
## 1, data = filter(patients.times, L1.Arm.ChemBev == T), type = "kaplan-meier")
##
## n events median 0.95LCL 0.95UCL
## 676 62 1055 902 NA
## Call: survfit(formula = Surv(LX.OS.Time, LX.OS.Events == "Death") ~
## 1, data = filter(patients.times, L1.Arm.ChemBev == F), type = "kaplan-meier")
##
## 34 observations deleted due to missingness
## n events median 0.95LCL 0.95UCL
## 157 141 193 153 243
## Call: survfit(formula = Surv(LX.OS.Time, LX.OS.Events == "Death") ~
## 1, data = filter(patients.times, L1.Arm.ChemBev == T), type = "kaplan-meier")
##
## 105 observations deleted due to missingness
## n events median 0.95LCL 0.95UCL
## 571 467 270 237 308
## prob.sim.low prob.sim.mean prob.sim.high
## prob.L1.death.chemonly 0.07064875 0.1163047 0.16565
## prob.L1.death.chembev 0.07859750 0.1027043 0.12870
##
## FALSE TRUE
## Censored 0.06806283 0.06360947
## Death 0.10994764 0.09171598
## Progression 0.82198953 0.84467456
dev.off();
## png
## 2
After the discrete event simulation has been validated, an exploratory analysis is performed in which patients’ treatment is targeted using personalized predictions of their first-line progression-free survival using Model L1-Time. To achieve this, Step 7 exists out of the following four steps:
Several custom functions are defined to run the simulation. Also, some earlier defined functions are re-written as they do not need to be as extensive as they needed to be for the data analysis. Please find an explanation of the functions in the R code below.
# The 'runPSA' function is a wrapper function that repeatedly calls the 'runDES' function
# for a patient (sub)population and different sets of parameter values
runPSA <- function(patients=NULL, nsim=NULL, nrun=NULL, models=NULL, freecores=0, seed=20181212) {
# Check whether sufficient sets of parameters are provided to perform the requested number of runs
if(nrun>nrow(models)) stop(str_c("nrun can be max ",nrow(models), "!"));
# Set the random sumber seed for reproduction
set.seed(seed);
# Multithreading
library(parallel);
library(doSNOW);
cl <- makeCluster(detectCores()-freecores);
clusterExport(cl, c("patients",
"runDES", "predSurv", "predLogr",
"getData.L1", "getData.LX"));
# Perform the probabilistic sensitivity analysis
psa.out <- parSapply(cl, 1:nrun, function(i.run) {
# Load packages
library(dplyr);
library(tidyr);
library(stringr);
library(survival);
# Extract the model coefficients from the 'models' object with all sets
coefs <- models[i.run, ];
# Model L1-Time
survmod.L1 <- coefs[str_detect(names(coefs), "SURVMOD_L1_")];
names(survmod.L1) <- str_remove(names(survmod.L1), "SURVMOD_L1_");
# Model L1-Event
eventmod.L1 <- coefs[str_detect(names(coefs), "EVENTMOD_L1_")];
names(eventmod.L1) <- str_remove(names(eventmod.L1), "EVENTMOD_L1_");
# Model LX-Time
survmod.LX <- coefs[str_detect(names(coefs), "SURVMOD_LX_")];
names(survmod.LX) <- str_remove(names(survmod.LX), "SURVMOD_LX_");
# Run the DES
sim <- runDES(patients = patients,
nsim = nsim,
PFS.Time.Model = survmod.L1,
PFS.Event.Model = eventmod.L1,
OS.Time.Model = survmod.LX);
# For now we are only interested in the median first-line progression-free survival
median(sim$PFS.Time)
})
# Stop the multithreading
stopCluster(cl);
# Return the median first-line progression-free survival outcomes
return(psa.out)
}
To perform a simulation to estimate the impact of personalized treatment decisions, first it needs to be determined what is expected to be the best strategy (ChemOnly or ChemBev) for each patient individually. To make these predictions, Model L1-Time is used to make a prediction whether first-line progression-free survival is expected to be higher for the ChemOnly strategy or for the ChemBev strategy.
# Transform the data to the required format
patients.chemonly <- getData.L1(data = patients %>% mutate(L1.Arm.ChemBev=F));
patients.chembev <- getData.L1(data = patients %>% mutate(L1.Arm.ChemBev=T));
# Determine the Weibull scale parameter for each patient individually (a higher value for the scale
# parameter indicates a higher time-to-event, i.e. better first-line progression-free survival)
response.chemonly <- predSurv(model = Model.L1.Time$coef, newdata = patients.chemonly, type = "response");
response.chembev <- predSurv(model = Model.L1.Time$coef, newdata = patients.chembev, type = "response");
# Set the treatment strategy indicator 'L1.Arm.ChemBev' to the treatment strategy associated with
# the highest expected first-line progression-free survival
patients <- patients %>%
mutate(L1.Arm.ChemBev = if_else(response.chembev > response.chemonly, T, F));
# Observe how the treatment were provided as observed in the data
table(dataset$L1.Arm.ChemBev);
##
## FALSE TRUE
## 191 676
# # Observe how the treatment are provided according to the personalized treatment decisions
table(patients$L1.Arm.ChemBev);
##
## FALSE TRUE
## 74 793
# Number of different treatment decisions
n.differenttreatment <- sum(ifelse(dataset$L1.Arm.ChemBev != patients$L1.Arm.ChemBev, 1, 0));
n.differenttreatment;
## [1] 219
n.differenttreatment / nrow(dataset);
## [1] 0.2525952
# Number who swithed from ChemOnly to ChemBev
n.ChemOnly.ChemBev <- sum(ifelse(!dataset$L1.Arm.ChemBev & patients$L1.Arm.ChemBev, 1, 0));
n.ChemOnly.ChemBev;
## [1] 168
n.ChemOnly.ChemBev / n.differenttreatment;
## [1] 0.7671233
# Number who swithed from ChemBev to ChemOnly
n.ChemBev.ChemOnly <- sum(ifelse(dataset$L1.Arm.ChemBev & !patients$L1.Arm.ChemBev, 1, 0));
n.ChemBev.ChemOnly;
## [1] 51
n.ChemBev.ChemOnly / n.differenttreatment;
## [1] 0.2328767
After determining the treatment each patient should receive based on the personalized treatment decisions informed by the estimated first-line progression-free survival, a probabilistic sensitivity analysis can be performed to estimate the impact.
# Perform a simulation for the whole patient population
sim.out <- runPSA(patients = patients, nsim = 20000, nrun = 10000, models = psa.models, freecores = 0, seed = 20190104)
# Determine which patients switched treatment
switched <- which(dataset$L1.Arm.ChemBev != patients$L1.Arm.ChemBev);
patients <- patients[switched, , drop=F];
# Perform a simulation only for the 219 patients who switched treatment
sim.out.switched <- runPSA(patients = patients, nsim = 20000, nrun = 10000, models = psa.models, freecores = 0, seed = 20190104)
The final step of this study is to assess the impact of this exploratory analysis based on personalized treatment decisions.
## For the complete patient population
# Mean median PFS and upper and lower bound over the probabilistic sensitivity analysis
mean(sim.out);
## [1] 288.0516
quantile(sim.out, probs = c(0.025, 0.975));
## 2.5% 97.5%
## 270.2137 306.6571
# Median PFS as observed from the data
survfit(formula = Surv(L1.PFS.Time, L1.PFS.Event) ~ 1, data = dataset, type = "kaplan-meier");
## Call: survfit(formula = Surv(L1.PFS.Time, L1.PFS.Event) ~ 1, data = dataset,
## type = "kaplan-meier")
##
## n events median 0.95LCL 0.95UCL
## 867 811 265 248 280
## For the patients who switched treatment based on the personalized treatment decision
# Mean median PFS and upper and lower bound over the probabilistic sensitivity analysis
mean(sim.out.switched);
## [1] 269.4719
quantile(sim.out.switched, probs = c(0.025, 0.975));
## 2.5% 97.5%
## 246.3946 293.4646
# Median PFS as observed from the data
survfit(formula = Surv(L1.PFS.Time, L1.PFS.Event) ~ 1, data = dataset[switched, ], type = "kaplan-meier");
## Call: survfit(formula = Surv(L1.PFS.Time, L1.PFS.Event) ~ 1, data = dataset[switched,
## ], type = "kaplan-meier")
##
## n events median 0.95LCL 0.95UCL
## 219 205 175 156 199