Skip to content
Mammary Cancer Survival Analysis
  • AI Chat
  • Code
  • Report
  • Survival analysis

    In medicine, especially but not only in oncology, it is very important to analyse overall survival, recurrence-free survival and perform other time-to-event analyses. In this example, using simulated data, we will analyse overall mammary cancer survival. We will try to see if survival is associated with a few predictive variables.

    The dataset

    For every woman, we have data about

    • date of cancer diagnosis
    • date of last follow up visit
    • survival at last follow up visit
    • possible predictors of survival: BMI, cancer stage, neo_adiuvant_therapy
    
    suppressWarnings(suppressMessages(library(dplyr, quiet = T)))
    
    set.seed(111)
    # Upload data (I created the data, it's not real!)
    db = read.csv("mammary_cancer_data.csv") %>% select(-X) %>% 
    filter(as.Date(date_diagnosis,format="%d/%m/%Y")>as.Date("01/01/1900",format="%d/%m/%Y"))
    
    db = db %>% 
    # simulate stage data and neo_adiuvant_therapy data accordingly
    mutate(stage = sample(1:4, nrow(db), replace = T),
           neo_adiuvant_therapy = ifelse(stage %in% c(3,4), 1, neo_adiuvant_therapy),
    # calculate survival time after diagnosis
    	   survival_time = abs(as.numeric(as.Date(last_visit,format="%d/%m/%Y") - as.Date(date_diagnosis,format="%d/%m/%Y")-sample(1:5000, nrow(db)))), 
           survival_factor = as.factor(survival),
           neo_adiuvant_therapy = as.factor(neo_adiuvant_therapy), 
           stage = as.factor(stage), 
           BMI_cat = ifelse(BMI > 30, "Obese", "Not Obese"))
    head(db)
    

    Data summary

    BMI, stage and neoadiuvant therapy are not significantly associated with survival status at bivariate analysis, but time has not yet been considered.

    # Summary of data
    install.packages("arsenal", quiet = T)
    library(arsenal, quiet = T)
    as.data.frame(summary(tableby(survival_factor ~ BMI + neo_adiuvant_therapy + stage, data = db), text = T))

    Kaplan-Meier Analysis

    Overall Kaplan-Meier survival curve shows a median survival of more than 4000 days, which is more than 10 years.

    library(survival, quiet = T)
    library(ggplot2, quiet = T)
    suppressWarnings(suppressMessages(install.packages("survminer", quiet = T)))
    suppressWarnings(suppressMessages(library(survminer, quiet = T)))
    # overall survival
    sop <- Surv(db$survival_time, db$survival)
    fitkm <- survfit(sop ~ 1, type="kaplan-meier")
    
    ggsurvplot(
      fitkm,
      data = db,
      conf.int = TRUE,
      xlab = "Days", ylab = "",
      ggtheme = theme_light(),
      surv.median.line = "hv",  legend.labs = c("Overall survival"),
      legend.title = "",risk.table = F,
      palette = c("#8C3F4D"), font.y = 15, font.x = 15, font.tickslab = 10, font.legend = 12)
    

    Bivariate and Multivariate Analysis

    BMI is associated with survival at bivariate analysis while neoadiuvant therapy and stage seem not (remember this is fake data!!)

    listavariabili<-list("BMI_cat", "stage", "neo_adiuvant_therapy")
    listatitoli<- c("BMI", "Cancer stage", "Neo-adiuvant therapy")
    listaetichette <- list(c("Not Obese", "Obese"), c("1", "2", "3", "4"), c("No", "Yes"))
    formulae <- lapply(listavariabili, function(x) as.formula(paste0("sop ~ ", x)))
    fits <- surv_fit(formulae, data = db)
    
    p1_st <- ggsurvplot(fits, data = db,
      pval = TRUE,
      #pval.coord = c(200, 0.10),
      xlab = "Days", ylab = "",
      ggtheme = theme_light(),
      #surv.median.line = "hv",
      title = "",
      legend.title=listatitoli, tables.height = 0.28, tables.theme = theme_cleantable(),  conf.int = F, tables.col = "strata", tables.y.text = F, censor = T,
      palette = c("#8C3F4D", "#3E606F", "#6476a1", "red"), font.y = 7, font.x = 7, font.tickslab = 10, font.legend = 12, legend.labs = listaetichette,
      risk.table = T, risk.table.title = "", risk.table.fontsize = 3
      )
    options(repr.plot.width=10, repr.plot.height=20)
    arrange_ggsurvplots(p1_st, ncol = 1, nrow = 3, print = TRUE, title = "Kaplan-Meier plots for survival in mammary cancer stratified by Body Mass Index, neoadiuvant therapy and stage.\nLog rank test p values are shown for each group.")
    suppressWarnings(suppressMessages(install.packages("Publish", quiet = T)))
    suppressWarnings(suppressMessages(library(Publish, quiet = T)))
    fit<-coxph(sop~stage + neo_adiuvant_therapy + BMI, data = db)
    a <- publish(fit, print = F, factor.reference = "extraline", units = NULL, probindex = FALSE,pvalue.digits=3,pvalue.stars=TRUE)
    a_df <- as.data.frame(a[["regressionTable"]])
    a_df[, 1:5]