Joint Modelling of Multivariate Longitudinal and Survival Data

Denisa Mendonça

Principal Investigator

Integrated Member (PhD)

Type of project:




Proposing institution:

Universidade do Minho

Participating institutions:

Instituto Politécnico do Porto (IPP); Instituto Nacional de Engenharia Biomédica (INEB Porto); ISPUP

Sources of financing:

Sociedade Portuguesa de Hipertensão

Start date:


(Predicted) End date:


Research line:

L1 - Life Course Research and Healthy Ageing

Research lab:

Trajectories and Joint Modelling Applied to Health Metrics


Joint models for longitudinal and survival (also called time-to-event) data have recently attracted considerable attention in statistics, particularly in biostatistics. Biostatistics is a science that develops statistical methodology motivated by questions and scientific problems in medicine, epidemiology, public health and biology. Joint models are of great interest in these areas because it is common to collect, for each subject, both repeated measurements of biomarkers and the times to key clinical events.
This project is motivated by a breast cancer study conducted at the senology unit of Hospital de Braga, in the north of Portugal.
Current recommendations and guidelines from the National Health Service are mainly based on European studies. However, it is not clear that the disease behaves similarly across European countries, so this study could answer questions specific to the Portuguese population.
In this project we propose to develop a statistical model, specifically a joint model for multivariate longitudinal and survival data, motivated by the medical questions arising from this data set. We propose to use longitudinal models, survival models and the multivariate joint model for statistical inference aimed at understanding the complexity of breast cancer. Longitudinal data arise when individuals are measured repeatedly over time, and longitudinal models describe the process underlying the observed data. In particular, these models allow us to distinguish within-subject from between-subject variability in the data (Diggle et al., 2002), and they have been extended to multivariate responses (Bandyopadhyay et al., 2011). Survival data concern the time from a reference point until the occurrence of an event. Classical survival models describe the hazard of the event occurring at any given time as a function of explanatory variables.
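For concreteness, a standard formulation of the two model classes described above is sketched here. The notation is ours and is an assumption for illustration, not the project's final specification. In the linear mixed model (Laird and Ware form), the random effects b_i capture between-subject variability and the residuals ε_ij capture within-subject variability; in the Cox proportional hazards model, the hazard of subject i is an unspecified baseline hazard h_0(t) multiplied by a function of the explanatory variables w_i.

```latex
\begin{aligned}
y_{ij} &= \mathbf{x}_{ij}^\top \boldsymbol{\beta} + \mathbf{z}_{ij}^\top \mathbf{b}_i + \varepsilon_{ij},
  \qquad \mathbf{b}_i \sim N(\mathbf{0}, \mathbf{D}), \quad \varepsilon_{ij} \sim N(0, \sigma^2),\\
h_i(t) &= h_0(t)\,\exp\!\left(\mathbf{w}_i^\top \boldsymbol{\gamma}\right),
\end{aligned}
```

Here i indexes subjects, j indexes the measurement occasions at times t_ij, x_ij and z_ij are covariate vectors for the fixed and random effects, and b_i is assumed independent of ε_ij.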
In breast cancer patients there is more than one event of interest: time to recurrence, time to death, and time to another cancer not related to the breast. At the same time, from diagnosis onward, patients are monitored for multiple tumor biomarkers (CA 549, CA 15.3) through blood samples. It is of interest to make inferences about the progression of these tumor biomarkers in breast cancer patients up to the events of interest. Moreover, the events of interest may compete with one another, so the time to a specific event also depends on the other events. Collecting repeated measurements on several patients produces longitudinal data, which must be analysed with methods that treat measurements from the same subject as correlated but measurements from different subjects as independent. Joint models make it possible to model the two processes of interest, longitudinal and time-to-event, simultaneously when they are associated, as in the case of breast cancer patients.
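The correlation structure described above can be illustrated with a small simulation. This is a sketch under stated assumptions: a random-intercept model with hypothetical variance values, not the breast cancer data, and the function names are illustrative only. Each subject's measurements share a random intercept b_i, so two measurements of the same subject are correlated with intraclass correlation σ_b² / (σ_b² + σ_e²), while measurements from different subjects are independent. The sketch requires NumPy.

```python
import numpy as np


def simulate_random_intercept(n_subjects, n_obs, sigma_b, sigma_e, seed=0):
    """Simulate y_ij = b_i + e_ij with b_i ~ N(0, sigma_b^2) and e_ij ~ N(0, sigma_e^2).

    Rows are subjects and columns are repeated measurements of a
    hypothetical biomarker.
    """
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, sigma_b, size=(n_subjects, 1))      # subject-level random intercepts
    e = rng.normal(0.0, sigma_e, size=(n_subjects, n_obs))  # measurement-level noise
    return b + e  # broadcasting adds each subject's intercept to all of its measurements


def anova_icc(y):
    """One-way ANOVA estimate of the intraclass correlation sigma_b^2 / (sigma_b^2 + sigma_e^2)."""
    n, k = y.shape
    subject_means = y.mean(axis=1)
    grand_mean = y.mean()
    msb = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)   # between-subject mean square
    msw = np.sum((y - subject_means[:, None]) ** 2) / (n * (k - 1))  # within-subject mean square
    sigma_b2 = max((msb - msw) / k, 0.0)  # E[MSB] = sigma_e^2 + k * sigma_b^2, E[MSW] = sigma_e^2
    return sigma_b2 / (sigma_b2 + msw)


y = simulate_random_intercept(n_subjects=2000, n_obs=5, sigma_b=2.0, sigma_e=1.0)
true_icc = 2.0**2 / (2.0**2 + 1.0**2)
print(f"estimated ICC: {anova_icc(y):.3f}, true ICC: {true_icc:.3f}")

# Two measurements of the same subject are correlated ...
print("corr, same subject:", np.corrcoef(y[:, 0], y[:, 1])[0, 1])
# ... while measurements from different subjects are approximately uncorrelated.
print("corr, different subjects:", np.corrcoef(y[:-1, 0], y[1:, 0])[0, 1])
```

Ignoring this within-subject correlation, for example by treating all measurements as independent, would understate the uncertainty of estimates based on subject-level information.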
When the longitudinal observations may be correlated with survival, joint models of the longitudinal and time-to-event processes have increasingly been proposed to recover the information carried by this potentially informative censoring (Wulfsohn and Tsiatis, 1997; Henderson et al., 2001; Diggle et al., 2008; Sousa, 2011). For example, women whose breast cancer is worsening are more likely to die, or to have a tumor relapse, earlier. Although some of these proposals are conceptually different, they all focus on a single longitudinal variable and a single time to event.
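As an illustration of this single-outcome setting, Henderson et al. (2001) link the two processes through latent zero-mean Gaussian processes W_1i(t) and W_2i(t). The sketch below follows that structure with notation chosen here; the random-intercept-and-slope form of W_1i(t) and the proportional association W_2i(t) = γ W_1i(t) are simple choices made for illustration and are assumptions, not the general form of the model.

```latex
\begin{aligned}
Y_{ij} &= \mu_i(t_{ij}) + W_{1i}(t_{ij}) + Z_{ij}, \qquad Z_{ij} \sim N(0, \sigma_z^2),\\
\lambda_i(t) &= \lambda_0(t)\,\exp\!\left\{\mathbf{v}_i(t)^\top \boldsymbol{\beta}_2 + W_{2i}(t)\right\},\\
W_{1i}(t) &= U_{1i} + U_{2i}\,t, \qquad W_{2i}(t) = \gamma\, W_{1i}(t),
\end{aligned}
```

Here μ_i(t) is the mean of the longitudinal response, (U_1i, U_2i) are bivariate normal random effects and v_i(t) are covariates of the hazard. When γ = 0 the two submodels are unlinked and can be fitted separately; when γ ≠ 0 subjects with higher latent trajectories have a different hazard, so dropout due to the event is informative for the longitudinal measurements.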
We aim to compare the results of separate multivariate analyses (longitudinal and survival) with those of the multivariate joint analysis.
Our hypothesis is that a joint analysis, by accounting for the association between the two processes, gives better inferences on the model parameters.
The joint models proposed by Henderson et al. (2001) and Diggle et al. (2008) are now available in the R package joineR (http://cran.r-project.org/web/packages/joineR/), and we propose to extend it with functions for multivariate joint models, in the presence of multiple longitudinal responses and multiple events of interest.
The project is proposed for one year, as this is the period covered by the available funding. However, we already have a clear idea of how this project could progress in half a year and in two years. It would be of great importance to implement in the local hospital a statistical computer application that works as a database and also analyses new data daily with the multivariate joint model as new longitudinal information arrives. Such an application would be a valuable tool for clinicians.

Research Team