Automated development of clinical prediction models using genetic programming

Created by W.Langdon from gp-bibliography.bib Revision:1.4496

  title =        "Automated development of clinical prediction models
                 using genetic programming",
  author =       "Christian A. Bannister",
  year =         "2015",
  school =       "Cardiff University",
  address =      "UK",
  month =        sep,
  keywords =     "genetic algorithms, genetic programming",
  size =         "410 pages",
  abstract =     "Genetic programming is an Evolutionary Computing
                 technique, inspired by biological evolution, capable of
                 discovering complex non-linear patterns in large
                 datasets. Genetic programming is a general methodology,
                 the specific implementation of which requires
                 development of several different specific elements such
                 as problem representation, fitness, selection and
                 genetic variation. Despite the potential advantages of
                 genetic programming over standard statistical methods,
                 its applications to survival analysis are at best rare,
                 primarily because of the difficulty in handling
                 censored data. The aim of this work was to develop a
                 genetic programming approach for survival analysis and
                 demonstrate its utility for the automatic development
                 of clinical prediction models using cardiovascular
                 disease as a case study. We developed a tree-based
                 untyped steady-state genetic programming approach for
                 censored longitudinal data, comparing its performance
                 to that of the de facto statistical method, Cox
                 regression, in the development of clinical prediction
                 models for the prediction of future cardiovascular
                 events in patients with symptomatic and asymptomatic
                 cardiovascular disease, using large observational
                 datasets. We also used genetic programming to examine
                 the prognostic significance of different risk factors
                 together with their non-linear combinations for the
                 prognosis of health outcomes in cardiovascular disease.
                 These experiments showed that Cox regression and the
                 developed steady-state genetic programming approach
                 produced similar results when evaluated in common
                 validation datasets. Despite slight relative
                 differences, both approaches demonstrated an acceptable
                 level of discrimination and calibration at a range of
                 time points. Whilst the application of genetic
                 programming did not provide more accurate
                 representations of factors that predict the risk of
                 both symptomatic and asymptomatic cardiovascular
                 disease when compared with existing methods, genetic
                 programming did offer comparable performance. Despite
                 generally comparable performance, albeit in slight
                 favour of the Cox model, the predictors selected for
                 representing their relationships with the outcome were
                 quite different and, on average, the models developed
                 using genetic programming used considerably fewer
                 predictors. The results of the genetic programming
                 confirm the prognostic significance of a small number
                 of the most highly associated predictors in the Cox
                 modelling: age, previous atherosclerosis, and albumin
                 for secondary prevention; age, recorded diagnosis of
                 other cardiovascular disease, and ethnicity for primary
                 prevention in patients with type 2 diabetes. When
                 considered as a whole, genetic programming did not
                 produce better-performing clinical prediction models;
                 rather, it used fewer predictors, most of which were the
                 predictors that Cox regression estimated to be most
                 strongly associated with the outcome, whilst achieving
                 comparable performance. This suggests that genetic
                 programming may better represent the potentially
                 non-linear relationship of (a smaller subset of) the
                 strongest predictors. To our knowledge, this work is
                 the first study to develop a genetic programming
                 approach for censored longitudinal data and assess its
                 value for clinical prediction in comparison with the
                 well-known and widely applied Cox regression technique.
                 Using empirical data this work has demonstrated that
                 clinical prediction models developed by steady-state
                 genetic programming have predictive ability comparable
                 to those developed using Cox regression. The genetic
                 programming models were more complex and thus more
                 difficult for domain experts to validate; however, these
                 models were developed in an automated fashion, using
                 fewer input variables, without the need for domain
                 specific knowledge and expertise required to
                 appropriately perform survival analysis. This work has
                 demonstrated the strong potential of genetic
                 programming as a methodology for automated development
                 of clinical prediction models for diagnostic and
                 prognostic purposes in the presence of censored data.
                 This work compared untuned genetic programming models
                 that were developed in an automated fashion with highly
                 tuned Cox regression models that were developed in a
                 very involved manner that required a certain amount of
                 clinical and statistical expertise. Whilst the highly
                 tuned Cox regression models performed slightly better
                 in validation data, the performance of the
                 automatically generated genetic programming models was
                 generally comparable. The comparable performance
                 demonstrates the utility of genetic programming for
                 clinical prediction modelling and prognostic research,
                 where the primary goal is accurate prediction. In
                 aetiological research, where the primary goal is to
                 examine the relative strength of association between
                 risk factors and the outcome, Cox regression and
                 its variants remain the de facto approach.",
  notes =        "British Library, EThOS",
