Comparative genome analysis of a large Dutch Legionella pneumophila strain collection identifies five markers highly correlated with clinical strains

Created by W.Langdon from gp-bibliography.bib Revision:1.4192

  author =       "Ed Yzerman and Jeroen {den Boer} and 
                 Martien Caspers and Arpit Almal and Bill Worzel and 
                 Walter {van der Meer} and Roy Montijn and Frank Schuren",
  title =        "Comparative genome analysis of a large Dutch
                 Legionella pneumophila strain collection identifies
                 five markers highly correlated with clinical strains",
  year =         "2010",
  journal =      "BMC Genomics",
  volume =       "11",
  pages =        "433",
  keywords =     "genetic algorithms, genetic programming",
  ISSN =         "1471-2164",
  bibsource =    "OAI-PMH server at",
  oai =          "oai:doaj-articles:5761bf382eaa4396188fdcc54c508f94",
  URL =          "",
  DOI =          "doi:10.1186/1471-2164-11-433",
  size =         "11 pages",
  publisher =    "BioMed Central",
  abstract =     "Background

                 Discrimination between clinical and environmental
                 strains within many bacterial species is currently
                 underexplored. Genomic analyses have clearly shown the
                 enormous variability in genome composition between
                 different strains of a bacterial species. In this study
                 we have used Legionella pneumophila, the causative
                 agent of Legionnaires' disease, to search for genomic
                 markers related to pathogenicity. During a large
                 surveillance study in The Netherlands
                 well-characterised patient-derived strains and
                 environmental strains were collected. We have used a
                 mixed-genome microarray to perform comparative-genome
                 analysis of 257 strains from this


                 Results

                 Microarray analysis indicated that 480 DNA markers (out
                 of 3360 markers in total) showed clear variation in
                 presence between individual strains and these were
                 therefore selected for further analysis. Unsupervised
                 statistical analysis of these markers showed the
                 enormous genomic variation within the species but did
                 not show any correlation with a pathogenic phenotype.
                 We therefore used supervised statistical analysis to
                 identify discriminating markers. Genetic programming
                 was used both to identify predictive markers and to
                 define their interrelationships. A model consisting of
                 five markers was developed that together correctly
                 predicted 100 percent of the clinical strains and
                 69 percent of the environmental strains.


                 Conclusions

                 A novel approach for identifying predictive markers
                 enabling discrimination between clinical and
                 environmental isolates of L. pneumophila is presented.
                 Out of over 3000 possible markers, five were selected
                 that together enabled correct prediction of all the
                 clinical strains included in this study. This novel
                 approach for identifying predictive markers can be
                 applied to all bacterial species, allowing for better
                 discrimination between strains well equipped to cause
                 human disease and relatively harmless strains.",

Genetic Programming entries for Ed Yzerman, Jeroen den Boer, Martien Caspers, Arpit A Almal, William P Worzel, Walter van der Meer, Roy Montijn, Frank Schuren