The use of genetic programming in the analysis of quantitative gene expression profiles for identification of nodal status in bladder cancer

Created by W.Langdon from gp-bibliography.bib Revision:1.4216

  title =        "The use of genetic programming in the analysis of
                 quantitative gene expression profiles for
                 identification of nodal status in bladder cancer",
  author =       "Anirban P Mitra and Arpit A Almal and Ben George and 
                 David W Fry and Peter F Lenehan and 
                 Vincenzo Pagliarulo and Richard J Cote and Ram H Datar and 
                 William P Worzel",
  journal =      "BMC Cancer",
  year =         "2006",
  volume =       "6",
  number =       "159",
  month =        jun # "~16",
  publisher =    "BioMed Central Ltd.",
  ISSN =         "1471-2407",
  bibsource =    "OAI-PMH server at",
  language =     "en",
  oai =          "",
  rights =       "Copyright 2006 Mitra et al; licensee BioMed Central",
  keywords =     "genetic algorithms, genetic programming, AUROC",
  URL =          "",
  abstract =     "Background

                 Previous studies on bladder cancer have shown nodal
                 involvement to be an independent indicator of prognosis
                 and survival. This study aimed at developing an
                 objective method for detection of nodal metastasis from
                 molecular profiles of primary urothelial carcinoma
                 tissues.

                 Methods

                 The study included primary bladder tumor tissues from
                 60 patients across different stages and 5 control
                 tissues of normal urothelium. The entire cohort was
                 divided into training and validation sets comprised of
                 node positive and node negative subjects. Quantitative
                 expression profiling was performed for a panel of 70
                 genes using standardized competitive RT-PCR and the
                 expression values of the training set samples were run
                 through an iterative machine learning process called
                 genetic programming that employed an N-fold cross
                 validation technique to generate classifier rules of
                 limited complexity. These were then used in a voting
                 algorithm to classify the validation set samples into
                 those associated with or without nodal metastasis.

                 Results

                 The generated classifier rules using 70 genes
                 demonstrated 81 percent accuracy on the validation set
                 when compared to the pathological nodal status. The
                 rules showed a strong predilection for ICAM1, MAP2K6
                 and KDR resulting in gene expression motifs that
                 cumulatively suggested a pattern ICAM1>MAP2K6>KDR for
                 node positive cases. Additionally, the motifs showed
                 CDK8 to be lower relative to ICAM1, and ANXA5 to be
                 relatively high by itself in node positive tumors.
                 Rules generated using only ICAM1, MAP2K6 and KDR were
                 comparably robust, with a single representative rule
                 producing an accuracy of 90 percent when used by itself
                 on the validation set, suggesting a crucial role for
                 these genes in nodal metastasis.

                 Conclusion

                 Our study demonstrates the use of standardized
                 quantitative gene expression values from primary
                 bladder tumor tissues as inputs in a genetic
                 programming system to generate classifier rules for
                 determining the nodal status. Our method also suggests
                 the involvement of ICAM1, MAP2K6, KDR, CDK8 and ANXA5
                 in unique mathematical combinations in the progression
                 towards nodal positivity. Further studies are needed to
                 identify more class-specific signatures and confirm the
                 role of these genes in the evolution of nodal
                 metastasis in bladder cancer.",
  notes =        "p2 'Since scaling the gene expression levels to
                 represent fold changes relative to a base value could
                 have biased the significance of these gene'

                 65 samples. 11-fold cross validation. Max 7-genes per

                 mixing of folds and majority voting scheme. 100
                 Generations. p6 Analysis of gene usage 'motifs'
                 (requires GP, could not be done with other approaches).
                 Indicates possible biochemical pathways.

                 p7 'Gene transitivity'. p12 'hypothesis-generating
                 nature of GP'

                 p12 'A unique feature of GP is the final output, which
                 consists of easily readable rules expressed as
                 executable classifier programs that define tangible
                 relationships between the most influential genes.' p12
                 'filtering can create an incomplete and biased dataset
                 that may not be representative of many complex
                 biological systems. The curse of

                 p13 'hierarchical, KNN, K-means clustering and Neural
                 Nets which do not scale easily to larger numbers of

                 p13 GP can 'handle missing values in the data'.",
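
The abstract describes classifier rules that compare gene expression levels (for example the motif ICAM1>MAP2K6>KDR) and a voting algorithm that combines several such rules. A minimal Python sketch of that idea follows. The three rules, the function names and the strict-majority tie handling are illustrative assumptions, not the rules or the voting scheme actually evolved and used in the paper.

```python
from typing import Callable, Dict, List

Sample = Dict[str, float]        # gene name -> expression value
Rule = Callable[[Sample], bool]  # True means "predict node positive"


# Hypothetical rules loosely modelled on the motifs named in the abstract.
# They are NOT the classifier rules produced by the authors' GP runs.
def rule_chain(s: Sample) -> bool:
    # Motif from the abstract: ICAM1 > MAP2K6 > KDR in node positive cases.
    return s["ICAM1"] > s["MAP2K6"] > s["KDR"]


def rule_cdk8(s: Sample) -> bool:
    # Motif from the abstract: CDK8 lower relative to ICAM1.
    return s["CDK8"] < s["ICAM1"]


def rule_icam1_kdr(s: Sample) -> bool:
    # Assumed extra rule, implied by the chain above (ICAM1 > KDR).
    return s["ICAM1"] > s["KDR"]


def majority_vote(rules: List[Rule], sample: Sample) -> bool:
    """Return True (node positive) if more than half of the rules vote True."""
    votes = sum(1 for rule in rules if rule(sample))
    return votes * 2 > len(rules)


RULES: List[Rule] = [rule_chain, rule_cdk8, rule_icam1_kdr]

# Made-up expression values, for illustration only (not data from the study).
example_positive: Sample = {"ICAM1": 5.0, "MAP2K6": 3.0, "KDR": 1.0, "CDK8": 2.0}
example_negative: Sample = {"ICAM1": 1.0, "MAP2K6": 3.0, "KDR": 2.0, "CDK8": 4.0}
```

Calling majority_vote(RULES, example_positive) returns True because all three rules fire, while majority_vote(RULES, example_negative) returns False because none do. An odd number of rules avoids ties; with an even number, this sketch treats a tie as node negative.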

Genetic Programming entries for Anirban P Mitra, Arpit A Almal, Ben George, David W Fry, Peter F Lenehan, Vincenzo Pagliarulo, Richard J Cote, Ram H Datar, William P Worzel