Creating Regular Expressions as mRNA Motifs with GP to Predict Human Exon Splitting

Created by W.Langdon from gp-bibliography.bib Revision:1.4504

  author =       "W. B. Langdon and J. Rowsell and A. P. Harrison",
  title =        "Creating Regular Expressions as mRNA Motifs with GP to
                 Predict Human Exon Splitting",
  institution =  "Department of Computer Science, Crest Centre, King's
                 College, London",
  year =         "2009",
  number =       "TR-09-02",
  address =      "Strand, London, WC2R 2LS, UK",
  month =        "19 " # mar,
  keywords =     "genetic algorithms, genetic programming, Gene
                 expression and regulation, alternative splicing,
                 Microarray analysis, Integration of genetic programming
                 into bioinformatics, Biological interpretation of
                 computer generated motifs, Bioinformatics, Affymetrix
                 GeneChip, strongly typed genetic programming, grammar,
                 regular expression, Alternative splicing of Homo sapiens
                 exons, HDONA",
  URL =          "",
  abstract =     "Low correlation between mRNA concentrations measured
                 at different locations for the same exon shows that many
                 current Ensembl exon definitions are incomplete.
                 Automatically created patterns (e.g. TCTTT) identify
                 potential new alternative transcripts.

                 Strongly typed grammar based genetic programming (GP)
                 is used to evolve regular expressions (RE) to classify
                 gene exons with potential alternative mRNA expression
                 from those

                 RNAnet gives us correlations between Affymetrix HG-U133
                 Plus 2 GeneChip probe measurements for the same exon
                 across 2757 Homo sapiens tissue samples from NCBI's GEO
                 database. We identify many non-atomic Ensembl exons,
                 i.e. exons with substructure.

                 Biological patterns can be data mined by a Backus-Naur
                 form (BNF) context-free grammar using a strongly typed
                 GP written in gawk and using egrep. The automatically
                 produced DNA motifs suggest that alternative
                 polyadenylation is not responsible.

                 The training data is available on the
  notes =        "Long version of \cite{langdon:2009:gecco}",
  size =         "9 pages",
