Alternative pre-mRNA Splicing: Signals and Evolution

Created by W. Langdon from gp-bibliography.bib Revision:1.4496

  author =       "Ivana Vukusic",
  title =        "Alternative {pre-mRNA} Splicing: Signals and
                 Evolution",
  school =       "Mathematisch-Naturwissenschaftliche Fakult{\"a}t der
                 Universit{\"a}t zu K{\"o}ln",
  year =         "2008",
  address =      "Cologne, Germany",
  month =        "17 " # nov,
  keywords =     "genetic algorithms, genetic programming",
  size =         "142 pages",
  abstract =     "Alternative pre-mRNA splicing is a major source of
                 transcriptome and proteome diversity. In humans,
                 aberrant splicing is a cause of genetic disease and
                 cancer. Until recently it was believed that almost
                 95 percent of all genes undergo constitutive splicing,
                 in which introns are always excised and exons are
                 always included in the mature mRNA transcript. It is
                 now widely accepted that alternative splicing is the
                 rule rather than the exception and that perhaps more
                 than 75 percent of all human genes are alternatively
                 spliced. Despite its importance and its potential role
                 in causing disease, the molecular basis of alternative
                 splicing is still not fully understood. The
                 incompleteness of our knowledge about the human
                 transcriptome makes ab initio prediction of
                 alternative splicing a recent but important research
                 topic.

                 This thesis investigates different aspects of
                 alternative splicing in humans, based upon
                 computational large-scale analyses. We introduce a
                 genetic programming approach to predict alternative
                 splicing events without using expressed sequence tags
                 (ESTs). In contrast to existing methods, our approach
                 relies on sequence information only and is therefore
                 independent of the existence of orthologous

                 We analysed 27,519 constitutively spliced and 9,641
                 cassette exons (SCE) together with their neighbouring
                 introns; in addition, we analysed 33,316 constitutively
                 spliced introns and 2,712 retained introns (SIR). We
                 find that our classification tool yields highly
                 accurate predictions on the SIR data, with a
                 sensitivity of 92.1 percent and a specificity of
                 79.2 percent. Prediction accuracies on the SCE data are
                 lower: 47.3 percent (sensitivity) and 70.9 percent
                 (specificity), indicating that alternative splicing of
                 introns can be better captured by sequence properties
                 than that of exons.

                 We critically question these findings and in particular
                 discuss the strong influence of the feature 'length' on
                 predictions for retained introns. We find that the
                 number of adenosines in an exon, called 'feature A', is
                 a highly prominent feature for the classification of
                 exons. Adenosines are especially overrepresented in the
                 most abundant exonic splicing enhancers, found in
                 constitutive exons. Furthermore, we comment on
                 inconsistencies in the nomenclature and on problems in
                 handling the splicing data, and we make suggestions to
                 improve the terminology.

                 For further in silico exploration of sequence
                 properties of exons, we generated a dataset of
                 synthetic exons. We describe a general rule for
                 creating sequences with exonic splicing enhancer and
                 silencer densities similar to those of real exons, as
                 well as similar exonic splicing enhancer networks. We
                 find that exonic splicing enhancer densities are well
                 suited for differentiating real and randomised exons,
                 whereas the densities of SR protein binding sites are
                 largely uninformative. Generally, we find that features
                 described in small-scale experimental data are not
                 transferable to large-scale computational analyses,
                 which makes creating rules for alternative splicing
                 prediction based only upon DNA/RNA sequence an
                 extraordinarily difficult task.

                 According to our findings, we suggest that only 20
                 percent of the splicing information for the SCE data,
                 and only 30 percent for the SIR data, is encoded at the
                 sequence level.

                 In the last chapter, we investigated whether
                 alternative splicing may be connected to adaptive
                 evolutionary processes in a species or population.
                 Unfortunately, the currently available population
                 genetic tools are not sensitive enough to identify
                 traces of positive or balancing selection on the scale
                 of a few hundred bp. Additional problems are the
                 incomplete SNP databases and SNP ascertainment bias.
                 The evolutionary role of alternative splicing remains,
                 at least for the moment, speculative.",
  notes =        "In English, Koeln",

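The sensitivity and specificity values quoted in the abstract are standard binary-classification metrics. The following is a minimal sketch, not code from the thesis, that assumes alternatively spliced events (retained introns or cassette exons) are the positive class and constitutive events the negative class. The helper `count_adenosines` is a hypothetical illustration of the "feature A" described in the abstract (the number of adenosines in an exon); the counts in the demo are made up and are not the thesis data.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: fraction of truly alternative events predicted as alternative.

    Assumption: alternative splicing events are treated as the positive class.
    """
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: fraction of truly constitutive events predicted as constitutive."""
    return tn / (tn + fp)


def count_adenosines(seq: str) -> int:
    """Hypothetical 'feature A': number of adenosine (A) nucleotides in an exon sequence.

    Case-insensitive; works for both DNA (ACGT) and RNA (ACGU) alphabets.
    """
    return seq.upper().count("A")


if __name__ == "__main__":
    # Illustrative confusion-matrix counts only; NOT taken from the thesis.
    tp, fn, tn, fp = 90, 10, 80, 20
    print(f"sensitivity = {sensitivity(tp, fn):.3f}")  # sensitivity = 0.900
    print(f"specificity = {specificity(tn, fp):.3f}")  # specificity = 0.800
    print(count_adenosines("AUGGCAAGA"))  # 4
```

Reporting both metrics matters here because the two classes are strongly imbalanced (for example, 33,316 constitutive versus 2,712 retained introns), so a single accuracy figure could hide poor performance on the smaller class.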