Ab initio identification of human microRNAs based on structure motifs

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  author =       "Markus Brameier and Carsten Wiuf",
  title =        "Ab initio identification of human {microRNAs} based on
                 structure motifs",
  journal =      "BMC Bioinformatics",
  year =         "2007",
  volume =       "8",
  pages =        "478",
  month =        "18 " # dec,
  keywords =     "genetic algorithms, genetic programming, linear
                 genetic programming",
  URL =          "http://www.biomedcentral.com/content/pdf/1471-2105-8-478.pdf",
  DOI =          "doi:10.1186/1471-2105-8-478",
  size =         "11 pages",
  abstract =     "BACKGROUND: MicroRNAs (miRNAs) are short, non-coding
                 RNA molecules that are directly involved in
                 post-transcriptional regulation of gene expression. The
                 mature miRNA sequence binds, with varying specificity,
                 to target sites on the mRNA. Both their small size and
                 sequence specificity make the detection of completely
                 new miRNAs a challenging task. Such detection cannot
                 rely on sequence information alone but requires
                 structural information about the miRNA precursor. Unlike
                 comparative genomics approaches, ab initio approaches
                 are able to discover species-specific miRNAs without
                 known sequence homology.

                 RESULTS: MiRPred is a novel method for ab initio
                 prediction of miRNAs by genome scanning that relies
                 only on (predicted) secondary structure to
                 distinguish miRNA precursors from other similar-sized
                 segments of the human genome. We apply a machine
                 learning technique, called linear genetic programming,
                 to develop special classifier programs which include
                 multiple regular expressions (motifs) matched against
                 the secondary structure sequence. Special attention is
                 paid to scanning issues. The classifiers are trained on
                 fixed-length sequences as these occur when shifting a
                 window in regular steps over a genome region. Various
                 statistical and empirical evidence is collected to
                 validate the correctness of and increase confidence in
                 the predicted structures. Among other things, we
                 propose a new criterion to select miRNA candidates with
                 a higher stability of folding that is based on the
                 number of matching windows around their genome
                 location. When all classifiers are required to agree
                 on a positive decision, an ensemble of 16 motif-based
                 classifiers achieves 99.9 percent specificity while
                 sensitivity remains at an acceptably high level. A low
                 false positive rate is considered more important than a
                 low false negative rate when searching larger genome
                 regions for unknown miRNAs. 117 new miRNAs have been
                 predicted close to known miRNAs on human chromosome 19.
                 All candidate structures match the free energy
                 distribution of miRNA precursors, which is significantly
                 shifted towards lower free energies. We employed a
                 human EST library and found that around 75 percent of
                 the candidate sequences are likely to be transcribed,
                 with around 35 percent located in introns.

                 CONCLUSION: Our motif finding method is at least
                 competitive with state-of-the-art feature-based methods
                 for ab initio miRNA discovery. At the same time, it
                 requires less prior knowledge about miRNA precursor
                 structures, while its programs and motifs allow a more
                 straightforward interpretation and extraction of the
                 acquired knowledge.",
  notes =        "PMID: 18088431 [PubMed - indexed for MEDLINE]",
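
The RESULTS paragraph of the abstract combines several scanning ideas: fixed-length windows shifted in regular steps over a genome region, classifiers built from regular expressions (motifs) matched against the secondary structure sequence, an ensemble of 16 classifiers that must all agree on a positive decision, and a candidate filter based on the number of matching windows around a genome location. The Python sketch below illustrates how these pieces fit together under stated assumptions; it is not MiRPred itself. The regex patterns, the window length and step, the neighbourhood radius and count threshold, and the rule that a classifier fires only when all of its motifs match are hypothetical choices made for illustration (the evolved linear GP programs are not reproduced). The dot-bracket strings are assumed to have been produced beforehand by a folding tool such as RNAfold from the ViennaRNA package.

```python
import re
from typing import Callable, List, Sequence

# A classifier maps one window's dot-bracket structure string to a yes/no decision.
Classifier = Callable[[str], bool]


def motif_classifier(patterns: Sequence[str]) -> Classifier:
    """Hypothetical motif classifier: positive only if every regular
    expression matches somewhere in the structure string. (An assumed
    AND-rule; the evolved linear GP programs are not reproduced here.)"""
    compiled = [re.compile(p) for p in patterns]

    def classify(structure: str) -> bool:
        return all(rx.search(structure) is not None for rx in compiled)

    return classify


def ensemble_positive(classifiers: Sequence[Classifier], structure: str) -> bool:
    """Unanimous vote: a window counts as a candidate only if every
    classifier in the ensemble accepts it (specificity over sensitivity)."""
    return all(clf(structure) for clf in classifiers)


def window_starts(region_length: int, window: int, step: int) -> List[int]:
    """Start offsets of fixed-length windows shifted in regular steps."""
    return list(range(0, region_length - window + 1, step))


def matching_window_count(hits: Sequence[bool], index: int, radius: int) -> int:
    """Number of positive windows within `radius` windows of `index`."""
    lo = max(0, index - radius)
    hi = min(len(hits), index + radius + 1)
    return sum(1 for h in hits[lo:hi] if h)


def select_candidates(hits: Sequence[bool], radius: int, min_count: int) -> List[int]:
    """Keep positive windows whose neighbourhood contains at least
    `min_count` positive windows (hypothetical stability filter)."""
    return [
        i for i, h in enumerate(hits)
        if h and matching_window_count(hits, i, radius) >= min_count
    ]


if __name__ == "__main__":
    # Toy dot-bracket structures for consecutive windows (assumed to come
    # from an external folding tool such as RNAfold).
    structures = [
        "....................",
        "..((((((....))))))..",
        "((((((((....))))))))",
        "..((..))..((..))....",
    ]
    clf_a = motif_classifier([r"\({6,}", r"\(\.{3,12}\)"])  # long stem + hairpin loop
    clf_b = motif_classifier([r"\){6,}"])                    # long 3' stem
    hits = [ensemble_positive([clf_a, clf_b], s) for s in structures]
    print(hits)                                              # [False, True, True, False]
    print(select_candidates(hits, radius=1, min_count=2))    # [1, 2]
```

Requiring unanimity means any single classifier can veto a window, which is how such an ensemble trades some sensitivity for the low false positive rate that the abstract argues is needed when scanning large genome regions.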
