Analysis of Grammatical Evolution Approaches to Regular Expression Induction

Created by W.Langdon from gp-bibliography.bib Revision:1.4333

  title =        "Analysis of Grammatical Evolution Approaches to
                 Regular Expression Induction",
  author =       "Antonio Gonzalez-Pardo and David Camacho",
  pages =        "632--639",
  booktitle =    "Proceedings of the 2011 IEEE Congress on Evolutionary
                 Computation",
  year =         "2011",
  editor =       "Alice E. Smith",
  month =        "5-8 " # jun,
  address =      "New Orleans, USA",
  organization = "IEEE Computational Intelligence Society",
  publisher =    "IEEE Press",
  ISBN =         "0-7803-8515-2",
  keywords =     "genetic algorithms, genetic programming, grammatical
                 evolution, data mining",
  DOI =          "doi:10.1109/CEC.2011.5949679",
  abstract =     "Regular expressions, or regexes, have traditionally
                 been used as a pattern-matching tool to search for
                 structures in a set of objects, such as files, text
                 documents or folders. Pattern matching can be used to
                 look for files whose names contain a given string, to
                 search for files that contain a specific pattern, or
                 simply to extract text from a set of documents.
                 Regexes are commonly applied to detect and extract
                 patterns that represent phone numbers, URLs, email
                 addresses, etc. This kind of information can be
                 characterised because it has a well-defined structure.
                 Nevertheless, regexes are not used very frequently
                 because their high complexity, in both syntax and
                 grammatical rules, makes them difficult to
                 understand. For this reason, developing programs that
                 can automatically generate and evaluate regexes has
                 become a valuable task. This work analyses
                 the performance of different grammatical evolution
                 approaches in generating regexes that can extract
                 URL patterns. Four different types of grammars have
                 been evaluated: a context-free grammar, a context-free
                 grammar with a penalised fitness function, an
                 extensible context-free grammar, and a Christiansen
                 grammar. For the considered problem, the experimental
                 results show that the best performance of the system,
                 measured as cumulative success rate, is achieved using
                 Christiansen grammars.",
  notes =        "CEC2011 sponsored by the IEEE Computational
                 Intelligence Society, and previously sponsored by the
                 EPS and the IET.",
