Inference of Regular Expressions for Text Extraction from Examples

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  author =       "Alberto Bartoli and Andrea {De Lorenzo} and 
                 Eric Medvet and Fabiano Tarlao",
  title =        "Inference of Regular Expressions for Text Extraction
                 from Examples",
  journal =      "IEEE Transactions on Knowledge and Data Engineering",
  year =         "2016",
  volume =       "28",
  number =       "5",
  pages =        "1217--1230",
  month =        may,
  keywords =     "genetic algorithms, genetic programming",
  ISSN =         "1041-4347",
  URL =          "",
  DOI =          "doi:10.1109/TKDE.2016.2515587",
  abstract =     "A large class of entity extraction tasks from text
                 that is either semistructured or fully unstructured may
                 be addressed by regular expressions, because in many
                 practical cases the relevant entities follow an
                 underlying syntactical pattern and this pattern may be
                 described by a regular expression. In this work, we
                 consider the long-standing problem of synthesizing such
                 expressions automatically, based solely on examples of
                 the desired behaviour. We present the design and
                 implementation of a system capable of addressing
                 extraction tasks of realistic complexity. Our system is
                 based on an evolutionary procedure carefully tailored
                 to the specific needs of regular expression generation
                 by examples. The procedure executes a search driven by
                 a multiobjective optimization strategy aimed at
                 simultaneously improving multiple performance indexes
                 of candidate solutions while at the same time ensuring
                 an adequate exploration of the huge solution space. We
                 assess our proposal experimentally in great depth, on a
                 number of challenging datasets. The accuracy of the
                 obtained solutions seems to be adequate for practical
                 usage and improves over earlier proposals
                 significantly. Most importantly, our results are highly
                 competitive even with respect to human operators. A
                 prototype is available as a web application at",
  notes =        "Entered 2016 HUMIES. Department of Engineering and
                 Architecture (DIA), University of Trieste, Italy. Also
                 known as \cite{7374717}",
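The abstract says candidate regular expressions are scored against examples and optimized on several performance indexes at once, but it does not name those indexes. The Python sketch below is only an illustration of that idea under stated assumptions. It uses three assumed objectives: span-level precision, span-level recall, and pattern length (shorter is better). It keeps the candidates that no other candidate Pareto-dominates. The names `Example`, `fitness`, and `pareto_front` are hypothetical, and this is not the authors' implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    text: str
    spans: frozenset  # (start, end) spans that should be extracted from text


def extracted_spans(pattern, text):
    # Non-empty matches of the candidate regex, as (start, end) spans.
    return {m.span() for m in re.finditer(pattern, text) if m.end() > m.start()}


def fitness(pattern, examples):
    # Assumed objectives (illustrative, not taken from the paper):
    # (precision, recall, -length); higher is better on every axis.
    tp = fp = fn = 0
    for ex in examples:
        got = extracted_spans(pattern, ex.text)
        tp += len(got & ex.spans)
        fp += len(got - ex.spans)
        fn += len(ex.spans - got)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (precision, recall, -len(pattern))


def dominates(a, b):
    # a dominates b: no worse on every objective, strictly better on one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates, examples):
    # Keep candidates not dominated by any other candidate (input order kept).
    scored = {p: fitness(p, examples) for p in candidates}
    return [p for p in candidates
            if not any(dominates(scored[q], scored[p])
                       for q in candidates if q != p)]


ex = Example("Call 555-1234 or 555-9876 today.", frozenset({(5, 13), (17, 25)}))
print(pareto_front([r"\d{3}-\d{4}", r"\d+", r"\d+-\d+", r"555-1234"], [ex]))
# prints ['\\d+', '\\d+-\\d+']
```

Here `\d+-\d+` extracts both phone numbers with a shorter pattern than `\d{3}-\d{4}`, so it dominates that pattern. `\d+` survives only because it is shortest. This shows why a purely Pareto-based search also needs some way to steer toward accurate solutions.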

Genetic Programming entries for Alberto Bartoli, Andrea De Lorenzo, Eric Medvet, Fabiano Tarlao