Boosting drug named entity recognition using an aggregate classifier

Created by W.Langdon from gp-bibliography.bib Revision:1.4192

  author =       "Ioannis Korkontzelos and Dimitrios Piliouras and 
                 Andrew W. Dowsey and Sophia Ananiadou",
  title =        "Boosting drug named entity recognition using an
                 aggregate classifier",
  journal =      "Artificial Intelligence in Medicine",
  volume =       "65",
  number =       "2",
  pages =        "145--153",
  year =         "2015",
  note =         "Intelligent healthcare informatics in big data era",
  ISSN =         "0933-3657",
  DOI =          "doi:10.1016/j.artmed.2015.05.007",
  URL =          "",
  abstract =     "Objective: Drug named entity recognition (NER) is a
                 critical step for complex biomedical NLP tasks such as
                 the extraction of pharmacogenomic, pharmacodynamic and
                 pharmacokinetic parameters. Large quantities of
                 high-quality training data are almost always a
                 prerequisite for employing supervised machine-learning
                 techniques to achieve high classification performance.
                 However, the human labour needed to produce and maintain
                 such resources is a significant limitation. In this
                 study, we improve the performance of drug NER without
                 relying exclusively on manual annotations. Methods: We
                 perform drug NER using either a small gold-standard
                 corpus (120 abstracts) or no corpus at all. In our
                 approach, we develop a voting system to combine a number
                 of heterogeneous models, based on dictionary knowledge,
                 gold-standard corpora and silver annotations, to
                 enhance performance. To improve recall, we employed
                 genetic programming to evolve 11 regular-expression
                 patterns that capture common drug suffixes and used
                 them as an extra means for recognition. Materials: Our
                 approach uses a dictionary of drug names, i.e.
                 DrugBank, a small manually annotated corpus, i.e. the
                 pharmacokinetic corpus, and a part of the UKPMC
                 database, as raw biomedical text. Gold-standard and
                 silver annotated data are used to train maximum entropy
                 and multinomial logistic regression classifiers.
                 Results: Aggregating drug NER methods, based on
                 gold-standard annotations, dictionary knowledge and
                 patterns, improved performance over models trained on
                 gold-standard annotations only, achieving a maximum
                 F-score of 95\%. In addition, combining models
                 trained on silver annotations, dictionary knowledge and
                 patterns is shown to achieve comparable performance to
                 models trained exclusively on gold-standard data. The
                 main reason appears to be the morphological
                 similarities shared among drug names. Conclusion: We
                 conclude that gold-standard data are not a hard
                 requirement for drug NER. Combining heterogeneous
                 models built on dictionary knowledge can achieve
                 classification performance similar or comparable to
                 that of the best performing model trained on
                 gold-standard annotations.",
  keywords =     "genetic algorithms, genetic programming, Named entity
                 annotation sparsity, Gold-standard vs. silver-standard
                 annotations, Named entity recogniser aggregation,
                 Genetic-programming-evolved string-similarity patterns,
                 Drug named entity recognition",
