Towards the Validation of Plagiarism Detection Tools by Means of Grammar Evolution

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  author =       "Manuel Cebrian and Manuel Alfonseca and 
                 Alfonso Ortega",
  title =        "Towards the Validation of Plagiarism Detection Tools
                 by Means of Grammar Evolution",
  journal =      "IEEE Transactions on Evolutionary Computation",
  year =         "2009",
  month =        jun,
  volume =       "13",
  number =       "3",
  pages =        "477--485",
  keywords =     "genetic algorithms, genetic programming, Grammar
                 Evolution, Automatic programming, Benchmark testing,
                 Data mining, Distance measurement, Evolution (biology),
                 Genetics, Plagiarism, Probability density function,
                 computer science education, educational technology",
  ISSN =         "1089-778X",
  DOI =          "10.1109/TEVC.2008.2008797",
  size =         "9 pages",
  abstract =     "Student plagiarism is a major problem in universities
                 worldwide. In this paper, we focus on plagiarism in
                 answers to computer programming assignments, where
                 students mix and/or modify one or more original
                 solutions to obtain counterfeits. Although several
                 software tools have been developed to help with the tedious
                 and time-consuming task of detecting plagiarism, little
                 has been done to assess their quality, because
                 determining the real authorship of the whole submission
                 corpus is practically impossible for markers. In this
                 paper, we present a Grammar Evolution technique which
                 generates benchmarks for testing plagiarism detection
                 tools. Given a programming language, our technique
                 generates a set of original solutions to an assignment,
                 together with a set of plagiarisms of the former set
                 which mimic the basic plagiarism techniques performed
                 by students. The authorship of the submission corpus is
                 predefined by the user, providing a base for the
                 assessment and further comparison of copy-catching
                 tools. We give empirical evidence of the suitability of
                 our approach by studying the behavior of one advanced
                 plagiarism detection tool (AC) on four benchmarks coded
                 in APL2, generated with our technique.",
  notes =        "also known as \cite{4781609} Not GP",
