Filtering Junk E-Mail: A Performance Comparison between Genetic Programming and Naive Bayes

Created by W.Langdon from gp-bibliography.bib Revision:1.3973

@Unpublished{katirai99,
  author =       "Hooman Katirai",
  title =        "Filtering Junk E-Mail: A Performance Comparison
                 between Genetic Programming and Naive Bayes",
  year =         "1999",
  month =        "10 " # sep,
  note =         "4A Year student project",
  URL =          "http://www.mit.edu/~hooman/papers/katirai99filtering.pdf",
  URL =          "http://citeseer.nj.nec.com/katirai99filtering.html",
  URL =          "http://citeseer.ist.psu.edu/310632.html",
  keywords =     "genetic algorithms, genetic programming",
  abstract =     "This paper describes the application of genetic
                 programming as a novel approach to the problem of
                 filtering junk e-mail. We benchmark our results against
                 the common standard: the naive Bayes classifier. While
                 the genetically programmed classifier demonstrated a
                 precision comparable to that of naive Bayes, it was
                 slightly outperformed in recall. Since both learning
                 methods gave similar results, it is recommended that a
                 larger study be undertaken to ascertain whether these
                 differences are indeed statistically significant.
                 Further, it is recommended that the performance of these
                 classifiers be tested in a richer feature space more
                 typical of real-world classifiers. Although the
                 genetically programmed classifier greatly outperformed
                 the naive Bayes classifier in speed, it is concluded
                 that a more efficient implementation of naive Bayes
                 needs to be used in order to provide a fair comparison.
                 We show that when left unabated, e-mail signatures, also
                 known as taglines, reduce the value of several important
                 features in junk e-mail detection; however, it is also
                 shown that these e-mail signatures may be harvested as
                 advantageous features if some of their components are
                 removed and noted as a feature. We therefore recommend
                 that a better parser capable of meeting these criteria
                 be implemented. To aid the reader in the theoretical
                 aspects of our work, we have included introductory
                 background for both approaches, including a full
                 derivation of the generative naive Bayes model.",
  size =         "27 pages",
}
