Is Machine Learning losing the battle to produce transportable signatures against VoIP traffic?

Created by W.Langdon from gp-bibliography.bib Revision:1.4202

  title =        "Is Machine Learning losing the battle to produce
                 transportable signatures against VoIP traffic?",
  author =       "Riyad Alshammari and A. Nur Zincir-Heywood",
  pages =        "1542--1549",
  booktitle =    "Proceedings of the 2011 IEEE Congress on Evolutionary
                 Computation",
  year =         "2011",
  editor =       "Alice E. Smith",
  month =        "5-8 " # jun,
  address =      "New Orleans, USA",
  organization = "IEEE Computational Intelligence Society",
  publisher =    "IEEE Press",
  ISBN =         "0-7803-8515-2",
  keywords =     "genetic algorithms, genetic programming, AdaBoost,
                 C5.0, VoIP traffic classification, consecutive
                 sampling, machine learning, naive Bayesian, random
                 sampling, transportable signatures, voice over IP,
                 Bayes methods, Internet telephony, learning (artificial
                 intelligence), telecommunication security,
                 telecommunication traffic",
  DOI =          "doi:10.1109/CEC.2011.5949799",
  abstract =     "Traffic classification becomes more challenging since
                 the traditional techniques such as port numbers or deep
                 packet inspection are ineffective against voice over IP
                 (VoIP) applications, which use non-standard ports and
                 encryption. Statistical information based on network
                 layer with the use of machine learning (ML) can achieve
                 high classification accuracy and produce transportable
                 signatures. However, the ability of ML to find
                 transportable signatures depends mainly on the training
                 data sets. In this paper, we explore the importance of
                 sampling training data sets for the ML algorithms,
                 specifically Genetic Programming, C5.0, Naive Bayesian
                 and AdaBoost, to find transportable signatures. To this
                 end, we employed two techniques for sampling network
                 training data sets, namely random sampling and
                 consecutive sampling. Results show that random sampling
                 and 90-minute consecutive sampling have the best
                 performance in terms of accuracy using C5.0 and SBB,
                 respectively. In terms of complexity, the size of C5.0
                 solutions increases as the training size increases,
                 whereas SBB finds simpler solutions.",
  notes =        "CEC2011 sponsored by the IEEE Computational
                 Intelligence Society, and previously sponsored by the
                 EPS and the IET.",
