On botnet detection with genetic programming under streaming data label budgets and class imbalance

Created by W.Langdon from gp-bibliography.bib Revision:1.4549

  author =       "Sara Khanchi and Ali Vahdat and Malcolm I. Heywood and 
                 A. Nur Zincir-Heywood",
  title =        "On botnet detection with genetic programming under
                 streaming data label budgets and class imbalance",
  journal =      "Swarm and Evolutionary Computation",
  year =         "2018",
  volume =       "39",
  pages =        "123--140",
  keywords =     "genetic algorithms, genetic programming,
                 Non-stationary data, Streaming data, Botnet detection,
                 Class imbalance",
  ISSN =         "2210-6502",
  URL =          "https://doi.org/10.1016/j.swevo.2017.09.008",
  DOI =          "doi:10.1016/j.swevo.2017.09.008",
  size =         "18 pages",
  abstract =     "Algorithms for constructing models of classification
                 under streaming data scenarios are becoming
                 increasingly important. For such algorithms to be
                 applicable in real-world contexts, we adopt the
                 following objectives: 1) operate under label budgets,
                 2) make label requests without recourse to true label
                 information, and 3) be robust to class imbalance.
                 Specifically, we assume that model building is
                 performed using only the content of a Data Subset (as
                 in active learning). Thus, the principal design
                 decisions concern the definitions employed for the
                 sampling and archiving policies. Moreover, these
                 policies should operate without prior information
                 regarding the distribution of classes, as this varies
                 over the course of the stream. A team formulation for
                 genetic programming (GP) is assumed as the generic
                 model for classification in order to support
                 incremental changes to classifier content. Benchmarking
                 is conducted on thirteen real-world Botnet datasets
                 with label budgets on the order of 0.5 percent to
                 5 percent and significant class imbalance.
                 Specific recommendations are made for detecting the
                 costly minor classes under these conditions. Comparison
                 with current approaches to streaming data under label
                 budgets supports the significance of these findings.",
  notes =        "Also known as \cite{KHANCHI2018123}

                 Also appears in GECCO 2018 (hot off the press), pages
                 21--22, Kyoto, Japan, as \cite{Khanchi:2018:GECCOcomp}
                 or \cite{3208206}, doi:10.1145/3205651.3208206",
