Feature Extraction Using Genetic Programming with Applications in Malware Detection

Created by W.Langdon from gp-bibliography.bib Revision:1.4340

  author =       "Cristina Vatamanu and Dragos Gavrilut and 
                 Razvan Benchea and Henri Luchian",
  booktitle =    "17th International Symposium on Symbolic and Numeric
                 Algorithms for Scientific Computing (SYNASC)",
  title =        "Feature Extraction Using Genetic Programming with
                 Applications in Malware Detection",
  year =         "2015",
  pages =        "224--231",
  abstract =     "This paper extends the authors' previous research on a
                 malware detection method, focusing on improving the
                 accuracy of the perceptron-based One Side Class
                 Perceptron algorithm via the use of Genetic
                 Programming. We are concerned with finding a proper
                 balance between the three basic requirements for
                 malware detection algorithms: (a) that their training
                 time on large datasets falls below acceptable upper
                 limits; (b) that their false positive rate
                 (clean/legitimate files/software wrongly classified as
                 malware) is as close as possible to 0 and (c) that
                 their detection rate is as close as possible to 1. When
                 the first two requirements are set as objectives for
                 the design of detection algorithms, it often happens
                 that the third objective is missed: the detection rate
                 is low. This study focuses on improving the detection
                 rate while preserving the small training time and the
                 low rate of false positives. Another concern is to use
                 the perceptron-based algorithm's good performance on
                 linearly separable data, by extracting features from
                 existing ones. In order to keep the overall training
                 time low, the huge search space of possible extracted
                 features is efficiently explored in terms of time and
                 memory footprint using Genetic Programming, seeking
                 better separability. For the experiments, we used a
                 dataset consisting of 350,000 executable files with an
                 initial set of 300 Boolean features describing each of
                 them. The feature-extraction algorithm is implemented
                 in a parallel manner in order to cope with the size of
                 the data set. We also tested different ways of
                 controlling the growth in size of the variable-length
                 chromosomes. The experimental results show that the
                 features produced by this method are better than the
                 best ones obtained through mapping, allowing for an
                 increase in detection rate.",
  keywords =     "genetic algorithms, genetic programming",
  DOI =          "10.1109/SYNASC.2015.43",
  month =        sep,
  notes =        "Romania Bitdefender Anti-virus Res. Lab., Al. I. Cuza
                 Univ. of Iasi, Iasi, Romania

                 Also known as \cite{7426087}",
