GP-Fileprints: File Types Detection Using Genetic Programming

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  author =       "Ahmed Kattan and Edgar Galvan-Lopez and 
                 Riccardo Poli and Michael O'Neill",
  title =        "GP-Fileprints: File Types Detection Using Genetic
                 Programming",
  booktitle =    "Proceedings of the 13th European Conference on Genetic
                 Programming, EuroGP 2010",
  year =         "2010",
  editor =       "Anna Isabel Esparcia-Alcazar and Aniko Ekart and 
                 Sara Silva and Stephen Dignum and A. Sima Uyar",
  volume =       "6021",
  series =       "LNCS",
  pages =        "134--145",
  address =      "Istanbul",
  month =        "7-9 " # apr,
  organisation = "EvoStar",
  publisher =    "Springer",
  keywords =     "genetic algorithms, genetic programming",
  isbn13 =       "978-3-642-12147-0",
  DOI =          "doi:10.1007/978-3-642-12148-7_12",
  abstract =     "We propose a novel application of Genetic Programming
                 (GP): the identification of file types via the analysis
                 of raw binary streams (i.e., without the use of meta
                 data). GP evolves programs with multiple components.
                 One component analyses statistical features extracted
                 from the raw byte-series to divide the data into
                 blocks. These blocks are then analysed via another
                 component to obtain a signature for each file in a
                 training set. These signatures are then projected onto
                 a two-dimensional Euclidean space via two further
                 (evolved) program components. K-means clustering is
                 applied to group similar signatures. Each cluster is
                 then labelled according to the dominant label for its
                 members. Once a program that achieves good
                 classification is evolved it can be used on unseen data
                 without requiring any further evolution. Experimental
                 results show that GP compares very well with
                 established file classification algorithms (i.e.,
                 Neural Networks, Bayes Networks and J48 Decision
  notes =        "Part of \cite{Esparcia-Alcazar:2010:GP} EuroGP'2010
                 held in conjunction with EvoCOP2010 EvoBIO2010 and
