Evolution of human-competitive lossless compression algorithms with GP-zip2

Created by W.Langdon from gp-bibliography.bib Revision:1.3872

@Article{Kattan:2011:GPEM,
  author =       "Ahmed Kattan and Riccardo Poli",
  title =        "Evolution of human-competitive lossless compression
                 algorithms with {GP-zip2}",
  journal =      "Genetic Programming and Evolvable Machines",
  year =         "2012",
  volume =       "12",
  number =       "4",
  pages =        "335--364",
  month =        dec,
  keywords =     "genetic algorithms, genetic programming",
  ISSN =         "1389-2576",
  DOI =          "doi:10.1007/s10710-011-9133-6",
  size =         "30 pages",
  abstract =     "We propose GP-zip2, a new approach to loss less data
                 compression based on Genetic Programming (GP). GP is
                 used to optimally combine well-known loss-less
                 compression algorithms to maximise data compression.
                 GP-zip2 evolves programs with multiple components. One
                 component analyses statistical features extracted by
                 sequentially scanning the data to be compressed and
                 divides the data into blocks. These blocks are
                 projected onto a two-dimensional Euclidean space via
                 two further (evolved) program components. K-means
                 clustering is then applied to group similar data
                 blocks. Each cluster is labelled with the optimal
                 compression algorithm for its member blocks. After
                 evolution, evolved programs can be used to compress
                 unseen data. The compression algorithms available to
                 GP-zip2 are: Arithmetic coding, Lempel-Ziv-Welch,
                 Unbounded Prediction by Partial Matching, Run Length
                 Encoding, and Bzip2. Experimentation shows that the
                 results produced by GP-zip2 are human-competitive,
                 being typically superior to well-established
                 human-designed compression algorithms in terms of the
                 compression ratios achieved in heterogeneous archive
                 files.",
  notes =        "GP chooses when to switch between 5 well known
                 algorithms (LZW, PPMD, Run-length encoding and Bzip2)
                 and which to use. Three trees co-evolve to do work
                 between themselves (splitter tree, two feature
                 extraction, cf \cite{langdon:book}). Error in previous
                 Kattan GP-zip work. Threshold theta on splitter tree
                 output, Davis Bouldin index (DBI). variable length
                 header in output file. Compares well to WinRar. Amazon
                 EC2 AWS cloud computer used. Model of use assumes
                 compress once (eg DVD) uncompressed many times.

                 Single v. two pass approaches. Arithmetic coding.",
  affiliation =  "School of Computer Science and Electronic Engineering,
                 University of Essex, Colchester, CO4 3SQ UK",
}

Genetic Programming entries for Ahmed Kattan Riccardo Poli

Citations