Measures of Diversity for Populations and Distances Between Individuals with Highly Reorganizable Genomes

Created by W.Langdon from gp-bibliography.bib Revision:1.4340

  author =       "Claudio Mattiussi and Markus Waibel and 
                 Dario Floreano",
  title =        "Measures of Diversity for Populations and Distances
                 Between Individuals with Highly Reorganizable Genomes",
  journal =      "Evolutionary Computation",
  year =         "2004",
  volume =       "12",
  number =       "4",
  pages =        "495--515",
  month =        "Winter",
  keywords =     "genetic algorithms, genetic programming, Evolutionary
                 computation, variable length genomes, population
                 diversity, substring diversity, Tanimoto distance,
                 Jaccard similarity, linguistic complexity, nucleotide
  ISSN =         "1063-6560",
  URL =          "",
  DOI =          "doi:10.1162/1063656043138923",
  size =         "21 pages",
  abstract =     "we address the problem of defining a measure of
                 diversity for a population of individuals whose genome
                 can be subjected to major reorganisations during the
                 evolutionary process. To this end, we introduce a
                 measure of diversity for populations of strings of
                 variable length defined on a finite alphabet, and from
                 this measure we derive a semi-metric distance between
                 pairs of strings. The definitions are based on counting
                 the number of substrings of the strings, considered
                 first separately and then collectively. This approach
                 is related to the concept of linguistic complexity,
                 whose definition we generalise from single strings to
                 populations. Using the substring count approach we also
                 define a new kind of Tanimoto distance between strings.
                 We show how to extend the approach to representations
                 that are not based on strings and, in particular, to
                 the tree-based representations used in the field of
                 genetic programming. We describe how suffix trees can
                 allow these measures and distances to be implemented
                 with a computational cost that is linear in both space
                 and time relative to the length of the strings and the
                 size of the population. The definitions were devised to
                 assess the diversity of populations having genomes of
                 variable length and variable structure during
                 evolutionary computation runs, but applications in
                 quantitative genomics, proteomics, and pattern
                 recognition can be also envisaged.",
  notes =        "Section 3.4 (10 lines) suggests using Tanimoto
                 distance for tree GP p503. Page 507 says {"}Tanimoto
                 ... O(lnn) complexity makes the direct implementation
                 ... rapidly impractical for runtime diversity
                 assessment when the population size (n)

                 source code etc",

Genetic Programming entries for Claudio Mattiussi Markus Waibel Dario Floreano