A New Quartet Tree Heuristic for Hierarchical Clustering

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  author =       "Rudi Cilibrasi and Paul Vitanyi",
  title =        "A New Quartet Tree Heuristic for Hierarchical
  booktitle =    "Principled methods of trading exploration and
                 exploitation Workshop",
  year =         "2005",
  address =      "London",
  month =        "6-7 " # jul,
  organisation = "PASCAL",
  keywords =     "genetic algorithms, genetic programming,
                 Computational, Information-Theoretic Learning with
                 Statistics, Learning/Statistics, Optimisation, Theory,
  URL =          "http://www.cwi.nl/~paulv/papers/quartet.pdf",
  size =         "22 pages",
  abstract =     "We consider the problem of constructing an an
                 optimal-weight tree from the 3Chose(n,4) weighted
                 quartet topologies on n objects, where optimality means
                 that the summed weight of the embedded quartet
                 topologies is optimal (so it can be the case that the
                 optimal tree embeds all quartets as non-optimal
                 topologies). We present a heuristic for reconstructing
                 the optimal-weight tree, and a canonical manner to
                 derive the quartet-topology weights from a given
                 distance matrix. The method repeatedly transforms a
                 bifurcating tree, with all objects involved as leaves,
                 achieving a monotonic approximation to the exact single
                 globally optimal tree. This contrasts to other
                 heuristic search methods from biological phylogeny,
                 like DNAML or quartet puzzling, which, repeatedly,
                 incrementally construct a solution from a random order
                 of objects, and subsequently add agreement values. We
                 do not assume that there exists a true bifurcating
                 supertree that embeds each quartet in the optimal
                 topology, or represents the distance matrix
                 faithfully|not even under the assumption that the
                 weights or distances are corrupted by a measuring
                 process. Our aim is to hierarchically cluster the input
                 data as faithfully as possible, both phylogenetic data
                 and data of completely different types. In our
                 experiments with natural data, like genomic data, texts
                 or music, the global optimum appears to be reached. Our
                 method is capable of handling over 100 objects,
                 possibly up to 1000 objects, while no existing quartet
                 heuristic can computionally approximate the exact
                 optimal solution of a quartet tree of more than about
                 20-30 objects without running for years. The method is
                 implemented and available as public software.",
  notes =        "http://eprints.pascal-network.org/archive/00001821/

                 paper improved after workshop?",

Genetic Programming entries for Rudi Cilibrasi Paul M B Vitanyi