Query Consolidation: Interpreting Queries Sent to Independent Heterogenous Databases

Created by W.Langdon from gp-bibliography.bib Revision:1.3872

@PhdThesis{Acar:thesis,
  author =       "Aybar C. Acar",
  title =        "Query Consolidation: Interpreting Queries Sent to
                 Independent Heterogenous Databases",
  school =       "The Volgenau School of Information Technology and
                 Engineering, George Mason University",
  year =         "2008",
  address =      "Fairfax, VA, USA",
  month =        "23 " # jul,
  keywords =     "genetic algorithms, genetic programming, Databases,
                 Information Integration, Query Processing, Machine
                 Learning",
  URL =          "http://hdl.handle.net/1920/3223",
  URL =          "http://digilib.gmu.edu:8080/dspace/bitstream/1920/3223/1/Acar_Aybar.pdf",
  size =         "182 pages",
  abstract =     "This dissertation introduces the problem of query
                 consolidation, which seeks to interpret a set of
                 disparate queries submitted to independent databases
                 with a single global query. The problem has multiple
                 applications, from improving virtual database design,
                 to aiding users in information retrieval, to protecting
                 against inference of sensitive data from a seemingly
                 innocuous set of apparently unrelated queries. The
                 problem exhibits attractive duality with the
                 much-researched problem of query decomposition, which
                 has been addressed intensively in the context of
                 multidatabase environments: How to decompose a query
                 submitted to a virtual database into a set of local
                 queries that are evaluated in individual databases. The
                 new problem is set in the architecture of a canonical
                 multidatabase system, using it in the reverse
                 direction. The reversal is built on the assumption of
                 conjunctive queries and source descriptions. A rational
                 and efficient query decomposition strategy is also
                 assumed, and this decomposition is reversed to arrive
                 at the original query by analyzing the decomposed
                 components. The process incorporates several steps
                 where a number of solutions must be considered, due to
                 the fact that query decomposition is not
                 injective.

                 Initially, the problem of finding the most likely join
                 plan between component queries is investigated. This is
                 accomplished by leveraging the referential constraints
                 available in the underlying multidatabase, or by
                 approximating these constraints from the data when not
                 available. This approximation is done using the
                 information theoretic concept of conditional entropy.
                 Furthermore, the most likely join plans are enhanced by
                 the expansion of their projections and adding precision
                 to their selection constraints by estimating the
                 selection constraints that would be applied to these
                 consolidations offline.

                 Additionally, the extraction of a set of queries
                 related to the same retrieval task from an ongoing
                 sequence of incoming queries is investigated. A
                 conditional random field model is trained to segment
                 and label incoming query sequences. Finally, the
                 candidate consolidations are re-encapsulated with a
                 genetic programming approach to find simpler
                 intensional descriptions that are extensionally
                 equivalent to discover the original intent of the
                 query.

                 The dissertation explains and discusses all of the
                 above operations and validates the methods developed
                 with experimentation on synthesised and real-world
                 data. The results are highly encouraging and verify
                 that the accuracy, time performance, and scalability of
                 the methods would make it possible to exploit query
                 consolidation in production environments.",
  notes =        "GP chapters 7, 8",
}
