Automatic Improvement of Apache Spark Queries using Semantics-preserving Program Reduction

Created by W.Langdon from gp-bibliography.bib Revision:1.4549

  author =       "Zoltan A. Kocsis and John H. Drake and 
                 Douglas Carson and Jerry Swan",
  title =        "Automatic Improvement of {Apache Spark} Queries using
                 Semantics-preserving Program Reduction",
  booktitle =    "Genetic Improvement 2016 Workshop",
  year =         "2016",
  editor =       "Justyna Petke and David R. White and Westley Weimer",
  pages =        "1141--1146",
  address =      "Denver",
  publisher_address = "New York, NY, USA",
  month =        jul # " 20-24",
  organisation = "SIGEvo",
  publisher =    "ACM",
  keywords =     "genetic algorithms, genetic programming, Genetic
                 Improvement, SBSE, Apache Spark, Program
                 Transformation, Query Optimisation, Automatic
                 Improvement Programming",
  URL =          "",
  DOI =          "doi:10.1145/2908961.2931692",
  size =         "6 pages",
  abstract =     "Apache Spark is a popular framework for large-scale
                 data analytics. Unfortunately, Spark's performance can
                 be difficult to optimise, since queries freely
                 expressed in source code are not amenable to
                 traditional optimisation techniques. This article
                 describes Hylas, a tool for automatically optimising
                 Spark queries embedded in source code via the
                 application of semantics-preserving transformations.
                 The transformation method is inspired by functional
                 programming techniques of deforestation, which
                 eliminate intermediate data structures from a
                 computation. This contrasts with approaches defined
                 entirely within structured query formats such as Spark
                 SQL. Hylas can identify certain computationally
                 expensive operations and ensure that performing them
                 creates no superfluous data structures. This
                 optimisation leads to significant improvements in
                 execution time, with over 10,000 times improvement
                 observed in some cases.",
  notes =        "Hylas, Scala reflection. 2013 'Infection Discovery
                 using DNS Data' challenge of the Los Alamos National
                 Laboratory, USA.

                 GECCO 2016 Workshop",

Genetic Programming entries for Zoltan Kocsis, John H. Drake, Douglas Carson, Jerry Swan
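The abstract describes deforestation: rewriting a computation so that it no longer builds intermediate data structures. The following is a minimal, language-neutral sketch of that idea in plain Python, not Spark or Scala, and not the Hylas implementation. It assumes, as an illustration only, a common Spark pattern: a groupByKey-style step that materialises every value per key, followed by a per-key fold. That pattern can be rewritten into a single reduceByKey-style pass. The function names `group_then_fold` and `fused_fold` are hypothetical.

```python
from collections import defaultdict
from operator import add
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def group_then_fold(pairs: Iterable[Tuple[K, V]], op: Callable[[V, V], V]) -> Dict[K, V]:
    """Unfused (hypothetical helper): collect every value per key into a list,
    in the spirit of Spark's groupByKey, then fold each list with `op`."""
    groups: Dict[K, List[V]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)  # intermediate per-key lists grow with the input
    result: Dict[K, V] = {}
    for key, values in groups.items():
        acc = values[0]
        for value in values[1:]:
            acc = op(acc, value)
        result[key] = acc
    return result


def fused_fold(pairs: Iterable[Tuple[K, V]], op: Callable[[V, V], V]) -> Dict[K, V]:
    """Fused (hypothetical helper): fold each value straight into a per-key
    accumulator, in the spirit of Spark's reduceByKey, so no per-key lists exist."""
    acc: Dict[K, V] = {}
    for key, value in pairs:
        acc[key] = op(acc[key], value) if key in acc else value
    return acc


if __name__ == "__main__":
    events = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)]
    print(group_then_fold(events, add))  # {'a': 4, 'b': 6, 'c': 5}
    print(fused_fold(events, add))       # {'a': 4, 'b': 6, 'c': 5}
```

In this sequential sketch, the rewrite preserves semantics because both versions apply `op` to each key's values in input order. In a distributed engine such as Spark, partial results from different partitions are merged in no fixed order, so `reduceByKey` additionally requires `op` to be associative and commutative.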