Reproducing and learning new algebraic operations on word embeddings using genetic programming

Created by W.Langdon from gp-bibliography.bib Revision:1.4472

  author =       "Roberto Santana",
  title =        "Reproducing and learning new algebraic operations on
                 word embeddings using genetic programming",
  howpublished = "arXiv",
  year =         "2017",
  month =        "18 " # feb,
  volume =       "abs/1702.05624",
  keywords =     "genetic algorithms, genetic programming",
  bibdate =      "2017-06-07",
  bibsource =    "DBLP",
  URL =          "http://arxiv.org/abs/1702.05624",
  abstract =     "Word-vector representations associate a
                  high-dimensional real vector with every word in a
                  corpus.
                 Recently, neural-network based methods have been
                 proposed for learning this representation from large
                 corpora. This type of word-to-vector embedding is able
                 to keep, in the learned vector space, some of the
                 syntactic and semantic relationships present in the
                 original word corpus. This, in turn, serves to address
                 different types of language classification tasks by
                 doing algebraic operations defined on the vectors. The
                 general practice is to assume that the semantic
                 relationships between the words can be inferred by the
                  application of a priori specified algebraic operations.
                 Our general goal in this paper is to show that it is
                 possible to learn methods for word composition in
                 semantic spaces. Instead of expressing the
                 compositional method as an algebraic operation, we will
                 encode it as a program, which can be linear, nonlinear,
                 or involve more intricate expressions. More remarkably,
                 this program will be evolved from a set of initial
                 random programs by means of genetic programming (GP).
                 We show that our method is able to reproduce the same
                 behaviour as human-designed algebraic operators. Using
                 a word analogy task as benchmark, we also show that
                 GP-generated programs are able to obtain accuracy
                 values above those produced by the commonly used
                 human-designed rule for algebraic manipulation of word
                 vectors. Finally, we show the robustness of our
                 approach by executing the evolved programs on the
                 word2vec GoogleNews vectors, learned over 3 billion
                 running words, and assessing their accuracy in the same
                  word analogy task.",
  notes =        "Python code available from",
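The "commonly used human-designed rule" the abstract compares against is the vector-offset analogy rule (e.g. king - man + woman ~ queen), answered by nearest-neighbour search under cosine similarity. A minimal sketch with toy vectors, assuming unit-free cosine similarity and the usual convention of excluding the query words; the three-dimensional embeddings here are illustrative, not learned vectors from the paper:

```python
import math

# Toy 3-dimensional embeddings (dimensions: male, female, royal).
# Illustrative values only, not learned word2vec vectors.
emb = {
    "man":   [1.0, 0.0, 0.0],
    "woman": [0.0, 1.0, 0.0],
    "king":  [1.0, 0.0, 1.0],
    "queen": [0.0, 1.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' with the offset rule b - a + c."""
    target = [eb - ea + ec for ea, eb, ec in zip(emb[a], emb[b], emb[c])]
    # Standard convention: exclude the three query words from candidates.
    candidates = {w: v for w, v in emb.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("man", "king", "woman"))  # -> queen
```

The paper's point is that this fixed offset rule is only one possible composition program; genetic programming searches over a space of such programs (linear, nonlinear, or more intricate) and can find ones that score higher on the same analogy benchmark.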