Stylistic Structures A Computational Approach to Text Classification

Created by W.Langdon from gp-bibliography.bib Revision:1.4504

  author =       "Richard S. Forsyth",
  title =        "Stylistic Structures A Computational Approach to Text
                 Classification",
  school =       "University of Nottingham",
  year =         "1995",
  address =      "UK",
  month =        oct,
  keywords =     "genetic algorithms, genetic programming",
  size =         "297 pages",
  abstract =     "The problem of authorship attribution has received
                 attention both in the academic world (e.g. did
                 Shakespeare or Marlowe write Edward III?) and outside
                 (e.g. is this confession really the words of the
                 accused or was it made up by someone else?). Previous
                 studies by statisticians and literary scholars have
                 sought 'verbal habits' that characterize particular
                 authors consistently. By and large, this has meant
                 looking for distinctive rates of usage of specific
                 marker words -- as in the classic study by Mosteller
                 and Wallace of the Federalist Papers.

                 The present study is based on the premiss that
                 authorship attribution is just one type of text
                 classification and that advances in this area can be
                 made by applying and adapting techniques from the field
                 of machine learning.

                 Five different trainable text-classification systems
                 are described, which differ from current stylometric
                 practice in a number of ways, in particular by using a
                 wider variety of marker patterns than customary and by
                 seeking such markers automatically, without being told
                 what to look for. A comparison of the strengths and
                 weaknesses of these systems, when tested on a
                 representative range of text-classification problems,
                 confirms the importance of paying more attention than
                 usual to alternative methods of representing
                 distinctive differences between types of text.

                 The thesis concludes with suggestions on how to make
                 further progress towards the goal of a fully automatic,
                 trainable text-classification system.",
  notes =        "p222 GLADRAGS 'It may well prove to be a bridge
                 between the sometimes Procrustean simplicity of
                 traditional GA bitstrings and the somewhat unprincipled
                 representational liberalism of Genetic Programming as
                 advocated by \cite{koza:book} in particular.'

                 Richard Sandes Forsyth",
