Assessing Documents' Credibility with Genetic Programming

  title =        "Assessing Documents' Credibility with Genetic
                 Programming",
  author =       "Joao Palotti and Thiago Salles and Gisele L. Pappa and 
                 Marcos A. Goncalves and Wagner {Meira, Jr.}",
  abstract =     "The concept of example credibility evaluates how much
                 a classifier can trust an example when building a
                 classification model. It is given by a credibility
                 function, estimated according to a series of factors
                 that influence the credibility of the examples, and is
                 context-dependent. Here we deal with automatic
                 document classification, and study the credibility of a
                 document according to three factors: content,
                 authorship and citations. We propose a genetic
                 programming algorithm to estimate the credibility of
                 training examples, which is then added to a
                 credibility-aware classifier. For that, we model the
                 authorship and citation data as a complex network, and
                 select a set of structural metrics that can be used to
                 estimate credibility. These metrics are then merged
                 with other content-related ones, and used as terminals
                 for the GP. The GP was tested on a subset of the
                 ACM-DL, and the credibility-aware classifier
                 achieved micro-F$_1$ and macro-F$_1$ results from
                 5\% to 8\% better than the traditional
