Learning to Detect Web Spam by Genetic Programming

  author =       "Xiaofei Niu and Jun Ma and Qiang He and
                 Shuaiqiang Wang and Dongmei Zhang",
  booktitle =    "Web-Age Information Management, 11th International
                 Conference, {WAIM} 2010, Jiuzhaigou, China, July 15-17,
                 2010. Proceedings",
  publisher =    "Springer",
  year =         "2010",
  volume =       "6184",
  editor =       "Lei Chen and Changjie Tang and Jun Yang and
                 Yunjun Gao",
  isbn13 =       "978-3-642-14245-1",
  pages =        "18--27",
  series =       "Lecture Notes in Computer Science",
  URL =          "http://dx.doi.org/10.1007/978-3-642-14246-8",
  DOI =          "doi:10.1007/978-3-642-14246-8_5",
  keywords =     "genetic algorithms, genetic programming",
  abstract =     "Web spam techniques enable some web pages or sites to
                 achieve undeserved relevance and importance. They can
                 seriously deteriorate search engine ranking results.
                 Combating web spam has become one of the top challenges
                 for web search. This paper proposes to learn a
                 discriminating function to detect web spam by genetic
                 programming. The evolution computation uses
                 multi-populations composed of some small-scale
                 individuals and combines the selected best individuals
                 in every population to gain a possible best
                 discriminating function. The experiments on
                 WEBSPAM-UK2006 show that the approach can improve spam
                 classification recall performance by 26 percent,
                 F-measure performance by 11 percent, and accuracy
                 performance by 4 percent compared with SVM.",
