Search engine case study: searching the web using genetic programming and MPI

  author =       "Reginald L. Walker",
  title =        "Search engine case study: searching the web using
                 genetic programming and {MPI}",
  journal =      "Parallel Computing",
  volume =       "27",
  pages =        "71--89",
  year =         "2001",
  number =       "1-2",
  month =        jan,
  keywords =     "genetic algorithms, genetic programming, Distributed
                 computing, Information retrieval, World Wide Web,
                 Search engines",
  ISSN =         "0167-8191",
  DOI =          "10.1016/S0167-8191(00)00089-2",
  abstract =     "The generation of a Web page follows distinct sources
                 for the incorporation of information. The earliest
                 format of these sources was an organized display of
                 known information determined by the page designers'
                 interest and/or design parameters. The sources may have
                 been published in books or other printed literature, or
                 disseminated as general information about the page
                 designer. Due to a growth in Web pages, several new
                 search engines have been developed in addition to the
                 refinement of the already existing ones. The use of the
                 refined search engines, however, still produces an
                 array of diverse information when the same set of
                 keywords is used in a Web search. Some degree of
                 consistency in the search results can be achieved over
                 a period of time when the same search engine is used,
                 yet most initial Web searches on a given topic are
                 treated as final after some form of
                 refinement/adjustment of the keywords used in the
                 search process. To determine the applicability of a
                 genetic programming (GP) model for the diverse set of
                 Web documents, search strategies behind the current
                 search engines for the World Wide Web were studied. The
                 development of a GP model resulted in a parallel
                 implementation of a pseudo-search engine indexer
                 simulator. The training sets used in this study
                 provided a small snapshot of the computational effort
                 required to index Web documents accurately and
                 efficiently. Future results will be used to develop and
                 implement Web crawler mechanisms that are capable of
                 assessing the scope of this research effort. The GP
                 model results were generated on a network of SUN
                 workstations and an IBM SP2.",

