Detecting phishing e-mails using text and data mining

  author =       "Mayank Pandey and Vadlamani Ravi",
  booktitle =    "Computational Intelligence and Computing Research
                 (ICCIC), 2012 IEEE International Conference on",
  title =        "Detecting phishing e-mails using text and data
                 mining",
  year =         "2012",
  DOI =          "10.1109/ICCIC.2012.6510259",
  abstract =     "This paper presents text and data mining in tandem to
                 detect the phishing email. The study employs Multilayer
                 Perceptron (MLP), Decision Trees (DT), Support Vector
                 Machine (SVM), Group Method of Data Handling (GMDH),
                 Probabilistic Neural Net (PNN), Genetic Programming
                 (GP) and Logistic Regression (LR) for classification. A
                 dataset of 2500 phishing and non-phishing emails is
                 analysed after extracting 23 keywords from the email
                 bodies using text mining from the original dataset.
                 Further, we selected 12 most important features using
                 t-statistic based feature selection. Here, we did not
                 find statistically significant difference in
                 sensitivity as indicated by t-test at 1 percent level of
                 significance, both with and without feature selection
                 across all techniques except PNN. Since GP and DT
                 are not statistically significantly different either
                 with or without feature selection at 1 percent level of
                 significance, DT should be preferred because it yields
                 'if-then' rules, thereby increasing the
                 comprehensibility of the system.",
  keywords =     "genetic algorithms, genetic programming,
                 Classification, Decision Tree, Group Method Of Data
                 Handling, Logistic regression, Multilayer Perceptron,
                 Phishing webpage, Probabilistic Neural Network, Support
                 Vector Machine, Text mining",
  notes =        "Also known as \cite{6510259}",

