Abstract—This paper compares the performance of probabilistic
classifiers with and without the help of various boosting
algorithms in the Email spam classification domain.
Our focus is on complex Emails, for which most existing
classifiers fail to identify unsolicited messages. In this paper we
consider two probabilistic algorithms, namely “Bayesian” and
“Naive Bayes”, and three boosting algorithms, namely “Bagging”,
“Boosting with Re-sampling” and “AdaBoost”. The probabilistic
classifiers were first tested on the “Enron Dataset” without
boosting and then with each of the boosting algorithms. A
genetic search method was used to select the 375 most
informative features out of the 1359 features initially created.
The results show that, in identifying complex spam messages,
the “Bayesian” classifier performs better than “Naive Bayes”
both with and without boosting. Among the boosting
algorithms, “Boosting with Re-sampling” brings a significant
performance improvement to the probabilistic classifiers.
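The comparison above pits a probabilistic base classifier against bagged and boosted versions of itself. The Python sketch below illustrates the three settings on a hypothetical toy corpus, using a Bernoulli Naive Bayes base learner written from scratch. It assumes that “Boosting with Re-sampling” denotes AdaBoost.M1 in which each round draws a fresh training set according to the current example weights instead of passing the weights to the base learner. All names (`BernoulliNB`, `bagging`, `adaboost_m1_resample`) and the data are illustrative assumptions, not the authors' implementation, and the sketch covers neither the Bayesian classifier nor the genetic feature search.

```python
import math
import random
from collections import Counter


class BernoulliNB:
    """Bernoulli Naive Bayes over binary word-presence features (add-one smoothing)."""

    def fit(self, docs, labels, vocab):
        self.vocab = sorted(vocab)
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior, self.p_present = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            nc = len(class_docs)
            self.log_prior[c] = math.log(nc / n)
            # P(word present | class), Laplace-smoothed over {present, absent}
            self.p_present[c] = {
                w: (sum(w in d for d in class_docs) + 1) / (nc + 2) for w in self.vocab
            }
        return self

    def predict(self, doc):
        def log_score(c):
            s = self.log_prior[c]
            for w in self.vocab:
                p = self.p_present[c][w]
                s += math.log(p if w in doc else 1.0 - p)
            return s
        return max(self.classes, key=log_score)


def bagging(docs, labels, vocab, n_models=10, seed=0):
    """Train each model on a uniform bootstrap sample; predict by majority vote."""
    rng = random.Random(seed)
    n = len(docs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(BernoulliNB().fit([docs[i] for i in idx], [labels[i] for i in idx], vocab))
    return lambda doc: Counter(m.predict(doc) for m in models).most_common(1)[0][0]


def adaboost_m1_resample(docs, labels, vocab, n_rounds=10, seed=0):
    """Assumed reading of 'Boosting with Re-sampling': AdaBoost.M1 whose rounds
    resample the training set according to the current example weights."""
    rng = random.Random(seed)
    n = len(docs)
    weights = [1.0 / n] * n
    models, alphas = [], []
    for _ in range(n_rounds):
        idx = rng.choices(range(n), weights=weights, k=n)
        m = BernoulliNB().fit([docs[i] for i in idx], [labels[i] for i in idx], vocab)
        wrong = [m.predict(d) != y for d, y in zip(docs, labels)]
        err = sum(w for w, bad in zip(weights, wrong) if bad)
        if err == 0 or err >= 0.5:
            # Degenerate round: stop, keeping the model only if it is the first one
            if not models:
                models.append(m)
                alphas.append(1.0)
            break
        beta = err / (1.0 - err)
        models.append(m)
        alphas.append(math.log(1.0 / beta))
        # Shrink the weights of correctly classified emails, then renormalise
        weights = [w if bad else w * beta for w, bad in zip(weights, wrong)]
        total = sum(weights)
        weights = [w / total for w in weights]

    def predict(doc):
        votes = Counter()
        for m, a in zip(models, alphas):
            votes[m.predict(doc)] += a
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical toy corpus: each email is a set of tokens; 1 = spam, 0 = ham
docs = [
    {"free", "money", "win"}, {"free", "offer", "click"}, {"win", "prize", "click"},
    {"meeting", "report", "monday"}, {"project", "report", "deadline"}, {"lunch", "monday", "team"},
]
labels = [1, 1, 1, 0, 0, 0]
vocab = set().union(*docs)

nb = BernoulliNB().fit(docs, labels, vocab)
bag = bagging(docs, labels, vocab)
boost = adaboost_m1_resample(docs, labels, vocab)
for email in ({"free", "win"}, {"report", "monday"}):
    print(sorted(email), nb.predict(email), bag(email), boost(email))
```

Resampling lets a base learner that cannot consume instance weights still be boosted, at the cost of extra randomness in each round; bagging, by contrast, samples uniformly and gives every model an equal vote.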
Index Terms—Unsolicited emails, probabilistic classifiers,
boosting algorithms.
The authors are with the Indian Institute of Management, Indore, India
(email: f10shrawank@iimidr.ac.in).
Cite: Shrawan Kumar Trivedi and Shubhamoy Dey, "Interplay between Probabilistic Classifiers and Boosting Algorithms for Detecting Complex Unsolicited Emails," Journal of Advances in Computer Networks, vol. 1, no. 2, pp. 132-136, 2013.