Spam or Ham

Michael Fong
mcfong@iastate.edu

December 19, 2008

Contents

1 Abstract
2 The SPAM Phenomenon
 2.1 Sources of Spam
 2.2 Types of Spam
3 Spam Control
 3.1 Learning-Based Spam Filtering
  3.1.1 Naive Bayesian Classifier
  3.1.2 K-Nearest Neighbor Algorithm
  3.1.3 Support Vector Machine
 3.2 Feature Selection
4 Adversarial Activity
 4.1 Remedies
5 Experiment
 5.1 Data Set
 5.2 Preprocessing Workflow
 5.3 Required Libraries
6 Classification Result
 6.1 Text Classification (Subject plus Body)
  6.1.1 Naive Bayes
  6.1.2 Support Vector Machine
 6.2 URL Classification (Email address plus URL address)
  6.2.1 Naive Bayes
  6.2.2 K-Nearest Neighbor
7 Conclusion and Future Work


Project Log

11/29/08 - Project web page set up
12/12/08 - Project updates and deliverables uploaded


Source files

ComS572 Project Presentation, presented on 12/10/2008
ComS572 Project Report, submitted on 12/12/2008
Java Source Code and bundle of required libraries, submitted on 12/12/2008