Another (Simple) Text Categorization with SVM

Revised by Michael Fong
(originally a Com Sci 573 project)

December 24, 2008

Contents

1 Introduction
 1.1 Text Categorization Tasks
2 Classification
 2.1 Support Vector Machine
 2.2 Category Selection - Evaluated by Information Gain
 2.3 Feature Weighting
  2.3.1 Term Frequency Inverse Document Frequency (TFIDF)
3 Experiment
 3.1 Data Collection
 3.2 Architecture
  3.2.1 Feature Selection
 3.3 Required Libraries
 3.4 Experimental Results
4 Conclusion and Future Work
5 Acknowledgement


References

[1]   I. Pilászy, “Text categorization and support vector machines,” 2005.

[2]   V. N. Vapnik, The nature of statistical learning theory. New York, NY, USA: Springer-Verlag New York, Inc., 1995.

[3]   T. Joachims, “Text categorization with support vector machines: Learning with many relevant features,” 1998.

[4]   Y. Yang and J. O. Pedersen, “A comparative study on feature selection in text categorization,” in ICML ’97: Proceedings of the Fourteenth International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1997, pp. 412–420.


Project Log

12/24/08 - Added a command-line parser (Apache Commons CLI, org.apache.commons.cli).
12/23/08 - Updated the web page content.
12/22/08 - Added the 20-category dataset; the project will mainly use this dataset from now on.
12/21/08 - Rebuilt the web page with tex4ht.
08/04/08 - Set up the project web page.
05/02/08 - Uploaded the source files.


Source files

Source, updated on 12/24/08
Report, submitted on 05/02/08
Presentation, presented on 05/02/08