4 Conclusion and Future Work

The current tree expands only to one side (the right), so misclassified instances (documents) are never considered for reclassification into the correct class. The accuracy of the classifier could be improved further by considering those false positives for reclassification.

Moreover, we could explore more complex combinations with other statistical evaluation methods, for instance the χ2 test and Mutual Information. Similarly, we could use a different classifier, for example a Naive Bayes approach. The goal would be to find which combination improves the efficiency of the classifier the most.
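As an illustration of the alternative scoring methods mentioned above, the following is a minimal sketch of how the χ2 statistic and Mutual Information could be computed for a single term and a single class from a 2x2 document-count table. It is not part of the implemented system; the function names and the argument layout (n11 = documents containing the term and in the class, n10 = containing the term but not in the class, n01 = in the class but without the term, n00 = neither) are assumptions chosen for this example.

```python
import math


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    Hypothetical helper: arguments are document counts, where the first
    index is term presence and the second is class membership.
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        # A marginal is empty, so the statistic is undefined; score it as 0.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between term presence and class membership."""
    n = n11 + n10 + n01 + n00
    # Each entry: (joint count, term marginal, class marginal).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    mi = 0.0
    for joint, term_marginal, class_marginal in cells:
        if joint > 0:  # empty cells contribute nothing (0 * log 0 := 0)
            mi += (joint / n) * math.log2(n * joint / (term_marginal * class_marginal))
    return mi
```

Under these assumptions, terms could be ranked by either score in place of information gain, and the top-ranked terms passed to the classifier, so that the different combinations can be compared on the same data.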

Last but not least, feature selection is also worth investigating further for this project. Studies have suggested that different feature selection methods may be better suited to this data source and, as a result, may improve performance [4].

In short, our project demonstrates one combination of SVM and information gain for solving this problem in the context of information organization and management.