Conceptual Weighted Feature Extraction and Support Vector Model: A Good Combination for Text Categorization
- Publisher: Hamad bin Khalifa University Press (HBKU Press)
- Source: Qatar Foundation Annual Research Forum Proceedings, Volume 2011, Issue 1, Nov 2011, CSOS4
Abstract
Weighted features are known to increase recall during the document selection step of information retrieval (IR) systems, and conceptual methods help in finding good features. Starting from the features of a sample of Arabic news articles belonging to k different financial categories, and using the support vector model (SVM), k(k-1) classifiers are generated using one-against-one classification. A new document is submitted to the k(k-1) classifiers and then, using a voting heuristic, is assigned to the most frequently selected category. Categorization results were obtained for two feature extraction methods, one based on optimal concepts and the other based on isolated labels. They show that isolated labels generate better features because the selected features are more specific. We can therefore say that feature quality, combined with weighting methods, is an important factor for more accurate classification with SVM. The proposed method based on isolated labels gives a classification rate greater than 80% for Arabic news in the financial domain across five categories. Generalized to English texts and to more categories, it becomes a good preprocessing filter preceding the automatic annotation step, and therefore helps produce more accurate event structuring. A figure showing the different steps of the new categorization method is attached.
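The voting heuristic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the category names, the keyword sets, and the helpers `make_keyword_classifier` and `one_vs_one_vote` are all hypothetical, and a simple keyword counter stands in for each trained pairwise SVM. The sketch builds one classifier per unordered pair of categories, which gives k(k-1)/2 classifiers; the abstract's count of k(k-1) would correspond to one classifier per ordered pair.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_vote(document, pairwise_classifiers):
    """Return the category that wins the most pairwise decisions."""
    votes = Counter(clf(document) for clf in pairwise_classifiers.values())
    # Ties between categories are not resolved explicitly here; a real
    # system would need a stated tie-breaking rule.
    return votes.most_common(1)[0][0]


def make_keyword_classifier(cat_a, cat_b, keywords):
    """Hypothetical stand-in for a trained pairwise SVM (assumption).

    It picks whichever of the two categories has more of its keywords
    in the document, preferring cat_a when the counts are equal.
    """
    def classify(document):
        tokens = document.lower().split()
        score_a = sum(token in keywords[cat_a] for token in tokens)
        score_b = sum(token in keywords[cat_b] for token in tokens)
        return cat_a if score_a >= score_b else cat_b
    return classify


# Toy categories and keywords, invented for illustration only.
KEYWORDS = {
    "banking": {"bank", "loan", "deposit"},
    "stocks": {"shares", "index", "trading"},
    "energy": {"oil", "gas", "barrel"},
}

# One classifier per unordered pair of categories.
classifiers = {
    (a, b): make_keyword_classifier(a, b, KEYWORDS)
    for a, b in combinations(sorted(KEYWORDS), 2)
}

predicted = one_vs_one_vote("oil prices rose as gas output fell", classifiers)
print(predicted)
```

In this toy run, two of the three pairwise classifiers select the same category, so that category wins the vote. With trained SVMs in place of the keyword counters, the voting step itself would stay the same.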