Abstract

Weighted features are known to increase recall during the document selection step of information retrieval (IR) systems, and conceptual methods help in finding good features. Starting from the features of a sample of Arabic news items belonging to k different financial categories, and using a support vector machine (SVM), k(k-1)/2 classifiers are generated by one-against-one classification. A new document is submitted to all k(k-1)/2 classifiers and, using a voting heuristic, is assigned to the category selected most often. Categorization results were obtained for two feature extraction methods, one based on optimal concepts and the other on isolated labels. They show that isolated labels generate better features because the selected features are more specific. The quality of the features, combined with the weighting method, is therefore an important factor for more accurate SVM classification. The proposed method based on isolated labels achieves a classification rate above 80% on Arabic news in the financial domain for five categories. Generalized to English texts and to more categories, it could serve as a preprocessing filter ahead of an automatic annotation step and so help produce more accurate event structuring. The attached figure shows the steps of the new categorization method.
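The one-against-one voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard construction of one binary classifier per unordered pair of categories, and it replaces the binary SVM with a simple nearest-centroid rule so that the example needs no external library. The category names, the toy feature vectors, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# A pairwise classifier returns one of the two labels it was trained on.
PairwiseClassifier = Callable[[Vector], str]


def centroid(vectors: List[Vector]) -> List[float]:
    """Mean of a non-empty list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train_pairwise(a: str, b: str,
                   samples: Dict[str, List[Vector]]) -> PairwiseClassifier:
    """Stand-in for a binary SVM (assumption): nearest centroid of a or b."""
    centroid_a, centroid_b = centroid(samples[a]), centroid(samples[b])

    def classify(x: Vector) -> str:
        dist_a = sum((xi - ci) ** 2 for xi, ci in zip(x, centroid_a))
        dist_b = sum((xi - ci) ** 2 for xi, ci in zip(x, centroid_b))
        return a if dist_a <= dist_b else b

    return classify


def train_one_against_one(
    samples: Dict[str, List[Vector]],
) -> Dict[Tuple[str, str], PairwiseClassifier]:
    """Train one binary classifier per unordered pair of categories."""
    return {
        (a, b): train_pairwise(a, b, samples)
        for a, b in combinations(sorted(samples), 2)
    }


def predict_by_voting(
    classifiers: Dict[Tuple[str, str], PairwiseClassifier], x: Vector
) -> str:
    """Each pairwise classifier casts one vote; the most voted label wins."""
    votes = Counter(clf(x) for clf in classifiers.values())
    top = max(votes.values())
    # Ties are broken alphabetically so the result is deterministic.
    return min(label for label, count in votes.items() if count == top)


# Hypothetical toy data: three categories with 2-D weighted feature vectors.
toy_samples = {
    "banking": [[1.0, 0.0], [0.9, 0.1]],
    "currency": [[0.0, 1.0], [0.1, 0.9]],
    "stocks": [[1.0, 1.0], [0.9, 1.1]],
}
classifiers = train_one_against_one(toy_samples)
predicted = predict_by_voting(classifiers, [1.0, 0.0])
print(len(classifiers), predicted)
```

With k categories this construction yields k(k-1)/2 pairwise classifiers, so the five financial categories mentioned above would give ten. In practice each `train_pairwise` call would fit a real binary SVM on the weighted features of the two categories involved.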

/content/papers/10.5339/qfarf.2011.CSOS4
2011-11-20
2024-12-21