Minimal Generators Based Algorithm for Text Features Extraction: A More Efficient and Large Scale Approach
- Publisher: Hamad bin Khalifa University Press (HBKU Press)
- Source: Qatar Foundation Annual Research Forum Proceedings, Qatar Foundation Annual Research Forum Volume 2011 Issue 1, November 2011, Volume 2011, CSP8
Abstract
In recent years, several mathematical concepts have been successfully explored in computer science as a basis for original solutions to complex problems in knowledge engineering, data mining, information retrieval, and related areas.
Thus, Relational Algebra (RA) and Formal Concept Analysis (FCA) may be considered useful mathematical foundations that unify data and knowledge in information retrieval systems. For example, certain elements of a fringe relation (a notion from the RA domain), called isolated points, have been used successfully in FCA as formal concept labels or composed labels. Once associated with words in a textual document, these labels constitute relevant features of the text. Here, we propose the GenCoverage algorithm for covering a formal context (a formal representation of a text) based on isolated labels, and we use these labels (or text features) for categorization, corpus structuring, and micro-macro browsing, an advanced functionality in the information retrieval task.
The main thrust of the introduced approach relies heavily on the close connection between isolated points and minimal generators (MGs). MGs stand at the antipodes of the closures within their respective equivalence classes. Because minimal generators are the smallest elements within an equivalence class, their detection and traversal are greatly eased, which permits swift construction of the coverage. Thorough experiments provide empirical evidence of the performance of our approach.
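To make the relationship between closures and minimal generators concrete, the following is a minimal sketch on a toy formal context. It is not the GenCoverage algorithm, whose details are not given here. The document names, the word sets, and the helpers `extent`, `closure`, and `minimal_generators` are all hypothetical and chosen for illustration only. Attribute sets with the same closure form one equivalence class, the closure is its largest element, and the minimal generators are its smallest ones.

```python
from itertools import combinations

# Hypothetical toy formal context: documents (objects) mapped to words (attributes).
CONTEXT = {
    "d1": {"data", "mining", "concept"},
    "d2": {"data", "mining"},
    "d3": {"data", "retrieval"},
    "d4": {"concept", "lattice"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def extent(attrs):
    """Return the documents that contain every word in attrs."""
    return frozenset(doc for doc, words in CONTEXT.items() if attrs <= words)


def closure(attrs):
    """Return the largest word set shared by all documents in extent(attrs)."""
    docs = extent(attrs)
    if not docs:
        # By convention, an empty extent closes to the full attribute set.
        return ATTRIBUTES
    return frozenset.intersection(*(frozenset(CONTEXT[d]) for d in docs))


def minimal_generators():
    """Map each closed word set to the minimal word sets that generate it."""
    result = {}
    for size in range(len(ATTRIBUTES) + 1):
        for combo in combinations(sorted(ATTRIBUTES), size):
            candidate = frozenset(combo)
            closed = closure(candidate)
            # Closure is monotone, so a set is minimal exactly when dropping
            # any single attribute changes its closure.
            if all(closure(candidate - {a}) != closed for a in candidate):
                result.setdefault(closed, []).append(candidate)
    return result


if __name__ == "__main__":
    for closed, gens in minimal_generators().items():
        print(sorted(closed), "<-", [sorted(g) for g in gens])
```

In this toy context, the single word `mining` already determines the closed set containing `data` and `mining`, which illustrates why a short generator can serve as a compact label for a whole concept. The brute-force enumeration is exponential in the number of attributes and is meant only to show the definitions, not to scale.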