Conceptual approach for multi-level restructuring of categorized documents in a corpus
- Publisher: Hamad bin Khalifa University Press (HBKU Press)
- Source: Qatar Foundation Annual Research Forum Proceedings, Volume 2010, Issue 1, Dec 2010, CSP2
Abstract
To improve browsing in a documentary database, we propose a conceptual approach for multi-level restructuring of categorized documents in a corpus. Starting from a manually and statically organized corpus based on the domain ontology, we derive new, dynamically generated structures embedded in the static one. We use a recursive conceptual indexing method based on selecting the minimal number of concepts that cover either a single document or a subset of documents corresponding to a sub-corpus. Our system thus offers the user an additional browsing feature: a dynamically generated conceptual structure of document clusters. As an illustration, the figure shows an application to Arabic financial news for a particular ontology.
In that example, each sub-category appears under its parent category, and so on at each level. Alongside the classical browsing system, the index terms provided at each level give the user more detail about the content of a file, or of a category, before exploring further. Our approach improves human-computer interaction by decreasing browsing time. Our assessment of the proposed method shows that combining manual document categorization with automatic feature generation gives users a flexible and effective structured browsing interface. Finally, low-level features help place new documents incrementally in the right category, using suitable supervised classification methods.
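
The recursive indexing idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the algorithm, so the sketch treats each concept as a single index term, uses a greedy set-cover heuristic as a stand-in for "the minimal number of concepts" (greedy selection is an approximation and is not guaranteed to be minimal), and the names `greedy_cover`, `build_tree`, and the sample `corpus` are hypothetical.

```python
from typing import Dict, List, Set


def greedy_cover(docs: Dict[str, Set[str]]) -> List[str]:
    """Pick a small list of terms so that every document with at least one
    term contains one of them (greedy set cover, an approximation)."""
    uncovered = {d for d, terms in docs.items() if terms}
    chosen: List[str] = []
    while uncovered:
        # Count how many still-uncovered documents contain each term.
        counts: Dict[str, int] = {}
        for d in uncovered:
            for t in docs[d]:
                counts[t] = counts.get(t, 0) + 1
        # Highest count wins; ties are broken alphabetically for determinism.
        best = min(counts, key=lambda t: (-counts[t], t))
        chosen.append(best)
        uncovered = {d for d in uncovered if best not in docs[d]}
    return chosen


def build_tree(docs: Dict[str, Set[str]], depth: int = 2):
    """Recursively group documents under covering terms, up to `depth` levels.

    Returns a nested dict {term: subtree}; leaves are sorted lists of
    document identifiers.
    """
    if depth == 0 or len(docs) <= 1 or not any(docs.values()):
        return sorted(docs)
    tree = {}
    remaining = dict(docs)
    for term in greedy_cover(docs):
        # Assign each document to the first chosen term it contains, and
        # drop that term so the next level is indexed by something new.
        members = {d: ts - {term} for d, ts in remaining.items() if term in ts}
        for d in members:
            del remaining[d]
        tree[term] = build_tree(members, depth - 1)
    if remaining:
        # Documents with no index terms at all cannot be covered.
        tree["(uncovered)"] = sorted(remaining)
    return tree


# Hypothetical sub-corpus: document identifier -> index terms.
corpus = {
    "d1": {"bank", "loan"},
    "d2": {"bank", "stock"},
    "d3": {"stock", "market"},
    "d4": {"oil", "market"},
}
tree = build_tree(corpus, depth=2)
```

In this sketch the terms chosen at each level play the role of the index words shown to the user, and each nested dictionary corresponds to a dynamically generated cluster inside a static category. An exact minimum cover is NP-hard in general, which is why a heuristic is used here.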