Conference paper, Year: 2007

Contextual Concept Discovery Algorithm

Lobna Karoui
Marie-Aude Aufaure
Nacéra Bennacer Seghouani

Abstract

In this paper, we focus on the process of extracting and evaluating ontological concepts from HTML documents. To improve this process, we propose an unsupervised hierarchical clustering algorithm, “Contextual Concept Discovery” (CCD), which applies the K-means partitioning algorithm incrementally and is guided by a structural context. Our context exploits the HTML structure and the location of words to select the semantically closer co-occurrents for each word and to improve word weighting. Guided by this context definition, we perform an incremental clustering that refines the context of each word cluster to obtain semantically extracted concepts. The CCD algorithm can be run either automatically or with user interaction. The last function of the CCD algorithm is to provide complementary support that eases the evaluation task. This functionality is based on a large collection of web documents and on several context definitions deduced from it by applying linguistic and documentary analysis. We evaluate our algorithm on HTML documents related to the tourism domain. Our results show how our context-based algorithm improves the conceptual quality and the relevance of the extracted ontological concepts, and how our credibility degree criterion assists domain experts and facilitates the evaluation task.
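
The general idea of the abstract (weight co-occurrences by HTML structure, then apply K-means repeatedly to refine word clusters) can be illustrated with a small sketch. This is not the authors' CCD implementation: the tag weights, the sample blocks, the cosine similarity, the size-based stopping rule, and every function name below are assumptions made for illustration only.

```python
from collections import defaultdict
import math

# Hypothetical weights: words sharing a more prominent HTML element
# are treated as more strongly related. These values are assumptions.
TAG_WEIGHT = {"title": 3.0, "h1": 2.0, "p": 1.0}


def context_vectors(blocks):
    """blocks: list of (tag, [words]). Returns {word: {cooccurrent: weight}}."""
    vectors = defaultdict(lambda: defaultdict(float))
    for tag, words in blocks:
        weight = TAG_WEIGHT.get(tag, 1.0)
        for a in words:
            for b in words:
                if a != b:
                    vectors[a][b] += weight
    return {word: dict(vec) for word, vec in vectors.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(members, vectors):
    """Mean of the sparse vectors of the given words."""
    total = defaultdict(float)
    for m in members:
        for k, x in vectors[m].items():
            total[k] += x / len(members)
    return dict(total)


def kmeans(words, vectors, k, iterations=10):
    """Plain K-means on sparse vectors, seeded deterministically."""
    words = sorted(words)
    centers = [vectors[w] for w in words[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for w in words:
            best = max(range(k), key=lambda i: cosine(vectors[w], centers[i]))
            clusters[best].append(w)
        # Keep the previous center when a cluster ends up empty.
        centers = [centroid(c, vectors) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def incremental_clustering(words, vectors, max_size=4, k=2):
    """Re-apply K-means to every cluster that is still larger than max_size."""
    if len(words) <= max_size:
        return [sorted(words)]
    parts = kmeans(words, vectors, k)
    if len(parts) < 2:  # K-means could not split this cluster further
        return [sorted(words)]
    result = []
    for part in parts:
        result.extend(incremental_clustering(part, vectors, max_size, k))
    return result


# Toy input standing in for parsed HTML pages from a tourism site.
blocks = [
    ("title", ["hotel", "room", "booking"]),
    ("p", ["hostel", "room", "booking"]),
    ("h1", ["museum", "exhibition", "ticket"]),
    ("p", ["gallery", "exhibition", "ticket"]),
]
vectors = context_vectors(blocks)
print(incremental_clustering(list(vectors), vectors, max_size=4))
```

On this toy input the lodging words and the sightseeing words end up in separate clusters. A smaller max_size forces further splits, which loosely mirrors the incremental refinement described in the abstract; the user-interaction mode and the credibility degree criterion are not modelled here.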
No file deposited

Dates and versions

hal-00218204, version 1 (25-01-2008)

Identifiers

  • HAL Id: hal-00218204, version 1

Cite

Lobna Karoui, Marie-Aude Aufaure, Nacéra Bennacer Seghouani. Contextual Concept Discovery Algorithm. FLAIRS 2007, May 2007, United States. pp.460-465. ⟨hal-00218204⟩
