Conference paper, Year: 2008

Contextual and Metadata-based Approach for the Semantic Annotation of Heterogeneous Documents

Abstract

In this paper, we present SHIRI-Annot, an automatic, ontology-driven, and unsupervised approach for the semantic annotation of documents that contain more or less structured parts. The aim of this approach is to build an integration system called SHIRI, which gives the user access to documents related to a specific domain. In this system, the querying process is guided by a domain ontology and, unlike keyword-based search engines, the answers consist only of the pertinent parts of the documents. The ontology is described in RDFS (Resource Description Framework Schema). The SHIRI-Annot approach consists of locating and then annotating concept instances and their semantic relations. The locating step combines existing annotation approaches in order to locate instances in the text. The annotation step exploits a set of metadata and a set of logical rule patterns that are automatically instantiated from the domain description. These metadata are taken from the ontology or are defined specifically for the annotation task. The resulting annotations are represented in RDF (Resource Description Framework). Through a preliminary study on a corpus of HTML documents, we show the usefulness of these specific metadata for representing the heterogeneity of documents. We also illustrate through examples how the SHIRI system exploits the metadata to approximate user queries in order to provide more pertinent answers.
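
The locate-then-annotate idea described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the SHIRI-Annot implementation: the namespaces, the toy `LEXICON`, the node identifiers, and the function names `annotate` and `to_ntriples` are all assumptions made for the example, and the simple term lookup stands in for the existing annotation approaches and rule patterns the paper actually combines.

```python
from typing import NamedTuple

# Hypothetical namespaces, chosen for illustration only (not from the paper).
ONTO = "http://example.org/onto#"
DOC = "http://example.org/doc#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    literal: bool = False  # True when obj is a plain string literal


# Toy lexicon mapping surface terms to ontology concepts (assumed).
LEXICON = {"workshop": "Event", "tenerife": "Location"}


def annotate(node_id: str, text: str) -> list[Triple]:
    """Locate lexicon terms in one document node and emit RDF-style triples."""
    triples = []
    for position, token in enumerate(text.lower().split()):
        term = token.strip(".,;:()")
        concept = LEXICON.get(term)
        if concept is None:
            continue
        # One instance IRI per located term, typed with its ontology concept.
        instance = f"{DOC}{node_id}_{position}"
        triples.append(Triple(instance, RDF_TYPE, ONTO + concept))
        triples.append(Triple(instance, RDFS_LABEL, term, literal=True))
    return triples


def to_ntriples(triples: list[Triple]) -> str:
    """Serialize triples in N-Triples syntax, one statement per line."""
    lines = []
    for t in triples:
        if t.literal:
            escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{t.obj}>"
        lines.append(f"<{t.subject}> <{t.predicate}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    found = annotate("n1", "The workshop was held in Tenerife, Spain.")
    print(to_ntriples(found))
```

Here each located term yields a typed instance plus a label, which mirrors the abstract's description of annotating concept instances in RDF against an RDFS ontology; semantic relations between instances and the annotation-specific metadata are left out of this sketch.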

Domains

Web
Main file
PapierSemma.pdf (731 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00293255, version 1 (11-12-2009)

Identifiers

  • HAL Id: hal-00293255, version 1

Cite

Mouhamadou Thiam, Nathalie Pernelle, Nacéra Bennacer Seghouani. Contextual and Metadata-based Approach for the Semantic Annotation of Heterogeneous Documents. 1st Workshop on Semantic Metadata Management and Applications (SeMMA 2008) at the 5th European Semantic Web Conference (ESWC 2008), Jun 2008, Tenerife, Spain. pp.16-28. ⟨hal-00293255⟩
125 views
338 downloads
