Conference papers

Lexical representations for unsupervised event detection in a stream of tweets: a study on French and English corpora

Abstract: In this work, we evaluate the performance of recent text embeddings for the automatic detection of events in a stream of tweets. We model this task as a dynamic clustering problem. Our experiments are conducted on a publicly available corpus of English tweets and on a similar French dataset annotated by our team. We show that recent techniques based on deep neural networks (ELMo, Universal Sentence Encoder, BERT, SBERT), although promising in many applications, are poorly suited to this task. We also experiment with different types of fine-tuning to improve these results on French data. Finally, we provide a detailed analysis of the results, showing the superiority of tf-idf approaches for this task.
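To make the "dynamic clustering" framing concrete, here is a minimal sketch in Python of one way such a setup could look: each incoming tweet is turned into a tf-idf vector and either joins the cluster of its most similar earlier tweet or opens a new cluster. This is an illustration under stated assumptions, not the authors' implementation. The function names (`tfidf_vector`, `cosine`, `cluster_stream`), the whitespace tokenizer, the smoothed idf formula, the incremental document-frequency update, and the threshold value of 0.3 are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Return an L2-normalised tf-idf vector as a dict: term -> weight.

    Uses a smoothed idf, log((1 + n) / (1 + df)) + 1 (an assumption of this sketch).
    """
    counts = Counter(tokens)
    vec = {
        term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
        for term, count in counts.items()
    }
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors (a dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def cluster_stream(texts, threshold=0.3):
    """Assign a cluster label to each text, processing them in arrival order.

    A text joins the cluster of its nearest earlier text if their cosine
    similarity reaches `threshold`; otherwise it starts a new cluster.
    """
    doc_freq = Counter()   # term -> number of texts seen so far that contain it
    seen = []              # (vector, label) for every text processed so far
    labels = []
    next_label = 0
    for n_docs, text in enumerate(texts, start=1):
        tokens = text.lower().split()  # naive tokenizer, for illustration only
        doc_freq.update(set(tokens))
        vec = tfidf_vector(tokens, doc_freq, n_docs)

        best_sim, best_label = 0.0, None
        for other_vec, other_label in seen:
            sim = cosine(vec, other_vec)
            if sim > best_sim:
                best_sim, best_label = sim, other_label

        if best_label is None or best_sim < threshold:
            best_label = next_label  # no close neighbour: open a new cluster
            next_label += 1

        seen.append((vec, best_label))
        labels.append(best_label)
    return labels


tweets = [
    "earthquake hits city centre",
    "strong earthquake hits the city",
    "football final tonight",
    "football final kicks off tonight",
]
print(cluster_stream(tweets))
```

In this sketch, earlier vectors keep the idf weights they had when they arrived, and every new tweet is compared with all earlier ones. A real stream would need a sliding window or an index to bound that cost, and the threshold would have to be tuned on annotated data.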

https://hal-centralesupelec.archives-ouvertes.fr/hal-02432990
Contributor: Béatrice Mazoyer
Submitted on: Wednesday, January 8, 2020 - 5:57:12 PM
Last modification on: Thursday, July 2, 2020 - 9:12:02 AM
Long-term archiving on: Thursday, April 9, 2020 - 11:21:06 PM

Files

EGC_2020.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-02432990, version 1
  • arXiv: 2001.04139

Citation

Béatrice Mazoyer, Nicolas Hervé, Céline Hudelot, Julia Cagé. Représentations lexicales pour la détection non supervisée d'événements dans un flux de tweets : étude sur des corpus français et anglais. Extraction et Gestion des connaissances, EGC 2020, Jan 2020, Bruxelles, Belgium. ⟨hal-02432990⟩
