In this article, we seek to select the best type of lexical embedding for an unsupervised event detection task on a stream of tweets, which we model as a dynamic clustering problem. We show, on an English corpus and a French corpus, that representing tweets with tf-idf yields better results than Word2Vec, BERT, ELMo, Universal Sentence Encoder, or Sentence-BERT. We also show that fine-tuning on a corpus of a few hundred sentence pairs annotated for thematic similarity improves the results of Sentence-BERT by two points.
Models: bert-large, uncased and bert-base, multilingual cased.
/UKPLab/sentence-transformers. Model: bert-large-nli-stsb-mean-tokens.
Detections, bounds, and timelines: UMass and TDT-3, Proc. of Topic Detection and Tracking Workshop, pp. 167-174, 2000.
Beyond trending topics: Real-world event identification on Twitter, Fifth International AAAI Conference on Weblogs and Social Media, 2011.
Universal sentence encoder, 2018.
Towards better UD parsing: Deep contextualized word embeddings, ensemble, and treebank concatenation, Proc. of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 55-64, 2018.
Supervised learning of universal sentence representations from natural language inference data, 2017. URL: https://hal.archives-ouvertes.fr/hal-01897968
BERT: Pre-training of deep bidirectional transformers for language understanding, 2018.
Scale-invariance of support vector machines based on the triangular kernel, 3rd International Workshop on Statistical and Computational Theories of Vision, pp. 1-13, 2003. URL: https://hal.archives-ouvertes.fr/inria-00071984
Multimedia Lab @ ACL W-NUT NER shared task: Named entity recognition for Twitter microposts using distributed word representations, Proc. of Workshop on Noisy User-generated Text, pp. 146-153, 2015.
Distributional structure, Word, vol. 10, pp. 146-162, 1954.
TwitterNews: Real time event detection from the Twitter data stream, PeerJ PrePrints, 2016.
Billion-scale similarity search with GPUs, IEEE Transactions on Big Data, 2019.
Skip-thought vectors, Advances in Neural Information Processing Systems, pp. 3294-3302, 2015.
Building a large-scale corpus for evaluating event detection on Twitter, Proc. of ACM-CIKM, pp. 409-418, 2013.
Efficient estimation of word representations in vector space, 2013.
GloVe: Global vectors for word representation, Proc. of EMNLP, pp. 1532-1543, 2014.
Deep contextualized word representations, 2018.
Streaming first story detection with application to Twitter, Proc. of NAACL, pp. 181-189, 2010.
Free-marginal multirater kappa: An alternative to Fleiss' fixed-marginal multirater kappa, 2005.
Sentence-BERT: Sentence embeddings using Siamese BERT-networks, 2019.
Extracting news events from microblogs, Journal of Statistics and Management Systems, vol. 21, issue 4, pp. 695-723, 2018.
TwitterStand: News in tweets, Proc. of ACM-GIS, pp. 42-51, 2009.
A statistical interpretation of term specificity and its application in retrieval, Journal of Documentation, vol. 28, issue 1, pp. 11-21, 1972.
Attention is all you need, Advances in Neural Information Processing Systems, pp. 5998-6008, 2017.
GLUE: A multitask benchmark and analysis platform for natural language understanding, 2018.
A study of retrospective and on-line event detection, Proc. of ACM-SIGIR, pp. 28-36, 1998.