Design of a Multi-Strategy Parallelization for an Entire Application of Document Categorization on Low-Cost Multiprocessor PCs

Abstract: This paper presents research on the parallelization of an entire document-categorization application. The objective is a parallelization that runs successfully on low-cost, widely available shared-memory multiprocessor PCs (not only on powerful and expensive supercomputers), without any change to the input, output, or user interface of the application (under Windows OS). This is a first step toward parallelization on a cluster of multiprocessor PCs, a more generic and still low-cost parallel architecture. In this article, we describe the parallel algorithms and programming techniques we designed to reach good performance on low-cost but limited PC architectures. This leads us to introduce different parallelization strategies for the different parts of the application, which must cope with numerous disk accesses and with the variety of configurations chosen by users. Each parallelization is described and evaluated, and the overall performance of the final combination is reported on a 4-processor PC with SCSI disk technology and on a more recent 2-processor PC with IDE disk technology, leading to different but significant decreases in execution time. Because their low cost allows frequent upgrades and we consistently obtain worthwhile speedups, we can regularly upgrade our parallel machines so that they remain competitive with new sequential machines. The chosen application was designed first to make it easy to evaluate classification algorithms (useful to text-mining researchers), and second to detect errors in earlier manual categorizations and to suggest changes (useful to end users).
Document type:
Journal article

https://hal-centralesupelec.archives-ouvertes.fr/hal-01301161
Contributor: Stéphane Vialle
Submitted on: Monday, 11 April 2016 - 16:54:49
Last modified on: Thursday, 5 April 2018 - 12:30:25

Identifiers

  • HAL Id: hal-01301161, version 1

Citation

Stéphane Vialle, Guillaume Schaeffer, Michel Ianotto. Design of a Multi-Strategy Parallelization for an Entire Application of Document Categorization on Low-Cost Multiprocessor PCs. Studia Informatica Universalis, Hermann, 2004, 3 (1), pp.61-84. 〈http://studia.complexica.net/index.php?option=com_content&view=article&id=66%3Aarticle-4&catid=36%3Anumber-1&Itemid=72&lang=fr〉. 〈hal-01301161〉
