Design of a Multi-Strategy Parallelization for an Entire Application of Document Categorization on Low-Cost Multiprocessor PCs

Abstract : This paper presents research on parallelizing an entire document categorization application. The objective is a parallelization that runs effectively on low-cost, widely available shared-memory multiprocessor PCs (not only on powerful and expensive supercomputers), without any change to the application's input, output, or user interface (under Windows OS). This is a first step toward parallelization on a cluster of multiprocessor PCs, a more generic and still low-cost parallel architecture. In this article, we describe the parallel algorithms and programming techniques we designed to achieve good performance on low-cost but limited PC architectures. This leads us to introduce different parallelization strategies for the different parts of the application, which must cope with numerous disk accesses and with the variety of configurations chosen by users. Each parallelization is described and evaluated, and the overall performance of the final combination is reported on a 4-processor PC with SCSI disk technology and on a more recent 2-processor PC with IDE disk technology, showing different but significant decreases in execution time. Because their low cost allows frequent upgrades and we consistently obtain worthwhile speedups, we can regularly upgrade our parallel machines to remain competitive with new sequential machines. The chosen application was designed first to make it easy to evaluate classification algorithms (useful to text-mining researchers), and second to detect errors in earlier manual categorizations and suggest changes (useful to end users).
Document type :
Journal articles

https://hal-centralesupelec.archives-ouvertes.fr/hal-01301161
Contributor : Stéphane Vialle
Submitted on : Monday, April 11, 2016 - 4:54:49 PM
Last modification on : Thursday, April 5, 2018 - 12:30:25 PM

Identifiers

  • HAL Id : hal-01301161, version 1

Citation

Stéphane Vialle, Guillaume Schaeffer, Michel Ianotto. Design of a Multi-Strategy Parallelization for an Entire Application of Document Categorization on Low-Cost Multiprocessor PCs. Studia Informatica Universalis, Hermann, 2004, 3 (1), pp.61-84. ⟨http://studia.complexica.net/index.php?option=com_content&view=article&id=66%3Aarticle-4&catid=36%3Anumber-1&Itemid=72&lang=fr⟩. ⟨hal-01301161⟩
