Bibliography
You can directly download
the BibTeX file used to generate this page.
The various files used to generate this page, HTML headers included,
are also available in a single, tidy archive.
Warning!
This page cheerfully mixes English and French.
The tool that generates the page speaks English, and article titles and abstracts are generally written in the language of Shakespeare. The keywords (tags) are all expressed in a Californian English that smells of sand and silicon.
These words, as well as the comments on the articles (reviews), are in French. I prefer it that way: it avoids the ambiguities that come from a somewhat hasty choice of words.
What's new
The table of contents is now sorted alphabetically (the articles themselves are not).
The reference code of each article (shown at the top right) is now clickable and links back to the table of contents.
Most frequently used keywords:
- data parallel programming (6)
- high performance computing (4)
- programming toolkit (3)
- generic programming (3)
- distributed memory architecture (2)
Reviewed articles: 6
Rated articles: 6
article:bal98approaches1998, IEEE Concurrency, vol 6, pp 74–84
Abstract: Languages that support both task and data parallelism are highly general and can exploit both forms of parallelism within a single application. However, integrating the two forms of parallelism cleanly and within a coherent programming model is difficult. This paper describes four languages (Fx, Opus, Orca and Braid) that try to achieve such an integration and identifies several problems. The main problems are how to support both SPMD and MIMD style programs, how to organize the address space of a parallel program, and how to design the integrated model such that it can be implemented efficiently.
Review (4/5): A very useful article for me: it gives four examples of applications that combine, with varying degrees of success, data parallelism and task parallelism. In each case the design of the application is presented as an illustration, a study of performance and behaviour is provided, and a general assessment of usability is given in conclusion.
techreport:seinstra2kparallelimagery2000, Intelligent Sensory Information Systems, Department of Computer Science, University of Amsterdam, The Netherlands
Abstract: This report describes a software architecture that enables transparent development of image processing applications for parallel computers. The architecture's main component is an extensive library of low level image processing operations capable of running on distributed memory MIMD-style parallel hardware. Since the library has an application programming interface identical to that of an existing sequential image processing library, the parallelism is completely hidden from the user.
The first part of the report discusses implementation aspects of the parallel library. It is shown how sequential as well as parallel operations are implemented on the basis of so-called parallelizable patterns. A library built in this manner is easily maintainable, as code redundancy is avoided as much as possible. The second part of the report describes the application of performance models to ensure efficiency of execution on a range of parallel machines. A high level abstract machine for parallel image processing, that serves as a basis for the performance models, is described as well. Experiments show that for a realistic image processing application performance predictions are highly accurate. These results indicate that the core of the architecture forms a powerful basis for automatic parallelization and optimization of a wide range of image processing software.
Review (4/5): This article presents an interesting model that follows the main principles of software engineering. The decoupling between generic algorithms and parallelization patterns is of particular interest when parallelizing existing algorithms. An entire section is devoted to the discussion of parallelization patterns.
The article also offers a solution for optimization needs, through "performance profiles" generated by the measurement tools provided. The different scheduling strategies are not discussed, which is a pity. On the other hand, the fact that the architecture is geared towards image processing does not seem to detract from the generality of the approach.
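To give a rough idea of the decoupling discussed above, here is a small sketch of a "parallelizable pattern" for a unary per-pixel operation, split over blocks of rows with std::thread. It is purely illustrative; the Image and parallel_unary_op names are mine, not the library's.

#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

// Illustrative pattern (not the paper's code): apply any per-pixel callable to an
// image, parallelized over horizontal blocks of rows; the caller never sees threads.
struct Image {
    int width = 0, height = 0;
    std::vector<std::uint8_t> pixels;   // row-major, one byte per pixel
};

template <typename PixelOp>
void parallel_unary_op(Image& img, PixelOp op, int workers = 4) {
    std::vector<std::thread> pool;
    int rows_per_worker = (img.height + workers - 1) / workers;
    for (int w = 0; w < workers; ++w) {
        int r0 = w * rows_per_worker;
        int r1 = std::min(img.height, r0 + rows_per_worker);
        pool.emplace_back([&img, op, r0, r1] {
            for (int r = r0; r < r1; ++r)
                for (int c = 0; c < img.width; ++c)
                    img.pixels[r * img.width + c] = op(img.pixels[r * img.width + c]);
        });
    }
    for (auto& t : pool) t.join();
}

// Example use: invert an image without the caller knowing anything about threads.
// parallel_unary_op(img, [](std::uint8_t p) { return std::uint8_t(255 - p); });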
inproceedings:iwact01stapl2001, International Workshop on Advanced Compiler Technology for High Performance and Embedded Processors
Rating: 4/5
Keywords: STAPL, data parallel programming, distributed memory architecture, distributed types, object-oriented programming, programming toolkit, adaptive scheduling
Links: PDF, WWW
Abstract: The Standard Template Adaptive Parallel Library (STAPL) is a parallel library designed as a superset of the ANSI C++ Standard Template Library (STL). It is sequentially consistent for functions with the same name, and executes on uni- or multi-processor systems that utilize shared or distributed memory. STAPL is implemented using simple parallel extensions of C++ that currently provide a SPMD model of parallelism, and supports nested parallelism. The library is intended to be general purpose, but emphasizes irregular programs to allow the exploitation of parallelism in areas such as particle transport calculations, molecular dynamics, geometric modeling, and graph algorithms, which use dynamically linked data structures.
STAPL provides several different algorithms for some library routines, and selects among them adaptively at run-time. STAPL can replace STL automatically by invoking a preprocessing translation phase. The performance of translated code is close to the results obtained using STAPL directly (less than 5% performance deterioration). However, STAPL also provides functionality to allow the user to further optimize the code and achieve additional performance gains. We present results obtained using STAPL for a molecular dynamics code and a particle transport code.
Review (4/5): STAPL is a parallel implementation of the Standard Template Library (STL). The authors stick to several of the concepts used by the STL, such as the definition of compatible iterators. The modeling of the library around containers, iterators (in the form of ranges), and algorithms, staying close to the standard, is very interesting. Three parallel containers are provided, equivalent to their STL counterparts: vector, list, and tree. Most of the algorithms "whose parallelization is relevant" are also implemented.
Their run-time system is complex. A "parallel region manager" serves as a front end to the threading mechanisms, via pthreads or others. A scheduler (scheduler/distributor) handles the distribution of data across the nodes. Finally, an executor (tremble, mortals: aka 'executor') is in charge of executing the tasks on the nodes once the data is in place. STAPL also relies on the HOARD parallel memory allocator, which is earlier work.
The last part deals with STAPL's performance, illustrating the adaptive methods used to choose among scheduling strategies. I have not yet taken the time to check the accuracy of their results in detail.
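To make the container/range/algorithm picture above concrete, here is the sequential STL shape of such a call, with a hypothetical parallel counterpart sketched in comments (pvector and p_accumulate are placeholder names, not necessarily STAPL's real identifiers):

#include <numeric>
#include <vector>

// Sequential STL style: container + iterator pair + generic algorithm.
int sequential_sum(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// A parallel library that keeps the same model would swap in a distributed
// container and feed the algorithm a range instead of raw iterators, e.g.
// (hypothetical placeholder names, not necessarily STAPL's actual API):
//
//   pvector<int> pv = ...;                     // distributed counterpart of std::vector
//   int s = p_accumulate(pv.get_range(), 0);   // same call shape as std::accumulate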
incollection:reynders96pooma (unexported/unhandled reference)
Rating: 3/5
Keywords: SPMD paradigm, high performance computing, programming toolkit
Links: PDF, WWW
Review (3/5): Beware: POOMA is no longer available on the net. It seems that this project, under this name, has come to a halt.
POOMA is a class library, apparently with no trace of generic programming. The development of "Global Data Types" (GDTs) is interesting in that it abstracts the actual nature of a value from the user's point of view, making it possible, for example, to manipulate an aggregate or a whole set of values regardless of their locality. The design is split between a "global" layer, in charge of the "globally typed" value, and a "local" layer, which represents all or part of the global value.
On the other hand, the fact that most of the types are geared towards scientific simulation applications seriously limits the usefulness of POOMA outside its own domain. Worse, by the paper's authors' own admission, POOMA is rather oriented towards supercomputers, even though an MPI-based version of their communication module seems to have existed.
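A toy sketch of the global/local split described above (illustrative only, not POOMA's API): the global layer knows which node owns which global index, while the local layer stores only the block owned by the current node.

#include <cstddef>
#include <vector>

// Toy "global data type": a logically global 1-D array of which each node owns
// one contiguous block. The global layer translates a global index into
// (owner node, local offset); the local layer holds only the owned block.
class GlobalVector {
public:
    GlobalVector(std::size_t global_size, int nodes, int my_node)
        : my_node_(my_node),
          block_((global_size + nodes - 1) / nodes),
          local_(block_, 0.0) {}

    int owner(std::size_t gi) const { return static_cast<int>(gi / block_); }
    bool is_local(std::size_t gi) const { return owner(gi) == my_node_; }

    // Valid only when is_local(gi); a real library would fetch remote values instead.
    double& local_ref(std::size_t gi) { return local_[gi % block_]; }

private:
    int my_node_;
    std::size_t block_;
    std::vector<double> local_;
};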
article:bova99mpidirectives1999, SIAM News, vol 32
Rating: 3/5
Keywords: MPI, OpenMP, high performance computing, benchmarking
Links: PDF, WWW
Abstract: Developers of parallel applications can be faced with the problem of combining the two dominant models for parallel processing - distributed-memory and shared-memory parallelism - within one source code. In this article we discuss why it is useful to combine these two programming methodologies, both of which are supported on most high-performance computers, and some of the lessons we learned in work on five applications. All our applications make use of two programming models: message-passing, as represented by the PVM or MPI libraries, and the shared-memory style, as represented by the OpenMP directive standard. In all but one of the applications, we use these two models to exploit computer architectures that include both shared- and distributed-memory systems. Message-passing is used to coordinate coarse-grained parallel tasks across distributed compute nodes, whereas OpenMP exploits parallelism within multi-processor nodes. One of our applications, SPECseis96, implements message-passing and shared-memory directives at equal levels, which allows us to compare the performance of the two models.
Review (3/5): This article carries out performance measurements on parallelization solutions combining OpenMP and MPI. The numbers look good, it works well. No details on the implementation, except that the message-passing interfaces (MPI) are single-threaded. I liken this to a "representative/represented" scheme.
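The pattern measured in the article, MPI between nodes and OpenMP within a node, with only the single-threaded main flow of each process touching MPI, looks roughly like the following sketch (mine, not code from the paper):

#include <mpi.h>
#include <cstdio>
#include <vector>

// Minimal hybrid sketch: MPI coordinates coarse-grained work across nodes,
// OpenMP parallelizes the loop inside each node, and only the main flow of
// each process calls MPI, matching the single-threaded MPI setting above.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<double> chunk(1000, rank + 1.0);     // this node's share of the data

    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)  // shared-memory parallelism within the node
    for (long i = 0; i < static_cast<long>(chunk.size()); ++i)
        local_sum += chunk[i];

    double global_sum = 0.0;                         // message passing across nodes
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) std::printf("global sum = %f (over %d processes)\n", global_sum, size);
    MPI_Finalize();
    return 0;
}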
techreport:fletcher03parallelstl2003, Computer Science Department, Indiana University
Rating: 2/5
Keywords: data parallel programming, generic programming, STL
Links: PDF, WWW
Abstract: While tremendous progress has been made in developing parallel algorithms, there has not been as much success in developing language support for programming these parallel algorithms. The C++ Standard Template Library (STL) provides an opportunity for extending the concept of generic programming to the parallel realm. This paper discusses the basic requirements for extending STL to provide support for data-parallelism in C++. The ultimate goal is to implement a parallel library that is built within the existing framework of STL and exploits parallelism in existing sequential algorithms and also provides a set of parallel algorithms.
Review (2/5): A fairly long state of the art on existing solutions. The end of the paper describes how to adapt the STL to make it parallel while staying within its specification (unlike STAPL, for example, which extends it). Two approaches are considered: a distributed memory allocator coupled with "smart" iterators, or data containers intended for parallel use.
Data distribution is left to the user. Their implementation is based on MPI. No further details in the article; it feels a bit thin compared with STAPL. The merit of their approach is that it does not change the containers the user is accustomed to. (Is that really possible? Isn't it too restrictive?)
conference:karmesin98pooma2 (unexported/unhandled reference)
article:foster97globus1997, The International Journal of Supercomputer Applications and High Performance Computing, vol 11, pp 115–128
Keywords: metacomputing, high performance computing, programming toolkit
Links: PDF, WWW
techreport:sheffler95portable1995, RIACS
Abstract: This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.
techreport:gerlach98janus1998, Tsukuba Research Center of the Real World Computing Partnership
Keywords: data parallel programming, object-oriented programming, generic programming, finite element methods
Links: PDF
Abstract: We propose Janus, a C++ template library of container classes and communication primitives for parallel dynamic mesh applications. The paper focuses on two-phase containers, which are a central component of the Janus framework. These containers are quasi-constant, i.e., they have an extended initialization phase after which they provide read-only access to their elements. Two-phase containers are useful for the efficient and easy-to-use representation of finite element meshes and for generating sparse matrices. Using such containers makes it easy to encapsulate irregular communication patterns that occur when running finite element programs in parallel.
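A minimal sketch of the two-phase idea (illustrative only, not Janus's actual interface): elements can be inserted only during an initialization phase; after freeze() the container is read-only.

#include <stdexcept>
#include <vector>

// Illustrative "two-phase" container: insertion is allowed only during the
// initialization phase; once frozen, the contents can only be read.
template <typename T>
class TwoPhaseSet {
public:
    void insert(const T& x) {
        if (frozen_) throw std::logic_error("container already frozen");
        data_.push_back(x);
    }
    void freeze() { frozen_ = true; }   // marks the end of the initialization phase
    const T& operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }
private:
    std::vector<T> data_;
    bool frozen_ = false;
};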
inproceedings:siu97paralleldp1997, IEEE, 1082-8907/97
Abstract: High complexity of building parallel applications is often cited as one of the major impediments to the mainstream adoption of parallel computing. To deal with the complexity of software development, abstractions such as macros, functions, abstract data types, and objects are commonly employed by sequential as well as parallel programming models. This paper describes the concept of a design pattern for the development of parallel applications. A design pattern in our case describes a recurring parallel programming problem and a reusable solution to that problem. A design pattern is implemented as a reusable code skeleton for quick and reliable development of parallel applications.
A parallel programming system, called DPnDP (Design Patterns and Distributed Processes), that employs such design patterns is described. In the past, parallel programming systems have allowed fast prototyping of parallel applications based on commonly occurring communication and synchronization structures. The uniqueness of our approach is in the use of a standard structure and interface for a design pattern. This has several important implications: First, design patterns can be defined and added to the system's library in an incremental manner without requiring any major modification of the system (Extensibility). Second, customization of a parallel application is possible by mixing design patterns with low level parallel code, resulting in a flexible and efficient parallel programming tool (Flexibility). Also, a parallel design pattern can be parameterized to provide some variations in terms of structure and behavior.
misc:sankaralingam03universal2003
Abstract: Data-parallel programs are both growing in importance and increasing in diversity, resulting in specialized processors targeted at specific classes of these programs. This paper presents a classification scheme for data-parallel program attributes, and proposes micro-architectural mechanisms to support applications with diverse behavior using a single reconfigurable architecture. We focus on the following four broad kinds of data-parallel programs: DSP/multimedia, scientific, networking, and real-time graphics workloads. While all of these programs exhibit high computational intensity, coarse-grain regular control behavior, and some regular memory access behavior, they show wide variance in the computation requirements, fine-grain control behavior, and the frequency of other types of memory accesses. Based on this study of application attributes, this paper proposes a set of general micro-architectural mechanisms that enable a baseline architecture to be dynamically tailored to the demands of a particular application. These mechanisms provide efficient execution across a spectrum of data-parallel applications and can be applied to diverse architectures ranging from vector cores to conventional superscalar cores. Our results using a baseline TRIPS processor show that the configurability of the architecture to the application demands provides harmonic mean performance improvement of 5%–55% over scalable yet less flexible architectures, and performs competitively against specialized architectures.
inproceedings:bi97objectoriented1997, Proceedings of International conference on parallel and distributed processing techniques and applications (PDPTA)
Keywords: data parallel programming, SPMD paradigm, distributed memory parallel architecture
Links: PDF
techreport:siek98matrix1998, Laboratory for Scientific Computing, Department of Computer Science and Engineering, University of Notre Dame
Keywords: generic programming, high performance computing
Links: PDF, WWW
Abstract: We present a unified approach for expressing high performance numerical linear algebra routines for large classes of dense and sparse matrices. As with the Standard Template Library [10], we explicitly separate algorithms from data structures through the use of generic programming techniques. We conclude that such an approach does not hinder high performance. On the contrary, writing portable high performance codes is actually enabled with such an approach because the performance critical code sections can be isolated from the algorithms and the data structures. We also tackle the performance portability problem for particular architecture dependent algorithms such as matrix-matrix multiply. Recently, code generation systems (PHiPAC [3] and ATLAS [15]) have been created to customize the algorithms according to architecture. A more elegant approach is to use template metaprograms [18] to allow for variation. In this paper we introduce the Basic Linear Algebra Instruction Set (BLAIS), a collection of high performance kernels for basic linear algebra.
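As a small illustration of separating the algorithm from the data structure (my sketch, not the paper's BLAIS kernels), a matrix-vector product can be written once against a minimal matrix interface and reused by any type that provides it:

#include <cstddef>
#include <vector>

// Generic matrix-vector multiply written against a minimal interface
// (rows(), cols(), operator()(i, j)); illustrative only, not the BLAIS API.
template <typename Matrix>
std::vector<double> matvec(const Matrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.rows(), 0.0);
    for (std::size_t i = 0; i < A.rows(); ++i)
        for (std::size_t j = 0; j < A.cols(); ++j)
            y[i] += A(i, j) * x[j];
    return y;
}

// Any dense, banded, or otherwise-stored matrix type satisfying the interface
// can reuse the same algorithm; performance-critical variants can specialize it.
struct DenseMatrix {
    std::size_t r, c;
    std::vector<double> data;                              // row-major storage
    std::size_t rows() const { return r; }
    std::size_t cols() const { return c; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * c + j]; }
};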
techreport:beckman96tulip1996, Computer Science Department, Indiana University
Abstract: This paper describes Tulip, a parallel run-time system used by the pC++ parallel programming language. Tulip has been implemented on a variety of scalable, MPP computers including the IBM SP2, Intel Paragon, HP/Convex SPP, Meiko CS2, SGI Power Challenge, and Cray T3D. Tulip differs from other data-parallel RTS implementations; it is designed to support the operations from object-parallel programming that require remote member function execution and load and store operations on remote objects. It is designed to provide the thinnest possible layer atop the vendor-supplied machine interface. That thin veneer can then be used by other run-time layers to build machine-independent class libraries, compiler back ends, and more sophisticated run-time support. Some preliminary performance measurements for the IBM SP2, SGI Power Challenge, and Cray T3D are given.
techreport:johnson97hpc1997, Department of Computer Science, Indiana University
Abstract: HPC++ is a C++ library and language extension framework that is being developed by the HPC++ consortium as a standard model for portable parallel C++ programming. This paper describes an initial implementation of the HPC++ Parallel Standard Template Library (PSTL) framework. This implementation includes seven distributed containers as well as selected algorithms. We include preliminary performance results from several experiments using the PSTL.
techreport:kumar87parallelsearch1987, Department of Computer Sciences, University of Texas at Austin
article:itzkovitz98thread1998, The Journal of Systems and Software, vol 42, pp 71–87
techreport:winter94parallelengineering1994, University of Westminster, Centre for Parallel Computing, London; KFKI-MSZKI, Centre for Parallel Computing, Budapest
article:jarvi04generic2005, Int. J. Parallel Program., vol 33, pp 145–164
Abstract: Generic programming is an attractive paradigm for developing libraries for high-performance computing because of the simultaneous emphases placed on generality and efficiency.
In this approach, interfaces are based on sets of specified requirements on types, rather than on any particular type, allowing algorithms to inter-operate with any data type meeting the necessary requirements. These sets of requirements, known as concepts, can specify syntactic as well as semantic requirements. Although concepts are fundamental to generic programming, they are not supported as first-class entities in mainstream programming languages, thus limiting the degree to which generic programming can be effectively applied. In this paper we advocate better syntactic and semantic support for concepts and describe some straightforward language features that could better support them. We also briefly discuss uses for concepts beyond their use in constraining polymorphism.
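The "sets of requirements on types" described here later became a first-class language feature in C++20; a minimal sketch of an algorithm constrained by a concept rather than by a concrete type:

#include <concepts>
#include <ranges>

// A concept: the set of syntactic requirements a type must satisfy.
template <typename T>
concept Addable = std::default_initializable<T> && requires(T a, T b) {
    { a + b } -> std::convertible_to<T>;
};

// A generic algorithm that accepts any range whose elements satisfy Addable,
// rather than being written for one particular container or element type.
template <std::ranges::input_range R>
    requires Addable<std::ranges::range_value_t<R>>
auto sum(R&& r) {
    std::ranges::range_value_t<R> acc{};
    for (auto&& x : r) acc = acc + x;
    return acc;
}

// Usage: sum(std::vector<int>{1, 2, 3}) and sum(std::list<double>{...}) both compile,
// while a range of types that cannot be added is rejected at the constraint check.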
techreport:Wu19961996, Department of Computer Science, State University of New York at Buffalo
Abstract: The dimension exchange method (DEM) was initially proposed as a load-balancing algorithm for the hypercube structure. It has been generalized to k-ary n-cubes. However, the k-ary n-cube algorithm must take many iterations to converge to a balanced state. In this paper, we propose a direct method to modify DEM. The new algorithm, Direct Dimension Exchange (DDE) method, takes load average in every dimension to eliminate unnecessary load exchange. It balances the load directly without iteratively exchanging the load. It is able to balance the load more accurately and much faster.
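For reference, the classic dimension exchange scheme that DDE refines works like this on a hypercube (a self-contained toy simulation; the DDE modification itself is not reproduced here):

#include <cstdio>
#include <vector>

// Classic dimension exchange on a d-dimensional hypercube: in each dimension,
// every node averages its load with its neighbor along that dimension
// (neighbor = rank XOR 2^d). After sweeping all dimensions the loads are equal.
int main() {
    std::vector<double> load = {12, 3, 7, 0, 9, 5, 1, 11};   // 2^3 nodes, arbitrary loads
    const int dims = 3;
    for (int d = 0; d < dims; ++d) {
        for (std::size_t r = 0; r < load.size(); ++r) {
            std::size_t nbr = r ^ (1u << d);
            if (r < nbr) {                                   // handle each pair once
                double avg = (load[r] + load[nbr]) / 2.0;
                load[r] = load[nbr] = avg;
            }
        }
    }
    for (double l : load) std::printf("%.2f ", l);           // all equal to the global average
    std::printf("\n");
    return 0;
}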
techreport:hernandez05gridcomp2005, Department of Computer and Information Sciences, University of Alabama at Birmingham
Keywords: grid computing, end-user tools
Links: PDF, WWW
Abstract: The present work describes an approach to simplifying the development and deployment of applications for the Grid. Our approach aims at hiding accidental complexities (e.g., low-level Grid technologies) met when developing these kinds of applications. To realize this goal, the work focuses on the development of end-user tools using concepts of domain engineering and domain-specific modeling which are modern software engineering methods for automating the development of software. This work is an attempt to contribute to the long term research goal of empowering users to create complex applications for the Grid without depending on the expertise of support teams or on hand-crafted solutions.
techreport:giloi95promoter1995, RWCP Massively Parallel Systems GMD Laboratory
Keywords: coordination schemes, distributed memory architecture, high-level programming model, parallelizing compiler, distributed types
Links: PDF
Abstract: The superior performance and cost-effectiveness of scalable, distributed memory parallel computers will only become generally exploitable if the programming difficulties with such machines are overcome. We see the ultimate solution in high-level programming models and appropriate parallelizing compilers that allow the user to formulate a parallel program in terms of application-specific concepts, while low-level issues such as optimal data distribution and coordination of the parallel threads are handled by the compiler. High Performance Fortran (HPF) is a major step in that direction; however, HPF still lacks the generality of computing domains needed to treat other than regular, data-parallel numerical applications. A more flexible and more abstract programming language for regular and irregular object-parallel applications is PROMOTER. PROMOTER allows the user to program for an application-oriented abstract machine rather than for a particular architecture. The wide semantic gap between the abstract machine and the concrete message-passing architecture is closed by the compiler. Hence, the issues of data distribution, communication, and coordination (thread scheduling) are hidden from the user. The paper presents the underlying concepts of PROMOTER and the corresponding language concepts. The PROMOTER compiler translates the parallel program written in terms of distributed types into parallel threads and maps those optimally onto the nodes of the physical machine. The language constructs and their use, the tasks of the compiler, and the challenges encountered in its implementation are discussed.
techreport:jesshopeXXasynchrony?, Department of Electronic and Electrical Engineering, University of Surrey
phdthesis:parkes94concurrentoop (unexported/unhandled reference)
Abstract: Despite increasing availability, the use of parallel platforms in the solution of significant computing problems remains largely restricted to a set of well-structured, numeric applications. This is due in part to the difficulty of parallel application development, which is itself largely the result of a lack of high-level development environments applicable to the majority of existing parallel architectures. This thesis addresses the issue of facilitating the application of parallel platforms to unstructured problems through the use of object-oriented design techniques and the actor model of concurrent computation. We present a multilevel approach to expressing parallelism for unstructured applications: a high-level interface based on the actor and aggregate models of concurrent object-oriented programming, and a low-level interface which provides an object-oriented interface to system services across a wide range of diverse parallel architectures. The interfaces are manifested in the
ProperCAD II library, a C++ object library supporting actor concurrency on microprocessor-based parallel architectures and appropriate for applications exhibiting medium-grain parallelism. The interface supports uniprocessors, shared memory multiprocessors, distributed memory multicomputers, and hybrid architectures comprising network-connected clusters of uni- and multiprocessors. The library currently supports workstations from Sun, shared memory multiprocessors from Sun and Encore, distributed memory multicomputers from Intel and Thinking Machines, and hybrid architectures comprising IP network-connected clusters of Sun uni- and multiprocessors. We demonstrate our approach through an examination of the parallelization process for two existing unstructured serial applications drawn from the field of VLSI computer-aided design. We compare and contrast the library-based actor approach to other methods for expressing parallelism in C++.
misc:bikshandi06programming2006
Keywords: data parallel programming
Links: PDF, WWW
techreport:littin98block1998, Department of Computer Science, University of Waikato, Hamilton, New Zealand
Abstract: A fixed-length block-based instruction set architecture (ISA) based on dataflow techniques is described. This ISA is then compared and contrasted with those of more conventional architectures and other developmental architectures. A control mechanism that allows blocks to be executed in parallel, so that the original control flow is maintained, is presented. A brief description of the hardware required to realize this mechanism is given.
inproceedings:berger00hoard2000, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX)
Abstract: Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiprocessor systems. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a dramatic increase in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. This increase in memory consumption can range from a factor of P (the number of processors) to unbounded memory consumption.
This paper introduces Hoard, a fast, highly scalable allocator that largely avoids false sharing and is memory efficient. Hoard is the first allocator to simultaneously solve the above problems. Hoard combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case. Our results on eleven programs demonstrate that Hoard yields low average fragmentation and improves overall program performance over the standard Solaris allocator by up to a factor of 60 on 14 processors, and up to a factor of 18 over the next best allocator we tested.
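To make the "per-processor heaps plus one global heap" idea concrete, here is a toy illustration (far from Hoard's actual design; no deallocation or consumption bound is shown): each thread is routed to its own heap so that most allocations do not contend on a single shared lock.

#include <cstdlib>
#include <functional>
#include <mutex>
#include <thread>

// Toy illustration only (not Hoard's code): per-"processor" heaps reduce lock
// contention; oversized requests fall back to the shared global heap.
struct ToyHeap {
    std::mutex lock;
    void* allocate(std::size_t n) {
        std::lock_guard<std::mutex> g(lock);          // per-heap lock, rarely contended
        return std::malloc(n);
    }
};

constexpr std::size_t kHeaps = 8;                     // e.g. one heap per processor
ToyHeap local_heaps[kHeaps];
ToyHeap global_heap;

void* toy_alloc(std::size_t n) {
    if (n > 4096) return global_heap.allocate(n);     // large blocks go to the global heap
    std::size_t h = std::hash<std::thread::id>{}(std::this_thread::get_id()) % kHeaps;
    return local_heaps[h].allocate(n);
}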
inproceedings:rauchwerger98stapl1998, Wkshp. on Lang. Comp. and Run-time Sys. for Scal. Comp. (LCR).
Abstract: STAPL (Standard Adaptive Parallel Library) is a parallel C++ library designed as a superset of the STL, sequentially consistent for functions with the same name, and executes on uni- or multiprocessors. STAPL is implemented using simple parallel extensions of C++ which provide a SPMD model of parallelism supporting recursive parallelism. The library is intended to be of generic use but emphasizes irregular, non-numeric programs to allow the exploitation of parallelism in areas such as geometric modeling or graph algorithms which use dynamic linked data structures. Each library routine has several different algorithmic options, and the choice among them will be made adaptively based on a performance model, statistical feedback, and current run-time conditions. Built-in performance monitors can measure actual performance and, using an extension of the BSP model, predict the relative performance of the algorithmic choices for each library routine. STAPL is intended to possibly replace STL in a user-transparent manner and run on small to medium scale shared memory multiprocessors which support OpenMP.
techreport:Asanovic:EECS-2006-1832006, EECS Department, University of California, Berkeley
article:xu92analysis1992, Journal of Parallel and Distributed Computing, vol 16, pp 385–393