Bibliography
You can directly download
the BibTeX file used to generate this page.
The various files used to generate this page, HTML headers included,
are also available in a single, tidy archive.
Warning!
This page cheerfully mixes English and French.
The tool that generates the page speaks English, and article titles and abstracts are generally written in the language of Shakespeare. The keywords (tags) are all expressed in a Californian English that smells of sand and silicon.
These words, as well as the comments on the articles (reviews), are in French. I prefer it that way: it avoids the ambiguities that come from a somewhat hasty choice of words.
What's new
The table of contents is now sorted alphabetically (the articles themselves are not).
The reference code of each article (shown at the top right) is now clickable and links back to the table of contents.
Most frequently used keywords:
- data parallel programming (6)
- high performance computing (4)
- programming toolkit (3)
- generic programming (3)
- distributed memory architecture (2)
Reviewed articles: 6
Rated articles: 6
article:bal98approaches1998, IEEE Concurrency, vol 6, pp 74–84
Abstract: Languages that support both task and data parallelism are highly general and can exploit both forms of parallelism within a single application. However, integrating the two forms of parallelism cleanly and within a coherent programming model is difficult. This paper describes four languages (Fx, Opus, Orca and Braid) that try to achieve such an integration and identifies several problems. The main problems are how to support both SPMD and MIMD style programs, how to organize the address space of a parallel program, and how to design the integrated model such that it can be implemented efficiently.
Review (4/5): A very useful article for me: it gives four examples of applications that combine, with varying degrees of success, data parallelism and task parallelism. In each case the design of the application is presented as an illustration, a study of performance and behaviour is provided, and a general assessment of usability is given in conclusion.
techreport:seinstra2kparallelimagery2000, Intelligent Sensory Information Systems, Department of Computer Science, University of Amsterdam, The Netherlands
Abstract: This report describes a software architecture that enables transparent development of image processing applications for parallel computers. The architecture's main component is an extensive library of low level image processing operations capable of running on distributed memory MIMD-style parallel hardware. Since the library has an application programming interface identical to that of an existing sequential image processing library, the parallelism is completely hidden from the user.
The first part of the report discusses implementation aspects of the parallel library. It is shown how sequential as well as parallel operations are implemented on the basis of so-called parallelizable patterns. A library built in this manner is easily maintainable, as code redundancy is avoided as much as possible. The second part of the report describes the application of performance models to ensure efficiency of execution on a range of parallel machines. A high level abstract machine for parallel image processing, that serves as a basis for the performance models, is described as well. Experiments show that for a realistic image processing application performance predictions are highly accurate. These results indicate that the core of the architecture forms a powerful basis for automatic parallelization and optimization of a wide range of image processing software.
Review (4/5): This article presents an interesting model that follows the main principles of software engineering. The decoupling between generic algorithms and parallelization patterns is of particular interest when parallelizing existing algorithms. An entire section is devoted to the discussion of parallelization patterns.
The article also offers a solution for optimization needs, through "performance profiles" generated by the measurement tools provided. The different scheduling strategies are not discussed, which is a pity. On the other hand, the fact that the architecture is geared towards image processing does not seem to detract from the generality of the approach.
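To give a rough idea of the decoupling discussed above, here is a small sketch of a "parallelizable pattern" for a unary per-pixel operation, split over blocks of rows with std::thread. It is purely illustrative; the Image and parallel_unary_op names are mine, not the library's.

#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

// Illustrative pattern (not the paper's code): apply any per-pixel callable to an
// image, parallelized over horizontal blocks of rows; the caller never sees threads.
struct Image {
    int width = 0, height = 0;
    std::vector<std::uint8_t> pixels;   // row-major, one byte per pixel
};

template <typename PixelOp>
void parallel_unary_op(Image& img, PixelOp op, int workers = 4) {
    std::vector<std::thread> pool;
    int rows_per_worker = (img.height + workers - 1) / workers;
    for (int w = 0; w < workers; ++w) {
        int r0 = w * rows_per_worker;
        int r1 = std::min(img.height, r0 + rows_per_worker);
        pool.emplace_back([&img, op, r0, r1] {
            for (int r = r0; r < r1; ++r)
                for (int c = 0; c < img.width; ++c)
                    img.pixels[r * img.width + c] = op(img.pixels[r * img.width + c]);
        });
    }
    for (auto& t : pool) t.join();
}

// Example use: invert an image without the caller knowing anything about threads.
// parallel_unary_op(img, [](std::uint8_t p) { return std::uint8_t(255 - p); });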
inproceedings:iwact01stapl2001, International Workshop on Advanced Compiler Technology for High Performance and Embedded Processors
Rating: 4/5
Keywords: STAPL, data parallel programming, distributed memory architecture, distributed types, object-oriented programming, programming toolkit, adaptive scheduling
Links: PDF, WWW
Abstract: The Standard Template Adaptive Parallel Library (STAPL) is a parallel library designed as a superset of the ANSI C++ Standard Template Library (STL). It is sequentially consistent for functions with the same name, and executes on uni- or multi-processor systems that utilize shared or distributed memory. STAPL is implemented using simple parallel extensions of C++ that currently provide a SPMD model of parallelism, and supports nested parallelism. The library is intended to be general purpose, but emphasizes irregular programs to allow the exploitation of parallelism in areas such as particle transport calculations, molecular dynamics, geometric modeling, and graph algorithms, which use dynamically linked data structures.
STAPL provides several different algorithms for some library routines, and selects among them adaptively at run-time. STAPL can replace STL automatically by invoking a preprocessing translation phase. The performance of translated code is close to the results obtained using STAPL directly (less than 5% performance deterioration). However, STAPL also provides functionality to allow the user to further optimize the code and achieve additional performance gains. We present results obtained using STAPL for a molecular dynamics code and a particle transport code.
Review (4/5): STAPL is a parallel implementation of the Standard Template Library (STL). The authors stick to several of the concepts used by the STL, such as the definition of compatible iterators. The modeling of the library around containers, iterators (in the form of ranges), and algorithms, staying close to the standard, is very interesting. Three parallel containers are provided, equivalent to their STL counterparts: vector, list, and tree. Most of the algorithms "whose parallelization is relevant" are also implemented.
Their run-time system is complex. A "parallel region manager" serves as a front end to the threading mechanisms, via pthreads or others. A scheduler (scheduler/distributor) handles the distribution of data across the nodes. Finally, an executor (tremble, mortals: aka 'executor') is in charge of executing the tasks on the nodes once the data is in place. STAPL also relies on the HOARD parallel memory allocator, which is earlier work.
The last part deals with STAPL's performance, illustrating the adaptive methods used to choose among scheduling strategies. I have not yet taken the time to check the accuracy of their results in detail.
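To make the container/range/algorithm picture above concrete, here is the sequential STL shape of such a call, with a hypothetical parallel counterpart sketched in comments (pvector and p_accumulate are placeholder names, not necessarily STAPL's real identifiers):

#include <numeric>
#include <vector>

// Sequential STL style: container + iterator pair + generic algorithm.
int sequential_sum(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// A parallel library that keeps the same model would swap in a distributed
// container and feed the algorithm a range instead of raw iterators, e.g.
// (hypothetical placeholder names, not necessarily STAPL's actual API):
//
//   pvector<int> pv = ...;                     // distributed counterpart of std::vector
//   int s = p_accumulate(pv.get_range(), 0);   // same call shape as std::accumulate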
incollection:reynders96pooma (unexported/unhandled reference)
Rating: 3/5
Keywords: SPMD paradigm, high performance computing, programming toolkit
Links: PDF, WWW
Review (3/5): Beware: POOMA is no longer available on the net. It seems that this project, under this name, has come to a halt.
POOMA is a class library, apparently with no trace of generic programming. The development of "Global Data Types" (GDTs) is interesting in that it abstracts the actual nature of a value from the user's point of view, making it possible, for example, to manipulate an aggregate or a whole set of values regardless of their locality. The design is split between a "global" layer, in charge of the "globally typed" value, and a "local" layer, which represents all or part of the global value.
On the other hand, the fact that most of the types are geared towards scientific simulation applications seriously limits the usefulness of POOMA outside its own domain. Worse, by the paper's authors' own admission, POOMA is rather oriented towards supercomputers, even though an MPI-based version of their communication module seems to have existed.
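A toy sketch of the global/local split described above (illustrative only, not POOMA's API): the global layer knows which node owns which global index, while the local layer stores only the block owned by the current node.

#include <cstddef>
#include <vector>

// Toy "global data type": a logically global 1-D array of which each node owns
// one contiguous block. The global layer translates a global index into
// (owner node, local offset); the local layer holds only the owned block.
class GlobalVector {
public:
    GlobalVector(std::size_t global_size, int nodes, int my_node)
        : my_node_(my_node),
          block_((global_size + nodes - 1) / nodes),
          local_(block_, 0.0) {}

    int owner(std::size_t gi) const { return static_cast<int>(gi / block_); }
    bool is_local(std::size_t gi) const { return owner(gi) == my_node_; }

    // Valid only when is_local(gi); a real library would fetch remote values instead.
    double& local_ref(std::size_t gi) { return local_[gi % block_]; }

private:
    int my_node_;
    std::size_t block_;
    std::vector<double> local_;
};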
article:bova99mpidirectives1999, SIAM News, vol 32
Rating: 3/5
Keywords: MPI, OpenMP, high performance computing, benchmarking
Links: PDF, WWW
Abstract: Developers of parallel applications can be faced with the problem of combining the two dominant models for parallel processing - distributed-memory and shared-memory parallelism - within one source code. In this article we discuss why it is useful to combine these two programming methodologies, both of which are supported on most high-performance computers, and some of the lessons we learned in work on five applications. All our applications make use of two programming models: message-passing, as represented by the PVM or MPI libraries, and the shared-memory style, as represented by the OpenMP directive standard. In all but one of the applications, we use these two models to exploit computer architectures that include both shared- and distributed-memory systems. Message-passing is used to coordinate coarse-grained parallel tasks across distributed compute nodes, whereas OpenMP exploits parallelism within multi-processor nodes. One of our applications, SPECseis96, implements message-passing and shared-memory directives at equal levels, which allows us to compare the performance of the two models.
Review (3/5): This article carries out performance measurements on parallelization solutions combining OpenMP and MPI. The numbers look good, it works well. No details on the implementation, except that the message-passing interfaces (MPI) are single-threaded. I liken this to a "representative/represented" scheme.
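The pattern measured in the article, MPI between nodes and OpenMP within a node, with only the single-threaded main flow of each process touching MPI, looks roughly like the following sketch (mine, not code from the paper):

#include <mpi.h>
#include <cstdio>
#include <vector>

// Minimal hybrid sketch: MPI coordinates coarse-grained work across nodes,
// OpenMP parallelizes the loop inside each node, and only the main flow of
// each process calls MPI, matching the single-threaded MPI setting above.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<double> chunk(1000, rank + 1.0);     // this node's share of the data

    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)  // shared-memory parallelism within the node
    for (long i = 0; i < static_cast<long>(chunk.size()); ++i)
        local_sum += chunk[i];

    double global_sum = 0.0;                         // message passing across nodes
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) std::printf("global sum = %f (over %d processes)\n", global_sum, size);
    MPI_Finalize();
    return 0;
}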
techreport:fletcher03parallelstl2003, Computer Science Department, Indiana University
Rating: 2/5
Keywords: data parallel programming, generic programming, STL
Links: PDF, WWW
Abstract: While tremendous progress has been made in developing parallel algorithms, there has not been as much success in developing language support for programming these parallel algorithms. The C++ Standard Template Library (STL) provides an opportunity for extending the concept of generic programming to the parallel realm. This paper discusses the basic requirements for extending STL to provide support for data-parallelism in C++. The ultimate goal is to implement a parallel library that is built within the existing framework of STL and exploits parallelism in existing sequential algorithms and also provides a set of parallel algorithms.
Review (2/5): A fairly long state of the art on existing solutions. The end of the paper describes how to adapt the STL to make it parallel while staying within its specification (unlike STAPL, for example, which extends it). Two approaches are considered: a distributed memory allocator coupled with "smart" iterators, or data containers intended for parallel use.
Data distribution is left to the user. Their implementation is based on MPI. No further details in the article; it feels a bit thin compared with STAPL. The merit of their approach is that it does not change the containers the user is accustomed to. (Is that really possible? Isn't it too restrictive?)
conference:karmesin98pooma2 (unexported/unhandled reference)
article:foster97globus1997, The International Journal of Supercomputer Applications and High Performance Computing, vol 11, pp 115–128
Keywords: metacomputing, high performance computing, programming toolkit
Links: PDF, WWW
techreport:sheffler95portable1995, RIACS
Abstract: This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.
techreport:gerlach98janus1998, Tsukuba Research Center of the Real World Computing Partnership
Keywords: data parallel programming, object-oriented programming, generic programming, finite element methods
Links: PDF
Abstract: We propose Janus, a C++ template library of container classes and communication primitives for parallel dynamic mesh applications. The paper focuses on two-phase containers, which are a central component of the Janus framework. These containers are quasi-constant, i.e., they have an extended initialization phase after which they provide read-only access to their elements. Two-phase containers are useful for the efficient and easy-to-use representation of finite element meshes and for generating sparse matrices. Using such containers makes it easy to encapsulate irregular communication patterns that occur when running finite element programs in parallel.
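A minimal sketch of the two-phase idea (illustrative only, not Janus's actual interface): elements can be inserted only during an initialization phase; after freeze() the container is read-only.

#include <stdexcept>
#include <vector>

// Illustrative "two-phase" container: insertion is allowed only during the
// initialization phase; once frozen, the contents can only be read.
template <typename T>
class TwoPhaseSet {
public:
    void insert(const T& x) {
        if (frozen_) throw std::logic_error("container already frozen");
        data_.push_back(x);
    }
    void freeze() { frozen_ = true; }   // marks the end of the initialization phase
    const T& operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }
private:
    std::vector<T> data_;
    bool frozen_ = false;
};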
inproceedings:siu97paralleldp1997, IEEE, 1082-8907/97
Abstract: High complexity of building parallel applications is often cited as one of the major impediments to the mainstream adoption of parallel computing. To deal with the complexity of software development, abstractions such as macros, functions, abstract data types, and objects are commonly employed by sequential as well as parallel programming models. This paper describes the concept of a design pattern for the development of parallel applications. A design pattern in our case describes a recurring parallel programming problem and a reusable solution to that problem. A design pattern is implemented as a reusable code skeleton for quick and reliable development of parallel applications.
A parallel programming system, called DPnDP (Design Patterns and Distributed Processes), that employs such design patterns is described. In the past, parallel programming systems have allowed fast prototyping of parallel applications based on commonly occurring communication and synchronization structures. The uniqueness of our approach is in the use of a standard structure and interface for a design pattern. This has several important implications: First, design patterns can be defined and added to the system's library in an incremental manner without requiring any major modification of the system (Extensibility). Second, customization of a parallel application is possible by mixing design patterns with low level parallel code, resulting in a flexible and efficient parallel programming tool (Flexibility). Also, a parallel design pattern can be parameterized to provide some variations in terms of structure and behavior.
misc:sankaralingam03universal2003
Abstract: Data-parallel programs are both growing in importance and increasing in diversity, resulting in specialized processors targeted at specific classes of these programs. This paper presents a classification scheme for data-parallel program attributes, and proposes micro-architectural mechanisms to support applications with diverse behavior using a single reconfigurable architecture. We focus on the following four broad kinds of data-parallel programs: DSP/multimedia, scientific, networking, and real-time graphics workloads. While all of these programs exhibit high computational intensity, coarse-grain regular control behavior, and some regular memory access behavior, they show wide variance in the computation requirements, fine-grain control behavior, and the frequency of other types of memory accesses. Based on this study of application attributes, this paper proposes a set of general micro-architectural mechanisms that enable a baseline architecture to be dynamically tailored to the demands of a particular application. These mechanisms provide efficient execution across a spectrum of data-parallel applications and can be applied to diverse architectures ranging from vector cores to conventional superscalar cores. Our results using a baseline TRIPS processor show that the configurability of the architecture to the application demands provides harmonic mean performance improvement of 5%–55% over scalable yet less flexible architectures, and performs competitively against specialized architectures.
inproceedings:bi97objectoriented1997, Proceedings of International conference on parallel and distributed processing techniques and applications (PDPTA)
Keywords: data parallel programming, SPMD paradigm, distributed memory parallel architecture
Links: PDF
techreport:siek98matrix1998, Laboratory for Scientific Computing, Department of Computer Science and Engineering, University of Notre Dame
Keywords: generic programming, high performance computing
Links: PDF, WWW
Abstract: We present a unified approach for expressing high performance numerical linear algebra routines for large classes of dense and sparse matrices. As with the Standard Template Library [10], we explicitly separate algorithms from data structures through the use of generic programming techniques. We conclude that such an approach does not hinder high performance. On the contrary, writing portable high performance codes is actually enabled with such an approach because the performance critical code sections can be isolated from the algorithms and the data structures. We also tackle the performance portability problem for particular architecture dependent algorithms such as matrix-matrix multiply. Recently, code generation systems (PHiPAC [3] and ATLAS [15]) have been created to customize the algorithms according to architecture. A more elegant approach is to use template metaprograms [18] to allow for variation. In this paper we introduce the Basic Linear Algebra Instruction Set (BLAIS), a collection of high performance kernels for basic linear algebra.
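As a small illustration of separating the algorithm from the data structure (my sketch, not the paper's BLAIS kernels), a matrix-vector product can be written once against a minimal matrix interface and reused by any type that provides it:

#include <cstddef>
#include <vector>

// Generic matrix-vector multiply written against a minimal interface
// (rows(), cols(), operator()(i, j)); illustrative only, not the BLAIS API.
template <typename Matrix>
std::vector<double> matvec(const Matrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.rows(), 0.0);
    for (std::size_t i = 0; i < A.rows(); ++i)
        for (std::size_t j = 0; j < A.cols(); ++j)
            y[i] += A(i, j) * x[j];
    return y;
}

// Any dense, banded, or otherwise-stored matrix type satisfying the interface
// can reuse the same algorithm; performance-critical variants can specialize it.
struct DenseMatrix {
    std::size_t r, c;
    std::vector<double> data;                              // row-major storage
    std::size_t rows() const { return r; }
    std::size_t cols() const { return c; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * c + j]; }
};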
techreport:beckman96tulip1996, Computer Science Department, Indiana University
Abstract: This paper describes Tulip, a parallel run-time system used by the pC++ parallel programming language. Tulip has been implemented on a variety of scalable, MPP computers including the IBM SP2, Intel Paragon, HP/Convex SPP, Meiko CS2, SGI Power Challenge, and Cray T3D. Tulip differs from other data-parallel RTS implementations; it is designed to support the operations from object-parallel programming that require remote member function execution and load and store operations on remote objects. It is designed to provide the thinnest possible layer atop the vendor-supplied machine interface. That thin veneer can then be used by other run-time layers to build machine-independent class libraries, compiler back ends, and more sophisticated run-time support. Some preliminary performance measurements for the IBM SP2, SGI Power Challenge, and Cray T3D are given.
techreport:johnson97hpc1997, Department of Computer Science, Indiana University
Abstract: HPC++ is a C++ library and language extension framework that is being developed by the HPC++ consortium as a standard model for portable parallel C++ programming. This paper describes an initial implementation of the HPC++ Parallel Standard Template Library (PSTL) framework. This implementation includes seven distributed containers as well as selected algorithms. We include preliminary performance results from several experiments using the PSTL.
techreport:kumar87parallelsearch1987, Department of Computer Sciences, University of Texas at Austin
article:itzkovitz98thread1998, The Journal of Systems and Software, vol 42, pp 71–87
techreport:winter94parallelengineering1994, University of Westminster, Centre for Parallel Computing, London; KFKI-MSZKI, Centre for Parallel Computing, Budapest
article:jarvi04generic2005, Int. J. Parallel Program., vol 33, pp 145–164
Abstract: Generic programming is an attractive paradigm for developing libraries for high-performance computing because of the simultaneous emphases placed on generality and efficiency.
In this approach, interfaces are based on sets of specified requirements on types, rather than on any particular type, allowing algorithms to inter-operate with any data type meeting the necessary requirements. These sets of requirements, known as concepts, can specify syntactic as well as semantic requirements. Although concepts are fundamental to generic programming, they are not supported as first-class entities in mainstream programming languages, thus limiting the degree to which generic programming can be effectively applied. In this paper we advocate better syntactic and semantic support for concepts and describe some straightforward language features that could better support them. We also briefly discuss uses for concepts beyond their use in constraining polymorphism.
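The "sets of requirements on types" described here later became a first-class language feature in C++20; a minimal sketch of an algorithm constrained by a concept rather than by a concrete type:

#include <concepts>
#include <ranges>

// A concept: the set of syntactic requirements a type must satisfy.
template <typename T>
concept Addable = std::default_initializable<T> && requires(T a, T b) {
    { a + b } -> std::convertible_to<T>;
};

// A generic algorithm that accepts any range whose elements satisfy Addable,
// rather than being written for one particular container or element type.
template <std::ranges::input_range R>
    requires Addable<std::ranges::range_value_t<R>>
auto sum(R&& r) {
    std::ranges::range_value_t<R> acc{};
    for (auto&& x : r) acc = acc + x;
    return acc;
}

// Usage: sum(std::vector<int>{1, 2, 3}) and sum(std::list<double>{...}) both compile,
// while a range of types that cannot be added is rejected at the constraint check.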
techreport:Wu19961996, Department of Computer Science, State University of New York at Buffalo
Abstract: The dimension exchange method (DEM) was initially proposed as a load-balancing algorithm for the hypercube structure. It has been generalized to k-ary n-cubes. However, the k-ary n-cube algorithm must take many iterations to converge to a balanced state. In this paper, we propose a direct method to modify DEM. The new algorithm, Direct Dimension Exchange (DDE) method, takes load average in every dimension to eliminate unnecessary load exchange. It balances the load directly without iteratively exchanging the load. It is able to balance the load more accurately and much faster.
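For reference, the classic dimension exchange scheme that DDE refines works like this on a hypercube (a self-contained toy simulation; the DDE modification itself is not reproduced here):

#include <cstdio>
#include <vector>

// Classic dimension exchange on a d-dimensional hypercube: in each dimension,
// every node averages its load with its neighbor along that dimension
// (neighbor = rank XOR 2^d). After sweeping all dimensions the loads are equal.
int main() {
    std::vector<double> load = {12, 3, 7, 0, 9, 5, 1, 11};   // 2^3 nodes, arbitrary loads
    const int dims = 3;
    for (int d = 0; d < dims; ++d) {
        for (std::size_t r = 0; r < load.size(); ++r) {
            std::size_t nbr = r ^ (1u << d);
            if (r < nbr) {                                   // handle each pair once
                double avg = (load[r] + load[nbr]) / 2.0;
                load[r] = load[nbr] = avg;
            }
        }
    }
    for (double l : load) std::printf("%.2f ", l);           // all equal to the global average
    std::printf("\n");
    return 0;
}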
techreport:hernandez05gridcomp2005, Department of Computer and Information Sciences, University of Alabama at Birmingham
Keywords: grid computing, end-user tools
Links: PDF, WWW
Abstract: The present work describes an approach to simplifying the development and deployment of applications for the Grid. Our approach aims at hiding accidental complexities (e.g., low-level Grid technologies) met when developing these kinds of applications. To realize this goal, the work focuses on the development of end-user tools using concepts of domain engineering and domain-specific modeling which are modern software engineering methods for automating the development of software. This work is an attempt to contribute to the long term research goal of empowering users to create complex applications for the Grid without depending on the expertise of support teams or on hand-crafted solutions.
techreport:giloi95promoter1995, RWCP Massively Parallel Systems GMD Laboratory
Keywords: coordination schemes, distributed memory architecture, high-level programming model, parallelizing compiler, distributed types
Links: PDF
Abstract: The superior performance and cost-effectiveness of scalable, distributed memory parallel computers will only become generally exploitable if the programming difficulties with such machines are overcome. We see the ultimate solution in high-level programming models and appropriate parallelizing compilers that allow the user to formulate a parallel program in terms of application-specific concepts, while low-level issues such as optimal data distribution and coordination of the parallel threads are handled by the compiler. High Performance Fortran (HPF) is a major step in that direction; however, HPF still lacks the generality of computing domains needed to treat other than regular, data-parallel numerical applications. A more flexible and more abstract programming language for regular and irregular object-parallel applications is PROMOTER. PROMOTER allows the user to program for an application-oriented abstract machine rather than for a particular architecture. The wide semantic gap between the abstract machine and the concrete message-passing architecture is closed by the compiler. Hence, the issues of data distribution, communication, and coordination (thread scheduling) are hidden from the user. The paper presents the underlying concepts of PROMOTER and the corresponding language concepts. The PROMOTER compiler translates the parallel program written in terms of distributed types into parallel threads and maps those optimally onto the nodes of the physical machine. The language constructs and their use, the tasks of the compiler, and the challenges encountered in its implementation are discussed.
techreport:jesshopeXXasynchrony?, Department of Electronic and Electrical Engineering, University of Surrey
phdthesis:parkes94concurrentoop (unexported/unhandled reference)
Abstract: Despite increasing availability, the use of parallel platforms in the solution of significant computing problems remains largely restricted to a set of well-structured, numeric applications. This is due in part to the difficulty of parallel application development, which is itself largely the result of a lack of high-level development environments applicable to the majority of existing parallel architectures. This thesis addresses the issue of facilitating the application of parallel platforms to unstructured problems through the use of object-oriented design techniques and the actor model of concurrent computation. We present a multilevel approach to expressing parallelism for unstructured applications: a high-level interface based on the actor and aggregate models of concurrent object-oriented programming, and a low-level interface which provides an object-oriented interface to system services across a wide range of diverse parallel architectures. The interfaces are manifested in the
ProperCAD II library, a C++ object library supporting actor concurrency on microprocessor-based parallel architectures and appropriate for applications exhibiting medium-grain parallelism. The interface supports uniprocessors, shared memory multiprocessors, distributed memory multicomputers, and hybrid architectures comprising network-connected clusters of uni- and multiprocessors. The library currently supports workstations from Sun, shared memory multiprocessors from Sun and Encore, distributed memory multicomputers from Intel and Thinking Machines, and hybrid architectures comprising IP network-connected clusters of Sun uni- and multiprocessors. We demonstrate our approach through an examination of the parallelization process for two existing unstructured serial applications drawn from the field of VLSI computer-aided design. We compare and contrast the library-based actor approach to other methods for expressing parallelism in C++.
misc:bikshandi06programming2006
Keywords: data parallel programming
Links: PDF, WWW
techreport:littin98block1998, Department of Computer Science, University of Waikato, Hamilton, New Zealand
Abstract: A fixed-length block-based instruction set architecture (ISA) based on dataflow techniques is described. This ISA is then compared and contrasted with those of more conventional architectures and other developmental architectures. A control mechanism that allows blocks to be executed in parallel, so that the original control flow is maintained, is presented. A brief description of the hardware required to realize this mechanism is given.
inproceedings:berger00hoard2000, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX)
Abstract: Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiprocessor systems. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a dramatic increase in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. This increase in memory consumption can range from a factor of P (the number of processors) to unbounded memory consumption.
This paper introduces Hoard, a fast, highly scalable allocator that largely avoids false sharing and is memory efficient. Hoard is the first allocator to simultaneously solve the above problems. Hoard combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case. Our results on eleven programs demonstrate that Hoard yields low average fragmentation and improves overall program performance over the standard Solaris allocator by up to a factor of 60 on 14 processors, and up to a factor of 18 over the next best allocator we tested.
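To make the "per-processor heaps plus one global heap" idea concrete, here is a toy illustration (far from Hoard's actual design; no deallocation or consumption bound is shown): each thread is routed to its own heap so that most allocations do not contend on a single shared lock.

#include <cstdlib>
#include <functional>
#include <mutex>
#include <thread>

// Toy illustration only (not Hoard's code): per-"processor" heaps reduce lock
// contention; oversized requests fall back to the shared global heap.
struct ToyHeap {
    std::mutex lock;
    void* allocate(std::size_t n) {
        std::lock_guard<std::mutex> g(lock);          // per-heap lock, rarely contended
        return std::malloc(n);
    }
};

constexpr std::size_t kHeaps = 8;                     // e.g. one heap per processor
ToyHeap local_heaps[kHeaps];
ToyHeap global_heap;

void* toy_alloc(std::size_t n) {
    if (n > 4096) return global_heap.allocate(n);     // large blocks go to the global heap
    std::size_t h = std::hash<std::thread::id>{}(std::this_thread::get_id()) % kHeaps;
    return local_heaps[h].allocate(n);
}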
inproceedings:rauchwerger98stapl1998, Wkshp. on Lang. Comp. and Run-time Sys. for Scal. Comp. (LCR).
Abstract: STAPL (Standard Adaptive Parallel Library) is a parallel C++ library designed as a superset of the STL, sequentially consistent for functions with the same name, and executes on uni- or multiprocessors. STAPL is implemented using simple parallel extensions of C++ which provide a SPMD model of parallelism supporting recursive parallelism. The library is intended to be of generic use but emphasizes irregular, non-numeric programs to allow the exploitation of parallelism in areas such as geometric modeling or graph algorithms which use dynamic linked data structures. Each library routine has several different algorithmic options, and the choice among them will be made adaptively based on a performance model, statistical feedback, and current run-time conditions. Built-in performance monitors can measure actual performance and, using an extension of the BSP model, predict the relative performance of the algorithmic choices for each library routine. STAPL is intended to possibly replace STL in a user-transparent manner and run on small to medium scale shared memory multiprocessors which support OpenMP.
techreport:Asanovic:EECS-2006-1832006, EECS Department, University of California, Berkeley
article:xu92analysis1992, Journal of Parallel and Distributed Computing, vol 16, pp 385–393