Bibliographie

You can directly download the BibTeX file that was used to generate this page.
The various files used to generate this page, HTML headers included, are also available in a single, tidy archive.

Attention!

This page cheerfully mixes English and French.
  • The tool that generates the page speaks English, and the article titles and abstracts are generally written in the language of Shakespeare. The keywords (or tags) are all in a Californian English that smells of sand and silicon.
  • The comments on the articles (the reviews), on the other hand, are in French. I prefer it that way: it avoids the ambiguities that a hasty choice of words would introduce.

What's new

  • The table of contents is now sorted alphabetically (the articles themselves are not)
  • The reference key of each article (shown at the top right) is now clickable and links back to the table of contents
    Reviewed articles: 6
    Rated articles: 6
    Table of contents:
    A Software Architecture for User Transparent Parallel Image Processing (2000)
    A class library approach to concurrent object-oriented programming with applications to VLSI CAD (1994)
    A portable MPI-based parallel vector template library (1995)
    Analysis of the Generalized Dimension Exchange Method for Dynamic Load Balancing (1992)
    Approaches for Integrating Task and Data Parallelism (1998)
    Approaches to Parallel Generic Programming in the STL Framework (2003)
    Array design and expression evaluation in POOMA II (1998)
    Asynchrony in distributed parallel computing (?)
    Block Based Execution and Task Level Parallelism (1998)
    Combining Message-passing and Directives in Parallel Applications (1999)
    DDE: A Modified Dimension Exchange Method for Load Balancing in k-ary n-cubes (1996)
    Design Patterns for Parallel Computing Using a Network of Processors (1997)
    End-User Tools for Grid Computing (2005)
    Generic programming and high-performance libraries (2005)
    Globus: A Metacomputing Infrastructure Toolkit (1997)
    HPC++: Experiments with the Parallel Standard Template Library (1997)
    Hoard: A Scalable Memory Allocator for Multithreaded Applications (2000)
    Janus - A C++ Template Library for Parallel Dynamic Mesh Applications (1998)
    Meta-programming with Parallel Design Patterns (2002)
    Object-oriented Data Parallel Programming in C++ (1997)
    POOMA: A Framework for Scientific Simulation on Parallel Architectures (1996)
    PROMOTER: A High-Level, Object-Parallel Programming Language (1995)
    Parallel Depth First Search, Part I: Implementation (1987)
    Programming for Parallelism and Locality with Hierarchically Tiled Arrays (2006)
    STAPL: A Standard Template Adaptive Parallel C++ Library (2001)
    Software Engineering for Parallel Processing (1994)
    Standard Templates Adaptive Parallel Library (STAPL) (1998)
    The Landscape of Parallel Computing Research: A View from Berkeley (2006)
    The Matrix Template Library: A Generic Programming Approach to High Performance Numerical Linear Algebra (1998)
    Thread migration and its applications in distributed shared memory systems (1998)
    Tulip: A Portable Run-Time System for Object-Parallel Systems (1996)
    Universal mechanisms for data-parallel architectures (2003)
    article:bal98approaches
    Henri E. Bal and Matthew Haines
    Approaches for Integrating Task and Data Parallelism
    1998, IEEE Concurrency, vol 6, pp 74–84
    Rating: 4/5
    Link(s)PDF WWW
    AbstractLanguages that support both task and data parallelism are highly general and can exploit both forms of parallelism within a single application. However, integrating the two forms of parallelism cleanly and within a coherent programming model is difficult. This paper describes four languages (Fx, Opus, Orca and Braid) that try to achieve such an integration and identifies several problems. The main problems are how to support both SPMD and MIMD style programs, how to organize the address space of a parallel program, and how to design the integrated model such that it can be implemented efficiently.
    Review4 A very useful article for me: it gives four examples of systems that combine data parallelism and task parallelism with more or less success. In each case the design is illustrated, a study of the performance and behavior is provided, and a general assessment of usability is given in conclusion.
    techreport:seinstra2kparallelimagery
    F.J. Seinstra, D. Koelma and J.M. Geusebroek
    A Software Architecture for User Transparent Parallel Image Processing
    2000, Intelligent Sensory Information Systems, Department of Computer Science, University of Amsterdam, The Netherlands
    Rating: 4/5
    Link(s)PDF
    AbstractThis report describes a software architecture that enables transparent development of image processing applications for parallel computers. The architecture's main component is an extensive library of low level image processing operations capable of running on distributed memory MIMD-style parallel hardware. Since the library has an application programming interface identical to that of an existing sequential image processing library, the parallelism is completely hidden from the user.
    The first part of the report discusses implementation aspects of the parallel library. It is shown how sequential as well as parallel operations are implemented on the basis of so-called parallelizable patterns. A library built in this manner is easily maintainable, as code redundancy is avoided as much as possible. The second part of the report describes the application of performance models to ensure efficiency of execution on a range of parallel machines. A high level abstract machine for parallel image processing, that serves as a basis for the performance models, is described as well. Experiments show that for a realistic image processing application performance predictions are highly accurate. These results indicate that the core of the architecture forms a powerful basis for automatic parallelization and optimization of a wide range of image processing software.
    Review4 This article presents an interesting model that respects the main principles of software engineering. The decoupling between generic algorithms and parallelization patterns is of particular interest when parallelizing existing algorithms. An entire section is devoted to the discussion of these parallelization patterns.
    The article also offers an answer to optimization needs, through "performance profiles" generated by the measurement tools it provides. The various scheduling strategies are not discussed, which is a pity. On the other hand, the fact that the proposed architecture is oriented towards image processing does not seem to harm the generality of the approach.
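    To make the notion of a "parallelizable pattern" concrete, here is a minimal sketch of my own (the names Image and unary_pixel_op are hypothetical, not the authors' API): a per-pixel operation whose calling convention is the same as a sequential library call, while the row partitioning stays hidden inside, with std::thread standing in for the paper's MIMD back end.
        #include <algorithm>
        #include <thread>
        #include <vector>

        struct Image {                        // hypothetical minimal image type
            int width = 0, height = 0;
            std::vector<float> pixels;        // row-major
            float& at(int x, int y)       { return pixels[std::size_t(y) * width + x]; }
            float  at(int x, int y) const { return pixels[std::size_t(y) * width + x]; }
        };

        // "Parallelizable pattern": a unary pixel operation. The caller's view is
        // identical to a sequential library call; the row partitioning is hidden.
        template <class UnaryOp>
        void unary_pixel_op(Image& img, UnaryOp op, unsigned workers = 4) {
            std::vector<std::thread> pool;
            int rows_per_worker = (img.height + int(workers) - 1) / int(workers);
            for (unsigned w = 0; w < workers; ++w) {
                int y0 = int(w) * rows_per_worker;
                int y1 = std::min(img.height, y0 + rows_per_worker);
                pool.emplace_back([&img, op, y0, y1] {
                    for (int y = y0; y < y1; ++y)
                        for (int x = 0; x < img.width; ++x)
                            img.at(x, y) = op(img.at(x, y));
                });
            }
            for (auto& t : pool) t.join();
        }

        // Usage: thresholding, expressed exactly as with a sequential call.
        // unary_pixel_op(img, [](float v) { return v > 0.5f ? 1.0f : 0.0f; });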
    inproceedings:iwact01stapl
    Ping An, Alin Jula, Silvius Rus, Steven Saunders, Tim Smith, Gabriel Tanase, Nathan Thomas, Nancy M. Amato, Lawrence Rauchwerger
    STAPL: A Standard Template Adaptive Parallel C++ Library
    2001, International Workshop on Advanced Compiler Technology for High Performance and Embedded Processors
    Rating: 4/5
    KeywordsSTAPL, data parallel programming, distributed memory architecture, distributed types, object-oriented programming, programming toolkit, adaptive scheduling,
    Link(s)PDF WWW
    AbstractThe Standard Template Adaptive Parallel Library (STAPL) is a parallel library designed as a superset of the ANSI C++ Standard Template Library (STL). It is sequentially consistent for functions with the same name, and executes on uni- or multi-processor systems that utilize shared or distributed memory. STAPL is implemented using simple parallel extensions of C++ that currently provide a SPMD model of parallelism, and supports nested parallelism. The library is intended to be general purpose, but emphasizes irregular programs to allow the exploitation of parallelism in areas such as particle transport calculations, molecular dynamics, geometric modeling, and graph algorithms, which use dynamically linked data structures.
    STAPL provides several different algorithms for some library routines, and selects among them adaptively at run-time. STAPL can replace STL automatically by invoking a preprocessing translation phase. The performance of translated code is close to the results obtained using STAPL directly (less than 5% performance deterioration). However, STAPL also provides functionality to allow the user to further optimize the code and achieve additional performance gains. We present results obtained using STAPL for a molecular dynamics code and a particle transport code.
    Review4 STAPL is a parallel implementation of the Standard Template Library (STL). The authors respect several of the concepts used by the STL, such as the definition of compatible iterators. The modelling of this library around containers, iterators (in the form of ranges) and algorithms, staying close to the standard, is very interesting. Three parallel containers are provided, equivalent to their STL counterparts: vector, list and tree. Most of the algorithms "whose parallelization is relevant" are also implemented.
    Their run-time system is complex. On one hand, a "parallel region manager" serves as a front end to the threading mechanisms, via pthreads or others. A scheduler/distributor handles the distribution of data across the nodes. Then an executor (tremble, mortals) is in charge of executing the tasks on the nodes once the data is in place. Finally, STAPL uses the Hoard parallel memory allocator, which is earlier work.
    The last part deals with STAPL's performance, illustrating the adaptive methods used to choose among scheduling strategies. I have not bothered to check the accuracy of their results in detail for the moment.
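    For a sense of what the container/iterator/algorithm split looks like in code today: since C++17 the standard library itself accepts an execution policy on the classic algorithms, which is the closest standard analogue to what STAPL proposed (this sketch uses only standard C++17, not STAPL's own containers and ranges).
        #include <algorithm>
        #include <execution>
        #include <numeric>
        #include <vector>

        int main() {
            std::vector<double> v(1'000'000);
            std::iota(v.begin(), v.end(), 0.0);
            // Same algorithm names and iterators as the sequential STL; the
            // execution policy is the only addition (some compilers need an
            // extra threading library such as TBB to run these in parallel).
            std::for_each(std::execution::par, v.begin(), v.end(),
                          [](double& x) { x = x * x; });
            double sum = std::reduce(std::execution::par, v.begin(), v.end(), 0.0);
            return sum > 0 ? 0 : 1;
        }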
    incollection:reynders96pooma
    John V. W. Reynders and Paul J. Hinker and Julian C. Cummings and Susan R. Atlas and Subhankar Banerjee and William F. Humphrey and Steve R. Karmesin and Katarzyna Keahey and Marikani Srikant and Mary Dell Tholburn
    POOMA: A Framework for Scientific Simulation on Parallel Architectures
    Unexported (unhandled) reference...
    Rating: 3/5
    KeywordsSPMD paradigm, high performance computing, programming toolkit,
    Link(s)PDF WWW
    Review3 Beware: POOMA is no longer to be found on the net. It seems that the project, under this name, has come to a halt.
    POOMA is a class library, apparently without any trace of generic programming. The development of "Global Data Types" (GDTs) is interesting in that it abstracts the concrete representation of a value from the user's point of view, making it possible, for example, to manipulate an aggregate or a set of values regardless of their locality. The design is split between a "global" layer, in charge of the "globally typed" value, and a "local" layer, which represents all or part of the global value.
    On the other hand, the fact that most of the types are geared towards scientific simulation applications seriously limits POOMA's usefulness outside its own domain. Worse, by the paper's authors' own admission, POOMA is mainly aimed at supercomputers, even though an MPI-based version of their communication module seems to have existed.
    article:bova99mpidirectives
    Steve Bova, Clay Breshears, Rudolf Eigenmann, Henry Gabb, Greg Gartner, Bob Kuhn, Bill Magro, Stefano Salvini and Veer Vatsa
    Combining Message-passing and Directives in Parallel Applications
    1999, SIAM News, vol 32
    Rating: 3/5
    KeywordsMPI, OpenMP, high performance computing, benchmarking,
    Link(s)PDF WWW
    AbstractDevelopers of parallel applications can be faced with the problem of combining the two dominant models for parallel processing - distributed-memory and shared-memory parallelism - within one source code. In this article we discuss why it is useful to combine these two programming methodologies, both of which are supported on most high-performance computers, and some of the lessons we learned in work on five applications. All our applications make use of two programming models: message-passing, as represented by the PVM or MPI libraries, and the shared-memory style, as represented by the OpenMP directive standard. In all but one of the applications, we use these two models to exploit computer architectures that include both shared- and distributed-memory systems. Message-passing is used to coordinate coarse-grained parallel tasks across distributed compute nodes, whereas OpenMP exploits parallelism within multi-processor nodes. One of our applications, SPECseis96, implements message-passing and shared-memory directives at equal levels, which allows us to compare the performance of the two models.
    Review3 This article reports performance measurements for parallelization schemes that combine OpenMP and MPI. The numbers are pretty, it works well... There are no details about the implementation, other than that the message-passing interfaces (MPI) are single-threaded. I liken this to a "representative/represented" mode of operation.
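    A minimal sketch of the coarse-grain/fine-grain split the article advocates (illustrative only, far smaller than the paper's applications): MPI spreads blocks of iterations across nodes, OpenMP parallelizes the loop within a node, and every MPI call is made outside the threaded region, consistent with the single-threaded message-passing layer mentioned above.
        #include <mpi.h>
        #include <omp.h>
        #include <cstdio>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);                    // distributed-memory level
            int rank = 0, size = 1;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            const long N = 1000000;
            long chunk = N / size;                     // coarse-grain block per node
            long begin = rank * chunk;
            long end   = (rank == size - 1) ? N : begin + chunk;

            double local = 0.0;
            #pragma omp parallel for reduction(+:local)   // shared-memory level
            for (long i = begin; i < end; ++i)
                local += 1.0 / (1.0 + double(i));

            double global = 0.0;                       // MPI called outside the parallel region
            MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0) std::printf("sum = %f\n", global);
            MPI_Finalize();
            return 0;
        }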
    techreport:fletcher03parallelstl
    George Fletcher, Sriram Sankaran
    Approaches to Parallel Generic Programming in the STL Framework
    2003, Computer Science Department, Indiana University
    Rating: 2/5
    Keywordsdata parallel programming, generic programming, STL,
    Link(s)PDF WWW
    AbstractWhile tremendous progress has been made in developing parallel algorithms, there has not been as much success in developing language support for programming these parallel algorithms. The C++ Standard Template Library (STL) provides an opportunity for extending the concept of generic programming to the parallel realm. This paper discusses the basic requirements for extending STL to provide support for data-parallelism in C++. The ultimate goal is to implement a parallel library that is built within the existing framework of STL and exploits parallelism in existing sequential algorithms and also provides a set of parallel algorithms.
    Review2 A fairly long survey of existing solutions. The end of the paper describes how to adapt the STL to make it parallel while staying within its specification (unlike STAPL, which, for example, extends it). Two directions are considered: a distributed-memory allocator coupled with "smart" iterators, or data containers intended for parallel use.
    Data distribution is left to the user. Their implementation is based on MPI. There are no further details in the article, which is a bit thin compared to STAPL. The merit of their approach is that it does not change the containers the user is used to. (Is that really possible? Isn't it too constraining?)
    conference:karmesin98pooma2
    S. Karmesin, J. Crotinger, J. Cummings, S. Haney, W. Humphrey, J. Reynders, S. Smith and T.J. Williams
    Array design and expression evaluation in POOMA II
    Unexported (unhandled) reference...
    Link(s)PDF WWW
    article:foster97globus
    Ian Foster, Carl Kesselman
    Globus: A Metacomputing Infrastructure Toolkit
    1997, The International Journal of Supercomputer Applications and High Performance Computing, vol 11, pp 115-128
    Keywordsmetacomputing, high performance computing, programming toolkit,
    Link(s)PDF WWW
    techreport:sheffler95portable
    Thomas J. Sheffler
    A portable MPI-based parallel vector template library
    1995, RIACS
    Link(s)PDF WWW
    AbstractThis paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.
    techreport:gerlach98janus
    Jens Gerlach, Mitsuhisa Sato, and Yutaka Ishikawa
    Janus - A C++ Template Library for Parallel Dynamic Mesh Applications
    1998, Tsukuba Research Center of the Real World Computing Partnership
    Keywordsdata parallel programming, object-oriented programming, generic programming, finite element methods,
    Link(s)PDF
    AbstractWe propose Janus - a C++ template library of container classes and communication primitives for parallel dynamic mesh applications. The paper focuses on two-phase containers that are a central component of the Janus framework. These containers are quasi-constant, i.e., they have an extended initialization phase after which they provide read-only access to their elements. Two-phase containers are useful for the efficient and easy-to-use representation of finite element meshes and for generating sparse matrices. Using such containers makes it easy to encapsulate irregular communication patterns that occur when running finite element programs in parallel.
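    To make the "two-phase container" idea concrete, a minimal sketch of my own (not Janus's interface): elements can only be added during an initialization phase, and after freeze() the container is read-only, which is what allows communication schedules over it to be computed once and reused.
        #include <cassert>
        #include <cstddef>
        #include <vector>

        // Sketch of a "two-phase" container: mutation only before freeze(),
        // read-only access afterwards.
        template <class T>
        class two_phase_vector {
        public:
            void push_back(const T& x) { assert(!frozen_); data_.push_back(x); }
            void freeze()              { frozen_ = true; }     // end of phase one
            bool frozen() const        { return frozen_; }

            // Phase two: read-only access.
            const T& operator[](std::size_t i) const { assert(frozen_); return data_[i]; }
            std::size_t size() const                 { return data_.size(); }

        private:
            std::vector<T> data_;
            bool frozen_ = false;
        };

        // Usage: build the mesh connectivity once, freeze it, then iterate read-only.
        // two_phase_vector<int> elems;
        // for (int e : {3, 1, 4, 1, 5}) elems.push_back(e);
        // elems.freeze();
        // int first = elems[0];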
    inproceedings:siu97paralleldp
    Stephen Siu and Ajit Singh
    Design Patterns for Parallel Computing Using a Network of Processors
    1997, IEEE, 1082-8907/97
    Link(s)PDF
    AbstractHigh complexity of building parallel applications is often cited as one of the major impediments to the mainstream adoption of parallel computing. To deal with the complexity of software development, abstractions such as macros, functions, abstract data types, and objects are commonly employed by sequential as well as parallel programming models. This paper describes the concept of a design pattern for the development of parallel applications. A design pattern in our case describes a recurring parallel programming problem and a reusable solution to that problem. A design pattern is implemented as a reusable code skeleton for quick and reliable development of parallel applications.
    A parallel programming system, called DPnDP (Design Patterns and Distributed Processes), that employs such design patterns is described. In the past, parallel programming systems have allowed fast prototyping of parallel applications based on commonly occurring communication and synchronization structures. The uniqueness of our approach is in the use of a standard structure and interface for a design pattern. This has several important implications: First, design patterns can be defined and added to the system's library in an incremental manner without requiring any major modification of the system (Extensibility). Second, customization of a parallel application is possible by mixing design patterns with low-level parallel code, resulting in a flexible and efficient parallel programming tool (Flexibility). Also, a parallel design pattern can be parameterized to provide some variations in terms of structure and behavior.
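    As an illustration of "a design pattern implemented as a reusable code skeleton" (my own toy example, unrelated to DPnDP's actual interface): the recurring structure of a task farm is written once and the application supplies only the per-item work function.
        #include <atomic>
        #include <cstddef>
        #include <thread>
        #include <vector>

        // A reusable "task farm" skeleton: the recurring structure (spawn workers,
        // hand out item indices, join) is written once; the application plugs in
        // only the per-item work function.
        template <class Work>
        void task_farm(std::size_t n_items, unsigned n_workers, Work work) {
            std::atomic<std::size_t> next{0};
            std::vector<std::thread> workers;
            for (unsigned w = 0; w < n_workers; ++w)
                workers.emplace_back([&] {
                    for (std::size_t i = next++; i < n_items; i = next++)
                        work(i);                       // application-specific code
                });
            for (auto& t : workers) t.join();
        }

        // Usage: fill a table in parallel without writing any threading code.
        // std::vector<double> out(1000);
        // task_farm(out.size(), 4, [&](std::size_t i) { out[i] = i * 0.5; });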
    misc:sankaralingam03universal
    K. Sankaralingam and S. Keckler and W. Mark and D. Burger
    Universal mechanisms for data-parallel architectures
    2003
    Link(s)PDF WWW
    AbstractData-parallel programs are both growing in importance and increasing in diversity, resulting in specialized processors targeted at specific classes of these programs. This paper presents a classification scheme for data-parallel program attributes, and proposes micro-architectural mechanisms to support applications with diverse behavior using a single reconfigurable architecture. We focus on the following four broad kinds of data-parallel programs: DSP/multimedia, scientific, networking, and real-time graphics workloads. While all of these programs exhibit high computational intensity, coarse-grain regular control behavior, and some regular memory access behavior, they show wide variance in the computation requirements, fine grain control behavior, and the frequency of other types of memory accesses. Based on this study of application attributes, this paper proposes a set of general micro-architectural mechanisms that enable a baseline architecture to be dynamically tailored to the demands of a particular application. These mechanisms provide efficient execution across a spectrum of data-parallel applications and can be applied to diverse architectures ranging from vector cores to conventional superscalar cores. Our results using a baseline TRIPS processor show that the configurability of the architecture to the application demands provides harmonic mean performance improvement of 5%–55% over scalable yet less flexible architectures, and performs competitively against specialized architectures.
    inproceedings:bi97objectoriented
    Hua Bi
    Object-oriented Data Parallel Programming in C++
    1997, Proceedings of International conference on parallel and distributed processing techniques and applications (PDPTA)
    Keywordsdata parallel programming, SPMD paradigm, distributed memory parallel architecture,
    Link(s)PDF
    techreport:siek98matrix
    Jeremy G. Siek and Andrew Lumsdaine
    The Matrix Template Library: A Generic Programming Approach to High Performance Numerical Linear Algebra
    1998, Laboratory for Scientific Computing, Department of Computer Science and Engineering, University of Notre Dame
    Keywordsgeneric programming, high performance computing,
    Link(s)PDF WWW
    AbstractWe present a unified approach for expressing high performance numerical linear algebra routines for large classes of dense and sparse matrices. As with the Standard Template Library [10], we explicitly separate algorithms from data structures through the use of generic programming techniques. We conclude that such an approach does not hinder high performance. On the contrary, writing portable high performance codes is actually enabled with such an approach because the performance critical code sections can be isolated from the algorithms and the data structures. We also tackle the performance portability problem for particular architecture dependent algorithms such as matrix-matrix multiply. Recently, code generation systems (PHiPAC [3] and ATLAS [15]) have been created to customize the algorithms according to architecture. A more elegant approach is to use template metaprograms [18] to allow for variation. In this paper we introduce the Basic Linear Algebra Instruction Set (BLAIS), a collection of high performance kernels for basic linear algebra.
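    The separation of algorithms from data structures that the MTL builds on fits in a few lines (a generic sketch, not MTL's interface): the matrix-vector product below assumes only rows(), cols() and element access, so any matrix type providing those operations reuses the same algorithm (a real library would also make the traversal generic so that sparse types can skip zeros).
        #include <cstddef>
        #include <vector>

        struct dense_matrix {                       // one possible data structure
            std::size_t nrows, ncols;
            std::vector<double> a;                  // row-major storage
            dense_matrix(std::size_t r, std::size_t c) : nrows(r), ncols(c), a(r * c, 0.0) {}
            std::size_t rows() const { return nrows; }
            std::size_t cols() const { return ncols; }
            double  operator()(std::size_t i, std::size_t j) const { return a[i * ncols + j]; }
            double& operator()(std::size_t i, std::size_t j)       { return a[i * ncols + j]; }
        };

        // The algorithm is written once, against the operations it needs, not
        // against any particular matrix class (generic programming a la STL/MTL).
        template <class Matrix>
        std::vector<double> mat_vec(const Matrix& A, const std::vector<double>& x) {
            std::vector<double> y(A.rows(), 0.0);
            for (std::size_t i = 0; i < A.rows(); ++i)
                for (std::size_t j = 0; j < A.cols(); ++j)
                    y[i] += A(i, j) * x[j];
            return y;
        }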
    techreport:beckman96tulip
    Peter Beckman and Dennis Gannon
    Tulip: A Portable Run-Time System for Object-Parallel Systems
    1996, Computer Science Department, Indiana University
    Link(s)PDF
    AbstractThis paper describes Tulip, a parallel run-time system used by the pC++ parallel programming language. Tulip has been implemented on a variety of scalable, MPP computers including the IBM SP2, Intel Paragon, HP/Convex SPP, Meiko CS2, SGI Power Challenge, and Cray T3D. Tulip differs from other data-parallel RTS implementations; it is designed to support the operations from object-parallel programming that require remote member function execution and load and store operations on remote objects. It is designed to provide the thinnest possible layer atop the vendor-supplied machine interface. That thin veneer can then be used by other run-time layers to build machine-independent class libraries, compiler back ends, and more sophisticated run-time support. Some preliminary performance measurements for the IBM SP2, SGI Power Challenge, and Cray T3D are given.
    techreport:johnson97hpc
    Elisabeth Johnson and Dennis Gannon
    HPC++: Experiments with the Parallel Standard Template Library
    1997, Department of Computer Science, Indiana University
    KeywordsPSTL,
    Link(s)PDF WWW
    AbstractHPC++ is a C++ library and language extension framework that is being developed by the HPC++ consortium as a standard model for portable parallel C++ programming. This paper describes an initial implementation of the HPC++ Parallel Standard Template Library (PSTL) framework. This implementation includes seven distributed containers as well as selected algorithms. We include preliminary performance results from several experiments using the PSTL.
    techreport:kumar87parallelsearch
    V. Nageshwara Rao and Vipin Kumar
    Parallel Depth First Search, Part I: Implementation
    1987, Department of Computer Sciences, University of Texas at Austin
    Link(s)PDF
    article:itzkovitz98thread
    Ayal Itzkovitz and Assaf Schuster and Lea Shalev
    Thread migration and its applications in distributed shared memory systems
    1998, The Journal of Systems and Software, vol 42, pp 71–87
    Link(s)WWW
    techreport:winter94parallelengineering
    S.C. Winter and P. Kacsuk
    Software Engineering for Parallel Processing
    1994, University of Westminster, Centre for Parallel Computing, London; KFKI-MSZKI, Centre for Parallel Computing, Budapest
    Link(s)PDF
    article:jarvi04generic
    Douglas Gregor and Jaakko Järvi and Mayuresh Kulkarni and Andrew Lumsdaine and David Musser and Sibylle Schupp
    Generic programming and high-performance libraries
    2005, Int. J. Parallel Program., vol 33, pp 145–164
    Link(s)PDF DOI
    AbstractGeneric programming is an attractive paradigm for developing libraries for high-performance computing because of the simultaneous emphases placed on generality and efficiency.
    In this approach, interfaces are based on sets of specified requirements on types, rather than on any particular type, allowing algorithms to inter-operate with any data type meeting the necessary requirements. These sets of requirements, known as concepts, can specify syntactic as well as semantic requirements. Although concepts are fundamental to generic programming, they are not supported as first-class entities in mainstream programming languages, thus limiting the degree to which generic programming can be effectively applied. In this paper we advocate better syntactic and semantic support for concepts and describe some straightforward language features that could better support them. We also briefly discuss uses for concepts beyond their use in constraining polymorphism.
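    The paper predates language support for concepts; C++20 later added it, which makes the point easy to illustrate (a sketch in today's syntax, not the authors' proposal): the requirements an algorithm places on its type parameter are stated, and checked, at the interface.
        #include <concepts>
        #include <vector>

        // A concept states the syntactic requirements an algorithm places on a type.
        template <class T>
        concept Addable = requires(T a, T b) {
            { a + b } -> std::convertible_to<T>;
            { T{} };                                   // default-constructible "zero"
        };

        // The constraint documents and enforces the requirement at the interface,
        // instead of failing deep inside the instantiation.
        template <Addable T>
        T sum(const std::vector<T>& xs) {
            T acc{};
            for (const T& x : xs) acc = acc + x;
            return acc;
        }

        // sum(std::vector<int>{1, 2, 3});        // OK
        // sum(std::vector<const char*>{});       // rejected at the call site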
    techreport:Wu1996
    Min-You Wu and Wei Shu
    DDE: A Modified Dimension Exchange Method for Load Balancing in k-ary n-cubes
    1996, Department of Computer Science, State University of New York at Buffalo
    Link(s)PDF
    AbstractThe dimension exchange method (DEM) was initially proposed as a load-balancing algorithm for the hypercube structure. It has been generalized to k-ary n-cubes. However, the k-ary n-cube algorithm must take many iterations to converge to a balanced state. In this paper, we propose a direct method to modify DEM. The new algorithm, Direct Dimension Exchange (DDE) method, takes load average in every dimension to eliminate unnecessary load exchange. It balances the load directly without iteratively exchanging the load. It is able to balance the load more accurately and much faster.
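    A toy simulation of the underlying dimension-exchange idea (my own sketch; DDE itself refines this by averaging over whole dimensions to avoid repeated sweeps): in each dimension of the hypercube, every node pairs with its neighbour across that dimension and the pair equalizes its load.
        #include <cstddef>
        #include <cstdio>
        #include <vector>

        // Classic dimension exchange (DEM) simulated serially on a 2^n-node hypercube:
        // sweep the dimensions; in dimension d, node p pairs with p XOR (1<<d) and
        // the pair equalizes its load.
        void dimension_exchange(std::vector<double>& load, int n_dims) {
            for (int d = 0; d < n_dims; ++d) {
                for (std::size_t p = 0; p < load.size(); ++p) {
                    std::size_t q = p ^ (std::size_t(1) << d);   // neighbour across dim d
                    if (p < q) {                                  // handle each pair once
                        double avg = (load[p] + load[q]) / 2.0;
                        load[p] = load[q] = avg;
                    }
                }
            }
        }

        int main() {
            std::vector<double> load = {8, 0, 4, 0, 0, 2, 0, 2};  // 3-D hypercube (8 nodes)
            dimension_exchange(load, 3);
            for (double l : load) std::printf("%.2f ", l);        // all 2.00 after one sweep
            std::printf("\n");
        }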
    techreport:hernandez05gridcomp
    Francisco Hernández, Purushotham Bangalore, Kevin Reilly
    End-User Tools for Grid Computing
    2005, Department of Computer and Information Sciences University of Alabama at Birmingham
    Keywordsgrid computing, end-user tools,
    Link(s)PDF WWW
    AbstractThe present work describes an approach to simplifying the development and deployment of applications for the Grid. Our approach aims at hiding accidental complexities (e.g., low-level Grid technologies) met when developing these kinds of applications. To realize this goal, the work focuses on the development of end-user tools using concepts of domain engineering and domain-specific modeling which are modern software engineering methods for automating the development of software. This work is an attempt to contribute to the long term research goal of empowering users to create complex applications for the Grid without depending on the expertise of support teams or on hand-crafted solutions.
    mastersthesis:bromling92metaparallel
    Steven Bromling
    Meta-programming with Parallel Design Patterns
    Unexported (unhandled) reference...
    Link(s)PDF WWW
    techreport:giloi95promoter
    W.K. Giloi, M. Kessler, A. Schramm
    PROMOTER: A High-Level, Object-Parallel Programming Language
    1995, RWCP Massively Parallel Systems GMD Laboratory
    Keywordscoordination schemes, distributed memory architecture, high-level programming model, parallelizing compiler, distributed types,
    Link(s)PDF
    AbstractThe superior performance and cost-effectiveness of scalable, distributed memory parallel computers will become generally exploitable only if the programming difficulties with such machines are overcome. We see the ultimate solution in high-level programming models and appropriate parallelizing compilers that allow the user to formulate a parallel program in terms of application-specific concepts, while low-level issues such as optimal data distribution and coordination of the parallel threads are handled by the compiler. High Performance Fortran (HPF) is a major step in that direction; however, HPF still lacks the generality of computing domains needed to treat other than regular, data-parallel numerical applications. A more flexible and more abstract programming language for regular and irregular object-parallel applications is PROMOTER. PROMOTER allows the user to program for an application-oriented abstract machine rather than for a particular architecture. The wide semantic gap between the abstract machine and the concrete message-passing architecture is closed by the compiler. Hence, the issues of data distribution, communication, and coordination (thread scheduling) are hidden from the user. The paper presents the underlying concepts of PROMOTER and the corresponding language concepts. The PROMOTER compiler translates the parallel program written in terms of distributed types into parallel threads and maps those optimally onto the nodes of the physical machine. The language constructs and their use, the tasks of the compiler, and the challenges encountered in its implementation are discussed.
    techreport:jesshopeXXasynchrony
    C.R. Jesshope, D.B. Barsky, A.B. Bolychevsky and A.V. Shafarenko
    Asynchrony in distributed parallel computing
    ?, Department of Electronic and Electrical Engineering, University of Surrey
    Link(s)PDF
    phdthesis:parkes94concurrentoop
    Steven Michael Parkes
    A class library approach to concurrent object-oriented programming with applications to VLSI CAD
    Unexported (unhandled) reference...
    Link(s)PDF
    AbstractDespite increasing availability, the use of parallel platforms in the solution of significant computing problems remains largely restricted to a set of well-structured, numeric applications. This is due in part to the difficulty of parallel application development, which is itself largely the result of a lack of high-level development environments applicable to the majority of existing parallel architectures. This thesis addresses the issue of facilitating the application of parallel platforms to unstructured problems through the use of object-oriented design techniques and the actor model of concurrent computation. We present a multilevel approach to expressing parallelism for unstructured applications: a high-level interface based on the actor and aggregate models of concurrent object-oriented programming, and a low-level interface which provides an object-oriented interface to system services across a wide range of diverse parallel architectures. The interfaces are manifested in the ProperCAD II library, a C++ object library supporting actor concurrency on microprocessor-based parallel architectures and appropriate for applications exhibiting medium-grain parallelism. The interface supports uniprocessors, shared memory multiprocessors, distributed memory multicomputers, and hybrid architectures comprising network-connected clusters of uni- and multiprocessors. The library currently supports workstations from Sun, shared memory multiprocessors from Sun and Encore, distributed memory multicomputers from Intel and Thinking Machines, and hybrid architectures comprising IP network-connected clusters of Sun uni- and multiprocessors. We demonstrate our approach through an examination of the parallelization process for two existing unstructured serial applications drawn from the field of VLSI computer-aided design. We compare and contrast the library-based actor approach to other methods for expressing parallelism in C++.
    misc:bikshandi06programming
    Ganesh Bikshandi and Jia Guo and Daniel Hoeflinger and Gheorghe Almasi and Basilio B. Fraguela and María J. Garzarán and David Padua and Christoph von Praun
    Programming for Parallelism and Locality with Hierarchically Tiled Arrays
    2006
    Keywordsdata parallel programming,
    Link(s)PDF WWW
    techreport:littin98block
    Richard H. Littin and J. A. David McWha and Murray W. Pearson and John G. Cleary
    Block Based Execution and Task Level Parallelism
    1998, Department of Computer Science, University of Waikato, Hamilton, New Zealand
    Link(s)PDF WWW
    AbstractA fixed-length block-based instruction set architecture (ISA) based on dataflow techniques is described. This ISA is then compared and contrasted with those of more conventional architectures and other developmental architectures. A control mechanism that allows blocks to be executed in parallel, while the original control flow is maintained, is presented. A brief description of the hardware required to realize this mechanism is given.
    inproceedings:berger00hoard
    Emery D. Berger and Kathryn S. McKinley and Robert D. Blumofe and Paul R. Wilson
    Hoard: A Scalable Memory Allocator for Multithreaded Applications
    2000, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX)
    Link(s)PDF WWW
    AbstractParallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiprocessor systems. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a dramatic increase in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. This increase in memory consumption can range from a factor of P (the number of processors) to unbounded memory consumption.
    This paper introduces Hoard, a fast, highly scalable allocator that largely avoids false sharing and is memory efficient. Hoard is the first allocator to simultaneously solve the above problems. Hoard combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case. Our results on eleven programs demonstrate that Hoard yields low average fragmentation and improves overall program performance over the standard Solaris allocator by up to a factor of 60 on 14 processors, and up to a factor of 18 over the next best allocator we tested.
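    A drastically simplified sketch of the per-processor-heap idea (my own toy, nothing like Hoard's actual data structures or its consumption bound): each thread allocates fixed-size blocks from a thread-local free list and only takes a lock on the shared pool when that list runs dry.
        #include <mutex>
        #include <vector>

        struct Block { Block* next; };

        // Toy fixed-size allocator: a thread-local free list is the contention-free
        // fast path; a mutex-protected global pool is the slow path used to refill.
        class toy_allocator {
        public:
            void* allocate() {
                if (!local_) refill();                 // slow path: go to global pool
                Block* b = local_;
                local_ = b->next;
                return b;
            }
            void deallocate(void* p) {                 // return to this thread's list
                Block* b = static_cast<Block*>(p);
                b->next = local_;
                local_ = b;
            }
        private:
            void refill() {
                std::lock_guard<std::mutex> g(global_lock_);
                storage_.emplace_back(64);             // grab 64 blocks of raw storage
                for (Block& b : storage_.back()) { b.next = local_; local_ = &b; }
            }
            static thread_local Block* local_;
            static std::mutex global_lock_;
            static std::vector<std::vector<Block>> storage_;
        };

        thread_local Block* toy_allocator::local_ = nullptr;
        std::mutex toy_allocator::global_lock_;
        std::vector<std::vector<Block>> toy_allocator::storage_;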
    inproceedings:rauchwerger98stapl
    Lawrence Rauchwerger, Francisco Arzu, and Koji Ouchi
    Standard Templates Adaptive Parallel Library (STAPL)
    1998, Wkshp. on Lang. Comp. and Run-time Sys. for Scal. Comp. (LCR).
    KeywordsSTAPL,
    Link(s)PDF WWW
    AbstractSTAPL (Standard Adaptive Parallel Library) is a parallel C++ library designed as a superset of the STL, sequentially consistent for functions with the same name, and executes on uni- or multiprocessors. STAPL is implemented using simple parallel extensions of C++ which provide a SPMD model of parallelism supporting recursive parallelism. The library is intended to be of generic use but emphasizes irregular, non-numeric programs to allow the exploitation of parallelism in areas such as geometric modeling or graph algorithms which use dynamic linked data structures. Each library routine has several different algorithmic options, and the choice among them will be made adaptively based on a performance model, statistical feedback, and current run-time conditions. Built-in performance monitors can measure actual performance and, using an extension of the BSP model, predict the relative performance of the algorithmic choices for each library routine. STAPL is intended to possibly replace STL in a user-transparent manner and run on small to medium scale shared memory multiprocessors which support OpenMP.
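    A toy sketch of run-time adaptive selection (my own illustration; STAPL's real mechanism uses performance models and statistical feedback): the first call times the candidate implementations and later calls reuse whichever won.
        #include <algorithm>
        #include <chrono>
        #include <vector>

        template <class F>
        double timed(F f) {
            auto t0 = std::chrono::steady_clock::now();
            f();
            return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
        }

        // Naive "adaptive" routine: measure once, then stick with the faster choice.
        void adaptive_sort(std::vector<int>& v) {
            static int choice = -1;                       // -1: not decided yet
            if (choice == -1) {
                std::vector<int> a = v, b = v;
                double t_sort   = timed([&] { std::sort(a.begin(), a.end()); });
                double t_stable = timed([&] { std::stable_sort(b.begin(), b.end()); });
                choice = (t_sort <= t_stable) ? 0 : 1;
                v = (choice == 0) ? a : b;                // reuse the work already done
                return;
            }
            if (choice == 0) std::sort(v.begin(), v.end());
            else             std::stable_sort(v.begin(), v.end());
        }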
    techreport:Asanovic:EECS-2006-183
    Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams and Katherine A. Yelick
    The Landscape of Parallel Computing Research: A View from Berkeley
    2006, EECS Department, University of California, Berkeley
    Link(s)PDF WWW
    article:xu92analysis
    C. Z. Xu and F. C. M. Lau
    Analysis of the Generalized Dimension Exchange Method for Dynamic Load Balancing
    1992, Journal of Parallel and Distributed Computing, vol 16, pp 385–393
    Link(s)PDF WWW
    Translated from BibTeXML (a translation of BibTeX) with btml2html.rb
    This page is entirely static! You can save it as it is; it will lose none of its flavor!