Next: Operating Systems Laboratory Up: Department of Computer Previous: Language Processing Systems

# Distributed Parallel Processing Laboratory

/ N. N. Mirenkov / Professor
/ S. G. Sedukhin / Associate Professor
/ Shietung Peng / Assistant Professor
/ Sergey V. Ten / Assistant Professor

Our general program is high-performance computing and advanced information infrastructure. We are undertaking research efforts in parallel and distributed algorithms/architectures, visual (multimedia) languages and tools. We lead joint research projects Massively Parallel Computation: Synchronous vs. Asynchronous Designs", Parallel Scientific Computations", Fine-grain parallelism for Cellular automata" and take part in projects: Aizu Supercomputer" and Efficient Algorithmic Design Techniques for Parallel Computing".

We are also developing a top-down education project related to distributed parallel processing. In addition, together with the Center for Language Research and students we started the project Advanced Computing and Communication Multimedia English-Japanese Glossary". We are studying the problem of parallel program portability and developing the supporting tools. Our theoretical work is related to efficient algorithms for parallel numerical computation including matrix multiplication, Fourier transformation, and Singular Value Decomposition as well as for fault tolerant routing on certain interconnection networks, like hypercubes, star graphs, and torus.

As one of our results we would like to mention the S4CAD system. S4CAD (System of Systolic Structures Synthesis) is a high level interactive software tool for designing, analysis and modeling of systolic VLSI-oriented array processors. Starting with a localized data dependence graph of the algorithm S4CAD automatically produces all permissible array processors, among which the designer may chose the optimal one. World Wide Web page of the S4CAD is available at http://www.u-aizu.ac.jp/labs/sw-dpp/S4CAD/s4cad.html.

The important question of our research is parallel program transparency and accessibility of massively parallel computers to large user populations. We are developing a multimedia technology for the interactive specification of application algorithms. This technology is related to visualization, animation and sonification of data processing methods. In fact, it is a 'filmification' of application algorithms. Each film is based on a rather big peace of knowledge of a class of the algorithms. Our approach is close to the idea of algorithmic skeletons (higher order functions). However, skeletons leave a big gap between reality and a human (a multi-channel being). So, we use algorithmic multimedia skeletons based on both mathematical and physical abstractions.

Each film is a series of frames (computational steps) displaying one or more parameterized sets of nodes and/or moving objects in multi-dimensional space-time. Each frame highlights a subset of these nodes and/or moving objects. The films have their own sets of figures, sounds, colors and their own shape'' of the movement. Users recognize them due to a combination of all these features. Each film defines a partial order of scanning of the nodes or objects (and possibly, colors and sounds). In fact, partial orders of scanning are defined via: positions of nodes and objects, trajectories of movement, external configuration patterns of surrounding nodes and/or variable values (internal configuration) on nodes.

As a rule, computation specified on different nodes (objects) of a frame is considered to be performed in parallel. Computation specified in different frames is performed sequentially. So, it is possible to say: the shorter film the better.

The user defines the specification by creating his new film. The corresponding program (sequential or parallel) is generated automatically. Each film itself is considered as a special-purpose application for which a parallel program is developed in advance as an item of the system library. The specification of a problem is considered as input data for such a parallel program. However, this data can introduce a composition of a few films, an irregularity of computation, special sizes of data, etc. It means that to reach efficiency it is impossible to have only one variant of the film implementation. So, each film is supported at least by a few parallel programs from the system library. Different programs are related to different conditions being defined by the specification. In other words, a film has a special space of decision-making. This space takes into account levels of problem granularity, irregularity, dynamics of the data decomposition, and some others related to algorithm features. Because each film reflects certain knowledge about data processing, we select the best possible solutions for the implementation of this data processing under different conditions and on the basis of the best hand-made algorithms. These solutions approximate'' possible decision-making.

Refereed Journal Papers

1. G. Adhar and S. Peng. Mixed domination in trees: a parallel algorithm. Congressus Numerantium, 100:73--80, 1994.

In this paper, we present new parallel NC algorithm to find smallest mixed dominating set in trees. The algorithm is based on tree compression techniques that have been used traditionally to evaluate linear arithmetic expressions in parallel. The model of parallel computation used is the CRCWP-RAM (Concurrent Read Concurrent Write Parallel RAM), where Writing conflicts are resolved in a non-deterministic fashion. The algorithm requires $O(n)$ processors and runs in $O(\log n)$ time on a CRCW P-RAM.

2. S.G. Sedukhin. Synthesis and analysis of systolic processors for the algebraic paths problem. System Informatics, (3):248--302, 1994.

To solve the algebraic path problem including transitive closure, shortest paths problem and finding of the minimum-cost spanning tree in a graph and also the inverse of a matrix, three functionally equivalent algorithms are introduced. These algorithms have different regularity of the computations. For each of the algorithms, wehave systematically derived and evaluated various designs of systolic array processors, which are differed in their dimension, number and layout of processing elements, the total time of computations, etc. Specific properties of operations performed for different problems in the underlying algebra may be an issue for further design improvment.

Refereed Proceeding Papers

1. N. Mirenkov. VIM language paradigm. In B. Buchberger and J. Volkert, editors, Parallel Processing: CONPAR94 - VAPPVI; Lecture Notes in Computer Science 854, pages 569--580. Springer-Verlag, September 1994.

A visual language paradigm for the interactive specification of application algorithms is proposed for consideration. The approach is based on a set of computational schemes (shapes of computation) presented by color figures, pictures and animation films with sound accompaniment. Each film is related to series of frames (computational steps) and it reflects some knowledge about data processing. Each frame brightens'' up a substructure of data for which operations should be specified. As a rule, this substructure is a set of points and/or moving objects in a multi-dimensional space-time. A user embeds his algorithm into computational schemes by making these schemes more precise. In fact, he defines the specification by creating his new film. The corresponding program (sequential or parallel) is generated automatically.

2. N. Mirenkov. Visualization of methods and parallel program transparency. In I. Jelly and I. Gorton, editors, World Transputer Congress: CASE Tools for Parallel Systems Development, pages 58--63, September 1994.

An approach supporting parallel program transparency for the VIM language paradigm is considered. The basis of the approach is that for each computational scheme (shape'' of computation) presented by an animation film, a virtual'' parallel program is developed. Application specifications are considered as input data for virtual'' programs. Such a virtual'' program is based on a set of hand-made library items covering certain fields of data processing. Each item is a scalable parallel program. Only one of these parallel programs is used during run-time, which one depends on specification and system environment.

3. N. Mirenkov. Visualization and sonification of methods. In The First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis, pages 63--72. IEEE Computer Society Press, March 1995.

An approach supporting a new programming philosophy and a way of developing the corresponding technology are presented. The idea behind the approach is making use of animation films as communication units for computer-human dialog. These films have their own sets of figures, sounds, colors and their own shape'' of the movement. Users recognize the units due to a combination of all these features. Each film is related to series of frames (computational steps) and reflects some knowledge about data processing. Each frame highlights a substructure of data for which operations should be specified. As a rule, this substructure is a set of points and/or moving objects in a multi-dimensional space-time. To specify a problem a user employ: modification of the system films, composition of these films, seeing a film being created, etc. The corresponding program (sequential or parallel) is generated automatically.

4. Q. Gu, S. Okawa, and S. Peng. Disjoint paths in hypercubes and star graphs. In Joint Symposium on Parallel Processing, pages 49--56, May 1994.

The set-to-set routing which allows certain node failure is considered. We first present an interesting theorem for the set-to-set routing problem in general graphs. Then, efficient algorithms are given for the set-to-set routing problem in hypercubes and star graphs. Hypercubes and star graphs have been accepted as attractive interconnection networks. They have logarithmic and sublogarithmic node degree and diameter, respectively. They both possess rich structure and symmetry properties as well as many desirable fault tolerance characteristics. For $n$-dimensional hypercubes and star graphs, our algorithms run in $O(n^{2})$ optimal time. The algorithms find the node disjoint paths of length at most $2n-2$ for hypercubes and the node disjoint paths of length at most $\lfloor \frac{3(n-1)}{2} \rfloor + 5$ for star graphs.

5. Q. Gu and S. Peng. Efficient algorithms for disjoint paths in star networks. In The 6th International Transputer/Occam Conference, pages 53--65. IOS Press, June 1994.

In this paper, we give optimal time algorithms for the following node disjoint path problems in $n$-dimensional star graphs $G_n$: (1) Given an arbitrary node $s$ and a set of arbitrary distinct nodes $T=\{t_1,\ldots, t_{n-1}\}$, $s\not\in T$, in $G_n$, find $n-1$ node disjoint paths connecting $s$ and each node of $T$; and (2) Given arbitrary $k= \lceil \frac{n-1}{2}\rceil$ pairs of distinct nodes $(s_1,t_1),\ldots,(s_k,t_k)$ in $G_n$, find $k$ node disjoint paths, one path connecting one pair of nodes. For problem (1), our algorithm finds the $n-1$ node disjoint paths of length at most $d(G_n)+4$ in $O(n^2)$ optimal time, where $d(G_n)=\lfloor\frac{3(n-1)}{2}\rfloor$ is the diameter of $G_n$. The length of the paths given in this paper improves the previous result of $5n-2$. For problem (2), our algorithm finds the $k$ node disjoint paths of length at most $d(G_n)+5$ in $O(n^2)$ optimal time. Our results improve the previous ones of the time $O(n^4\log n)!$ and the length $4(n-2)$, respectively.

6. Q. Gu and S. Peng. k-pairwise cluster fault tolerant routing in hypercubes. In The 5th International Symposium on Algorithms and Computation, pages 342--350. Springer-Verlag, August 1994.

In this paper, we introduce a general fault-tolerant routing problem, {\it cluster fault tolerant routing}, which is a natural extension of the well studied node fault tolerant routing problem. A cluster is a connected subgraph of a graph $G$ and a cluster is faulty if all nodes in it are faulty. {\it Cluster fault tolerant} (abbreviated as CFT in what follows) routing studies the number of fault clusters and the size (diameter) of the clusters that an interconnection network can tolerate in fault tolerant routing problems. In this paper, we investigate $k$-pairwise CFT routing in an $n$-dimensional hypercubes $H_n$. We give an $O(n^2\log n)$ time algorithm which finds the required routing paths for $k$-pairwise CFT routing in $H_n$. Our algorithm implies an $O(n^2\log n)$ algorithm for $k$-pairwise node disjoint path problem in $H_n$. The algorithm for $k$-pairwise node disjoint path problem in $H_n$ significantly improves the previous results of time comp! lexity $O(n^3\log n)$.

7. Q. Gu and S. Peng. Advanced fault tolerant routing in hypercubes. In The International Symposium on Parallel Architectures, Algorithms, and Networks, pages 189--196. IEEE Computer Society Press, December 1994.

In this paper, we study the fault tolerant properties of $n$-dimensional hypercubes $H_n$ for node-to-set and set-to-set routing problems on a general fault-tolerant routing model, {\it cluster fault tolerant routing}, which is a natural extension of the well studied node fault tolerant routing. A cluster of a graph $G$ is a connected subgraph of $G$ and a cluster is called faulty if all nodes in the cluster are faulty. For node-to-set routing and set-to-set routing, where $k$ ($2\leq k\leq n$) fault-free node disjoint paths are needed, in $H_n$, we show that the maximum numbers of fault clusters of diameter at most 1 that can be tolerated is $n-k$. We give $O(kn)$ optimal time algorithms which find $k$ fault-free node disjoint paths of length at most $n+3$ for node-to-set and $k$ fault-free node disjoint paths of length at most $2n$ for set-to-set cluster fault tolerant routing problems in $H_n$, respectively.

8. Q. Gu and S. Peng. Algorithms for node disjoint paths in incomplete star networks. In International Conference on Parallel and Distributed Systems, pages 296--303. IEEE Computer Society Press, December 1994.

In this paper, we give an efficient algorithm for the following $k$-pairwise node disjoint path problem in $n$-dimensional hypercubes $H_n$: Given $k=\lceil \frac{n}{2}\rceil$ pairs of $2k$ distinct nodes $(s_1,t_1),\ldots,(s_k,t_k)$ in $H_n$, $n\geq 4$, find $k$ node disjoint paths, each path connecting one pair of nodes. Our algorithm finds the $k$ node disjoint paths in $O(n^2\log^* n)$ time which improves the previous result of $O(n^2\log n)$. The length of the paths constructed in our algorithm is at most $n+\lceil \log n\rceil +1$ which improves the previous result of $2n$ as well.

9. Q. Gu and S. Peng. Fault tolerant routing in toroidal networks. In The first Aizu International Symposium on Algorithm/Architecture Synthesis, pages 162--168. IEEE Computer Society Press, March 1995.

In this paper, we give an $O(r^2)$ time algorithm for onstructing a fault-free routing path of ptimal length between any two non-fault nodes of $r$-dimensional torus with $2r-1$ faulty nodes. he fastest previously known algorithm for this problem takes $O(r^3)$ time; e show that Rabin diameter of a $r$-dimensional torus is ts diameter plus one. e also describe a cluster fault tolerant (CFT) routing model and give an efficient algorithm or node-to-node CFT routing.

10. S. G. Sedukhin and I. S. Sedukhin. Systematic approach and software tool for systolic design. In B. Buchberger and J. Volkert, editors, Proceedings of Third Join International Conference on Vector and Parallel Processing, pages 172--183, Altenbergerstr. 69, A-4040 Linz, Austria, September 1994. Johannes Kepler Univ. Linz (Austria), Springer-Verlag.

A systematic approach for formal synthesis of a set of systolic array processors for the matrix formulated algorithms is presented. This approach is a theoretical background for the visual software tool S4CAD. Starting with the source algorithm specification in the form of the system of linear equations the formal approach and S4CAD tool allow us to obtain and examine the set of permissible array processors. As an example the design of systolic array processors for the transitive closure algorithm is shown.

11. S. G. Sedukhin and I. S. Sedukhin. Software tool for systolic design. In I. Jelly and I. Gorton, editors, World Transputer Congress. Proceedings of the Workshop on CASE Tools for Parallel Systems Development, pages 74--78, September 1994.

The S4CAD software tool for systolic design is presented. The tool uses advanced media technologies of the MS Windows that gets more comfort for the user to evaluate and chose an optimal solution observing requirements on the computing time, number of processing elements, connection topology, etc. Number of basic algorithms of linear algebra and graph theory was considered and added to the S4CAD library.

12. S. V. Ten, V. V. Savchenko, and A. A. Pasko. Time performance evaluation of implicit surfaces polygonization on distributed systems. In PCAT-94 Parallel Computing: Technology and Practice, pages 183--193. University of Wollongong, IOS Press, November 1994.

This paper presents an approach and examples of a polygonal approximation for surfaces defined by implicit functions. The algorithm has been designed to be scalable and to provide high rate of parallelization. Firstly this algorithm has been implemented on a transputer network with toroidal architecture. Later its version suitable for execution under PVM system was implemented.

13. A. Kondratyev, A. Taubin, and S. Ten. Verification of asynchronous circuits by petri net unfolfdings. In IEEE Symposium on Emerging Technologies & Factory Automation, pages 404--413. IEEE Service Center, November 1994.

This paper suggests a way for Petri Net analysis by checking the ordering relations between places and transitions. The method is based on unfolding the original net into an equivalent acyclic description. We improved on the previously known cutoff criterion for truncating the unfolding. No restrictions are imposed on the class of general PNs. The new criterion significantly reduces the size of unfolding obtained by PN. The implementability conditions are formulated in such a way that they can be checked by analysis of ordering relations between signal transitions rather than by traversal of states. This allows avoiding the state explosion problem for highly parallel specifications.

14. O. Hammami and S. Ten. Comparison of static and dynamic analysis of nasa and clapack benchmarks using cc and gcc compilers and prediction capacity. In Applied Informatics: Thirteen IASTED International Conference, pages 446--448. IASTED, IASTED-Acta Press, February 1995.

In this study, we conducted an empirical static analysis of NASA and CLAPACK Benchmarks. Different metrics were measured and their dispersion analysed. All the programs are written in C and compiled with two different compilers cc and gcc. For each compiler we used both no optimization and full optimization modes. The goal of this study is multiple and attempts to understand more programs structure and programs requirements in order to derive some hints about architectural support or compiler time optimizations algorithms.

Books

N. Mirenkov, Q-P. Gu, S. Peng, and S. Sedukhin, editors. The First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis. IEEE Computer Society Press, 1995.

Technical Reports

1. N. Mirenkov, S. Peng, I. Sedukhin, S. Sedukhin, and S. Ten. Parallel processing education. 94-1-034, University of Aizu, 1994.

2. T. Ikedo and N. Mirenkov. Aizu supercomputer: A massively parallel system for virtual reality problems. 94-2-003, University of Aizu, 1994.

3. I. Sedukhin, S. Sedukhin, A. Pasko, V. Savchenko, and N. Mirenkov. Parallel rendering of functionally represented geomtric objects with the network Linda system. 95-1-001, University of Aizu, 1995.

4. S. Sedukhin. A new systolic architecture for pipeline prime factor dft-algorithm. 94-1-020, University of Aizu, 1994.

5. S. Peng and S. Sedukhin. Computation of singular value decomposition on a ring of computers. 94-1-037, University of Aizu, 1994.

1. Nikolay N. Mirenkov, The Journal: Scientific programming, 1994. Member of Editorial Advisory Board.

2. Nikolay N. Mirenkov, The new series of books in Parallel and Distributed Computing, Chapman and Hall, UK, 1994. Member of Editorial Board.

3. Nikolay N. Mirenkov, Aizu International Symposium on Parallel Algorithms/Architecture Synthesis, Aizu-Wakamatsu, Japan, March 1995. Editor of the pAs'95 Proceedings.

4. Nikolay N. Mirenkov, ACM, IEEE and The New York Academy of Sciences, 1994. Member.

5. Nikolay N. Mirenkov, CONPAR 94-VAPP VI Conference, Linz, Austria, 1994. Member of Standing and Program Committe.

6. Nikolay N. Mirenkov, Aizu International Symposium on Parallel Algorithms/Architecture Synthesis, Aizu-Wakamatsu, Japan, 1995. Chairman of the Program Committee.

7. Nikolay N. Mirenkov, Aizu International Symposium on Parallel Algorithms/Architecture Synthesis, Aizu-Wakamatsu, Japan, 1995. Member of the Organizing Committee.

8. Nikolay N. Mirenkov, American Mathematical Society, 1994. Reviewer.

9. Nikolay N. Mirenkov, The Journal: Programming, Nauka, Russia, 1994. Member of Editorial Board.

10. Shietung Peng, Aizu International Symposium on Parallel Algorithms/Architectures Synthesis, 1994.

Editor of Proceedings.

11. Shietung Peng, IEEE Transactions on Parallel and Distributed Systems, 1994. Reviewer.

12. Shietung Peng, IEEE Transactions on Computers, 1994. Reviewer.

13. Shietung Peng, Information Processing Letters, 1994. Reviewer.

14. Shietung Peng, International conferences (pAs'95, ISPAN'94, ICPADS'94), 1994. Reviewer.

15. Shietung Peng, International conference pAs'95, 1995. Member of program and organizing committees.

16. Stanislav G. Sedukhin, Journal: Parallel Processing Letters, 1994. Reviewer of articles.

17. Stanislav G. Sedukhin, Univ. of Aizu, 1994. Member of the Supercomputing Scientific Group.

18. Stanislav G. Sedukhin, International Connection Machines Group, 1994. Member.

19. Stanislav G. Sedukhin, University of Aizu, 1994. Editor and developer of the English-Japanese Multimedia Glossary on Advanced Computing and Communication.

20. Stanislav G. Sedukhin, The International Journal: Parallel Processing Letters, 1994. Member of Editorial Board.

21. Stanislav G. Sedukhin, CONPAR/VAPP International Conferences on Vector and Parallel Processing, 1994. Member of the Program Committee.

22. Stanislav G. Sedukhin, The Aizu International Symposium on Parallel Algorithms/Architecture Synthesis (pAs'95), 1995. Member of the Organizing and Program Committees.

23. Stanislav G. Sedukhin, IEEE Computer Society, 1994. Member.

24. Stanislav G. Sedukhin, ACM, 1994. Member.

25. Stanislav G. Sedukhin, Aizu International Symposium on Parallel Algorithms/Architecture Synthesis (pAs'95), 1995. Editor of the Proceedings.

Next: Operating Systems Laboratory Up: Department of Computer Previous: Language Processing Systems

www@u-aizu.ac.jp
January 1996