Japanese
Department of Computer Software

Distributed Parallel Processing Laboratory

Nikolay N. MirenkovStanislav G. SedukhinRentaro Yoshioka
Nikolay N. Mirenkov
Professor
Stanislav G. Sedukhin
Professor
Rentaro Yoshioka
Special Researcher
Our general program is cyberinfrastructure and high-performance computing. We are also thinking about a peer-to-peer and grid computing, as well as about active knowledge being developed by humanity. We are undertaking research efforts in parallel and distributed algorithms/architectures, visual (multimedia) languages and tools.

  We have proposed a new scalable parallel processor architecture for the future VLSI technology. This processor (code name Trident) is able to support the scalar, vector, and matrix instructions and based on local communications not only within but also between processing elements. We have shown that proposed architecture can be effectively used for many scientific, engineering, and multimedia applications since they are mostly based on mixing of scalar, vector, and matrix parts.

  In a great part, our research and development are based on an idea of self-explanatory components in a cyberFilm format. A cyberFilm is a set of color stills supported, if necessary, by text, voice/sound and special links. Each still is to represent a view (some features) of objects or processes. Each cyberFilm is to represent a multiple view (an extended set of dynamic and/or static features) of objects or processes. Different views can be related to different moments of time, positions in space, levels of hierarchy, values of data attached to space points, etc. Different media can be used for different views. A self-explanatory cyberFilm means that the associated stills are organized and presented in such a way that the semantic richness is clearly brought out. The investment of meaning in the cyberFilm is reduced to developing a series of views watching (and hearing) in nonlinear order. Usually, a still is self-evident and a cyberFilm is a result of special gathering of clues or hints. This result is a piece of knowledge. So, self-explanatory adequacy depends on this knowledge. The more accurate and relevant views are used, the greater adequacy is reached.

  The idea of cyberFilms is used for the specification of information resources and programming operations with the resources, as well as for the representation of multimedia messages and implementation of human-computer interfaces. The idea of equal opportunities to all individuals in the use of information resources is used to create a right set of cyberFilms and methods of their adaptation. We lead three projects related to filmification of methods and data: 1) Active Knowledge Studio for teachers, students, and programmers, 2) F-Communication System for children, elderly and handicapped people, and 3) Virtual objects, haptic interface and tactile visualization for medical doctors and caretakers.

Refereed Journal Papers

[nikmir-01:2003]Tetsuya Hirotomi and Nikolay Mirenkov. Self-explanatory components: a basis for new communicators. Journal of Visual Languages and Computing, 1(14):215-232, 2003.

An approach for using the self-explanatory component concept in multimedia message representation is considered. It is based on cyberFfilm formats of multimedia symbols (multimedia words) and multimedia sentences (multimedia hieroglyphs). Corresponding films are multiple views of objects, processes, etc. They are pieces of knowledge. These pieces are acquired in a film database. The user should not study them in advance. A film management system (including self-explanatory interface panels) provides effective access to the database items. In this paper, the concept of self-explanatory components, formats to represent multimedia sentences and words, as well as a brief explanation of the film (component) management system are provided. Some implementation results oriented to people with communication disorders are presented.are provided.
[nikmir-02:2003]Tetsuya Hirotomi and Nikolay Mirenkov. Augmentative and Alternative Communication Based on Multimedia Hieroglyphs. Journal of the Japanese Society for Wellbeing Science and Assistive Technology, 3(2), 2004.

We have developed an augment ative and alternative communication system oriented to people with expressive disorder. In this system, multimedia is used to fully utilize available communication capabilities of different users. The system is based on film formats of multimedia words and sentences called multimedia hieroglyphs. Corresponding films are multiple views of objects and/or processes. They are pieces of knowledge. These pieces are acquired in a network-accessible film database. To access them, the system provides special interfaces implemented as rather independent subsystems oriented to different groups of users. These interfaces support keyboard-less operations for searching, creating, and browsing multimedia words/hieroglyphs, as well as sending and receiving corresponding messages. The goal of this paper is to represent a new computer mediated communication environment. This environment is based on our approach. It includes ideas of multimedia sentences consisting of multimedia words as well as self-explanatory features behind such words. Special attention is paid to the implementation of F-Mail subsystem oriented to cerebral palsied people. Initial results of usability tests are also provided.
[nikmir-03:2003]Robert Roxas and Nikolay Mirenkov. Visual Tool for Specifying MPI Collective Communication Operations. WSEAS Transactions on Computers, 2(1):454-459, 2003.

Since scientific applications usually manipulate a very huge volume of data, parallel computing is the best choice for solving this kind of applications. However, making parallel programs is not an easy task for programmers.They need to have a good tool or environment wherein they can easily specify what they want to solve. Because of the complexity in making parallel programs, visual programming is a very promising approach in this area. This paper describes a system that allows the creation of programs from algorithmic film specifications, where the use of text is minimal. In this system, a language of micro-icons is used to specify any collective operation and this paper shows how this visual programming environment supports the manipulation with these micro-icons. Because of space constraints, we limit the example to a broadcast operation. Using the same technique, it is possible for other types of collective communication operations to be specified in this visual programming environment.
[nikmir-04:2003]Mahmoud Saber and Nikolay Mirenkov. Multiple Views Specifications for Cellular Automata Systems. Journal of Three Dimensional Images, 18(1):59-65, 2004.

It has been a long time since Von Neumann invented the cellular automata model.Many programming environments and languages were developed to specify and implement cellular automata-like systems (CA). These languages focus on computational and performance issues, and provide a rather conventional view of the CA description. Visual methods are not often applied in CA specification, despite their advantages and the maturity of tools. A new approach to specify cellular automata systems is presented, where multiple views are simultaneously used to describe more comprehensively the required systems. Syntactic and semantic aspects of the corresponding models are captured using recognizable and intuitively clear interfaces. This work is based on self-explanatory components technology which constitutes a framework for visual representation and specification of objects/processes, based on the idea of multiple views and multimedia algorithmic skeletons. In this paper, we describe the design and development of the multiple views specification of CA, and explain the implementation of the visual user interfaces used in manipulation with these views.
[nikmir-05:2003]Mahmoud Saber and Nikolay Mirenkov. Visual Steps towards Better Understanding of Cellular Automata Algorithms. WSEASTransactions on Computers, 2(2):315-320, 2003.

The cellular automata (CA) models and corresponding algorithms have a rich theoretical basis. They have also been used in a great variety of applications. A number of programming languages and systems have been developed to support the implementation of the CA models. However, these languages focus on computational and performance issues, and do not pay enough attention to programming productivity, usability, understandability, and other aspects of software engineering. In this paper, we describe a new special-purpose programming environment developed for visual specification, presentation and explanation of CA systems, as well as, for programming them. This environment is based on using visual patterns, colors, and animation for representing the CA system structures and operations on these structures, and for performing editing and composing manipulations with corresponding software components. Examples of the CA algorithm representations and details of the environment implementation are presented.
[nikmir-06:2003]P.-A. Fayolle, A. Pasko, and N. Mirenkov. Fitting of Parameterized FRep Shape Models. he Journal of Three Dimensional Images, 18(1):40-46, 2004.

reverse engineering of constructive solids in CAD applications. Recovering the best parameters of a FRep model from the given scanned surface points turns out to be a diAEcult problem of nonlinear optimization. We recall the traditional methods for solving such problems: direct methods like Levenberg Marquardt or Newton and sampling methods like simulated annealing.In order to overcome problems of each method, we combine them in a two-step process. We apply and compare all these different methods to the recovery of some mechanical parts.
[sedukhin-01:2003]M. Soliman and S. Sedukhin. Technology Scalable Matrix Architecture for Data Parallel Applications. IEICE Trans. INF. & SYST., E86-D(9):1449-1559, 2003.

Within a few years it will be possible to integrate a billion transistors on a single chip operating at frequency 10 GHz. At this integration level, we propose using a multi-level ISA to express fine-grain data parallelism to hardware instead of using a huge transistor budget to dynamically extract it. Since the fundamental data structures for a wide variety of data parallel applications are scalar, vector, and matrix, our proposed Trident processor extends a scalar ISA with vector and matrix instruction sets to effectively process matrix formulated applications. Like parallel vector architectures, the Trident processor consists of a set of parallel lanes (each lane contains a set of vector pipelines and a slice of register file) combined with a fast scalar core. However, Trident processor can effectively process on the parallel lanes not only vector but also matrix data. One key point of our architecture is the local communication within and across lanes to overcome the limitations of the future VLSI technology. Another key point is the effective execution of a mixture of scalar, vector, and matrix operations. This paper describes the architecture of the Trident processor and evaluates its performance on BLAS and on the standard matrix bidiagonalization algorithm. The last one is evaluated as an example of an entire application based on a mixture of scalar, vector, and matrix operations. Our results show that many data parallel applications, such as scientific, engineering, multimedia, etc., can be speeded up on the Trident processor. Besides, the scalability of the Trident processor does not require more fetch, decode, or issue bandwidth, but requires only replication of parallel lanes.
[sedukhin-02:2003]M. Soliman and S. Sedukhin. Matrix Bidiagonalization: Implementation and Evaluation on the Trident Processor. Neural, Parallel & Scientific Computation, 11(4):395-422, 2003.

This paper discusses the parallel implementation and evaluation of the reduction of a dense matrix to bidiagonal form on the Trident processor. The standard Golub and Kahan Householder bidiagonalization algorithm, which is rich in matrix-vector operations, and the LAPACK subroutine GEBRD, which is rich in a mixture of vector, matrix-vector, and matrix operations, are simulated on the Trident processor. We show how to use the Trident parallel execution units, ring, and communication registers to effectively perform vector, matrix vector, and matrix operations needed for bidiagonalizing a matrix. The number of clock cycles per FLOP is used as a metric to evaluate the performance of the Trident processor. Our results show that the high-eAEciency is attained by using as much as possible matrix-vector and matrix operations because of reducing the ratio of memory accesses to FLOP. On a 32K?32K matrix and 128 Trident lanes, the speedup of using matrix-vector operations on the standard Golub and Kahan algorithm over using only vector operations on one lane is around 190 times (super linear) and on 128 lanes is around two times. However, using matrix operations on the GEBRD subroutine gives speedup around 307 times (super linear) over using vector operations on one lane, 3.2 times over using vector operations on 128 lanes, and 1.3 times over using matrix-vector operations.

Refereed Proceeding Papers

[nikmir-07:2003]Robert R. Roxas and Nikolay N. Mirenkov. Visual Approach to Specifying Message-Passing Operations. In Proc. of the 2003 International Conference on Parallel Processing Workshops, pages 263-270, Taiwan, Oct. 2003.

Visual programming is a very promising approach for parallel programming because of the complexity in making parallel programs. There were several attempts to provide a visual environment for making parallel programs but only achieved a limited success.The common lyused technique is to draw some graphs whose nodes represent modules and arcs represent some communication paths. The graphs are then annotated by attaching some conventional programming codes. In practice, this approach can be useful but in a limited number of cases. To improve the situation, a new visual programming environment is being developed that allows the creation of programsfrom algorithmic film specifications with a minimal use of text in making programs. In this environment, there are six different groups of frames for the programmer to watch, edit, and specify operations.Oneof them is for specifiying I/Ooperationsandcommunication between software components in a complex program. Specifying communications among processes in a parallel program is just a partial case in this subsystem. This paper presents a visual environment for specifying communication among processes in a parallel program using a language of micro-icons. As an example, the scatter and gather types of collectivecommunication are presented based on the master/slave scheme of computation. These examples show how to define message-passing communication without using text-based programming style.
[nikmir-08:2003]Mahmoud Saber and Nikolay N. Mirenkov. Multimedia Parallel Programming Tool for Cellular Automata Systems. In Proceedings of International Symposium on Parallel and Distributed Processing and Applications (ISPA): Lecture notes in computer science, 2745, pages 437-448, Springer, 2003.

The Active Knowledge Studio group at the University of Aizu is studying, designing, and developing multimedia programming tools for various domains. These special purpose tools are developed within the frame-work of a more general environment (Active Knowledge Studio) and based on common design and implementation approaches. However, because of the orientation to different domains, each tool possesses its own features represented through specific multimedia objects and interface panels. In this paper, we provide an outline of our approach to programming cellular automata systems with one of our multimedia tools. We also provide a brief explanation of a user interface subsystem and discuss concepts and features of a program generator subsystem in this tool. We pay special attention to the parallel template programs supporting the automatic generation of executable codes from the multimedia specifications.
[nikmir-09:2003]Mahmoud Saber and Nikolay N. Mirenkov. A multimedia programming environment for cellular automata systems. In The 9th International Conference on Distributed Multimedia Systems, pages 263-270, Florida, USA, Sep. 2003.

In this paper, an Active Knowledge Studio environment oriented to programming cellular automata systems where global behavior arises from the collective effect of many locally interacting, simple components is presented. We provide an outline of our approach to programming such systems with essential attention to multimedia features of the environment. This comprising a multimedia interface subsystem, a program generation subsystem, a rendering engine, a parser, a template programs library, and internal meta files.
[sedukhin-03:2003]M. Soliman and S. Sedukhin. BLAS on the Trident Processor: Implementation and Performance Evaluation. In Editor N. Debnath, editor, Proceedings of the ISCA 18th International Conference on Computers and their Applications (CATA-2003), pages 359-364, Cary, NC, USA, March 2003. ISCA, International Society for Computers and Their Applications.

This paper describes the implementation of the Basic Linear Algebra Subprograms (BLAS), which are widely used in many applications, on the Trident processor. We show how to use the Trident parallel execution units, ring, and communication registers to effectively perform vector-vector, matrix-vector, and matrix-matrixoperations needed for implementingBLAS.TheTFLOPSrate on infinite-size problems (Roo), which is primarily a characteristic of the computer technology, and the problem size needed to reach one-half of Roo (N1/2), which is a measure of the amount of parallelism in a computer architecture, are used to evaluate the performance of the Trident processor on BLAS. On 128 parallel Trident lanes and 10 GHz clock frequency, which are possible in the billiontransistor era, Roo of dot-product, matrix-vector, and matrix-matrix multiplications are 1.1, 1.8, and2.5TFLOPS,respectively. Besides, N1/2increases when switching from low level to high level of BLAS.
[sedukhin-04:2003]M. Soliman and S. Sedukhin. Matrix Bidiagonalization on the Trident Processor. In Editor M. Cosnard, editor, Proceedings of the 17th Internationl Parallel & DistributedProcessing Symposium (IPDPS2003), pages CD-ROM Edition, Nice, France, Apr. 2003. IEEE, ACM, IEEE Computer Society Press.

This paper discusses the implementation and evaluation of the reduction of a matrix to bidiagonal form on the Trident processor. The standard Golub and Kahan Householder bidiagonalization algorithm, which is rich in Level 2 BLAS, andtheLapacksubroutineDGEBRD,which is richinLevel 2andLevel 3BLAS, are simulated on the Trident processor. We showhowto use the Trident parallel execution units, ring, andcommunication registers to effectively perform vector, matrix-vector, and matrix operations needed for bidiagonalizing a matrix. The number of cycles per FLOP is used as a metric to evaluate the performance of the Trident processor. Our results show that increasing the number of the Trident lanes decreases the number of cycles needed per FLOP. On a square matrix 32K?32K and 128 Trident lanes, the speedup of using Level 2 BLAS is around 1.5 times over Level 1 BLAS. However, using Level 3 BLAS gives speedup around 3 times over Level 1 BLAS, and 2 times over Level 2 BLAS, assuming 64 elements can be loaded or stored per cycle.
[sedukhin-05:2003]S. Sedukhin and M. Soliman. Trident: Technology-Scalable Architecture for Data Parallel Applications. In Editor M. Cosnard, editor, Proceedings of the 17th Internationl Parallel & Distributed Processing Symposium (IPDPS 2003), pages CD{ROM Edition, Nice, France, Apr. 2003. IEEE, ACM, IEEE Computer Society Press.

This paper discusses the physical and technological limitations of today and future VLSI technology and proposes the Trident processor architecture which is based on local communications.
[sedukhin-06:2003]A. Takahashi, M. Soliman, and S. Sedukhin. Parallel LU-decomposition on Pentium Streaming SIMD Extensions. In Joe K. Amano H. Aiso H Editors Veidenbaum, A., editor, Proceedings of the 5th International Symposium on High Performance Computing (ISHPC 2003), pages 423-430, Tokyo, Japan, Oct. 2003. Springer, Springer Verlag.

Solving systems of linear equations is central in scientific computation. In this paper, we focus on using Intel's Pentium Streaming SIMD Extensions (SSE) for parallel implementation of LU-decomposition algorithm. Two implementations (non-SSE and SSE) of LU-decomposition are compared. Moreover, two different variants of the algorithm for the SSE version are also compared. Our results demonstrate an avarage performance of 2.25 times faster than the non SSE version. This speedup is higher than 1.74 times the speedup of Intel's SSE implementation. The source of the speedup is highly reusing of loaded data by eAEciently organizing SSE instructions.
[sedukhin-07:2003]M. Soliman and S. Sedukhin. Highly-Scalable Array Processor for Data Parallel Applications. In KyushuUniversity, editor, Proceedings of the International Symposium on Information Science and Electrical Engineering 2003 (ISEE 2003), pages 167-170, Kyushu, Japan, Nov.2003. ACROS, Kyushu University.

Trident is a researchprocessor, which consists of a set of parallel lanes combined with a fast scalar core. Like vector architectures, each Trident lane contains a set of vector pipelines and a slice of register file. However, Trident has two types of registers to store and cyclically shift 1-D array of data within and across the parallel lanes. One key point of our architecture is the local communication within and across lanes to overcome the limitations of the future VLSI technology. Another key point is the effective processing of a mixture of 0-D (scalar), 1-D (vector), and 2-D (matrix) arrays of data, which are the fundamental data structures for data parallel applications. Moreover, the architecture of the Tridentprocessor is scalable and its scalabilitydoes not require more fetch, decode, or issue bandwidth, but requires replicating the number of lanes. This paper de scribes the main features of the Trident processor and evaluates its performance on the QR factorization algorithm.

Books

[sedukhin-08:2003]AmosOmondiand Stanislav Sedukhin. Advances in Computer Systems Architecture, Proc. of the 8th Asia-Pacific Conference, ACSAC 2003,. Number 2823 in Lecture Notes in Computer Science. Springer Verlag, New York, 2003.

Grants

[nikmir-10:2003]N. Mirenkov. Ministry of Education Scientific Research Fund, 2001-2003.
[nikmir-11:2003]N.Mirenkov. KnowledgeClusterCreationSupportProject, 2003-2004.
[nikmir-12:2003]N. Mirenkov. Fukushima Prefectural Foundation, 2003.
[rentaro-01:2003]RYoshioka. Grant-in-Aid for Scientific Research, Japanese Ministry of Education, Culture, Sports, Science and Technology, 2003-2005.

Academic Activities

[nikmir-13:2003]Nikolay Mirenkov, 2003.

Regional ACM/ICPC Contest Director

[nikmir-14:2003]Nikolay Mirenkov, Sep. 2003.

Preparation activity as Co-chair of the Program Committee, Int. conference DMS-2004, USA

[nikmir-15:2003]Nikolay Mirenkov, 2003.

Member, the IFIP Working Group 10.3 (Concurrent Systems)

[nikmir-16:2003]Nikolay Mirenkov, 2003.

Associate Editor, the Tamkang Journal of Science and Engineering, International Journal

[nikmir-17:2003]Nikolay Mirenkov, 2003.

Member of ACM and IEEE

[sedukhin-09:2003]S Sedukhin, Apr. 2003.

IEEE CS, member

[sedukhin-10:2003]S Sedukhin, Apr. 2003.

ACM,member

[sedukhin-11:2003]S Sedukhin, Apr. 2003.

IASTED Technical Committee on Parallel Processing, member

[sedukhin-12:2003]S Sedukhin, Apr. 2003.

International Journal of Neural, Parallel & Scientific Computations, Member of the Editorial Board

[sedukhin-13:2003]S Sedukhin, Apr. 2003.

Associative Editor for the IEICE Trans., Special Issue

[sedukhin-14:2003]S Sedukhin, Apr. 2003.

International Journal of Parallel Processing Letters, Member of the Editorial Board

[sedukhin-15:2003]S Sedukhin, Sept. 2003.

The Eighth Asia-Pacific Computer Systems Architecture Conference (AC-SAC003), Organizing & Program Committee Chair

[sedukhin-16:2003]S Sedukhin, July 2003.

The International Symposium on Parallel and Distributed Processing and Applications (ISPA003),Program Committee Member

[sedukhin-17:2003]S Sedukhin, Aug. 2003.

4th International Conference on Parallel and Distributed Computing, Applications, and Technologies (PDCAT 2002), Program Committee Member

Patents

[nikmir-18:2003]Hirotomi T. Mirenkov, N. and S. Narita. A method to create 3D objects, filed, 2003-432771, Dec 2003.
[nikmir-19:2003]N. Mirenkov and T. Hirotomi. A method to represent data and interfaces, filed, 2004-86903, March 2004.

Ph.D and Other Theses

[nikmir-20:2003]Masami Azuma. Graduation Thesis: Composition of Self-explanatory Components using Data Flow Schemes, University of Aizu,2003.

Thesis Advisor: Nikolay Mirenkov

[nikmir-21:2003]Takuya Azumi. Graduation Thesis: An Extendable Format to Represent Formulas for Self-explanatory Components, University of Aizu, 2003.

Thesis Advisor: Nikolay Mirenkov

[nikmir-22:2003]Satoshi Koyanagi. Graduation Thesis: Methods and Interfaces for Supporting Composite CyberFilms in Active Knowledge Studio, University of Aizu, 2003.

Thesis Advisor: Nikolay Mirenkov

[nikmir-23:2003]Sho Narita. Graduation Thesis: Haptic Interface and Tactile Visualization for Multimedia Systems, University of Aizu, 2003.

Thesis Advisor: Nikolay Mirenkov

[nikmir-24:2003]Yuusuke Nomiya. Graduation Thesis: Multimedia Hieroglyph Translation and Semantic Filtering, University of Aizu, 2003.

Thesis Advisor: Nikolay Mirenkov

[nikmir-25:2003]Hidenori Fujishiro. Graduation Thesis: Data and Methods for Editing Algorithmic Skeleton Schemes, University of Aizu, 2003.

Thesis Advisor: Nikolay Mirenkov

[nikmir-26:2003]Michie Okawa. Master Thesis: Multimedia word attributes in F-Mail system, University of Aizu, 2003.

Thesis Advisor: Nikolay Mirenkov

[nikmir-27:2003]Yutaka Watanobe. Master Thesis: CyberFilm library and its interface, University of Aizu, 2003.

Thesis Advisor: Nikolay Mirenkov

[nikmir-28:2003]Yuho Tsuchida. Master Thesis: Scenes and Stills Editor for Algorithmic Skeletons, University of Aizu, 2003.

Thesis Advisor: Nikolay Mirenkov

[sedukhin-18:2003]Akihito Takahashi. Graduation Thesis: Parallel LU-decomposition on Pentium Streaming SIMD Extensions, University of Aizu, 2004.

Thesis Advisor: Sedukhin, S.

[sedukhin-19:2003]Ryosuke Kato. Graduation Thesis: Sharing Computers for Packet Analyzing, University of Aizu, 2004.

Thesis Advisor: Sedukhin, S.

[sedukhin-20:2003]Daisaku Fujikawa. Master Thesis: Analysis and Evaluation of 2D Parallel DCT, University of Aizu, 2004.

Thesis Advisor: Sedukhin, S.