Professor |
Assistant Professor |
Visiting Researcher |
Toroidal computer architecture and algorithms
We have shown that the matrix multiply-add operation can be used effectively for multidimensional signal processing, such as the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), and the Discrete Hadamard Transform (DHT). These 2D/3D transforms can be represented as three/four matrix multiply-add operations and implemented on a 2D/3D torus array processor with high efficiency and speed. A design of the toroidal 3D computer for multidimensional DSP transforms is currently under consideration. The model of a multidimensional computer was used in the Cmpware CMPDK multiprocessor development environment. We have implemented a data-driven modeling technique to simultaneously produce simulation models and programming tools (an assembler and a disassembler) for a new architecture. The technical specification of the functional unit for the scalar fused multiply-add operation in different algebras was implemented and tested in cooperation with Zuken Ltd. (Japan).
Embedded Processor Design and Evaluation
We are particularly interested in the Java Virtual Machine (JVM) for embedded platforms. So far, we have investigated two sources of execution inefficiency that are inherent to the architecture of embedded JVMs, local variable accesses and redundant stack operations, and published the results at LCTES05 and Computing Frontiers 2006. We represent the University of Aizu as an academic member of the Embedded Microprocessor Benchmark Consortium (EEMBC), and have been using their GrinderBench (Java benchmark programs for embedded processors) in our research on this topic.
Virtual Machine for Sensor Network Nodes
One possible approach to writing concise programs for sensor network nodes is to use a virtual machine such as Maté. In this approach, the program is represented by virtual machine instructions, which are much denser than the native instructions of the microcontroller on the node. However, while the program can be much smaller than its native-instruction equivalent, executing programs in virtual instructions can be costly due to the interpretation of the (dense) virtual instructions. It has therefore been pointed out that this approach is feasible when the node is frequently reprogrammed but the program is infrequently executed. We have been working on this topic with the group led by Dr. Chris Bleakley, Lecturer in the School of Computer Science and Informatics at University College Dublin (UCD), Ireland. Hitoshi Oi visited UCD in May 2006 for a kick-off meeting and then stayed at UCD from June to September 2006 to collaborate with his group on this research topic. We have proposed an architecture for a wireless sensor network virtual machine that utilizes hardware acceleration techniques to reduce the execution inefficiencies of the virtual machine. We plan to pursue this topic further by designing detailed models of the proposed virtual machine architecture.
Performance Analysis of Server Workload
In particular, at the Operating Systems Laboratory we are currently working with the Open Source Development Labs' Database Test suites (OSDL DBT), which are open-source implementations of the TPC benchmark programs. In addition to the behavioral analysis of the workload, we have also been investigating and evaluating hardware components of modern computers, such as multi-core processors and solid-state disks (SSDs), under such workloads. Possible outcomes of this project include analytical models of server systems, server simulation programs with a graphical user interface that can be used for classroom teaching in computer systems courses, and optimizations of hardware and software components (such as the memory hierarchy and disk systems) for performance improvement.
System Level Virtualization
These advantages of system level virtualization are, however, not free. Various system operations, such as dispatching interrupts to the appropriate virtual machines or protecting a VM from other VMs' failures, incur system resource overhead, including wasted CPU cycles and memory space. One of our objectives in this topic is to identify the hardware and software components that can become performance bottlenecks in a virtualized system; we analyze each such component and propose solutions for it. Modern processors are equipped with hardware support for system level virtualization (e.g. Intel VT and AMD-V), and the industry is now working toward the virtualization of I/O devices (e.g. PCI-IOV). Evaluating the effectiveness of these hardware support techniques for virtualization is another objective of our research in this area. |
[hitoshi-01:2006] |
Hitoshi Oi. Instruction Folding in a Hardware-Translation Based
Java Virtual Machine. In Proceedings of the ACM International Conference on
Computing Frontiers, pages 138–145. ACM/SIGMICRO, May 2006. |
Bytecode hardware-translation improves the performance of a Java Virtual Machine
(JVM) with small hardware resource and complexity overhead. Instruction
folding is a technique to further improve the performance of a JVM by reducing
the redundancy in the stack-based instruction execution. However, the variable
instruction length of the Java bytecode makes the folding logic complex. In this
paper, we propose a folding scheme with reduced hardware complexity and evaluate
its performance. For seven benchmark cases, the proposed scheme folded 6.6%
to 37.1% of the bytecodes, which corresponds to 84.2% to 102% of the PicoJava-II's
performance. |
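The idea behind instruction folding, collapsing a stack-based load/load/op/store pattern into a single register-style operation, can be illustrated with a minimal sketch. The opcode names and tuple encoding below are hypothetical, for illustration only; they are not the paper's hardware folding scheme or the actual Java bytecode format.

```python
# Hypothetical bytecode model: each instruction is a (mnemonic, operand)
# tuple.  fold() collapses the pattern [load a, load b, op, store c] into
# a single register-style instruction (op, dest, src1, src2), removing
# the intermediate stack pushes and pops.
def fold(bytecodes):
    folded, i = [], 0
    while i < len(bytecodes):
        w = bytecodes[i:i + 4]
        if (len(w) == 4 and w[0][0] == "load" and w[1][0] == "load"
                and w[2][0] in ("add", "mul") and w[3][0] == "store"):
            folded.append((w[2][0], w[3][1], w[0][1], w[1][1]))
            i += 4                      # four bytecodes folded into one
        else:
            folded.append(bytecodes[i])
            i += 1
    return folded

# c = a + b: four stack bytecodes fold into one operation
prog = [("load", "a"), ("load", "b"), ("add", None), ("store", "c")]
folded = fold(prog)
```

A hardware folding unit does this pattern matching in the decode stage; the complexity the paper addresses comes from Java bytecodes having variable lengths, which this fixed-width sketch sidesteps.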
|
[sedukhin-01:2006] |
A.S. Zekri and S.G. Sedukhin. Fine-grained Matrix Multiply-Add
on a Torus Array Processor. In Bidyut Gupta, editor, Proceedings
of the ISCA 22nd International Conference on Computers and Their Applications,
pages 44–51, Honolulu, Hawaii, March 2007. ISCA. |
In performing the n×n matrix multiply-add operation C = C + A×B on a fine-grain
N×N torus array processor, n ≥ N, the matrices are partitioned into blocks
of size N so that the whole result is obtained by a sequence of N×N matrix
multiply-add operations. When the sizes of the matrices are not exact multiples of
the array size, the remaining parts may drastically affect the performance, depending
on the shape of the matrices. Previously, we represented the 3D index
space of the N×N matrix multiply-add operation as a 3D torus. The projection
method was used to obtain the optimal 2D data allocations to perform the operation
on the N×N torus array processor in N multiply-add-roll steps. In this
paper, we use the optimal data allocations to present two approaches to deal with
the fine-grain blocking of the matrix multiply-add operation. The packing approach
performs multiple vector scaling or vector reduction operations together
by properly aligning the data inside the array processor and applying the suitable
data allocation. The padding approach pads the remaining parts up to the block
size N. Analytical experiments show a performance gain of the packing
approach over the padding approach when the sizes of the remaining parts are
small compared to N. |
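The padding approach can be sketched in a few lines of NumPy: pad all operands with zeros up to the next multiple of the block size N, then run the blocked multiply-add. The function name is illustrative and the sequential triple loop merely stands in for the torus array's N×N multiply-add steps; this is a functional model, not the array-processor implementation.

```python
import numpy as np

def padded_mma(C, A, B, N):
    """C + A @ B for n x n matrices by zero-padding all operands up to a
    multiple of the block size N and multiplying block by block."""
    n = A.shape[0]
    m = -(-n // N) * N                  # n rounded up to a multiple of N
    Ap = np.zeros((m, m)); Ap[:n, :n] = A
    Bp = np.zeros((m, m)); Bp[:n, :n] = B
    Cp = np.zeros((m, m)); Cp[:n, :n] = C
    for i in range(0, m, N):
        for j in range(0, m, N):
            for k in range(0, m, N):
                # one N x N matrix multiply-add per step
                Cp[i:i+N, j:j+N] += Ap[i:i+N, k:k+N] @ Bp[k:k+N, j:j+N]
    return Cp[:n, :n]

rng = np.random.default_rng(0)
A, B, C = (rng.random((5, 5)) for _ in range(3))
R = padded_mma(C, A, B, 4)              # n = 5 is not a multiple of N = 4
```

The wasted work on the zero blocks is exactly why the packing approach wins when the remaining parts are small relative to N.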
|
[sedukhin-02:2006] |
A.S. Zekri and S.G. Sedukhin. Matrix Transpose on 2D Torus
Array Processor. In Sixth International Conference on Computer and Information Technology (CIT 2006), pages 45–46, Seoul, Korea,
Sept. 2006. IEEE Computer Society. |
Previously, we represented the index space of the (n×n)-matrix multiply-add
problem C = C + A×B as a 3D torus, where A, B, and C are rolled along the
corresponding axes of the index space. All optimal 2D data allocations (resulting
from projection) to solve the problem on the n×n torus array processor in n
multiply-add-roll steps were obtained. In this paper, we formulate the operations
needed for aligning both the data before computing and the results after computing
as matrix multiply-add problems. These alignment operations are combined
with the optimal data allocations that solve the matrix multiply-add problem
to propose new algorithms to transpose an n×n matrix on the n×n torus array
processor in O(n) multiply-add-roll steps. Using the proposed algorithms, we
showed different approaches to solve the transposed matrix multiply-add problem,
C = C + A^T×B^T, on the 2D torus array processor. |
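One algebraic fact relevant to the transposed multiply-add problem is the identity A^T B^T = (BA)^T, which lets a transposed product be computed as one ordinary multiply followed by a single transpose. The NumPy check below verifies the identity numerically; it is a sanity-check sketch, not the paper's torus-array algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.random((6, 6)) for _ in range(3))

# A^T B^T = (B A)^T, so C + A^T B^T needs only one ordinary
# multiply-add followed by a transpose of the product term
lhs = C + A.T @ B.T
rhs = C + (B @ A).T
assert np.allclose(lhs, rhs)
```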
|
[sedukhin-03:2006] |
A.S. Zekri and S.G. Sedukhin. The general matrix multiply-add
operation on 2D torus. In Proc. of the 20th International
Parallel and Distributed Processing Symposium (IPDPS 2006), pages 125–131, Rhodes Island, Greece, April 2006. IEEE Computer Society. |
In this paper, the computation space of the (n×n)-matrix multiply-add problem
C = C + A×B is represented as a 3D n×n×n torus. All possible time-scheduling
functions to activate the computations inside the 3D torus are determined. To
maximize efficiency when solving a single problem, we mapped the computation
points onto the 2D n×n toroidal array processor. All optimal 2D data allocations
that solve the problem in n multiply-add-roll steps are obtained. The well-known
Cannon's algorithm is one of the resulting allocations. We used the optimal
data allocations to describe all variants of the GEMM operation on the
2D toroidal array processor. By controlling the data flow, the transposition operation
is avoided in 75% of the GEMM variants, and only one explicit
matrix transpose is needed for the remaining 25%. Ultimately, we described four
versions of the GEMM operation covering the possible layouts of the initially
loaded data in the array processor. |
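Cannon's algorithm, mentioned above as one of the resulting allocations, can be modeled on a virtual n×n torus with NumPy rolls standing in for the inter-PE shifts. This is an illustrative functional sketch under the assumption of one element per processing element, not the paper's derivation.

```python
import numpy as np

def cannon_mma(C, A, B):
    """C = C + A @ B on a virtual n x n torus (one element per PE):
    after the initial skew, each of the n multiply-add-roll steps
    multiplies the locally held pair and rolls A left and B up."""
    n = A.shape[0]
    A, B, C = A.copy(), B.copy(), C.copy()
    for i in range(n):                  # skew: row i of A left by i
        A[i] = np.roll(A[i], -i)
    for j in range(n):                  # skew: column j of B up by j
        B[:, j] = np.roll(B[:, j], -j)
    for _ in range(n):                  # n multiply-add-roll steps
        C += A * B                      # elementwise: one MAC per PE
        A = np.roll(A, -1, axis=1)      # roll A one position left
        B = np.roll(B, -1, axis=0)      # roll B one position up
    return C

rng = np.random.default_rng(1)
A, B, C = (rng.random((4, 4)) for _ in range(3))
R = cannon_mma(C, A, B)
```

After the skew, PE (i, j) holds A[i, (i+j) mod n] and B[(i+j) mod n, j], so the n rolls sweep the full inner-product index for every (i, j) pair.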
|
[hitoshi-02:2006] |
Hitoshi Oi and C. J. Bleakley. Towards a Low Power Virtual
Machine for Wireless Sensor Network Motes. In Proceedings of the Japan-China Joint Workshop on Frontier of Computer Science and Technology
(FCST 2006). IEEE, November 2006. |
Virtual Machines (VMs) have been proposed as an efficient programming model
for Wireless Sensor Network (WSN) devices. However, the processing overhead
required for VM execution has a significant impact on the power consumption and
battery lifetime of these devices. This paper analyses the sources of power consumption in
the Maté VM for WSNs. The paper proposes a generalised processor
architecture allowing for hardware acceleration of VM execution. The paper proposes
a number of hardware accelerators for Maté VM execution and assesses
their effectiveness. |
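The interpretation overhead this work targets comes from the decode-and-dispatch loop that every virtual instruction passes through. A toy stack-machine interpreter makes the cost visible: each dense virtual instruction triggers a fetch, a decode, and a branch before any useful work happens. The opcodes below are made up for illustration; they are not the Maté instruction set.

```python
# Toy dispatch loop for a stack-based VM.  Every virtual instruction
# costs a fetch, a decode, and a dispatch branch on top of the actual
# operation -- the overhead that hardware acceleration aims to remove.
PUSH, ADD, MUL, HALT = range(4)

def interpret(code):
    stack, pc = [], 0
    while True:
        op = code[pc]; pc += 1          # fetch + decode
        if op == PUSH:
            stack.append(code[pc]); pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop(); stack.append(a * b)
        elif op == HALT:
            return stack.pop()

# (2 + 3) * 4 encoded in nine "virtual" bytes
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
```

The density advantage is also visible here: the whole expression fits in nine bytes of virtual code, at the price of running the dispatch loop nine times.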
[hitoshi-03:2006] |
Hitoshi Oi. The University of Aizu Competitive Research Fund,
2006-2007. |
Title: Design Investigation of the Embedded Microprocessors, Amount: ¥817,000 |
[hitoshi-04:2006] |
Hitoshi Oi, Feb. 2006. Professional Member, IEEE/CS |
[hitoshi-05:2006] |
Hitoshi Oi, Apr. 2005. Professional Member, ACM |
[hitoshi-06:2006] |
Hitoshi Oi, Jan. 2006. Academic Member, representative for Aizu University, EEMBC |
[hitoshi-07:2006] |
Hitoshi Oi, Jul. 2006. Reviewer for Microprocessors and Microsystems, Elsevier |
[hitoshi-08:2006] |
Hitoshi Oi, May 2006. ACM International Conference on Computing Frontiers, Liaison Chair for Asia |
[hitoshi-09:2006] |
Hitoshi Oi, November 2006. Program Committee Member for the Japan-China Joint Workshop on Frontier of Computer Science and Technology (FCST 2006) |
[hitoshi-10:2006] |
Reviewer for the IEEE 2006 International Workshop on Signal Processing Systems |
[sedukhin-04:2006] |
S. Sedukhin, Apr. 2006. IEEE CS, member |
[sedukhin-05:2006] |
S. Sedukhin, Apr. 2006. ACM, member |
[sedukhin-06:2006] |
S. Sedukhin, Apr. 2006. IEICE, member |
[sedukhin-07:2006] |
S. Sedukhin, Apr. 2006. IASTED Technical Committee on Parallel Processing, member |
[sedukhin-08:2006] |
S. Sedukhin, Apr. 2006. International Journal of Parallel Processing Letters, Member of the Editorial Board |
[sedukhin-09:2006] |
S. Sedukhin, Apr. 2006. International Journal of High Performance Systems Architecture, Member of the Editorial Board |
[sedukhin-10:2006] |
S. Sedukhin, Sept. 2006. The 11th Asia-Pacific Computer Systems Architecture Conference (ACSAC 2006), Korea, Steering Committee Member |
[sedukhin-11:2006] |
S. Sedukhin, August 2006. The 8th Workshop on High Performance Scientific and Engineering Computing, Program Committee Member |
[sedukhin-12:2006] |
S. Sedukhin, Apr. 2006. International Journal of Neural, Parallel & Scientific Computations, Member of the Editorial Board |
[sedukhin-13:2006] |
S. Sedukhin, May 2006. The 2006 High Performance Computing & Simulation Conference (HPC&S 2006), Program Committee Member |
[sedukhin-14:2006] |
S. Sedukhin, Feb. 2006. IASTED International Conference on Parallel and Distributed Computing and Networks, Program Committee Member |
[hitoshi-11:2006] |
Masato Chiba. Graduation Thesis: Performance Evaluation of
Dual Core Processors under On-Line Transaction Processing Workload,
University of Aizu, 2007. |
Thesis Adviser: Hitoshi Oi |
|
[sedukhin-15:2006] |
Yuusuke Kobayashi. Graduation Thesis: An Evaluation of
Four Algorithms for the Algebraic Path Problem, University of Aizu,
2006. |
Thesis Advisor: Sedukhin, S. |
|
[sedukhin-16:2006] |
Ben Hachimori. Graduation Thesis: Parallelization of the Raytracing
Algorithm with FastMATH Processor, University of Aizu, 2006. |
Thesis Advisor: Sedukhin, S. |