Current research at the Distributed Parallel Processing Laboratory (DPPL) encompasses:
[Stanislav G. Sedukhin]
Co-design of massively-parallel and extremely scalable algorithms and architectures for forthcoming technologies;
Topology-aware orbital computing for multilinear algebra;
Design, evaluation, and optimization of coarse-grained parallel numerical and graph algorithms for hybrid CPU/GPU-based systems.
[Hitoshi Oi]
Adaptive Resource Management in a Virtualized System
Workload and power consumption characterization of server applications using industry-standard and open-source benchmark programs, including SPEC, TPC and OSDL DBT.
Design and analysis of Java and wireless sensor network virtual machines.
System-level virtualization: performance analysis and modeling of consolidated systems
[Naohito Nakasato]
Development and Evaluation of High Precision Computing on Many-core Accelerators
Application of GPU to Numerical Simulations in Astronomy
Development of GPU applications in OpenCL
[Veles, Oleksandr]
High Performance Computing with hybrid GPU cluster technology.
GPGPU method in astrophysical computations.
Using OpenCL for efficient calculations.
H. Daisaka, N. Nakasato, J. Makino, F. Yuasa, and T. Ishikawa. GRAPE-MP: An SIMD Accelerator Board for Multi-precision Arithmetic. Procedia Computer Science, 4:878-887, 2011.
We describe the design and performance of the GRAPE-MP board, an SIMD accelerator board for quadruple-precision arithmetic operations. A GRAPE-MP board houses one GRAPE-MP processor chip and an FPGA chip that handles communication with the host computer. A GRAPE-MP chip has 6 processing elements (PEs) and operates at a clock frequency of 100 MHz. Each PE can perform one addition and one multiplication in every clock cycle. The architecture of the GRAPE-MP is similar to that of the GRAPE-DR. It is implemented using a structured ASIC chip from eASIC Corp. A GRAPE-MP processor board has a theoretical peak quadruple-precision performance of 1.2 Gflops. As a preliminary result, we present the performance of the GRAPE-MP board for two target applications. The performance of the numerical integration of Feynman loop integrals is 0.53 Gflops. The performance of an N-body simulation with the second-order leapfrog scheme is 0.505 Gflops for N = 1984, which is more than 10 times the performance of the host computer.
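The peak figure in the abstract above follows directly from the quoted hardware parameters. The short Python sketch below only redoes that arithmetic; the function and variable names are illustrative and do not come from the paper, and counting one addition plus one multiplication as two floating-point operations per cycle is an assumption about how the peak was derived.

```python
# Figures quoted in the GRAPE-MP abstract.
NUM_PES = 6                  # processing elements per chip
FLOPS_PER_PE_PER_CYCLE = 2   # assumed: one addition + one multiplication
CLOCK_HZ = 100_000_000       # 100 MHz


def peak_gflops(num_pes, flops_per_pe_per_cycle, clock_hz):
    """Theoretical peak in Gflops: PEs x flops per cycle x clock rate."""
    return num_pes * flops_per_pe_per_cycle * clock_hz / 1e9


peak = peak_gflops(NUM_PES, FLOPS_PER_PE_PER_CYCLE, CLOCK_HZ)

# Measured results quoted in the abstract, in Gflops.
measured = {"feynman_loop": 0.53, "nbody_leapfrog": 0.505}

# Fraction of the theoretical peak reached by each application.
efficiency = {name: gflops / peak for name, gflops in measured.items()}

if __name__ == "__main__":
    print(f"peak = {peak:.2f} Gflops")
    for name, frac in efficiency.items():
        print(f"{name}: {frac:.1%} of peak")
```

Under that assumption the two reported applications reach roughly 44% and 42% of the theoretical peak.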
K. Matsumoto, N. Nakasato, T. Sakai, H. Yahagi, and S. G. Sedukhin. Multi-level Optimization of Matrix Multiplication for GPU-equipped Systems. Procedia Computer Science, 4:342-351, 2011.
This paper presents results of our study on double-precision general matrix-matrix multiplication (DGEMM) for GPU-equipped systems. We applied further optimization to utilize the DGEMM stream kernel previously implemented for a Cypress GPU from AMD. We have examined the effects of different memory access patterns on the performance of the DGEMM kernel by changing its layout function. The experimental results show that the GEMM kernel with the X-Morton layout function is superior to the kernels with the other layout functions in terms of performance and cache hit rate. Moreover, we have implemented a DGEMM routine for large matrices whose data do not all fit in GPU memory. Our DGEMM routine achieves up to 472 GFlop/s and 921 GFlop/s on a system using one GPU and two GPUs, respectively.
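To illustrate what a layout function does, the sketch below maps a two-dimensional (row, column) position to a one-dimensional index using the plain Z-Morton (bit-interleaving) order. This is not the X-Morton variant evaluated in the paper, whose exact definition is not given in the abstract; the function names and the 16-bit coordinate width are assumptions made for the example.

```python
def interleave_bits(row, col, bits=16):
    """Z-Morton index: bit k of col goes to bit 2k, bit k of row to bit 2k+1."""
    index = 0
    for k in range(bits):
        index |= ((col >> k) & 1) << (2 * k)
        index |= ((row >> k) & 1) << (2 * k + 1)
    return index


def morton_order(n):
    """Return the (row, col) cells of an n x n grid sorted by Z-Morton index."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: interleave_bits(rc[0], rc[1]))


if __name__ == "__main__":
    # Print the visiting order of a 4 x 4 grid of tiles.
    for position, (r, c) in enumerate(morton_order(4)):
        print(position, (r, c))
```

Neighbouring cells in the grid tend to receive nearby indices under such a layout, which is the kind of locality property that can affect cache hit rate.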
A. V. Shavrina, I. A. Mikulskaya, S. I. Kiforenko, V. A. Sheminova, A. A. Veles, and O. B. Blum. The Study of Ground-Level Ozone in Kiev and its Impact on Public Health. Kosmichna Nauka i Tekhnologiya (ISSN 1561-8889), 17(1):52-59, 2011.
Ground-level ozone in Kiev during the episode of high concentration in August 2000 is simulated with the urban air pollution model UAM-V.
Hitoshi Oi. Power-Performance Analysis of JVM Implementations. In Proceedings of the 5th International Conference on Information Technology and Multimedia (ICIMU), pages 1-7. IEEE/CS Conference Publishing Services, November 2011.
DOI: 10.1109/ICIMU.2011.6122743
Hitoshi Oi and Kazuaki Takahashi. Performance Modeling of a Consolidated Java Application Server. In Proceedings of 2011 IEEE International Conference on High Performance Computing and Communications (HPCC2011), pages 834-838. IEEE Conference Proceedings, September 2011.
DOI: 10.1109/HPCC.2011.118
K. Matsumoto, N. Nakasato, and S. G. Sedukhin. Blocked All-Pairs Shortest Paths Algorithm for Hybrid CPU-GPU System. In 2011 IEEE 13th International Conference on High Performance Computing and Communications (HPCC), pages 145-152, 2011.
This paper presents a blocked algorithm for the all-pairs shortest paths (APSP) problem for a hybrid CPU-GPU system. In the blocked APSP algorithm, the amount of data communication between CPU (host) memory and GPU memory is minimized. When the problem size (the number of vertices in a graph) is large enough compared with the blocking factor, the blocked algorithm virtually requires the CPU and GPU to exchange two block matrices for each block computation on the GPU. We also estimate the memory/communication bandwidth required to utilize the GPU efficiently. On a system containing an Intel Westmere CPU (Core i7 970) and an AMD Cypress GPU (Radeon HD 5870), our implementation of the blocked APSP algorithm achieves performance of up to 1 TFlop/s in single precision.
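The general idea of a blocked APSP computation can be sketched with the well-known three-phase blocked Floyd-Warshall scheme. The Python code below is a CPU-only illustration under that assumption, not the authors' implementation: the function names are hypothetical, and the point where a GPU kernel would run is only marked in a comment. In the third phase each block update reads two other blocks, which appears to correspond to the two block matrices mentioned in the abstract.

```python
INF = float("inf")


def floyd_warshall(dist):
    """Reference (unblocked) Floyd-Warshall on a dense distance matrix."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def update_block(d, bi, bj, bk, b, n):
    """Relax block (bi, bj) through every pivot vertex in block bk.

    In a hybrid system this is the unit of work that would be offloaded
    to the GPU; here it simply runs on the CPU.
    """
    for k in range(bk * b, min((bk + 1) * b, n)):
        for i in range(bi * b, min((bi + 1) * b, n)):
            for j in range(bj * b, min((bj + 1) * b, n)):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]


def blocked_apsp(dist, b):
    """Three-phase blocked Floyd-Warshall with blocking factor b."""
    n = len(dist)
    d = [row[:] for row in dist]
    nb = (n + b - 1) // b  # number of blocks per dimension
    for bk in range(nb):
        # Phase 1: the diagonal (pivot) block depends only on itself.
        update_block(d, bk, bk, bk, b, n)
        # Phase 2: blocks in the pivot row and the pivot column.
        for x in range(nb):
            if x != bk:
                update_block(d, bk, x, bk, b, n)
                update_block(d, x, bk, bk, b, n)
        # Phase 3: all remaining blocks, each reading two phase-2 blocks.
        for bi in range(nb):
            for bj in range(nb):
                if bi != bk and bj != bk:
                    update_block(d, bi, bj, bk, b, n)
    return d


if __name__ == "__main__":
    graph = [
        [0, 3, INF, 7, INF],
        [8, 0, 2, INF, INF],
        [5, INF, 0, 1, INF],
        [2, INF, INF, 0, 4],
        [INF, 1, INF, INF, 0],
    ]
    assert blocked_apsp(graph, 2) == floyd_warshall(graph)
    for row in blocked_apsp(graph, 2):
        print(row)
```

The blocking factor `b` need not divide the number of vertices; the last block in each dimension is simply smaller.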
Sho Niboshi and Hitoshi Oi. Performance Analysis of SPECjEnterprise2010. In IPSJ SIG Technical Report, number IPSJ-EVA1103600, pages 1-2. Information Processing Society of Japan, 2011.
Peter Berczik, Keigo Nitadori, Shiyan Zhong, Rainer Spurzem, Tsuyoshi Hamada, Xiaowei Wang, Ingo Berentzen, Alexander Veles, and Wei Ge. High performance massively parallel direct N-body simulations on large GPU clusters. In Proceedings of the International Conference on High Performance Computing 2011, Kyiv, Ukraine, 2011.
Hitoshi Oi, Since 2005.
Professional Member, ACM
Hitoshi Oi, Since 2005.
Member, IEEE/Computer Society
Hitoshi Oi, Since 2009.
Academic member of the T-Engine Forum (representative for the University of Aizu). http://www.t-engine.org/
Hitoshi Oi, Since 2006.
Academic Member, EEMBC
Hitoshi Oi, Since 2009.
Senior Member, IACSIT
Hitoshi Oi, March 2012.
Hosted 37th meeting of Information Processing Society of Japan (IPSJ), Special Interest Group on System Evaluation (SIGEVA) at the University of Aizu, on March 30, 2012.
Hitoshi Oi, Since 2011.
Program committee member and chair of the special session on Network on Chip and Multi-core Technologies (NMT2011). (The conference was postponed to 2012 due to the earthquake.)
Hitoshi Oi, Since 2011.
Program Committee member, The 10th IEEE International Symposium on Parallel and Distributed Processing with Applications.
Yuta Suzuki. Graduation thesis, School of Computer Science and Engineering, 2012.
Thesis Adviser: N.Nakasato
Takafumi Suzuki. Graduation thesis, School of Computer Science and Engineering, 2012.
Thesis Adviser: N.Nakasato
Kou Kaimijima. Graduation thesis, School of Computer Science and Engineering, 2012.
Thesis Adviser: N.Nakasato
Kousuke Nakamura. Graduation thesis, School of Computer Science and Engineering, 2012.
Thesis Adviser: N.Nakasato
Kazuhiro Seiwa. Graduation Thesis: GPU Acceleration of Numerical Simulation of Fluid by the Lattice Boltzmann Method, University of Aizu, 2012.
Thesis Adviser: N.Nakasato
Tsuyoshi Watanabe. Graduation Thesis: Fluid Simulations in Curved Pipes using Smoothed Particle Hydrodynamics on GPU, University of Aizu, 2012.
Thesis Adviser: N.Nakasato
Hitoshi Oi.
Journal reviewer for Microprocessors and Microsystems (Elsevier) and International Journal of High Performance Systems Architecture (Inderscience Enterprises).