Professor |
Associate Professor |
Assistant Professor |
Assistant Professor |
The laboratory was renamed from "Computer Education Lab." to "Adaptive Systems Lab." this year. The academic area of this laboratory mainly covers computer design methodology. Educational course design for VLSI design, formal verification, and reconfigurable computing are the major themes of this lab.
Education:
We welcomed Prof. Ben A. Abderazek in 2007, and the laboratory now has four professors, 10 master's program students, and 9 (B4) + 15 (B3) graduation thesis students.
Research:
We started a new approach to high-performance computing systems with reconfigurable devices in collaboration with the RIKEN research group: the PROGRAPE Project. The target of this project is a Desktop Supercomputing System; "desktop" here means low cost, using commercially available FPGAs and PCs. We have been developing interface circuits and new applications such as fluid dynamics and data mining. We also started a new approach to high-performance computer architecture, the "Queue Processor", based on Prof. Ben’s research background.
Members of the Computer Education Laboratory
Prof. Kenichi Kuroda:
Prof. Junji Kitamichi:
Prof. Abderazek Ben Abdallah:
Prof. Yuichi Okuyama:
Students:
Master Program
M2: Hayata Ikegami, Takahito Obara, Kaai Kojima, Anna Sato, Akihiro Shindo, Tomohiro Nishikawa, Takahiro Machino.
M1: Fumiko Ohori, Daisuke Ohwada, Kenji Mitome.
Graduation Thesis Students:
B4: Yuki Omoto, Kazunori Fujita, Shuhei Igari, Mizuho Shiga, Hiroki Hoshino, Takahiro Honma, Taichi Maekawa, Masashi Masuda, Sho Yamaki.
B3: Yukihiro Yoshida, Kazunori Nemoto, Shoichi Igarashi, Yasue Nagumo, Haruka Niinuma, Yasushi Haga, Reo Honjoya, Shohei Miura, Tsubasa Murohashi, Kenichi Mori, Ryuhei Morita, Yoshitomo Yabuki, Yuka Yasuda, Saeko Yoshinari, Tetsuya Watanabe. |
[benab-01:2008] |
Abderazek Ben Abdallah, Arquimedes Canedo, Tsutomu Yoshinaga,
and Masahiro Sowa. The QC-2 Parallel Queue Processor Architecture. Journal
of Parallel and Distributed Computing, 68(2):235–24, 2008. |
Queue-based instruction set architecture processors offer an attractive option in the
design of embedded systems. In our previous work, we proposed a novel queue processor
architecture as a starting point for hardware/software design space exploration
for embedded applications. In this paper, we present a high-performance 32-bit Synthesizable
QueueCore (QC-2), an improved and optimized version of the produced
order parallel Queue processor (PQP), with single-precision floating-point support.
The QC-2 core also implements a novel technique used to extend immediate values
and memory instruction offsets that were otherwise not representable because
of bit-width constraints in the PQP processor. A prototype implementation is produced
by synthesizing the high-level model for a target FPGA device. We present
the architecture description and design results in a fair amount of detail. |
|
[benab-02:2008] |
Arquimedes Canedo, Abderazek Ben Abdallah, and Masahiro Sowa.
Compiling for Reduced Bit-Width Queue Processors. The Journal of Signal
Processing Systems, 2009. |
Embedded systems are characterized by the demand for code with a small memory
footprint. A popular architectural modification to improve code density in
RISC embedded processors is to use a reduced bit-width instruction set. This approach
reduces the length of the instructions to improve code size. However, because the reduced
instructions can address fewer registers, these architectures suffer a
slight performance degradation as more reduced instructions are required to execute
a given task. On the other hand, 0-operand computers such as stack and queue
machines implicitly access their source and destination operands making instructions
naturally short. Queue machines offer a highly parallel computation model, unlike the
stack model. This paper proposes a novel alternative for reducing code size by using
a queue-based reduced instruction set while retaining the high parallelism characteristics
in programs. We introduce an efficient code generation algorithm to generate
programs for our reduced instruction set. Our algorithm successfully constrains the
code to the reduced instruction set with the addition of only 4% extra code, on average.
We show that our proposed technique is able to generate about 16% more
compact code than MIPS16, 26% over ARM/Thumb, and 50% over MIPS32 code.
Furthermore, we show that our compiler is able to extract about the same parallelism
as fully optimized RISC code. |
|
[benab-03:2008] |
Mushfiq Akanda, Abderazek Ben Abdallah, and Masahiro Sowa.
Dual-Execution Mode Processor Architecture. Journal of Supercomputing,
44(2):103–125, 2008. |
In this research work, we propose a novel embedded dual-execution mode 32-bit processor
architecture (QSP32), which supports queue and stack programming models.
The QSP32 core is based on a high performance produced order parallel queue architecture
and is targeted for applications constrained in terms of area, memory, and
power requirements. The design focuses on the ability to execute queue programs and
also to support stack programs without a considerable increase in hardware to the
base queue architecture. A prototype implementation of the processor is produced by
synthesizing the high level model for a target FPGA device. We present the architecture
description and design results in a fair amount of detail. From the design and
evaluation results, the QSP32 core efficiently executes both queue and stack based
programs and achieves on average about 65 MHz speed. In addition, when compared
to the base single-mode architecture (PQP), the QSP32 core requires only about
2.41% additional hardware. Moreover, the prototype fits on a single FPGA device,
thereby eliminating the need to perform multi-chip partitioning which results in a
loss of resource efficiency. |
|
[benab-04:2008] |
Arquimedes Canedo, Abderazek Ben Abdallah, and Masahiro Sowa. A
New Code Generation Algorithm for 2-offset Producer Order Queue Computation
Model. Journal of Computer Languages, Systems & Structures,
34(4):184–194, 2008. |
Queue computing is an attractive alternative for the compulsive demand of high-performance
architectures. Code generation for queue machines has some problems
but the solutions have not been studied thoroughly. A new parallel queue computation
model, 2-offset P-Code queue computation model, is presented together with a
new code generation algorithm. The code generation algorithm takes leveled DAGs
as input and produces 2-offset P-Code assembly. We also developed a queue compiler
to evaluate the new algorithm and compiled a set of C language benchmark
programs for the 2-offset P-Code. The queue compiler generates between 8.55% fewer
instructions and 10.55% more instructions than an actual MIPS32 compiler for the
compiled programs. |
[benab-05:2008, kuroken-01:2008] |
Taichi Maekawa, Ben Abdallah Abderazek, and
Kenichi Kuroda. Single Instruction Dual-Execution Model Processor Architecture.
In Proceedings of the 2008 IEEE/IFIP International Conference
on Embedded and Ubiquitous Computing (EUC 2008), pages 30–36. IEEE,
Dec. 2008. |
We present in this paper architecture and preliminary evaluation results of a novel
dual-mode processor architecture which supports queue and stack computation models
in a single core. The core is highly adaptable in both functionality and configuration.
It is based on a reduced-bit produced order queue computation instruction set
architecture and operates in either the Queue or the Stack execution model. This is achieved
via a so-called dynamic switching mechanism implemented in hardware. The current
design focuses on the ability to execute Queue programs and also to support Stack
based programs without considerable increase in hardware to the base architecture.
We present the architecture description and design results in a fair amount of detail. |
|
[benab-06:2008, kuroken-02:2008] |
Hiroki Hoshino, Ben Abdallah Abderazek, and
Kenichi Kuroda. Advanced Optimization and Design Issues of a 32-bit Embedded
Processor Based on Produced Order Queue Computation Model. In
Proceedings of the 2008 IEEE/IFIP International Conference on Embedded
and Ubiquitous Computing (EUC 2008), pages 16–22. IEEE, Dec. 2008. |
Queue computing based programs are generated using a so-called level order traversal
that exposes all available parallelism in the programs. All instructions within the
same level are data independent of each other and can safely be executed
in parallel. This property is leveraged by the compiler to generate queue programs
with large groups of independent instructions. Thus, the hardware spends
little effort finding parallelism. In this paper, we present various optimization and
design issues of a synthesizable queue processor architecture targeted for embedded
applications. A prototype implementation is produced by synthesizing the high-level
model for a target FPGA device. |
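As an illustration of the level order traversal described in this abstract, the following is a minimal sketch. It is not the authors' compiler. It assumes a plain expression tree (operators as tuples, operands as strings) rather than the leveled DAGs used in the papers, and the instruction names `ld`, `add`, `sub`, and `mul` are hypothetical. Each inner list is one level, so its instructions are mutually data independent.

```python
from collections import deque
import operator

# Hypothetical instruction names; the real queue ISA is not given in the abstract.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def level_order_groups(tree):
    """Group instructions by tree level, deepest level first.

    A node is either a string (operand to load) or a tuple
    (op, left, right). Instructions in one group do not depend
    on each other, so they could be issued in parallel.
    """
    levels = []
    frontier = [tree]
    while frontier:
        levels.append(frontier)
        # Children of the current level, kept in left-to-right order.
        frontier = [child
                    for node in frontier if isinstance(node, tuple)
                    for child in node[1:]]
    return [[node[0] if isinstance(node, tuple) else "ld " + node
             for node in level]
            for level in reversed(levels)]


def run_queue_program(groups, env):
    """Execute the grouped program on a FIFO operand queue."""
    queue = deque()
    for group in groups:
        for instr in group:
            if instr.startswith("ld "):
                queue.append(env[instr[3:]])   # enqueue an operand
            else:
                a = queue.popleft()            # operands come from the head
                b = queue.popleft()
                queue.append(OPS[instr](a, b)) # result goes to the tail
    return queue.popleft()


# (a + b) * (c - d)
expr = ("mul", ("add", "a", "b"), ("sub", "c", "d"))
groups = level_order_groups(expr)
result = run_queue_program(groups, {"a": 1, "b": 2, "c": 7, "d": 3})
print(groups)
print(result)
```

Because operands are taken implicitly from the head of the queue and results are written to its tail, no instruction names a register, which is why the grouping falls out of the traversal alone.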
|
[kitamiti-01:2008, kuroken-03:2008, okuyama-01:2008] |
Kenta Matsumoto, Yuichi
Okuyama, Junji Kitamichi, Ken-ichi Kuroda, and Tsuyoshi Hamada.
Implementation of a Resource Sharing Machine for Reconfigurable Computing
System using FPGAs. In Proceedings of the 18th Intelligent System
Symposium (FAN2008), pages 447–450, SICE, Oct. 2008. |
Research and development of reconfigurable computing systems, which can reconfigure
their hardware according to the application, has become active around the world. In
this research, we construct a reconfigurable computing system using FPGAs in which
applications can share hardware resources. To this end, we implement
a mechanism that manages hardware resources and provides users with exclusive access
control over them. As a result, we confirmed
that applications can safely share hardware resources. |
|
[kitamiti-02:2008, kuroken-04:2008, okuyama-02:2008] |
Takahiro Machino, Shin-ya
Iwazaki, Yuichi Okuyama, Junji Kitamichi, Ken-ichi Kuroda, and
Ryuichi Oka. Optimizing Two-Dimensional Continuous Dynamic Programming
for Cell Broadband Engine Processors. In Proceedings of the
2008 Japan-China Joint Workshop on Frontier of Computer Science and
Technology (FCST2008), pages 186–193, IEEE, Dec. 2008. |
Two-Dimensional Continuous Dynamic Programming (2DCDP) is a specialized DP
matching method for image recognition that can be applied to many applications,
such as object tracking and pattern matching. However, its execution
time is long, and current general-purpose processors do not achieve real-time
performance. In this paper, we present our approach to real-time image
recognition using a Cell Broadband Engine processor (Cell processor). We optimize
2DCDP for the Cell processor by vectorizing with SIMD instructions, parallelizing
across multiple SPEs, applying dynamic branch prediction at the assembly level, and so on.
As a result, the Cell processor implementation runs over 15 times faster than
the implementation on an Intel Xeon 5160 processor. |
|
[kuroken-05:2008] |
S.G. Sedukhin, T. Miyazaki, and K. Kuroda. 3-D Toroidal Array
Processor for Multidimensional DSP Transforms. In Proc. of International 3D
System Integration Conference 2008 (3D-SIC2008), pages 401–409. ASET,
May 2008. |
We present the forward and inverse 3-D separable transforms in the form of chained
matrix-matrix multiply-add and transposition operations. Then, a new orbital
systolic-like implementation of the 3-D transform and its inverse on a 3-D toroidal
array processor is proposed. In this implementation, the required transposition is
avoided, but each 1-D transform becomes data-dependent on the other coexisting 1-D
transforms. The main difference of our orbital implementation is in the dimensionality
of I/O: we design the array processor under the assumption that all 3-D data is immediately
available for fine-grain massively parallel computing, and during processing
we do not destroy the integrity (space-time locality) of the processed data. |
|
[kuroken-06:2008, okuyama-03:2008] |
K. Kojima, Y. Okuyama, and K. Kuroda. Arithmetic
Precision of the Generalized Hebbian Algorithm for Hardware Implementation.
In Proc. of 8th International Conference on Computer and
Information Technology (CIT2008), pages 886–890. IEEE, July 2008. |
Principal Component Analysis (PCA) is a data mining methodology for expressing
multivariate data comprehensively. PCA reduces the dimension of a data set, but
its computational complexity easily grows large depending on the input factors. In
this paper, we evaluate the calculation accuracy of PCA for hardware implementation.
As the PCA learning algorithm, the Generalized Hebbian Algorithm (GHA) is
adopted, with Field Programmable Gate Arrays (FPGAs) as the assumed target.
To verify the errors and determine the accuracy required to reduce
the necessary hardware resources, the GHA is implemented in software in C
using graphical images as input. The relationship between three parameters (the
number of principal components, the mantissa bit width, and the errors) was found by
comparing the output principal component images with the originals. This result
will be applied to the circuit implementation in hardware. |
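The kind of experiment this abstract describes can be sketched as follows. This is only an illustration, not the authors' C implementation. It uses the standard GHA update (Sanger's rule), and the `quantize` helper, the learning rate, and the rounding scheme are assumptions made here to stand in for a reduced mantissa bit width.

```python
import math


def quantize(x, mantissa_bits):
    """Round x so its mantissa (in [0.5, 1)) keeps mantissa_bits fractional bits.

    Assumed rounding model for illustration; the paper's exact
    number format is not given in the abstract.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x == m * 2**e, 0.5 <= |m| < 1
    scale = 1 << mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)


def gha_step(W, x, lr, mantissa_bits):
    """Return the weights after one GHA update for input vector x.

    W holds one row per principal component. Sanger's rule:
    w_ij += lr * y_i * (x_j - sum_{k<=i} y_k * w_kj)
    Every updated weight is rounded to the reduced mantissa width.
    """
    y = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    new_W = []
    for i, row in enumerate(W):
        new_row = []
        for j, xj in enumerate(x):
            recon = sum(y[k] * W[k][j] for k in range(i + 1))
            new_row.append(quantize(row[j] + lr * y[i] * (xj - recon),
                                    mantissa_bits))
        new_W.append(new_row)
    return new_W


def train(W, samples, lr, mantissa_bits, epochs):
    """Apply gha_step over all samples for a number of epochs."""
    for _ in range(epochs):
        for x in samples:
            W = gha_step(W, x, lr, mantissa_bits)
    return W


# Compare a wide and a narrow mantissa on the same toy data.
wide = train([[1.0, 0.0]], [[1.0, 1.0]], lr=0.1, mantissa_bits=52, epochs=200)
narrow = train([[1.0, 0.0]], [[1.0, 1.0]], lr=0.1, mantissa_bits=4, epochs=200)
print(wide, narrow)
```

Sweeping `mantissa_bits` and the number of rows in `W` while measuring the difference from a full-precision run gives the three-way relationship the abstract mentions, here only on toy data.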
|
[kuroken-07:2008, okuyama-04:2008] |
D. Ohwada, Y. Okuyama, and K. Kuroda. Implementation
of a Combined Autocorrelation Method for Real-time Tissue Elasticity
Imaging on FPGA. In Proc. of 8th International Conference on Computer
and Information Technology (CIT2008), pages 891–897. IEEE, July
2008. |
Tissue elasticity imaging by ultrasound echo is expected to be a powerful tool for
tumor detection. However, elasticity imaging methods are not suitable for real-time
processing because of the enormous computation time required. In this study, a strain
distribution estimation algorithm with spatial differentiation of the displacement
distribution using the Combined Autocorrelation method (CA) is implemented. The
CA has the advantage of producing high-quality strain images and is applicable to
large displacements. The circuit is implemented in the Structured Function
description Language (SFL) after evaluation of a fixed-point design in SystemC. The
designed parallel-pipelined CA architecture performs real-time processing at 30 fps
on a Dini Group DNDVI DC board powered by Virtex-4 FX100 FPGA clocked at
100 MHz. |
[kitamiti-03:2008, kuroken-08, okuyama-05] |
Yuichi Okuyama, Yuka Sato, Akihiro
Shindo, Takahito Obara, Junji Kitamiti, Toshiaki Miyazaki, Kenichi Kuroda,
and Stanislav G. Sedukhin. Architectural Consideration of Processor Elements
for Matrix-Multiplication Array Processor. In Proceedings of 32nd
Parthenon WS, pages 23–28. NGO Parthenon Society, June 2008. |
Matrix calculation appears frequently in various useful algorithms, and its acceleration
is in strong demand because it would greatly benefit not only
scientific but also industrial applications. This paper discusses a processing element
(PE) suitable for matrix computation. High clock rate operation requires
more pipeline stages, but (1) this can increase latency and (2) there is
an upper limit on the number of pipeline stages. We carefully reconsider the systolic array
structure and find that grouping PEs is advantageous for high-performance matrix
computation while suppressing the latency increase. We review some matrix computation
architectures proposed to date and compare our proposal with them. |
|
[kitamiti-04:2008, kuroken-09, okuyama-06] |
Anna Satou, Yuichi Okuyama, Tsuyoshi
Hamada, Junji Kitamichi, and Ken-ichi Kuroda. Acceleration of two-dimensional
liquid simulation. In The Technical Report of IEICE, Vol.108,
No.220, RECONF2008 23-37, pages 1–6. IEICE, Sep. 2008. |
Simulations of the behavior of water as a liquid are quite important in fields such as video
imagery and disaster analysis and prediction. These simulations require much
calculation time because evaluating the governing equations is complicated. In this paper,
we propose a method to accelerate two-dimensional water simulation with FPGAs. We
use the PROGRAPE (Programmable GRAPE, proposed by RIKEN) as the calculation
platform. We adopt the SPH (Smoothed Particle Hydrodynamics) method to simulate
fluid behavior because this method can simulate dynamic fluid flows with large
deformations. In the software simulation, evaluation of the motion equation is
complex and takes much more time. Therefore, we design a circuit for the motion
equations. Moreover, we evaluate the minimum calculation accuracy required for the circuit.
Through these evaluations, we confirmed that the liquid simulation on the PROGRAPE
system is faster than that in software. |
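To show why SPH is dominated by pairwise particle interactions, the following minimal sketch computes two-dimensional SPH densities in software. It is not the circuit or the code from the paper. The poly6 smoothing kernel, the uniform particle mass, and the function names are assumptions made for illustration, since the abstract does not state which kernel was used.

```python
import math


def poly6_2d(r, h):
    """2-D poly6 smoothing kernel (assumed choice), zero beyond radius h.

    W(r, h) = 4 / (pi * h**8) * (h**2 - r**2)**3 for r < h,
    which integrates to 1 over the plane.
    """
    if r >= h:
        return 0.0
    return 4.0 / (math.pi * h ** 8) * (h * h - r * r) ** 3


def densities(positions, mass, h):
    """Density at each particle: rho_i = sum_j mass * W(|r_i - r_j|, h).

    The double loop over all particle pairs is the O(N^2) interaction
    pattern that particle-oriented hardware is built to accelerate.
    """
    rho = []
    for xi, yi in positions:
        total = 0.0
        for xj, yj in positions:
            total += mass * poly6_2d(math.hypot(xi - xj, yi - yj), h)
        rho.append(total)
    return rho


particles = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0)]
print(densities(particles, mass=1.0, h=1.0))
```

The motion equation adds pressure and viscosity terms with the same pairwise structure, so the inner loop body is what a hardware pipeline would replace.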
|
[kuroken-10:2008] |
K. Kojima, Y. Okuyama, and K. Kuroda. Host Process Acceleration
of SPH Calculation in FPGA Computing. In Proceedings of IPSJ Tohoku
Branch Workshop, pages 1–3. IPSJ, Feb. 2009. |
Advances in hardware technology have made parallel processing with many
FPGAs inexpensive, which has contributed to the acceleration of complex scientific
calculations. As parallel processing with FPGAs becomes large-scale, acceleration
of the host process in the FPGA system is required. This paper proposes
a method to accelerate the host process when Smoothed Particle Hydrodynamics
(SPH) is executed on the FPGA system. SPH is a method to calculate fluid motion
by computing particle interactions. Considering the possibility that FPGA systems will
grow large, we adopted a method that emphasizes scalability. |
|
[kuroken-10:2008, okuyama-07] |
F. Ohori, Y. Okuyama, K. Kuroda, and T. Hamada.
Image Filter Processing Based on a Particle Simulation. In Proceedings of
32nd Parthenon WS, pages 15–21. NGO Parthenon Society, June 2008. |
Special-purpose processors are available as an accelerated calculation framework
that is less expensive than supercomputers and can be used for scientific calculations
and development of leading-edge technologies. Because this advanced hardware is
inexpensive, these processors are helping to address various problems. This
paper focuses on this advantage and presents image filter processing on one such
special-purpose processor. The image filter calculation is carried out on
PROGRAPE-4, a special-purpose processor specialized in particle calculations.
We demonstrate its effective utilization by treating image filter processing
as inter-particle calculations. |
[benab-07:2008] |
Abderazek Ben Abdallah. MULTICORE SYSTEMS-ON-CHIP: Practical
Hardware/Software Design Issues. Number 978-90-78677-22-2,90-78677-
22-8. World Scientific Publishers, 2009. |
Conventional on-chip communication designs mostly use ad hoc approaches that fail to
meet the challenges posed by the next-generation MultiCore Systems-on-Chip (MCSoC)
designs. These major challenges include wiring delay, predictability, diverse interconnection
architectures, and power dissipation. A Network-on-Chip (NoC) paradigm
is emerging as the solution for the problems of interconnecting dozens of cores into a
single system-on-chip. However, there are many problems associated with the design
of such systems. These problems arise from non-scalable global wire delays, failure to
achieve global synchronization, and difficulties associated with non-scalable bus-based
functional interconnects. The book consists of three parts, with each part being subdivided
into four chapters. The first part deals with design and methodology issues.
The architectures used in conventional methods of MCSoCs design and custom multiprocessor
architectures are not flexible enough to meet the requirements of different
application domains and not scalable enough to meet different computation needs and
different complexities of various applications. Several chapters of the first part emphasize
the design techniques and methodologies. The second part covers the most
critical part of MCSoC design: the interconnections. One approach to addressing
the design methodologies is to adopt the so-called reusability feature to boost design
productivity. In the past years, the primitive design units evolved from transistors to
gates, finite state machines, and processor cores. The network-on-chip paradigm offers
this attractive property for the future and will be able to close the productivity gap.
The last part of this book delves into MCSoCs validations and optimizations. A more
qualitative approach of system validation is based on the use of formal techniques for
hardware design. The main advantage of formal methods is the possibility to prove
the validity of essential design requirements. As formal languages have a mathematical
foundation, it is possible to formally extract and verify these desired properties of the
complete abstract state space. Online testing techniques for identifying faults that can
lead to system failure are also surveyed. Emphasis is given to analytical redundancy-based
techniques that have been developed for fault detection and isolation in the
automatic control area.
Contents:
* System Design and Methodology
* Interconnection Architecture for MCSoCs
* System Validation and Simulation
Readership: This book, which addresses the practical issues in MCSoC design and evaluation, is of interest
to both professionals and researchers in the field of chip design. The book can also be
used as a textbook by lecturers and students. |
[benab-08:2008] |
Arquimedes Canedo, Abderazek Ben Abdallah, and Kenichi Kuroda.
Chapter XXXV: Processor for Mobile Applications, ISBN: 978-1-60566-046-
2. IGI Publishing, 2008. |
Handbook of Research on Mobile Multimedia, Second Edition, Section V: Applications
And Services |
[benab-09:2008] |
Abderazek Ben Abdallah. BANSMOM: Smart Body Area Network System
for Elderly Monitoring, 2008. |
[kitamiti-05:2008] |
Junji Kitamichi. The Telecommunications Advancement Foundation,
2009-2010. |
[benab-10:2008] |
Abderazek Ben Abdallah, 2008. Member, IEEE |
[kitamiti-06:2008] |
J. Kitamichi, 2007. Member, IEEE |
[kitamiti-07:2008] |
J. Kitamichi, 2007. Member, IPSJ |
[kitamiti-08:2008] |
J. Kitamichi, 2007. Member, IEICE |
[kuroken-12:2008] |
Kenichi Kuroda, 2008. IEICE Regular member |
[kuroken-13:2008] |
Kenichi Kuroda, 2008. JSAP Regular member |
[kuroken-14:2008] |
Kenichi Kuroda, 2008. IPSJ Regular member |
[kuroken-15:2008] |
Kenichi Kuroda, 2008. Member of Management Board of PARTHENON Society (NPO) |
[kuroken-17:2007] |
Kenichi Kuroda, 2008. Member of IEICE Student Activity Support Committee |
[benab-11:2008] |
Hiroki Hoshino. Graduation Thesis: Advanced Hardware Optimization
Algorithms for High Performance Queue Processor Architecture, University of
Aizu, 2008. Thesis Advisor: Abderazek Ben Abdallah |
[benab-12:2008] |
Taichi Maekawa. Graduation Thesis: Research on Hardware Design of
Dual-Mode Processor Architecture, University of Aizu, 2008. Thesis Advisor: Abderazek Ben Abdallah |
[benab-13:2008] |
Masashi Masuda. Graduation Thesis: Graph Transformation Methods
and Theoretical Performance Evaluation of Queue Computation Model,
University of Aizu, 2008. Thesis Advisor: Abderazek Ben Abdallah |
[kuroken-17:2008] |
Takahiro Machino. Master Thesis: Optimization of 3-D Shape Reconstruction
Using 2DCDP for Cell Processors, University of Aizu, 2008. Thesis Advisor: Kenichi Kuroda |
[kuroken-18:2008] |
Akihiro Shindo. Master Thesis: Memory Hierarchy Design of Array
Processor focusing on Locality of Operations, University of Aizu, 2008. Thesis Advisor: Kenichi Kuroda |
[kuroken-19:2008] |
Kazunori Kojima. Master Thesis: Host Process Acceleration of SPH
Calculation in FPGA Computing, University of Aizu, 2008. Thesis Advisor: Kenichi Kuroda |
[kuroken-20:2008] |
Takahito Obara. Master Thesis: FPGA Implementation of the
Particle-based Algorithm for Contour Tracking, University of Aizu, 2008. Thesis Advisor: Kenichi Kuroda |
[kuroken-21:2008] |
Kaai Kojima. Master Thesis: Evaluation of ECG Attractor Analysis
using Principal Component Analysis, University of Aizu, 2008. Thesis Advisor: Kenichi Kuroda |
[kuroken-22:2008] |
Kazunori Fujita. Graduation Thesis: Development of Embedded
System Education Environment for Programming on Embedded OS, University
of Aizu, 2008. Thesis Advisor: Kenichi Kuroda |
[kuroken-23:2008] |
Tomohiro Nishikawa. Master Thesis: LDPC Code Introduction to
Sensor Networks for Reliability Improvement, University of Aizu, 2008. Thesis Advisor: Kenichi Kuroda |
[kuroken-24:2008] |
Kenta Matsumoto. Master Thesis: Dynamic Load Balancing for Visualization
and Computation of Liquid Motion in FPGA Computing, University
of Aizu, 2008. Thesis Advisor: Kenichi Kuroda |
[kuroken-25:2008] |
Anna Sato. Master Thesis: HW/SW Co-design of an SPH Simulator
for Accelerated Liquid Dynamics Computation, University of Aizu, 2008. Thesis Advisor: Kenichi Kuroda |