Energy-Efficient Computing
~Low-power Architecture and Systems/SoCs~
Members: Abderazek Ben Abdallah (PI), Dang Nam Khanh (Co-PI)
Our research on power- and energy-efficient computing systems is critical for addressing the growing demand for more powerful and sustainable technology. As our reliance on computing devices increases, so does the need to manage their energy consumption. Efficient computing systems can reduce energy costs, minimize environmental impact, and extend the battery life of portable devices. In large-scale data centers, energy efficiency also translates into significant cost savings and smaller carbon footprints. Our research in this area drives innovation in hardware and software design, leading to smarter, greener technologies that benefit both users and the planet.
Scaling Deep-Learning Pneumonia Detection Inference on a Reconfigurable Platform
Jiangkun Wang, Khanh N. Dang, and Abderazek Ben Abdallah, “Scaling Deep-Learning Pneumonia Detection Inference on a Reconfigurable Self-Contained Hardware Platform,” 2023 IEEE 6th International Conference on Electronics Technology (ICET), May 12-15, 2023.
Artificial Intelligence (AI) has been used to alleviate specific problems in academia and industry. In healthcare, for instance, edge-based computing platforms are heavily used because of latency and security concerns, and the increasing demands of AI applications such as deep learning call for a dedicated platform that meets latency, security, and power-consumption requirements. This work presents methods and architectures for scaling deep-learning inference for pneumonia detection in chest X-ray images on a reconfigurable self-contained hardware platform named AIRBiS 1. The performance evaluation results show that the proposed approach achieves 95.2% pneumonia detection accuracy on the collected test data in the computer-aided diagnosis scenario. The secure collaborative-learning approach achieves accuracy comparable to the conventional training scenario. In addition, for rapid batch detection, detection could be accelerated by 0.023 s. Moreover, the accelerated inference is, on average, 13 times more energy-efficient than conventional approaches.

- Wang, Jiangkun, Ogbodo Mark Ikechukwu, Khanh N. Dang, and Abderazek Ben Abdallah. 2022. "Spike-Event X-ray Image Classification for 3D-NoC-Based Neuromorphic Pneumonia Detection." Electronics 11, no. 24: 4157. https://doi.org/10.3390/electronics11244157
The success of deep learning in extending the frontiers of artificial intelligence has accelerated the adoption of AI-enabled systems to address challenges in many fields. In healthcare, deep learning is deployed on edge computing platforms to address security and latency challenges, even though these platforms are often resource-constrained. Deep learning systems are based on conventional artificial neural networks (ANNs), which are computationally complex, power-hungry, and energy-inefficient, making them poorly suited to edge computing platforms. Because these systems are also used in critical applications such as biomedicine, their reliability must be considered at design time. For biomedical applications, the spatio-temporal information processing of spiking neural networks could be combined with fault-tolerant three-dimensional network-on-chip (3D-NoC) hardware to achieve good multi-objective performance: high accuracy together with low latency and low power consumption. In this work, we propose a reconfigurable 3D-NoC-based neuromorphic system for biomedical applications built on a fault-tolerant spike routing scheme. The performance evaluation results on X-ray images for pneumonia (i.e., COVID-19) detection show that the proposed system achieves 88.43% detection accuracy on the collected test data and could be accelerated to achieve 4.6% better inference latency than the ANN-based system while consuming 32% less power. Furthermore, the proposed system maintains high accuracy with up to 30% inter-neuron communication faults, at the cost of increased latency.
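To illustrate why tolerating link faults can cost latency, the following Python sketch finds a route through a 3D mesh with breadth-first search while avoiding faulty links. It is not the paper's spike routing scheme: the mesh dimensions, the link-fault representation, and the helper names `mesh_neighbors` and `hops_avoiding_faults` are illustrative assumptions. A real NoC router also decides hop by hop from local information rather than searching the whole network.

```python
from collections import deque
from typing import FrozenSet, Iterator, Optional, Set, Tuple

Coord = Tuple[int, int, int]  # (x, y, z) router position in the 3D mesh (assumed layout)


def mesh_neighbors(node: Coord, dims: Coord) -> Iterator[Coord]:
    """Hypothetical helper: yield the up-to-six routers directly linked to `node`."""
    x, y, z = node
    for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)):
        nx, ny, nz = x + dx, y + dy, z + dz
        if 0 <= nx < dims[0] and 0 <= ny < dims[1] and 0 <= nz < dims[2]:
            yield (nx, ny, nz)


def hops_avoiding_faults(src: Coord, dst: Coord, dims: Coord,
                         faulty_links: Set[FrozenSet[Coord]]) -> Optional[int]:
    """Hypothetical helper: shortest hop count from src to dst that uses no faulty link.

    A link is identified by the unordered pair of routers it connects.
    Returns None if every route to dst is cut off.
    """
    dist = {src: 0}
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            return dist[node]
        for nxt in mesh_neighbors(node, dims):
            if nxt not in dist and frozenset((node, nxt)) not in faulty_links:
                dist[nxt] = dist[node] + 1
                frontier.append(nxt)
    return None


dims = (4, 4, 4)
print(hops_avoiding_faults((0, 0, 0), (3, 0, 0), dims, set()))   # 3
broken = {frozenset({(1, 0, 0), (2, 0, 0)})}
print(hops_avoiding_faults((0, 0, 0), (3, 0, 0), dims, broken))  # 5
```

A single broken link on the straight path forces a two-hop detour: the packet still arrives, but later, which mirrors the accuracy-preserving, latency-increasing behavior the abstract reports under faults.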

- Patent: Abderazek Ben Abdallah, Huankun Huang, Nam Khanh Dang, Jiangning Song, "AI Processor" (AIプロセッサ), Japanese Patent Application No. 2020-194733 (November 24, 2020).
Embedded Multicore SoC Architecture and Design for Real-time ECG Processing
Recent technological advances in wireless networking, microelectronics, and the Internet allow us to fundamentally change the way elderly health care services are practiced. Traditionally, embedded personal medical monitoring systems have been used only to collect data; processing and analysis are performed off-line, which makes such devices impractical for continual monitoring and early detection of medical disorders. The goal of this project is to research efficient, novel in-body smart embedded systems that effectively monitor the health status of elderly people remotely. In particular, we investigate an extreme area in the design space of networked embedded objects: the domain of low-energy, real-time operation. Issues related to the design, implementation, and deployment of such systems are also studied.
Low-Power Queue Processor Architecture and Design (1999-2008)
This project focuses on research into a novel low-power, high-performance parallel processor based on the Queue computation model, in which Queue programs are generated by traversing a given data flow graph in level order. The Queue processor uses a circular queue register to manipulate operands and results, and it exploits parallelism dynamically with little effort compared with conventional architectures. The absence of false dependencies allows programs to expose maximum parallelism, which the queue processor can execute without complex and power-hungry hardware such as register renaming and large instruction windows. Parallel processing allows queue processors to speed up the execution of applications. We are researching and developing a complete tool-chain for this promising computing model, consisting of a compiler, an assembler, functional and cycle-accurate simulators, and a hardware design.
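The level-order code generation and FIFO execution described above can be sketched in Python as follows. This is a minimal model under stated assumptions: it handles expression trees with binary operators only (no shared subexpressions), instructions are emitted from the deepest level up to the root, and the `Node` class, the instruction tuples, and the function names are hypothetical rather than part of the actual QueueCore tool-chain.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    label: str                                    # operator symbol, or variable name for a leaf
    children: List["Node"] = field(default_factory=list)


Instr = Tuple[str, str]                           # hypothetical format: ("load", var) or ("op", symbol)


def queue_program(root: Node) -> List[Instr]:
    """Level-order traversal of the tree, emitted from the deepest level up to the root."""
    levels: List[List[Node]] = []
    frontier = [root]
    while frontier:
        levels.append(frontier)
        frontier = [child for node in frontier for child in node.children]
    program: List[Instr] = []
    for level in reversed(levels):
        for node in level:                        # left to right within a level
            program.append(("op", node.label) if node.children else ("load", node.label))
    return program


BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def run_queue_program(program: List[Instr], env: Dict[str, int]) -> int:
    """Execute on a FIFO operand queue: operands leave at the head, results join at the tail."""
    queue: deque = deque()
    for kind, arg in program:
        if kind == "load":
            queue.append(env[arg])
        else:
            left = queue.popleft()                # operands are implicit: no register names
            right = queue.popleft()
            queue.append(BINARY_OPS[arg](left, right))
    if len(queue) != 1:
        raise ValueError("malformed program: expected exactly one result")
    return queue[0]


# (a + b) * (c - d)
expr = Node("*", [Node("+", [Node("a"), Node("b")]), Node("-", [Node("c"), Node("d")])])
prog = queue_program(expr)
print(prog)
print(run_queue_program(prog, {"a": 1, "b": 2, "c": 7, "d": 3}))  # 12
```

The generated instructions name no registers; each operand is implied by its position in the queue. The `+` and `-` instructions sit at the same tree level and do not depend on each other, so they could issue in parallel.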
A. Ben Abdallah, A. Canedo, T. Yoshinaga, and M. Sowa, "The QC-2 Parallel Queue Processor Architecture," Journal of Parallel and Distributed Computing, Vol. 68, No. 2, pp. 235-245, 2008.
A processor with a queue-based instruction set architecture offers an attractive option in the design of embedded systems. In our previous work, we proposed a novel queue processor architecture as a starting point for hardware/software design space exploration for embedded applications. In this paper, we present a high-performance 32-bit synthesizable QueueCore (QC-2), an improved and optimized version of the produced order parallel Queue processor (PQP), with single-precision floating-point support. The QC-2 core also implements a novel technique for extending immediate values and memory instruction offsets that would otherwise not be representable because of bit-width constraints in the PQP processor. A prototype implementation is produced by synthesizing the high-level model for a target FPGA device. We present the architecture description and design results in considerable detail.
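The abstract does not detail QC-2's encoding, so the sketch below only illustrates the general idea of widening a short immediate field: one or more separate prefix words supply the upper bits, and the instruction's own field supplies the low bits. The field widths, the `prefix`/`imm` word format, and the names `encode_immediate` and `decode_immediate` are hypothetical, and only unsigned values are handled.

```python
from typing import List, Tuple

Word = Tuple[str, int]  # hypothetical instruction words: ("prefix", bits) or ("imm", bits)


def encode_immediate(value: int, field_bits: int = 8, prefix_bits: int = 8) -> List[Word]:
    """Split an unsigned immediate into hypothetical prefix words plus the instruction's own field."""
    if value < 0:
        raise ValueError("this sketch only handles unsigned immediates")
    low = value & ((1 << field_bits) - 1)    # part that fits in the instruction's immediate field
    high = value >> field_bits               # bits that would not fit
    chunks: List[int] = []
    while high:
        chunks.append(high & ((1 << prefix_bits) - 1))
        high >>= prefix_bits
    # Most significant chunk first, so a decoder can shift-accumulate in program order.
    return [("prefix", c) for c in reversed(chunks)] + [("imm", low)]


def decode_immediate(words: List[Word], field_bits: int = 8, prefix_bits: int = 8) -> int:
    """Rebuild the full value: accumulate prefix bits, then append the instruction's low field."""
    acc = 0
    for kind, bits in words:
        if kind == "prefix":
            acc = (acc << prefix_bits) | bits
        else:
            return (acc << field_bits) | bits
    raise ValueError("prefix words must be followed by the instruction that uses them")


words = encode_immediate(0x12345)
print(words)                          # [('prefix', 1), ('prefix', 35), ('imm', 69)]
print(hex(decode_immediate(words)))   # 0x12345
```

Small values need no prefix at all, so the common case keeps the short instruction format and only large constants or offsets pay for extra words.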
A. Ben Abdallah, M. Masuda, A. Canedo, and K. Kuroda, "Natural Instruction Level Parallelism-aware Compiler for High-Performance Processor Architecture," The Journal of Supercomputing, Vol. 57, No. 3, pp. 314-338, Sept. 2011.
This work presents a static method, implemented in a compiler, for extracting high instruction-level parallelism for the 32-bit QueueCore, a processor based on queue computation. The instructions of a queue processor implicitly read and write their operands, which keeps instructions short and makes programs free of false dependencies. This characteristic allows the exploitation of maximum parallelism and improves code density. Compiling for the QueueCore requires a new approach because the concept of registers disappears. We propose a new, efficient code generation algorithm for the QueueCore. For a set of numerical benchmark programs, our compiler extracts 1.38 times more parallelism than an optimizing compiler for a RISC machine. Using QueueCore's reduced instruction set, we are able to generate code that is 20% and 26% denser than that of two embedded RISC processors.
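Because queue instructions carry no register names, only true data dependencies constrain scheduling. The sketch below groups instructions by their earliest (ASAP) level in a dependency graph and reports instructions per level as a rough parallelism measure. It is a generic leveling pass assumed for illustration, not the paper's code generation algorithm; the dependency-dictionary format and the names `asap_levels`, `issue_groups`, and `average_ilp` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def asap_levels(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Earliest level of each instruction: 0 if it has no producers, else one more than
    its latest producer. `deps` maps an instruction to the instructions it consumes
    and must describe an acyclic graph."""
    level: Dict[str, int] = {}

    def visit(instr: str) -> int:
        if instr not in level:
            level[instr] = 1 + max((visit(p) for p in deps[instr]), default=-1)
        return level[instr]

    for instr in deps:
        visit(instr)
    return level


def issue_groups(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Instructions that share a level have no data dependency on each other."""
    level = asap_levels(deps)
    groups: Dict[int, List[str]] = defaultdict(list)
    for instr in deps:                       # keep program order within a group
        groups[level[instr]].append(instr)
    return [groups[lv] for lv in sorted(groups)]


def average_ilp(deps: Dict[str, List[str]]) -> float:
    """Hypothetical metric: instructions divided by the number of dependency levels."""
    return len(deps) / len(issue_groups(deps))


# (a + b) * (c - d)
deps = {
    "load a": [], "load b": [], "load c": [], "load d": [],
    "add": ["load a", "load b"],
    "sub": ["load c", "load d"],
    "mul": ["add", "sub"],
}
print(issue_groups(deps))          # [['load a', 'load b', 'load c', 'load d'], ['add', 'sub'], ['mul']]
print(round(average_ilp(deps), 2))  # 2.33
```

In a register-based program, reusing a register name could add anti- or output-dependencies between these instructions and shrink the groups; a queue program has no such names, so the data-dependency levels are the only limit.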

A. Canedo, A. Ben Abdallah, and M. Sowa, "Efficient Compilation for Queue Size Constrained Queue Processors," Parallel Computing, Vol. 35, pp. 213-225, 2009.
Queue computers use a FIFO data structure for data processing. The essential characteristics of a queue-based architecture, namely a compact instruction set, simple hardware logic, high parallelism, and low power consumption, are well suited to the demands of embedded systems. The size of the queue is an important concern in the design of a realizable embedded queue processor. We describe the relationship between parallelism, the length of data dependency edges in data flow graphs, and queue utilization requirements. This paper presents a technique that makes the compiler aware of the size of the queue register file so that it can optimize programs to use the available hardware effectively. The compiler examines the data flow graph of each program and partitions it into clusters whenever it exceeds the queue limits of the target architecture. The presented algorithm deals with the two factors that affect queue utilization: parallelism and the length of variables' reaching definitions. We analyze how the quality of the generated code is affected for SPEC CINT95 benchmark programs under different queue size configurations. Our results show that, for reasonable queue sizes, the compiler generates code that is comparable to code generated for infinite resources in terms of instruction count, static execution time, and instruction-level parallelism.
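A simplified version of queue-aware clustering can be sketched in Python. Assumptions: the program is an expression tree compiled in level order (deepest level first), queue demand is measured as the peak queue length during execution, and an oversized tree is split greedily by moving the operand subtree with the largest demand into its own cluster, whose result is read back through a temporary. Unlike the paper's algorithm, this sketch ignores reaching-definition lengths, and the `Node` class and all function names are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    label: str                                   # operator, variable, or temporary name
    children: List["Node"] = field(default_factory=list)


def levels_of(root: Node) -> List[List[Node]]:
    """Hypothetical helper: nodes grouped by depth, left to right."""
    levels, frontier = [], [root]
    while frontier:
        levels.append(frontier)
        frontier = [c for n in frontier for c in n.children]
    return levels


def peak_occupancy(root: Node) -> int:
    """Largest queue length reached by the level-order (deepest level first) queue program."""
    occupancy = peak = 0
    for level in reversed(levels_of(root)):
        for node in level:
            occupancy += 1 - len(node.children)  # consume the operands, produce one result
            peak = max(peak, occupancy)
    return peak


def partition(root: Node, queue_size: int, name: str = "result",
              counter: Optional[Iterator[int]] = None) -> List[Tuple[str, Node]]:
    """Greedy sketch: split off subtrees until every cluster fits in `queue_size` entries.

    Returns (name, tree) pairs in evaluation order; a later tree reads an earlier
    cluster's result through a leaf labelled with that cluster's name.
    """
    counter = counter if counter is not None else itertools.count()
    clusters: List[Tuple[str, Node]] = []
    while peak_occupancy(root) > queue_size:
        op_children = [i for i, c in enumerate(root.children) if c.children]
        if not op_children:
            raise ValueError("queue too small for a single operation")
        i = max(op_children, key=lambda j: peak_occupancy(root.children[j]))
        temp = f"t{next(counter)}"
        clusters += partition(root.children[i], queue_size, temp, counter)
        root = Node(root.label, root.children[:i] + [Node(temp)] + root.children[i + 1:])
    return clusters + [(name, root)]


# (a + b) * (c - d) needs 4 queue entries; split it for a 3-entry queue.
expr = Node("*", [Node("+", [Node("a"), Node("b")]), Node("-", [Node("c"), Node("d")])])
print(peak_occupancy(expr))  # 4
for cluster_name, tree in partition(expr, queue_size=3):
    print(cluster_name, peak_occupancy(tree))
# t0 2
# result 3
```

Each split trades queue space for an extra store and reload of a temporary, which is why a real compiler would prefer to split only where the queue limit is actually exceeded.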
- Abderazek Ben Abdallah, QueueCore Instruction Set Architecture, Technical Report, Parallel/Distributed Systems Laboratory, Graduate School of Information Systems, University of Electro-Communications, Tokyo, January 2003.
- Abderazek Ben Abdallah, QC-1 Processing Stages Algorithms, Technical Report, Parallel/Distributed Systems Laboratory, Graduate School of Information Systems, University of Electro-Communications, Tokyo, 2003.