Energy-Efficient Computing
~Low-power Architecture and Systems/SoCs~

Members: Abderazek Ben Abdallah (PI), Dang Nam Khanh (Co-PI)

Our research in power/energy-efficient computing systems is critical for addressing the growing demand for more powerful and sustainable technology. As our reliance on computing devices increases, so does the need to manage their energy consumption. Efficient computing systems can reduce energy costs, minimize environmental impact, and extend the battery life of portable devices. Moreover, in large-scale data centers, energy efficiency translates to significant savings and reduced carbon footprints. Our research in this area drives innovation in hardware and software design, leading to smarter, greener technologies that benefit both users and the planet.

Scaling Deep-Learning Pneumonia Detection Inference on a Reconfigurable Platform

Jiangkun Wang, Khanh N. Dang and Abderazek Ben Abdallah, “Scaling Deep-Learning Pneumonia Detection Inference on a Reconfigurable Self-Contained Hardware Platform”, 2023 IEEE 6th International Conference on Electronics Technology (ICET), May 12-15, 2023.

Artificial Intelligence (AI) has been used to alleviate specific problems in academia and industry. In healthcare, for instance, edge-based computing platforms are heavily used to address latency and security issues, and the growing demands of AI applications such as deep learning call for a dedicated platform that meets latency, security, and power-consumption requirements. This work presents methods and architectures for scaling deep-learning inference for pneumonia detection in chest X-ray images on a reconfigurable, self-contained hardware platform named AIRBiS. The performance evaluation results show that the proposed approach achieves 95.2% pneumonia detection accuracy on the collected test data in the computer-aided diagnosis scenario. The secure collaborative-learning approach achieves accuracy comparable to the conventional training scenario. For rapid batch detection, detection can be accelerated by 0.023 s. Moreover, the system's accelerated inference is on average 13 times more energy-efficient than conventional approaches.

Jiangkun Wang, Ogbodo Mark Ikechukwu, Khanh N. Dang, and Abderazek Ben Abdallah, “Spike-Event X-ray Image Classification for 3D-NoC-Based Neuromorphic Pneumonia Detection”, Electronics, Vol. 11, No. 24, 4157, 2022. https://doi.org/10.3390/electronics11244157
The success of deep learning in extending the frontiers of artificial intelligence has accelerated the application of AI-enabled systems to challenges in many fields. In healthcare, deep learning is deployed on edge computing platforms to address security and latency challenges, even though these platforms are often resource-constrained. Deep learning systems are based on conventional artificial neural networks (ANNs), which are computationally complex, power-hungry, and energy-inefficient, making them poorly suited to edge computing platforms. Because these systems are also used in critical applications such as biomedicine, their reliability must be considered at design time. For biomedical applications, the spatio-temporal information processing of spiking neural networks can be combined with fault-tolerant 3-dimensional network-on-chip (3D-NoC) hardware to achieve good multi-objective performance: high accuracy with low latency and low power consumption. In this work, we propose a reconfigurable 3D-NoC-based neuromorphic system for biomedical applications built on a fault-tolerant spike routing scheme. Performance evaluation on X-ray images for pneumonia detection (e.g., COVID-19 pneumonia) shows that the proposed system achieves 88.43% detection accuracy on the collected test data and can be accelerated to achieve 4.6% lower inference latency than the ANN-based system while consuming 32% less power. Furthermore, the proposed system maintains high accuracy with up to 30% inter-neuron communication faults, at the cost of increased latency.
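The abstract does not say how X-ray images are turned into spike events. A common choice for spiking-network inputs is rate coding, where brighter pixels fire more often; the sketch below is a deterministic variant that emits (timestep, pixel_index) events in the style of address-event representation. It is an illustrative assumption, not the paper's encoder, and the names `rate_encode`, `steps`, and `max_val` are hypothetical.

```python
from typing import List, Sequence, Tuple

def rate_encode(pixels: Sequence[int], steps: int, max_val: int = 255) -> List[Tuple[int, int]]:
    """Deterministic rate coding into (timestep, pixel_index) spike events.
    A pixel with intensity p fires floor(steps * p / max_val) times, spread
    evenly over the window, so brighter pixels spike more often."""
    events: List[Tuple[int, int]] = []
    for t in range(steps):
        for i, p in enumerate(pixels):
            if not 0 <= p <= max_val:
                raise ValueError("pixel intensity out of range")
            # Fire when the running spike count floor(t * p / max_val) increments.
            if (t + 1) * p // max_val > t * p // max_val:
                events.append((t, i))
    return events

events = rate_encode([0, 128, 255], steps=10)
counts = [sum(1 for _, i in events if i == k) for k in range(3)]
print(counts)  # [0, 5, 10]
```

Events come out ordered by time, then by pixel index, which is the order a spike router would inject them; a stochastic (Poisson) encoder is an equally common alternative.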

[Patent No. 7699791] (June 20, 2025) Abderazek Ben Abdallah, Hoang Huang Kun, Dang Nam Khanh, Song Janning, "AI Processor," Japanese Patent Application No. 2020-194733 (filed November 24, 2020)

Embedded Multicore SoC Architecture and Design for Real-time ECG Processing

Recent technological advances in wireless networking, microelectronics, and the Internet allow us to fundamentally change how elderly health care services are delivered. Traditionally, embedded personal medical monitoring systems have been used only to collect data, while processing and analysis are performed off-line, which makes such devices impractical for continuous monitoring and early detection of medical disorders. The goal of this project is to research efficient, novel in-body smart embedded systems that monitor the health status of elderly people remotely. In particular, we investigate an extreme area in the design space of networked embedded objects: the low-energy, real-time domain. We also study issues related to the design, implementation, and deployment of such systems.

Low-Power Queue Processor Architecture and Design (1999-2008)

This project focuses on a novel low-power, high-performance parallel processor based on the Queue computation model, in which Queue programs are generated by traversing a given data flow graph in level order. The Queue processor uses a circular queue-register to manipulate operands and results, and exploits parallelism dynamically with little effort compared with conventional architectures. Because there are no false dependencies, programs expose maximum parallelism, which the queue processor can execute without complex, power-hungry hardware such as register renaming and large instruction windows. Parallel processing allows queue processors to speed up the execution of applications. We are researching and developing a complete tool-chain for this promising computing model, consisting of a compiler, an assembler, functional and cycle-accurate simulators, and a hardware design.
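The level-order idea above can be sketched for a simple expression tree: collect the tree level by level, then emit instructions from the deepest level up to the root, left to right within a level. Executing the result on a FIFO (operands dequeued at the head, results enqueued at the tail) computes the expression with no register names. This is a minimal sketch assuming binary operators and trees rather than general data flow graphs; the mnemonics `load`/`add`/`sub`/`mul` and all function names are hypothetical, not the QueueCore ISA.

```python
import operator
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Node:
    """Expression-tree node: an operator ("+", "-", "*") or a leaf variable name."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

# Hypothetical mnemonics for illustration; not the QueueCore ISA.
MNEMONIC = {"+": "add", "-": "sub", "*": "mul"}
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def queue_program(root: Node) -> List[str]:
    """Level-order traversal: collect levels top-down, then emit them
    deepest level first, left to right within each level."""
    levels: List[List[Node]] = []
    frontier = [root]
    while frontier:
        levels.append(frontier)
        frontier = [c for n in frontier for c in (n.left, n.right) if c is not None]
    program = []
    for level in reversed(levels):
        for n in level:
            if n.left is None and n.right is None:
                program.append(f"load {n.label}")
            else:
                program.append(MNEMONIC[n.label])
    return program

def run(program: List[str], env: Dict[str, int]) -> int:
    """Execute on a FIFO: operands are dequeued at the head, results enqueued at the tail."""
    q: deque = deque()
    for instr in program:
        parts = instr.split()
        if parts[0] == "load":
            q.append(env[parts[1]])
        else:
            first = q.popleft()   # left operand
            second = q.popleft()  # right operand
            q.append(OPS[parts[0]](first, second))
    return q.popleft()

# x = (a + b) * (c - d)
expr = Node("*", Node("+", Node("a"), Node("b")), Node("-", Node("c"), Node("d")))
prog = queue_program(expr)
print(prog)                                         # ['load a', 'load b', 'load c', 'load d', 'add', 'sub', 'mul']
print(run(prog, {"a": 1, "b": 2, "c": 7, "d": 4}))  # 9
```

Note that no instruction names a source or destination: the `add` simply consumes the two oldest queue entries, which is why such programs carry no false dependencies.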

A. Ben Abdallah, A. Canedo, T. Yoshinaga, and M. Sowa, The QC-2 Parallel Queue Processor Architecture, Journal of Parallel and Distributed Computing, Vol. 68, No. 2, pp. 235-245, 2008.

A queue-based instruction set architecture offers an attractive option in the design of embedded systems. In our previous work, we proposed a novel queue processor architecture as a starting point for hardware/software design space exploration for embedded applications. In this paper, we present a high-performance 32-bit synthesizable QueueCore (QC-2), an improved and optimized version of the produced order parallel Queue processor (PQP) with single-precision floating-point support. The QC-2 core also implements a novel technique to extend immediate values and memory instruction offsets that were otherwise not representable because of bit-width constraints in the PQP processor. A prototype implementation is produced by synthesizing the high-level model for a target FPGA device. We present the architecture description and design results in a fair amount of detail.
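The abstract does not describe how QC-2 encodes its extended immediates. One common way to handle values that do not fit a short instruction field is a prefix instruction that supplies the high-order bits, which the decoder concatenates with the next instruction's short field. The sketch below illustrates that general idea only; the 8-bit widths and the names `encode_immediate` and `decode_immediate` are hypothetical and are not the QC-2 encoding.

```python
from typing import Optional, Tuple

# Hypothetical field widths for illustration; not the QC-2 encoding.
IMM_BITS = 8      # width of the short immediate field inside an instruction
PREFIX_BITS = 8   # extra high-order bits supplied by one prefix instruction

def encode_immediate(value: int) -> Tuple[Optional[int], int]:
    """Split an unsigned immediate into (prefix payload or None, short field).
    A prefix is emitted only when the value does not fit in IMM_BITS."""
    if not 0 <= value < 1 << (IMM_BITS + PREFIX_BITS):
        raise ValueError("immediate too wide even with a prefix")
    low = value & ((1 << IMM_BITS) - 1)
    high = value >> IMM_BITS
    return (high if high else None), low

def decode_immediate(prefix: Optional[int], low: int) -> int:
    """Decoder side: concatenate a pending prefix payload with the short field."""
    return low if prefix is None else (prefix << IMM_BITS) | low

print(encode_immediate(200))     # (None, 200): fits, no prefix needed
print(encode_immediate(0x1234))  # (18, 52): prefix 0x12, field 0x34
```

The appeal of a prefix scheme is that common small constants keep the short encoding, and only the rare wide value pays for an extra instruction.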

A. Ben Abdallah, M. Masuda, A. Canedo, K. Kuroda, Natural Instruction Level Parallelism-aware Compiler for High-Performance Processor Architecture, The Journal of Supercomputing, Vol. 57, No. 3, pp. 314-338, Sept. 2011.
This work presents a static method, implemented in a compiler, for extracting high instruction-level parallelism for the 32-bit QueueCore, a queue computation-based processor. The instructions of a queue processor implicitly read and write their operands, making instructions short and programs free of false dependencies. This characteristic allows the exploitation of maximum parallelism and improves code density. Compiling for the QueueCore requires a new approach, since the concept of registers disappears. We propose a new, efficient code generation algorithm for the QueueCore. For a set of numerical benchmark programs, our compiler extracts 1.38 times more parallelism than an optimizing compiler for a RISC machine. Through the use of QueueCore’s reduced instruction set, we are able to generate code that is 20% and 26% denser than that for two embedded RISC processors.
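Why level order exposes parallelism naturally can be shown with a simple measure: in a level-order queue program, instructions at the same tree level have no dependences among themselves, so with unlimited functional units each level could issue in one step. The sketch below computes that ideal figure (instructions divided by levels) for a nested-tuple expression tree. It is an illustrative upper bound under these assumptions, not the compiler algorithm or the ILP metric used in the paper; `level_widths` and `ideal_ilp` are hypothetical names.

```python
from typing import List, Union

# An expression tree as nested tuples: a leaf is a variable name,
# an inner node is (operator, left, right).
Tree = Union[str, tuple]

def level_widths(root: Tree) -> List[int]:
    """Number of queue instructions at each tree level, root level first."""
    widths: List[int] = []
    frontier = [root]
    while frontier:
        widths.append(len(frontier))
        frontier = [child for n in frontier if isinstance(n, tuple) for child in n[1:]]
    return widths

def ideal_ilp(root: Tree) -> float:
    """Instructions at the same level have no dependences among themselves,
    so with unlimited functional units each level can issue in one step."""
    widths = level_widths(root)
    return sum(widths) / len(widths)

expr = ("*", ("+", "a", "b"), ("-", "c", "d"))  # (a + b) * (c - d)
print(level_widths(expr))  # [1, 2, 4]
print(ideal_ilp(expr))     # 7 instructions over 3 levels, about 2.33
```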

A. Canedo, A. Ben Abdallah, and M. Sowa, Efficient Compilation for Queue Size Constrained Queue Processors, Parallel Computing, Vol. 35, pp. 213-225, 2009.
Queue computers use a FIFO data structure for data processing. The essential characteristics of a queue-based architecture excel at satisfying the demands of embedded systems, including: compact instruction set, simple hardware logic, high parallelism, and low power consumption. The size of the queue is an important concern in the design of a realizable embedded queue processor. We introduce the relationship between parallelism, length of data dependency edges in data flow graphs and the queue utilization requirements. This paper presents a technique developed to make the compiler aware of the size of the queue register file and, thus, optimize the programs to effectively utilize the available hardware. The compiler examines the data flow graph of the programs and partitions it into clusters whenever it exceeds the queue limits of the target architecture. The presented algorithm deals with the two factors that affect the utilization of the queue, namely: parallelism and the length of variables' reaching definitions. We analyze how the quality of the generated code is affected for SPEC CINT95 benchmark programs and different queue size configurations. Our results show that for reasonable queue sizes the compiler generates code that is comparable to the code generated for infinite resources in terms of instruction count, static execution time, and instruction level parallelism.
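The link between parallelism and queue utilization can be made concrete by tracking queue occupancy while a level-order program runs: a wide, highly parallel expression loads many operands before consuming any, so its peak occupancy is high, while a narrow chain stays small. The sketch below computes the peak and flags programs that exceed a given queue size, the situation in which the paper's compiler partitions the data flow graph into clusters. It models only loads and binary operations and ignores reaching-definition lengths; the function names are hypothetical and the clustering algorithm itself is not shown.

```python
from itertools import accumulate
from typing import List

def occupancy_trace(program: List[str]) -> List[int]:
    """Queue length after each instruction. Assumes only loads (+1 entry)
    and binary operations (dequeue two, enqueue one: net -1)."""
    return list(accumulate(1 if instr.startswith("load") else -1 for instr in program))

def peak_occupancy(program: List[str]) -> int:
    return max(occupancy_trace(program), default=0)

def exceeds_queue(program: List[str], queue_size: int) -> bool:
    """True when the program needs more queue entries than the hardware provides."""
    return peak_occupancy(program) > queue_size

# Level-order programs for a wide (balanced, 8 leaves) and a narrow (chain) expression.
wide = [f"load {v}" for v in "abcdefgh"] + ["add"] * 7
narrow = ["load a", "load b", "add", "load c", "add", "load d", "add", "load e", "add"]
print(peak_occupancy(wide), peak_occupancy(narrow))      # 8 2
print(exceeds_queue(wide, 4), exceeds_queue(narrow, 4))  # True False
```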

Abderazek Ben Abdallah, QueueCore Instruction Set Architecture, Technical Report, Parallel/Distributed Systems Laboratory, Graduate School of Information Systems, University of Electro-Communications, Tokyo, January 2003.
Abderazek Ben Abdallah, QC-1 Processing Stages Algorithms, Technical Report, Parallel/Distributed Systems Laboratory, Graduate School of Information Systems, University of Electro-Communications, Tokyo, 2003.
Parallel Queue Processor Project page
