Advanced On-chip Interconnects (2D/3D, Si-Photonics, Hybrid)
Future Systems-on-Chip (SoCs) will contain hundreds of
components, including processor cores, DSPs, memory,
accelerators, and I/O, all integrated into a single
die area of just a few square millimeters. Such
complex SoCs will be interconnected via a
novel on-chip interconnect that is closer to a
sophisticated network than to current bus-based
solutions. This network
must provide high throughput and low latency while
keeping area and power consumption low. Our research
effort addresses several design challenges to
enable this new paradigm in massively parallel
many-core systems. In particular, we are
investigating fault tolerance, 3D-TSV integration,
photonic communication, low-power mapping
techniques, and low-latency adaptive routing.
Patents
- Japanese Patent Application No. 2020-094220
(Japanese Patent No. 7488989), Abderazek
Ben Abdallah, Khanh N. Dang, "A
three-dimensional system on chip in which a
TSV group including a plurality of TSVs is
provided to connect between layers"
(notice of decision to grant received: May 23, 2024)
- [Japanese Patent No. 6846027] (2021.03.03),
Abderazek Ben Abdallah, "Defect tolerance router for
network on-chip", Japanese Patent Application
No. 2016-100732 (2016.05.19)
- [Japanese Patent No. 6747660] (registered
2020.11.08), Abderazek Ben Abdallah,
"Optical network-on-chip system
using non-block photo-switches each including
control unit, and optical network-on-chip
setup method", Japanese Patent Application
No. 2015-196698 (2015.10.02)
- [Japanese Patent No. 6284177] (registered
2018.2.09), Abderazek Ben Abdallah,
"Error resilience router,
IC using the same, and error resilience
router control method", Japanese Patent
Application No. 2013-262523 (2013.12.19)
- Japanese Patent Application No. 2017-218953
(Japanese Patent No. 7239099), Abderazek Ben
Abdallah, Khanh N. Dang, Masayuki Hisada, "TSV
Error Tolerant Router Device for 3D Network On
Chip" (2023.03.14)
- Abderazek
Ben Abdallah, Khanh N. Dang, Masayuki
Hisada, "Distance-aware Extended Parity
Product Coding for multiple faults detection
for on-chip links" [Japanese title, translated:
"Distance-based extended parity product code for
multiple-fault detection in 3D-IC links"],
Japanese Patent Application No. 2020-171553
- Khanh
N. Dang, Akram Ben Ahmed, Abderazek Ben
Abdallah, Xuan-Tu Tran, ‘‘HotCluster: A
thermal-aware defect recovery method for
Through-Silicon-Vias Towards Reliable 3-D ICs
systems’’, IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems,
March 2021. DOI: 10.1109/TCAD.2021.3069370
Through silicon via (TSV) is considered as the
near-future solution to realize low-power and
high-performance 3D-integrated circuits (3D-ICs)
and 3D-Network-on-Chips (3D-NoCs). However, the
lifetime reliability issue of TSV due to its
fault sensitivity and the high operating
temperature of 3D-ICs, which also accelerates
the fault rate, is one of the most critical
challenges. Meanwhile, most current works focus
on detecting and correcting TSV defects after
manufacturing without considering
high-temperature nodes’ impact on lifetime
reliability. Besides, the recovery for defective
clusters is also challenging because of costly
redundancies. In this work, we present
HotCluster: a hotspot-aware self-correction
platform for clustering defects in 3D-NoCs to
help understand and tackle this problem. We first
give a method to predict normalized fault rates
and place redundant TSV groups according to each
region’s fault rate. In our medium fault rate
scenario (normalized to the coolest area),
HotCluster reduces the redundancies by about 60%
in comparison to uniformly distributed
redundancies while keeping a higher ratio of
routers working in a normal state. Furthermore,
HotCluster integrates both online (weight-based)
and offline (max-flow min-cut)
mapping algorithms to help the system correct
the faulty TSV clusters. The experimental
results show that both the max-flow min-cut
offline method and the weight-based online mode
with a redundancy of 0.25 exhibit less than 1% of
routers disabled under 50% defect rates.
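To make the offline mapping idea more concrete, here is a minimal sketch of how assigning spare TSV clusters to routers with defective clusters can be posed as a maximum-flow problem. The graph construction, the node names (`S`, `T`, `A`, `B`, `C`, `d1`, `d2`) and the helper names `max_flow` and `assign_spares` are illustrative assumptions; the abstract does not describe the actual HotCluster formulation, which may differ.

```python
from collections import defaultdict, deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along shortest residual paths until none remain."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
    flow = 0
    while True:
        # Breadth-first search for a path with spare residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow, residual
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

def assign_spares(needy, capacity, source="S", sink="T"):
    """Return (number of recovered routers, {router: donor cluster})."""
    flow, residual = max_flow(capacity, source, sink)
    # A positive reverse residual donor->router means that edge carries flow.
    mapping = {r: d for r in needy for d in capacity[r] if residual[d][r] > 0}
    return flow, mapping

# Hypothetical scenario: routers A, B, C each lost a TSV cluster;
# spare clusters d1 and d2 are reachable only by some of them.
capacity = {
    "S": {"A": 1, "B": 1, "C": 1},
    "A": {"d1": 1},
    "B": {"d1": 1, "d2": 1},
    "C": {"d2": 1},
    "d1": {"T": 1},
    "d2": {"T": 1},
}
recovered, mapping = assign_spares(["A", "B", "C"], capacity)
print(recovered, mapping)
```

With two spares and three defective routers, at most two routers can be recovered, so one router stays disabled in this toy case. The max-flow value equals the capacity of a minimum cut, which is why a scarce spare (here `d1` or `d2`) bounds how many routers can be saved.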
- Khanh
N. Dang, Akram Ben Ahmed, Abderazek Ben
Abdallah, X. Tran, ‘‘A thermal-aware on-line
fault tolerance method for TSV lifetime
reliability in 3D-NoC systems’’, IEEE Access,
Volume 8, pp 166642-166657, 2020.
Through-silicon-via (TSV) based 3D Integrated
Circuits (3D-ICs) are among the most advanced
architectures, providing low power
consumption, shorter wire length, and a smaller
footprint. However, 3D-ICs face lifetime
reliability issues due to high operating temperature
and interconnect reliability, especially of the
Through-Silicon-Via (TSV), which can
significantly affect the accuracy of the
applications. In this paper, we present an online
method that supports the detection and
correction of lifetime TSV failures, named
IaSiG. By reusing the conventional recovery
method and analyzing the output syndromes, IaSiG
can determine and correct the defective TSVs.
Results show that within a group, R redundant
TSVs can fully localize and correct R defects
and support the detection of R+1 defects.
Moreover, by using G groups, it can localize up
to G×R and detect up to G × (R + 1) defects. An
implementation of IaSiG for 32-bit data in eight
groups and two redundancies has a worst-case
execution time (WCET) of 5,152 cycles while
supporting at most 16 defective TSVs (50%
localization). By integrating IaSiG onto a 3D
Network-on-Chip, we also perform a grid-search
based empirical method to insert suitable
numbers of redundancies into TSV groups. The
empirical method takes the operating temperature
as the factor of accelerated fault due to the
fact that temperature is one of the major issues
of 3D-ICs. The results show that the proposed
method can reduce the number of redundancies
compared with the uniform method while still
maintaining the required Mean Time to Failure.
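The capacity figures quoted in this abstract follow directly from the G × R and G × (R + 1) expressions. The short sketch below reproduces the arithmetic for the reported 32-bit configuration; the function name `iasig_capacity` and its parameter names are illustrative and not taken from the paper.

```python
def iasig_capacity(groups: int, redundant_per_group: int, data_bits: int) -> dict:
    """Defect limits for G groups with R redundant TSVs each, as stated in the abstract."""
    localizable = groups * redundant_per_group        # G x R defects localized
    detectable = groups * (redundant_per_group + 1)   # G x (R + 1) defects detected
    return {
        "localizable": localizable,
        "detectable": detectable,
        "localization_ratio": localizable / data_bits,
    }

# Configuration from the abstract: 32-bit data, eight groups, two redundancies.
cfg = iasig_capacity(groups=8, redundant_per_group=2, data_bits=32)
print(cfg)  # {'localizable': 16, 'detectable': 24, 'localization_ratio': 0.5}
```

Eight groups with two redundant TSVs each give 16 localizable defects, which is the 50% localization figure for 32 data TSVs.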
- Khanh
N. Dang, Akram Ben Ahmed, Michael Meyer,
Abderazek Ben Abdallah, and Xuan-Tu Tran, ‘‘A
non-blocking non-degrading multiple defect
link test method for 3D-Networks-on-Chip,’’
IEEE Access, Vol. 8, pp. 59571-59589,
2020. DOI: 10.1109/ACCESS.2020.2982836
As one of the most promising technologies to
realize 3D Integrated Circuits (3D-ICs),
Through-Silicon-Via (TSV) acts as the
inter-layer link inside 3D Networks-on-Chip.
However, the reliability issues due to the low
yield rates and the sensitivity to thermal
hotspots and stress issues are preventing
TSV-based 3D-ICs from being widely and
efficiently used. To ensure the correctness of
TSV connections at run-time, detecting multiple
(clustering) defects is an important feature.
While Error Correction Codes are limited by a
certain number of detectable faults, using
Built-In-Self-Test (BIST) prevents the system
from operating normally during the test time.
This paper first presents a Parity Product Code
(PPC) with the ability to correct one fault and
detect, at least, two faults. Second, we present
extended PPC (EPPC) to detect multiple defects
within the links of Networks-on-Chip by using
two or more additional matrices. Furthermore, we
present the distance-aware version of EPPC to
detect multiple defects by using only one extra
matrix. The results show that the distance-aware
EPPC can detect 100% of clustering defects and
multiple random defects within two and three
cycles, respectively. The performance evaluation
for Network-on-Chip testing also shows no
degradation while providing an extremely short
response time (2-3 cycles).
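The "correct one fault, detect two" property of a parity product code can be illustrated with a generic row-and-column parity scheme. This is a minimal sketch under stated assumptions: data bits are arranged in a matrix, the parity bits themselves are assumed fault-free, and the function names `encode` and `check` are hypothetical. It is not the exact PPC or EPPC construction from the paper.

```python
from typing import List, Tuple

Matrix = List[List[int]]

def encode(data: Matrix) -> Tuple[List[int], List[int]]:
    """Compute one even-parity bit per row and one per column."""
    row_par = [sum(row) % 2 for row in data]
    col_par = [sum(col) % 2 for col in zip(*data)]
    return row_par, col_par

def check(received: Matrix, row_par: List[int], col_par: List[int]) -> Tuple[str, Matrix]:
    """Return ('ok' | 'corrected' | 'detected', resulting data matrix)."""
    new_row, new_col = encode(received)
    bad_rows = [i for i, (a, b) in enumerate(zip(new_row, row_par)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(new_col, col_par)) if a != b]
    if not bad_rows and not bad_cols:
        return "ok", received
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        # One failing row and one failing column intersect at the faulty bit.
        fixed = [row[:] for row in received]
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
        return "corrected", fixed
    # Any other mismatch pattern is reported but cannot be located.
    return "detected", received

data = [[1, 0, 1, 1],
        [0, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 1]]
row_par, col_par = encode(data)

one_fault = [row[:] for row in data]
one_fault[2][1] ^= 1                     # single flipped bit
two_faults = [row[:] for row in data]
two_faults[0][0] ^= 1                    # two flipped bits in the same row
two_faults[0][3] ^= 1

print(check(one_fault, row_par, col_par)[0])   # corrected
print(check(two_faults, row_par, col_par)[0])  # detected
```

With plain row and column parity, some patterns of three or more faults can look like a single fault and be miscorrected, which is one motivation for adding further check matrices as the extended variants do.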
- K.
N. Dang, A. B. Ahmed, A. Ben Abdallah and X.
Tran, "TSV-OCT: A Scalable Online Multiple-TSV
Defects Localization for Real-Time 3-D-IC
Systems," IEEE Transactions on Very Large
Scale Integration (VLSI) Systems, vol. 28, no.
3, pp. 672-685, 3/2020. doi:
10.1109/TVLSI.2019.2948878.
In order to detect and localize
through-silicon-via (TSV) failures in both
manufacturing and operating phases, most of the
existing methods use a dedicated testing
mechanism with long response time and
prerequisite interruptions for online testing.
This article presents an error correction code
(ECC)-based method named “TSV on-communication
test” (TSV-OCT) to detect and localize faults
without halting the operation of TSV-based
3-D-IC systems. We first propose a statistical
detector, a method that detects open and short
defects in TSVs and works in parallel with data
transactions. Second, we propose an
isolation-and-check algorithm to enhance the
localization ability of the method. Moreover,
Monte Carlo simulations show that the
proposed statistical detector doubles the
number of detected faults when compared to
conventional ECC-based techniques. With the help
of isolation and check, TSV-OCT localizes up to
four and five times more defects. In
addition, the response time is kept below 65,000
cycles, so the method could be easily integrated
into real-time applications. On the other hand, an
implementation of TSV-OCT on a 3-D
Network-on-Chip (NoC) router shows no
performance degradation for testing while having
a reasonable area overhead.
- K.
N. Dang, A. B. Ahmed, Y. Okuyama and A. Ben
Abdallah, "Scalable Design Methodology and
Online Algorithm for TSV-Cluster Defects
Recovery in Highly Reliable 3D-NoC Systems,"
in IEEE Transactions on Emerging
Topics in Computing, vol. 8, no. 3, pp.
577-590, 1 July-Sept. 2020, doi:
10.1109/TETC.2017.2762407.
3D-Network-on-Chips exploit the benefits of
Network-on-Chips and 3D-Integrated Circuits
allowing them to be considered as one of the
most advanced and auspicious communication
methodologies. On the other hand, the
reliability of 3D-NoCs, due to the vulnerability
of Through Silicon Vias, remains a major
problem. Most of the existing techniques rely on
correcting the TSV defects by using redundancies
or employing routing algorithms. Nevertheless,
they are not suitable for TSV-cluster defects as
they can either lead to costly area and power
consumption overheads, or they may result in
non-minimal routing paths; thus, posing serious
threats to the system reliability and overall
performance. In this work, we present a scalable
and low-overhead TSV usage and design method for
3D-NoC systems where the TSVs of a router can be
utilized by its neighbors to deal with the
cluster open defects. An adaptive online
algorithm is also introduced to assist the
proposed system to immediately work around the
newly detected defects without using
redundancies. The experimental results show that
the proposal ensures that less than 2 percent of
the routers are disabled, even when 50 percent of
the TSV clusters are defective. The performance
evaluations also demonstrate unchanged
performance for real applications under 5
percent of cluster defects.
- Khanh
N. Dang, Akram Ben Ahmed, Xuan-Tu Tran, Yuichi
Okuyama, Abderazek Ben Abdallah, “A
Comprehensive Reliability Assessment of
Fault-Resilient Network-on-Chip Using
Analytical Model”, IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, Vol.
25, Issue 11, pp. 3099-3112,
2017. DOI: 10.1109/TVLSI.2017.2736004
Component failure in networks-on-chip
(NoCs) is a critical factor in
system reliability. To alleviate the
impact of faults, fault tolerance has been
investigated in recent years to enhance
NoC robustness. Due to the vast selection of
fault-tolerance mechanisms and critical design
constraints, selecting and configuring an
appropriate mechanism to satisfy the
fault-tolerance requirements constitutes a new
challenge for designers. Consequently,
reliability assessment has become prominent in
the early stages of the manufacturing process to
solve these problems. This paper approaches the
fault-tolerance analysis by providing an
analytical model to approximate the lifetime
reliability and compares it with a system-level
simulation. Based on the proposed approach, we
measure the fault-tolerance efficiency using a
new parameter, named reliability acceleration
factor. The goal of this paper is to provide an
efficient and accurate reliability assessment to
help designers easily understand and evaluate
the advantages and drawbacks of their potential
fault-tolerance methods.
- Achraf
Ben Ahmed, Tsutomu Yoshinaga, Abderazek Ben
Abdallah, “Scalable Photonic Networks-on-Chip
Architecture Based on a Novel Wavelength-Shifting
Mechanism”, IEEE Transactions on
Emerging Topics in Computing, 2017. DOI:
10.1109/TETC.2017.2737016
Since Photonic Networks-on-Chip (PNoCs) were
proposed, there has been unanimity about the
benefits that photonic links could bring to
on-chip interconnection. However, debate
continues regarding the suitable
architecture and routing scheme to be used. This
debate concerns the use of a fully photonic PNoC
or an electro-assisted one. Both schemes have
their pros and cons, but the main drawback of
both architectures is their scalability. We
propose in this paper an alternative to these
two conventional PNoC architectures. Our
proposed system is based on a novel
Wavelength-Shifting mechanism, which combines
the benefits of the previously mentioned schemes
while limiting their drawbacks. The proposed
system was validated by an analytical model, in
addition to a set of simulations using synthetic
and realistic traffic patterns. Evaluation
results show that compared to the
electro-assisted architectures, we could enhance
the latency, power, and bandwidth by an order of
magnitude, reaching a performance similar to the
fully photonic architecture. In addition, the
number of photonic devices used is still lower
than in conventional fully photonic
architectures by an average of 60 percent.
Furthermore, the new wavelength-shifting
mechanism is highly scalable and is no longer
affected by the communication
distance or the traffic pattern, which makes it
a promising solution to replace existing
conventional architectures.
- Akram Ben Ahmed, Abderazek Ben Abdallah, “Adaptive
Fault-Tolerant Architecture and Routing
Algorithm for Reliable Many-Core 3D-NoC Systems”,
Journal of Parallel and Distributed Computing,
Volumes 93–94, July 2016, Pages 30-43, ISSN
0743-7315, doi:10.1016/j.jpdc.2016.03.014
During the last few decades, Three-dimensional
Network-on-Chips (3D-NoCs) have been showing their
advantages over 2D-NoC architectures. This is
thanks to the reduced average interconnect length
and lower interconnect-power consumption inherited
from Three-dimensional Integrated Circuits
(3D-ICs). On the other hand, questions about their
reliability are starting to arise. This issue is
mainly caused by their complex nature, where a
single faulty transistor may cause intolerable
performance degradation or even the collapse of the
entire system. To ensure their correct functionality,
3D-NoC systems must be tolerant of any
short-term malfunction or permanent physical
damage so that messages are delivered on time while
minimizing the performance degradation as much as
possible. In this paper, we present a
fault-tolerant 3D-NoC architecture, called
3D-Fault-Tolerant-OASIS (3D-FTO). With the aid
of a light-weight routing algorithm, 3D-FTO
manages to avoid system failure in the
presence of a large number of transient,
intermittent, and permanent faults. Moreover, the
proposed architecture leverages
reconfigurable components to handle fault
occurrence in links, input-buffers, and the crossbar,
where faults are more likely to happen. The
proposed 3D-FTO system is able to work around
different kinds of faults, ensuring graceful
performance degradation while minimizing the
additional hardware complexity and remaining
power-efficient. This project is
partially supported by Competitive research
funding, Ref. P1-5, Fukushima, Japan.
Adaptive fault-tolerant 3D-Network-on-Chip system
architecture. RAB mechanism for deadlock recovery
and fault-tolerance in input-buffers.
Traffic-Prediction-Unit technique for congestion
relief. Bypass-Link-on-Demand to tackle
fault-occurrence in the Crossbar. Fault-tolerance
and graceful performance degradation obtained at
high fault-rates.
- Akram
Ben Ahmed, A. Ben Abdallah, “Graceful
Deadlock-Free Fault-Tolerant Routing Algorithm
for 3D Network-on-Chip Architectures”,
Journal of Parallel and Distributed Computing,
74/4 (2014), pp. 2229-2240.
Three-Dimensional Networks-on-Chip (3D-NoCs) have
been presented as a promising solution merging
the high parallelism of the Network-on-Chip (NoC)
interconnect paradigm with the high performance
and lower interconnect power of three-dimensional
integrated circuits. However, 3D-NoC systems
are exposed to a variety of manufacturing and
design factors making them vulnerable to
different faults that cause corrupted message
transfer or even catastrophic system failures.
Therefore, a 3D-NoC system should be
fault-tolerant to transient malfunctions or
permanent physical damages. In this work, we
present an efficient fault-tolerant routing
algorithm, called
Hybrid-Look-Ahead-Fault-Tolerant (HLAFT), which
takes advantage of both local and look-ahead
routing to boost the performance of 3D-NoC
systems while ensuring fault-tolerance. A
deadlock-recovery technique associated with
HLAFT, named Random-Access-Buffer (RAB), is also
presented. RAB takes advantage of look-ahead
routing to detect and remove deadlock without
considerable additional hardware complexity. We
implemented the proposed algorithm and
deadlock-recovery technique on a real 3D-NoC
architecture (3D-OASIS-NoC) and prototyped it
on FPGA. Evaluation results show that the
proposed algorithm performs better than XYZ,
even at high fault rates (i.e.,
20%), and outperforms our previously designed
Look-Ahead-Fault-Tolerant routing (LAFT),
with a latency/flit reduction of up to
12.5% and a throughput enhancement of up to
11.8%, in addition to a 7.2% dynamic-power
saving thanks to the power-management module
integrated with HLAFT.
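For context on the XYZ baseline mentioned above, the following is a minimal sketch of dimension-order (XYZ) routing in a 3D mesh. It assumes routers are addressed by (x, y, z) coordinates and that faults are modeled as a set of broken directed links. The names `xyz_next_hop` and `xyz_route` are illustrative, and the sketch does not implement HLAFT or LAFT.

```python
from typing import FrozenSet, List, Optional, Tuple

Node = Tuple[int, int, int]
Link = Tuple[Node, Node]

def xyz_next_hop(cur: Node, dst: Node) -> Optional[Node]:
    """Move one step along the first dimension (X, then Y, then Z) that differs."""
    for axis in range(3):
        if cur[axis] != dst[axis]:
            step = 1 if dst[axis] > cur[axis] else -1
            nxt = list(cur)
            nxt[axis] += step
            return (nxt[0], nxt[1], nxt[2])
    return None  # already at the destination

def xyz_route(src: Node, dst: Node,
              faulty_links: FrozenSet[Link] = frozenset()) -> Tuple[List[Node], bool]:
    """Return (path taken, delivered?). The route is fully deterministic."""
    path = [src]
    cur = src
    while cur != dst:
        nxt = xyz_next_hop(cur, dst)
        if (cur, nxt) in faulty_links:
            return path, False  # plain XYZ has no alternative path
        path.append(nxt)
        cur = nxt
    return path, True

path, ok = xyz_route((0, 0, 0), (2, 1, 1))
print(ok, path)

broken = frozenset({((1, 0, 0), (2, 0, 0))})
print(xyz_route((0, 0, 0), (2, 1, 1), broken))
```

Because the path is fixed by the source and destination alone, a single faulty link on it blocks delivery. This is the limitation that fault-tolerant schemes address by choosing among alternative output ports.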