Solution: Use layer fusion to process several consecutive neural-network layers as one fused stage. Intermediate feature maps stay on chip, which cuts off-chip memory accesses during inference and leads to faster execution and lower power consumption (see the sketch after the table below).
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2016 | MICRO | SBU | Fused-Layer CNN Accelerators | fuse the processing of multiple CNN layers by modifying the order in which the input data are brought on chip | | | |
| 2025 | TC | KU Leuven | Stream: Design Space Exploration of Layer-Fused DNNs on Heterogeneous Dataflow Accelerators | fine-grain mapping paradigm; mapping of layer-fused DNNs on heterogeneous dataflow accelerator architectures; memory- and communication-aware latency analysis; constraint optimization | | | |
| 2024 | SOCC | IIT Hyderabad | Hardware-Aware Network Adaptation using Width and Depth Shrinking including Convolutional and Fully Connected Layer Merging | width shrinking reduces the number of feature maps in CNN layers; depth shrinking merges a conv layer with an fc layer | | | |
| 2024 | ICSAI | MIT | LoopTree: Exploring the Fused-Layer Dataflow Accelerator Design Space | design space that supports a set of tiling, recomputation, and retention choices and their combinations; a model that validates the design space | | | |
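The sketch below is a minimal illustration of the layer-fusion idea referenced above, not the scheme of any specific paper in the table: two 1x1-convolution-like layers are evaluated one row-tile at a time, so the intermediate feature map is never fully materialized off chip. All shapes and the `tile_rows` parameter are arbitrary choices for the example.

```python
import numpy as np

def layer(x, w):
    # 1x1 convolution over channels followed by ReLU: x is (rows, cols, c_in), w is (c_in, c_out)
    return np.maximum(x @ w, 0.0)

def unfused(x, w1, w2):
    inter = layer(x, w1)        # full intermediate feature map written back to "memory"
    return layer(inter, w2)

def fused(x, w1, w2, tile_rows=4):
    out_tiles = []
    for r in range(0, x.shape[0], tile_rows):
        tile = x[r:r + tile_rows]          # bring one row-tile "on chip"
        tile = layer(tile, w1)             # intermediate tile stays local
        out_tiles.append(layer(tile, w2))  # second layer consumes it immediately
    return np.concatenate(out_tiles, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 8))       # (rows, cols, channels)
w1 = rng.standard_normal((8, 12))
w2 = rng.standard_normal((12, 4))
assert np.allclose(unfused(x, w1, w2), fused(x, w1, w2))
```

With 3x3 or larger kernels, neighbouring tiles additionally need halo overlap or recomputation; choosing among those options is the kind of design space that Stream and LoopTree explore.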
Challenge: LLM accelerators are constrained by off-chip memory bandwidth, power consumption, and the need for efficient data movement (a back-of-the-envelope illustration follows the table below).
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2024 | DATE | NTU | ViTA: A Highly Efficient Dataflow and Architecture for Vision Transformers | highly efficient memory-centric dataflow; fused special function module for non-linear functions; comprehensive DSE of ViTA kernels and VMUs | | | |
| 2025 | arXiv | SJTU | ROMA: A Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM | hybrid ROM-SRAM architecture for on-device LLMs; B-ROM design for area-efficient ROM; fused cell integration of ROM and compute unit; QLoRA rank adaptation for task-specific tuning; on-chip storage optimization for quantized models | | | |
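As a back-of-the-envelope illustration of the bandwidth challenge above (all numbers are assumed example figures, not taken from the papers): during autoregressive decoding each generated token must read essentially all model weights once, so the token rate is bounded by memory bandwidth rather than by compute.

```python
# Rough decode-throughput bound; every value below is an assumed example figure.
params = 7e9            # assumed 7B-parameter model
bytes_per_param = 1     # assumed 8-bit quantized weights
bandwidth = 100e9       # assumed 100 GB/s off-chip memory bandwidth (bytes/s)

bytes_per_token = params * bytes_per_param          # weights streamed per token
max_tokens_per_s = bandwidth / bytes_per_token      # bandwidth-limited upper bound
print(f"decode upper bound: {max_tokens_per_s:.1f} tokens/s")   # ~14.3 tokens/s
```

This is why designs such as ROMA move weight storage on chip and why aggressive quantization is attractive for on-device LLMs.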
Solution: Quantized DNN accelerators execute quantized neural networks efficiently by using lower-precision representations for weights and activations (a minimal sketch follows the table below).
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2018 | ISCA | SNU | Energy-Efficient Neural Network Accelerator Based on Outlier-Aware Low-Precision Computation | accelerator architecture for outlier-aware quantized models; outlier-aware low-precision computation; separate outlier MAC unit | 4 | 3 | 2 |
| 2018 | ISCA | Georgia Tech | Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network | accelerator for layer-aware quantized DNNs; bit-flexible computation unit; block-structured instruction set architecture | | | |
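The sketch below illustrates the generic low-precision execution described above: symmetric per-tensor INT8 quantization with INT32 accumulation and a final rescale. It is a plain baseline scheme, not the outlier-aware or bit-composable mechanisms of the two papers in the table.

```python
import numpy as np

def quantize(x, n_bits=8):
    # Symmetric per-tensor quantization: real value ~ scale * integer value.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def quantized_matmul(xq, x_scale, wq, w_scale):
    # Integer MACs accumulate in INT32; a single rescale recovers floating point.
    acc = xq.astype(np.int32) @ wq.astype(np.int32)
    return acc.astype(np.float32) * (x_scale * w_scale)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))
w = rng.standard_normal((64, 16))
xq, xs = quantize(x)
wq, ws = quantize(w)
err = np.abs(quantized_matmul(xq, xs, wq, ws) - x @ w).max()
print(f"max abs error vs. FP32 matmul: {err:.4f}")
```

Outlier-aware designs such as the SNU accelerator additionally keep the few large-magnitude values at higher precision in a separate MAC path, which limits the accuracy loss of uniform low-precision execution.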
Solution: A dataflow architecture executes operations as soon as their input data become available, rather than in a predetermined instruction sequence, which uses resources more efficiently and improves performance for parallel and real-time workloads (a minimal sketch follows the table below).
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2019 | ASPLOS | THU | Tangram: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators | | | | |
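Below is a minimal sketch of data-driven execution as described above, using a small hypothetical four-node graph. It is not Tangram's coarse-grained dataflow, only an illustration of firing operations when their operands arrive rather than in program order.

```python
from collections import deque

# Hypothetical graph for the example: node -> (operation, input node names)
GRAPH = {
    "a":   (lambda: 2,          []),
    "b":   (lambda: 3,          []),
    "sum": (lambda a, b: a + b, ["a", "b"]),
    "sq":  (lambda s: s * s,    ["sum"]),
}

def run_dataflow(graph):
    results = {}
    # Nodes with no inputs are ready immediately.
    ready = deque(n for n, (_, deps) in graph.items() if not deps)
    while ready:
        node = ready.popleft()
        op, deps = graph[node]
        results[node] = op(*(results[d] for d in deps))   # fire the node
        # A node becomes ready the moment its last missing operand arrives.
        for n, (_, d) in graph.items():
            if n not in results and n not in ready and all(x in results for x in d):
                ready.append(n)
    return results

print(run_dataflow(GRAPH))   # {'a': 2, 'b': 3, 'sum': 5, 'sq': 25}
```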
Challenge: Many-core architectures integrate a large number of cores, but they face challenges in power consumption, performance, and the allocation of shared resources such as memory bandwidth.
| Year | Venue | Authors | Title | Tags | P | E | N |
|------|-------|---------|-------|------|---|---|---|
| 2015 | HPCA | Cornell | Increasing Multicore System Efficiency through Intelligent Bandwidth Shifting | | | | |