
Parallel and Multi-Processor Architecture

Heterogeneous Architecture

Challenge: Classic heterogeneous architectures face challenges in data movement and memory access patterns, which lead to performance bottlenecks.

| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2017 | TACO | Intel | HAShCache: Heterogeneity-Aware Shared DRAMCache for Integrated Heterogeneous Systems | heterogeneity-aware DRAMCache scheduling PrIS; temporal bypass ByE; spatial occupancy control chaining | | | |
| 2018 | ICS | NC State | ProfDP: A Lightweight Profiler to Guide Data Placement in Heterogeneous Memory Systems | latency sensitivity; bandwidth sensitivity; moving factor based data placement | | | |
| 2023 | HPCA | THU | Baryon: Efficient Hybrid Memory Management with Compression and Sub-Blocking | stage area and selective commit for stable block; dual-format metadata scheme; cacheline-aligned compression and two-level replacements | | | |

CPU-GPU System

| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2024 | arXiv | KTH | Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper | Grace Hopper system memory characterization; integrated CPU-GPU page table analysis; first-touch policy impact study; system page size impact study; access-counter page migration evaluation | 2 | 4 | 3 |

Disaggregated Memory

Challenge: CXL-attached memory and NVM are byte-addressable and offer lower latency and higher bandwidth than storage devices. Memory disaggregation that combines DRAM (fast, high bandwidth, small capacity) with NVM (slower, lower bandwidth, large capacity) faces latency, bandwidth, and consistency challenges.
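The latency side of this trade-off can be illustrated with a weighted average over the two tiers. The sketch below is only a back-of-the-envelope model: the function name and the 100 ns / 300 ns latencies are illustrative assumptions, not measurements from any of the listed papers, and it ignores bandwidth contention and consistency costs.

```python
def average_access_latency_ns(fast_hit_fraction: float,
                              fast_ns: float,
                              slow_ns: float) -> float:
    """Average latency when a fraction of accesses is served by the fast tier.

    Hypothetical helper for illustration; real systems also pay for
    migration, queuing, and coherence traffic, which this ignores.
    """
    if not 0.0 <= fast_hit_fraction <= 1.0:
        raise ValueError("fast_hit_fraction must be within [0, 1]")
    return fast_hit_fraction * fast_ns + (1.0 - fast_hit_fraction) * slow_ns


# Assumed, illustrative latencies: local DRAM vs. a slower far-memory tier.
FAST_NS = 100.0
SLOW_NS = 300.0

for fraction in (1.0, 0.9, 0.5):
    latency = average_access_latency_ns(fraction, FAST_NS, SLOW_NS)
    print(f"{fraction:.0%} of accesses in fast tier: {latency:.0f} ns on average")
```

Even under these simplified assumptions, the average latency degrades quickly as the hot working set spills out of the fast tier, which is why data placement and migration policies receive so much attention.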

CXL-based Disaggregated Memory

| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2025 | ASPLOS | Yale | PULSE: Accelerating Distributed Pointer-Traversals on Disaggregated Memory | iterator-based programming model; disaggregated accelerator architecture; in-network routing for distributed traversal | 3 | 4 | 3 |
| 2025 | ASPLOS | Purdue | EDM: An Ultra-Low Latency Ethernet Fabric for Memory Disaggregation | Ethernet PHY network stack; PHY in-network scheduler; PHY intra-frame preemption | 4 | 4 | 4 |
| 2025 | arXiv | Micron | Architectural and System Implications of CXL-enabled Tiered Memory | CXL parallelism bottleneck analysis; unfair queuing analysis; MIKU dynamic request control; ToR-based service time estimation; hierarchical CXL throttling | 4 | 4 | 3 |

Survey

| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2025 | arXiv | SJTU | Survey of Disaggregated Memory: Cross-layer Technique Insights for Next-Generation Datacenters | cross-layer classification of DM techniques; hardware-level categories; architectural-level classifications; system and runtime-level groupings; application-level optimizations such as general-purpose and domain-specific approaches | | | |

Chiplets

Challenge: Current chip designs are often monolithic and inflexible, leading to high costs and limited opportunities for performance optimization.

Solution: Chiplets enable more flexible and cost-effective system designs by integrating specialized dies, each manufactured in the process best suited to it, which improves performance and yield.
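The yield argument can be made concrete with the standard Poisson defect model, where the fraction of defect-free dies is exp(-A * D0) for die area A and defect density D0. The following is a minimal sketch under stated assumptions: the function names, the 800 mm^2 design size, and the defect density are illustrative, dies are assumed to be tested before assembly (known-good-die), and packaging cost and die-to-die interface area are ignored.

```python
import math


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-area_mm2 * defects_per_mm2)


def silicon_per_good_system_mm2(total_area_mm2: float,
                                n_dies: int,
                                defects_per_mm2: float) -> float:
    """Wafer area consumed per working system.

    Assumes the design splits evenly into n_dies chiplets and that only
    dies that pass testing are assembled (hypothetical simplification).
    """
    die_area = total_area_mm2 / n_dies
    return n_dies * die_area / poisson_yield(die_area, defects_per_mm2)


D0 = 0.001  # assumed defect density in defects per mm^2 (illustrative)

monolithic = silicon_per_good_system_mm2(800.0, 1, D0)
four_chiplets = silicon_per_good_system_mm2(800.0, 4, D0)

print(f"monolithic:    {monolithic:.0f} mm^2 of silicon per good system")
print(f"four chiplets: {four_chiplets:.0f} mm^2 of silicon per good system")
```

Under these assumptions the four-chiplet split consumes noticeably less silicon per working system, because a single defect discards only a 200 mm^2 die rather than the whole 800 mm^2 design. A fuller cost model would add packaging, interface overhead, and assembly yield.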

Survey

| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2020 | Electronics | NUDT | Chiplet Heterogeneous Integration Technology—Status and Challenges | heterogeneous integration technology; interconnect interfaces and protocols; packaging technology | | | |
| 2022 | CCF THPC | ICT | Survey on chiplets: interface, interconnect and integration methodology | development history; interfaces and protocols; packaging technology; EDA tool; standardization of chiplet technology | | | |
| 2024 | IEEE CASS | THU | Chiplet Heterogeneous Integration Technology—Status and Challenges | wafer-scale chip architecture; compiler tool chain; integration technology; wafer-scale system; fault tolerance | | | |

Cost Analysis

| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2025 | arXiv | ASU | CATCH: a Cost Analysis Tool for Co-optimization of chiplet-based Heterogeneous systems | heterogeneous chiplet system modeling; DSE on chiplet size, I/O, and connection | | | |

3D IC

Solution: 3D IC technology stacks multiple active layers in a single device, enabling higher integration density, shorter interconnects, and improved performance.

General 3D IC

| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2019 | GLSVLSI | Boston University | An Overview of Thermal Challenges and Opportunities for Monolithic 3D ICs | TSV-based 3D integration; Mono3D integration with nanoscale monolithic inter-tier vias; influence of lateral heat flow and inter-connection | | | |
| 2019 | ECTC | TSMC | System on Integrated Chips (SoIC) for 3D Heterogeneous Integration | system on integrated chips; SoIC package integration; reliability of SoIC bond, TSV, and TDV | | | |
| 2020 | DATE | Georgia Tech | Macro-3D: A Physical Design Methodology for Face-to-Face-Stacked Heterogeneous 3D ICs | face-to-face stack; separate 2D floorplans generation; memory-on-logic projection | | | |
| 2022 | IEEE Micro | Cerebras | Cerebras Architecture Deep Dive: First Look Inside the Hardware/Software Co-Design for Deep Learning | fine-grained dataflow scheduling; high-bandwidth, low-latency fabric design; weight streaming | | | |

Interconnection

| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2025 | HPCA | Fudan | EIGEN: Enabling Efficient 3DIC Interconnect with Heterogeneous Dual-Layer Network-on-Active-Interposer | dual-layer interconnect architecture; reinforcement learning routing; switch-programmable interconnect | 3 | 2 | 3 |

Design Space Exploration

| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2025 | arXiv | SJTU | Cool-3D: An End-to-End Thermal-Aware Framework for Early-Phase Design Space Exploration of Microfluidic-Cooled 3DICs | end-to-end thermal-aware framework; microfluidic cooling integration; pre-RTL design space exploration; floorplan designer; microfluidic cooling strategy generator | | | |

Benchmarks

| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2025 | arXiv | NJU | Open3DBench: Open-Source Benchmark for 3D-IC Backend Implementation and PPA Evaluation | open-source 3D-IC benchmark; modular 3D partitioning and placement; Open3D-DMP algorithm for cross-die co-placement; comprehensive PPA evaluation with thermal simulation | | | |