Profilers (sampling, instrumentation)
Detection
Bottleneck Analysis
Challenge: bottleneck analysis must cope with high system complexity, unexpected real-world factors, and the resource constraints placed on detection.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2018 | PPoPP | THU | vSensor: Leveraging Fixed-Workload Snippets of Programs for Performance Variance Detection | fixed-workload snippets; dependency propagation algorithm; lightweight online analysis algorithm | | | |
| 2020 | SC | THU | ScalAna: automating scaling loss detection with graph analysis | program structure graph; program performance graph; backtracking root cause detection algorithm | | | |
| 2022 | PPoPP | THU | Vapro: Performance Variance Detection and Diagnosis for Production-Run Parallel Applications | state transition graph; clustering algorithm for fixed-workload snippet identification; variance breakdown model; per-factor time quantification method | | | |
| 2024 | arXiv | UGA | Performance Debugging through Microarchitectural Sensitivity and Causality Analysis | constraint propagation engine for causality analysis; differential analysis engine for sensitivity analysis | | | |
| 2024 | SC | BUAA | GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems | asynchronous state transition graph; parameter-based workload estimation method; asynchronous event tracing technique | | | |
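The fixed-workload idea behind vSensor and Vapro is that a snippet performing the same amount of work should take roughly the same time on every execution, so a markedly slower execution points to performance variance rather than a change in workload. A minimal sketch, assuming per-snippet durations (all positive) have already been collected; the "normalize by the fastest run and flag above a threshold" rule and the `threshold` value are hypothetical illustrations, not the exact algorithm of either paper:

```python
from collections import defaultdict


def detect_variance(records, threshold=1.5):
    """Flag executions of fixed-workload snippets that run slower than expected.

    records: iterable of (snippet_id, duration) pairs; every execution of a
        snippet is assumed to perform the same amount of work.
    threshold: hypothetical slowdown ratio above which a run is flagged.
    Returns a list of (snippet_id, run_index, normalized_time) for slow runs.
    """
    durations = defaultdict(list)
    for snippet_id, duration in records:
        durations[snippet_id].append(duration)

    flagged = []
    for snippet_id, times in durations.items():
        # The fastest run approximates the snippet's interference-free time.
        baseline = min(times)
        for i, t in enumerate(times):
            normalized = t / baseline
            if normalized > threshold:
                flagged.append((snippet_id, i, normalized))
    return flagged


if __name__ == "__main__":
    trace = [("loop_a", 10.0), ("loop_a", 10.4), ("loop_a", 21.0),
             ("loop_b", 5.0), ("loop_b", 5.1)]
    for snippet_id, i, ratio in detect_variance(trace):
        print(f"{snippet_id} run {i}: {ratio:.2f}x baseline")
```

Using the minimum as the baseline keeps the check cheap enough for online use; a production tool would also need to identify which snippets really have fixed workloads, which is what the dependency propagation and clustering steps in the table address.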
Variance Attribution
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2014 | ISPASS | Intel | A Top-Down Method for Performance Analysis and Counters Architecture | top-down bottleneck analysis method; frontend bound; bad speculation; retiring; backend bound | | | |
| 2016 | TPDS | ICT | Understanding Big Data Analytics Workloads on Modern Processors | top-down analysis of big data workloads; pipeline-characteristics-based performance implication analysis; BigDataBench benchmark | | | |
| 2019 | SC | NCSU | Pinpointing Performance Inefficiencies via Lightweight Variance Profiling | function-level variance detection; stack-based maintenance of deep call chains; on-the-fly binary analysis for calling contexts | | | |
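The level-1 breakdown of the Top-Down method assigns every pipeline issue slot to one of four categories: Frontend Bound, Bad Speculation, Retiring, or Backend Bound. A minimal sketch, assuming a 4-wide pipeline and counter values already read from hardware. The parameter names are hypothetical; the event names in the comments follow the Intel cores the ISPASS 2014 paper targets and may differ on other microarchitectures, so treat them as assumptions to check against your CPU's documentation:

```python
def top_down_level1(cycles, uops_not_delivered, uops_issued,
                    uops_retired_slots, recovery_cycles, width=4):
    """Level-1 Top-Down breakdown as fractions of total issue slots.

    cycles:             unhalted core cycles (e.g. CPU_CLK_UNHALTED.THREAD)
    uops_not_delivered: slots the frontend left empty (e.g. IDQ_UOPS_NOT_DELIVERED.CORE)
    uops_issued:        uops issued (e.g. UOPS_ISSUED.ANY)
    uops_retired_slots: slots whose uops retired (e.g. UOPS_RETIRED.RETIRE_SLOTS)
    recovery_cycles:    cycles spent recovering from mispredictions
                        (e.g. INT_MISC.RECOVERY_CYCLES)
    width:              issue width; 4 is an assumption
    """
    slots = width * cycles
    frontend = uops_not_delivered / slots
    # Issued-but-not-retired uops plus slots lost during recovery.
    bad_spec = (uops_issued - uops_retired_slots + width * recovery_cycles) / slots
    retiring = uops_retired_slots / slots
    # Backend Bound is whatever remains, so the four fractions sum to 1.
    backend = 1.0 - (frontend + bad_spec + retiring)
    return {"frontend_bound": frontend, "bad_speculation": bad_spec,
            "retiring": retiring, "backend_bound": backend}


if __name__ == "__main__":
    # Hypothetical counter readings for one measurement interval.
    breakdown = top_down_level1(cycles=1000, uops_not_delivered=800,
                                uops_issued=2200, uops_retired_slots=2000,
                                recovery_cycles=50)
    for category, fraction in breakdown.items():
        print(f"{category}: {fraction:.2f}")
```

With these numbers half of the slots retire useful uops, a fifth each are Frontend Bound and Backend Bound, and a tenth are lost to bad speculation; the method then drills down only into the dominant non-retiring category.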
Root Cause Analysis
Challenge: modeling dependency graphs accurately and scaling detection algorithms to large-scale applications.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2003 | TISSEC | IBM | Clustering Intrusion Detection Alarms to Support Root Cause Analysis | attribute-oriented-induction-based clustering algorithm; generalized alarm analysis | 2 | 3 | 2 |
| 2017 | arXiv | Intel; CA Technologies | Survey on Models and Techniques for Root-Cause Analysis | deterministic/probabilistic models; RCA learning algorithms; RCA inference algorithms | 4 | 1 | 1 |
| 2021 | ASE | eBay | Groot: An Event-graph-based Approach for Root Cause Analysis in Industrial Settings | event-graph-based RCA; service dependency graph; event causality graph; PageRank-based root cause ranking | 4 | 5 | 2 |
| 2021 | ASPLOS | Cornell | Sage: Practical & Scalable ML-Driven Performance Debugging in Microservices | RPC latency decomposition model; Markov-based RPC latency propagation; causal Bayesian network based dependency model | 3 | 3 | 2 |
| 2023 | ASPLOS | Alibaba | Sleuth: A Trace-Based Root Cause Analysis System for Large-Scale Microservices with Graph Neural Networks | HDBSCAN trace clustering algorithm; GNN-based dependency modeling | 3 | 3 | 2 |
Burst Detection
Challenge: maintaining accuracy on high-speed data streams while trading off memory usage against detection accuracy.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2019 | CloudNet | PKU | Dynamic Sketch: Efficient and Adjustable Heavy Hitter Detection for Software Packet Processing | doorkeeper mechanism for high memory efficiency; bucket sampling for accuracy monitoring | 3 | 3 | 1 |
| 2021 | SIGMOD | PKU | BurstSketch: Finding Bursts in Data Streams | running-track-based burst item filtering; snapshot-based burst item detection | 3 | 3 | 1 |
| 2024 | IFIP NPC | PKU | 2FA Sketch: Two-Factor Armor Sketch for Accurate and Efficient Heavy Hitter Detection in Data Streams | improved arbitration strategy for in-bucket competition; cross-bucket conflict-avoidance hashing scheme | 2 | 3 | 1 |
| 2024 | IEEE ICDE | PKU | Scalable Overspeed Item Detection in Streams | bucket-sharing-based basic SpeedSketch algorithm; global clock for reducing timestamp overhead; counter-flip technique for compression | 3 | 4 | 2 |
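The sketches above refine a common pattern: hash each stream item into a small array of counters and report items whose estimated count crosses a threshold. As a reference point only, here is a plain Count-Min sketch heavy-hitter detector; it is not Dynamic Sketch, BurstSketch, 2FA Sketch, or the SpeedSketch algorithm, and the `width`, `depth`, and `threshold` values are illustrative assumptions:

```python
import hashlib


class CountMinSketch:
    """Count-Min sketch: depth rows of width counters; estimates never undercount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salt the hash with the row number to get a different mapping per row.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the row minimum is the tightest bound.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


def heavy_hitters(stream, threshold, width=1024, depth=4):
    """Return items whose estimated frequency reaches threshold.

    Because estimates can only overcount, every true heavy hitter is reported,
    but a colliding light item may occasionally be reported too.
    """
    sketch = CountMinSketch(width, depth)
    candidates = set()
    for item in stream:
        sketch.add(item)
        if sketch.estimate(item) >= threshold:
            candidates.add(item)
    return candidates


if __name__ == "__main__":
    stream = ["a"] * 50 + ["b"] * 3 + ["c"] * 40
    print(sorted(heavy_hitters(stream, threshold=30)))
```

The memory/accuracy tradeoff named in the challenge shows up directly here: shrinking `width` saves memory but raises collision-induced overestimates, which is the error the listed designs attack with doorkeepers, arbitration strategies, and conflict-avoidance hashing.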
Network Tomography
Survey
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2004 | STAT SCI | Berkeley | Network Tomography: Recent Developments | tomography linear model; multicast delay distribution inference; origin–destination traffic matrix inference | 3 | 1 | 1 |
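The tomography linear model relates end-to-end path measurements to unobserved link-level quantities through a 0/1 routing matrix. A worked form, with notation chosen here for illustration ($\mathbf{y}$ path measurements, $\boldsymbol{\theta}$ link parameters such as mean delays, $\boldsymbol{\varepsilon}$ noise), followed by a two-path example in which both paths share link 1:

```latex
\mathbf{y} = A\,\boldsymbol{\theta} + \boldsymbol{\varepsilon},
\qquad
A_{ij} =
\begin{cases}
1 & \text{if path } i \text{ traverses link } j,\\
0 & \text{otherwise.}
\end{cases}

\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{pmatrix}
\quad\Longrightarrow\quad
y_1 = \theta_1 + \theta_2,\qquad y_2 = \theta_1 + \theta_3 .
```

With two measured paths and three links, $A$ has more columns than rows, so the link parameters cannot be recovered from path means alone; this identifiability gap is why tomography methods add structure such as multicast correlations, packet pairs, or distributional assumptions.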
Passive Inference
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2003 | IMC | AT&T Laboratories | Simple Network Performance Tomography | smallest consistent failure set algorithm; separable performance; false-positive/coverage probability estimation of bad links | 3 | 3 | 3 |
| 2014 | ICDCS | ZJU | Domo: Passive Per-Packet Delay Tomography in Wireless Ad-hoc Networks | FIFO/order/sum-of-delays constraints for delay reconstruction; semidefinite-relaxation-based optimization | 4 | 3 | 2 |
Active Inference
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2022 | ICASSP | UMich | Unicast-based inference of network link delay distributions using mixed finite mixture models | Dirac-delta-based mixed finite mixture model; EM algorithm for parameter estimation | 3 | 2 | 2 |
| 2003 | IEEE TSP | Rice University | Network Delay Tomography | end-to-end packet-pair link delay distribution estimation; FFT-based acceleration of expectation-maximization | 3 | 3 | 2 |
| 2021 | IEEE TNSM | QMUL | Optimal Estimation of Link Delays Based on End-to-End Active Measurements | active network monitoring framework; ILP/heuristic/meta-heuristic algorithms for monitoring flow selection | 3 | 3 | 2 |
Simulators and emulators (for software/system analysis)
Challenge: balancing a simulator's accuracy, time cost, and complexity.
This part focuses on performance modeling for general systems; LLM performance modeling is covered in the LLM Performance Modeling section.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2009 | CACM | Berkeley | Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures | operational intensity; memory bound; compute bound | | | |
| 2014 | IISWC | ETH Zurich | Extending the Roofline Model: Bottleneck Analysis with Microarchitectural Constraints | DAG-based performance model; Tomasulo's greedy algorithm; scheduled-DAG-based bottleneck modeling | 3 | 4 | 3 |
| 2021 | Intelligent Computing | Berkeley | Hierarchical Roofline Performance Analysis for Deep Learning Applications | Nsight Compute based hierarchical roofline model; FP16/FP32 extensions for ERT | | | |
| 2025 | arXiv | Google | Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion | per-resource throughput analysis; fine-grained performance attribution | 3 | 2 | 2 |
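The Roofline model bounds attainable performance by the lower of two ceilings: peak compute throughput, and memory bandwidth multiplied by operational intensity (FLOPs per byte of memory traffic). A minimal sketch; the machine numbers and the kernel's byte count below are hypothetical:

```python
def roofline(peak_gflops, bandwidth_gbs, flops, bytes_moved):
    """Return (operational_intensity, attainable_gflops, bound) under the Roofline model."""
    intensity = flops / bytes_moved            # FLOPs per byte of memory traffic
    memory_ceiling = bandwidth_gbs * intensity  # GB/s * FLOP/byte = GFLOP/s
    attainable = min(peak_gflops, memory_ceiling)
    bound = "memory bound" if memory_ceiling < peak_gflops else "compute bound"
    return intensity, attainable, bound


def ridge_point(peak_gflops, bandwidth_gbs):
    """Operational intensity at which the bandwidth and compute ceilings meet."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    peak, bw = 1000.0, 100.0
    n = 1_000_000
    # y = a*x + y in double precision: 2 FLOPs and 24 bytes
    # (load x, load y, store y) per element.
    print(roofline(peak, bw, flops=2 * n, bytes_moved=24 * n))
    print("ridge point:", ridge_point(peak, bw), "FLOP/byte")
```

Here the kernel's intensity (1/12 FLOP/byte) sits far below the 10 FLOP/byte ridge point, so it is memory bound at about 8.3 GFLOP/s; raising intensity, for example through cache blocking, moves a kernel toward the compute ceiling.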
Solution: LLM inference is expensive, so performance modeling can help choose the best configuration for a given system without actually running the LLM.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2024 | arXiv | KAIST | LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | iteration-level simulation; computation reuse optimization; heterogeneous accelerator mapping | | | |
| 2024 | MLSys | GIT | Vidur: A Large-Scale Simulation Framework For LLM Inference | operation-level simulation; simulator-driven search for the best configuration on a given system | 3 | 3 | 3 |
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2025 | MLSys | Cornell | Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training | trace-driven performance modeling and estimation toolkit; presented as the first system whose performance models accurately capture the execution behavior of LLMs; modifies existing traces to generate new execution graphs | 3 | 4 | 2 |
Benchmarking methodologies and suites
Benchmark
Solution: benchmarks targeted at performance analysis and characterization.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2018 | ICPP | WUSTL | Varbench: an Experimental Framework to Measure and Characterize Performance Variability | spatial/temporal variability; Resource Variability (RV) statistic | | | |
| 2021 | IEEE Access | D-ITET | DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks | NDP-focused workload characterization methodology; memory-bound function identification; locality-based clustering; memory bottleneck classification | | | |