Profilers (sampling, instrumentation)
Detection
Bottleneck Analysis
Challenge: bottleneck analysis must contend with high system complexity, unexpected real-world factors, and resource constraints during detection.
Year
Venue
Authors
Title
Tags
P
E
N
2018
PPoPP
THU
vSensor: Leveraging Fixed-Workload Snippets of Programs for Performance Variance Detection
fixed-workload snippets; dependency propagation algorithm; lightweight on-line analysis algorithm
2020
SC
THU
ScalAna: automating scaling loss detection with graph analysis
program structure graph; program performance graph; backtracking root cause detection algorithm
2022
PPoPP
THU
Vapro: Performance Variance Detection and Diagnosis for Production-Run Parallel Applications
state transition graph; fixed-workload snippet identification clustering algorithm; variance breakdown model; time of factors quantification method
2024
arXiv
UGA
Performance Debugging through Microarchitectural Sensitivity and Causality Analysis
constraints propagation engine for causality analysis; differential analysis engine for sensitivity analysis
2024
SC
BUAA
GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems
asynchronous state transition graph; parameter-based workload estimation method; asynchronous event tracing technology
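vSensor and Vapro rest on the observation that a fixed-workload snippet does the same work on every run, so differences in its duration expose system-side variance rather than workload changes. A minimal sketch of that idea, assuming per-snippet durations have already been collected; the `detect_variance` name, the fastest-run baseline, and the 1.5x threshold are illustrative assumptions, not the papers' exact algorithms:

```python
def detect_variance(samples, threshold=1.5):
    """Flag slow runs of fixed-workload snippets.

    samples: dict mapping a snippet id to a list of (rank, duration) pairs.
    Every run of a snippet does the same work, so its fastest run is used as
    the variance-free baseline (an assumption of this sketch); runs slower
    than threshold x baseline are reported as (snippet, rank, slowdown).
    """
    flagged = []
    for snippet, runs in samples.items():
        baseline = min(duration for _, duration in runs)
        for rank, duration in runs:
            slowdown = duration / baseline
            if slowdown > threshold:
                flagged.append((snippet, rank, round(slowdown, 2)))
    return flagged


# Hypothetical measurements: rank 2 ran loop_42 much slower than the others.
samples = {
    "loop_42": [(0, 1.0), (1, 1.1), (2, 2.4)],
    "halo_exchange": [(0, 0.5), (1, 0.5)],
}
print(detect_variance(samples))  # [('loop_42', 2, 2.4)]
```

Using the fastest run as the baseline keeps the check online and cheap; a production tool would also need to identify which snippets truly have fixed workloads, which is the hard part the papers address.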
Variance Attribution
Year
Venue
Authors
Title
Tags
P
E
N
2014
ISPASS
Intel
A Top-Down Method for Performance Analysis and Counters Architecture
top-down bottleneck analysis method; frontend bound; bad speculation; retiring; backend bound
2016
TPDS
ICT
Understanding Big Data Analytics Workloads on Modern Processors
top-down analysis for big data workloads; pipeline-characteristics-based performance implication analysis; BigDataBench benchmark
2019
SC
NCSU
Pinpointing Performance Inefficiencies via Lightweight Variance Profiling
function-level variance detection; stack-based maintenance of deep call chains; on-the-fly binary analysis technique for calling contexts
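The Top-Down method's level-1 breakdown can be computed from a handful of counters. The sketch below follows the level-1 formulas as given for Intel Core processors with a 4-wide issue pipeline; event names and pipeline width differ across microarchitectures, so treat the counter arguments and the counter values in the example as assumptions about your platform:

```python
def top_down_level1(clk_unhalted, idq_uops_not_delivered, uops_issued,
                    uops_retired_slots, recovery_cycles, width=4):
    """Level-1 Top-Down breakdown as fractions of total issue slots.

    Arguments correspond to CPU_CLK_UNHALTED.THREAD, IDQ_UOPS_NOT_DELIVERED.CORE,
    UOPS_ISSUED.ANY, UOPS_RETIRED.RETIRE_SLOTS and INT_MISC.RECOVERY_CYCLES.
    """
    slots = width * clk_unhalted                      # total issue slots
    frontend = idq_uops_not_delivered / slots         # slots the front end left empty
    bad_spec = (uops_issued - uops_retired_slots
                + width * recovery_cycles) / slots    # squashed uops plus recovery bubbles
    retiring = uops_retired_slots / slots             # slots doing useful work
    backend = 1.0 - (frontend + bad_spec + retiring)  # remainder: back-end stalls
    return {"frontend_bound": frontend, "bad_speculation": bad_spec,
            "retiring": retiring, "backend_bound": backend}


# Hypothetical counter values from one measurement interval.
breakdown = top_down_level1(clk_unhalted=1000, idq_uops_not_delivered=800,
                            uops_issued=1700, uops_retired_slots=1600,
                            recovery_cycles=25)
print({k: round(v, 3) for k, v in breakdown.items()})
# {'frontend_bound': 0.2, 'bad_speculation': 0.05, 'retiring': 0.4, 'backend_bound': 0.35}
```

Backend Bound is derived as the remainder, which is why the four categories always sum to one.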
Root Cause Analysis
Challenge: modeling dependency graphs is difficult, and detection algorithms must scale to large-scale applications.
Year
Venue
Authors
Title
Tags
P
E
N
2003
TISSEC
IBM
Clustering Intrusion Detection Alarms to Support Root Cause Analysis
attribute-oriented induction based clustering algorithm; generalized alarm analysis
2
3
2
2017
arXiv
Intel; CA Technologies
Survey on Models and Techniques for Root-Cause Analysis
deterministic/probabilistic model; RCA learning algorithms; RCA inference algorithms
4
1
1
2021
ASE
eBay
Groot: An Event-graph-based Approach for Root Cause Analysis in Industrial Settings
event-graph-based RCA; service dependency graph; event causality graph; PageRank-based root cause ranking
4
5
2
2021
ASPLOS
Cornell
Sage: Practical & Scalable ML-Driven Performance Debugging in Microservices
RPC latency decomposition model; Markov-based RPC latency propagation; dependency model based on a causal Bayesian network
3
3
2
2023
ASPLOS
Alibaba
Sleuth: A Trace-Based Root Cause Analysis System for Large-Scale Microservices with Graph Neural Networks
HDBSCAN trace clustering algorithm; GNN-based dependency modeling
3
3
2
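Groot ranks candidate root causes with a PageRank-style algorithm over its event causality graph. A minimal sketch of plain PageRank by power iteration, assuming edges point from a symptom to a suspected cause so that score accumulates at likely root causes; Groot's customized variant differs, and the event names are hypothetical:

```python
def pagerank(edges, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank; returns (node, score) pairs, best first.

    edges: dict mapping a node to the list of nodes it points to.
    """
    nodes = set(edges)
    for succs in edges.values():
        nodes.update(succs)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}  # random-jump share
        for v in nodes:
            succs = edges.get(v, [])
            if succs:
                share = damping * rank[v] / len(succs)
                for s in succs:
                    new[s] += share
            else:  # dangling node: spread its score uniformly
                for s in nodes:
                    new[s] += damping * rank[v] / n
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return sorted(rank.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical event graph: edges point from an observed symptom to a suspected cause.
graph = {
    "frontend_latency": ["checkout_errors", "db_slow"],
    "checkout_errors": ["db_slow"],
    "db_slow": [],
}
print([node for node, _ in pagerank(graph)])
# ['db_slow', 'checkout_errors', 'frontend_latency']
```

The event most reachable from the observed symptoms collects the highest score, which is the intuition behind ranking root causes this way.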
Burst Detection
Challenge: maintaining accuracy on high-speed data streams, and trading off memory usage against detection accuracy.
Heavy Hitter Burst
Year
Venue
Authors
Title
Tags
P
E
N
2019
CloudNet
PKU
Dynamic Sketch: Efficient and Adjustable Heavy Hitter Detection for Software Packet Processing
door keeper mechanism for high memory efficiency; bucket sampling for accuracy monitoring
3
3
1
2021
SIGMOD
PKU
BurstSketch: Finding Bursts in Data Streams
running-track-based burst item filtering; snapshotting-based burst item detection
3
3
1
2023
SIGMOD
PKU
Double-Anonymous Sketch: Achieving Top-K-fairness for Finding Global Top-K Frequent Items
double-anonymity technique; randomized admission policy for top-k stage; CMM sketch for count stage
3
4
2
2024
IFIP NPC
PKU
2FA Sketch: Two-Factor Armor Sketch for Accurate and Efficient Heavy Hitter Detection in Data Streams
improved arbitration strategy for in-bucket competition; cross-bucket conflict avoidance hashing scheme
2
3
1
2024
IEEE ICDE
PKU
Scalable Overspeed Item Detection in Streams
bucket-sharing-based basic speedsketch algorithm; global clock for reducing timestamp overhead; counter-flip technique for compression
3
4
2
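For orientation, the classic Count-Min sketch with a fixed reporting threshold is shown below as a generic heavy-hitter detector. It is not the data structure of any paper above; the class name, sizes, threshold, and flow names are illustrative assumptions:

```python
import random


class CountMinHeavyHitters:
    """Count-Min sketch that reports items whose estimated count reaches a threshold."""

    def __init__(self, width=1024, depth=4, threshold=100, seed=0):
        rng = random.Random(seed)
        self.width, self.depth, self.threshold = width, depth, threshold
        # One random salt per row stands in for independent hash functions.
        # Python's str hash is randomized per process, so bucket positions
        # can differ between runs (but are stable within one run).
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.rows = [[0] * width for _ in range(depth)]
        self.heavy = set()

    def _index(self, row, item):
        return hash((self.salts[row], item)) % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, item)] += count
        if self.estimate(item) >= self.threshold:
            self.heavy.add(item)

    def estimate(self, item):
        # Collisions only add to counters, so every row overestimates;
        # the minimum across rows is the tightest available estimate.
        return min(self.rows[r][self._index(r, item)] for r in range(self.depth))


sketch = CountMinHeavyHitters(threshold=100)
for _ in range(150):
    sketch.add("10.0.0.1")          # hypothetical heavy flow
for i in range(50):
    for _ in range(5):
        sketch.add(f"host-{i}")     # hypothetical light flows
print(sorted(sketch.heavy), sketch.estimate("10.0.0.1"))
```

Because Count-Min only overestimates, a true heavy hitter is never missed; false positives arise only when a light item collides with heavy traffic in every row, which is the memory/accuracy tradeoff the listed sketches target.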
Straggler Analysis
Challenge: stragglers can arise from many complex factors, which makes identifying their root causes and quantifying their performance impact difficult.
Year
Venue
Authors
Title
Tags
P
E
N
2019
TSC
BUAA
Straggler Root-Cause and Impact Analysis for Massive-scale Virtualized Cloud Datacenters
detailed straggler-filtration-based root cause analysis; DoS-index for straggler detection
3
3
1
2020
TJSC
QMUL&NUDT
Tails in the cloud: a survey and taxonomy of straggler management within large-scale cloud data centres
taxonomy of straggler causes; straggler management techniques
3
1
1
2024
arXiv
HKUST&Alibaba
FALCON: Pinpointing and Mitigating Stragglers for Large-Scale Hybrid-Parallel Training
Bayesian online change-point detection algorithm; adaptive multi-level mitigation mechanism
4
4
2
2025
arXiv
NYU&ByteDance
Understanding Stragglers in Large Model Training Using What-if Analysis
what-if analysis; dependency model based simulation; SMon monitoring system
3
4
2
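FALCON applies Bayesian online change-point detection (BOCPD) to spot when iteration times shift. The sketch below is the textbook Adams and MacKay formulation with a Gaussian likelihood of known noise variance and a conjugate Gaussian prior on the mean; FALCON's actual model and parameters may differ, and all values in the example are hypothetical:

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def bocpd(xs, hazard=0.01, mu0=0.0, var0=1.0, noise_var=1.0):
    """Bayesian online change-point detection; returns the MAP run length per step.

    Model: x ~ N(mu, noise_var) within a segment, prior mu ~ N(mu0, var0),
    and a constant hazard (probability that a new segment starts at each step).
    """
    probs = [1.0]                      # P(run length = r | data so far)
    means, variances = [mu0], [var0]   # posterior over mu for each run length
    map_run_lengths = []
    for x in xs:
        pred = [gaussian_pdf(x, m, v + noise_var) for m, v in zip(means, variances)]
        growth = [p * q * (1 - hazard) for p, q in zip(probs, pred)]
        change = sum(p * q * hazard for p, q in zip(probs, pred))
        probs = [change] + growth
        total = sum(probs)
        probs = [p / total for p in probs]
        # Conjugate Gaussian update of each run's posterior with the new sample;
        # run length 0 restarts from the prior.
        new_vars = [1 / (1 / v + 1 / noise_var) for v in variances]
        new_means = [nv * (m / v + x / noise_var)
                     for m, v, nv in zip(means, variances, new_vars)]
        means, variances = [mu0] + new_means, [var0] + new_vars
        map_run_lengths.append(max(range(len(probs)), key=probs.__getitem__))
    return map_run_lengths


def change_points(map_run_lengths):
    """Indices where the most likely run length drops, i.e. a new regime begins."""
    return [i for i in range(1, len(map_run_lengths))
            if map_run_lengths[i] < map_run_lengths[i - 1]]


# Hypothetical per-iteration training times (s): a slowdown starts at iteration 30.
times = [1.0] * 30 + [1.5] * 30
run_lengths = bocpd(times, hazard=1 / 50, mu0=1.0, var0=1.0, noise_var=0.01)
print(change_points(run_lengths))  # [30]
```

Run lengths are not pruned here, so cost grows with the length of the series; practical implementations usually truncate low-probability run lengths.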
Other Bursts
Year
Venue
Authors
Title
Tags
P
E
N
2023
CIKM
Edinburgh
Tight-Sketch: A High-Performance Sketch for Heavy Item-Oriented Data Stream Mining with Limited Memory Size
probabilistic decay strategy; differentiated eviction for cold and hot items
4
4
2
2024
INFOCOM
SCU
BurstDetector: Real-Time and Accurate Across-Period Burst Detection in High-Speed Networks
two-stage across-period burst detection; hierarchical cell for memory optimization
3
4
1
Network Tomography
Survey
Year
Venue
Authors
Title
Tags
P
E
N
2004
STAT SCI
Berkeley
Network Tomography: Recent Developments
tomography linear model; multicast delay distribution inference; origin-destination traffic matrix inference
3
1
1
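The tomography linear model relates measured path quantities y to unknown link quantities x through a routing matrix A, y = Ax. A minimal least-squares sketch for link delays, assuming A has full column rank (pure Python, no external libraries; the topology and delays are hypothetical):

```python
def solve_link_delays(routing, path_delays):
    """Least-squares estimate of link delays x from path delays y = A x.

    routing: list of rows, routing[p][l] = 1 if path p traverses link l.
    Solves the normal equations (A^T A) x = A^T y by Gaussian elimination,
    which requires A to have full column rank (every link identifiable).
    """
    n = len(routing[0])
    ata = [[sum(row[i] * row[j] for row in routing) for j in range(n)] for i in range(n)]
    aty = [sum(row[i] * y for row, y in zip(routing, path_delays)) for i in range(n)]
    # Forward elimination with partial pivoting on the augmented matrix.
    m = [ata[i] + [aty[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("routing matrix is rank deficient; links not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


# Three links, four measured paths; true link delays are 2, 3 and 5 ms.
A = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 0, 0]]
y = [5, 7, 8, 2]
print([round(v, 6) for v in solve_link_delays(A, y)])  # [2.0, 3.0, 5.0]
```

In real networks A is often rank deficient, which is why much of the literature turns to statistical models and additional constraints rather than plain least squares.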
Passive Inference
Year
Venue
Authors
Title
Tags
P
E
N
2003
IMC
AT&T Laboratories
Simple Network Performance Tomography
smallest consistent failure set algorithm; separable performance; false positive/coverage probability estimation of bad links
3
3
3
2014
ICDCS
ZJU
Domo: Passive Per-Packet Delay Tomography in Wireless Ad-hoc Networks
FIFO/order/sum-of-delays constraints for delay reconstruction; semi-definite relaxation based optimization
4
3
2
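Under separable performance, a link lying on any good path is assumed good. On a tree, the smallest consistent failure set then blames, for each bad receiver, the highest link whose entire subtree of receivers is bad. A minimal sketch of that rule, assuming a rooted tree with the probing source at the root and hypothetical node names:

```python
def scfs(children, root, bad_leaves):
    """Smallest Consistent Failure Set inference on a tree topology.

    children: dict node -> list of child nodes (leaves are receivers).
    bad_leaves: receivers whose end-to-end path from the root was measured bad.
    Returns the blamed links as (parent, child) pairs.
    """
    all_bad = {}

    def subtree_all_bad(node):
        kids = children.get(node, [])
        if not kids:
            all_bad[node] = node in bad_leaves
        else:
            # Build the full list (no short-circuit) so every child is visited.
            all_bad[node] = all([subtree_all_bad(k) for k in kids])
        return all_bad[node]

    subtree_all_bad(root)
    failures = []

    def blame(node, covered):
        # covered: a link above this node is already blamed.
        for k in children.get(node, []):
            hit = all_bad[k] and not covered
            if hit:
                failures.append((node, k))
            blame(k, covered or hit)

    blame(root, False)
    return failures


tree = {"s": ["a", "b"], "a": ["r1", "r2"], "b": ["r3", "r4"]}
print(scfs(tree, "s", {"r1", "r2", "r3"}))  # [('s', 'a'), ('b', 'r3')]
```

Both receivers under `a` are bad, so the single link s to a explains them; r4 is good, so link s to b is exonerated and only b to r3 is blamed.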
Active Inference
Year
Venue
Authors
Title
Tags
P
E
N
2022
ICASSP
UMich
Unicast-based inference of network link delay distributions using mixed finite mixture models
Dirac-delta-based mixed finite mixture model; EM algorithm for parameter estimation
3
2
2
2003
IEEE TSP
Rice University
Network Delay Tomography
end-to-end packet-pair link delay distribution estimation; FFT-based expectation-maximization acceleration algorithm
3
3
2
2021
IEEE TNSM
QMUL
Optimal Estimation of Link Delays Based on End-to-End Active Measurements
active network monitoring framework; ILP/heuristic/meta-heuristic algorithm for monitoring flows selection
3
3
2
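Several of these methods estimate delay distributions with expectation-maximization (EM). As a generic illustration, EM for a one-dimensional Gaussian mixture is sketched below; it is not the mixed finite mixture (with Dirac delta components) of the ICASSP paper, and the component count and sample data are assumptions:

```python
import math


def em_gmm_1d(xs, k=2, iters=500, tol=1e-9):
    """Fit a k-component 1-D Gaussian mixture to samples with EM."""
    xs = sorted(xs)
    n = len(xs)
    # Initialise means at evenly spaced order statistics, a shared variance,
    # and equal mixing weights.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [sum((x - mu) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp, ll = [], 0.0
        for x in xs:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances (with a small floor).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2
                                   for r, x in zip(resp, xs)) / nj, 1e-9)
        if ll - prev_ll < tol:  # log-likelihood has stopped improving
            break
        prev_ll = ll
    return weights, means, variances


# Hypothetical link delay samples (ms): an uncongested mode near 1 ms and a queueing mode near 10 ms.
delays = [0.9, 1.0, 1.1, 1.0] * 10 + [9.8, 10.0, 10.2, 10.0] * 10
weights, means, variances = em_gmm_1d(delays)
print([round(w, 3) for w in weights], [round(m, 3) for m in means])
```

Each iteration never decreases the log-likelihood, so stopping when the improvement falls below `tol` is a standard convergence test.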
Profiling Techniques
Extended Berkeley Packet Filter
Solution: a technique for dynamically programming the kernel for efficient networking, observability, tracing, and security.
eBPF Component Analysis
Year
Venue
Authors
Title
Tags
P
E
N
2024
eBPF
THU
Understanding Performance of eBPF Maps
eBPF map benchmark; impact of cache hotness on eBPF maps; volume discount feature of eBPF programs
4
4
2
2024
OSDI
ETH Zurich
Validating the eBPF Verifier via State Embedding
state embedding mechanism for eBPF verifier bug detection; practical realization of SEV
4
4
3
2025
EuroSys
UW-Madison
Revealing the Unstable Foundations of eBPF-Based Kernel Extensions
potential mismatches dataset; dependency surface/set analysis
4
4
2
eBPF-Like Applications
Year
Venue
Authors
Title
Tags
P
E
N
2025
HCDS
UCSC
eGPU: Extending eBPF Programmability and Observability to GPUs
dynamic PTX injection; real-time synchronization to avoid race conditions
3
2
2
Simulators and emulators (for software/system analysis)
Challenge: balancing a simulator's accuracy, time cost, and complexity.
This part focuses on performance modeling for general systems; LLM performance modeling is covered in the LLM Performance Modeling section.
Year
Venue
Authors
Title
Tags
P
E
N
2009
CACM
Berkeley
Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures
operational intensity; memory bound; compute bound
2014
IISWC
ETH Zurich
Extending the Roofline Model: Bottleneck Analysis with Microarchitectural Constraints
DAG-based performance model; Tomasulo's greedy algorithm; scheduled-DAG-based bottleneck modeling
3
4
3
2021
Intelligent Computing
Berkeley
Hierarchical Roofline Performance Analysis for Deep Learning Applications
Nsight Compute-based hierarchical roofline model; FP16/FP32 extension for ERT
2025
arXiv
Google
Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion
per-resource throughput analysis; fine-grained performance attribution
3
2
2
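The Roofline bound reduces to one formula: attainable performance is the minimum of peak compute and operational intensity times peak memory bandwidth. A sketch with a hypothetical machine and kernel:

```python
def roofline(peak_gflops, peak_bw_gbs, flops, bytes_moved):
    """Attainable performance under the Roofline model.

    Operational intensity (OI) = FLOPs per byte of DRAM traffic; attainable
    GFLOP/s = min(peak compute, OI x peak memory bandwidth). Kernels left of
    the ridge point (peak_gflops / peak_bw_gbs) are memory bound.
    """
    oi = flops / bytes_moved
    attainable = min(peak_gflops, oi * peak_bw_gbs)
    ridge = peak_gflops / peak_bw_gbs
    bound = "memory" if oi < ridge else "compute"
    return oi, attainable, bound


# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s DRAM bandwidth (ridge at 10 FLOP/B).
# DAXPY (y = a*x + y, FP64): 2 FLOPs and 24 bytes (load x, load y, store y) per element.
oi, perf, bound = roofline(1000, 100, flops=2, bytes_moved=24)
print(f"OI={oi:.3f} FLOP/B, {perf:.2f} GFLOP/s, {bound} bound")
# OI=0.083 FLOP/B, 8.33 GFLOP/s, memory bound
```

A kernel's position relative to the ridge point tells whether optimizing data movement or arithmetic is the better first step.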
Solution: LLM inference is expensive, so performance modeling can help choose the best configuration for a given system without actually running the LLM.
Year
Venue
Authors
Title
Tags
P
E
N
2024
arXiv
KAIST
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
iteration-level simulation; computation reuse optimization; heterogeneous accelerator mapping
2024
MLSys
GIT
Vidur: A Large-Scale Simulation Framework For LLM Inference
operation-level simulation; simulator-driven search for the best configuration on a given system
3
3
3
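Simulators such as Vidur and LLMServingSim replace real runs with a performance model and then search configurations against it. The toy sketch below shows only that search loop, using a hypothetical roofline-style decode-step model (FP16 weights, about 2 FLOPs per parameter per token, a fixed tensor-parallel communication cost, KV cache ignored); the real tools rely on profiled operator-level or iteration-level models:

```python
def decode_step_ms(params_b, batch, tp, peak_tflops, mem_bw_tbs, allreduce_ms):
    """Toy estimate of one decode step (one new token per sequence)."""
    weight_bytes = 2 * params_b * 1e9 / tp      # FP16 weights read once per step
    flops = 2 * params_b * 1e9 * batch / tp     # ~2 FLOPs per parameter per token
    compute_ms = flops / (peak_tflops * 1e12) * 1e3
    memory_ms = weight_bytes / (mem_bw_tbs * 1e12) * 1e3
    comm_ms = allreduce_ms if tp > 1 else 0.0   # hypothetical fixed TP cost
    return max(compute_ms, memory_ms) + comm_ms


def best_config(params_b, slo_ms, gpu_mem_gb, peak_tflops, mem_bw_tbs,
                allreduce_ms=0.5, tps=(1, 2, 4, 8), batches=(1, 8, 32, 128)):
    """Pick (tp, batch, step_ms, tokens/s per GPU) maximizing per-GPU throughput under the SLO."""
    best = None
    for tp in tps:
        if 2 * params_b / tp > gpu_mem_gb:      # FP16 weights (GB) must fit per GPU
            continue
        for b in batches:
            step = decode_step_ms(params_b, b, tp, peak_tflops, mem_bw_tbs, allreduce_ms)
            if step > slo_ms:
                continue
            per_gpu = b / (step / 1e3) / tp     # tokens/s per GPU
            if best is None or per_gpu > best[3]:
                best = (tp, b, step, per_gpu)
    return best


# Hypothetical 7B-parameter model on a GPU with 300 TFLOP/s, 2 TB/s and 80 GB.
print(best_config(7, slo_ms=5, gpu_mem_gb=80, peak_tflops=300, mem_bw_tbs=2)[:2])  # (2, 128)
```

With a 5 ms per-step SLO, a single GPU is too slow (its memory-bound step takes 7 ms), so the search settles on the smallest tensor-parallel degree that meets the SLO at the largest batch.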
Year
Venue
Authors
Title
Tags
P
E
N
2025
MLSys
Cornell
Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training
trace-driven performance modeling and estimation toolkit; presented as the first system to provide accurate performance models that capture the execution behaviors of LLMs; generation of new execution graphs by modifying existing traces
3
4
2
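Trace-driven modeling (Lumos) and what-if analysis both come down to replaying an execution dependency graph with modified op durations. A minimal sketch that computes iteration time as the critical path of a DAG and rescales selected ops; op names and durations are hypothetical, and real tools model streams, overlap, and communication in far more detail:

```python
from graphlib import TopologicalSorter


def iteration_time(durations, deps, scale=None):
    """Critical-path length of an execution DAG.

    durations: op -> duration (ms); deps: op -> set of ops it waits for.
    scale: optional op -> factor, a what-if knob that rescales chosen ops.
    """
    scale = scale or {}
    graph = {op: deps.get(op, set()) for op in durations}
    finish = {}
    for op in TopologicalSorter(graph).static_order():
        start = max((finish[d] for d in graph[op]), default=0.0)
        finish[op] = start + durations[op] * scale.get(op, 1.0)
    return max(finish.values())


durations = {"data_load": 5, "fwd": 10, "bwd": 20, "allreduce": 8,
             "optimizer": 2, "log_metrics": 25}
deps = {"fwd": {"data_load"}, "bwd": {"fwd"}, "allreduce": {"bwd"},
        "optimizer": {"allreduce"}, "log_metrics": {"fwd"}}
print(iteration_time(durations, deps))                      # 45.0
print(iteration_time(durations, deps, {"allreduce": 2.0}))  # 53.0 (straggling collective)
print(iteration_time(durations, deps, {"bwd": 0.5}))        # 40.0 (critical path shifts)
```

The last case shows why what-if analysis needs the whole graph: halving the backward pass saves only 5 ms, because `log_metrics` becomes the new critical path.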
Benchmarking methodologies and suites
Benchmark
Solution: benchmarks targeted at performance analysis and characterization.
Year
Venue
Authors
Title
Tags
P
E
N
2018
ICPP
WUSTL
Varbench: an Experimental Framework to Measure and Characterize Performance Variability
spatial/temporal variability; Resource Variability (RV) statistic
2021
IEEE Access
D-ITET
DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks
NDP focused workload characterization methodology; memory-bound function identification; locality-based clustering; memory bottlenecks classification
LLM Serving Benchmarks
Challenge: different LLM serving systems optimize for different targets, so developing a fair benchmark is crucial.
Year
Venue
Authors
Title
Tags
P
E
N
2025
arXiv
Intel
On Evaluating Performance of LLM Inference Serving Systems
introduces a practical checklist to avoid misleading benchmarks
3
3
2
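Time to first token (TTFT) and time per output token (TPOT) are common serving metrics, and reporting percentiles together with SLO attainment avoids hiding tail behavior behind averages. A minimal sketch; the request field names, the nearest-rank percentile definition, and the SLO values are assumptions, not the checklist from the paper above:

```python
import math


def percentile(sorted_vals, p):
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def request_latencies(r):
    """Return (TTFT, TPOT) in seconds for one request record."""
    ttft = r["first_token"] - r["arrival"]
    n = r["output_tokens"]
    # Single-token requests have no inter-token gap; count their TPOT as 0.
    tpot = (r["finish"] - r["first_token"]) / (n - 1) if n > 1 else 0.0
    return ttft, tpot


def serving_metrics(requests, ttft_slo, tpot_slo):
    lat = [request_latencies(r) for r in requests]
    ttft = sorted(t for t, _ in lat)
    tpot = sorted(p for _, p in lat)
    met = sum(1 for t, p in lat if t <= ttft_slo and p <= tpot_slo)
    return {
        "ttft_p50": percentile(ttft, 50), "ttft_p99": percentile(ttft, 99),
        "tpot_p50": percentile(tpot, 50), "tpot_p99": percentile(tpot, 99),
        "slo_attainment": met / len(requests),  # fraction meeting both SLOs
    }


# Hypothetical request log (timestamps in seconds).
reqs = [
    {"arrival": 0.0, "first_token": 0.1, "finish": 1.1, "output_tokens": 11},
    {"arrival": 0.0, "first_token": 0.2, "finish": 2.2, "output_tokens": 11},
    {"arrival": 0.0, "first_token": 0.3, "finish": 0.8, "output_tokens": 11},
    {"arrival": 0.0, "first_token": 0.4, "finish": 3.4, "output_tokens": 11},
]
print(serving_metrics(reqs, ttft_slo=0.3, tpot_slo=0.15))
```

Reporting SLO attainment alongside percentiles makes it harder for one system to look better simply because it trades TTFT against TPOT.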