Profilers (sampling, instrumentation)
Detection
Bottleneck Analysis
Challenge: bottleneck analysis must cope with high system complexity, unexpected real-world factors, and resource constraints during detection.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2018 | PPoPP | THU | vSensor: Leveraging Fixed-Workload Snippets of Programs for Performance Variance Detection | fixed-workload snippets; dependency propagation algorithm; lightweight on-line analysis algorithm | | | |
| 2020 | SC | THU | ScalAna: automating scaling loss detection with graph analysis | program structure graph; program performance graph; backtracking root cause detection algorithm | | | |
| 2022 | PPoPP | THU | Vapro: Performance Variance Detection and Diagnosis for Production-Run Parallel Applications | state transition graph; fixed workload snippets identification clustering algorithm; variance breakdown model; time of factors quantification method | | | |
| 2024 | arXiv | UGA | Performance Debugging through Microarchitectural Sensitivity and Causality Analysis | constraints propagation engine for causality analysis; differential analysis engine for sensitivity analysis | | | |
| 2024 | SC | BUAA | GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems | asynchronous state transition graph; parameter-based workload estimation method; asynchronous event tracing technology | | | |
Variance Attribution
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2014 | ISPASS | Intel | A Top-Down Method for Performance Analysis and Counters Architecture | top-down bottleneck analysis method; frontend bound; bad speculation; retiring; backend bound | | | |
| 2016 | TPDS | ICT | Understanding Big Data Analytics Workloads on Modern Processors | top-down analysis for big data workload; pipeline-characteristics-based performance implication analysis; BigDataBench benchmark | | | |
| 2019 | SC | NCSU | Pinpointing Performance Inefficiencies via Lightweight Variance Profiling | function-level variance detection; stack-based maintenance of deep call chains; on-the-fly binary analysis technique for calling context | | | |
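The four top-level categories named in the top-down tags above (frontend bound, bad speculation, retiring, backend bound) partition the pipeline's issue slots. The following is a minimal sketch of that level-1 breakdown, assuming a 4-wide issue pipeline. The `Counters` fields are hypothetical names chosen for illustration, not real PMU event names, and the formulas reflect a common reading of the method rather than a verified transcription of the paper.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical raw counter values; field names are illustrative only."""
    cycles: int               # unhalted core clock cycles
    slots_not_delivered: int  # issue slots the frontend left empty while the backend could accept uops
    uops_issued: int          # uops issued to the backend
    uops_retired_slots: int   # issue slots whose uops eventually retired
    recovery_cycles: int      # cycles spent recovering from mis-speculation


def top_down_level1(c: Counters, width: int = 4) -> dict:
    """Split all issue slots into the four level-1 top-down categories."""
    slots = width * c.cycles
    frontend = c.slots_not_delivered / slots
    bad_spec = (c.uops_issued - c.uops_retired_slots
                + width * c.recovery_cycles) / slots
    retiring = c.uops_retired_slots / slots
    # Whatever is left over is attributed to backend stalls.
    backend = 1.0 - (frontend + bad_spec + retiring)
    return {
        "frontend_bound": frontend,
        "bad_speculation": bad_spec,
        "retiring": retiring,
        "backend_bound": backend,
    }


if __name__ == "__main__":
    sample = Counters(cycles=1000, slots_not_delivered=800,
                      uops_issued=2200, uops_retired_slots=2000,
                      recovery_cycles=50)
    for name, share in top_down_level1(sample).items():
        print(f"{name}: {share:.2f}")
```

Because backend bound is computed as the remainder, the four fractions always sum to one, which is what makes the breakdown usable as a drill-down starting point.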
Root Cause Analysis
Challenge: modeling dependency graphs is difficult, and detection algorithms must scale to large applications.
Heuristic Approaches
Solution: trace system failures by relying on dependency graphs, expert rules, or statistical correlations.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2003 | TISSEC | IBM | Clustering Intrusion Detection Alarms to Support Root Cause Analysis | attribute-oriented induction based clustering algorithm; generalized alarm analysis | 2 | 3 | 2 |
| 2017 | arXiv | Intel; CA Technologies | Survey on Models and Techniques for Root-Cause Analysis | deterministic/probabilistic model; RCA learning algorithms; RCA inference algorithms | 4 | 1 | 1 |
| 2021 | ASE | eBay | Groot: An Event-graph-based Approach for Root Cause Analysis in Industrial Settings | event-graph based RCA; service dependency graph; event causality graph; pagerank based root cause ranking | 4 | 5 | 2 |
Machine Learning
Solution: employ machine learning models, like graph neural networks, to automatically learn complex causal patterns.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2021 | ASPLOS | Cornell | Sage: Practical & Scalable ML-Driven Performance Debugging in Microservices | RPC latency decomposition model; Markov based RPC latency propagation; causal bayesian network based dependency model | 3 | 3 | 2 |
| 2023 | ASPLOS | Alibaba | Sleuth: A Trace-Based Root Cause Analysis System for Large-Scale Microservices with Graph Neural Networks | HDBSCAN trace clustering algorithm; GNN based dependency modeling | 3 | 3 | 2 |
| 2023 | SIGKDD | UCF | Interdependent Causal Networks for Root Cause Localization | GNN based topological causal discovery; extreme value theory based individual causal discovery; causal integration | 4 | 3 | 2 |
| 2023 | SIGKDD | UCF | Incremental Causal Graph Learning for Online Root Cause Analysis | trigger point detection; incremental disentangled causal graph learning; random walk with restart based root cause localization | 3 | 3 | 2 |
Burst Detection
Challenge: maintaining accuracy on high-speed data streams while trading off memory usage against detection accuracy.
Heavy Hitter Burst
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2019 | CloudNet | PKU | Dynamic Sketch: Efficient and Adjustable Heavy Hitter Detection for Software Packet Processing | door keeper mechanism for high memory efficiency; bucket sampling for accuracy monitoring | 3 | 3 | 1 |
| 2021 | SIGMOD | PKU | BurstSketch: Finding Bursts in Data Streams | running track based burst item filtering; snapshotting based burst item detection | 3 | 3 | 1 |
| 2023 | SIGMOD | PKU | Double-Anonymous Sketch: Achieving Top-K-fairness for Finding Global Top-K Frequent Items | double-anonymity technique; randomized admission policy for top-k stage; CMM sketch for count stage | 3 | 4 | 2 |
| 2024 | IFIP NPC | PKU | 2FA Sketch: Two-Factor Armor Sketch for Accurate and Efficient Heavy Hitter Detection in Data Streams | improved arbitration strategy for in-bucket competition; cross-bucket conflict avoidance hashing scheme | 2 | 3 | 1 |
| 2024 | IEEE ICDE | PKU | Scalable Overspeed Item Detection in Streams | bucket sharing based basic speedsketch algorithm; global-clock for reducing timestamp overhead; counter-flip technique for compression | 3 | 4 | 2 |
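The memory-versus-accuracy tradeoff that the heavy hitter papers above address can be illustrated with a classic Count-Min sketch. This is a minimal baseline sketch, not the algorithm of any paper listed. The class name, the default `width` and `depth`, and the use of salted BLAKE2b as the hash family are assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Fixed-memory frequency estimator; estimates never undercount."""

    def __init__(self, width: int, depth: int):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One independent hash per row, derived by salting BLAKE2b with the row id.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8,
                                 salt=row.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.rows[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.rows[row][self._index(item, row)]
                   for row in range(self.depth))


def heavy_hitters(stream, threshold: int, width: int = 256, depth: int = 4) -> dict:
    """Return items whose estimated count reached `threshold` during the stream."""
    sketch = CountMinSketch(width, depth)
    candidates = set()
    for item in stream:
        sketch.add(item)
        if sketch.estimate(item) >= threshold:
            candidates.add(item)
    return {item: sketch.estimate(item) for item in candidates}


if __name__ == "__main__":
    stream = ["flow-a"] * 50 + [f"flow-{i}" for i in range(200)] + ["flow-b"] * 30
    print(heavy_hitters(stream, threshold=25))
```

Shrinking `width` saves memory but raises the collision rate, so more light items are overestimated and may be reported as heavy. True heavy hitters are never missed, because a Count-Min estimate is always at least the true count.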
Straggler Analysis
Challenge: stragglers arise from many complex factors, so identifying their root causes and quantifying their impact on performance is difficult.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2019 | TSC | BUAA | Straggler Root-Cause and Impact Analysis for Massive-scale Virtualized Cloud Datacenters | detailing straggler filtration based root cause analysis; DoS-index for straggler detection | 3 | 3 | 1 |
| 2020 | TJSC | QMUL&NUDT | Tails in the cloud: a survey and taxonomy of straggler management within large-scale cloud data centres | taxonomy of straggler causes; straggler management technique | 3 | 1 | 1 |
| 2024 | arXiv | HKUST&Alibaba | FALCON: Pinpointing and Mitigating Stragglers for Large-Scale Hybrid-Parallel Training | Bayesian online change-point detection algorithm; adaptive multi-level mitigation mechanism | 4 | 4 | 2 |
| 2025 | arXiv | NYU&ByteDance | Understanding Stragglers in Large Model Training Using What-if Analysis | what-if analysis; dependency model based simulation; SMon monitoring system | 3 | 4 | 2 |
Other Bursts
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2023 | CIKM | Edinburgh | Tight-Sketch: A High-Performance Sketch for Heavy Item-Oriented Data Stream Mining with Limited Memory Size | probabilistic decay strategy; differentiated eviction for cold and hot items | 4 | 4 | 2 |
| 2024 | INFOCOM | SCU | BurstDetector: Real-Time and Accurate Across-Period Burst Detection in High-Speed Networks | two-stage across-period burst detection; hierarchical cell for memory optimization | 3 | 4 | 1 |
Network Tomography
Survey
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2004 | STAT SCI | Berkeley | Network Tomography: Recent Developments | tomography linear model; multicast delay distribution inference; origin-destination traffic matrix inference | 3 | 1 | 1 |
Passive Inference
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2003 | IMC | AT&T Laboratories | Simple Network Performance Tomography | smallest consistent failure set algorithm; separable performance; false positive/coverage probability estimation of bad links | 3 | 3 | 3 |
| 2014 | ICDCS | ZJU | Domo: Passive Per-Packet Delay Tomography in Wireless Ad-hoc Networks | FIFO/order/sum-of-delays constraints for delay reconstruction; semi-definite relaxation based optimization | 4 | 3 | 2 |
Active Inference
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2022 | ICASSP | UMich | Unicast-based inference of network link delay distributions using mixed finite mixture models | dirac delta based mixed finite mixture model; EM algorithm for parameter evaluation | 3 | 2 | 2 |
| 2003 | IEEE TSP | Rice University | Network Delay Tomography | end-to-end packet pair link delay distribution estimation; FFT based expectation-maximization acceleration algorithm | 3 | 3 | 2 |
| 2021 | IEEE TNSM | QMUL | Optimal Estimation of Link Delays Based on End-to-End Active Measurements | active network monitoring framework; ILP/heuristic/meta-heuristic algorithm for monitoring flows selection | 3 | 3 | 2 |
Profiling Techniques
Extended Berkeley Packet Filter
Solution: a technique for dynamically programming the kernel for efficient networking, observability, tracing, and security.
eBPF Component Analysis
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2024 | eBPF | THU | Understanding Performance of eBPF Maps | eBPF map benchmark; impact of cache hotness on eBPF map; volume discount feature of eBPF program | 4 | 4 | 2 |
| 2024 | OSDI | ETH Zurich | Validating the eBPF Verifier via State Embedding | state embedding mechanism for eBPF verifier bug detection; SEV practical realization | 4 | 4 | 3 |
| 2025 | EuroSys | UW-Madison | Revealing the Unstable Foundations of eBPF-Based Kernel Extensions | potential mismatches dataset; dependency surface/set analysis | 4 | 4 | 2 |
| 2025 | OSDI | UCSD | KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads | compiler-centric profiling infrastructure; multi-level IR instrumentation; region-based timing tool; trace replay for overhead correction | 4 | 4 | 3 |
eBPF-like Applications
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2025 | HCDS | UCSC | eGPU: Extending eBPF Programmability and Observability to GPUs | dynamic PTX injection; real-time synchronization to avoid race conditions | 3 | 2 | 2 |
Distributed Tracing
Solution: a technique for monitoring and diagnosing errors in microservice systems by recording full request paths.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2015 | SOSP | Brown | Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems | happened-before join for arbitrary event correlation; dynamic instrumentation; metadata propagation technique (baggage) | 4 | 3 | 2 |
| 2017 | SOSP | Facebook&Brown | Canopy: An End-to-End Performance Tracing And Analysis System | tracing decouple for separate modeling and analyzing; trace feature extraction pipeline | 4 | 4 | 2 |
Tracing Optimization
Challenge: Tradeoff between tracing storage overhead and the effectiveness of preserved data.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2015 | SOSP | Brown | Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems | happened-before join for arbitrary event correlation; dynamic instrumentation; metadata propagation technique (baggage) | 4 | 3 | 2 |
| 2021 | ICWS | SYSU | Sieve: Attention-based Sampling of End-to-End Trace Data in Distributed Microservice Systems | path vector encoding; attention score based biased sampler | 3 | 3 | 2 |
| 2023 | NSDI | Emory&Princeton | The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems | retroactive trace sampling; trace coherence mechanism (breadcrumb); lateral tracing across requests | 4 | 4 | 2 |
| 2025 | ASPLOS | SYSU&Alibaba | Mint: Cost-Efficient Tracing with All Requests Collection via Commonality and Variability Analysis | commonality and variability based tracing strategy; pattern extraction at the span level and trace level | 4 | 4 | 2 |
Diagnosis and Analysis
Challenge: how to use trace data to accurately locate failures or bottlenecks, especially in large-scale systems.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2017 | SOSP | Facebook&Brown | Canopy: An End-to-End Performance Tracing And Analysis System | tracing decouple for separate modeling and analyzing; trace feature extraction pipeline | 4 | 4 | 2 |
| 2023 | NSDI | BUPT&ByteDance | Hostping: Diagnosing Intra-host Network Bottlenecks in RDMA Servers | loopback tests between RNICs and endpoints; bus utilization monitoring; binary network tomography inspired path analysis | 3 | 4 | 2 |
Profiling with LLM
Solution: use LLMs to generate profiling results.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2025 | SOSP | UChicago | METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation | LLM-based query profiler; rule-based configuration pruning; resource-aware joint scheduling; per-query configuration adaptation | 3 | 4 | 2 |
Simulators and emulators (for software/system analysis)
Challenge: how to balance the accuracy, time cost and complexity of a simulator.
This section focuses on performance modeling for general systems; LLM performance modeling is covered in the LLM Performance Modeling section.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2009 | CACM | Berkeley | Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures | operational intensity; memory bound; compute bound | | | |
| 2014 | IISWC | ETH Zurich | Extending the Roofline Model: Bottleneck Analysis with Microarchitectural Constraints | dag-based performance model; Tomasulo's greedy algorithm; scheduled dag based bottleneck modeling | 3 | 4 | 3 |
| 2021 | Intelligent Computing | Berkeley | Hierarchical Roofline Performance Analysis for Deep Learning Applications | Nsight Compute based hierarchical roofline model; FP16/FP32 extension for ERT | | | |
| 2025 | arXiv | Google | Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion | per-resource throughput analysis; fine-grained performance attribution | 3 | 2 | 2 |
| 2025 | ASPLOS | Georgia Tech | Forecasting GPU Performance for Deep Learning Training and Inference | tile-level kernel decomposition; fundamental performance laws bounded prediction; ML based utilization prediction | 3 | 4 | 2 |
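The Roofline tags above (operational intensity, memory bound, compute bound) reduce to one bound: attainable performance is the smaller of peak compute throughput and peak memory bandwidth times operational intensity. The following is a minimal sketch of that bound. The function names and the machine numbers in the usage example are hypothetical and chosen only for illustration.

```python
def attainable_gflops(peak_gflops: float, peak_bandwidth_gbs: float,
                      operational_intensity: float) -> float:
    """Roofline bound: min(compute roof, bandwidth roof at this intensity).

    operational_intensity is in FLOPs per byte of memory traffic.
    """
    return min(peak_gflops, peak_bandwidth_gbs * operational_intensity)


def ridge_point(peak_gflops: float, peak_bandwidth_gbs: float) -> float:
    """Operational intensity at which the two roofs meet."""
    return peak_gflops / peak_bandwidth_gbs


def classify(peak_gflops: float, peak_bandwidth_gbs: float,
             operational_intensity: float) -> str:
    """Label a kernel by which roof limits it."""
    if operational_intensity < ridge_point(peak_gflops, peak_bandwidth_gbs):
        return "memory bound"
    return "compute bound"


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak compute, 100 GB/s peak bandwidth.
    for oi in (2.0, 20.0):
        print(oi, attainable_gflops(1000.0, 100.0, oi), classify(1000.0, 100.0, oi))
```

A kernel to the left of the ridge point gains from reducing memory traffic, while one to the right gains only from better use of the compute units.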
Solution: LLM inference is expensive, so performance modeling can help choose the best configuration for a given system without actually running the LLM.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2024 | arXiv | KAIST | LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | iteration-level simulation; computation reuse optimization; heterogeneous accelerator mapping | | | |
| 2024 | MLSys | GIT | Vidur: A Large-Scale Simulation Framework For LLM Inference | operation-level simulation; using the simulator to search for the best configuration for the given system | 3 | 3 | 3 |
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2025 | MLSys | Cornell | Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training | trace-driven performance modeling and estimation toolkit; the first system to provide accurate performance models that effectively capture the execution behaviors of LLMs; modify and generate new execution graphs from existing traces | 3 | 4 | 2 |
Benchmarking methodologies and suites
Systematic Optimization Methodologies
Solution: general, systematic optimization methods based on benchmarking.
Benchmark
Solution: benchmarks targeted at performance analysis and characterization.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2018 | ICPP | WUSTL | Varbench: an Experimental Framework to Measure and Characterize Performance Variability | spatial/temporal variability; Resource Variability (RV) statistic | | | |
| 2021 | IEEE Access | D-ITET | DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks | NDP focused workload characterization methodology; memory-bound function identification; locality-based clustering; memory bottlenecks classification | | | |
LLM Serving Benchmarks
Challenge: different LLM serving systems have different optimization targets, so developing a fair benchmark is crucial.
| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2025 | arXiv | Intel | On Evaluating Performance of LLM Inference Serving Systems | introduces a practical checklist to avoid misleading benchmarks | 3 | 3 | 2 |