
Performance Analysis

Profilers (sampling, instrumentation)

Detection

Bottleneck Analysis

Challenge: bottleneck analysis must cope with high system complexity, unexpected real-world factors, and tight resource constraints during detection.

| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2018 | PPoPP | THU | vSensor: Leveraging Fixed-Workload Snippets of Programs for Performance Variance Detection | fixed-workload snippets; dependency propagation algorithm; lightweight on-line analysis algorithm | | | |
| 2020 | SC | THU | ScalAna: Automating Scaling Loss Detection with Graph Analysis | program structure graph; program performance graph; backtracking root cause detection algorithm | | | |
| 2022 | PPoPP | THU | Vapro: Performance Variance Detection and Diagnosis for Production-Run Parallel Applications | state transition graph; fixed-workload snippet identification clustering algorithm; variance breakdown model; time of factors quantification method | | | |
| 2024 | arXiv | UGA | Performance Debugging through Microarchitectural Sensitivity and Causality Analysis | constraints propagation engine for causality analysis; differential analysis engine for sensitivity analysis | | | |
| 2024 | SC | BUAA | GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems | asynchronous state transition graph; parameter-based workload estimation method; asynchronous event tracing technology | | | |
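
Several entries above rely on fixed-workload snippets: code regions that do the same amount of work on every execution, so their running time should stay roughly constant. The following is a minimal sketch of that idea, not the algorithm of vSensor or Vapro. It assumes per-snippet timings have already been collected; the function name `detect_variance` and the 1.2 slowdown threshold are hypothetical.

```python
def detect_variance(samples, threshold=1.2):
    """Flag slow executions of ONE fixed-workload snippet.

    samples:   list of (timestamp, duration) pairs for the snippet.
    threshold: hypothetical slowdown factor relative to the fastest run.
    Returns a list of (timestamp, slowdown) for the flagged executions.
    """
    if not samples:
        return []
    # The fastest observed run serves as the baseline for this snippet.
    baseline = min(duration for _, duration in samples)
    if baseline <= 0:
        raise ValueError("durations must be positive")
    flagged = []
    for timestamp, duration in samples:
        slowdown = duration / baseline
        if slowdown > threshold:
            flagged.append((timestamp, slowdown))
    return flagged


# Four executions of the same snippet; the third one is 2.5x slower.
runs = [(0, 10.0), (1, 10.5), (2, 25.0), (3, 10.0)]
anomalies = detect_variance(runs)
```

Because the workload is fixed, a slowdown here cannot be explained by the input and points to the environment (for example, noise or contention) instead.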

Variance Attribution

| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2014 | ISPASS | Intel | A Top-Down Method for Performance Analysis and Counters Architecture | top-down bottleneck analysis method; frontend bound; bad speculation; retiring; backend bound | | | |
| 2016 | TPDS | ICT | Understanding Big Data Analytics Workloads on Modern Processors | top-down analysis for big data workloads; pipeline-characteristics-based performance implication analysis; BigDataBench benchmark | | | |
| 2019 | SC | NCSU | Pinpointing Performance Inefficiencies via Lightweight Variance Profiling | function-level variance detection; stack-based maintenance of deep call chains; on-the-fly binary analysis technique for calling context | | | |
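
The four top-level categories named in the top-down entry (frontend bound, bad speculation, retiring, backend bound) partition the pipeline's issue slots. The sketch below shows how the level-1 breakdown can be computed from raw counter values. The parameter names are illustrative stand-ins for hardware counters, and the default pipeline width of 4 is an assumption that must be adjusted per microarchitecture.

```python
def top_down_level1(cycles, uops_not_delivered, uops_issued,
                    uops_retired_slots, recovery_cycles, width=4):
    """Split pipeline slots into the four level-1 top-down categories.

    All arguments are raw counter values over the same interval;
    `width` is the assumed number of issue slots per cycle.
    """
    slots = width * cycles
    # Slots in which the frontend failed to deliver a uop.
    frontend_bound = uops_not_delivered / slots
    # Issued-but-never-retired uops plus slots lost while recovering.
    bad_speculation = (uops_issued - uops_retired_slots
                       + width * recovery_cycles) / slots
    # Slots that did useful work.
    retiring = uops_retired_slots / slots
    # Everything else is attributed to the backend.
    backend_bound = 1.0 - (frontend_bound + bad_speculation + retiring)
    return {
        "frontend_bound": frontend_bound,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend_bound,
    }


breakdown = top_down_level1(cycles=1000, uops_not_delivered=800,
                            uops_issued=2200, uops_retired_slots=2000,
                            recovery_cycles=50)
```

With these made-up counts, half of the slots retire useful work and the rest is split between frontend stalls, misspeculation, and backend stalls.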

Root Cause Analysis

Challenge: modeling dependency graphs accurately and scaling detection algorithms to large-scale applications.

| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2003 | TISSEC | IBM | Clustering Intrusion Detection Alarms to Support Root Cause Analysis | attribute-oriented induction based clustering algorithm; generalized alarm analysis | 2 | 3 | 2 |
| 2017 | arXiv | Intel; CA Technologies | Survey on Models and Techniques for Root-Cause Analysis | deterministic/probabilistic model; RCA learning algorithms; RCA inference algorithms | 4 | 1 | 1 |
| 2021 | ASE | eBay | Groot: An Event-graph-based Approach for Root Cause Analysis in Industrial Settings | event-graph based RCA; service dependency graph; event causality graph; PageRank based root cause ranking | 4 | 5 | 2 |
| 2021 | ASPLOS | Cornell | Sage: Practical & Scalable ML-Driven Performance Debugging in Microservices | RPC latency decomposition model; Markov based RPC latency propagation; causal Bayesian network based dependency model | 3 | 3 | 2 |
| 2023 | ASPLOS | Alibaba | Sleuth: A Trace-Based Root Cause Analysis System for Large-Scale Microservices with Graph Neural Networks | HDBSCAN trace clustering algorithm; GNN based dependency modeling | 3 | 3 | 2 |

Burst Detection

Challenge: maintaining accuracy on high-speed data streams and trading off memory usage against detection accuracy.

| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2019 | CloudNet | PKU | Dynamic Sketch: Efficient and Adjustable Heavy Hitter Detection for Software Packet Processing | door keeper mechanism for high memory efficiency; bucket sampling for accuracy monitoring | 3 | 3 | 1 |
| 2021 | SIGMOD | PKU | BurstSketch: Finding Bursts in Data Streams | running track based burst item filtering; snapshotting based burst item detection | 3 | 3 | 1 |
| 2024 | IFIP NPC | PKU | 2FA Sketch: Two-Factor Armor Sketch for Accurate and Efficient Heavy Hitter Detection in Data Streams | improved arbitration strategy for in-bucket competition; cross-bucket conflict avoidance hashing scheme | 2 | 3 | 1 |
| 2024 | IEEE ICDE | PKU | Scalable Overspeed Item Detection in Streams | bucket sharing based basic SpeedSketch algorithm; global clock for reducing timestamp overhead; counter-flip technique for compression | 3 | 4 | 2 |
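
The memory/accuracy trade-off mentioned in the challenge can be illustrated with a plain Count-Min sketch. This is a generic textbook structure, not the algorithm of any paper listed above; the class name, the `heavy_hitters` helper, and the chosen sizes are assumptions made for illustration only.

```python
import hashlib


class CountMinSketch:
    """Fixed-memory frequency estimator that never underestimates."""

    def __init__(self, width, depth):
        self.width = width  # counters per row: more width, fewer collisions
        self.depth = depth  # independent rows: more depth, tighter estimate
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salting the hash with the row number gives one hash per row.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8,
                                 salt=row.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best guess.
        return min(self.rows[row][self._index(item, row)]
                   for row in range(self.depth))


def heavy_hitters(sketch, candidates, threshold):
    """Return the candidates whose estimated count reaches the threshold."""
    return {c for c in candidates if sketch.estimate(c) >= threshold}


stream = ["a"] * 50 + ["b"] * 5 + ["c"] * 3
sketch = CountMinSketch(width=64, depth=4)
for packet in stream:
    sketch.add(packet)
hitters = heavy_hitters(sketch, {"a", "b", "c"}, threshold=40)
```

Shrinking `width` saves memory but raises the chance that a light item collides with a heavy one and is reported as a false positive, which is the tension the listed sketches try to ease.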

Network Tomography

Survey

| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2004 | STAT SCI | Berkeley | Network Tomography: Recent Developments | tomography linear model; multicast delay distribution inference; origin–destination traffic matrix inference | 3 | 1 | 1 |

Passive Inference

| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2003 | IMC | AT&T Laboratories | Simple Network Performance Tomography | smallest consistent failure set algorithm; separable performance; false positive/coverage probability estimation of bad links | 3 | 3 | 3 |
| 2014 | ICDCS | ZJU | Domo: Passive Per-Packet Delay Tomography in Wireless Ad-hoc Networks | FIFO/order/sum-of-delays constraints for delay reconstruction; semi-definite relaxation based optimization | 4 | 3 | 2 |

Active Inference

| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2002 | ICASSP | UMich | Unicast-based inference of network link delay distributions using mixed finite mixture models | Dirac delta based mixed finite mixture model; EM algorithm for parameter estimation | 3 | 2 | 2 |
| 2003 | IEEE TSP | Rice University | Network Delay Tomography | end-to-end packet pair link delay distribution estimation; FFT based expectation-maximization acceleration algorithm | 3 | 3 | 2 |
| 2021 | IEEE TNSM | QMUL | Optimal Estimation of Link Delays Based on End-to-End Active Measurements | active network monitoring framework; ILP/heuristic/meta-heuristic algorithm for monitoring flow selection | 3 | 3 | 2 |
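
The "smallest consistent failure set" tag refers to explaining bad end-to-end paths with as few bad links as possible. The sketch below is a simplified reading of that idea under strong assumptions: the topology is a tree, each root-to-leaf path is measured only as good or bad, and a link is blamed when every leaf beneath it is bad. The function name and the data layout are hypothetical.

```python
def smallest_consistent_failure_set(children, root, bad_leaves):
    """Blame the topmost links whose whole subtree has bad paths.

    children:   dict mapping a node to the list of its child nodes.
    root:       the measurement source at the top of the tree.
    bad_leaves: set of leaves whose root-to-leaf path was measured bad.
    Returns a set of nodes k, each standing for the link (parent(k), k).
    """
    all_bad = {}

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            all_bad[node] = node in bad_leaves
        else:
            # Visit every child first so that all_bad is fully populated.
            results = [visit(kid) for kid in kids]
            all_bad[node] = all(results)
        return all_bad[node]

    def collect(node, suspects):
        for kid in children.get(node, []):
            if all_bad[kid]:
                suspects.add(kid)  # topmost explanation; stop descending
            else:
                collect(kid, suspects)
        return suspects

    visit(root)
    return collect(root, set())


# Tree: 0 -> 1 -> {2, 3}, 2 -> {4, 5}, 3 -> {6, 7}
tree = {0: [1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
suspects = smallest_consistent_failure_set(tree, 0, bad_leaves={4, 5, 6})
```

Here the bad paths to leaves 4 and 5 are explained by the single link into node 2, while the bad path to leaf 6 needs its own link because its sibling 7 is good.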

Simulators and emulators (for software/system analysis)

Challenge: balancing a simulator's accuracy, time cost, and complexity.

General Performance Modeling

This section covers performance modeling for general systems; LLM-specific work is listed under LLM Performance Modeling.

| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2009 | CACM | Berkeley | Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures | operational intensity; memory bound; compute bound | | | |
| 2014 | IISWC | ETH Zurich | Extending the Roofline Model: Bottleneck Analysis with Microarchitectural Constraints | DAG-based performance model; Tomasulo's greedy algorithm; scheduled DAG based bottleneck modeling | 3 | 4 | 3 |
| 2021 | Intelligent Computing | Berkeley | Hierarchical Roofline Performance Analysis for Deep Learning Applications | Nsight Compute based hierarchical roofline model; FP16/FP32 extension for ERT | | | |
| 2025 | arXiv | Google | Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion | per-resource throughput analysis; fine-grained performance attribution | 3 | 2 | 2 |
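
The roofline tags above (operational intensity, memory bound, compute bound) reduce to one formula: attainable performance is the smaller of the compute peak and the memory bandwidth multiplied by operational intensity. A minimal sketch follows; the function name and the machine numbers are made up for illustration.

```python
def roofline(peak_gflops, peak_bandwidth_gbs, operational_intensity):
    """Return (attainable GFLOP/s, limiting resource) for one kernel.

    operational_intensity is measured in FLOPs per byte of memory traffic.
    """
    memory_roof = peak_bandwidth_gbs * operational_intensity
    attainable = min(peak_gflops, memory_roof)
    bound = "memory" if memory_roof < peak_gflops else "compute"
    return attainable, bound


# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
# The ridge point sits at 1000 / 100 = 10 FLOPs per byte.
low_oi = roofline(1000.0, 100.0, operational_intensity=2.0)
high_oi = roofline(1000.0, 100.0, operational_intensity=16.0)
```

A kernel to the left of the ridge point is limited by memory traffic, so raising its operational intensity (for example, through blocking) helps more than adding compute.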

LLM Performance Modeling

Solution: LLM inference is expensive, so performance modeling helps choose the best configuration for a given system without actually running the LLM.

LLM Serving Performance Modeling

| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2024 | arXiv | KAIST | LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | iteration-level simulation; computation reuse optimization; heterogeneous accelerator mapping | | | |
| 2024 | MLSys | GIT | Vidur: A Large-Scale Simulation Framework For LLM Inference | operation-level simulation; using the simulator to search for the best configuration for a given system | 3 | 3 | 3 |

LLM Training Performance Modeling

| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2025 | MLSys | Cornell | Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training | trace-driven performance modeling and estimation toolkit; the first system to provide accurate performance models that effectively capture the execution behaviors of LLMs; modify and generate new execution graphs from existing traces | 3 | 4 | 2 |

Benchmarking methodologies and suites

Benchmark

Solution: benchmark targeted at performance analysis and characterization.

| Year | Venue | Authors | Title | Tags | P | E | N |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2018 | ICPP | WUSTL | Varbench: an Experimental Framework to Measure and Characterize Performance Variability | spatial/temporal variability; Resource Variability (RV) statistic | | | |
| 2021 | IEEE Access | D-ITET | DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks | NDP focused workload characterization methodology; memory-bound function identification; locality-based clustering; memory bottlenecks classification | | | |