Algorithms, Theory, and Formal Methods¶
Algorithm design and analysis¶
Solution: an algorithm is a well-defined, finite sequence of steps that solves a specific problem or accomplishes a particular task. We focus on algorithms that solve practical problems.
Dynamic Graph Algorithms¶
Solution: Dynamic graph algorithms efficiently update solutions to graph problems as the graph evolves, addressing the challenge of frequent changes in structure and data.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2023 | ASPLOS | UCR | CommonGraph: Graph Analytics on Evolving Data | convert deletions to additions; common graph concept; Triangular Grid (TG) for work sharing; mutation-free representation | 3 | 4 | 4 |
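The value of incremental computation, and why deletions are the awkward case (which is what CommonGraph's deletion-to-addition conversion sidesteps), can be seen in a minimal sketch: a union-find answers connectivity queries cheaply while edges are only added, but a single deletion would invalidate the structure. The example below is illustrative and not taken from the paper.

```python
class UnionFind:
    """Incremental connectivity: cheap for edge additions only."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

uf = UnionFind(6)
for u, v in [(0, 1), (1, 2), (4, 5)]:        # stream of edge additions
    uf.union(u, v)
print(uf.find(0) == uf.find(2))              # True: updated without recomputation
print(uf.find(0) == uf.find(4))              # False: separate component
# Deleting an edge cannot be handled locally here, which is why dynamic
# graph systems prefer to avoid processing deletions directly.
```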
ML Algorithms¶
Solution: ML algorithms are fundamental tools that enable computers to learn from data and make predictions or decisions without being explicitly programmed.
Diffusion Models¶
Solution: Diffusion models are generative models that learn to reverse a gradual noising process to generate data from noise.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2023 | arXiv | UC Berkeley | Scalable Diffusion Models with Transformers | Diffusion Transformer (DiT) architecture; replace the original U-Net with transformer blocks; adaptive layer norm (adaLN-Zero) for conditioning | 3 | 5 | 5 |
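As a complement to the table, here is a minimal numpy sketch of the DDPM-style forward (noising) process that diffusion models learn to reverse; the linear beta schedule and shapes are illustrative assumptions, not details of the DiT paper.

```python
import numpy as np

# Forward (noising) process of a DDPM-style diffusion model:
# q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)
T = 1000
betas = np.linspace(1e-4, 0.02, T)          # illustrative linear schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def noise_sample(x0, t, rng):
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps                          # eps is the training target

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))            # stand-in for an image/latent
xt, eps = noise_sample(x0, t=500, rng=rng)
# Training minimizes || eps_theta(xt, t) - eps ||^2; sampling runs the
# learned reverse process from pure noise back to data.
```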
Autoregressive Models for Images¶
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2024 | NeurIPS | PKU | Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction | VAR modeling with next-scale prediction; multi-scale quantization for coarse-to-fine tokenization; power-law scaling laws for visual AR models | 4 | 5 | 5 |
LLM Algorithms¶
Solution: LLMs let AI systems converse with humans in natural language; some consider them the path to AGI.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2020 | arXiv | OpenAI | Scaling Laws for Neural Language Models | fundamentals of LLM scaling; performance improves predictably as model size increases | 4 | 5 | 5 |
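To make the scaling-law row concrete, the paper fits a power law in which loss falls smoothly with (non-embedding) parameter count. The constants below are the approximate published values for that fit, quoted from memory, so treat the numbers as indicative only.

```python
# Power-law scaling of loss with (non-embedding) parameter count N:
#   L(N) ≈ (Nc / N) ** alpha_N
# Constants are approximate values from Kaplan et al. (2020).
alpha_N = 0.076
Nc = 8.8e13

def loss(n_params):
    return (Nc / n_params) ** alpha_N

for n in [1e8, 1e9, 1e10, 1e11]:
    print(f"N = {n:.0e}  ->  predicted loss ~ {loss(n):.2f}")
# Each 10x increase in parameters lowers the loss by a constant factor
# of 10 ** -alpha_N ≈ 0.84, i.e. performance rises smoothly with scale.
```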
LLM Transformer¶
Solution: The Transformer is a well-established architecture with known problems such as the quadratic complexity of attention. These problems motivate new algorithms that improve on the original architecture.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2019 | arXiv | Google | Fast Transformer Decoding: One Write-Head is All You Need | MQA (multi-query attention); share the same KV cache across all heads (see the sketch after this table) | 1 | 4 | 3 |
| 2024 | NeuroComputing | ZhuiYi | RoFormer: Enhanced Transformer with Rotary Position Embedding | use rotary position embedding to encode relative positions for long contexts; inter-word dependencies decay gradually as relative distance increases | 3 | 4 | 3 |
| 2025 | arXiv | Qwen | Parallel Scaling Law for Language Models | scale up the model's parallel computation rather than its size; run multiple parallel streams and aggregate them into one output | 4 | 4 | 4 |
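A minimal numpy sketch of the multi-query attention idea referenced in the MQA row above: all query heads attend over a single shared key/value head, so the KV cache shrinks by roughly the head count. The shapes and the absence of a causal mask are simplifications.

```python
import numpy as np

def multi_query_attention(q, k, v):
    """q: (heads, seq, d)   k, v: (seq, d) -- one shared KV head."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)             # (heads, seq, seq)
    scores = scores - scores.max(-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v                        # (heads, seq, d)

rng = np.random.default_rng(0)
heads, seq, d = 8, 16, 64
q = rng.standard_normal((heads, seq, d))
k = rng.standard_normal((seq, d))             # stored once, not per head
v = rng.standard_normal((seq, d))
out = multi_query_attention(q, k, v)
print(out.shape)   # (8, 16, 64); the KV cache is 8x smaller than in MHA
```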
Diffusion LLMs¶
Challenge: diffusion models generate results from noise, which differs from the traditional AR paradigm. Diffusion LLMs must reconcile the logical order of text with the arbitrary order in which diffusion fills in output positions.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2025 | arXiv | RUC | Large Language Diffusion Models | first work to generate text with a diffusion LLM; diffusion models excel at reversal reasoning; AR across blocks with diffusion within blocks | 3 | 4 | 3 |
| 2025 | arXiv | HKU | Dream 7B: Diffusion Large Language Models | based on AR model's pre-train; inter-block diffusion | 3 | 3 | 3 |
| 2025 | arXiv | THU | Survey on Diffusion Language Models | survey on training strategies, inference optimization, multimodal and applications of diffusion language models | 4 | 2 | 2 |
| 2025 | arXiv | ByteDance | Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference | two-stage training with mask-based and edit-based noise; constrained-order training by filtering optimal generation paths; direct training to reduce generation steps | 3 | 3 | 5 |
| 2025 | arXiv | RUC | UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models | Diffusion-aware NTK extrapolation for RoPE; long-context post-training with adaptive attention masking | 3 | 3 | 4 |
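A toy sketch of the ordering issue raised in the challenge above: a masked-diffusion LM starts from all-masked positions and unmasks the most confident slots over a few parallel steps instead of committing to left-to-right order. The `predict` function is a stand-in for a real dLLM and is purely illustrative.

```python
import numpy as np

MASK = -1
rng = np.random.default_rng(0)

def predict(tokens):
    """Stand-in for a diffusion LM: returns (token, confidence) per position."""
    conf = rng.random(len(tokens))
    guess = rng.integers(0, 50_000, len(tokens))
    return guess, conf

def diffusion_decode(length=16, steps=4):
    tokens = np.full(length, MASK)
    per_step = length // steps
    for _ in range(steps):
        guess, conf = predict(tokens)
        conf[tokens != MASK] = -np.inf          # keep already-committed tokens
        keep = np.argsort(conf)[-per_step:]     # unmask the most confident slots
        tokens[keep] = guess[keep]
    return tokens

print(diffusion_decode())
# Positions are filled in confidence order, not left to right, which is
# why reconciling text order is a core challenge for diffusion LLMs.
```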
dLLM combined with other techniques¶
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2025 | arXiv | RUC | LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning | visual instruction tuning for diffusion models; multi-stage training for multimodal reasoning | 2 | 3 | 4 |
| 2025 | arXiv | Nvidia | TiDAR: Think in Diffusion, Talk in Autoregression | dLLM with speculative decoding; the dLLM itself serves as the draft model to accelerate generation | 3 | 3 | 2 |
| 2025 | arXiv | RUC | LLaDA-MoE: A Sparse MoE Diffusion Language Model | Sparse MoE masked diffusion architecture; variable-length training intervention; multi-stage annealing pipeline | 4 | 3 | 3 |
LLM Alignment¶
Solution: LLM alignment aims to make LLM outputs more consistent with user intent. Its challenges are ensuring safety, addressing multi-modal complexities, and balancing reasoning ability with alignment.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2024 | arXiv | SJTU | Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation | social scene simulation; emulate realistic multiparty interactions and consequences; monopolylogue | |||
| 2025 | ICLR | Princeton | Safety Alignment Should Be Made More Than Just a Few Tokens Deep | AI-safety-centered alignment; enhance safety on deeper tokens and data | 3 | 3 | 3 |
| 2025 | ACL | PKU | Language Models Resist Alignment: Evidence From Data Compression | LLMs have an inherent resistance to alignment; larger-scale pre-training increases this resistance | 4 | 3 | 4 |
LLM Finetune¶
Solution: fine-tuning adapts a pre-trained model to a specific task or domain so that the model better fits that task's data.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2021 | ICLR | Microsoft | LoRA: Low-Rank Adaptation of Large Language Models | decompose the weight update into two low-rank matrices; reduce the number of trainable parameters (see the sketch after this table) | 2 | 4 | 4 |
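A minimal numpy sketch of the LoRA idea in the row above: the pre-trained weight stays frozen and only two low-rank matrices are trained, so the effective weight is W + (alpha/r)·BA. The dimensions and scaling factor are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 1024, 1024, 8               # rank r << d

W = rng.standard_normal((d_out, d_in))       # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable, d_in -> r
B = np.zeros((d_out, r))                     # trainable, r -> d_out (zero init)

def forward(x, alpha=16):
    # Base path plus low-rank update; only A and B receive gradients.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = forward(x)
trainable = A.size + B.size
print(trainable, W.size, trainable / W.size)  # ~1.6% of the full matrix
```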
Coding LLM Finetune¶
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2024 | arXiv | UMD | HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages | large synthetic parallel programming dataset; parallel code generation; HPC AI developer tools | | | |
LLM-Powered AI Agent¶
Challenge: How to scale the number of agents and how to reach human-level behavior.
Agent simulation in the LLM manner¶
Compared with the less-LLM manner, the LLM manner can simulate more complex and realistic behaviors.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2018 | CVPR | MIT | VirtualHome: Simulating Household Activities via Programs | simulation framework for home agent; interaction within the home; action planning and execution | 3 | 3 | 3 |
| 2024 | arXiv | THU | LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination | hierarchical language agent; real-time human-AI coordination; slow mind & fast mind | |||
| 2025 | arXiv | THU | Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents | RAG-based LLM agents; uses the OpenAI API to run the LLM; about 50 agents | 3 | 3 | 2 |
| 2025 | arXiv | THU | OpenCity: A Scalable Platform to Simulate Urban Activities with Massive LLM Agents | IO optimization for agent; prompt compression for LLM | 3 | 2 | 1 |
Agent simulation in a less-LLM manner¶
Challenge: Using an LLM for million-agent simulations is too expensive. In these papers, LLMs are used only sparingly; the majority of agents are not powered by an LLM.
Compared with the LLM manner, the less-LLM manner supports far larger agent counts with much less LLM compute.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2024 | arXiv | MIT | On the limits of agency in agent-based models | GPU optimization for agent simulation; use tensor for agent status expression; optimization for on-GPU operation | 3 | 3 | 3 |
| 2025 | arXiv | THU | AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society | large-scale simulation under limited TCP connection ports; uses Ray for distributed execution; modified information-subscription mechanism | 4 | 3 | 2 |
LLM Agent for social simulation¶
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2023 | UIST | Stanford | Generative Agents: Interactive Simulacra of Human Behavior | agent social simulation baseline; 25 agents in a town | 3 | 4 | 3 |
| 2024 | TMLR | Tencent | Affordable Generative Agents | based on Generative Agents; system optimization for LLM behavior reuse; optimization for token consumption | 3 | 3 | 2 |
| 2025 | arXiv | AiLab | OASIS: Open Agent Social Interaction Simulations with One Million Agents | large-scale agent simulation; runs LLMs locally; optimizations for large-scale agent simulation; about 1 million agents | 4 | 3 | 3 |
| 2025 | arXiv | RUC | YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models | vLLM-based LLM serving; CPU-overhead optimization for inter-agent communication; about 100,000 agents | 3 | 3 | 2 |
| 2024 | arXiv | Fudan | From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents | dividing simulations into Individual/Scenario/Society levels; modular agent architecture analysis involving Profile-Memory-Planning-Action | 3 | 3 | 3 |
LLM Agent for tool use¶
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2025 | arXiv | Berkeley | The Berkeley Function Calling Leaderboard (BFCL): From Tool Use to Agentic Evaluation of Large Language Models | benchmark for tool-calling ability | 3 | 3 | 3 |
| 2025 | arXiv | Startup | MemTool: Optimizing Short-Term Memory Management for Dynamic Tool Calling in LLM Agent Multi-Turn Conversations | removes unused tools across multi-turn conversations; uses a tool selector to pick the most appropriate tool | 3 | 3 | 2 |
RL Algorithms¶
Solution: RL learns from rewards or penalties without labeled data, taking actions that interact with the environment. It can learn optimal policies over very large configuration spaces.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2015 | Nature | DeepMind | Human-level control through deep reinforcement learning | deep reinforcement learning; human-level control; playing Atari games | 5 | 5 | 3 |
| 2025 | arXiv | DeepReinforce | CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | contrastive RL-driven CUDA optimization without human priors; LLM-based CUDA kernel optimization; reward design for CUDA kernel | 4 | 4 | 2 |
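A minimal tabular Q-learning sketch of the reward-driven loop described in the solution above (a generic illustration, not either paper's method): values are learned purely from rewards observed while interacting with a toy environment.

```python
import numpy as np

# Toy chain MDP: states 0..4, actions {0: left, 1: right},
# reward 1 only for entering the terminal state 4.
N_STATES = 5
Q = np.zeros((N_STATES, 2))
alpha, gamma = 0.1, 0.9
rng = np.random.default_rng(0)

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, float(s2 == N_STATES - 1)

# Off-policy Q-learning: learn from randomly explored transitions,
# i.e. from rewards alone, with no labeled data.
for _ in range(5000):
    s = int(rng.integers(N_STATES - 1))      # any non-terminal state
    a = int(rng.integers(2))
    s2, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

print(Q[:N_STATES - 1].argmax(axis=1))       # [1 1 1 1]: optimal policy is "go right"
```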
DNN Training Algorithms¶
Solution: DNN training algorithms are essential for optimizing deep neural networks, enabling them to learn from data and improve their performance on various tasks. They address challenges like convergence speed, generalization, and robustness.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2017 | ICLR | Stanford | DSD: Dense-Sparse-Dense Training for Deep Neural Networks | 3 step dense-sparse-dense training | 3 | 5 | 4 |
| 2020 | NeurIPS | MIT | Differentiable Augmentation for Data-Efficient GAN Training | Differentiable Augmentation to improve data efficiency in generative adversarial networks training | 3 | 4 | 4 |
| 2020 | CVPR | NTHU | Robust Processing-In-Memory Neural Networks via Noise-Aware Normalization | noise-aware calibration in BatchNorm statistics | 3 | 3 | 3 |
| 2025 | ASPLOS | Nvidia&CMU&MIT | GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism | graph pipeline parallelism; topology-aware stage partitioning and scheduling algorithm | 4 | 3 | 2 |
| 2025 | arXiv | ZhiCun | Extending Straight-Through Estimation for Robust Neural Networks on Analog CIM Hardware | extension of STE for complex noise environments; STE-based gradient approximation strategy | 3 | 3 | 3 |
Multi-task Learning¶
Solution: Multi-task learning (MTL) is a machine learning paradigm where multiple related tasks are learned simultaneously, leveraging shared representations to improve performance across tasks.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2018 | NeurIPS | Intel | Multi-Task Learning as Multi-Objective Optimization | Frank-Wolfe-based optimizer that scales to high-dimensional problems; provide an upper bound for the MGDA (multiple gradient descent algorithm) optimization objective | 3 | 4 | 4 |
| 2019 | NeurIPS | CUHK | Pareto Multi-Task Learning | method to decompose a MTL problem into multiple subproblems; scalable optimization algorithm to solve all constrained subproblems | 3 | 4 | 4 |
| 2021 | NeurIPS | UTexas | Conflict-Averse Gradient Descent for Multi-task learning | Conflict-Averse Gradient descent (CAGrad); reduces the conflict among gradients while provably converges to minimum average loss | 3 | 3 | 3 |
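A small illustration of the gradient-conflict problem these methods address; the projection step below follows the generic PCGrad-style idea of removing the conflicting component, which is an assumption for illustration rather than the exact MGDA, Pareto-MTL, or CAGrad update.

```python
import numpy as np

def pcgrad_combine(g1, g2):
    """Project each task gradient onto the normal plane of the other
    when they conflict (negative inner product), then sum."""
    g1p, g2p = g1.copy(), g2.copy()
    if g1 @ g2 < 0:                              # conflict detected
        g1p = g1 - (g1 @ g2) / (g2 @ g2) * g2
        g2p = g2 - (g2 @ g1) / (g1 @ g1) * g1
    return g1p + g2p

g1 = np.array([1.0, 0.0])        # task 1 pushes along x
g2 = np.array([-1.0, 0.5])       # task 2 partly pushes the opposite way
print("naive sum:   ", g1 + g2)
print("deconflicted:", pcgrad_combine(g1, g2))
```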
Graph Neural Network¶
Solution: Graph Neural Network (GNN) is a model that leverages the relationships between nodes and edges in graph-structured data to perform feature propagation and representation learning, enabling the capture of complex topological dependencies and structural patterns.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2020 | TITS | CSU | T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction | GCN for spatial dependence; GRU for temporal dependence; noise based perturbation analysis | 4 | 4 | 2 |
| 2020 | ICLR | Walmart | Inductive Representation Learning on Temporal Graphs | functional time encoding; temporal graph attention layer | 3 | 4 | 2 |
| 2020 | AAAI | MIT | EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs | EvolveGCN-H / EvolveGCN-O architectures; evolving graph convolution unit; model adaptation via parameter evolution | 4 | 3 | 2 |
| 2023 | ICSE | CUHK | Eadro: An End-to-End Troubleshooting Framework for Microservices on Multi-source Data | Hawkes process; dilated causal convolution; joint detection and localization via multi-task learning | 3 | 3 | 2 |
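A minimal numpy sketch of one graph-convolution (message-passing) layer of the kind these models build on: each node aggregates its neighbors' features (plus its own via self-loops) and applies a shared weight matrix and nonlinearity. The mean normalization and shapes are illustrative choices.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN-style layer: H' = ReLU(D^-1 (A + I) H W)."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)
    msg = (A_hat / deg) @ H                      # mean-aggregate neighbor features
    return np.maximum(msg @ W, 0.0)

# Tiny graph: 4 nodes on a path 0-1-2-3, 8-dim features, 16-dim output.
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 16))
print(gcn_layer(A, H, W).shape)                  # (4, 16)
```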
Quantization¶
Solution: Quantization focuses on the trade-off between accuracy and computation/memory. The challenge is how to run models with high performance and low memory/compute cost (see the sketch below).
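A minimal sketch of the basic trade-off, assuming symmetric per-tensor int8 quantization: weights are stored as 8-bit integers plus a single scale, giving 4x smaller storage at the cost of a bounded reconstruction error that the methods below try to shrink.

```python
import numpy as np

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0              # symmetric per-tensor scale
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"memory: {w.nbytes} B -> {q.nbytes} B, max abs error ~ {err:.4f}")
# 4x smaller storage; the error budget is what quantization research
# (adaptive datatypes, outlier handling, QAT, ...) tries to shrink.
```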
Adaptive Datatype¶
Solution: Adaptive datatypes aim to optimize numerical representation by dynamically adjusting to the precision and range requirements of data. The challenge lies in balancing computational efficiency, memory usage, and accuracy across diverse tasks and hardware constraints.
For LLM¶
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2023 | ISCA | SJTU | OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization | outlier-victim pair that sacrifices the colocated normal values to accommodate the outliers; OVP-based quantization framework and architectural implementation | 4 | 4 | 2 |
| 2023 | ICLR | ETH Zurich | GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Arbitrary Order Insight; Lazy Batch-Updates; Cholesky Reformulation | 4 | 4 | 3 |
| 2024 | MLSys | MIT | AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | Preserving 1% Salient Weights; Protecting Salient Weights by Activation-aware Scaling; searching to scale | 4 | 4 | 4 |
| 2025 | arXiv | Rice | 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | dynamic-length float; preserving bit-for-bit identical outputs; BFloat16 exponents carry significantly less information than their allocated bit width | 4 | 4 | 4 |
| 2025 | HPCA | SJTU | M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type | group-wise quantization for both weight and KV cache; new encoding paradigm to improve information utilization in group-wise quantization; specific processing element for encoding paradigm | 4 | 4 | 2 |
| 2025 | HPCA | Cornell | BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration | introduce additional asymmetry to FP by repurposing a redundant zero value with another special value; hardware accelerator design | 3 | 3 | 3 |
For Non-LLM¶
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2020 | CVPR | ByteDance Inc. | AdaBits: Neural Network Quantization With Adaptive Bit-Widths | joint-quantization method applied in training; Switchable Clipping Level (SCL) between layers | 4 | 3 | 3 |
| 2022 | ICLR | Snap Inc. | F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization | variance-based fixed-point format selection for weights and activations; training algorithm for fixed-point models | 3 | 3 | 2 |
| 2022 | MICRO | SJTU | ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization | fixed-length adaptive numerical data type; combines the advantages of float and int for adapting to the importance of different values within a tensor; adaptive framework that selects the best type for each tensor | |||
| 2024 | TCAD | HKU | DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference | adaptive data representation with variable-length encoding; hardware-aware quantization framework | | | |
| 2024 | arXiv | Harvard | Nanoscaling Floating-Point (NxFP): NanoMantissa, Adaptive Microexponents, and Code Recycling for Direct-Cast Compression of Large Language Models | Nanoscaling Floating-Point (NxFP); NanoMantissa; Adaptive Microexponents; Code Recycling | |||
| 2025 | ISCA | SJTU | FATE: Boosting the Performance of Hyper-Dimensional Computing Intelligence with Flexible Numerical DAta TypE | dimensional fuzzing-distance importance measure; fine-grained compression framework | 4 | 3 | 4 |
General method¶
Solution: General quantization methods aim to optimize the trade-off between model accuracy and computational efficiency. Challenges include addressing layer-specific quantization errors, enhancing fault tolerance, and finding optimal bit-width configurations.
For General LLM¶
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2023 | ICML | MIT | SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models | offline migrates the quantization difficulty from activations to weights | 4 | 5 | 3 |
| 2024 | ISCA | SNU | Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization | “power of 2” channel decomposition rule; Tender accelerator design | 4 | 3 | 2 |
| 2025 | arXiv | PKU | Bitnet.cpp: Efficient Edge Inference for Ternary LLMs | ternary mpGEMM library; avoid intricate bit-level manipulations; achieving lossless inference for BitNet b1.58 | |||
| 2025 | AAAI | ByteDance | ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models | block-wise distribution correction and compensation scheme; bit balance strategy | 4 | 3 | 2 |
| 2025 | ICML | Huawei,THU | FlatQuant: Flatness Matters for LLM Quantization | post-training quantization method to enhance the flatness of both weights and activations in LLMs | 4 | 4 | 3 |
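A hedged numpy sketch of the difficulty-migration idea noted in the SmoothQuant row above: per-channel scales move activation outliers into the weights while leaving the matmul output mathematically unchanged, since (X·diag(1/s))(diag(s)·W) = XW. The alpha value and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((128, 64))
X[:, 7] *= 30.0                                  # an outlier activation channel
W = rng.standard_normal((64, 64))

alpha = 0.5
s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)
X_s, W_s = X / s, W * s[:, None]                 # migrate difficulty to weights

print(np.allclose(X @ W, X_s @ W_s))             # True: the output is unchanged
print(np.abs(X).max(), "->", np.abs(X_s).max())  # activation range is tamed
```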
KV Cache specialized¶
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2025 | arXiv | UVa | HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference | method without dequantization; homomorphic quantization method for matrix multiplication; requantization elimination | 2 | 2 | 3 |
| 2025 | arXiv | SJTU | MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization | a non-uniform quantization algorithm based on product quantization; leverages sparse computation and asynchronous quantization; distributes quantization power unevenly across channels | 3 | 4 | 2 |
For Non-LLM¶
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2018 | AAAI | SUTD | Adaptive Quantization for Deep Neural Network | measurement to estimate the effect of parameter quantization errors in individual layers; optimization process for finding optimal quantization bit-width for each layer | 3 | 3 | 4 |
| 2020 | ISCA | SJTU | DRQ: Dynamic Region-based Quantization for Deep Neural Network Acceleration | dynamic region-based quantization algorithm; sub-feature map quantization; accelerator architecture for proposing dynamic region-based quantization | 4 | 3 | 2 |
| 2021 | MLSys | Nvidia | VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference | per-vector (≈16–64 elements) scaled quantization technique; two-level scaling scheme and algorithm; modified MAC unit in accelerator | 4 | 3 | 5 |
| 2021 | ICML | Intel | Accurate Post Training Quantization With Small Calibration Sets | layer-by-layer optimization method; integer programming; para-normalization | 3 | 3 | 3 |
| 2023 | ACML | KOBE-U | A Mixed-Precision Quantization Method without Accuracy Degradation Using Semilayers | semilayers based on whether loss difference is positive or negative | 3 | 2 | 2 |
Fault Tolerance¶
Solution: Fault tolerance in quantization ensures that models remain robust and reliable despite errors or noise.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2019 | DFT | Xilinx | Efficient Error-Tolerant Quantized Neural Network Accelerators | selective channel replication; fault-aware scheduling of processing elements for folded implementations | 3 | 2 | 3 |
| 2023 | DAC | Yonsei | RQ-DNN: Reliable Quantization for Fault-tolerant Deep Neural Networks | quantization to enhance fault tolerance caused by fault in memory; quantize to bimodal | 3 | 3 | 3 |
Quantization-Aware Training¶
Solution: Quantization-aware training (QAT) is a technique that simulates the effects of quantization during the training process, allowing the model to learn to adapt to the quantization noise.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2018 | arXiv | IBM | PACT: Parameterized Clipping Activation for Quantized Neural Networks | activation quantization scheme for finding the optimal quantization scale during training | 3 | 4 | 3 |
| 2020 | ICLR | IBM | Learned Step Size Quantization | approximate the gradient to the quantizer step size; heuristic to bring the magnitude of step size updates into better balance with weight updates | 3 | 4 | 3 |
| 2022 | CVPR | HKUST | Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation | Nonuniform-to-Uniform Quantizer (N2UQ) via learning input thresholds; Generalized Straight-Through Estimator (GSTE) to tackle intractable gradient computation in N2UQ | 3 | 3 | 3 |
| 2025 | arXiv | HKU & ByteDance | Scaling Law for Quantization-Aware Training | a mathematical model for QAT quantization error | 4 | 4 | 4 |
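A minimal sketch of the fake-quantization step at the heart of QAT: the forward pass sees quantized weights while the backward pass treats rounding as identity (the straight-through estimator). Written in plain numpy with hand-derived gradients for illustration; a real implementation lives inside an autograd framework.

```python
import numpy as np

def fake_quant(w, n_bits=4):
    """Forward: quantize-dequantize. Backward (STE): the gradient passes
    through as if the rounding were the identity inside the clip range."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

# One illustrative training step with the straight-through estimator:
rng = np.random.default_rng(0)
w = rng.standard_normal(16)
x = rng.standard_normal(16)
target, lr = 1.0, 0.01

y = fake_quant(w) @ x                 # forward uses the quantized weights
grad_w = 2 * (y - target) * x         # backward pretends quantization is identity
w -= lr * grad_w                      # latent full-precision weights keep training
print("before:", float(y), " after:", float(fake_quant(w) @ x), " target:", target)
# The output generally moves toward the target despite the rounding noise.
```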
DNN Compression¶
Solution: DNN compression aims to reduce the size and computational requirements of deep neural networks.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2016 | ICLR | Stanford | Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding | three stage pipeline: pruning, trained quantization and Huffman coding | 4 | 4 | 4 |
| 2020 | JSTSP | Fraunhofer HHI | DeepCABAC: A Universal Compression Algorithm for Deep Neural Networks | identify set of priors in DNN; redefine CABAC's core scheme to capture priors | 3 | 5 | 3 |
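A toy sketch in the spirit of the Deep Compression pipeline above: magnitude pruning sparsifies the tensor, then the surviving weights are mapped to a small codebook so each can be stored as a short index. Uniform binning stands in for the paper's trained k-means centroids, and the Huffman stage and sparse-index storage are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

# Stage 1: magnitude pruning -- drop the 90% smallest-magnitude weights.
threshold = np.quantile(np.abs(w), 0.90)
mask = np.abs(w) >= threshold

# Stage 2: weight sharing -- map surviving weights to a 16-entry codebook
# (uniform bins here; the paper trains the centroids with k-means).
survivors = w[mask]
edges = np.linspace(survivors.min(), survivors.max(), 17)
idx = np.clip(np.digitize(survivors, edges) - 1, 0, 15)      # 4-bit indices
codebook = np.array([survivors[idx == i].mean() if np.any(idx == i) else 0.0
                     for i in range(16)], dtype=np.float32)

dense_bits = w.size * 32
compressed_bits = mask.sum() * 4 + codebook.size * 32         # indices + codebook
print(f"~{dense_bits / compressed_bits:.0f}x smaller (ignoring sparse-index overhead)")
```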
Statistical Parameter Estimation¶
Solution: infer the distribution of variables from observed data using statistical methods.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 1977 | JRSSB | Harvard | Maximum Likelihood from Incomplete Data via the EM Algorithm | incomplete data; expectation–maximization (EM) algorithm for maximum likelihood estimation | 2 | 1 | 3 |
| 2016 | Big Data | LPNU | Machine Learning, Linear and Bayesian Models for Logistic Regression in Failure Detection Problems | extreme gradient boosting classifier; generalized linear model | 2 | 1 | 2 |
| 2023 | J Process Contr | UA | Modeling and Bayesian inference for processes characterized by abrupt variations | dynamic latent variable model; variational Bayesian inference framework | 3 | 2 | 2 |
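A minimal sketch of the EM idea from the 1977 row above, applied to a two-component 1-D Gaussian mixture with known unit variances and equal mixing weights (simplifying assumptions): the E-step computes soft responsibilities for the unobserved component labels, and the M-step re-estimates the means from them.

```python
import numpy as np

rng = np.random.default_rng(0)
# Observed data from a hidden two-component mixture (labels are unobserved).
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])

mu = np.array([-1.0, 1.0])                        # initial guesses for the means
for _ in range(50):
    # E-step: responsibility of each component for each point
    # (equal priors, unit variances).
    log_p = -0.5 * (data[:, None] - mu[None, :]) ** 2
    resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate each mean as the responsibility-weighted average.
    mu = (resp * data[:, None]).sum(axis=0) / resp.sum(axis=0)

print(mu)   # converges close to the true means (-2, 3)
```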
Time Synchronization¶
Solution: design appropriate synchronization strategies to improve the performance and adaptability of parallel discrete-event simulation.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 1993 | JACM | W&M | The Cost of Conservative Synchronization in Parallel Discrete Event Simulations | windowing mechanism based conservative synchronization; lower-bound performance analysis based on stochastic modeling | 2 | 3 | 2 |
| 2002 | TPDS | Dartmouth | Composite Synchronization in Parallel Discrete-Event Simulation | composite synchronization mechanism; mathematical model based on synchronization overhead optimization | 3 | 3 | 2 |
| 2013 | PDES | MSOE | Synchronization methods in parallel and distributed discrete-event simulation | conservative/optimistic synchronization methods; chandy-misra-bryant algorithm; time warp mechanism | 3 | 1 | 1 |
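A toy sketch of the conservative windowing idea behind these papers: the global safe bound is the minimum next-event time plus the lookahead, and every logical process may execute events below that bound without causality violations. The event contents, the uniform lookahead, and the omission of newly generated events are simplifications.

```python
import heapq

# Each logical process (LP) keeps a queue of (timestamp, event) pairs and
# promises not to send messages earlier than its next event time + lookahead.
lps = {
    "lp0": [(1.0, "a"), (4.0, "b"), (9.0, "c")],
    "lp1": [(2.5, "d"), (7.0, "e")],
}
for q in lps.values():
    heapq.heapify(q)
LOOKAHEAD = 1.0

def conservative_round(lps):
    # Safe window: no LP can produce a new event before this bound.
    bound = min(q[0][0] for q in lps.values() if q) + LOOKAHEAD
    executed = []
    for name, q in lps.items():
        while q and q[0][0] < bound:
            executed.append((name, heapq.heappop(q)))
    return bound, executed

bound, done = conservative_round(lps)
print(f"window < {bound}: executed {done}")
# Each round all LPs advance in parallel up to the shared bound; a small
# lookahead means small windows and more synchronization overhead.
```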
Communication Optimization¶
Solution: model and search the parameter space of collective communication libraries, dynamically selecting optimal configurations under real training/analysis workloads.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2024 | SIGCOMM | UPenn&Microsoft | Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem | traffic engineering based collective communication optimization; mixed-integer linear program; A* technique for scaling | 4 | 3 | 2 |
| 2025 | NSDI | USTC | AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training | low-level performance parameters tuning; subspace division and intra-subspace coordinate descent search algorithms | 4 | 4 | 2 |
| 2025 | SC | THU | TraceFlow: Efficient Trace Analysis for Large-Scale Parallel Applications via Interaction Pattern-Aware Trace Distribution | communication skeleton tree; interaction-aware trace distribution; communication-minimized trace shuffling | 4 | 4 | 2 |
Fail-Slow Detection¶
Solution: investigating how modern distributed systems detect and mitigate fail-slow behaviors, focusing on the design of detection mechanisms, threshold policies, and recovery strategies to improve system resilience under partial performance degradation
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2019 | ATC | UChicago | IASO: A Fail-Slow Detection and Mitigation Framework for Distributed Storage Services | slowdown detection based on peer score; sub-root causes for five kinds of root causes | |||
| 2023 | FAST | SJTU & Alibaba | PERSEUS: A Fail-Slow Detection Framework for Cloud Storage Systems | outlier data detection; regression model for detection threshold; risk evaluating algorithm | 4 | 4 | 3 |
| 2025 | ASPDAC | Xiamen University | A Fail-Slow Detection Framework for HBM Devices | outlier data detection; regression model for detection threshold; risk evaluating algorithm | 2 | 4 | 2 |
| 2025 | NSDI | SJTU & UMich | One-Size-Fits-None: Understanding and Enhancing Slow-Fault Tolerance in Modern Distributed Systems | adaptive detection at runtime(ADR); slow-fault injection pipeline; danger zone analysis | 4 | 4 | 2 |
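A toy sketch of the peer-comparison idea behind these detectors: the same workload is timed across peer nodes, and a node whose median latency far exceeds the peer consensus is flagged as fail-slow rather than failed. The 2x slowdown threshold and synthetic latencies are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Per-node request latencies (ms) for the same workload; node3 is fail-slow.
latency = {f"node{i}": rng.normal(10.0, 1.0, 200) for i in range(5)}
latency["node3"] += 25.0

def flag_fail_slow(latency, slowdown=2.0):
    medians = {n: float(np.median(x)) for n, x in latency.items()}
    peer = float(np.median(list(medians.values())))       # peer consensus
    # Flag nodes whose median latency exceeds the peer consensus by 2x.
    return [n for n, m in medians.items() if m > slowdown * peer]

print(flag_fail_slow(latency))   # ['node3']
```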
Process Variation Management¶
Solution: Use software-level techniques to manage or exploit the on-chip heterogeneity caused by process variation; use adaptive resource allocation and workload management to improve overall performance, power efficiency, and reliability.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2014 | DATE | UW-Madison | Process Variation-Aware Workload Partitioning Algorithms for GPUs Supporting Spatial-Multitasking | per-SM clocking (PSMC); process variation-aware SM-to-application assignment | 3 | 3 | 2 |
| 2016 | CSUR | ORNL | A Survey of Architectural Techniques for Managing Process Variation | PV-aware processor management; specific component targeted management | 3 | 1 | 1 |
Dynamic Voltage and Frequency Scaling¶
Solution: Develop DVFS control policies to optimize the trade-offs among performance, power consumption, and thermal constraints.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2021 | TCAD | YNU | CARTAD: Compiler-Assisted Reinforcement Learning for Thermal-Aware Task Scheduling and DVFS on Multicores | XGBoost based intrinsic feature identification; RL-Based scheduler | 3 | 3 | 2 |
| 2023 | JETCAS | Uppsala | Game-of-Life Temperature-Aware DVFS Strategy for Tile-Based Chip Many-Core Processors | GoL temperature-aware DVFS; core/un-core performance characterizer; GoL DVFS controller | 4 | 3 | 2 |
| 2024 | ISCA | IBM&CU | BlitzCoin: Fully Decentralized Hardware Power Management for Accelerator-Rich SoCs | coin exchange algorithm; unified voltage and frequency regulation | 3 | 4 | 2 |
| 2024 | DATE | KIT | Multi-Agent Reinforcement Learning for Thermally-Restricted Performance Optimization on Manycores | RL-based thermally-restricted performance optimization; multi-agent based per-core DVFS | 4 | 3 | 2 |
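A toy per-core DVFS control loop illustrating the trade-off these policies optimize (not any paper's controller): raise frequency when utilization is high and thermal headroom remains, drop it near the temperature cap or when mostly idle. The frequency levels and thresholds are made-up values.

```python
FREQ_STEPS = [0.8, 1.2, 1.6, 2.0, 2.4]       # GHz levels (illustrative)
TEMP_CAP = 85.0                               # degrees C

def next_level(level, utilization, temperature):
    """Simple per-interval DVFS decision for one core."""
    if temperature >= TEMP_CAP - 5.0:         # thermal headroom nearly gone
        return max(level - 1, 0)              # step the frequency down
    if utilization > 0.8:
        return min(level + 1, len(FREQ_STEPS) - 1)
    if utilization < 0.3:
        return max(level - 1, 0)              # save power when mostly idle
    return level

level = 2
for util, temp in [(0.9, 60.0), (0.95, 70.0), (0.9, 83.0), (0.2, 75.0)]:
    level = next_level(level, util, temp)
    print(f"util={util:.2f} temp={temp:.0f}C -> {FREQ_STEPS[level]} GHz")
```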
Data structures¶
Solution: organizing and storing data efficiently to enable fast access, modification, and processing.
Dynamic Graph Processing¶
Solution: data structures for processing dynamic graphs, which are graphs that change over time.
Architecture-specific Data Structures¶
Solution: Data structures targeting specific hardware architectures.
| Year | Venue | Authors | Title | Tags | P | E | N |
|---|---|---|---|---|---|---|---|
| 2023 | TKDE | PKU | An Efficient Data Structure for Dynamic Graph on GPUs | leveled packed memory array; redundancy-free top-down re-balancing method; concurrent strategy Opera | 4 | 4 | 3 |
| 2024 | VLDB | PKU | Towards Sufficient GPU-Accelerated Dynamic Graph Management: Survey and Experiment | topology structure; attribute storage; auxiliary structures | 4 | 4 | 2 |
Computational complexity¶
Solution: analyzing and classifying how the time and space requirements of an algorithm grow as the input size increases.
Computability theory¶
Solution: helping to identify the fundamental limits of what can be computed, regardless of time or space constraints.