✨ NeurIPS 2025 Spotlight ✨
We present LeMiCa, a training-free and efficient acceleration framework for diffusion-based video generation. While existing caching strategies primarily focus on reducing local heuristic errors, they often overlook the accumulation of global errors, leading to noticeable content degradation between accelerated and original videos. To address this issue, we formulate cache scheduling as a directed graph with error-weighted edges and introduce a Lexicographic Minimax Path Optimization strategy that explicitly bounds the worst-case path error. This approach substantially improves the consistency of global content and style across generated frames. Extensive experiments on multiple text-to-video benchmarks demonstrate that LeMiCa delivers dual improvements in both inference speed and generation quality. Notably, our method achieves a 2.9× speedup on the Latte model and reaches an LPIPS score of 0.05 on Open-Sora, outperforming prior caching techniques. Importantly, these gains come with minimal perceptual quality degradation, making LeMiCa a robust and generalizable paradigm for accelerating diffusion-based video generation. We believe this approach can serve as a strong foundation for future research on efficient and reliable video synthesis.
Existing cache methods rely on local error metrics (e.g., the L1 relative distance between adjacent timesteps) and fixed thresholds to decide when to cache, which suffers from two key limitations: local step-wise metrics do not capture how errors accumulate globally across the denoising trajectory, and fixed thresholds make greedy per-step decisions that can behave inconsistently across timesteps.
LeMiCa formulates cache scheduling as a global optimization problem on a directed acyclic graph (DAG). Each node represents a timestep, and an edge (i → j) denotes performing full inference at timesteps i and j while reusing cached results for the steps in between. The goal is to control global error accumulation while reducing computation.
$$L_1^{\mathrm{glob}}(i \rightarrow j) = \frac{1}{N} \left\| x_0^{\mathrm{cache}(i \rightarrow j)} - x_0^{\mathrm{original}} \right\|_1$$
Here, \(x_0^{\mathrm{original}}\) is the output produced without caching, \(x_0^{\mathrm{cache}(i \rightarrow j)}\) is the output after reusing cached results between timesteps i and j, and \(N\) is the number of elements in the output. This metric reflects the real downstream impact of caching rather than local step-wise differences.
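The edge weight above can be sketched as follows. This is a minimal illustration, not the repository's implementation: `generate` is a hypothetical helper that runs the diffusion pipeline, performing full inference only at the given timesteps and reusing cached features elsewhere.

```python
import numpy as np

def edge_weight(i, j, generate, x0_original):
    """Global L1 error w(i -> j): run full inference at steps i and j,
    reuse cached results for the steps in between, and compare the
    final output against the uncached baseline."""
    x0_cached = generate(steps_to_compute={i, j})
    # (1/N) * || x0_cached - x0_original ||_1, with N the element count
    return np.abs(x0_cached - x0_original).mean()
```

In practice these weights would be measured once on a small calibration set and then reused for path planning.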
Given a computation budget \(B\), LeMiCa finds a path from the start node \(s\) to the end node \(t\) that minimizes the worst error along the path:
$$\min_{P \in \mathcal{P}^{(B)}_{s \to t}} \mathrm{LexMax}\big(\mathrm{sort}_{desc}\{w(e) \mid e \in P_{cache}\}\big)$$
The lexicographic minimax strategy first reduces the largest cache error, then the next largest, ensuring smooth and consistent generation. This global planning avoids the unstable behavior of local greedy caching and achieves both speedup and fidelity.
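The planning step can be sketched with a small dynamic program over the timestep DAG. This is an illustrative sketch under simplifying assumptions, not the paper's code: `w` is a dict of precomputed edge weights `w[(i, j)]` for all `i < j`, and the budget `B` counts the edges (cache intervals) allowed on the path from node 0 to node `T`.

```python
def lexmax_path(T, w, B):
    """Return the 0 -> T path with at most B edges whose descending-sorted
    edge-weight vector is lexicographically minimal (first shrink the
    largest cached-interval error, then the next largest, and so on)."""
    # best[(node, edges_used)] = (weights sorted descending, path so far)
    best = {(0, 0): ((), [0])}
    for _ in range(B):
        nxt = dict(best)  # also keep paths that use fewer than B edges
        for (u, k), (key, path) in best.items():
            for v in range(u + 1, T + 1):
                cand_key = tuple(sorted(key + (w[(u, v)],), reverse=True))
                state = (v, k + 1)
                if state not in nxt or cand_key < nxt[state][0]:
                    nxt[state] = (cand_key, path + [v])
        best = nxt
    # among all arrivals at T within budget, pick the lexicographic minimum
    goals = [val for (node, k), val in best.items() if node == T]
    return min(goals)[1]
```

With a larger budget the planner naturally prefers many short, low-error cache intervals; with a tight budget it spends its full-inference steps where skipping would hurt most.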
Comparison of the inference efficiency and visual quality of LeMiCa and other acceleration methods on the Qwen-Image model.
@inproceedings{gao2025lemica,
title = {LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation},
author = {Huanlin Gao and Ping Chen and Fuyuan Shi and Chao Tan and Zhaoxiang Liu and Fang Zhao and Kai Wang and Shiguo Lian},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2025},
url = {https://arxiv.org/abs/2511.00090}
}