Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation

✨ NeurIPS 2025 Spotlight ✨

Huanlin Gao1,2 Ping Chen1,2 Fuyuan Shi1,2 Chao Tan1,2 Zhaoxiang Liu1,2
Fang Zhao1,2 Kai Wang1,2 Shiguo Lian1,2
1Data Science & Artificial Intelligence Research Institute, China Unicom,  2Unicom Data Intelligence, China Unicom
(* Equal contribution. † Corresponding author.)

Keyframe Comparison

Abstract

We present LeMiCa, a training-free and efficient acceleration framework for diffusion-based video generation. While existing caching strategies primarily focus on reducing local heuristic errors, they often overlook the accumulation of global errors, leading to noticeable content degradation between accelerated and original videos. To address this issue, we formulate cache scheduling as a directed graph with error-weighted edges and introduce a Lexicographic Minimax Path Optimization strategy that explicitly bounds the worst-case path error. This approach substantially improves the consistency of global content and style across generated frames. Extensive experiments on multiple text-to-video benchmarks demonstrate that LeMiCa delivers dual improvements in both inference speed and generation quality. Notably, our method achieves a 2.9× speedup on the Latte model and reaches an LPIPS score of 0.05 on Open-Sora, outperforming prior caching techniques. Importantly, these gains come with minimal perceptual quality degradation, making LeMiCa a robust and generalizable paradigm for accelerating diffusion-based video generation. We believe this approach can serve as a strong foundation for future research on efficient and reliable video synthesis.


LeMiCa Teaser Figure
Figure 1: Comparison between LeMiCa (globally controlled cache) and traditional local greedy cache methods. Top: Schematic of different caching strategies. Bottom: LeMiCa outperforms baselines in consistency and speed.

LeMiCa

Motivation: Limitations of Local-Greedy Caching

Existing cache methods rely on local error metrics (e.g., the relative L1 distance between outputs at adjacent timesteps) and fixed thresholds to decide when to cache, an approach that suffers from two key limitations:

  1. Ignoring Temporal Heterogeneity: Denoising timesteps vary in noise level and semantic importance (early steps shape global structure, late steps refine details), so a single uniform threshold cannot accurately decide when caching is safe.
  2. Global Error Accumulation: Minimizing error locally overlooks how errors propagate over the remaining trajectory, leading to content inconsistency and quality degradation in the final videos.
Local vs Global Error
(a) Local vs. Global Cache Error Control
Segment-wise Global Error
(b) Segment-wise Error Trends
Figure 2: Rethinking cache reuse in denoising diffusion via error estimation. (a) The traditional Local-Greedy ($L_1^{rel}$) strategy uses fixed thresholds on local output differences between adjacent timesteps to decide when to cache. This assumes uniform temporal sensitivity, which can be misleading—for instance, caching at $t_2$ yields lower final error than $t_1$, despite $t_1$ seeming smoother locally. This highlights the role of temporal heterogeneity. (b) Our Global Outcome-Aware (segment-wise error) strategy estimates final output error when caching outputs over segments of length $len$, starting from timestep $i$. The plot shows that early caches cause greater error, supporting an outcome-sensitive, trajectory-aware strategy over fixed local heuristics.
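As a concrete illustration, the Local-Greedy baseline can be sketched as follows. This is a minimal toy, not the exact implementation of any prior method: the `outputs` list, the `threshold` value, and the function names are all hypothetical.

```python
import numpy as np

def l1_rel(curr, prev):
    """Relative L1 change between model outputs at adjacent timesteps."""
    return np.abs(curr - prev).mean() / (np.abs(prev).mean() + 1e-8)

def local_greedy_schedule(outputs, threshold=0.05):
    """Mark a timestep for cache reuse when the local change is small.

    `outputs` is a list of model outputs, one per timestep (toy stand-ins
    here). The first step always runs full inference.
    """
    reuse = [False]
    for prev, curr in zip(outputs, outputs[1:]):
        reuse.append(l1_rel(curr, prev) < threshold)
    return reuse
```

The fixed `threshold` is exactly the uniform temporal-sensitivity assumption that Figure 2(a) questions: a step can look locally smooth yet cause a large final-output error when cached.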

Implementation

LeMiCa formulates cache scheduling as a global optimization problem on a directed acyclic graph (DAG). Each node represents a timestep, and each edge (i → j) means full inference is done at i and j while reusing cached results in between. The goal is to control global error accumulation while reducing computation.

$$L_1^{glob}(i \rightarrow j) = \frac{1}{N} \| x^{cache(i \rightarrow j)}_0 - x^{original}_0 \|_1$$

Here, \(x^{original}_0\) is the output without caching, and \(x^{cache(i \rightarrow j)}_0\) is the output after reusing intermediate caches. This metric reflects the real downstream impact of caching rather than local step-wise differences.
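A minimal sketch of how such an edge weight could be measured, using a toy scalar "denoiser" in place of a real diffusion model (the update rule, step count, and inputs below are all stand-in assumptions):

```python
import numpy as np

def run(num_steps, x0, cached_segment=None):
    """Run a toy denoising loop. Inside `cached_segment` = (i, j), reuse the
    model output computed at step i instead of re-evaluating the model."""
    x, reused = x0.copy(), None
    for t in range(num_steps):
        if cached_segment and cached_segment[0] < t < cached_segment[1]:
            delta = reused                   # reuse the cached model output
        else:
            delta = -0.1 * np.sin(x + t)     # "full inference" at this step
            reused = delta
        x = x + delta
    return x

def l1_glob(i, j, num_steps, x0):
    """Global error of caching over the segment i -> j, per the definition above."""
    ref = run(num_steps, x0)                          # x_0^original
    cached = run(num_steps, x0, cached_segment=(i, j))  # x_0^cache(i -> j)
    return np.mean(np.abs(cached - ref))
```

Because the metric compares final outputs, an adjacent edge (j = i + 1) caches nothing and has zero error, while longer segments accumulate the true downstream deviation.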

Given a computation budget \(B\), LeMiCa finds a path from the start node \(s\) to the end node \(t\) that minimizes the worst error along the path:

$$\min_{P \in \mathcal{P}^{(B)}_{s \to t}} \mathrm{LexMax}\big(\mathrm{sort}_{desc}\{w(e) \mid e \in P_{cache}\}\big)$$

The lexicographic minimax strategy first reduces the largest cache error, then the next largest, ensuring smooth and consistent generation. This global planning avoids the unstable behavior of local greedy caching and achieves both speedup and fidelity.
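The lexicographic comparison itself is easy to make concrete: sorting each path's cache-edge errors in descending order and comparing the resulting vectors elementwise is exactly the LexMax order. The sketch below uses brute-force enumeration over a toy weight model rather than the paper's graph search, and the weights are purely hypothetical.

```python
from itertools import combinations

def lexmax_key(edges, w):
    """Edge errors sorted in descending order; Python compares such lists
    lexicographically, which realizes the LexMax objective."""
    return sorted((w[e] for e in edges), reverse=True)

def best_schedule(num_steps, budget, w):
    """Choose `budget` full-inference timesteps (always keeping the first and
    last) whose cache-segment error vector is lexicographically minimal."""
    best_nodes, best_key = None, None
    for chosen in combinations(range(1, num_steps - 1), budget - 2):
        nodes = (0, *chosen, num_steps - 1)
        key = lexmax_key(list(zip(nodes, nodes[1:])), w)
        if best_key is None or key < best_key:
            best_nodes, best_key = nodes, key
    return best_nodes, best_key

# Hypothetical edge weights: longer cached segments, and caching earlier in
# the trajectory, both cost more (cf. Figure 2b). Adjacent edges cost zero.
steps = 5
w = {(i, j): (j - i - 1) * (steps - i)
     for i in range(steps) for j in range(i + 1, steps)}
```

With this toy model, `best_schedule(steps, 3, w)` selects timesteps (0, 2, 4): splitting the cached span in the middle keeps the worst segment error smaller than caching a long early stretch. A dynamic program over the DAG would replace the brute-force loop at scale.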

Evaluations

Quantitative Results

Qualitative and Ablation Results

Visual Results

Open-Sora

Latency per method:
Original: 26.54s
TeaCache-slow: 17.58s
LeMiCa-slow: 17.43s
TeaCache-fast: 12.63s
LeMiCa-fast: 10.86s

Extension to Text-to-Image

Qwen-Image

Latency per budget:
Original: 31.22s
LeMiCa (B=25): 17.93s
LeMiCa (B=17): 13.52s
LeMiCa (B=10): 9.84s
Qwen-Image visual result


Comparison of the inference efficiency and visual quality of LeMiCa and other acceleration methods on Qwen-Image.

Table comparing LeMiCa with other acceleration methods on Qwen-Image

BibTeX

    
  @inproceedings{gao2025lemica,
    title     = {LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation},
    author    = {Huanlin Gao and Ping Chen and Fuyuan Shi and Chao Tan and Zhaoxiang Liu and Fang Zhao and Kai Wang and Shiguo Lian},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
    year      = {2025},
    url       = {https://arxiv.org/abs/2511.00090}
  }