Abstract
A framework in which a generative language model serves as the optimizer for complex systems, combining stochastic generative optimization with prioritized exploration and contextual aggregation to achieve efficient convergence in both deterministic and stochastic environments.
Optimizing complex systems, ranging from LLM prompts to multi-turn agents, traditionally requires labor-intensive manual iteration. We formalize this challenge as a stochastic generative optimization problem in which a generative language model acts as the optimizer, guided by numerical rewards and text feedback to discover the best system. We introduce Prioritized Optimization with Local Contextual Aggregation (POLCA), a scalable framework designed to handle stochasticity in optimization -- such as noisy feedback, minibatch sampling, and stochastic system behaviors -- while effectively managing the unconstrained expansion of the solution space. POLCA maintains a priority queue to manage the exploration-exploitation tradeoff, systematically tracking candidate solutions and their evaluation histories. To enhance efficiency, we integrate an ε-Net mechanism to maintain parameter diversity and an LLM Summarizer to perform meta-learning across historical trials. We theoretically prove that POLCA converges to near-optimal candidate solutions under stochasticity. We evaluate our framework on diverse benchmarks, including τ-bench and HotpotQA (agent optimization), VeriBench (code translation), and KernelBench (CUDA kernel generation). Experimental results demonstrate that POLCA achieves robust, sample- and time-efficient performance, consistently outperforming state-of-the-art algorithms on both deterministic and stochastic problems. The codebase for this work is publicly available at https://github.com/rlx-lab/POLCA.
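The priority-queue bookkeeping described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the candidate structure, the UCB-style priority score, and the exploration constant are our assumptions about how one might track candidates and their evaluation histories under noisy rewards.

```python
import heapq
import math

class Candidate:
    """Hypothetical candidate system (e.g. a prompt) with its
    evaluation history of noisy scalar rewards."""
    def __init__(self, params):
        self.params = params
        self.rewards = []

    def record(self, reward):
        self.rewards.append(reward)

    def priority(self, total_evals, c=1.0):
        # UCB-style score: mean reward (exploitation) plus a bonus
        # that shrinks as the candidate accumulates evaluations
        # (exploration). Unevaluated candidates are tried first.
        n = len(self.rewards)
        if n == 0:
            return float("inf")
        mean = sum(self.rewards) / n
        bonus = c * math.sqrt(math.log(total_evals + 1) / n)
        return mean + bonus

def pop_best(candidates, total_evals):
    # heapq is a min-heap, so negate the priority to pop the max.
    heap = [(-cand.priority(total_evals), i)
            for i, cand in enumerate(candidates)]
    heapq.heapify(heap)
    _, idx = heapq.heappop(heap)
    return candidates[idx]
```

In an optimization loop, the popped candidate would be mutated by the LLM optimizer, re-evaluated on a minibatch, and pushed back with its updated history.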
Community
We found that LLM-based optimization loops can benefit from Gemini embeddings -- we propose an ε-net mechanism to accept or reject candidate proposals. This simple mechanism has theoretical guarantees and also performs well empirically, beating GEPA and OpenEvolve.
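The accept/reject rule can be sketched in a few lines. This is our illustration of a generic greedy ε-net, not the paper's exact procedure: the cosine-distance metric, the threshold value, and the function names are assumptions. A new proposal's embedding is kept only if it lies at least ε away from every embedding already accepted, so the retained candidate set stays diverse.

```python
import numpy as np

def accepts(net_embeddings, new_embedding, eps):
    """Accept new_embedding only if its cosine distance to every
    embedding already in the net is at least eps."""
    for e in net_embeddings:
        sim = float(np.dot(e, new_embedding)
                    / (np.linalg.norm(e) * np.linalg.norm(new_embedding)))
        if 1.0 - sim < eps:   # too close to an existing point: reject
            return False
    return True

def epsilon_net(embeddings, eps):
    """Greedily build an epsilon-net over a stream of embeddings."""
    net = []
    for emb in embeddings:
        if accepts(net, emb, eps):
            net.append(emb)
    return net
```

In practice each candidate prompt or program would first be embedded (e.g. with an embedding model such as Gemini's), and near-duplicate proposals would be filtered out before spending evaluations on them.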
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents (2026)
- GFlowPO: Generative Flow Network as a Language Model Prompt Optimizer (2026)
- ContextEvolve: Multi-Agent Context Compression for Systems Code Optimization (2026)
- AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization (2026)
- MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation (2026)
- Decoding ML Decision: An Agentic Reasoning Framework for Large-Scale Ranking System (2026)
- Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning (2026)