Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges
Abstract
A tool-augmented Llama-3.1-8B model, fine-tuned with GRPO, demonstrates improved performance on random-crypto CTF challenges and generalizes to an external dataset through more reliable Python code synthesis and tool invocation.
Large Language Models (LLMs) still struggle with the structured reasoning and tool-assisted computation needed for problem solving in cybersecurity applications. In this work, we introduce "random-crypto", a cryptographic Capture-the-Flag (CTF) challenge generator framework that we use to fine-tune a tool-augmented Llama-3.1-8B with Group Relative Policy Optimization (GRPO), allowing the agent to iteratively write and execute Python inside an isolated REPL. GRPO yields a +53 percentage point (pp) absolute jump in Pass@8 on unseen "random-crypto" tasks (0.35 -> 0.88) and raises Majority@8 to 0.41. The fine-tuned agent also generalizes to an external dataset: on a subset of picoCTF cryptography problems, it improves Pass@8 by +13 pp. Ablations show the gains stem from more reliable tool invocation and code synthesis rather than superficial prompt adaptation.
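To make the reported metrics and the GRPO training signal concrete, the sketch below scores Pass@8 and Majority@8 for a single task and computes the group-relative advantage normalization at the core of GRPO. This is a minimal illustration under stated assumptions, not the paper's actual harness: the function names, the binary flag-recovery reward, and the example flags are all hypothetical.

```python
import statistics

def pass_at_k(rollout_flags: list[str], true_flag: str) -> bool:
    """Pass@k: the task counts as solved if any of the k rollouts
    submits the correct flag."""
    return any(flag == true_flag for flag in rollout_flags)

def majority_at_k(rollout_flags: list[str], true_flag: str) -> bool:
    """Majority@k: the task counts as solved only if the most frequent
    submitted answer across the k rollouts is the correct flag."""
    most_common = max(set(rollout_flags), key=rollout_flags.count)
    return most_common == true_flag

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages as in GRPO: normalize each rollout's
    reward by the mean and standard deviation of its sampling group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mu) / sigma for r in rewards]

# Example: one task, 8 rollouts, binary reward (flag recovered or not).
flags = ["flag{x}", "wrong", "flag{x}", "", "flag{x}", "wrong", "flag{x}", "flag{x}"]
rewards = [1.0 if f == "flag{x}" else 0.0 for f in flags]
print(pass_at_k(flags, "flag{x}"))      # True (at least one rollout solved it)
print(majority_at_k(flags, "flag{x}"))  # True (5 of 8 rollouts agree)
print(grpo_advantages(rewards))         # positive for solved, negative otherwise
```

Pass@8 credits a task if any rollout recovers the flag, while Majority@8 is stricter, requiring agreement among rollouts; this ordering is consistent with the abstract's Majority@8 (0.41) trailing Pass@8 (0.88).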