DryRUN framework generates code without public test cases
DryRUN is a new multi-agent code generation framework that eliminates the need for human-provided public test cases by having LLMs autonomously generate their own inputs and simulate execution traces to self-correct.
Teams building agentic coding pipelines for real-world software engineering — where public test cases don't exist before implementation — can use DryRUN's approach to achieve competitive code generation quality without the manual overhead of authoring input-output examples.
Existing multi-agent code generation frameworks that use simulation-driven planning and debugging, such as CodeSIM, depend on human-authored public test cases to ground their debugging and simulation loops. The paper identifies two core problems with this dependency. First, manually authoring comprehensive input-output examples is a labor-intensive bottleneck. Because ground-truth examples are rarely available before implementation in real-world software engineering, this requirement restricts these methods to curated competitive programming benchmarks. Second, reliance on public tests induces what the paper calls an "overconfidence gap": frameworks overfit to simplistic examples and then fail on hidden evaluations.
DryRUN addresses both problems by demonstrating that external sample inputs are not strictly necessary for code generation. The framework allows an LLM to iteratively plan, autonomously generate its own valid inputs, and simulate execution traces to self-correct — entirely without ground-truth samples or external execution feedback. Evaluations on the LiveCodeBench v6 dataset (post-March 2025) show that DryRUN matches the performance of CodeSIM, the state-of-the-art public-test-dependent framework, while also reducing output token consumption. This positions DryRUN as a more practical approach for real-world software engineering contexts where public test cases are not available in advance.
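The loop described above (plan, generate inputs, simulate a trace, self-correct) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function `dry_run_generate`, the prompt tags, the `OK` verdict convention, and the scripted stand-in model are all hypothetical, and the summary does not describe DryRUN's actual agent roles, prompts, or stopping criteria. The one property the sketch tries to preserve is that the candidate code is never executed and no ground-truth outputs are consulted; every signal comes from the model itself.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class DryRunResult:
    code: str       # final candidate program
    rounds: int     # number of simulate-and-check rounds used
    accepted: bool  # True if the model judged its own simulated trace correct


def dry_run_generate(problem: str, llm: LLM, max_rounds: int = 3) -> DryRunResult:
    """Plan, write code, then self-correct using model-simulated traces only."""
    plan = llm(f"PLAN\n{problem}")
    code = llm(f"CODE\n{problem}\n{plan}")
    for round_no in range(1, max_rounds + 1):
        # The model invents its own inputs; no public test cases are supplied.
        inputs = llm(f"INPUTS\n{problem}")
        # The model simulates execution step by step; the code is never run.
        trace = llm(f"TRACE\n{problem}\n{code}\n{inputs}")
        # The model compares the simulated result against the problem statement.
        verdict = llm(f"VERDICT\n{problem}\n{trace}")
        if verdict.strip().upper().startswith("OK"):
            return DryRunResult(code, round_no, True)
        code = llm(f"REVISE\n{problem}\n{code}\n{trace}")
    return DryRunResult(code, max_rounds, False)


def make_scripted_llm() -> LLM:
    """Stand-in model with canned replies so the sketch runs offline (assumption)."""
    state = {"revised": False}

    def scripted(prompt: str) -> str:
        tag = prompt.split("\n", 1)[0]
        if tag == "PLAN":
            return "Return the sum of the two integers."
        if tag == "CODE":
            return "def add(a, b): return a - b"  # deliberately buggy first draft
        if tag == "INPUTS":
            return "a=2, b=3"
        if tag == "TRACE":
            return "returns 5" if state["revised"] else "returns -1"
        if tag == "VERDICT":
            return "OK" if "returns 5" in prompt else "BUG: expected 5"
        if tag == "REVISE":
            state["revised"] = True
            return "def add(a, b): return a + b"
        raise ValueError(f"unexpected prompt tag: {tag}")

    return scripted


if __name__ == "__main__":
    result = dry_run_generate("Add two integers.", make_scripted_llm())
    print(result.rounds, result.accepted)  # 2 True
```

With the scripted model, the first simulated trace exposes the subtraction bug, the revision step replaces the code, and the second round is accepted. In a real pipeline `llm` would wrap an actual model call, and the quality of the result would depend on how faithfully the model simulates execution.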
Key facts
- 01 DryRUN is a multi-agent code generation framework that requires no human-provided public test cases or external execution feedback.
- 02 Existing frameworks like CodeSIM depend on ground-truth input-output examples, restricting them to curated competitive programming benchmarks.
- 03 The paper identifies an "overconfidence gap": reliance on public tests causes frameworks to overfit to simple examples and fail on hidden evaluations.
- 04 DryRUN lets LLMs autonomously generate their own inputs and simulate execution traces to iteratively self-correct.
- 05 DryRUN was evaluated on the LiveCodeBench v6 dataset (post-March 2025).
- 06 DryRUN matches CodeSIM's performance while also reducing output token consumption.