GitHub - modal-labs/devlooper: A program synthesis agent that autonomously fixes its output by running tests!

`devlooper` is a program synthesis agent that autonomously corrects its output by running tests. The project extends smol developer by giving it access to a sandbox in which to run tests, iterating until all tests pass. It uses environment "templates" to define the basic setup and test harness for a given language or framework; current templates cover React + Jest, Python, and Rust. In each iteration the agent runs the environment's test command, and if the command returns a non-zero exit code, the agent passes the `stdout` and `stderr` from the sandbox to the LLM to diagnose the error.
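The test-driven loop described above can be sketched as follows. This is a minimal illustration, not devlooper's actual code: it assumes a local `subprocess` call as a stand-in for the remote sandbox, and the names `TestResult`, `run_test_command`, `debug_loop`, and `diagnose_and_fix`, as well as the `max_iterations` cap, are hypothetical additions for the example.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestResult:
    """Outcome of one run of the environment's test command."""
    exit_code: int
    stdout: str
    stderr: str


def run_test_command(cmd: list[str], cwd: str = ".") -> TestResult:
    # Local stand-in for the sandbox: run the test command and capture both streams.
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return TestResult(proc.returncode, proc.stdout, proc.stderr)


def debug_loop(
    run_tests: Callable[[], TestResult],
    diagnose_and_fix: Callable[[str, str], None],
    max_iterations: int = 5,  # hypothetical safety cap, not from the source
) -> bool:
    """Rerun the tests until they pass or the iteration budget runs out."""
    for _ in range(max_iterations):
        result = run_tests()
        if result.exit_code == 0:
            return True  # all tests pass, stop iterating
        # Non-zero exit code: hand the captured output to the (hypothetical) LLM step.
        diagnose_and_fix(result.stdout, result.stderr)
    return False
```

Passing `run_tests` and `diagnose_and_fix` in as callables keeps the loop independent of where the tests run and which LLM does the diagnosis.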

The diagnosis is used to generate a `DebugPlan` consisting of three types of actions: inspect and fix a file, install a package in the image, and run commands in the image. Users can generate their own programs by running devlooper with a `prompt` and a `template` of their choice. The project is currently a proof of concept; future directions include allowing user feedback in the loop, improving the debugging prompt by including relevant parts of the code, and generalizing the approach to more LLMs.
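A plan made of these three action types maps naturally onto a small tagged union. The sketch below is an assumption for illustration only: apart from the name `DebugPlan`, every class, field, and the `describe` helper are hypothetical and do not reflect devlooper's real definitions.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class FixFile:
    path: str  # hypothetical: file the agent should inspect and rewrite


@dataclass
class InstallPackage:
    package: str  # hypothetical: package to install in the image


@dataclass
class RunCommand:
    command: str  # hypothetical: shell command to run in the image


# One action is exactly one of the three kinds named in the text.
DebugAction = Union[FixFile, InstallPackage, RunCommand]


@dataclass
class DebugPlan:
    diagnosis: str
    actions: list[DebugAction] = field(default_factory=list)


def describe(plan: DebugPlan) -> list[str]:
    """Turn each action into a short human-readable step, in plan order."""
    steps = []
    for action in plan.actions:
        if isinstance(action, FixFile):
            steps.append(f"fix {action.path}")
        elif isinstance(action, InstallPackage):
            steps.append(f"install {action.package}")
        elif isinstance(action, RunCommand):
            steps.append(f"run {action.command}")
    return steps
```

Keeping the actions as separate types lets an executor dispatch on the kind of action rather than parsing free-form LLM text.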

Key takeaways:

`devlooper` is a program synthesis agent that autonomously fixes its output by running tests. For example, it can create a Python library that generates Voronoi diagrams.
The project uses environment "templates" to define the basic setup and test harness for a given language/framework. Current templates include React + Jest, Python, and Rust.
The agent runs the test command for the environment in each iteration. If a non-zero exit code is received, the agent passes the `stdout` and `stderr` from the sandbox to the LLM to diagnose the error.
The project is a proof of concept; future directions include allowing user feedback in the loop, improving the debugging prompt by including relevant parts of the code, and generalizing the approach to more LLMs.
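An environment template, as described in these takeaways, pairs a base setup with a test harness. The registry below is a hypothetical sketch: the template keys, the `EnvTemplate` fields, and the setup and test commands (`pytest`, `cargo test`, `npx jest`) are illustrative assumptions, not the contents of devlooper's real templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvTemplate:
    name: str
    setup_commands: tuple[str, ...]  # hypothetical: commands that prepare the base image
    test_command: tuple[str, ...]    # its exit code decides whether the loop stops


# Hypothetical registry covering the three languages/frameworks named in the text.
TEMPLATES = {
    "python": EnvTemplate("python", ("pip install pytest",), ("pytest",)),
    "rust": EnvTemplate("rust", (), ("cargo", "test")),
    "react-jest": EnvTemplate("react-jest", ("npm install",), ("npx", "jest")),
}


def get_template(name: str) -> EnvTemplate:
    """Look up a template by name, failing loudly on unknown names."""
    try:
        return TEMPLATES[name]
    except KeyError:
        raise ValueError(f"unknown template {name!r}; choose from {sorted(TEMPLATES)}") from None
```

Because only the test command's exit code matters to the loop, adding a new language would amount to adding one more registry entry.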
