SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code

The paper presents SceneCraft, a Large Language Model (LLM) agent that converts text descriptions into Python scripts that Blender executes to render complex 3D scenes. To handle spatial planning and arrangement, SceneCraft combines abstraction, planning, and library learning. It first builds a scene graph as a blueprint, then writes Python scripts from that graph, translating the relationships between assets into numerical constraints that determine asset layout. It also uses vision-language foundation models such as GPT-V to analyze the rendered images and iteratively refine the scene.
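The step from scene-graph relationships to numerical constraints can be illustrated with a small sketch. This is not SceneCraft's actual code: the `Asset` class, the relation names `proximity` and `left_of`, the scoring formulas, and the coordinates are all assumptions chosen for illustration, and the example runs in plain Python without Blender.

```python
import math
from dataclasses import dataclass


@dataclass
class Asset:
    """Hypothetical asset with a position in scene coordinates."""
    name: str
    x: float
    y: float
    z: float = 0.0


def proximity_score(a: Asset, b: Asset, max_dist: float) -> float:
    """1.0 when the assets coincide, falling linearly to 0.0 at max_dist or beyond."""
    d = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
    return max(0.0, 1.0 - d / max_dist)


def left_of_score(a: Asset, b: Asset) -> float:
    """1.0 if a has a smaller x coordinate than b, otherwise 0.0."""
    return 1.0 if a.x < b.x else 0.0


# Assumed mapping from relation names to numerical scoring functions.
RELATIONS = {"proximity": proximity_score, "left_of": left_of_score}

# Assumed scene-graph encoding: (relation, subject, object, parameters).
scene_graph = [
    ("proximity", "lamp", "desk", {"max_dist": 2.0}),
    ("left_of", "chair", "desk", {}),
]


def layout_score(assets, graph) -> float:
    """Average constraint satisfaction of a layout, in the range [0, 1]."""
    by_name = {a.name: a for a in assets}
    scores = [
        RELATIONS[rel](by_name[subj], by_name[obj], **params)
        for rel, subj, obj, params in graph
    ]
    return sum(scores) / len(scores)


assets = [
    Asset("desk", 0.0, 0.0),
    Asset("lamp", 1.0, 0.0),
    Asset("chair", -1.0, 0.5),
]
score = layout_score(assets, scene_graph)
print(f"layout score: {score:.2f}")
```

Scoring a layout this way gives a search procedure or a refinement loop a single number to improve, which is one plausible reason to express relationships as numerical constraints rather than leave them as free text.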

SceneCraft includes a library learning mechanism that compiles common script functions into a reusable library, allowing continuous self-improvement without costly LLM parameter tuning. In the evaluation, SceneCraft outperforms other LLM-based agents at rendering complex scenes, as measured by its adherence to constraints and by favorable human assessments. The paper also demonstrates broader applications: reconstructing detailed 3D scenes from the Sintel movie, and guiding a video generative model by using the generated scenes as an intermediate control signal.
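The general idea of a reusable function library that improves without touching model weights can be sketched as follows. The paper summary does not describe how SceneCraft stores or updates its library, so the `ScriptLibrary` class, its methods, and both versions of the `left_of` function are hypothetical.

```python
from typing import Callable, Dict, List, Optional


class ScriptLibrary:
    """Hypothetical in-memory store of reusable scoring functions."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., float]] = {}

    def register(self, name: str, fn: Callable[..., float]) -> None:
        # Registering under an existing name replaces the earlier version.
        self._functions[name] = fn

    def get(self, name: str) -> Optional[Callable[..., float]]:
        # Returns None when no function is stored under this name.
        return self._functions.get(name)

    def names(self) -> List[str]:
        return sorted(self._functions)


def left_of_v1(ax: float, bx: float) -> float:
    """First attempt: any smaller x coordinate counts as 'left of'."""
    return 1.0 if ax < bx else 0.0


def left_of_v2(ax: float, bx: float, margin: float = 0.5) -> float:
    """Refined attempt (assumed): require a visible gap of at least `margin`."""
    return 1.0 if bx - ax >= margin else 0.0


library = ScriptLibrary()
library.register("left_of", left_of_v1)
# After a refinement round, the improved function replaces the old one,
# so later scripts pick it up by name.
library.register("left_of", left_of_v2)

left_of = library.get("left_of")
print(library.names(), left_of(0.0, 0.2), left_of(0.0, 1.0))
```

The improvement lives entirely in stored code here, so reusing it on a new scene costs a lookup rather than a training run.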

Key takeaways:

The paper introduces SceneCraft, a Large Language Model (LLM) Agent that converts text descriptions into Python scripts for rendering complex 3D scenes in Blender.
SceneCraft uses a scene graph as a blueprint to detail spatial relationships among assets, then translates these relationships into numerical constraints for asset layout.
SceneCraft uses the perceptual strengths of vision-language foundation models like GPT-V to analyze rendered images and iteratively refine the scene.
SceneCraft also features a library learning mechanism that compiles common script functions into a reusable library, facilitating continuous self-improvement without the need for expensive LLM parameter tuning.
