
Author | Victor Bapst, Alvaro Sanchez-Gonzalez, Carl Doersch, Kimberly L. Stachenfeld

Translator | Linstancy

Editor | Yiyi

Produced by | AI Technology Base Camp (ID: rgznai100)

Abstract

Physical construction, the ability to build objects with desired functions according to the principles of physical dynamics, is a foundation of human intelligence. In this paper, inspired by children's block play, the researchers introduce a suite of challenging physical construction tasks, such as matching a target configuration, stacking and attaching blocks to connect objects, and building shelter-like structures that cover a target.

The authors then investigate how deep reinforcement learning agents can solve this suite of physical construction tasks. Experimental results show that agents using structured representations (such as objects and scene graphs) and structured policies (such as object-centric actions) achieve better task performance than agents with less structured representations. Structured agents also generalize better when the goal requires reasoning about scenes larger than those seen during training.

In addition, on most of the physical construction problems, model-based agents that plan via Monte-Carlo Tree Search achieve markedly better performance than model-free agents. In general, combining structured representation and reasoning with powerful learning is key to giving agents rich intuitive physics, scene understanding and planning capabilities.

Introduction

The real world is full of constructed structures, such as fortresses, pyramids and space stations. Can AI agents build such physical structures? That is the problem this study sets out to solve, by exploring ways for agents to learn this family of tasks.

A physical construction problem involves reasoning about physical dynamics and assembling multiple elements under constraints to achieve a functional goal. Figure 1 below shows a simulated suite of physical construction tasks, similar in spirit to children's block play: the agent must stack and attach multiple blocks to form objects with various functions. For example, one task requires stacking blocks around obstacles to connect target locations to the floor, while another requires building a shelter that covers target blocks and keeps them dry. These tasks reflect challenges found in real-world construction: they emphasize solving a problem for its function rather than simply copying a given configuration into a new environment, which mirrors the foresight and purposefulness of human construction and is closely tied to human intelligence.


Figure 1: Physical construction tasks.

In all tasks, dark blue objects are regular blocks, light blue blocks are sticky blocks, red objects are obstacles that must not be touched, and gray circles mark points where blocks have been glued together. The black line is the floor, separating the scene from the area below.

(a) Silhouette task: the agent matches the target blocks (shown in light green) by stacking blocks.

(b) Connecting task: the agent connects the small blue targets to the floor by stacking blocks.

(c) Covering task: the agent stacks blocks to cover the obstacles from above.

(d) Covering Hard task: similar to the Covering task, except that the agent can only move a limited number of blocks.

Although physical reasoning has a long history in traditional AI research, solving physical construction tasks with deep learning methods remains underexplored. This study explores what modern artificial intelligence can do in physical construction. Its main contributions are a systematic study of agents that use:

(1) structured scene representations, including vectors, sequences, images and graphs;

(2) continuous and discrete actions, in absolute or object-centric coordinates;

(3) model-free learning via deep Q-learning or actor-critic methods;

(4) planning via Monte-Carlo Tree Search (MCTS).

Physical construction tasks

The simulated task environment is a continuous, procedurally generated 2D world built with Unity and the Box2D physics engine. Each episode contains immovable obstacles, target objects and a floor, together with movable rectangular blocks that can be picked up and placed.

Each episode terminates when:

(1) a movable block touches an obstacle, or is placed so that it overlaps an obstacle;

(2) the maximum number of actions is exceeded; or

(3) a task-specific termination condition is reached. The task-specific conditions are as follows.

  • Silhouette task: as shown in Figure 1a, the agent must move rectangular blocks so that they overlap the target blocks in the scene, while avoiding contact with obstacles. The episode ends when every target is overlapped by at least 90%.
  • Connecting task: as shown in Figure 1b, the agent must stack rectangular blocks so that three target locations are connected to the floor, without placing blocks in the layers occupied by obstacles. The episode ends when all targets are connected to the floor.
  • Covering task: as shown in Figure 1c, the agent must build a shelter that covers all obstacles from above without touching them. The episode ends when more than 99% of each obstacle's surface is covered.
  • Covering Hard task: as shown in Figure 1d, the agent must again build a shelter over the obstacles, but longer-term planning is required: the supply of movable blocks is limited, the obstacles are distributed more densely, and sticky blocks are costly. This task combines the constraints of the three previous tasks, and its termination condition matches that of the Covering task.
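To make the episode logic concrete, here is a minimal sketch of the three termination checks, assuming simplified axis-aligned boxes; the names (Box, episode_terminated) are illustrative and not from the paper's code.

```python
# A minimal sketch of the episode termination logic described above.
from dataclasses import dataclass

@dataclass
class Box:
    x: float  # center x
    y: float  # center y
    w: float  # width
    h: float  # height

    def overlaps(self, other: "Box") -> bool:
        return (abs(self.x - other.x) * 2 < self.w + other.w and
                abs(self.y - other.y) * 2 < self.h + other.h)

def episode_terminated(placed, obstacles, num_actions, max_actions,
                       task_done) -> bool:
    """Return True if any of the three termination conditions holds."""
    # (1) A movable block touches or overlaps an obstacle.
    if any(b.overlaps(o) for b in placed for o in obstacles):
        return True
    # (2) The action budget is exhausted.
    if num_actions >= max_actions:
        return True
    # (3) The task-specific goal is reached (e.g. >= 90% Silhouette overlap).
    return task_done

# Example: a block placed overlapping an obstacle ends the episode.
print(episode_terminated([Box(0, 1, 1, 1)], [Box(0, 0.5, 1, 1)], 3, 10, False))  # True
```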

Agent

The agents are assembled from several interchangeable components: an observation format, an encoder (internal representation), a policy, an action format and a learning algorithm, as shown in Figure 2 below:


Figure 2: The agent architectures.

Observation format

Each construction task provides observations as object states, as images, or both. Both formats matter: ultimately, agents should be able to work from symbolic input, such as a computer-aided design representation, as well as from raw sensor input.

Encoder

Two types of internal representation are used to compute the policy's input: fixed-length vectors, and directed graphs with attributes. The CNN encoder embeds an input image into a vector representation, while the RNN encoder processes the sequence of object state vectors with a recurrent network. The graph encoder transforms a set of object state vectors into a graph, creating one node per object. The per-object CNN encoder produces a graph-based representation from images.
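As a rough illustration of the graph encoder idea, the sketch below turns per-object state vectors into a fully connected directed graph, so that downstream graph networks can access all pairwise relations. The dict layout loosely follows the convention of DeepMind's graph_nets library, but this standalone NumPy version and its field names are assumptions, not the paper's actual code.

```python
# Hypothetical graph encoder: one node per object, all pairwise edges.
import numpy as np

def states_to_graph(object_states: np.ndarray) -> dict:
    """object_states: (num_objects, state_dim) array, one row per object."""
    n = object_states.shape[0]
    senders, receivers = zip(*[(i, j) for i in range(n) for j in range(n) if i != j])
    return {
        "nodes": object_states,               # one node per object
        "senders": np.array(senders),         # edge source indices
        "receivers": np.array(receivers),     # edge target indices
        "edges": np.zeros((n * (n - 1), 1)),  # edge features, initially empty
        "globals": np.zeros(1),               # scene-level feature
    }

# Example: 3 objects with 4-dimensional states -> 6 directed edges.
g = states_to_graph(np.random.rand(3, 4))
print(g["senders"], g["receivers"])
```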

Policy

MLP policy: given a vector representation, a multi-layer perceptron outputs either actions or Q-values, depending on the learning algorithm used.

GN policy: given a graph-based representation from the graph encoder or the per-object CNN, a stack of three graph networks (GN) is applied, where the middle network is run for a configurable number of recurrent steps, following the "encode-process-decode" design.
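Below is a minimal NumPy sketch of this encode-process-decode pattern: an encoder lifts raw object states into latent node features, a recurrent core performs several message-passing steps, and a decoder reads out a per-node score. The single linear layers stand in for the paper's MLPs, and all shapes and names are illustrative assumptions.

```python
# Sketch of an encode-process-decode GN stack (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
D = 8  # latent size

W_enc  = rng.normal(size=(4, D))      # encoder: raw state -> latent node
W_msg  = rng.normal(size=(2 * D, D))  # core: sender+receiver -> message
W_node = rng.normal(size=(2 * D, D))  # core: node+aggregated msg -> node
W_dec  = rng.normal(size=(D, 1))      # decoder: latent node -> scalar score

def gn_policy(states, senders, receivers, num_steps=3):
    nodes = np.tanh(states @ W_enc)                       # encode
    for _ in range(num_steps):                            # process (recurrent core)
        msgs = np.tanh(np.concatenate(
            [nodes[senders], nodes[receivers]], axis=1) @ W_msg)
        agg = np.zeros_like(nodes)
        np.add.at(agg, receivers, msgs)                   # sum messages per receiver
        nodes = np.tanh(np.concatenate([nodes, agg], axis=1) @ W_node)
    return nodes @ W_dec                                  # decode per-node scores

states = rng.normal(size=(3, 4))
senders = np.array([0, 1, 2, 0, 1, 2]); receivers = np.array([1, 2, 0, 2, 0, 1])
print(gn_policy(states, senders, receivers).shape)        # (3, 1)
```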

Actions

In addition to absolute actions, an object-centric action format, called relative actions, is introduced. With relative actions, the agent acts by reasoning about relations between objects in the scene, which resembles how humans think and act. Four action formats are considered in total: continuous absolute actions, continuous relative actions, discrete absolute actions and discrete relative actions; see the paper for the precise definition of each.
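To show what "object-centric" means in practice, here is a minimal sketch of grounding a relative action, choosing a reference object and a horizontal offset along its top edge, into an absolute placement. The parameterization and names are assumptions for illustration, not the paper's definition.

```python
# Hypothetical grounding of a relative action into an absolute placement.
from dataclasses import dataclass

@dataclass
class Obj:
    x: float  # center x
    y: float  # center y
    w: float  # width
    h: float  # height

def relative_to_absolute(scene, ref_index, offset, block_h=0.4):
    """Place a new block on top of scene[ref_index], shifted by `offset`
    (in [-1, 1], scaled to the reference object's half-width)."""
    ref = scene[ref_index]
    x = ref.x + offset * (ref.w / 2)     # horizontal position relative to ref
    y = ref.y + ref.h / 2 + block_h / 2  # rest the block on the ref's top edge
    return (x, y)

scene = [Obj(0.0, 0.5, 1.0, 1.0), Obj(2.0, 0.5, 1.0, 1.0)]
print(relative_to_absolute(scene, ref_index=1, offset=-0.5))  # (1.75, 1.2)
```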

Learning algorithms

The internal vector and graph representations are turned into actions through explicit policies or Q-functions, trained as follows.

RS0 learning algorithm: used for continuous action outputs; an actor-critic algorithm combined with stochastic value gradients.

DQN learning algorithm: used for discrete action outputs; Q-learning with a DQN whose Q-values are computed on the edges of the graph.

MCTS: since the DQN agent outputs discrete actions, it is straightforward to combine it with standard planning techniques such as MCTS. Here the DQN agent serves as a prior for MCTS, and varying the MCTS search budget shapes the distribution of experience the agent learns from.
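As a rough sketch of how a learned Q-function can act as a prior for tree search, the snippet below scores actions with a PUCT-style rule that mixes the Q estimate with a visit-count exploration bonus. This is a simplified stand-in for the paper's MCTS setup, with hypothetical names and no environment model.

```python
# Hypothetical action selection inside a search tree, using learned Q-values
# as a prior plus an exploration bonus that favors rarely visited actions.
import math

def select_action(q_values, visit_counts, total_visits, c_puct=1.0):
    """q_values: learned Q(s, a) per action; visit_counts: N(s, a)."""
    def score(a):
        exploration = c_puct * math.sqrt(total_visits) / (1 + visit_counts[a])
        return q_values[a] + exploration
    return max(range(len(q_values)), key=score)

# Example: action 1 has the best Q, but the unvisited action 2 wins
# early on through the exploration bonus.
q = [0.2, 0.9, 0.8]
print(select_action(q, visit_counts=[5, 9, 0], total_visits=14))  # 2
```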

Experimental analysis

A series of experiments evaluated the proposed agents on the physical construction tasks. To make training effective, curriculum learning is used, increasing task complexity over the course of training: the curriculum increases the number of targets in the Silhouette task, the height of the targets in the Connecting task, and the height of the obstacles in the Covering task.
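A minimal sketch of such a schedule is shown below, mapping training progress to a difficulty level through one knob per task; the specific knobs, ranges and function names are illustrative assumptions rather than the paper's exact curriculum.

```python
# Hypothetical curriculum schedule: difficulty grows with training progress.
def curriculum_level(task: str, progress: float) -> int:
    """progress in [0, 1]; returns the difficulty level to sample."""
    max_level = {
        "silhouette": 16,  # number of targets to match (assumed range)
        "connecting": 6,   # height of the targets above the floor
        "covering": 4,     # height of the obstacles
    }[task]
    return 1 + int(progress * (max_level - 1))

for p in (0.0, 0.5, 1.0):
    print(p, curriculum_level("silhouette", p))  # 1, 8, 16
```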

Relative versus absolute actions

Experimental results show that agents using relative actions perform significantly better than agents using absolute actions. On each task, almost every relative agent converges to a median performance similar to or higher than that of the best absolute agent, as shown in Figure 3a. Averaged over all curriculum levels, the best relative agent obtains 1.7 times the reward of the best absolute agent, and on the most difficult levels alone this gap grows to 2.4 times, as shown in Figure 3b.

Figure 3c shows representative constructions from the best absolute agents, while Figure 3d shows those of the best relative agents.


Figure 3: Comparison of absolute-action and relative-action agents.

(a) Rewards obtained by the two kinds of agents, averaged over all curriculum levels.

(b) Rewards obtained by the two kinds of agents on the most difficult level of each curriculum.

(c-d) Qualitative comparison of the two kinds of agents on the four tasks, at the most difficult level of each curriculum.

Model-based versus model-free

Complex construction tasks usually require longer-term planning rather than purely reactive strategies. Therefore, as described above, MCTS is used to augment the GN-DQN agent, and its performance is evaluated in a variety of settings. The results in Figure 4 show that planning clearly improves agent performance, especially on the Connecting and Covering Hard tasks.


Figure 4: (a-d) Performance of GN-DQN-MCTS agents under different training and test-time search budgets, at the most difficult curriculum level. The gray dotted line shows the performance of an agent with a test-time search budget of 1000. (e-h) Representative structures built by GN-DQN-MCTS in randomly selected episodes of each task, with training and test budgets of 0 and 50 for the Silhouette and Connecting tasks, 0 and 5 for the Covering task, and 10 for the Covering Hard task.

Generalization performance

As shown in Figure 5, GN-DQN agents, and especially GN-DQN-MCTS agents, generalize very well to larger scenes. For example, in the Silhouette task the GN-DQN-* agents cover nearly twice as many targets as were seen during training, while the performance of other agents drops significantly. In Connecting tasks with targets at multiple levels, the GN-DQN-* agents degrade only slightly, while other agents fall to nearly zero reward. Panels d-f qualitatively show the generalization of the GN-DQN-MCTS agent to new scenes. In general, structured representations give agents robust performance in scenes more complex than those they were trained on.


Figure 5: Zero-shot generalization of multiple agents.

(a) Silhouette task, with the number of targets increased from 8 to 16.

(b) Connecting task, with the targets placed at the same level or at different levels.

(c) Connecting task, with the number of obstacle layers increased from 3 to 4.

(d-f) Qualitative generalization of the GN-DQN-MCTS agent to new scenes.

Iterative relational reasoning

Information propagates across the scene graph through message passing, and the recurrent GN structure supports iterative relational reasoning. Relational reasoning ability is measured by varying the number of message-passing steps of the GN-DQN agent. The results show that increasing the number of propagation steps improves the agent's reasoning ability.
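A toy sketch of why the step count matters: on a chain-shaped graph, information from one node can reach a node k hops away only after k propagation steps. The code below is purely illustrative.

```python
# Information on a chain graph travels one hop per propagation step.
import numpy as np

def propagate(signal, senders, receivers, steps):
    for _ in range(steps):
        nxt = signal.copy()
        for s, r in zip(senders, receivers):
            nxt[r] = max(nxt[r], signal[s])  # pass the signal one hop
        signal = nxt
    return signal

# Chain 0 -> 1 -> 2 -> 3: a mark on node 0 needs 3 steps to reach node 3.
senders, receivers = [0, 1, 2], [1, 2, 3]
for steps in (1, 3):
    print(steps, propagate(np.array([1.0, 0, 0, 0]), senders, receivers, steps))
```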

Conclusion and discussion

This study tackles a series of physical construction tasks with RL agents. The experiments show that, given structured graph representations and model-based planning with MCTS, agents achieve strong performance and robust generalization. This work is the first study of agents learning physical construction tasks in complex environments, and it suggests that combining rich structure with powerful learning is key to solving such problems. Future work may integrate object detection and segmentation to infer relations between visual objects, and continue toward model learning and more sophisticated search strategies.

Original link: https://arxiv.org/pdf/1904.03177.pdf

(This article was compiled by AI Technology Base Camp. For reprint permission, please contact WeChat 1092722531.)
