Embodied Reasoning for Discovering Object Properties via Manipulation

International Conference on Robotics and Automation, 2021

Jan Behrens*, Michal Nazarczuk*, Karla Stepanova, Matej Hoffmann, Yiannis Demiris and Krystian Mikolajczyk

[Paper]

Abstract

In this paper, we present an integrated system that includes reasoning from visual and natural language inputs, action and motion planning, executing tasks with a robotic arm, manipulating objects, and discovering their properties. A vision-to-action module recognises the scene with objects and their attributes and analyses queries formulated in natural language. It performs multi-modal reasoning and generates a sequence of simple actions that can be executed by a robot. The scene model and action sequence are sent to a planning and execution module that generates a motion plan with collision avoidance, simulates the actions, and executes them. We use synthetic data to train various components of the system and test on a real robot to show its generalisation capabilities. We focus on a tabletop scenario with objects that can be grasped by our embodied agent, i.e., a 7-DoF manipulator with a two-finger gripper. We evaluate the agent on 60 representative queries, each repeated 3 times (e.g., 'Check what is on the other side of the soda can'), concerning different objects and tasks in the scene. We perform experiments in simulated and real environments and report the success rate for various components of the system. Our system achieves up to 80.6% success rate on challenging scenes and queries. We also analyse and discuss the challenges that such an intelligent embodied system faces.
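To make the two-module architecture described above more concrete, here is a minimal sketch of the control flow in Python. All class, method, and variable names (VisionToAction, PlannerExecutor, answer_query, etc.) are illustrative assumptions for this sketch, not the authors' actual API; the stubs return canned data so that the pipeline runs end to end.

```python
"""Minimal sketch of the perceive -> reason -> plan -> execute pipeline.

Hypothetical names throughout; the real system trains these components
on synthetic data and runs them on a 7-DoF arm with a two-finger gripper.
"""

from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneModel:
    # Detected objects with their attributes (name, colour, position, ...).
    objects: List[dict] = field(default_factory=list)


@dataclass
class Action:
    name: str    # a simple robot-executable action, e.g. "pick", "rotate", "place"
    target: str  # the object the action applies to


class VisionToAction:
    """Recognises the scene and maps a natural-language query to actions."""

    def perceive(self, image) -> SceneModel:
        # Stub: a real module would run object detection and attribute recognition.
        return SceneModel(objects=[{"name": "soda can", "colour": "red"}])

    def reason(self, scene: SceneModel, query: str) -> List[Action]:
        # Stub: multi-modal reasoning over the scene model and the query.
        return [Action("pick", "soda can"), Action("rotate", "soda can")]


class PlannerExecutor:
    """Plans collision-free motions, simulates them, then runs them on the arm."""

    def plan(self, scene: SceneModel, actions: List[Action]) -> List[Action]:
        # Stub: a real planner would produce collision-free joint-space
        # trajectories and verify them in simulation before execution.
        return actions

    def execute(self, motion_plan: List[Action]) -> bool:
        for step in motion_plan:
            print(f"executing {step.name} on {step.target}")
        return True


if __name__ == "__main__":
    v2a, pex = VisionToAction(), PlannerExecutor()
    scene = v2a.perceive(image=None)
    actions = v2a.reason(scene, "Check what is on the other side of the soda can")
    pex.execute(pex.plan(scene, actions))
```

The split mirrors the paper's description: the vision-to-action module owns perception and reasoning, while planning, simulation, and execution are delegated to a separate module that consumes the scene model and action sequence.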

Citation

@inproceedings{behrens_nazarczuk2021,
  title={Embodied Reasoning for Discovering Object Properties via Manipulation},
  author={Behrens, Jan and Nazarczuk, Michal and Stepanova, Karla and Hoffmann, Matej and Demiris, Yiannis and Mikolajczyk, Krystian},
  booktitle={International Conference on Robotics and Automation (ICRA)},
  year={2021}
}