| {{:wiki:siddiky.jpg?0x180}} ||||
|::: ||Research Staff\\ \\ ||
|:::|Tel: |+49 421 218 64027|
|:::|Fax: |+49 421 218 64047|
|:::|Room: |TAB 1.81|
|:::|Mail: |<cryptmail>siddiky@cs.uni-bremen.de</cryptmail>|
|:::| ||

==== Deep Action Observer ====
Robotic agents have to learn how to perform manipulation tasks. One of the biggest challenges in this context is that manipulation actions are performed in a variety of ways depending on the objects the robot acts on, the tools it is using, the task context, and the scene the action is to be executed in. This raises the question of when a manipulation action should be performed in which way. In this paper we propose to let the robot read text instructions and watch the corresponding videos illustrating how the steps are performed, in order to generate symbolic action descriptions from the text instructions. The text instructions are disambiguated and completed with the information contained in the videos. The resulting action descriptions are close to those that can be executed by leading-edge cognition-enabled robot control plans. To perform this learning task we combine two powerful learning and reasoning mechanisms: deep learning and Markov Logic Networks. Convolutional networks trained with deep learning recognize objects and hand poses and estimate poses and motions, while the Markov Logic Networks use a joint probability distribution over the relational structure of the instructions to fill in missing information and disambiguate the descriptions.
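
The following is a minimal, self-contained sketch of the kind of relational disambiguation a Markov Logic Network performs. It is plain Python, not the implementation from [1], and the objects, formulas, and weights are invented for illustration: each candidate interpretation of an ambiguous instruction step is scored in proportion to exp(sum_i w_i * n_i), where n_i counts the satisfied groundings of the i-th weighted formula, so visual evidence from the video can select among readings the text alone leaves open.

<code python>
import math
from itertools import product

# Toy MLN-style scoring of candidate completions of the ambiguous
# instruction step "flip it". Everything below (objects, formulas,
# weights) is an illustrative assumption, not taken from [1].

# Evidence the perception side could extract from the video:
detected_objects = {"pancake", "spatula", "stove"}
tool_in_hand = "spatula"

# Candidate groundings for the missing arguments of Flip(patient, tool):
candidate_patients = ["pancake", "spatula", "stove"]
candidate_tools = ["spatula", "hand"]

# Weighted formulas: prefer visible patients and the tool being held,
# penalize flipping an object with itself.
weights = {"patient_detected": 2.0, "tool_held": 1.5, "self_flip": -4.0}

def formula_counts(patient, tool):
    """Number of satisfied groundings of each formula in one candidate world."""
    return {
        "patient_detected": 1.0 if patient in detected_objects else 0.0,
        "tool_held": 1.0 if tool == tool_in_hand else 0.0,
        "self_flip": 1.0 if patient == tool else 0.0,
    }

def score(patient, tool):
    """Unnormalized MLN score exp(sum_i w_i * n_i) of one candidate world."""
    n = formula_counts(patient, tool)
    return math.exp(sum(weights[f] * n[f] for f in weights))

worlds = list(product(candidate_patients, candidate_tools))
z = sum(score(p, t) for p, t in worlds)  # normalization constant
for p, t in sorted(worlds, key=lambda w: -score(*w)):
    print(f"Flip({p}, {t}): P = {score(p, t) / z:.3f}")
</code>

In this toy example the visual evidence makes Flip(pancake, spatula) the most probable completion, which is the kind of missing-argument filling described above.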
Besides the combination of symbolic and sub-symbolic reasoning, the novel contributions include a multi-task network built within a single framework and optimized for computational cost, which processes 10 frames per second. We evaluate our framework on a large number of video clips and demonstrate its ability to interpret the manipulation tasks shown in them.
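
The 10 frames per second figure above comes from sharing computation across the perception tasks. A minimal sketch of that idea is shown below; it is an assumed PyTorch architecture, not the network from [1], and the backbone choice, class name, and head sizes are placeholders. A shared convolutional feature extractor feeds light-weight heads for object classification and hand-keypoint regression, so all outputs come from a single forward pass per frame.

<code python>
import torch
import torch.nn as nn
from torchvision import models

class MultiTaskPerception(nn.Module):
    """Illustrative multi-task network: one shared backbone, one head per task."""

    def __init__(self, num_object_classes=20, num_hand_keypoints=21):
        super().__init__()
        backbone = models.resnet18()  # shared feature extractor, no pretrained weights
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop the fc layer
        feat_dim = backbone.fc.in_features  # 512 for resnet18
        # Light-weight task heads, all reading the same shared features.
        self.object_head = nn.Linear(feat_dim, num_object_classes)
        self.hand_head = nn.Linear(feat_dim, num_hand_keypoints * 2)  # (x, y) per keypoint

    def forward(self, frames):  # frames: (batch, 3, H, W)
        f = self.features(frames).flatten(1)  # expensive part, computed once per frame
        return self.object_head(f), self.hand_head(f)

model = MultiTaskPerception().eval()
with torch.no_grad():
    obj_logits, hand_xy = model(torch.randn(1, 3, 224, 224))
print(obj_logits.shape, hand_xy.shape)  # torch.Size([1, 20]) torch.Size([1, 42])
</code>

A real multi-task setup would add heads for pose and motion estimation and train them jointly, but the cost argument stays the same: the expensive convolutional features are computed once per frame and reused by every head.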

[1] Feroz Ahmed Siddiky and Michael Beetz, "DeepActionObserver: Refining Instructions for Manipulation Actions by Watching Instruction Videos".



