Approximating the Value Function for Continuous Space Reinforcement Learning in Robot Control

by Sebastian Buck, Michael Beetz and Thorsten Schmitt

Abstract:

Many robot learning tasks are very difficult to solve: their state spaces are high dimensional, variables and command parameters are continuously valued, and system states are only partly observable. In this paper, we propose to learn a continuous space value function for reinforcement learning using neural networks trained from data of exploration runs. The learned function is guaranteed to be a lower bound for, and reproduces the characteristic shape of, the accurate value function. We apply our approach to two robot navigation tasks, discuss how to deal with possible problems occurring in practice, and assess its performance.

Reference:

Sebastian Buck, Michael Beetz and Thorsten Schmitt, "Approximating the Value Function for Continuous Space Reinforcement Learning in Robot Control", In Proc. of the IEEE Intl. Conf. on Intelligent Robots and Systems, 2002.

Bibtex Entry:

@inproceedings{Buc02App,
  author    = {Sebastian Buck and Michael Beetz and Thorsten Schmitt},
  title     = {{Approximating the Value Function for Continuous Space Reinforcement Learning in Robot Control}},
  booktitle = {Proc. of the IEEE Intl. Conf. on Intelligent Robots and Systems},
  year      = {2002},
  bib2html_pubtype  = {Refereed Conference Paper},
  bib2html_rescat   = {Robot Learning, RoboCup},
  bib2html_groups   = {AGILO},
  bib2html_funding  = {AGILO},
  bib2html_keywords = {Learning, Robot},
  abstract = {Many robot learning tasks are very difficult to solve: their state spaces are high dimensional,
              variables and command parameters are continuously valued, and system states are only partly
              observable. In this paper, we propose to learn a continuous space value function for reinforcement
              learning using neural networks trained from data of exploration runs. The learned function is
              guaranteed to be a lower bound for, and reproduces the characteristic shape of, the accurate value
              function. We apply our approach to two robot navigation tasks, discuss how to deal with possible
              problems occurring in practice, and assess its performance.}
}