| Issue | Natl Sci Open, Volume 3, Number 5, 2024 |
|---|---|
| Article Number | 20230054 |
| Number of page(s) | 18 |
| Section | Information Sciences |
| DOI | https://doi.org/10.1360/nso/20230054 |
| Published online | 22 March 2024 |
RESEARCH ARTICLE
Learning the continuous-time optimal decision law from discrete-time rewards
1 School of Automation, Guangdong University of Technology, Guangdong Key Laboratory of IoT Information Technology, Guangzhou 510006, China
2 Key Laboratory of Intelligent Information Processing and System Integration of IoT, Ministry of Education, Guangzhou 510006, China
3 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore
4 111 Center for Intelligent Batch Manufacturing Based on IoT Technology, Guangzhou 510006, China
5 UTA Research Institute, the University of Texas at Arlington, Fort Worth 76118, USA
6 Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville 37996, USA
7 Oak Ridge National Laboratory, Oak Ridge 37830, USA
8 Guangdong-Hong Kong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou 510006, China
* Corresponding authors (emails: ci.chen@gdut.edu.cn (Ci Chen); elhxie@ntu.edu.sg (Lihua Xie); shlxie@gdut.edu.cn (Shengli Xie))
Received: 6 September 2023
Revised: 18 January 2024
Accepted: 18 March 2024
The concept of reward is fundamental in reinforcement learning and has a wide range of applications in the natural and social sciences. Finding an interpretable reward for decision-making, one that largely shapes the system's behavior, has long been a challenge in reinforcement learning. In this work, we explore a discrete-time reward for reinforcement learning over continuous-time state and action spaces, which describe many phenomena governed by physical laws. We find that the discrete-time reward yields the unique continuous-time decision law and improves computational efficiency by dropping the integral operator that appears in classical results based on integral rewards. We apply this finding to output-feedback design problems in power systems. The results show that our approach removes the intermediate stage of identifying dynamical models. Our work suggests that the discrete-time reward is efficient in the search for the desired decision law, providing a computational tool to understand and modify the behavior of large-scale engineering systems through the learned optimal decision.
Key words: continuous-time state and action / decision law learning / discrete-time reward / dynamical systems / reinforcement learning
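The contrast drawn in the abstract between integral rewards and discrete-time rewards can be illustrated with a toy policy-evaluation example. The sketch below is not the paper's algorithm; it only shows, for an assumed scalar continuous-time system with an assumed fixed feedback gain, how a continuous-time quadratic value function can be fitted from rewards observed only at sampling instants, without evaluating an integral reward between samples. All symbols (a, b, k, q, r, dt) are illustrative assumptions.

```python
# Minimal, hypothetical sketch: evaluate a fixed policy u = -k*x for the
# scalar continuous-time system dx/dt = a*x + b*u from rewards sampled at
# discrete instants, instead of integrating the stage reward over time.
import numpy as np

a, b = -1.0, 2.0          # assumed system dx/dt = a*x + b*u
k = 0.5                   # assumed fixed feedback gain u = -k*x
q, r = 1.0, 0.1           # quadratic stage reward r(x,u) = q*x^2 + r*u^2
dt = 0.01                 # sampling period of the discrete-time reward

# Simulate the closed loop and record states and sampled rewards.
x, T = 1.0, 5.0
xs, rewards = [x], []
for _ in range(int(T / dt)):
    u = -k * x
    rewards.append(q * x**2 + r * u**2)   # reward observed only at samples
    x = x + dt * (a * x + b * u)          # forward-Euler step
    xs.append(x)
xs = np.array(xs)

# Fit V(x) = p*x^2 from the sampled Bellman-like relation
#   p*x_k^2 ≈ reward_k*dt + p*x_{k+1}^2,
# i.e. p*(x_k^2 - x_{k+1}^2) ≈ reward_k*dt, by least squares.
phi = xs[:-1]**2 - xs[1:]**2
p_hat = float(np.dot(phi, np.array(rewards) * dt) / np.dot(phi, phi))

# Analytic value for comparison: the closed loop x' = (a - b*k)*x gives
#   p = (q + r*k^2) / (2*(b*k - a)).
p_true = (q + r * k**2) / (2 * (b * k - a))
print(f"fitted p = {p_hat:.4f}, analytic p = {p_true:.4f}")
```

For small sampling periods the fitted coefficient approaches the analytic value obtained from the closed-loop relation, which is the intuition behind replacing an integral reward with a sampled, discrete-time one.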
© The Author(s) 2024. Published by Science Press and EDP Sciences.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.