Figure 1

Schematic framework of the reinforcement learning algorithm using policy iteration for continuous-time dynamical systems. (A) At each time t = ti, for i = 1, 2, …, one observes the current output y(t) and action u(t). The sampled input-output data are collected along the trajectory of the dynamical system in real time and stacked over the time interval [t1, ts] as the discrete-time input-output data U and Y. (B) The input-output data U and Y, together with the prescribed optimization criterion, are used to update the value estimate in the critic module, based on which the control policy in the actor module is updated. The ultimate goal of this framework is to use the input-output data U and Y to learn the optimal decision law that minimizes the user-defined optimization criterion J(Q, R, u(t), y(t)).
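
To make the critic/actor loop in panel (B) concrete, the following is a minimal sketch of policy iteration on a discrete-time linear-quadratic problem with full state feedback. This is a simplifying assumption for illustration only: the matrices A, B and the weights Q, R below are hypothetical, and state data stand in for the continuous-time input-output data U and Y used in the figure.

```python
import numpy as np

# Minimal sketch of the critic/actor policy-iteration loop in Figure 1,
# illustrated on a discrete-time linear-quadratic problem with full state
# observation (a simplifying assumption; the figure's setting is
# continuous-time with input-output data U and Y).

# Hypothetical system and weighting matrices, for illustration only.
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # output/state weight in the criterion J
R = np.eye(1)          # input weight in the criterion J

K = np.zeros((1, 2))   # initial stabilizing policy u = -K x (actor)

for _ in range(20):
    # Critic update: evaluate the current policy by solving the Lyapunov
    # equation P = Qk + Ak^T P Ak with Ak = A - B K and Qk = Q + K^T R K,
    # so that V(x) = x^T P x estimates the cost-to-go under the policy.
    Ak = A - B @ K
    Qk = Q + K.T @ R @ K
    P = Qk.copy()
    for _ in range(500):            # fixed-point iteration on P
        P = Qk + Ak.T @ P @ Ak

    # Actor update: improve the policy greedily with respect to the
    # current value estimate (the standard LQR policy-improvement step).
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

print("Learned feedback gain K:\n", K)
```

Each pass through the loop mirrors the figure: the critic refines the value estimate for the current policy, and the actor then improves the policy against that estimate, driving the criterion J toward its minimum.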
