Figure 4

Learning from discrete-time rewards. (A, B) Rewards associated with the output y(t) and the action u_k(t) at the kth iteration step. The data shown are the rewards at the 1st and 8th iteration steps, corresponding to the learning results for the first and final trials. (C) Discrete-time rewards across iteration steps: as the iteration step grows, the value of the discrete-time reward decreases. (D) Convergence of the learned control policy. The convergence error decreases as the iteration step increases, so accuracy can be improved simply by running more iteration steps. The ratio between the gain matrices K̄_k and P̄_k reveals the different learning capabilities in approximating the gain matrices from data.
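As context for panel (D), the sketch below shows a model-based, Kleinman-style policy iteration for a continuous-time LQR problem, illustrating how a convergence error of the kind plotted there shrinks as the iteration step k grows. The plant matrices A and B, the weights Q and R, and the 8-step loop are hypothetical illustrations, not taken from the paper, whose method instead approximates the gains K̄_k and P̄_k from reward data.

```python
# A minimal sketch (not the paper's code): Kleinman-style policy iteration
# for a continuous-time LQR problem. It illustrates the behavior in panel (D):
# the convergence error ||P_k - P_(k-1)|| shrinks as the iteration step grows.
# All matrices below are hypothetical examples.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-1.0, -0.5]])  # hypothetical (stable) plant dynamics
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                              # state cost weight
R = np.array([[1.0]])                      # control cost weight

K = np.zeros((1, 2))        # initial gain; stabilizing since A is already stable
P_prev = np.zeros((2, 2))
for k in range(1, 9):       # 8 iteration steps, as in the figure
    Ak = A - B @ K
    # Policy evaluation: solve Ak' P + P Ak = -(Q + K' R K)
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement: K_{k+1} = R^{-1} B' P_k
    K = np.linalg.solve(R, B.T @ P)
    err = np.linalg.norm(P - P_prev)       # convergence error at step k
    print(f"k={k}: ||P_k - P_(k-1)|| = {err:.3e}")
    P_prev = P
```

Under these assumptions the printed error decreases rapidly with k, matching the qualitative trend in panel (D) that accuracy improves simply by taking more iteration steps.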
