Researchers have addressed the problem of learning to control an unknown nonlinear dynamical system through sequential interactions.
They focus on achieving fast sequential learning in continuous control problems where the system dynamics depend smoothly on unknown parameters.
The study demonstrates that fast sequential learning is attainable if the optimal control policy is persistently exciting.
Additionally, they derive a regret bound that grows with the square root of the number of interactions for the case where the optimal policy is not persistently exciting.
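
To make the persistent-excitation condition concrete, the sketch below checks the textbook version of the property: over every window of a fixed length, the Gram matrix of the regressor vectors should have its smallest eigenvalue bounded below by some positive constant. This is an illustrative assumption, not necessarily the exact definition used in the study; the function names `min_excitation` and `is_persistently_exciting`, the `window` length, and the threshold `alpha` are hypothetical.

```python
import numpy as np


def min_excitation(regressors, window):
    """Return the smallest eigenvalue of the windowed Gram matrix,
    minimized over all sliding windows of the given length.

    regressors: array of shape (n_steps, dim), one regressor vector per step.
    window: number of consecutive steps summed into each Gram matrix.
    """
    phi = np.asarray(regressors, dtype=float)
    n_steps, _ = phi.shape
    if window < 1 or window > n_steps:
        raise ValueError("window must be between 1 and the number of steps")

    worst = np.inf
    for start in range(n_steps - window + 1):
        block = phi[start:start + window]
        gram = block.T @ block  # sum of outer products phi_t phi_t^T
        # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
        worst = min(worst, np.linalg.eigvalsh(gram)[0])
    return worst


def is_persistently_exciting(regressors, window, alpha):
    """Textbook check (an assumption here): every windowed Gram matrix
    has smallest eigenvalue at least alpha."""
    return min_excitation(regressors, window) >= alpha


# A sinusoidal pair of regressors explores both directions of the plane.
t = np.arange(100, dtype=float)
rich = np.column_stack([np.sin(t), np.cos(t)])

# A constant regressor only ever excites a single direction.
flat = np.ones((100, 2))

rich_ok = is_persistently_exciting(rich, window=20, alpha=1.0)
flat_ok = is_persistently_exciting(flat, window=20, alpha=1e-6)
```

In this toy setting the sinusoidal signal passes the check and the constant one fails, because the Gram matrix of the constant signal is rank one in every window. Intuitively, a signal that keeps exciting all directions keeps supplying information about every unknown parameter, which is the usual reason this kind of condition is associated with faster learning.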