Reinforcement Learning for the Optimization of Explicit Runge Kutta Method Parameters

Author

Mélanie Fournier

Published

September 14, 2024

Introduction

Machine learning is everywhere. It has applications in computer vision [1], robotic [2], finance [3], recommender systems [4], playing games at a high level [5] or even discovering new matrix multiplication algorithms [6]. The use of machine learning in scientific problems which has been aptly called scientific machine learning is also growing, with the most important example being combining neural network and physic laws to either discover or solve partial differential equations [7].

In this thesis, we focus on studying reinforcement learning, which is one of the three main machine learning paradigm. The three main paradigm are as follow[8, Ch. 1.1]:

Supervised learning, where we learn using data containing an input, and a desired output. Regression models are an example of a supervised learning.
Unsupervised learning, where the data only has an input but no desired output. Examples include clustering algorithms.
Reinforcement learning, in which we have an intelligent agent who learns to do something by interacting with its environment, receiving feedback in the form of rewards which the agent wants to maximize.

What sets apart reinforcement learning from its cousins unsupervised and supervised learning is the introduction of the concept of reward. The agent learns by trial and error, and wants to maximize the rewards it gets over time. This is, in essence, quite similar to how we animals learn to do things, and it is no surprise that reinforcement learning traces its roots from the field of animal learning [9]. Another important root of reinforcement learning comes from the field of optimal control, where the agent and environment of reinforcement learning are respectively the controller and controlled system in control theory [8, Ch. 1.7].

To study reinforcement learning, we need a playground. That playground could be an already established playground (such as the Gymnasium API), but in this thesis we use our own playground, which we find in the realm of numerical differential equation solvers.

Numerical methods for differential equations are amongst the most important methods in numerical analysis. All of these methods have specific strengths and weaknesses. They all have, however, some parameters that need to be chosen, if only for the step size. These parameters have to be chosen to maximize performance, and depend on the problem. In some cases they are taken using some heuristics, but they can also be searched for computationally, which is an issue, as any computation means more time to get to the solution. It would therefore be a great time saver if a computer could learn these heuristics, for example using reinforcement learning! This is the playground we use in this thesis, albeit with a reduced scope.

We start by motivating the use of numerical ODE solvers to solve linear systems. As a case study, we have a specific type of linear systems, which appears when discretizing the steady state, one dimensional convection diffusion equation \(u_{x} = bu_{xx} +1\). Doing so, we end up with two problem parameters; \(b\), which is a physical constant, and \(n\), stemming from the discretization. The studied numerical solver is an explicit Runge-Kutta method, and has two parameters, a (pseudo-) time step \(\Delta t\) and another parameter \(\alpha\), which need to be chosen. How to choose these solver parameters, as a function of the problem parameters is then left to the realm of reinforcement learning.

We then introduce through intuitive examples (and a very cute bunny) the main concepts of reinforcement learning, such as states, actions, state transitions and rewards which are then formalized as a Markov decision process. We then introduce policy gradient methods, and in particular we introduce the classical REINFORCE [10] algorithm, which we use to optimize the solver parameters for the studied linear systems.

The results, while positive, are hampered somewhat by the fact that the method used in this thesis is not a natural fit to what makes reinforcement learning so powerful. A discussion on how to redefine the problem to make better use of the strengths of reinforcement learning will follow.

[1]

R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models.” 2022. Available: https://arxiv.org/abs/2112.10752

[2]

J. Kober, J. Bagnell, and J. Peters, “Reinforcement learning in robotics: A survey,” The International Journal of Robotics Research, vol. 32, pp. 1238–1274, Sep. 2013, doi: 10.1177/0278364913495721.

[3]

B. Hambly, R. Xu, and H. Yang, “Recent advances in reinforcement learning in finance,” Mathematical Finance, vol. n/a, no. n/a, doi: https://doi.org/10.1111/mafi.12382.

[4]

X. Chen, L. Yao, J. McAuley, G. Zhou, and X. Wang, “A survey of deep reinforcement learning in recommender systems: A systematic review and future directions.” 2021. Available: https://arxiv.org/abs/2109.03540

[5]

D. Silver et al., “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016, doi: 10.1038/nature16961.

[6]

A. Fawzi et al., “Discovering faster matrix multiplication algorithms with reinforcement learning,” Nature, vol. 610, no. 7930, pp. 47–53, Oct. 2022, doi: 10.1038/s41586-022-05172-4.

[7]

S. Cuomo, V. S. di Cola, F. Giampaolo, G. Rozza, M. Raissi, and F. Piccialli, “Scientific machine learning through physics-informed neural networks: Where we are and what’s next.” 2022. Available: https://arxiv.org/abs/2201.05624

[8]

R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, Second. The MIT Press, 2018. Available: http://incompleteideas.net/book/the-book-2nd.html

[9]

C. Mahoney, “Reinforcement learning: A review of the historic, modern, and future applications of this special form of machine learning.” https://towardsdatascience.com/reinforcement-learning-fda8ff535bb6, 2021.

[10]

R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Machine Learning, vol. 8, no. 3, pp. 229–256, May 1992, doi: 10.1007/BF00992696.