7 Summary and Discussion

In this thesis, we started with the idea of using numerical differential equations solver as an iterative solver for a linear system. More specifically, we turned our attention to a specific RK method, which has two parameters to chose from, which we called the solver parameters. We also chose a specific type of linear equation which arises from the discretization of the steady state, one dimensional convection-diffusion equation. This linear equation depends on two parameters, which we called the problem parameters. The goal was then to see if we could optimize for the solver parameters, as a function of the problem parameters, to maximize the convergence rate of the method. To do that, we used reinforcement learning. In particular, we applied the classical REINFORCE algorithm to our problem. Using the implementation in this thesis, we observed that the implemented solution works, with limited results. In particular, if we use the parameter that we learn, it is possible for the solver to diverge for some problem parameters. There are some avenues to improve these results, in particular:

On the technical front, the implemented algorithm is very elementary, and suffers from the issue of high sample variance, being a Monte Carlo method. This issue can be addressed by more better algorithms.
The policy used was a linear function of the problem parameters. We may want to explore if choosing a policy taking into account interactions between the problem parameters, or applying some transforms to them before fitting a linear policy. It is also possible to fit a neural network to the policy.
Possible incremental improvements can also be made. This involves for example experimenting with the reward function design, or setting a decaying learning rate to improve convergence of the RL algorithm.

At last, we need not restrict ourselves to just one type of solver. We could potentially train an intelligent agent to chose which numerical solver to use, depending on the problem.

There is on the other hand one glaring issue with the way that reinforcement learning was applied to this problem. A core philosophy of reinforcement learning is that the states, actions and rewards are all interdependent. This interdependence was absent in this thesis, with the state transition being random, no matter the action taken. While it was possible to adapt this philosophy as presented in this thesis, this somewhat hampers the utility of using reinforcement learning over other methods. In particular, one may wonder if the implementation presented here is essentially “gradient descent, with extra steps”.

It is therefore preferable to change how the problem is approached. One approach could be train an agent to dynamically change the solver parameters over successive iterations for some specific set of parameters. In that case, the agent would need information about the evolution of the residual, which complicates the modeling problem. Another approach would be to make use of meta learning [1], where instead of directly finding the optimal solver parameters, we learn how to find them efficiently.