I’ve pretty much got my head around the implementation of the Advantage Actor-Critic (A2C) algorithm provided by Ivan Gridin in his book Practical Deep Reinforcement Learning with Python. There are a couple of lines of code I’m a bit uncertain about, but running them through the debugger and taking a good look at what changes they bring about should clarify things.
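For anyone who hasn’t met A2C before, the core of the update looks something like the sketch below. This is not Gridin’s code, just a minimal PyTorch-style illustration; the function name and arguments are my own placeholders, and it assumes a discrete action space with discounted returns already computed from a rollout.

```python
import torch
import torch.nn.functional as F

def a2c_loss(policy_logits, value, actions, returns, entropy_coef=0.01):
    """Illustrative A2C loss: actor + critic + entropy bonus."""
    # Advantage: how much better the taken action was than the critic's
    # baseline estimate V(s).
    advantage = returns - value.squeeze(-1)

    # Critic learns to predict the discounted return.
    critic_loss = advantage.pow(2).mean()

    # Actor is pushed towards actions with positive advantage. The
    # advantage is detached so only the critic loss trains the value head.
    log_probs = F.log_softmax(policy_logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(-1)).squeeze(-1)
    actor_loss = -(chosen * advantage.detach()).mean()

    # Entropy bonus discourages the policy from collapsing too early.
    entropy = -(log_probs * log_probs.exp()).sum(dim=-1).mean()

    return actor_loss + critic_loss - entropy_coef * entropy
```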
Ivan’s example trades Microsoft stock, downloaded with yfinance (which pulls data from Yahoo Finance). The networks he uses are pretty basic, as are his trading strategy and the state he constructs. So the task ahead is to modify the code to work with any data (specifically my crypto data), construct a more elaborate state, use a more sophisticated trading strategy, and explore more complex network architectures, perhaps including networks other than MLPs. All without breaking the code!
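The data side, at least, is the easy part: yfinance will hand over a crypto pair just as readily as a stock, assuming it’s one Yahoo Finance lists (the tickers and dates below are just placeholders):

```python
import yfinance as yf

# Gridin's example uses Microsoft; a Yahoo-listed crypto pair such as
# BTC-USD drops in with the same call. Dates are arbitrary examples.
msft = yf.download("MSFT", start="2020-01-01", end="2023-01-01")
btc = yf.download("BTC-USD", start="2020-01-01", end="2023-01-01")

# Both come back as pandas DataFrames with the same OHLCV columns,
# which is what makes swapping the data source straightforward.
print(btc[["Open", "High", "Low", "Close", "Volume"]].head())
```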
These changes are not particularly challenging in themselves; actually finding good solutions might be, but it’s basically trial and error from here on out. I think at this stage I can say that I know what I’m doing. Probably not a good thing to say. Can I hear Nemesis winging my way?