
I’ve been looking at Ivan’s implementation of the A2C (Advantage Actor Critic) approach to deep reinforcement learning for trading, which he said (in 2022, when the book was published) was the bee’s knees (my term, not his). So I applied it to my recently downloaded data for ETHUSDT. Results are shown above. Not great, but it’s a start.
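To remind myself what the “advantage” in Advantage Actor Critic actually buys you, here’s a minimal sketch of the bookkeeping for a single transition. The numbers are toy values of my own, not code from Ivan’s book: the critic estimates state values V(s), and the actor’s policy gradient is weighted by the advantage A = r + γ·V(s′) − V(s).

```python
import numpy as np

# Minimal A2C bookkeeping for one transition (toy numbers, not the book's code).
gamma = 0.99
r = 1.0                       # reward from the environment
V_s, V_s_next = 0.5, 0.6      # critic's value estimates (pretend network outputs)
advantage = r + gamma * V_s_next - V_s

log_prob = np.log(0.4)        # log pi(a|s) for the action actually taken
actor_loss = -log_prob * advantage   # push up prob of better-than-expected actions
critic_loss = advantage ** 2         # regress V(s) toward the TD target
print(round(advantage, 4))           # 1.094
```

The point of the advantage (rather than the raw reward) is that actions are reinforced only to the extent they did better than the critic expected.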
His construction of state is fairly basic: just the last 10 closing prices, as far as I can tell. I’m sure I can do something about that. He’s also using raw price data. Most people who talk about training models for trading recommend using returns (percent change) rather than actual prices, since the latter don’t have a constant mean or variance. I’m not sure whether that’s relevant to these RL models, but I have a feeling it is. The trades are also simple: just buy (or short) at the start of the day and sell or cover at the end, with no holding until a signal to close. The actual code is going to take some study. I get the general idea of what the actor-critic approach is trying to achieve, compared with the temporal difference approach, which is what I’ve been looking at up ’til now. The devil is in the details, however.
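A minimal sketch of what a returns-based state might look like, swapping the last 10 raw closes for the last 10 percent returns. The function name and layout are mine, not Ivan’s:

```python
import numpy as np

def make_state(closes: np.ndarray, t: int, window: int = 10) -> np.ndarray:
    """State at bar t: the last `window` percent returns instead of raw prices.

    r_i = close_i / close_{i-1} - 1 is roughly stationary, unlike raw prices.
    window=10 mirrors the 10-bar lookback in Ivan's version.
    """
    prices = closes[t - window : t + 1]     # window+1 prices -> window returns
    returns = prices[1:] / prices[:-1] - 1.0
    return returns.astype(np.float32)

# Toy usage on a fake price series
closes = np.array([100.0, 101.0, 99.0, 102.0, 103.0, 101.5,
                   104.0, 105.0, 104.5, 106.0, 107.0, 108.0])
state = make_state(closes, t=11)            # state for the most recent bar
print(state.shape)                          # (10,)
```

Whether normalizing this way actually helps the RL agent is something I’d want to test empirically rather than assume.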
So I’ve looked at the rather elaborate approach used by Quantra in their Deep Reinforcement Learning in Trading course, with state composed of OHLC data over several bars at three levels of granularity, plus technical indicators and calendar-related inputs. The approach is temporal difference (I think that’s what it’s called). I’ve looked at a similar approach from DeepLizard, which was created to solve a GridWorld environment and which I’ve rejigged to work with trading data. Not sure how successfully. And now this A2C approach from Ivan Gridin’s book.
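For my own reference, the tabular temporal-difference (Q-learning) update that these approaches build on fits in a few lines. The discrete states and actions here are hypothetical stand-ins, not Quantra’s or DeepLizard’s actual features:

```python
import numpy as np

# Hypothetical tiny setup: 5 discrete states, 3 actions (long, flat, short).
n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99                 # learning rate, discount factor

def td_update(s: int, a: int, r: float, s_next: int) -> None:
    """One Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

# One fake transition: in state 0, action 1 earned reward 1.0, moved to state 2.
td_update(0, 1, 1.0, 2)
print(Q[0, 1])                           # 0.1 after a single update
```

The deep versions replace the Q table with a network, but the target being chased is the same.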
I’m not at the point where I could write code to implement one of these without consulting references. Too many details that I haven’t totally internalized yet, especially concerning getting tensors into the right shape. It’s a language problem, really: internalizing the grammar and vocabulary so that you can speak/write without thinking about it. I guess it’s just practice, practice, practice.
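The shape gymnastics that keep tripping me up boil down to one recurring pattern, sketched here with numpy (the same unsqueeze/squeeze idea applies to torch tensors):

```python
import numpy as np

# A network expects a batch dimension, but a single state comes off the
# environment as a flat vector.
state = np.random.randn(10).astype(np.float32)   # one state: 10 recent values
batch = state[np.newaxis, :]                     # add batch dim -> shape (1, 10)
print(batch.shape)                               # (1, 10)

# And the reverse: a (1, n_actions) network output back to a flat vector.
logits = np.random.randn(1, 3)
action_scores = logits.squeeze(0)                # drop batch dim -> shape (3,)
print(action_scores.shape)                       # (3,)
```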