I did some reading on the use of dropout layers in RL agents, and the general consensus is that it's not such a good idea. So I removed them and ran the study again, for 5000 episodes this time, just to check for some kind of convergence to a reasonably stable result. No luck.

This shouldn't be too surprising, really. If significant results could be extracted from a few lagged returns and a few returns over longer time periods, then everyone would be doing it. My agent needs more intelligence, in the sense of information. Before I spend much time tuning the algorithm, I need to ensure that there is some meaningful information to extract.
So, time to go back to working on input features, aka state. Apart from adding data such as other markets and sentiment, I must explore the possibility of using the output of other models as inputs to this one. One possibility is to use an unsupervised learning model to identify different clusters of trading conditions, and feed the cluster label to my RL agent as part of its state. I even have a course on the use of Unsupervised Learning in Trading. Perhaps it's time to review it. Would this count as ensemble learning? Probably, at least in the loose sense that one model's output feeds another.
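
Here is a minimal sketch of what the clustering idea could look like, not the implementation used in this project. It builds a state from a few lagged returns plus a couple of longer-horizon returns, clusters those rows with a small hand-rolled k-means, and appends the one-hot cluster label to the state. Everything in it is an assumption for illustration: the function names (`make_features`, `fit_kmeans`, `assign`, `augment_state`), the lag and horizon choices, the three clusters, and the synthetic random-walk prices. In practice a library implementation such as scikit-learn's `KMeans` would be the more natural choice; plain NumPy is used here only to keep the sketch self-contained.

```python
import numpy as np


def make_features(prices, lags=(1, 2, 3), horizons=(5, 20)):
    """One row per time step: lagged 1-step log returns plus log returns
    over longer horizons. Steps without full history are dropped."""
    log_p = np.log(np.asarray(prices, dtype=float))
    r1 = np.diff(log_p)  # r1[i] is the log return from step i to step i+1
    start = max(max(lags), max(horizons))  # first step with full history
    rows = []
    for t in range(start, len(log_p)):
        lagged = [r1[t - k] for k in lags]  # k=1 is the most recent return
        longer = [log_p[t] - log_p[t - h] for h in horizons]
        rows.append(lagged + longer)
    return np.array(rows)


def assign(X, centers):
    """Index of the nearest center (Euclidean distance) for each row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def fit_kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave an empty cluster's center where it was
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


def augment_state(X, centers):
    """Append a one-hot encoding of each row's cluster to the features."""
    one_hot = np.eye(len(centers))[assign(X, centers)]
    return np.hstack([X, one_hot])


# Synthetic random-walk prices stand in for real market data (assumption).
rng = np.random.default_rng(42)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=500)))

X = make_features(prices)
split = int(0.7 * len(X))  # fit scaling and clusters on the training part only

mu, sigma = X[:split].mean(axis=0), X[:split].std(axis=0)
Z = (X - mu) / sigma
centers = fit_kmeans(Z[:split], k=3)

state = augment_state(Z, centers)  # this is what the agent would observe
```

Fitting the scaler and the cluster centers on the training portion only is deliberate: if the clusters were fitted on the full history, the agent's state would leak information from the future into the past. Whether the resulting regimes carry any usable signal is exactly the open question, so it is worth checking how the clusters relate to subsequent returns before wiring them into the agent.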