
I’m now using my biggest data set: approximately 48,000 instances for the ETHUSDT trading pair (one-hour candles) covering the past six or so years. I ran through this data 20 times, and the chart above shows the total PnL for each run, with each trade sized at $100. So if I used the trained model for predictions, I would just about lose my $100 by the end of the six years. At least the chart does suggest some learning is taking place. This is all using a Double Q Network (or Double Deep Q Network, if you prefer).
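For anyone unfamiliar with the "Double" part: the change from a plain DQN is in how the training target is built. The online network picks the best next action and a separate target network scores it, which reduces the tendency of plain DQN to overestimate Q-values. Below is a minimal NumPy sketch of that target calculation for a batch of transitions. It is not my test harness; the function name, argument names and array shapes are hypothetical, and it assumes you already have next-state Q-values from both networks.

```python
import numpy as np

def double_dqn_targets(rewards, dones, next_q_online, next_q_target, gamma=0.99):
    """Hypothetical helper: Double DQN TD targets for a batch of transitions.

    rewards:       shape (batch,) reward received for each transition
    dones:         shape (batch,) 1.0 if the episode ended after the transition, else 0.0
    next_q_online: shape (batch, n_actions) Q(s', .) from the online network
    next_q_target: shape (batch, n_actions) Q(s', .) from the target network
    """
    # The online network *selects* the best next action...
    best_actions = np.argmax(next_q_online, axis=1)
    # ...and the target network *evaluates* that action.
    evaluated = next_q_target[np.arange(len(best_actions)), best_actions]
    # Do not bootstrap past the end of an episode.
    return rewards + gamma * (1.0 - dones) * evaluated
```

As an illustration, if the online network's next-state values are `[0.2, 0.9, 0.1]` and the target network's are `[0.4, 0.6, 0.8]`, the online network chooses action 1, so with a reward of 1.0 the target is 1.0 + 0.99 × 0.6 = 1.594. Plain DQN would have taken the maximum of the target row (0.8) instead, giving the larger 1.792.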
So, I’ve got a fair amount of data and I’ve got some learning happening; now to get the PnL above zero! The options are to increase the number of input features, try different network architectures, or try different algorithms. I’m just about at the point of implementing A2C (Advantage Actor Critic) to work with my test harness, though a bit more study is needed first. I think I’ll try that option before the others, as my intuition is that it promises the most improvement.
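The "Advantage" in A2C is the gap between the discounted return actually obtained and the critic's estimate of the state's value. The actor is pushed towards actions with a positive advantage, while the critic is trained to predict the returns. The NumPy sketch below shows just that arithmetic for one short rollout. It is an illustration under assumptions, not how my harness works: the function names are hypothetical, using per-trade PnL as the reward is only one possible choice, and a real implementation would compute these losses in an autodiff framework, treating the advantages as constants in the actor loss and often adding an entropy bonus.

```python
import numpy as np

def a2c_returns_and_advantages(rewards, values, bootstrap_value, dones, gamma=0.99):
    """Hypothetical helper: n-step discounted returns and advantages for one rollout.

    rewards:         shape (T,) rewards collected over the rollout
    values:          shape (T,) critic estimates V(s_t) for each step
    bootstrap_value: critic estimate V(s_T) for the state after the rollout
    dones:           shape (T,) 1.0 where the episode ended after step t
    """
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    # Walk backwards so each return includes the discounted later rewards.
    for t in reversed(range(len(rewards))):
        # An episode end cuts off everything that came after it.
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    # Advantage: how much better the outcome was than the critic expected.
    advantages = returns - values
    return returns, advantages

def a2c_losses(log_probs, returns, advantages, values):
    """Hypothetical helper: actor and critic loss values (to be minimised)."""
    # Actor: increase the log-probability of actions with positive advantage.
    actor_loss = -np.mean(log_probs * advantages)
    # Critic: regress V(s_t) towards the observed returns.
    critic_loss = np.mean((returns - values) ** 2)
    return actor_loss, critic_loss
```

Compared with the Double DQN target, the key design difference is that A2C learns a policy directly and uses the critic only as a baseline. This is part of why it might cope better with a noisy reward like trading PnL, although that remains to be seen.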