
This plot shows the A2C algorithm run on my largest dataset, approximately 6 years of hourly data for ETHUSDT. If my logic is correct, the rewards are the 30-day returns, so maybe 5% per month, which is not too shabby. The results are of course very variable, and there is probably some overfitting, since each 100 episodes runs through the entire dataset a couple of times. Anyway, something to work on.
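
As a rough sanity check on the "30-day return" reading, here is a minimal sketch of how per-step returns compound into one episode return. It assumes an episode is 30 days of hourly bars (720 steps) and that each step yields a fractional return; the names `HOURS_PER_EPISODE`, `episode_return` and `annualised` are hypothetical and not taken from the actual environment code.

```python
# Assumption: one episode spans 30 days of hourly bars.
HOURS_PER_EPISODE = 30 * 24  # 720 steps


def episode_return(step_returns):
    """Compound per-step fractional returns into a single episode return."""
    growth = 1.0
    for r in step_returns:
        growth *= 1.0 + r
    return growth - 1.0


def annualised(period_return, periods_per_year=12):
    """Compound a per-period return over a year (assumes ~12 episodes per year)."""
    return (1.0 + period_return) ** periods_per_year - 1.0


# Hypothetical input: a constant hourly return chosen to compound to 5% over 720 hours.
hourly = 1.05 ** (1.0 / HOURS_PER_EPISODE) - 1.0
monthly = episode_return([hourly] * HOURS_PER_EPISODE)

print(f"30-day return: {monthly:.2%}")
print(f"compounded over 12 periods: {annualised(monthly):.2%}")
```

If the reward really is the compounded 30-day return, a steady 5% per episode would work out to roughly 80% a year, which is one reason to treat the figure with suspicion until the overfitting question is settled.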

A dropout layer in each network, with a modest dropout setting (0.2), has smoothed things out a lot and done away with the optimism. I’m hoping that some other, more diverse inputs will improve matters.
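
For reference, this is roughly what a dropout layer with rate 0.2 does to a layer's activations during training. It is a plain-Python sketch of standard "inverted" dropout, not the code of whichever library the networks are built with, and the function name `dropout` and its parameters are illustrative assumptions.

```python
import random


def dropout(activations, p=0.2, training=True, rng=random):
    """Inverted dropout: zero each activation with probability p during
    training and scale the survivors by 1 / (1 - p) so the expected value
    is unchanged. At evaluation time the input passes through untouched."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical layer output: 10,000 activations, all equal to 1.0.
rng = random.Random(0)  # seeded so the run is repeatable
hidden = [1.0] * 10_000
dropped = dropout(hidden, p=0.2, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)
print(f"fraction zeroed: {zero_fraction:.3f}")
print(f"mean activation: {mean_activation:.3f}")
```

About a fifth of the units are silenced on each forward pass, so the network cannot lean too heavily on any single one, which fits with the smoother and less optimistic curves.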