I have a complete Double Deep Q Network solution to a game (Cartpole) up and running! First time achieving this. Admittedly it’s someone else’s code, only slightly reorganized by me, but I understand nearly all of it and so should be able to get it to work with other inputs, such as trading data.
I don’t know why it has been so hard to get to this point. Simply copy/pasting someone else’s code should be a no-brainer, but the process has been fraught with difficulties. Anyway, I’ll explore this solution further ’til I understand it completely, apply it to a variety of problems, and then develop things from there. The network topology used is very basic, and I’m sure can be improved for a variety of other problems. Also hyperparameter tuning might come in handy. Perhaps most important is that conceptually I now understand what’s going on, even those equations I mentioned a few posts ago make more sense now. It’s just a couple of implementation details that I haven’t quite got my head around. For one section of code even the author/presenter says to check out StackOverflow for an explanation of how it works!
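For readers curious about those equations: the core idea of Double DQN is that the online network *selects* the best next action while the target network *evaluates* it, which reduces the overestimation bias of vanilla DQN. This isn't the code from the tutorial I followed, just a minimal NumPy sketch of the target calculation (function and variable names are my own):

```python
import numpy as np

def ddqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Compute Double DQN training targets for a batch.

    q_online_next: online network's Q-values for the next states (batch x actions)
    q_target_next: target network's Q-values for the same next states
    rewards, dones: per-transition reward and terminal flag (1.0 if episode ended)
    """
    # Online network selects the action...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...target network evaluates it.
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions get no bootstrapped future value.
    return rewards + gamma * evaluated * (1.0 - dones)

# Tiny batch of 2 transitions:
targets = ddqn_targets(
    q_online_next=np.array([[1.0, 2.0], [3.0, 0.0]]),
    q_target_next=np.array([[0.5, 1.5], [2.0, 4.0]]),
    rewards=np.array([1.0, 1.0]),
    dones=np.array([0.0, 1.0]),
)
# First transition bootstraps (1 + 0.99 * 1.5); second is terminal, so just the reward.
```

In vanilla DQN the target network would both select and evaluate (a plain max over its own Q-values); the split above is the entire difference.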
Reinforcement Learning faces a trade-off between Exploration and Exploitation, mentioned in a previous post. Exploration is largely random, and builds up knowledge of how the environment works. Exploitation uses that knowledge to achieve real ends. In practice RL keeps a little exploration going at all times, just in case there's some undiscovered treasure in the environment.
Having spent a lot of time on exploration of the subject of RL, I’m now about to exploit that knowledge. However I’ll continue to explore (study) some of the time to improve my general understanding and perhaps discover some golden nugget of knowledge that takes me to a new level.