Random Actions

Average Total PnL: -402
Standard Deviation: 135
Total PnL with Random Actions (average of 10 episodes)

Time to test out different algorithms. I’ve modified my code to use a larger dataset, with both more items and a larger state. I also changed the agent’s policy to always pick a random action, so the current HoldProcessor, described in the previous post, doesn’t actually do anything. I then ran the agent’s run method 10 times and averaged the result (the total reward, which is the percentage profit or loss). I’m treating this as the baseline to improve on.
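To make the baseline procedure concrete, here is a minimal sketch of a random-action policy run over several episodes. It is not my actual agent code: the names `random_policy`, `run_episode`, and `baseline` are made up for illustration, the state is reduced to a single price, action 1 is assumed to mean "hold", and the spread is computed as a sample standard deviation.

```python
import random
import statistics

# 2 = buy and 0 = sell, as in the post; 1 = hold is an assumption.
ACTIONS = (0, 1, 2)


def random_policy(state, rng):
    """Ignore the state entirely and pick an action uniformly at random."""
    return rng.choice(ACTIONS)


def run_episode(prices, rng):
    """Walk through the prices once and return the total percentage PnL."""
    entry_price = None  # price at which the open position was bought
    total_pct = 0.0
    for price in prices:
        action = random_policy(price, rng)
        if action == 2 and entry_price is None:
            entry_price = price  # open a position
        elif action == 0 and entry_price is not None:
            # Close the position and book the percentage profit or loss.
            total_pct += (price - entry_price) / entry_price * 100.0
            entry_price = None
    return total_pct


def baseline(prices, episodes=10, seed=0):
    """Run several random episodes; return (mean, sample std) of total PnL."""
    rng = random.Random(seed)
    totals = [run_episode(prices, rng) for _ in range(episodes)]
    return statistics.mean(totals), statistics.stdev(totals)
```

Calling `baseline(prices)` on a list of prices returns the two numbers reported in the table above, the average total PnL and its standard deviation, for whatever data is passed in.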

The interpretation of the result is as follows: if I put $100 on every trade (a trade being an action of 2 (buy) from the policy, followed sometime later by 0 (sell)), I would lose $402 on average over the six-year period that the data represents.
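The arithmetic behind that reading is simple: with a fixed stake per trade, each trade's percentage return converts to dollars as stake × return / 100, and the dollar amounts add up. A small sketch, assuming a hypothetical helper `dollar_pnl` and made-up per-trade returns:

```python
def dollar_pnl(trade_returns_pct, stake=100.0):
    """Convert per-trade percentage returns into total dollars at a fixed stake."""
    return sum(stake * r / 100.0 for r in trade_returns_pct)


# Illustrative (invented) trades: +5%, -10%, +2.5% at $100 each.
example = dollar_pnl([5.0, -10.0, 2.5])

# A total reward of -402 percentage points at $100 per trade is a $402 loss.
total = dollar_pnl([-402.0])
```

With a $100 stake the total reward and the dollar PnL are numerically the same, which is why the -402 in the table can be read directly as dollars.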