| Metric | Value |
| --- | --- |
| Average Total PnL (%) | -402 |
| Standard Deviation (%) | 135 |
Time to test out different algorithms. I’ve modified my code to use a larger dataset, with both more items and a larger state. I also changed the agent’s policy to always pick a random action, so the current HoldProcessor, described in the previous post, no longer has any effect. I then ran the agent’s run method 10 times and averaged the total reward, which is the percentage profit or loss. I’m treating this as the baseline to improve on.
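To make the baseline procedure concrete, here is a minimal sketch of a random policy run several times and summarised by its mean and standard deviation. It is not the code from this project: `run_episode`, `baseline`, the price list, and the hold action (1) are all hypothetical names and assumptions, and I’ve assumed the sample standard deviation since the post doesn’t say which one was used.

```python
import random
import statistics

# 2 = buy and 0 = sell follow the post; 1 = hold is an assumption.
SELL, HOLD, BUY = 0, 1, 2


def run_episode(prices, rng):
    """Walk the price series once with a uniformly random policy.

    Returns the total reward: the sum of per-trade percentage returns.
    """
    total_pct = 0.0
    entry_price = None  # None means there is no open position
    for price in prices:
        action = rng.choice((SELL, HOLD, BUY))
        if action == BUY and entry_price is None:
            entry_price = price  # open a position
        elif action == SELL and entry_price is not None:
            # close the position and book the percentage profit or loss
            total_pct += (price - entry_price) / entry_price * 100.0
            entry_price = None
    return total_pct


def baseline(prices, n_runs=10, seed=0):
    """Repeat the random-policy episode and summarise the totals."""
    rng = random.Random(seed)
    totals = [run_episode(prices, rng) for _ in range(n_runs)]
    return statistics.mean(totals), statistics.stdev(totals), totals


if __name__ == "__main__":
    # Made-up prices, for illustration only.
    prices = [100.0, 102.0, 99.0, 101.0, 98.0, 103.0] * 20
    mean_pnl, std_pnl, _ = baseline(prices)
    print(f"average total PnL: {mean_pnl:.1f}%  std dev: {std_pnl:.1f}%")
```

Seeding the generator keeps the baseline reproducible, which matters when later algorithms are compared against it.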
The result can be interpreted as follows: if I put $100 on every trade (a trade being an action of 2 (buy) from the policy, followed some time later by 0 (sell)), I would lose $402 on average over the six-year period that the data covers.
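The arithmetic behind that interpretation can be sketched as follows. The helper names and the example prices are hypothetical. The conversion only holds with a fixed stake per trade (no compounding), because the per-trade percentages are simply summed.

```python
def trade_return_pct(buy_price, sell_price):
    """Percentage profit or loss of one buy-then-sell trade."""
    return (sell_price - buy_price) / buy_price * 100.0


def dollars_from_pct(total_pct, stake=100.0):
    """Convert a summed percentage return into dollars for a fixed stake per trade."""
    return stake * total_pct / 100.0


# A single losing trade: buy at 50, sell at 45 is -10%, so -$10 on a $100 stake.
one_trade = trade_return_pct(50.0, 45.0)
print(one_trade, dollars_from_pct(one_trade))

# The baseline total of -402% corresponds to -$402 at $100 per trade.
print(dollars_from_pct(-402.0))
```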