From the 6 hour close prices of my synthetic data I have constructed state (a set of features) consisting of (in each row) the period return (fractional change from the previous period), the past 6 periods returns, and the total return over the past 7, 15, 30, 60, 90 and 120 days. Also what day of the week each row occurred on. This can sometimes be significant in trading. Code and dataframe are shown below:


So what is this data used for? Well, the ‘agent’ (in this case the DDQN) gets each row of data and has to work out whether to buy, hold or sell. If it buys then later sells, the profit (or loss) acts as a reward. It can also short sell, i.e. sell first and then buy later. With only this information, and initially acting completely at random, it learns how to make a profit!! Hopefully. Pretty neat, huh?