Synthetic Data

A couple of courses I’ve taken have recommended testing an algorithm on synthetic data, especially data with a very simple form such as a roughly linear uptrend or a sine wave (to emulate a mean-reverting asset). Each of these should be a bit noisy to be more ‘realistic’, but still simple. This shows whether the algorithm can learn such simple patterns, and provides an opportunity to check that the other factors are OK.
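As a rough sketch, a mean-reverting series like the sine wave mentioned above can be generated in a few lines of NumPy. The function name and parameter values here are my own choices, not from any course:

```python
import numpy as np

def noisy_sine(n_samples=2000, period=200, amplitude=1.0, noise_std=0.1, seed=0):
    """A noisy sine wave to emulate a mean-reverting 'price' series."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples)
    # Deterministic oscillation plus Gaussian noise on top
    signal = amplitude * np.sin(2 * np.pi * t / period)
    return signal + rng.normal(0.0, noise_std, n_samples)
```

The period and noise level control how hard the pattern is to learn: a long period with low noise should be close to trivial, while a short period buried in noise starts to resemble real data.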

One of those ‘other factors’ is the amount of data. In the course I’m currently looking at, the data used is about 10 years’ worth of 5-minute data! And when they backtested the algorithm (a double deep Q-network), it took about a year to become profitable. That’s a lot of data. Another factor is the state. The course used the whole OHLCV data, with so many lags, at different time granularities, and with associated TA indicators, that it ended up with about 160 inputs. My current intention is to simplify, starting with how the input state is organised. But perhaps it’s a good idea to test the algorithm on something easy first, such as a noisy uptrend or a noisy sine wave, rather than going straight for real data.

So I made some synthetic data: 6-hourly samples of a noisy uptrend, 8,000 samples in total. Will my DDQN (double deep Q-network) be able to solve this problem?
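For the record, data of this shape can be produced with a linear drift plus Gaussian noise. This is a minimal sketch assuming NumPy; the drift, noise level, and starting price are illustrative guesses, not the exact values I used:

```python
import numpy as np

def noisy_uptrend(n_samples=8000, start=100.0, drift=0.01, noise_std=0.5, seed=42):
    """A noisy linear uptrend: one sample per 6-hour bar."""
    rng = np.random.default_rng(seed)
    # Linear trend rising by `drift` per bar, with Gaussian noise on top
    trend = start + drift * np.arange(n_samples)
    return trend + rng.normal(0.0, noise_std, n_samples)

prices = noisy_uptrend()
```

With these settings the trend dominates the noise over the full series, so a sensible agent should learn to hold the asset most of the time; cranking up `noise_std` relative to `drift` makes the task progressively harder.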