Yesss!!

I fixed all the bugs in the code that uses a DDQN to make trading decisions for ADA/USDT. I learned some important things, such as how to properly use the gather method in PyTorch. It seems the error that was giving me the most trouble (and which caused me to research said gather method) was due to my specifying that my network had 2 outputs when it should have had 3. My bad, but a good lesson learned. Also, I must admit that the results look promising. However, in the past every trading strategy that ‘looked promising’ ended up losing me money, so I’m not going to fall for that again.
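For the record, the gather pattern I had wrong looks something like this (a minimal sketch with made-up numbers, not my actual network):

```python
import torch

# Hypothetical batch of Q-values from a network with 3 outputs
# (one per action: buy, hold, sell) and the actions actually taken.
q_values = torch.tensor([[0.1, 0.5, 0.2],
                         [0.7, 0.3, 0.9]])
actions = torch.tensor([1, 2])  # indices into the action dimension

# gather picks out the Q-value of the chosen action in each row;
# the index tensor must have the same number of dims as q_values.
chosen_q = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
print(chosen_q)  # tensor([0.5000, 0.9000])
```

With only 2 network outputs, an action index of 2 here would be out of range, which is exactly the kind of error I was chasing.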

ETA: I haven’t done any validation on that promising backtest result, so it really doesn’t mean much. I’ll have to grab some more recent data and test out the model on that to see if it actually generalizes instead of overfitting the training data. Still, I might be a lot closer to Phases 3 and 4 of my plan than I realized in my last post.

That Went … Well

With some minor editing I converted the notebooks for the Quantra course on Reinforcement Learning in Trading to ordinary Python files and got it up and running in my new Docker container. And running. And running. Several hours later the screen seemed to have completely frozen, neither mouse nor keyboard had any effect, and I shut down the computer.

There was some output while it was running – the time for each step. At the start it was 10 – 20 msecs per step, but by the time I shut it down it was 10 – 20 secs. I have no idea how far it actually got in processing the data, of which there was quite a bit more than my previous exercises in ML have included. I think I’ll have to add some sort of logging so I can get a bit more feedback on what’s actually happening. Or perhaps I should just use a fraction of the data. Or something. It would be nice to actually get some results. But at least the program didn’t crash – well, not after I fixed the issues that caused the first few crashes!
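The logging I have in mind could be as simple as something like this (a sketch; `step_fn` stands in for whatever the actual training step turns out to be):

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

def run_steps(step_fn, n_steps, log_every=100):
    """Run n_steps of step_fn, logging the average time per step periodically."""
    window_start = time.perf_counter()
    for step in range(1, n_steps + 1):
        step_fn()
        if step % log_every == 0:
            elapsed = time.perf_counter() - window_start
            logging.info("step %d: %.1f ms/step", step, 1000 * elapsed / log_every)
            window_start = time.perf_counter()
```

That would at least make a slowdown from 20 msecs to 20 secs per step visible long before the screen freezes.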

Being Flexible

A while back I decided to stick with the PyTorch library instead of Keras/TensorFlow for neural networks. However, that decision seems to be limiting me a bit too much. One of the reasons for it was that trying to get TensorFlow to work after I had set PyTorch up to work with the GPU caused errors that I couldn’t resolve.

Well, there is a way to resolve them, and that is to run TensorFlow in a Docker container. I’ve already tried that a couple of times and it works, although I’m not going to try to get it to work with the GPU from inside a Docker container! It probably can be done, but not by me.

Anyway, the main codebase I want to explore is from some Quantra courses involving deep learning. They use TensorFlow, and my attempts to convert it to PyTorch were not as successful as I hoped they would be. I think that’s because the codebase is just a bit too complex for my current level of understanding. Quantra also uses TA-Lib quite a bit for indicators. That has to be built from source, and luckily I found some code on StackOverflow that does exactly that, so I now have a Docker container with both TensorFlow and TA-Lib installed, and hopefully that will be enough (in addition to all the usual data science packages, of course). I forgot to install Jupyter Notebook, but I can live without that. The original Quantra files are all Jupyter Notebooks, but I find those hopeless for debugging and prefer to convert them all to ordinary py files anyway.

So after lots of very frustrating exploration of apps that never seem to work, for a range of reasons, I’m back to the original app that got me started in Reinforcement Learning. Maybe I can actually get it up and running this time and, more importantly, understand it well enough to get it working with my own data and not just the data supplied in the course.

State

From the 6 hour close prices of my synthetic data I have constructed a state (a set of features) consisting of (in each row) the period return (fractional change from the previous period), the past 6 periods’ returns, and the total return over the past 7, 15, 30, 60, 90 and 120 days. I also include the day of the week each row occurred on, which can sometimes be significant in trading. Code and dataframe are shown below:
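A sketch of how those features can be built with pandas (the column names and the use of 4 six-hour periods per day are my assumptions):

```python
import numpy as np
import pandas as pd

def build_state(close: pd.Series) -> pd.DataFrame:
    """Build state features from 6-hourly close prices."""
    df = pd.DataFrame(index=close.index)
    ret = close.pct_change()
    df["ret"] = ret  # period return (fractional change from previous period)
    # the past 6 period returns as separate lag columns
    for lag in range(1, 7):
        df[f"ret_lag{lag}"] = ret.shift(lag)
    # total return over the past 7, 15, 30, 60, 90 and 120 days
    # (4 six-hour periods per day)
    for days in (7, 15, 30, 60, 90, 120):
        df[f"ret_{days}d"] = close.pct_change(periods=4 * days)
    df["day_of_week"] = close.index.dayofweek  # 0 = Monday ... 6 = Sunday
    return df.dropna()  # drop the warm-up rows that lack full history

# toy usage on a made-up price series
idx = pd.date_range("2022-01-01", periods=1000, freq="6h")
state = build_state(pd.Series(np.linspace(1.0, 2.0, 1000), index=idx))
```

The `dropna` at the end means the first 120 days of rows are sacrificed as warm-up for the longest lookback.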

So what is this data used for? Well, the ‘agent’ (in this case the DDQN) gets each row of data and has to work out whether to buy, hold or sell. If it buys then later sells, the profit (or loss) acts as a reward. It can also short sell, i.e. sell first and then buy later. With only this information, and initially acting completely at random, it learns how to make a profit!! Hopefully. Pretty neat, huh?
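The buy/hold/sell bookkeeping described above might look something like this (the action encoding, names and long/short conventions are my own assumptions, not the course’s):

```python
# position is +1 (long), 0 (flat) or -1 (short); actions: 0 = sell, 1 = hold, 2 = buy
def step_reward(position: int, action: int, entry_price: float, price: float):
    """Return (new_position, new_entry_price, reward).

    Reward is only realised when an open position is closed or reversed."""
    target = {0: -1, 1: position, 2: 1}[action]
    reward = 0.0
    if position != 0 and target != position:
        # closing a long earns (price - entry); closing a short earns (entry - price)
        reward = position * (price - entry_price)
    if target != position and target != 0:
        entry_price = price  # a new position opens at the current price
    return target, entry_price, reward
```

So a buy at 100 followed by a sell at 110 realises a reward of 10, and the sell simultaneously opens a short – which is how the short-selling side comes in.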

Synthetic Data

A couple of courses I’ve done have recommended testing an algorithm on synthetic data, especially data with a very simple form such as a fairly linear uptrend or a sine wave (to emulate a mean reverting asset). Each of these should be a bit noisy to be more ‘realistic’, but still simple. This will show whether the algorithm can learn such simple patterns, and provide an opportunity to check that the other parts of the setup are working.

One of those ‘other factors’ is the amount of data. In the course I’m currently looking at, the data used is about 10 years’ worth of 5 minute data! And when they backtested the algorithm (a double deep Q network), it took about a year to become profitable. That’s a lot of data. Another factor is the state. The course used the whole OHLCV data, with so many lags, at different time granularities, with associated TA indicators, that it ended up with about 160 inputs. My current intention is to simplify, starting with how the input state is organized. But perhaps it’s a good idea to test it out on something easy, such as a noisy uptrend or a noisy sine wave, rather than go straight for real data.

So I made some synthetic data: 6-hourly samples of a noisy uptrend, 8000 samples in total. Will my DDQN (Double Deep Q Network) be able to solve this problem?
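Generating that sort of data is straightforward with NumPy; a sketch of both the noisy uptrend and a noisy sine wave (the noise levels and amplitudes are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 8000  # 6-hourly samples, as described above

# noisy, fairly linear uptrend
uptrend = np.linspace(1.0, 2.0, n) + rng.normal(0, 0.02, n)

# noisy sine wave, to emulate a mean-reverting asset
t = np.arange(n)
sine = 1.0 + 0.2 * np.sin(2 * np.pi * t / 500) + rng.normal(0, 0.02, n)
```

Either series can then be fed through the same feature-building code as real prices.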

Plan B

Or is that plan Z? Anyway, several months ago I purchased a course on Quantra about Reinforcement Learning in Trading. I found it pretty heavy going, especially as some of the explanations seemed a little ‘light on’. Well, I’ve just been going through it again (videos, text and code in the form of Jupyter Notebooks) and it makes a lot more sense now that I’ve filled in the details from other sources.

So my plan now is to recreate the template that it develops but in a simpler form. I can always add complexity later. I’ll be using PyTorch instead of Keras/Tensorflow as well, but that change should be pretty trivial now that I’m more familiar with both.

I plan to use ADA as the asset, having already settled on that for my previous plan (development of an ML assisted momentum strategy). Once again I doubt that I’ll actually use this in trading, but it’s a field I’m familiar with, so I can focus on setting up the Double Deep Q Network instead of concerning myself with how it relates to the task. I feel pretty confident that I can get something up and running, with lots of scope for improving it after that, and lots of opportunity to test it out on real world data once it’s working. Definitely seems like a plan.

Found It

I’m very particular in how I like information to be presented to me. Not too fast, not too slow, right level of difficulty – you get the idea. I’ve finally found a reference that presents an explanation of the implementation of a Deep Q Network to solve a basic Reinforcement Learning problem (Frozen Lake 4×4) that fits the bill. Here it is, on YouTube.

Roadblocks

Whatever resource I’m studying, I run into roadblocks. Following the development of a topic, A -> B -> C and all good, but suddenly E appears with no obvious transition from what went before. Perhaps the author/tutor plans to cover C -> D and D -> E later, but I have difficulty when I lose track of an argument or the development of an idea. Perhaps I’m just not very flexible.

My usual response is to go to another resource/reference. At the moment I’ve gone back to a Quantra course that I purchased quite a few months ago on Deep Reinforcement Learning in Trading. I think I abandoned that when I first looked at it because it uses TensorFlow for the NN part and I want to stick with PyTorch. These days I’m a little more comfortable with rewriting code that is designed for TF into the PyTorch version. It’s not really that hard.
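As an example of why the conversion isn’t that hard: a small Keras-style network maps almost line for line onto PyTorch (a sketch; the layer sizes are made up):

```python
import torch.nn as nn

# Keras:  model = Sequential([Dense(64, activation='relu', input_shape=(n,)),
#                             Dense(3)])
# PyTorch equivalent:
def make_qnet(n_inputs: int, n_actions: int = 3) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(n_inputs, 64),
        nn.ReLU(),
        nn.Linear(64, n_actions),
    )
```

The bigger differences are in the training loop (optimizer steps, loss computation) rather than the model definition itself.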

Anyway, I’m making progress with that course and don’t anticipate any further serious roadblocks. Pity I don’t plan to actually do any (short term) trading going forward. At the moment it’s just an intellectual exercise.

Exploration, Exploitation

Finished another chapter, discussing one of the most basic problems in human life. Decisions. Specifically, how to transition from collecting information required to make a decision, to actually implementing that decision.

In ML terms the initial phase is called exploration. In the case of our one-armed bandits (see previous post) it involves trying them all out to see which one (if any) gives the best payout. Exploitation is then using that best bandit to make money. The problem is that if you spend too much time on exploration, you’re spending a lot of money on losing machines. If you spend too little time/money on exploration, then the ‘best’ one may be just a fluke, and not profitable in the long term.

A couple of the policies discussed in the book look at a soft transition from exploration to exploitation. During the exploitation phase, occasionally recheck the ‘losing’ machines to gather a bit more data to confirm the decision. And if the ‘winning’ machine starts to produce bad results in the long term, recheck the others more often. Rather like a bad marriage, I guess.
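The simplest version of that soft transition is epsilon-greedy: mostly pull the best-looking arm, but keep rechecking the others a fraction of the time. A toy sketch (the payout numbers are invented):

```python
import random

def epsilon_greedy(estimates, epsilon=0.1):
    """Pick an arm: usually the best-looking one, occasionally a random one."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))  # keep rechecking the 'losers'
    return max(range(len(estimates)), key=lambda a: estimates[a])

def update(estimates, counts, arm, reward):
    """Incremental mean update of the arm's estimated payout."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

# toy simulation: arm 2 truly pays best
true_means = [0.2, 0.5, 0.8]
estimates, counts = [0.0] * 3, [0] * 3
random.seed(0)
for _ in range(5000):
    arm = epsilon_greedy(estimates)
    reward = random.gauss(true_means[arm], 0.1)
    update(estimates, counts, arm, reward)
```

After enough pulls the estimates settle near the true means and the best arm gets the lion’s share of the plays, while the losers still get the occasional recheck.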

Bandits

I’ve worked through the chapter on OpenAI Gym, and even got the scripts working, adjusting for the changes to the API since the book was published. This is a problem with all IT-related stuff: changes are so frequent that just about any tutorial/book material is out of date to some extent, which makes learning harder. Anyway, the approach to those games wasn’t very sophisticated – just demonstrating that if one selects from the available actions at random, one doesn’t get very far.
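To illustrate the “random actions don’t get far” point without depending on any particular Gym version, here’s a toy stand-in (not one of the book’s actual environments): a corridor where the goal is simply to keep walking right, comparing a random policy with the obvious one:

```python
import random

def corridor_episode(length=10, max_steps=50, policy=None):
    """Tiny corridor world: start at 0, goal at position `length`."""
    pos = 0
    for _ in range(max_steps):
        action = policy(pos) if policy else random.choice([-1, 1])
        pos = max(0, pos + action)  # a wall at the left end
        if pos == length:
            return 1.0  # reward for reaching the goal
    return 0.0

random.seed(1)
# random policy vs. the obvious 'always step right' policy
random_rate = sum(corridor_episode() for _ in range(1000)) / 1000
greedy_rate = sum(corridor_episode(policy=lambda p: 1) for _ in range(1000)) / 1000
```

The always-right policy reaches the goal every time; the random one fails a good chunk of the time even in a world this trivial, which is the whole point of moving beyond random action selection.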

So now we’ve moved on to one-armed bandits, a nickname for slot machines. We’re looking at strategies for exploring a set of such machines to determine whether any has a better probability of reward than the others. This could be relevant to trading: given a situation where funds could be allocated to a range of strategies, how do you determine the most profitable course of action? Especially if you don’t have past history to work with.

Maintaining Interest

I’m still searching for something that holds my interest for more than a few days. I’ve spent a lot of time on quantitative trading over the past few years, getting nowhere. Or rather, going backwards (losing money). More recently I’ve spent time on Machine Learning, hoping it could improve my trading in some way. I’m not sure I will even need to do trading anymore. If I have enough to live on after buying an apartment, I won’t actually need to make any income.

So, what to do? I’m coming around to revisiting reinforcement learning, which is a very interesting area of ML, and could be related to trading should I ever go back to that. It’s a complex area, but I think I’ve got enough of a handle on the basics to take another shot at it. And once I’ve got it set up, finding the right data will be the biggest challenge. Garbage In, Garbage Out, as they say, so I’ll need to identify useful inputs. Or I could input everything and let the algorithm sort out what’s useful. That’s the beauty of reinforcement learning: the code does all the work.

So, I’m re-reading Practical Deep Reinforcement Learning with Python by Ivan Gridin. He gives examples in both PyTorch and TensorFlow, so I should be able to follow along fairly easily, at least as far as the ML library is concerned. However, I have had problems with Reinforcement Learning in the past because of changes to other commonly used libraries. A favourite culprit is OpenAI Gym, which has given me some headaches: setting up a system that actually works with the code is a real challenge in this area of ML. Let’s see how it goes this time.

So (my favourite word for starting paragraphs), I could develop an RL (reinforcement learning) model that has various trading strategies as actions, PnL as reward, and an environment that includes all the data that might be relevant. Perhaps the simplest starting point would be to concentrate on opening/closing positions on a single instrument. After that I could move on to managing a portfolio.
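That simplest starting point might have an interface along these lines (a sketch under my own assumptions; a real version would need transaction costs, a proper observation vector, short positions, and so on):

```python
class SingleInstrumentEnv:
    """Toy environment for opening/closing positions on one instrument.

    Actions: 0 = close / stay flat, 1 = open / hold a long position.
    Reward: the PnL of the position held over the step."""

    def __init__(self, prices):
        self.prices = prices

    def reset(self):
        self.t = 0
        self.position = 0
        return self._obs()

    def step(self, action):
        self.position = action
        prev = self.prices[self.t]
        self.t += 1
        reward = self.position * (self.prices[self.t] - prev)  # PnL over the step
        done = self.t == len(self.prices) - 1
        return self._obs(), reward, done

    def _obs(self):
        return (self.t, self.position)  # placeholder observation
```

The reset/step shape loosely follows the usual Gym convention, so swapping in a proper RL agent later should be straightforward.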