It’s Done

Finally got a very basic implementation of the RL model up and running, basic in that the input data is just a few lags of the cryptocap chart. Much work to be done, but I do have an XGBoost model predicting the prospects specifically for ADAUSDT over the next few days, and an RL model using broader market data to tell whether it’s ‘a good time to trade’. As I said, much work to be done, but the basic system is in place. From here on – refinement. I’ve been working on this for quite a while now (mostly learning how to implement Deep Reinforcement Learning in the context of trading) and I’m very happy to reach this point at last.

At the moment both models are telling me to trade, but my intuition says otherwise. I think I’ll give it another 12 hours or so to see which way the wind is blowing. Is there still room for intuition? Maybe.

Back to Square One

During the week I had a few small wins, and a larger loss (8%) that wiped out those wins and brought me back to square one. Almost. So, business as usual. Actually I’m 10 cents down on my initial 100USDT, but close enough.

I still haven’t got the RL model up and running, but I have been tweaking the XGBoost model a bit. I’m also changing my basic trading method, now using trailing stops that should prevent larger losses but will probably mean more small losses. I’ve come to hate stops over the years because they’ve always meant that the price drops to hit my stop, sells me out at a loss, then bounces back up again so I can’t buy back in at the lower price. Has this changed? Probably not, but I’ll see I guess.

A few insights are emerging from my explorations. XGBoost has a handy feature to print a chart of which features are most important in its decision making, and it seems recent price movements are the least important. The most important features seem to be the total change and the volatility over the past 90 days. I guess those are showing the overall trend. I’m not sure I need an ML model for that.
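For what it’s worth, those two features are cheap to compute directly, no ML required. Here’s a numpy sketch, with my own guesses at the definitions (total percentage change over the window, and the standard deviation of period-to-period returns):

```python
import numpy as np

def trend_features(close, window=90):
    """Total change and volatility over the trailing window.
    These are my guesses at the definitions -- the exact
    formulas behind the feature names are assumptions."""
    recent = close[-window:]
    # total change: percentage move over the whole window
    total_change = (recent[-1] - recent[0]) / recent[0]
    # volatility: std dev of period-to-period returns
    returns = np.diff(recent) / recent[:-1]
    volatility = returns.std()
    return total_change, volatility

prices = np.linspace(1.0, 1.2, 90)  # a smooth dummy uptrend
change, vol = trend_features(prices)
```

On this smooth dummy series the change comes out at 20% with near-zero volatility, which is exactly the “overall trend” reading I suspect the model is picking up.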

More Progress

I’ve restructured the RL trading app the way I like it, and sorted out the obvious bugs to the point where it runs without error. Of course there might still be logic errors in the code, and the performance is not great. There seems to be a lot of reshaping, squeezing and unsqueezing tensors along the way, probably more than is actually necessary. I’m going to have to examine parts of the code in more detail. Anyway, running it on my 6hr data for ADAUSDT produced the following plot

Each of the 50 episodes was once through the entire dataset. Probably some serious overfitting there, although no obvious learning took place. However the average reward (return from trade) was greater than zero most of the time.

So writing this the way I want, and fixing all the errors, has improved my understanding of how it all works quite a bit, and put me in a position to explore different variations. I’m feeling pretty happy with progress. I’ve been working on this for several months now.

I added another layer to the neural network. There seem to be fewer results below zero, and not so far below zero. Probably a better result.
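For reference, “adding a layer” here just means one more Linear/ReLU pair in the stack. A minimal PyTorch sketch (the layer sizes are placeholders, not my actual ones):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small MLP Q-network. Sizes are placeholders --
    the real input size depends on the feature set."""
    def __init__(self, n_inputs=16, n_hidden=64, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_hidden),  # the extra layer
            nn.ReLU(),
            nn.Linear(n_hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

q = QNetwork()
out = q(torch.randn(1, 16))  # batch of one state -> three Q-values
```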

Moving Right Along

I’ve written code for my ADA trading environment, and an Agent that decides to sell, hold or buy using the integers 1, 0 and 2. Here’s the code that currently makes that important decision

    def select_action(self, state):
        """select action and pass to environment"""
        action = random.randint(0, 2)
        self.current_action = action    # needed for replay buffer
        reward, next_state, trade_closed = self.env.receive_action(action)
        return reward, next_state, trade_closed

The third line just selects a number from 0 to 2 randomly! If I run my app with 8863 lines of data I do usually get about 0.02% average return, which might just cover the trading fees.

I need to replace that line with a neural network or two. I’m going to attempt to do that with as little reference to other people’s code as possible. A real test of my understanding of how these things work. What mark will I get for this assignment? Well, that could be the profit that my code manages to produce, if I ever use it for live trading. Learning Reinforcement Learning is itself an exercise in Reinforcement Learning. How meta!
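For the curious, the usual way to replace that random line is an epsilon-greedy policy: ask the network for its Q-values most of the time, but still act randomly occasionally so exploration continues. A sketch, where `q_net` stands in for whatever network I end up with:

```python
import random
import torch

def select_action(q_net, state, epsilon=0.1):
    """Epsilon-greedy: usually take the network's best action,
    occasionally a random one so exploration continues.
    q_net is any module mapping a state tensor to 3 Q-values."""
    if random.random() < epsilon:
        return random.randint(0, 2)            # explore
    with torch.no_grad():
        q_values = q_net(state.unsqueeze(0))   # add batch dimension
    return int(q_values.argmax(dim=1).item())  # exploit

# a dummy 'network' for illustration: always prefers action 2
net = torch.nn.Linear(4, 3)
with torch.no_grad():
    net.weight.zero_()
    net.bias.copy_(torch.tensor([0.0, 0.0, 1.0]))
action = select_action(net, torch.zeros(4), epsilon=0.0)  # -> 2
```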

ETA: Have an NN making decisions, but not actually learning yet. Sorted out numerous issues converting lists <-> np.ndarray <-> torch.tensor, all with the right number of dimensions! And no integers amongst the floats. Overall profits about the same as selecting random actions. Now, on to the actual learning.
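Since those conversions caused so much grief, here’s the kind of pattern that avoids the dtype and dimension traps, sketched with dummy data: force float32 at the numpy stage and add or remove the batch dimension explicitly:

```python
import numpy as np
import torch

state_list = [0.01, -0.02, 0.005, 1]        # note the stray int
# force float32 at the numpy stage so torch doesn't end up with float64
state_np = np.asarray(state_list, dtype=np.float32)
state_t = torch.from_numpy(state_np)        # shape (4,)
batch = state_t.unsqueeze(0)                # shape (1, 4) for the network
back_to_list = batch.squeeze(0).tolist()    # and back again
```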

Feeding the Beast

I’m expanding the input features to my ADA neural network. I’ve added day of the week (one-hot encoded), a measure of the range of each period (low/high), and an RSI indicator courtesy of TA-Lib. Plus I updated the data from Binance and now have over 8000 six-hour periods. The spreadsheet with the data looks quite impressive. Below is a screenshot showing the first 80 periods, or 20 days, one percent of the total.

Some of the column headings are not quite accurate. 30dayret is actually 30periodret, where the period is 6 hours, not 1 day. In the past I’ve mostly worked with daily data so it’s a habit to refer to everything as 30day, 60day, etc.

Running my training script on this data gave me an average return per trade of 0.5% (before transaction costs), and maybe 100 such trades per year. I guess if I cleared 0.3% per trade on 100 trades that would be about 30% per year, which is not too shabby. Still, rosy test results have cost me quite a bit in the past. Those trading gods are fickle, if not downright malicious.
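Quick sanity check on that arithmetic: 30% is the simple sum of 100 trades at 0.3% each; if the profits are reinvested, compounding does slightly better:

```python
# 0.3% net per trade, 100 trades per year
simple = 0.003 * 100         # additive return: 0.30, i.e. 30%
compound = 1.003 ** 100 - 1  # reinvested: roughly 0.35, i.e. ~35%
```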

I guess I’ll have to redo my hyperparameter tuning now that I’m using an altered data set, and some validation of course. And maybe explore different network topologies, more nodes, more layers, potentially different kinds of layer such as RNN or CNN.

So far I’m only looking at a long-only strategy. I could expand this to a long-short strategy, but that’s harder to actually trade now that Binance doesn’t allow margin trading (in Australia). Perhaps I should check out 1inch or similar. Binance was so convenient. Not going to get too excited. If all goes well I might put $100 into trading the strategy, just to maintain some interest.

An interesting possibility is that the model trained on ADA could be used on other coins. Seems to be a common practice, using pre-trained models for similar problems. It doesn’t take that long to train a model though. Currently about an hour for 1,000,000 episodes (each episode is one period of data).

Hyperparameter Tuning

Hyperparameter tuning sounds such a fancy term, but in reality it’s just adjusting a couple of variables to get the best result possible. Like finding the perfect temperature to cook crepes (I’ve been seasoning a new carbon steel crepe pan lately, with mixed results).

I’ve been exploring various values of the learning rate. An ML algorithm starts off by ‘guessing’ how important any given input feature is in determining the final result, then adjusts the importance depending on how wrong the predictions are. The size of the adjustment is the learning rate, and the best rate has to be determined on a case-by-case basis. So, try a whole bunch of values within a range that ‘seems reasonable’, and find the best by trial and error. There’s a lot of that in machine learning.

Another ‘hyperparameter’ commonly experimented with is the optimizer algorithm used to go from first guess to best result. I’ve tried Adam, SGD (Stochastic Gradient Descent) and RMSprop. Also AdamW, which is supposed to be an improved Adam but in my case gave worse results. I don’t intend to modify the actual network much until I get some more consistent results. So far they’re very variable. I think I need a wider range of inputs.
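The whole tuning exercise really boils down to a grid over learning rates and optimizers. A toy version of the loop, using a trivial regression problem as a stand-in for the real DDQN training run:

```python
import torch
import torch.nn as nn

def train_once(lr, opt_name, steps=200, seed=0):
    """Tiny stand-in training loop: fit y = 2x with one linear unit.
    In the real app this would be a full DDQN training run."""
    torch.manual_seed(seed)
    model = nn.Linear(1, 1)
    opt_cls = {"Adam": torch.optim.Adam,
               "SGD": torch.optim.SGD,
               "RMSprop": torch.optim.RMSprop}[opt_name]
    opt = opt_cls(model.parameters(), lr=lr)
    x = torch.randn(64, 1)
    y = 2 * x
    for _ in range(steps):
        loss = ((model(x) - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

# grid search: every (learning rate, optimizer) combination
results = {(lr, name): train_once(lr, name)
           for lr in (1e-3, 1e-2, 1e-1)
           for name in ("Adam", "SGD", "RMSprop")}
best = min(results, key=results.get)
```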

I haven’t found that using a GPU is faster than using the CPU, though the charts I’ve seen suggest it is faster for large, complicated problems, not necessarily for smaller, simpler ones. However, one big disadvantage of using the CPU for my machine learning problem is that training eats all the processing power and I can’t do anything else on the computer while waiting for it to complete. And when you’re doing trial and error, that can take a long time. Using the GPU for training leaves me with enough CPU headroom to do most of the other things I use my computer for. Definitely the way to go.
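For reference, the usual PyTorch pattern is to pick the device once and move both the model and every tensor onto it:

```python
import torch

# pick the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(8, 3).to(device)
state = torch.zeros(1, 8, device=device)  # keep data on the same device
q_values = model(state)
```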

Too Good to be True

My early tests on my RL trading app gave promising results, which at the time I thought were a little ‘too good to be true’. Well, that feeling was justified, as I later discovered that I had written the code in a way that it repeatedly learned from a small subset of the data, thus essentially ‘rote learning’ (called overfitting in ML lingo) rather than learning more general principles that could generalize to unseen data.

So after rewriting the code, fixing many other errors besides (for which the logging I’ve incorporated has proved somewhat helpful), and downloading 6-hour data from Binance for the entire 6 years that they’ve had ADAUSDT on their exchange, I’ve been running the app again with a variety of optimizers. I keep getting the same result: almost no learning takes place!! My average return over several thousand trades is approx 0.03%. Not enough to even cover the fees (which I haven’t included in the algorithm).

There’s not much point trying to do further optimization, or explore a range of different network topologies, when the baseline is so close to zero. I think I’m going to have to address the ‘what data to use’ issue first up, until I do actually get some learning, and then try to improve on it. That Quantra course used quite a lot of input features, including several technical indicators and what day of the week it was. I’m going to have to enlarge my ‘state space’ a bit. It’s interesting to see what other people (authors of books/courses) are using for their features. Ideally I should be using some measure of market sentiment. Perhaps I need to learn how to scrape X for tweets (?) relating to crypto. Another day, perhaps.

Yesss!!

I fixed all the bugs in the code that uses a DDQN to make trading decisions for ADA/USDT. I learned some important things such as how to properly use the gather method in PyTorch. Seems the error that was giving me the most trouble (and which caused me to research said gather method) was due to my specifying that my network had 2 outputs when it should have had 3. My bad, but a good lesson learned. Also, must admit that the results look promising. However in the past every trading strategy that ‘looked promising’ ended up losing me money, so I’m not going to fall for that again.
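For anyone else fighting gather: in a DQN loss it picks out, for each state in the batch, the Q-value of the action that was actually taken. The unsqueeze/squeeze dance is needed because gather wants its index tensor to have the same number of dimensions as the source:

```python
import torch

# batch of Q-values: 4 states x 3 actions
q_values = torch.tensor([[0.1, 0.5, 0.2],
                         [0.9, 0.0, 0.3],
                         [0.4, 0.4, 0.8],
                         [0.2, 0.6, 0.1]])
actions = torch.tensor([1, 0, 2, 1])  # action taken in each state
# gather along dim 1: chosen[i] = q_values[i, actions[i]]
chosen = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
# chosen -> tensor([0.5, 0.9, 0.8, 0.6])
```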

ETA: I haven’t done any validation on that promising backtest result, so it really doesn’t mean much. I’ll have to grab some more recent data and test out the model on that to see if it actually generalizes instead of overfitting the training data. Still, I might be a lot closer to Phases 3 and 4 of my plan than I realized in my last post.

Plan B

Or is that plan Z? Anyway, several months ago I purchased a course on Quantra about Reinforcement Learning in Trading. I found it pretty heavy going, especially as some of the explanations seemed a little ‘light on’. Well, I’ve just been going through it again (videos, text and code in the form of Jupyter Notebooks) and it makes a lot more sense now that I’ve filled in the details from other sources.

So my plan now is to recreate the template that it develops but in a simpler form. I can always add complexity later. I’ll be using PyTorch instead of Keras/Tensorflow as well, but that change should be pretty trivial now that I’m more familiar with both.

I plan to use ADA as the asset, having already settled on that for my previous plan (development of an ML-assisted momentum strategy). Once again I doubt that I’ll actually use this in trading, but it’s a field I’m familiar with so I can focus on setting up the Double Deep Q Network instead of concerning myself with how that relates to the task. I feel pretty confident that I can get something up and running, with lots of scope for improving it after that, and lots of opportunity to test it out on real-world data once it’s working. Definitely seems like a plan.

ADA

Started a newish project, implementing the approach from my course using Cardano (ADA) as the asset. The focus at the moment is using the XGBoost ML algorithm to help determine whether to go long or short using a momentum strategy. I haven’t used XGBoost before, so something to learn I guess. If the strategy seems profitable, as all the strategies I’ve lost money on have, I might hazard a few digital dollars just to keep the interest going. So here’s the code to fetch the data (ADA-USD) from Yahoo Finance:

    import yfinance as yf
    import pandas as pd

    ada_data = yf.download("ADA-USD", start="2017-01-01", end="2024-06-02")
    ada_data.index = pd.to_datetime(ada_data.index)
    print(ada_data)

And here’s the beginning and end of the data received

                Open      High       Low     Close  Adj Close     Volume
Date                                                                    
2017-11-09  0.025160  0.035060  0.025006  0.032053   0.032053   18716200
2017-11-10  0.032219  0.033348  0.026451  0.027119   0.027119    6766780
2017-11-11  0.026891  0.029659  0.025684  0.027437   0.027437    5532220
2017-11-12  0.027480  0.027952  0.022591  0.023977   0.023977    7280250
2017-11-13  0.024364  0.026300  0.023495  0.025808   0.025808    4419440
...              ...       ...       ...       ...        ...        ...
2024-05-28  0.467963  0.468437  0.453115  0.456990   0.456990  418594476
2024-05-29  0.456990  0.463107  0.450914  0.450995   0.450995  350482630
2024-05-30  0.450995  0.454546  0.443807  0.446581   0.446581  356151973
2024-05-31  0.446581  0.454957  0.444461  0.447461   0.447461  290913148
2024-06-01  0.447461  0.452584  0.445254  0.449975   0.449975  167918462

[2397 rows x 6 columns]

I guess I didn’t need that Adjusted Close column, as crypto doesn’t really do dividends and splits the way equities do. The earliest date on Yahoo Finance seems to be 2017-11-09. I wonder if that really was the date Cardano went live. Or perhaps there wasn’t a USD trading pair available before then.

Next up: Target