Backtesting is Boring

My trading over the past five years has consistently lost me money. I’m still afloat because my long-term investments have worked out a lot better. But those don’t require that I do anything, so boredom is a problem.

I’ve done a lot of backtesting over those years, and it hasn’t helped. I’m not convinced that ‘more thorough’ backtesting will improve matters. Fortunately I may not need to rely on trading to generate some income over the next few years.

So, how to cope with boredom? Learning something interesting, even if it won’t actually prove useful, might be the way to go. I’m thinking of trying to really get on top of machine learning. Not to help with trading, as far more competent traders than me have said it really doesn’t help much, but just because I’m interested.

So, I’ve started, or rather retaken, the road to frustration. I’ve just tried to set up a Docker image which includes PyTorch. I’ll check which of my ebooks on neural networks uses PyTorch instead of Keras/TensorFlow, and spend some time with that. I expect to discover all sorts of problems with running PyTorch from Docker, even though I don’t intend to use the GPU just yet. I’ll probably just end up playing more AoE.

So the book is Learning PyTorch 2.0 by Matthew Rosch, one of my many Kindle books. And PyTorch does work inside Docker, though it’s not the GPU version, of course.

Time to Roll

I’ve selected 11 coins to work with initially, and now have cointegration values for all 110 ordered pairs, each of the 11 coins against each of the 10 others. I’ve checked in both directions (A/B and B/A) because I’m going to use Chris Conlan’s backtesting code, which is set up for long-only trades: instead of shorting A/B when the spread rises, I can go long B/A, because that spread will have fallen. Interestingly, the ratios are not simple reciprocals of each other, due to the way the regression algorithm minimises the squared error (I think that’s what it’s minimising; my statistics is not that far up there).
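As a sanity check on why the two directions differ, here’s a small sketch with synthetic series (stand-ins, not my actual coin data): the product of the two OLS slopes equals the R-squared of the fit, so they are reciprocals only when the fit is perfect.

```python
import numpy as np

# Synthetic pair: B is a random walk, A is a noisy linear function of B.
rng = np.random.default_rng(0)
b = 100 + np.cumsum(rng.normal(0, 1, 500))
a = 2.0 * b + rng.normal(0, 5, 500)

def ols_slope(y, x):
    """Least-squares slope of y = beta * x + c."""
    design = np.column_stack([x, np.ones_like(x)])
    beta, _intercept = np.linalg.lstsq(design, y, rcond=None)[0]
    return beta

ab = ols_slope(a, b)   # hedge ratio regressing A on B
ba = ols_slope(b, a)   # hedge ratio regressing B on A
# ab * ba equals the R-squared of the fit, which is below 1 with noise,
# so ab is not 1/ba: OLS only minimises error in the dependent variable.
print(ab, ba, ab * ba)
```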

Anyway, I need to calculate the spreads for each pair, as the spread is what I’m actually trading. Then set up a portfolio with those spreads as the instruments, and work out some way to represent placing a trade on a pair, since there will be a long and a short within each pair. From experience, working out how to represent the trades is a bit tricky, so perhaps I should do some research on this. The ratio between the coins tells you nothing about the actual size of one unit of each, in dollar terms.
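For what it’s worth, here’s one representation I might start from, with illustrative numbers (this is my own assumption, not anything from Conlan’s code): the hedge ratio defines the spread, and the dollar sizing of the two legs is a separate decision, handled here by splitting the capital equally.

```python
def spread(price_a, price_b, hedge_ratio):
    """Spread of the pair: A minus hedge_ratio * B."""
    return price_a - hedge_ratio * price_b

def pair_legs(capital, price_long, price_short):
    """Units of each coin when `capital` is split equally between the legs."""
    per_leg = capital / 2
    return per_leg / price_long, per_leg / price_short

# Illustrative prices only: a $0.50 coin long, an $80 coin short.
long_units, short_units = pair_legs(1000.0, 0.50, 80.0)
print(spread(0.50, 80.0, 0.005), long_units, short_units)
```

Equal dollar exposure per leg is just one convention; sizing by the hedge ratio itself is another, and is something I still need to research.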

Among my 110 pairs there are a few with quite decent p-values, some even below 0.01, although with that many tests at least one result at the 0.01 level is expected purely by chance.

Fundamental Stationarity

Tutors who teach courses on pair trading generally recommend finding instruments that have some fundamental relationship, are ‘in the same business’ so to speak, as these will provide a greater chance of being cointegrated and staying that way.

Achal, an Indian guy with a Udemy course on Pine Script, uses shares in two major Indian automotive manufacturers as his example. The first tutor I encountered on this subject, also on Udemy, was comparing Australian, Canadian and South African commodity ETFs. People on Quantra often use a gold ETF vs a gold miners ETF as an example.

So what kind of segmentation is there in the crypto space? I must admit I’ve never really looked closely at this, as development of actual use cases has always seemed a way off. But I am aware of some. There are several forks of BTC, for example, although I don’t want to use BTC itself as one of my pair instruments, for broader tax reasons. Smart contract platforms? I guess that includes Ethereum and Cardano, amongst others. Oracles? I held LINK for quite a while, and now there are also iExec and BAND, I think. DeFi? That seems pretty big and I must admit I haven’t really looked into it too much. Plus I’ve heard of Layer 1 and Layer 2 solutions, though I’m not too sure what those are.

A while back I used Unsupervised Learning to identify clusters amongst various coins without too much success. I did identify some clusters, but couldn’t find any pairs within a cluster worth trading from a cointegration point of view. Maybe this is another area where the whole crypto space needs time to become more mature. As I’ve mentioned before, it’s still pretty much the wild west.

Look Ahead Bias

An issue frequently discussed in Machine Learning is look ahead bias, or training a model which has access to data from beyond the training period. I plan to use a training set for the backtest, and a test set for validation, rather than training (backtesting in this case) on all available data and then validating by paper trading (a form of forward testing).

So look ahead bias could be introduced by checking for cointegration on the whole data series. If I’m backtesting on, say, three out of four years of data, checking for cointegration on all the data means using data that did not exist during the training period. Any statistic calculated, even just a mean or standard deviation, would be based on data that should not have been available.

So, to be rigorous, I should probably split my data into a training period (I’ll probably use 2020 – 2022) and a test period (2023 – current), and save them as separate files so there is no possibility of introducing look ahead bias. If backtesting on the training period produces a strategy, or more importantly (since I’ll be using mean reversion) the parameters for the strategy, that continues to work on the following year-plus of data, then perhaps it will continue to work a while longer.
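A minimal sketch of that split, with assumed file and column names: slice once at the cutoff, save both halves, and never let the backtest read the test file.

```python
import pandas as pd

def split_and_save(df, cutoff="2023-01-01",
                   train_path="train.csv", test_path="test.csv"):
    """Split a datetime-indexed frame at `cutoff` and persist both halves."""
    train = df.loc[df.index < cutoff]
    test = df.loc[df.index >= cutoff]
    train.to_csv(train_path)
    test.to_csv(test_path)
    return train, test

# Tiny illustrative frame straddling the cutoff.
idx = pd.date_range("2022-12-30", periods=4, freq="D")
prices = pd.DataFrame({"close": [1.0, 2.0, 3.0, 4.0]}, index=idx)
train, test = split_and_save(prices)
```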

All the ‘cointegration’ I’ve encountered that turns out not to be raises the question of whether there are any pairs that exhibit persistent stationarity, and not just the ephemeral variety. By using only mature coins (ADA, EOS, LTC, etc), testing over long periods, and then checking for ongoing stationarity over further relatively long periods, I hope to find a few pairs that are a little more robust than most of what I’ve encountered so far. No doubt there will be very few such pairs, so diversification will not be an option, but clearly something has to give.

The coins I’m starting with are ADA, ALGO, ATOM, EOS, ETC and LTC. I’ll probably expand this list after initial testing, but hopefully I’ll find something out of that batch. I’ve decided to use 6hr data beginning Jan 1st 2020. Something to get started with.

Persistent Stationarity

Here’s another chart that illustrates the problem I’m having with pair trading, like the image I posted a couple of days ago. The pair is stationary by two cointegration tests at the 99% level, on a year of daily data. Then it diverges significantly, right after I place a trade. So just how persistent is this quality called stationarity?

I’m looking for a strategy that will work in any market. When everything is trending up I just buy and hold. I guess if everything is trending down I could just sell (short) and hold. But I want to be able to generate some income when nothing much is happening, which is actually most of the time in most markets. So pair trading seems ideal, and as I’ve mentioned before, with a long and a short it pays for itself and also includes some hedge against a market crash (I believe that’s called systematic, or market, risk).

Problem is, it seems to be an illusion, like seeing pictures in clouds: try to rely on it (i.e. trade it) and it disappears. So what to do? Many tutors on quantitative trading use equities or commodity ETFs as their examples, usually with 10 or more years of data. But crypto is a fairly rapidly changing field, and I’m constantly aware of the possibility of ‘regime changes’, one meaning of which is that patterns that have worked in the past no longer apply. Some people use the term in the more limited sense of going from trending to range-bound, but a more general change in the main drivers of a market seems the more useful concept. Anyway, perhaps one solution would be to use a longer timeframe, in which case the number of available pairs will be reduced (many coins are fairly new to the Binance Cross Margin wallet), but if the results are more reliable, well and good. Of course that would reduce diversification, but again, if it works better…

Another issue to consider, especially with regard to backtesting, is what timeframe (daily, hourly, etc.) to use for the data. As backtesting usually uses close prices, perhaps something like 4-hourly would be good. I can monitor markets more than once a day, or even set up a bot. More data usually means more reliable results, and 4-hourly gives me six times as much data as daily.

Having had a re-read of Chris Conlan’s book (although it warrants some further study) I’m at the point of actually doing some backtesting. So what instrument to start with? I’m not good at making such decisions. But maybe I should start by downloading 4-hour data for a selection of long-standing coins on Binance Cross Margin (I use USDT pairs) and running some cointegration tests on that. Shorter sampling timeframe, longer total timeframe, established coins. Way to go.

ETA: The most efficient way to download data that I have found is to export a chart from TradingView. All the data shown on the screen will be exported, including OHLC and Vol if it is visible, and any indicators visible on the chart. By scrolling out I managed to get four years of 6 hour data in one file, and then several others (each in its own file) by simply selecting a different coin from the watchlist and exporting again. Dates are in Unix timestamp format, which is easy to convert to a pandas datetime in Python. Downloading kline data from the Binance REST API is more tedious due to the 1000 candle limit per download.
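The conversion itself is a one-liner; the column name "time" matches my exports, but that’s worth checking, and the filename here is just a placeholder.

```python
import pandas as pd

# Stand-in for: df = pd.read_csv("ADAUSDT_6h.csv")
# TradingView exports the timestamp as seconds since the Unix epoch.
df = pd.DataFrame({"time": [1577836800, 1577858400],
                   "close": [0.033, 0.034]})
df["time"] = pd.to_datetime(df["time"], unit="s")
print(df["time"].iloc[0])  # 2020-01-01 00:00:00
```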

Debugger

I’m reading through the Simulator code (from Conlan’s book) and trying to work out what it actually does and how it actually works. I’ll need this understanding to customise it to my own needs. I find the best way to do this is use the debugger, to pause execution after a couple of lines and check what has actually been produced (lists, DataFrames, etc) and what their structure is. Seeing the result of the code makes it a lot easier to understand what it’s doing. I like to think I’m an abstract thinker but I must admit seeing concrete results improves my understanding enormously. I’m frequently confronted with evidence that I’m not as smart as I like to think I am. Or is it just age, with cognitive abilities on the decline?

Later…

Chris Conlan certainly does things differently from all the other tutors on quantitative trading that I’ve seen. For example, creating a dataframe of trading signals (-1, 0, 1) on 10 years of data for 100 tickers using a Bollinger Band strategy, without leaving any trace of the Bollinger Bands. The whole thing is done in a function that takes a series as input, creates the bands as local variables, and just returns the series of signals. And this function is simply apply-ed to the whole dataframe of prices, where each column is one ticker, and voilà! I must try it out on a much smaller dataset to make sure that I know how it works, but certainly very interesting.
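Here’s my reconstruction of the pattern on a toy frame (not Conlan’s actual code, and the lookback and band width are my own guesses):

```python
import numpy as np
import pandas as pd

def boll_signal(prices, lookback=20, width=2.0):
    """Return 1 above the upper band, -1 below the lower band, else 0."""
    mid = prices.rolling(lookback).mean()
    sd = prices.rolling(lookback).std()
    upper = mid + width * sd
    lower = mid - width * sd          # the bands live and die inside the function
    return pd.Series(np.where(prices > upper, 1,
                     np.where(prices < lower, -1, 0)),
                     index=prices.index)

# Toy frame standing in for 10 years x 100 tickers: one column per ticker.
px = pd.DataFrame({"AAA": np.linspace(1.0, 10.0, 50),
                   "BBB": np.linspace(10.0, 1.0, 50)})
signals = px.apply(boll_signal)       # one apply call covers every ticker
```

The NaNs from the rolling warm-up period compare as False in both `np.where` conditions, so the early rows come out as 0 rather than NaN.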

Extremes

Extremes are not uncommon in crypto. This chart has Bollinger Bands at 3, 4 and 5 SDs from the mean (360 day lookback), and the chart has gone past 5 SDs three times in the past three months. Perhaps I should be looking to trade these extreme moves rather than the regular moves (enter at 2 SDs, exit at the mean). Of course there’s the problem that any coin that makes such a move is no longer available for shorting, as the lending pool has gone to zero. Moves like this are nearly always because one coin in the pair has pumped, rather than a catastrophic drop in the other coin. However, if I borrow at 2 SDs (which I usually could) but wait to short until it moves a lot further (if it does), then I might start to make some profit.

Trading Analysis

I’ve been re-watching a Udemy course, Trading Tactics by Triumph at Trading. It’s all about backtesting. Actually it’s not ALL about backtesting, just why you should do it, some best practices, and a handy spreadsheet to keep track of and analyse results. He doesn’t even discuss specific strategies. So basically it’s about analysing backtest results by various metrics.

So it got me thinking, since I’m seriously reviewing my approach to backtesting before launching into a variety of strategies. Many courses I’ve done, on Udemy and elsewhere (Quantra being the other main site I use), have backtesting and analysis code provided, but it’s all a bit haphazard. Some time ago I bought a book by Chris Conlan called Algorithmic Trading with Python which includes some fairly well structured code for analysing individual backtests and entire portfolios. He adopts an OO approach too, which is a bit rare among Python programmers but is very familiar to me since I taught Java for 16 years! He even uses type hinting for everything. So if I’m going to use some prepackaged code for analysis I might just go with that.

Anyway, time to review the book. Actually that’s a bit of a challenge, since most of my books are still in boxes after moving here six months ago, and I don’t really want to unpack them all looking for that one book. I can just review all the code examples and refresh my memory of the Compound Annual Growth Rate (CAGR), the Sortino ratio, and all those other metrics so beloved of financial analysts. The book basically explains the concepts and then acts as a manual for his codebase. If I get stuck with the code I guess I can just start unpacking those boxes.

So, I’ve opened the directory containing all the code examples and files for his ‘application’ as a project in PyCharm, assigned an appropriate Docker image as a remote interpreter, and away we go. I’m glad I’ve sorted out my environment/package management issues at last, hoping I don’t run into further problems with that.

ETA: I decided I need the book after looking at the code. Luckily I found it in the second (of eight) boxes, so not too many piles of books spread around on the floor. I think I’ll need to write a couple of classes implementing the strategies I want to use, but that should be fairly straightforward. I’m not really sure what might be different with the backtesting this time round, compared with all those previous times when I got good backtest results but poor actual results. I’m probably just intending to do the same thing as before but expecting a different result. Not very smart.

Backtesting

Working through my taxes for the last financial year I find myself constantly asking why I made the trades I did. What strategy was I using? I haven’t been keeping the kinds of records that some tutors recommend. Which brings me to something very fundamental to trading strategies – backtesting.

I’ve done a lot of backtesting in the past. All by the book, coming up with excellent Sharpe ratios and significant projected profits. But when I actually start trading, not so much. In fact I’ve become convinced that backtesting, at least in crypto, is a complete waste of time.

And it’s not just backtesting. My current strategy is a standard stat arb mean reversion, using pairs that are cointegrated at the 99% level by two different tests on the past year of data. I enter a trade when the spread is 2 SDs from the mean, only to have it move more than 4 SDs from the mean almost immediately, in the wrong direction. And not just one pair. I’ve decided to diversify as much as the available pairs will allow, and out of 20 such pairs about 16 have exhibited the above behaviour, while the profits I make on the few properly behaved pairs are tiny in comparison. Some of my pairs have been running for months, waiting for mean reversion. Well, perhaps the losses I eventually realise will offset some of my BTC profits on my next tax return.
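For reference, the entry rule I’m describing amounts to a rolling z-score of the spread; this is my own paraphrase on synthetic data, not the code I actually trade with, and the lookback is illustrative.

```python
import numpy as np
import pandas as pd

def zscore(spread, lookback):
    """Rolling z-score of the spread over `lookback` observations."""
    mid = spread.rolling(lookback).mean()
    sd = spread.rolling(lookback).std()
    return (spread - mid) / sd

# Synthetic mean-reverting-looking spread: a sine wave plus noise.
rng = np.random.default_rng(1)
spread = pd.Series(np.sin(np.linspace(0, 20, 400)) + rng.normal(0, 0.2, 400))
z = zscore(spread, 100)
entries = z.abs() > 2        # candidate entries at +/- 2 SDs from the mean
```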

Anyway, I’m wondering if I’m not doing backtesting right. More likely I’m not doing my strategy right. Perhaps I should be including stops, and use backtesting to determine what the best level for a stop is. But given that future behaviour is completely unrelated to past behaviour in nearly everything I trade, is this even a worthwhile exercise?

Crypto is a wild space. Nearly everything is speculative. My overall hope is that as the crypto space matures it will move closer to more traditional asset classes, and approaches that have worked for, say, equities will eventually start working for crypto as well. Perhaps I should stick with the most mature crypto projects. In the interests of diversification I’m trading a lot of coins that I hadn’t heard of before this current iteration of my strategy. There are a lot of coins available on Binance Cross Margin that weren’t there a year ago. Looks like that might be a mistake.

Documenting the Process

My current attempts at pair trading are not working so well, and I’m intending to make one more serious effort to somehow harness ML to help with that. It might be a good idea to actually document this process to help keep me on track, and this is as good a place as any to do that.

Over the last couple of posts I mentioned a book by Jason Brownlee of machinelearningmastery.com – Deep Learning Time Series Forecasting. Jason follows a rigorous process of incremental improvement in performance by starting with classical models (of supervised learning) and then trying to improve on ‘the best so far’ with other models, such as various deep learning algorithms. He actually has another book on time series forecasting that does not include (primarily) deep learning, called Time Series Forecasting with Python, so I’ve decided to review that first so I can better follow his incremental improvement approach. I don’t remember much about ARIMA, for example, which he described at length in that earlier work.

I’ll probably stick with pair trading because the whole stationarity thing is easier for any model to work with. Also pair trading has the huge advantage that it’s nearly cost neutral, the shorts pay for the longs, and I don’t need to actually invest any further capital, just use the margin I already have. The general context for this is trading crypto in the Binance cross margin wallet, using my BTC as margin (collateral).

Anyway, I first need to spend some time sorting out my tax from last financial year, so progress on the ML might be a bit slow.

I’m Going to Need a Better Computer

Reading through that ebook I mentioned in my previous post, I’m once again reminded that machine learning is an experimental science. The general approach to solving a problem is to throw everything at it and see what works best. And by ‘everything’ I mean multiple variations of multiple algorithms within multiple families, plus a few ensemble solutions for dessert.

That’s a lot of processing. My most powerful PC is out of action. It has an RTX2080Ti graphics card which, while a bit old now, is still well regarded for ML processing. I’m thinking of building a new machine. In the past I’ve had a couple of machines built by friends who are more hardware savvy than I am, including the Linux box I’m currently using. Over the years I have built a lot of different things (not computers), some of them quite challenging. It should be easy.

So, build a dedicated ML PC. It won’t be for a while because I still have so much to learn. And I’m getting older. Maybe it will happen one day. Or perhaps I can just get someone to do a custom build for me. Do I really need to do it myself? I’m getting to that stage in my life where it’s easier just to pay someone else to do things for me. Pity.

Reading a bit further, Jason suggests cloud-based resources, such as AWS. Perhaps I should give that serious consideration.

Hidden Gem?

I’ve been clearing out old emails and came across one that I had completely forgotten about – receipt and download link for Deep Learning Time Series Forecasting by Jason Brownlee, of machinelearningmastery.com. I have several of Jason’s books (pdfs plus code) and I find him very readable and usually pitched at just the right level for me. I like his style.

So this one is about using deep learning for time series forecasting, obviously, with an emphasis on CNNs and RNNs for supervised learning problems, plus some hybrid systems. He’s very big on data preparation, and I must admit this is an area I struggle with. For trading there are so many possible inputs, and Ernie Chan has even suggested a system that uses hundreds or thousands of inputs, without however being very specific about what those inputs are.

So, another detour. I’ll put the unsupervised learning study aside for a while and have a quick read through this book. Perhaps it will give me some ideas, and it’s at least directly relevant to my main concern, i.e. time series forecasting.