I’ve read a couple of chapters and skimmed through the rest of Deep Learning for Time Series Cookbook by Vitor Cerqueira and Luis Roque, and I must admit I’m impressed. It fits my needs both in content and style, so I intend to study it for a few months instead of just jumping around from book to online course like I usually do. I’m not sure that I’ll ever be able to use ML to help with trading; in fact, I’ve given up short-term trading altogether except for a couple of stat arb pairs that I intend to close over the next day or two. However, I am more interested in time series problems as a result of my trading, so best to go where the interest takes me.
The whole concept of neural networks is quite fascinating. With just a bunch of weights and a bunch of biases, a random starting position (the initial values of those weights and biases), a way to measure how good the predictions are (a loss function), and a way to adjust the numbers to improve the predictions (gradient descent), an optimal solution can be found. Sometimes. Anyway, it gives me something to do with my time.
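In PyTorch that whole loop is only a few lines. A minimal sketch with made-up synthetic data (all the names here are mine, not from any book):

import torch

# Synthetic data: y = 2x + 1 plus a little noise
x = torch.linspace(0, 1, 100).unsqueeze(1)
y = 2 * x + 1 + 0.1 * torch.randn_like(x)

model = torch.nn.Linear(1, 1)  # one weight, one bias, randomly initialised
loss_fn = torch.nn.MSELoss()   # how good are the predictions?
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # gradient descent

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)  # measure the predictions
    loss.backward()              # work out which way to nudge each number
    optimizer.step()             # nudge them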
A while back I rediscovered Jason Brownlee’s book on deep learning for time series forecasting, but he uses Keras/TensorFlow for his examples and I’ve decided to stick with PyTorch, hence this new book that has become the focus of my study.
The code for the book I’m currently reading is on GitHub, in the form of py files. For some reason I can’t just get the code by pointing Google Colab at the GitHub repo, like I did with the last book – I get a message that there are ‘no results’ despite many repos being listed under PacktPublishing. So the process to get one of these repos into Google Colab is as follows:
# Mount Google Drive so the notebook can see it
from google.colab import drive
drive.mount('/content/drive')
Clone the repo into Google Drive using the script (notebook) above (I have it in the Notebooks directory).
Create a new notebook from the appropriate directory
Click the directory icon (left panel), then the Add Drive button at the top of the directory list, to add Google Drive to the directories available to the notebook
Google Drive is found under the ‘content’ directory
To access data from the drive use the directory listing on the left to navigate to the data directory (somewhere in the cloned repo) and right click to copy path
Use the path to access the data from the new notebook
Perhaps there’s an easier way, but this works. Because the code in this case is in py files, I have to open each py file and copy/paste its contents into the new notebook in order to run it.
Actually I probably don’t need to run that first block of code since I can mount Google Drive with the click of a button. So I should only need to run the bash commands to clone the repo. Must try that with the next repo I’m working from.
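For reference, the whole thing in a fresh notebook would look something like this (the repo and file names are placeholders, not the actual Packt paths):

from google.colab import drive
drive.mount('/content/drive')  # or just mount via the button

# bash cell: clone the repo into Drive
!git clone https://github.com/PacktPublishing/<repo-name>.git /content/drive/MyDrive/Notebooks/<repo-name>

# instead of copy/pasting a py file by hand, %load pulls its contents
# into the current cell:
# %load /content/drive/MyDrive/Notebooks/<repo-name>/some_chapter/some_script.py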
I’ve always preferred to do things on my own computer, but I’m reaching the point where cloud-based solutions just might be a better fit. The book I’m currently reading is Deep Learning with PyTorch Step by Step by Daniel Voigt Godoy. He has all the code for his book on GitHub in the form of Jupyter Notebooks, and all the data of course. One option is to just point Google Colab at his repository and voila! My internet plan doesn’t include unlimited data, and working with ML frequently requires installing big packages and downloading large datasets, so using Google Colab, especially as it can access the repo directly without my having to download the files and then upload them, is a huge plus. And I can easily use GPUs on Google Colab. I’m definitely warming to the whole deal. I might not want to do it for code I develop myself, but in a throwaway situation like working through some book examples, why not?
So I skimmed a lot of that book (by Matthew Rosch). Actually I’ve worked through it before. It has quite a nice complete worked example of a linear regression problem at the end. Also, after describing the basic steps in setting up a feed-forward network (multi-layer perceptron), he describes RNNs, CNNs and a couple of variants, which I’ll consult if I’m ever working with one of those.
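The basic setup is only a few lines in PyTorch. A minimal sketch (the layer sizes are arbitrary placeholders, not from the book):

import torch.nn as nn

# A minimal feed-forward network (multi-layer perceptron)
mlp = nn.Sequential(
    nn.Linear(10, 32),  # 10 inputs -> 32 hidden nodes
    nn.ReLU(),          # activation
    nn.Linear(32, 1),   # 32 hidden nodes -> 1 output
)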
Somewhere he writes that solving a problem with a neural network requires intuition, domain knowledge, and experience. I guess that cuts me out of most use cases. Oh, and lots of trial and error of course. As with all things ML. Still, I’ll explore a little further. I haven’t seen any guidelines on the architecture of a neural network: how many layers and nodes, or when to use which activation function. All just trial and error I guess. No hints? Oh well…
Is it because of April Fools’ Day (1st April) or am I really stupid? I have constant issues with axes in Python. In pandas and, apparently, PyTorch, axis 0 is supposed to refer to rows, and axis 1 to columns in a 2D data structure. But they often seem reversed to me. For instance, take the following code:
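import torch

# two 1-D tensors (reconstructed from the output below)
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
print(torch.cat((a, b), dim=0))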
Now the book says “Concatenate the tensors along dimension 0”. Dimension 0 is supposed to be the rows, so I’m imagining that this will create two rows, one for each tensor. But no, the output is as follows:
tensor([1, 2, 3, 4, 5, 6])
So it concatenated by adding columns, which I thought was supposed to be dimension 1. I’m sure there’s some point of view from which this all makes sense, but clearly I’m not seeing it from that point of view.
Actually he didn’t say ‘axis 0’ but ‘dimension 0’. So in a data structure with only one dimension, dimension 0, one would simply be adding more elements to that dimension. But in a 2D data structure, do ‘axis’ and ‘dimension’ mean the same thing? I’m clearly not a mathematician.
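For what it’s worth, NumPy/pandas ‘axis’ and PyTorch ‘dim’ do mean the same thing. And the two-rows result I was imagining is what torch.stack produces, not torch.cat:

import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

print(torch.cat((a, b), dim=0))    # tensor([1, 2, 3, 4, 5, 6]) - extends the only dimension
print(torch.stack((a, b), dim=0))  # tensor([[1, 2, 3], [4, 5, 6]]) - adds a new row dimension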
My trading over the past five years has consistently lost me money. I’m still afloat because my long term investment has worked out a lot better. But that doesn’t require that I do anything, so boredom is a problem.
I’ve done a lot of backtesting over those years, and it hasn’t helped. I’m not convinced that ‘more thorough’ backtesting will improve matters. Fortunately I may not need to rely on trading to generate some income over the next few years.
So, how to cope with boredom? I’m thinking that learning something interesting, even if it won’t actually prove useful, might be the way to go. I’m thinking of trying to really get on top of machine learning. Not to help with trading, as far more competent traders than me have said it really doesn’t help much. But just because I’m interested.
So, I’ve started, or rather retaken, the road to frustration. I’ve just tried to set up a Docker image which includes PyTorch. I’ll check out which of my ebooks on neural networks use PyTorch instead of Keras/TensorFlow, and spend some time with that. I expect to discover all sorts of problems with running PyTorch from Docker, even though I don’t intend to use the GPU just yet. I’ll probably just end up playing more AoE.
So the book is Learning Pytorch 2.0 by Matthew Rosch, one of my many Kindle books. And PyTorch does work inside Docker, but it’s not the GPU version of course.
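A quick sanity check to run inside the container:

import torch

print(torch.__version__)          # which build got installed
print(torch.cuda.is_available())  # False on the CPU-only build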
I’ve selected 11 coins to work with initially, and now have cointegration values for 110 pairs: each of those 11 coins against each of the 10 others. I’ve checked in both directions (A/B and B/A) because I’m going to use Chris Conlan’s backtesting code, and that’s set up for long-only trades; so instead of shorting A/B if the spread goes up, I can go long B/A because that spread will have gone down. Interestingly, the ratios are not simple reciprocals of each other, due to the way the regression algorithm minimizes the squared error (I think that’s what it’s minimizing; my statistics is not that far up there).
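A quick demonstration with synthetic data (my own sketch, not the actual coin data): regressing A on B minimizes the squared errors in A, while regressing B on A minimizes the squared errors in B, so the two slopes are only reciprocals if the fit is perfect.

import numpy as np

rng = np.random.default_rng(0)
a = np.cumsum(rng.normal(0, 1, 500)) + 100  # synthetic price series A
b = 0.5 * a + rng.normal(0, 2, 500)         # B related to A, plus noise

beta_ab = np.polyfit(b, a, 1)[0]  # ratio from regressing A on B
beta_ba = np.polyfit(a, b, 1)[0]  # ratio from regressing B on A
print(beta_ab, 1 / beta_ba)       # close, but not equal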
Anyway, I need to calculate the spreads for each pair, as that is what I’m actually trading. Then set up a portfolio with those spreads as the instruments, then work out some way to represent placing a trade on a pair, as there will be a long and a short within the pair. From experience, working out how to represent the trades is a bit tricky. Perhaps I should do some research on this. The ratio between the coins tells you nothing about the actual size of one unit of each, in dollar terms.
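As a first cut, the spread calculation might look something like this (a sketch; the DataFrame and column names are placeholders):

import numpy as np
import pandas as pd

def pair_spread(prices: pd.DataFrame, coin_a: str, coin_b: str) -> pd.Series:
    # OLS hedge ratio from regressing A on B, then spread = A - ratio * B
    hedge_ratio = np.polyfit(prices[coin_b], prices[coin_a], 1)[0]
    return prices[coin_a] - hedge_ratio * prices[coin_b]

# e.g. spread = pair_spread(prices, 'ADA', 'LTC')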
Among my 110 pairs there are a few with quite decent p-values, some even less than 0.01, although with 110 tests, about one pair passing at the 0.01 level is to be expected purely by chance.
Tutors who teach courses on pair trading generally recommend finding instruments that have some fundamental relationship, that are ‘in the same business’ so to speak, as these have a greater chance of being cointegrated and staying that way.
Achal, an Indian guy who discusses this in a course on Udemy (a course on Pine Script), uses shares in two major Indian automotive manufacturers as his example. The first guy I encountered on this subject, also on Udemy, was comparing Australian, Canadian and South African commodity ETFs. People on Quantra often use a gold ETF vs a gold miners ETF as an example.
So what kind of segmentation is there in the crypto space? I must admit I’ve never really looked closely at this, as development of actual use cases has always seemed a way off. But I am aware of some. There are several forks of BTC, for example, although I don’t want to use BTC itself as one of my pair instruments for broader tax reasons. Smart contract platforms? I guess that includes Ethereum and Cardano, amongst others. Oracles? I held LINK for quite a while; now there’s also iExec and BAND, I think. DeFi? That seems pretty big and I must admit I haven’t really looked into it too much. Plus I’ve heard of Layer 1 and Layer 2 solutions, not too sure what those are.
A while back I used Unsupervised Learning to identify clusters amongst various coins without too much success. I did identify some clusters, but couldn’t find any pairs within a cluster worth trading from a cointegration point of view. Maybe this is another area where the whole crypto space needs time to become more mature. As I’ve mentioned before, it’s still pretty much the wild west.
An issue frequently discussed in Machine Learning is look-ahead bias: training a model which has access to data from beyond the training period. I plan to use a training set for the backtest and a test set for validation, rather than training (backtesting in this case) on all available data and then validating by paper trading (a form of forward testing).
So look-ahead bias could be introduced by checking for cointegration on the whole data series. If I’m backtesting on, say, three out of four years of data, checking for cointegration on all the data involves using data that would not have been available during the training period. Any statistics calculated, even just a mean or standard deviation, will be calculated on data that should not have been available.
So, to be rigorous, I should probably split my data into a training period (I’ll probably use 2020 – 2022) and a test period (2023 – current), and save them as separate files so there is no possibility of introducing look-ahead bias. If backtesting on the training period produces a strategy, or more importantly the parameters for the strategy (since I’ll be using mean reversion), that continues to work with the following year-plus of data, then perhaps it will continue to work a while longer.
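The split itself is trivial. A sketch, assuming all the data is in one CSV with a datetime index (file names are placeholders):

import pandas as pd

prices = pd.read_csv('prices_6h.csv', index_col=0, parse_dates=True)

train = prices.loc['2020-01-01':'2022-12-31']  # backtest on this
test = prices.loc['2023-01-01':]               # validate on this

train.to_csv('prices_train.csv')
test.to_csv('prices_test.csv')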
All the ‘cointegration’ I’ve encountered that turns out not to be raises the question of whether there are any pairs that exhibit persistent stationarity, not just the ephemeral variety. By using only mature coins (ADA, EOS, LTC, etc.), testing over long periods, and then checking for ongoing stationarity over further relatively long periods, I hope to find a few pairs that are a little more robust than most of what I’ve encountered so far. No doubt there will be very few such pairs, so diversification will not be an option, but clearly something has to give.
The coins I’m starting with are ADA, ALGO, ATOM, EOS, ETC and LTC. I’ll probably expand this list after initial testing, but hopefully I’ll find something out of that batch. I’ve decided to use 6hr data beginning Jan 1st 2020. Something to get started with.
Here’s another chart that illustrates the problem I’m having with pair trading, like the image I posted a couple of days ago. The pair is stationary by two tests for cointegration at the 99% level, on a year of daily data. Then it diverges significantly, right after I place a trade. So just how persistent is this quality called stationarity?
I’m looking for a strategy that will work in any market. When everything is trending up I just buy and hold. I guess if everything is trending down I could just sell (short) and hold. But I want to be able to generate some income when nothing much is happening, which is actually most of the time in most markets. So pair trading seems ideal, and as I’ve mentioned before, with a long and a short it pays for itself and also includes some hedge against a market crash (I believe that’s called systematic, or market-wide, risk).
Problem is it seems to be an illusion, like seeing pictures in clouds. Try to rely on it (i.e. trade) and it disappears. So what to do? Many tutors on quantitative trading use equities or commodity ETFs as their examples, usually with 10 or more years of data. But crypto is a fairly rapidly changing field, and I’m constantly aware of the possibility of ‘regime changes’, one meaning of which is that patterns that have worked in the past no longer apply. Some people use it in a more limited sense of going from trending to range-bound, but it seems that a more general change in the main drivers in a market is a more useful concept. Anyway, perhaps one solution would be to use a longer timeframe, in which case the number of available pairs will be reduced (many coins are fairly new to Binance Cross Margin wallet) but if the results are more reliable, then well and good. Of course that would reduce diversification, but again, if it works better…
Another issue to consider, especially with regard to backtesting, is the timeframe (daily, hourly, etc.) at which to get the data. As backtesting usually uses close prices, perhaps something like 4-hourly would be good. I can monitor markets more than once a day, or even set up a bot. More data usually means more reliable results, and 4-hourly gives me six times as much data as daily.
Having had a re-read of Chris Conlan’s book (although it warrants some further study), I’m at the point of actually doing some backtesting. So what instrument to start with? I’m not good at making such decisions. But maybe I should start by downloading 4-hour data for a selection of long-standing coins on Binance Cross Margin (I use USDT pairs) and running some cointegration tests on that. Shorter sampling timeframe, longer total timeframe, established coins. Way to go.
ETA: The most efficient way to download data that I have found is to export a chart from TradingView. All the data shown on the screen will be exported, including OHLC and volume if it is visible, and any indicators visible on the chart. By scrolling out I managed to get four years of 6-hour data in one file, and then several others (each in its own file) by simply selecting a different coin from the watchlist and exporting again. The date is in Unix timestamp format, easy to convert to a pandas datetime in Python. Downloading kline data from the Binance REST API is more tedious due to the 1000-candle limit per request.
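The conversion is a one-liner (the file name is a placeholder, and the ‘time’ column name is from memory, so check it against the actual export):

import pandas as pd

df = pd.read_csv('ADAUSDT_6h.csv')                 # a TradingView export
df['time'] = pd.to_datetime(df['time'], unit='s')  # Unix seconds -> datetime
df = df.set_index('time')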
I’m reading through the Simulator code (from Conlan’s book) and trying to work out what it actually does and how it actually works. I’ll need this understanding to customise it to my own needs. I find the best way to do this is to use the debugger: pause execution after a couple of lines and check what has actually been produced (lists, DataFrames, etc.) and what their structure is. Seeing the result of the code makes it a lot easier to understand what it’s doing. I like to think I’m an abstract thinker, but I must admit seeing concrete results improves my understanding enormously. I’m frequently confronted with evidence that I’m not as smart as I like to think I am. Or is it just age, with cognitive abilities on the decline?
Later…
Chris Conlan certainly does things differently from all other tutors on quantitative trading that I’ve seen. For example, creating a DataFrame of trading signals (-1, 0, 1) on 10 years of data for 100 tickers using a Bollinger Band strategy, without leaving any trace of the Bollinger Bands behind. The whole thing is done in a function that takes a series as input, creates the bands as local variables, and just returns the series of signals. And this function is just apply-ed to the whole DataFrame of prices, where each column is one ticker, and voila! I must try it out on a much smaller dataset to make sure that I know how it works, but it’s certainly very interesting.
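Something like this, I imagine (my own sketch of the idea, not his actual code):

import pandas as pd

def bollinger_signal(prices: pd.Series, lookback: int = 20, num_sd: float = 2.0) -> pd.Series:
    # the bands live and die inside this function
    mean = prices.rolling(lookback).mean()
    sd = prices.rolling(lookback).std()
    upper = mean + num_sd * sd
    lower = mean - num_sd * sd
    signals = pd.Series(0, index=prices.index)
    signals[prices > upper] = -1  # short above the upper band
    signals[prices < lower] = 1   # long below the lower band
    return signals

# signals = price_df.apply(bollinger_signal)  # one column per ticker, and voila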
Extremes are not uncommon in crypto. This chart has Bollinger Bands at 3, 4 and 5 SDs from the mean (360-day lookback), and the price has gone past 5 SDs three times in the past three months. Perhaps I should be looking to trade these extreme moves rather than the regular ones (enter at 2 SDs, exit at the mean). Of course, there’s the problem that any coin that makes such a move is no longer available for shorting, as the lending pool has gone to zero. Moves like this are nearly always because one coin in the pair has pumped, rather than a catastrophic drop in the other coin. However, if I borrow at 2 SDs (which I usually could) but wait to short until the spread moves a lot further (if it does), then I might start to make some profit.