Trading Analysis

I’ve been re-viewing a Udemy Course Trading Tactics by Triumph at Trading. It’s all about backtesting. Actually it’s not ALL about backtesting, just why you should do it, some best practices, and a handy spreadsheet to keep track of and analyze results. He doesn’t even discuss specific strategies. So basically it’s about analysing backtest results by various metrics.

So it got me thinking, since I’m seriously reviewing my approach to backtesting before launching into a variety of strategies. Many courses I’ve done, on Udemy and elsewhere (Quantra being the other main site I use) have backtesting and analysis code provided, but it’s all a bit haphazard. Some time ago I bought a book by Chris Conlan called Algorithmic Trading with Python which includes some fairly well structured code for analysing individual backtests and entire portfolios. He adopts an OO approach too, which is a bit rare with Python programmers but is very familiar to me since I taught Java for 16 years! He even uses type hinting for everything. So if I’m going to use some prepackaged code for analysis I might just go with that. Anyway, time to review the book. Actually that’s a bit of a challenge since most of my books are still in boxes after moving here six months ago, and I don’t really want to unpack them all looking for that one book. I can just review all the code examples and refresh my memory of Compound Annual Growth Rate (CAGR) and Sortino Ratio, and all those other metrics so beloved of financial analysts. The book basically explains the concepts and then acts as a manual for his codebase. If I get stuck with the code I guess I can just start unpacking those boxes.
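
As a refresher on two of those metrics, here is a minimal sketch of CAGR and the Sortino ratio in plain Python. This is not the code from Conlan’s book: the function names, the zero target return and the default of 365 periods per year (crypto trades every day) are all assumptions made for illustration.

```python
import math
from typing import Sequence


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% a year)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def sortino_ratio(returns: Sequence[float], target: float = 0.0,
                  periods_per_year: int = 365) -> float:
    """Annualised Sortino ratio: mean excess return over downside deviation."""
    excess = [r - target for r in returns]
    mean_excess = sum(excess) / len(excess)
    # Only returns below the target count as risk, but the average is
    # taken over all periods, not just the losing ones.
    downside_var = sum(min(e, 0.0) ** 2 for e in excess) / len(excess)
    if downside_var == 0.0:
        raise ValueError("no returns below target, downside deviation is zero")
    return mean_excess / math.sqrt(downside_var) * math.sqrt(periods_per_year)
```

Unlike the Sharpe ratio, the Sortino ratio does not penalise upside volatility, which is why the two can disagree noticeably on strategies with skewed returns.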

So, I’ve opened the directory containing all the code examples and files for his ‘application’ as a project in PyCharm, assigned an appropriate Docker image as a remote interpreter, and away we go. I’m glad I’ve sorted out my environment/package management issues at last, hoping I don’t run into further problems with that.

ETA: I decided I need the book after looking at the code. Luckily I found it in the second (of eight) boxes, so not too many piles of books spread around on the floor. I think I’ll need to write a couple of classes implementing the strategies I want to use, but that should be fairly straightforward. I’m not really sure what might be different with the backtesting this time round, compared with all those previous times when I got good backtest results but poor actual results. I’m probably just intending to do the same thing as before but expecting a different result. Not very smart.

BackTesting

Working through my taxes for the last financial year I find myself constantly asking myself why I made the trades I did. What strategy was I using? I haven’t been keeping the kind of records that some tutors recommend. Which brings me to something very fundamental to trading strategies – backtesting.

I’ve done a lot of backtesting in the past. All by the book, coming up with excellent Sharpe ratios and significant projected profits. But when I actually start trading, not so much. In fact I’ve become convinced that backtesting, at least in crypto, is a complete waste of time.

And it’s not just backtesting. My current strategy is a standard stat arb mean reversion, using pairs that are cointegrated at the 99% level by two different tests on the past year of data. I enter a trade when it’s 2 SDs from the mean, only to have it move more than 4 SDs from the mean almost immediately, in the wrong direction. And not just one pair. I’ve decided to diversify as much as the available pairs will allow, and out of 20 such pairs about 16 of them have exhibited the above behaviour, while the profits I make on the few properly behaved pairs are tiny in comparison. Some of my pairs have been running for months, waiting for mean reversion. Well perhaps the losses I eventually make will allow for some offsets of my BTC profits on my next tax return.
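
For reference, the entry rule described above can be sketched as follows. The hedge ratio, the lookback length and all function names are assumptions for illustration, and the cointegration tests themselves are not shown.

```python
from statistics import mean, stdev


def spread_series(prices_a, prices_b, hedge_ratio):
    """Spread of a pair: price of A minus hedge_ratio times price of B."""
    return [a - hedge_ratio * b for a, b in zip(prices_a, prices_b)]


def latest_zscore(spread, lookback):
    """How many standard deviations the latest spread is from its mean.

    The mean and SD are taken over the last `lookback` values, including
    the latest one (an assumption; excluding it gives larger z-scores).
    """
    window = spread[-lookback:]
    return (spread[-1] - mean(window)) / stdev(window)


def entry_signal(z, entry_z=2.0):
    """Fade the move: short the spread when it is high, buy it when low."""
    if z >= entry_z:
        return "short_spread"
    if z <= -entry_z:
        return "long_spread"
    return "flat"
```

Nothing in this rule says what to do when the spread keeps going to 4 SDs, which is exactly the failure described above.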

Anyway, I’m wondering if I’m not doing backtesting right. More likely I’m not doing my strategy right. Perhaps I should be including stops, and use backtesting to determine what the best level for a stop is. But given that future behaviour is completely unrelated to past behaviour in nearly everything I trade, is this even a worthwhile exercise?
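
A rough sketch of what that exercise could look like: replay a series of spread z-scores, enter at 2 SDs, exit at the mean or at a stop, and compare total P&L across stop levels. Everything here is a simplifying assumption: P&L is measured in z-score units, costs and position sizing are ignored, and any trade still open at the end is not counted.

```python
def backtest_zscores(zscores, entry_z=2.0, stop_z=4.0):
    """Total realised P&L (in z-score units) of a mean reversion rule."""
    position = 0        # +1 long the spread, -1 short the spread, 0 flat
    entry_level = 0.0
    total = 0.0
    for z in zscores:
        if position == 0:
            # Only enter between the entry level and the stop level.
            if entry_z <= z < stop_z:
                position, entry_level = -1, z
            elif -stop_z < z <= -entry_z:
                position, entry_level = 1, z
        else:
            reverted = z * position >= 0         # spread is back at the mean
            stopped = -z * position >= stop_z    # spread moved further away
            if reverted or stopped:
                total += position * (z - entry_level)
                position = 0
    return total


def sweep_stops(zscores, stops, entry_z=2.0):
    """Total P&L for each candidate stop level."""
    return {s: backtest_zscores(zscores, entry_z, s) for s in stops}
```

Picking the best stop from the same data it was tested on is in-sample optimisation, so any level chosen this way would still need checking on a later period that played no part in the choice.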

Crypto is a wild space. Nearly everything is speculative. My overall hope is that as the crypto space matures it will move closer to more traditional asset classes, and approaches that have worked for, say, equities, will eventually start working for crypto as well. Perhaps I should stick with the most mature crypto projects. In the interests of diversity I’m trading a lot of coins that I hadn’t heard of before this current iteration of my strategy. There are a lot of coins available on Binance Cross Margin that weren’t there a year ago. Looks like that might be a mistake.

Documenting the Process

My current attempts at pair trading are not working so well, and I’m intending to make one more serious effort to somehow harness ML to help with that. It might be a good idea to actually document this process to help keep me on track, and this is as good a place as any to do that.

Over the last couple of posts I mentioned a book by Jason Brownlee of machinelearningmastery.com – Deep Learning Time Series Forecasting. Jason follows a rigorous process of incremental improvement in performance by starting with classical models (of supervised learning) and then trying to improve on ‘the best so far’ with other models, such as various deep learning algorithms. He actually has another book on time series forecasting that does not include (primarily) deep learning, called Time Series Forecasting with Python, so I’ve decided to review that first so I can better follow his incremental improvement approach. I don’t remember much about ARIMA, for example, which he described at length in that earlier work.
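
The ‘best so far’ idea can be sketched with a naive baseline and one-step walk-forward validation. This is not Jason’s code: the function names are made up for illustration, and the moving average simply stands in for whatever candidate model is being compared against the baseline.

```python
import math


def walk_forward_rmse(series, n_test, forecast):
    """Forecast each of the last n_test points from the data before it."""
    errors = []
    for i in range(len(series) - n_test, len(series)):
        history = series[:i]            # the model only ever sees the past
        errors.append(forecast(history) - series[i])
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def persistence(history):
    """Naive baseline: tomorrow will look like today."""
    return history[-1]


def moving_average(history, window=3):
    """Candidate model: mean of the last few observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)
```

A candidate only becomes the new ‘best so far’ if its walk-forward error beats the current best on the same test points; on a steadily trending series, for example, the moving average loses to plain persistence.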

I’ll probably stick with pair trading because the whole stationarity thing is easier for any model to work with. Also pair trading has the huge advantage that it’s nearly cost neutral: the shorts pay for the longs, and I don’t need to actually invest any further capital, just use the margin I already have. The general context for this is trading crypto in the Binance cross margin wallet, using my BTC as margin (collateral).

Anyway, I first need to spend some time sorting out my tax from last financial year, so progress on the ML might be a bit slow.

I’m Going to Need a Better Computer

Reading through that ebook I mentioned in my previous post, I’m once again reminded that machine learning is an experimental science. The general approach to solving a problem is to throw everything at it and see what works best. And by ‘everything’ I mean multiple variations of multiple algorithms within multiple families, plus a few ensemble solutions for dessert.

That’s a lot of processing. My most powerful PC is out of action. It has an RTX 2080 Ti graphics card which, while a bit old now, is still well regarded for ML processing. I’m thinking of building a new machine. In the past I’ve had a couple of machines built by friends who are more hardware savvy than I am, including the Linux box I’m currently using. Over the years I have built a lot of different things (not computers), some of them quite challenging. It should be easy.

So, build a dedicated ML PC. It won’t be for a while because I still have so much to learn. And I’m getting older. Maybe it will happen one day. Or perhaps I can just get someone to do a custom build for me. Do I really need to do it myself? I’m getting to that stage in my life where it’s easier just to pay someone else to do things for me. Pity.

Reading a bit further, Jason suggests cloud-based resources, such as AWS. Perhaps I should give that serious consideration.

Hidden Gem?

I’ve been clearing out old emails and came across one that I had completely forgotten about – receipt and download link for Deep Learning Time Series Forecasting by Jason Brownlee, of machinelearningmastery.com. I have several of Jason’s books (pdfs plus code) and I find him very readable and usually pitched at just the right level for me. I like his style.

So this one is about using deep learning for time series forecasting, obviously, with an emphasis on CNNs and RNNs for supervised learning problems, plus some hybrid systems. He’s very big on data preparation, and I must admit this is an area I struggle with. For trading there are so many possible inputs, and Ernie Chan has even suggested a system that uses hundreds or thousands of inputs, without however being very specific about what those inputs are.

So, another detour. I’ll put the unsupervised learning study aside for a while and have a quick read through this book. Perhaps it will give me some ideas, and it is at least directly relevant to my main concern, i.e. time series forecasting.

Sleeping on it

After sleeping on it I’ve decided to stick with my original resolve. The book I’m working through is Hands-On Unsupervised Learning using Python by Ankur Patel (an O’Reilly book) via Kindle. At least with unsupervised learning I don’t have to make decisions about what to use as targets (classes, labels, etc), and with PCA I can relax a bit about my feature selection/engineering. All very good for a person who has trouble making decisions (and sticking with them). So, a good book, an IDE and environment setup that works for this kind of stuff, example code downloads as usual with this kind of book, and I’m set for a couple of months.

or Third…

I mentioned in a previous post that I had a Quantra course on Unsupervised Learning in Trading. Fact is that it attempted to cluster different financial instruments, with a view to pair trading instruments in the same cluster. It is generally recommended that one pair trade with instruments that have some fundamental feature in common. However I was unable to group crypto instruments in any way that produced good results. Sharpe ratios from backtesting were abysmal. I did try.

So, should I try again? Can I improve my understanding of unsupervised learning to the point where I can achieve useful results? Seems I’ve failed too many times to achieve any significant results over the past few years. What to do?

On Second Thought…

I’m already questioning my decision from the previous post, to focus on Unsupervised Learning. In the back of my mind is the advice from Ernie Chan, that ML is most useful for metalabelling – deciding whether to trade (or not) where actual signals have come from a non-ML strategy. I guess this is a supervised classification problem. My new book on Unsupervised Learning has started with a chapter on Supervised Learning, which is what has reminded me of all this.
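
To make the metalabelling idea concrete, here is a minimal sketch of how the training labels could be built. The signal encoding (+1 long, -1 short, 0 no trade), the use of a single forward return per bar and the function name are all assumptions for illustration, not anything prescribed by Ernie Chan.

```python
def meta_labels(signals, forward_returns):
    """Label each base-strategy signal 1 if the trade made money, else 0.

    signals:         +1 (long), -1 (short) or 0 (no trade) for each bar
    forward_returns: return of the instrument over the holding period
                     that starts at that bar
    Returns (bar_index, label) pairs for the bars with a signal only.
    """
    labels = []
    for i, (signal, ret) in enumerate(zip(signals, forward_returns)):
        if signal == 0:
            continue                     # nothing to take or skip here
        labels.append((i, 1 if signal * ret > 0 else 0))
    return labels
```

A classifier trained on features observed at those bars, with these labels as targets, then answers ‘take this signal or skip it’ rather than ‘which direction’.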

What I’m actually thinking of is some kind of weird hybrid system, where a classification of ‘good time to trade’ is in fact a buy signal for whatever data feed into the trained model. So combining strategy with metalabelling. What could go wrong?

Focus

I frequently feel overwhelmed by the sheer volume of tutorial material I now have. I need to focus on one thing, and stick with it long enough that I don’t need to come back to it again. So for better or worse I’ve decided to focus on unsupervised learning. I even bought a shiny new (Kindle) book despite already having a couple on this subject. Retail therapy.

So this one uses TensorFlow for its examples, and I set up a new Docker environment with TensorFlow and the standard data science libraries. All the code files that come with the book are in Jupyter Notebook format, and I don’t like that much because I can’t easily inspect the values of variables (and check the types) as I go. Currently my PyCharm is not set up to run notebooks from Docker. I believe it’s possible (something about exposing ports) but not a high priority for me at the moment. I can just use Google Colab if I really want to run the notebooks.

Anyway, it’s pretty easy to open the notebook in PyCharm even if I can’t run it there, and copy the code into a py file with minor adjustments where needed to get that to work, such as inserting print statements rather than just having the notebook print the last returned value in a cell.
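
The copying step can also be scripted, since a notebook file is just JSON with a list of cells. The sketch below pulls out the code cells and comments out IPython magics and shell escapes; the function name is made up, and spotting magics by a leading % or ! is a rough heuristic. It does not add the print statements, which still have to be done by hand.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Concatenate the code cells of a notebook into one Python script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue                      # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):      # source is usually a list of lines
            source = "".join(source)
        lines = []
        for line in source.splitlines():
            # Magics (%...) and shell escapes (!...) are not valid Python.
            if line.lstrip().startswith(("%", "!")):
                line = "# " + line
            lines.append(line)
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"
```

Reading a .ipynb file as text and passing its contents to this function gives something that can be pasted into a py file. Where Jupyter is installed, `jupyter nbconvert --to script` does a similar job.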

Will Unsupervised Learning help me with my trading? Who can say? I do actually have a Quantra course on that very subject, but I’m hoping to gain a more holistic view of unsupervised learning so that I can explore it in more depth.

Docker

When I started programming in Python about four years ago I used the Anaconda distribution for package and environment management. This worked well most of the time, but I had some issues, notably with TA-Lib. I gradually discovered that there weren’t as many packages (or specific versions of packages) available through conda as there were on PyPI using pip. While some people used both conda and pip with gay abandon, I did see some warnings not to do that. Very confusing. Perhaps the issue was related to using Spyder as my IDE. It had to be installed in each environment, and perhaps had problems with pip-installed packages. I switched to PyCharm as an IDE but still had problems once I started on Machine Learning, especially conflicts between PyTorch/TensorFlow and CPU/GPU installs.

Lately I’ve been using Docker for environment management, which seems to work well although I’m not attempting to use GPU versions of the ML libraries. Biggest problem is that one can’t permanently add new packages to an image without rebuilding it. Anyway, my process at the moment is to use a Docker container as a remote interpreter, with the actual Python project (or tutorial) files in some directory outside the container. I have to use PyCharm Pro for this, as the Community Edition doesn’t support Docker-based remote interpreters.

Giving Up on Pair Trading

Over the past few months I’ve tried to tighten up my pair trading. Firstly I’m picking pairs that are cointegrated at the 99% confidence level by two different tests, and that, on visual inspection of the charts, reach trigger conditions and actually revert to the mean often enough to be worth trading. Also I’m using many small position sizes (approx US$500 each long and short). Most such pairs go way off pattern as soon as I put a trade on, and usually don’t return again. At least with the smaller position sizes I’m not losing too much, but it’s certainly time to reconsider using this strategy. Pity, as I’ve put a lot of work into it (and money) over the past few years.

Finding the Path

I’ve accumulated a lot of instructional material (for Python) over the past couple of years, and it’s proving a challenge to develop a doable self-instruction course. I don’t want to waste time on stuff I mostly know already, and also don’t want to get bogged down in material that is a bit too challenging at my current skill level. Also, should I concentrate on pure Python programming, or on trading strategies, using Python to help make decisions? And what about web scraping or GUI development, which might help with getting information from the world at large, and organizing my scripts in an easy-to-use graphic interface?

Anyway, I’ve recently decided to spend time with an actual book, Python Cookbook by David Beazley and Brian Jones. It’s not specifically geared towards trading, or even data science, but seems a good fit to improve my basic understanding of Python. The issue that I face is not that I need to develop sophisticated code, but that I need to be able to understand the sometimes quite sophisticated code that some authors/trainers/course creators use in their examples. So I need to be able to read code at a much higher level than I am ever likely to write.

I’ve pretty much decided to spend nearly all my time on this enterprise. I’ll be moving soon to a probably more expensive home and feel the need to actually generate some income, so trading is becoming a job rather than just a hobby. I’m still intending to let statistics (and maybe ML) inform my decisions, although I don’t plan to automate anything just yet.