After sleeping on it I’ve decided to stick with my original resolve. The book I’m working through is Hands-On Unsupervised Learning Using Python by Ankur Patel (an O’Reilly book) via Kindle. At least with unsupervised learning I don’t have to make decisions about what to use as targets (classes, labels, etc.), and with PCA I can relax a bit about my feature selection/engineering. All very good for a person who has trouble making decisions (and sticking with them). So: a good book, an IDE and environment setup that works for this kind of work, the example code downloads that usually come with this kind of book, and I’m set for a couple of months.
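For what it’s worth, the kind of PCA step I’m expecting to lean on might look something like this. A minimal sketch using scikit-learn; the feature matrix here is synthetic made-up data standing in for whatever real features I end up with:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical feature matrix: 200 observations of 10 correlated features,
# built from 3 underlying factors plus a little noise
base = rng.normal(size=(200, 3))
X = base @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))

# Keep however many components are needed to explain 95% of the variance,
# rather than hand-picking features myself
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```

The appeal is exactly what I said above: PCA decides how many dimensions matter, so I don’t have to.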
Month: February 2024
or Third…
I mentioned in a previous post that I had a Quantra course on Unsupervised Learning in Trading. The fact is that it attempted to cluster different financial instruments, with a view to pair trading instruments within the same cluster. It is generally recommended that one pair trade instruments that have some fundamental feature in common. However, I was unable to group crypto instruments in any way that produced good results. Sharpe ratios from backtesting were abysmal. I did try.
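The clustering idea itself is simple enough. Roughly what the course was doing, sketched with scikit-learn on made-up returns (the instruments, features and cluster count here are all placeholders, not what the course actually used):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical daily returns: 250 days for 12 instruments
returns = rng.normal(0, 0.02, size=(250, 12))

# One row of summary features per instrument: mean return and volatility
feats = np.column_stack([returns.mean(axis=0), returns.std(axis=0)])
feats = StandardScaler().fit_transform(feats)

# Group instruments, then look for tradeable pairs within each cluster
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(feats)
print(km.labels_)
```

The hard part, as I found out, isn’t running KMeans; it’s whether the clusters mean anything tradeable.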
So, should I try again? Can I improve my understanding of unsupervised learning to the point where I can achieve useful results? Seems I’ve failed too many times to achieve any significant results over the past few years. What to do?
On Second Thought…
I’m already questioning my decision from the previous post, to focus on Unsupervised Learning. In the back of my mind is the advice from Ernie Chan, that ML is most useful for metalabelling – deciding whether to trade (or not) where actual signals have come from a non-ML strategy. I guess this is a supervised classification problem. My new book on Unsupervised Learning has started with a chapter on Supervised Learning, which is what has reminded me of all this.
What I’m actually thinking of is some kind of weird hybrid system, where a classification of ‘good time to trade’ is in fact a buy signal for whatever data feeds into the trained model. So, combining strategy with metalabelling. What could go wrong?
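To make that concrete, here’s a toy sketch of the metalabelling idea: a simple non-ML primary signal (a moving-average crossover here, purely for illustration), with a classifier trained to decide whether acting on the signal would have paid off. Everything below — the data, the features, the horizon — is made up; it’s the shape of the idea, not a strategy:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

# Hypothetical price series (a random walk standing in for a real data feed)
price = 100 + np.cumsum(rng.normal(0, 1, size=1000))

# Primary (non-ML) signal: fast MA above slow MA means 'buy'
fast = np.convolve(price, np.ones(5) / 5, mode="valid")
slow = np.convolve(price, np.ones(20) / 20, mode="valid")
fast = fast[-len(slow):]                       # align the two series
signal = (fast > slow).astype(int)

# Metalabel: did taking the signal earn a positive 5-bar forward return?
horizon = 5
aligned_price = price[-len(slow):]
fwd_ret = aligned_price[horizon:] - aligned_price[:-horizon]
y = ((signal[:-horizon] == 1) & (fwd_ret > 0)).astype(int)

# Toy features at signal time: last bar's change plus the signal itself
X = np.column_stack([
    np.diff(aligned_price, prepend=aligned_price[0])[:-horizon],
    signal[:-horizon],
])

# The ML layer decides whether to take the trade, not what the trade is
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
take_trade = clf.predict(X)                    # in practice, held-out data only
print(take_trade[:10])
```

The hybrid twist I’m musing about is treating that `take_trade` output as the buy signal itself. Which is probably exactly the kind of thing that could go wrong.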
Focus
I frequently feel overwhelmed by the sheer volume of tutorial material I now have. I need to focus on one thing, and stick with it long enough that I don’t need to come back to it again. So for better or worse I’ve decided to focus on unsupervised learning. I even bought a shiny new (Kindle) book despite already having a couple on this subject. Retail therapy.
So this one uses TensorFlow for its examples, and I set up a new Docker environment with TensorFlow and the standard data science libraries. All the code files that come with the book are in Jupyter Notebook format, which I don’t like much because I can’t easily inspect the values of variables (and check their types) as I go. Currently my PyCharm is not set up to run notebooks from Docker. I believe it’s possible (something about exposing ports) but it’s not a high priority for me at the moment. I can just use Google Colab if I really want to run the notebooks.
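For the record, the image I built is along these lines. This is a sketch only — the base image and package list are illustrative, not my exact Dockerfile, and in practice I pin versions:

```dockerfile
# Illustrative only: CPU TensorFlow plus the usual data science stack
FROM python:3.10-slim

RUN pip install --no-cache-dir \
    tensorflow \
    numpy pandas scikit-learn matplotlib jupyter

WORKDIR /workspace
```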
Anyway, it’s pretty easy to open a notebook in PyCharm even if I can’t run it there, and copy the code into a .py file with minor adjustments where needed to get it to work, such as inserting print statements rather than relying on the notebook displaying the last returned value in a cell.
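The adjustment is trivial but worth spelling out. In a notebook, a cell ending in a bare expression displays its value automatically; in a plain script you have to print it yourself (the DataFrame here is just stand-in data):

```python
import pandas as pd

# Stand-in for whatever the book's notebook cell was working with
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Notebook cell would just end with:  df.describe()
# In a .py file, wrap it in print() instead:
summary = df.describe()
print(summary)
print(type(summary))   # and it's easy to check types as I go
```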
Will Unsupervised Learning help me with my trading? Who can say? I do actually have a Quantra course on that very subject, but I’m hoping to gain a more holistic view of unsupervised learning so that I can explore it in more depth.
Docker
When I started programming in Python about four years ago I used the Anaconda distribution for package and environment management. This worked well most of the time, but I had some issues, notably with TA-Lib. I gradually discovered that there weren’t as many packages (or specific versions of packages) available through conda as there were on PyPI using pip. While some people used both conda and pip with gay abandon, I did see some warnings not to do that. Very confusing. Perhaps the issue was related to using Spyder as my IDE. It had to be installed in each environment, and perhaps had problems with pip-installed packages. I switched to PyCharm as an IDE but still had problems once I started on Machine Learning, especially conflicts between PyTorch/TensorFlow and CPU/GPU installs.
Lately I’ve been using Docker for environment management, which seems to work well, although I’m not attempting to use GPU versions of the ML libraries. The biggest problem is that packages installed into a running container don’t persist, so adding a new package really means rebuilding the image. Anyway, my process at the moment is to use a Docker container as a remote interpreter, with the actual Python project (or tutorial) files in a directory outside the container. I have to use PyCharm Pro for this, as the Community Edition doesn’t support Docker-based remote interpreters.
Giving Up on Pair Trading
Over the past few months I’ve tried to tighten up my pair trading. Firstly, I’m picking pairs that are cointegrated at the 99% confidence level by two different tests, and then checking by visual inspection of charts that they reach trigger conditions and actually revert to the mean often enough to be worth trading. Also, I’m using many small position sizes (approx. US$500 each, long and short). Most such pairs go way off pattern as soon as I put a trade on, and usually don’t return again. At least with the smaller position sizes I’m not losing too much, but it’s certainly time to reconsider this strategy. Pity, as I’ve put a lot of work (and money) into it over the past few years.