May 2024 – christina norwood

What have I missed?

So I’m doing this Udemy course which revises high school/college/uni maths. I’m hoping for some insight into the notation that seems to be in common use at the moment, which looks quite different to what I’m used to. Mind you my studies in mathematics were late 1960s as part of a B.Sc. and later in 2001/2002 as part of a comp sci program. I referred to the Bellman equation in a previous post as containing some unfamiliar notation.

So I’m only a couple of lessons into the math revision course and it hits me with something that looks like this:

Extremum P(h | k)

where that vertical stroke is called the pipe character in programming. This relates to the ‘vertex form’ of a quadratic equation. OK, I can see how the vertex form works, no big deal there. But expressing the turning point (minimum/maximum) in the above form, wtf? So h and k are parameters(?) to the equation, OK, but it seems that in mathematics the symbol h | k is usually interpreted as h given k, sort of y = 3 given x = 1 or something like that, or in the case of reinforcement learning s2 is the state transitioned to given that the action is a1 (s2 | a1). But how does that relate here? And what exactly is P in this case? A search for mathematical symbols doesn’t give any clear answer in this context. So right at the outset of this revision course I’m faced with the problem I started with: unfamiliar notation with no explanation of what it’s supposed to mean. I guess there’s been some major change in the teaching of mathematics in schools in the last 20 years; perhaps I should check out a basic high school maths text.
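One plausible reading, offered as an assumption since the course itself doesn’t say: in German-language school texts the coordinates of a point are commonly written P(x | y) rather than P(x, y). On that reading P is just the name of the point and the stroke is only a separator, so no ‘given’ is involved. A short sketch of how that would line up with the vertex form (the worked numbers are my own illustration, not from the course):

```latex
\begin{align*}
  y &= a(x - h)^2 + k, \qquad a \neq 0 \\
  \text{Extremum } P(h \mid k) &\;\text{ read as the point with } x = h,\ y = k \\
  a > 0 &\;\Rightarrow\; \text{minimum at } (h, k), \qquad
  a < 0 \;\Rightarrow\; \text{maximum at } (h, k) \\[4pt]
  \text{Example: } y &= 2(x - 3)^2 + 1 \;\Rightarrow\; \text{minimum at } P(3 \mid 1)
\end{align*}
```

If that is what the course means, it is a regional convention for writing coordinates rather than a change in the mathematics.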

Not Going To Lie

I find the above content challenging. I can get some sense of it but I don’t recall using this kind of notation in any of my studies of mathematics in any of my university programs, in which maths was not the main focus. I’m not sure how important it is for me to actually understand it, after all I can no doubt code up a Double Deep Q Network without understanding it, but possibly being a bit autistic I find it hard to just let it go. Besides, it’s the concepts I want but they’re expressed mathematically. Anyway I’ve just signed up for a refresher math course on Udemy which I hope will contain some content that relates to this kind of symbology. It’s a bit hard to tell from reading course descriptions what is actually in them.

When Life Gives You Lemons…

Good to have a jar of preserved lemons (lemons and salt) on hand. My current jar is getting a bit low so time to make a new batch. Takes about a month before they are ready (it’s a fermentation process, like sauerkraut). I have a great chicken, olives and preserved lemon recipe which I make in a tagine. Delish.

I found a photo of my tagine for reference:

The Whole Picture

I often wonder why I jump around from one reference to another when I’m trying to learn something. Of course I know why I do it – I get stuck. But why do I get stuck? Well, it seems to me that any single resource leaves out important information when developing an idea. Learning how reinforcement learning works is fairly complex, involving concepts, terminology, mathematics, programming, and a big picture framework that needs to be completely understood. An overall idea might be fine for some purposes, but I want to be able to implement it to solve real problems. A solid understanding is required for that. When I get stuck with one resource it’s because the author has left something out (maybe assumed some underlying knowledge that I don’t actually have) or has explained it in a way that I don’t understand (the math is a bit beyond me). The solution is to go to another resource where that particular problem is resolved with different language, fewer assumptions, better analogies, or whatever.

I was a teacher of STEM and IT subjects for nearly 30 years. From the surveys of student satisfaction conducted by my employer over the years I know that my students didn’t rate me very highly as a teacher. I wonder which of the above sins I was committing on a regular basis. Anyway, no one has to put up with my ineptitude anymore. One thing I like about trading crypto is that it doesn’t involve anyone else, at least not on a personal level. Just ‘the market’.

Bricks and Mortar

Once upon a time, when bricks and mortar bookshops were a thing, I would wander down to such a place and browse a shelf of books on my current subject of interest. I would read a page or two to see if the general level matched where I was at, not too simple, not too difficult. I would check some topic I was aware of to see how it was covered, and generally walk out with a selected book that matched my needs pretty well.

Now that bookshops are all online, and the books that I buy generally digital, it’s a bit harder to gauge what is suitable. Sure, most books (e.g. Kindle books from Amazon) give a preview, but this is generally the introductory material which is not very suitable for making the kind of judgement I’m interested in. Result is that I have a lot of ebooks that don’t really suit my need. Convenient, sure. Appropriate? Not so much.

Anyway, I have yet another book on Reinforcement Learning, and I’m really liking this one. In the days of bricks and mortar, this is the book I would have come away with. Some people have suggested I may be somewhere ‘on the spectrum’, not too far along to be sure, but as I’ve mentioned previously I’m pretty inflexible in what makes a good match for learning material.

Found It

I’m very particular in how I like information to be presented to me. Not too fast, not too slow, right level of difficulty – you get the idea. I’ve finally found a reference that presents an explanation of the implementation of a Deep Q Network to solve a basic Reinforcement Learning problem (Frozen Lake 4×4) that fits the bill. Here it is, on YouTube.

Roadblocks

Whatever resource I’m studying I run into roadblocks. Following the development of a topic, A -> B -> C and all good, but suddenly E appears with no obvious transition from what went before. Perhaps the author/tutor plans to cover C -> D and D -> E later, but I have difficulty when I lose track of an argument/development of an idea. Perhaps I’m just not very flexible.

My usual response is to go to another resource/reference. At the moment I’ve gone back to a Quantra course that I purchased quite a few months ago on Deep Reinforcement Learning in Trading. I think I abandoned that when I first looked at it because it uses TensorFlow for the NN part and I want to stick with PyTorch. These days I’m a little more comfortable with rewriting code that is designed for TF into the PyTorch version. It’s not really that hard.

Anyway I’m making progress with that course and don’t anticipate any further serious roadblocks. Pity I don’t plan to actually do any (short term) trading going forward. At the moment it’s just an intellectual exercise.

End of an Era

I’ve been trading quantitatively for four years, mostly trying to get a statistical arbitrage strategy to work. Lots of programming (Python) running various statistical tests and backtesting various pairs with a range of parameters. Lots of online courses (Udemy and Quantra).

The whole thing has depended on being able to set up pair trades fairly easily. Binance’s Cross Margin wallet has been ideal for this. Sell something I don’t have, it’s borrowed automatically (as long as I have adequate collateral of course). And lots of coins available for margin trading. Plus I could use the Binance REST API to check the positions whenever I wanted by running a simple script.

But Binance is no longer allowing margin trading in Australia. Admittedly I decided a couple of months ago not to try pair trading any longer because it wasn’t profitable, so the fact that I can’t anyway is not such a big deal. Perhaps it’s a good thing I never got it to work effectively, or I’d be very pissed off at the moment. Holding a margin long position on BTC is a lot simpler, just borrow a stablecoin somewhere and trade it for BTC somewhere else. Not so much busy work managing a dozen open positions. So pair trading is just something else I’ve tried that hasn’t worked out. Still, it was interesting while it lasted. Maybe the BTC will pay for all. Hope springs eternal, they say. Perhaps I should get myself on the waiting list for public housing, just in case.

Exploration, Exploitation

Finished another chapter, discussing one of the most basic problems in human life. Decisions. Specifically, how to transition from collecting information required to make a decision, to actually implementing that decision.

In ML terms the initial phase is called exploration. In the case of our one-armed bandits (see previous post) it involves trying them all out to see which one (if any) gives the best payout. Exploitation is then using that best bandit to make money. Problem is if you spend too much time on exploration you’re spending a lot of money on losing machines. If you spend too little time/money on exploration then the ‘best’ one may be just a fluke, and not profitable in the long term.

A couple of the policies regarding the transition discussed in the book look at a soft transition from exploration to exploitation. During the exploitation phase occasionally recheck the ‘losing’ machines to gather a bit more data to confirm the decision. And if the ‘winning’ machine starts to produce bad results in the long term recheck the others more often. Rather like a bad marriage I guess.
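To make the soft transition concrete, here is a minimal sketch of one common policy of this kind, epsilon-greedy with a decaying epsilon. It is not the book’s code: the function name, the payout probabilities and the decay settings are all assumptions for illustration. It starts out picking machines at random, gradually shifts to the best-looking one, and keeps a small chance of rechecking the others.

```python
import random

def run_bandits(payout_probs, pulls=5000, eps_start=1.0, eps_end=0.05,
                decay=0.999, seed=0):
    """Epsilon-greedy over slot machines with hypothetical payout probabilities."""
    rng = random.Random(seed)
    n = len(payout_probs)
    counts = [0] * n      # how often each machine was played
    values = [0.0] * n    # running average payout per machine
    eps = eps_start
    total = 0
    for _ in range(pulls):
        if rng.random() < eps:
            arm = rng.randrange(n)                        # explore: random machine
        else:
            arm = max(range(n), key=lambda i: values[i])  # exploit: best so far
        reward = 1 if rng.random() < payout_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
        eps = max(eps_end, eps * decay)  # explore less often as data accumulates
    return counts, values, total

counts, values, total = run_bandits([0.2, 0.5, 0.8])
print(counts, [round(v, 2) for v in values], total)
```

The floor at eps_end is what keeps the occasional recheck of the ‘losing’ machines going. This sketch does not raise epsilon again when the ‘winning’ machine turns bad, which is the second half of the idea described above.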

Bandits

I’ve worked through the chapter on OpenAI Gym, even got the scripts working, adjusting for the changes to the API since the book was published. This is a problem with all IT related stuff, changes are so frequent that just about any tutorial/book material is out of date to some extent. Makes learning harder. Anyway, the approach to those games wasn’t very sophisticated. Just demonstrating that if one selects from available actions at random one doesn’t get very far.
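As a rough illustration of how far random actions get, here is a sketch that does not use Gym at all, so it won’t break with the next API change. It hard-codes the standard Frozen Lake 4×4 layout and assumes deterministic moves (no slipping), which is a simplification of the real environment. The function names are my own.

```python
import random

# Standard 4x4 layout: S = start, F = frozen (safe), H = hole, G = goal
LAKE = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
# Left, down, right, up as (row, column) changes
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]

def random_episode(rng, max_steps=100):
    """Play one episode with random actions; return 1 if the goal is reached."""
    row, col = 0, 0
    for _ in range(max_steps):
        d_row, d_col = rng.choice(MOVES)
        # Moving into a wall leaves the agent where it is
        row = min(max(row + d_row, 0), 3)
        col = min(max(col + d_col, 0), 3)
        tile = LAKE[row][col]
        if tile == "G":
            return 1
        if tile == "H":
            return 0
    return 0  # ran out of steps

def random_success_rate(episodes=2000, seed=0):
    rng = random.Random(seed)
    return sum(random_episode(rng) for _ in range(episodes)) / episodes

print(random_success_rate())
```

Most random episodes end in a hole long before they get near the goal, which is the point the chapter was making.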

So now we’ve moved on to One-Armed Bandits, a nickname for slot machines. Looking at strategies for exploring a set of such machines to determine if any has a better probability of reward than the others. This could be relevant to trading. Given a situation where funds could be allocated to a range of strategies, how to determine the most profitable course of action. Especially if you don’t have past history to work with.

Maintaining Interest

I’m still searching for something that holds my interest for more than a few days. I’ve spent a lot of time on quantitative trading over the past few years, getting nowhere. Or rather, going backwards (losing money). More recently I’ve spent time on Machine Learning, hoping it could improve my trading in some way. Not sure I will even need to do trading anymore. If I have enough to live on after buying an apartment I won’t actually need to make any income.

So, what to do? I’m coming around to revisiting reinforcement learning, which is a very interesting area of ML, and could be related to trading should I ever go back to that. It’s a complex area of ML, but I think I’ve got enough of a handle on the basics to take another shot at it. And once I’ve got it set up, finding the right data will be the biggest challenge. Garbage In, Garbage Out, they say, so identifying useful inputs will be the challenge. Or I could input everything and let the algorithm sort out what’s useful. That’s the beauty of reinforcement learning. The code does all the work.

So, I’m re-reading Practical Deep Reinforcement Learning with Python by Ivan Gridin. He gives examples in both PyTorch and TensorFlow, so I should be able to follow along fairly easily, at least as far as the ML library is concerned. However I have had problems with Reinforcement Learning in the past because of changes to other libraries commonly used. A favourite is OpenAI Gym (the gym package), whose API changes have tripped me up before. Setting up a system that actually works with the code is a real headache in this area of ML. Let’s see how it goes this time.

So (my favourite word for starting paragraphs), I could develop an RL (reinforcement learning) model that has various trading strategies as actions, PnL as reward, and an environment that includes all the data that might be relevant. Perhaps the simplest starting point would be to concentrate on opening/closing positions on a single instrument. After that I could move on to managing a portfolio.
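A minimal sketch of that simplest starting point, assuming a single instrument, a position that is either flat or long, and reward equal to the PnL of the next price move. The class name, the three actions and the price list are all hypothetical, and there are no fees, no shorting and no position sizing.

```python
class SingleInstrumentEnv:
    """Toy trading environment: one instrument, flat (0) or long (1)."""

    HOLD, OPEN_LONG, CLOSE = 0, 1, 2

    def __init__(self, prices):
        self.prices = list(prices)  # needs at least two prices
        self.reset()

    def reset(self):
        self.t = 0
        self.position = 0
        return self._state()

    def _state(self):
        # State here is just (current price, current position); a real
        # environment would include whatever data might be relevant.
        return (self.prices[self.t], self.position)

    def step(self, action):
        # Apply the action first, then let the price move one step.
        if action == self.OPEN_LONG:
            self.position = 1
        elif action == self.CLOSE:
            self.position = 0
        self.t += 1
        # Reward is the PnL of whatever position was held over that move.
        reward = self.position * (self.prices[self.t] - self.prices[self.t - 1])
        done = self.t == len(self.prices) - 1  # do not call step() after this
        return self._state(), reward, done


env = SingleInstrumentEnv([100, 102, 101, 105])
rewards = []
for action in (env.OPEN_LONG, env.HOLD, env.CLOSE):
    state, reward, done = env.step(action)
    rewards.append(reward)
print(rewards, done)  # [2, -1, 0] True
```

The reset/step shape loosely follows the Gym convention so an agent written against it could be moved to a proper environment later. Managing a portfolio would mean widening both the state and the action set.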

DeFi

Exploring DeFi. The exchange I have been using for some crypto trading no longer allows the kind of trading I was doing for reasons of ‘compliance’, so time to check out some decentralized, permissionless options.