Finding a Framework

The various books and courses I study all organize code differently. Even for something as simple as the Replay Buffer, I’ve seen four or five different implementations, ranging from a separate list for each item in the saved ‘experience’ (state, action, reward, next state, done) to named tuples stored in a deque. I find myself hunting around to see where each required task actually gets done, and what kind of data is being used. If I want to rework an example to use different data, it can be quite a job to make sure that everything is in the same format. I need to know what is a list, what is a numpy array, what is a torch tensor (and what is a pandas DataFrame, my preferred type for loaded data), and where the necessary conversions take place.
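As an illustration of one of those formats, here is a minimal sketch of the namedtuple-in-a-deque style of replay buffer. The names `Experience`, `ReplayBuffer`, `add` and `sample` are my own assumptions, not taken from any particular book or course. The design point is that `sample` is the single place where stored Python objects are converted into NumPy arrays, so the rest of the code knows exactly what it receives.

```python
import random
from collections import deque, namedtuple

import numpy as np

# One saved 'experience': the five items listed above.
Experience = namedtuple(
    "Experience", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size store of experiences; the oldest are dropped first."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently discards the oldest item when full.
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return a random batch as a tuple of NumPy arrays.

        This is the only place where lists/tuples become arrays, so the
        format handed to the agent is always the same.
        """
        batch = random.sample(self.memory, batch_size)
        states = np.array([e.state for e in batch], dtype=np.float32)
        actions = np.array([e.action for e in batch], dtype=np.int64)
        rewards = np.array([e.reward for e in batch], dtype=np.float32)
        next_states = np.array([e.next_state for e in batch], dtype=np.float32)
        dones = np.array([e.done for e in batch], dtype=np.float32)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.memory)
```

A PyTorch agent would then turn each array into a tensor, for example with `torch.from_numpy(states)`, which keeps that second conversion in one known place as well.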

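I need some kind of standard template, so that I can convert anything I’m studying to it and know that everything is covered and everything fits. I’m inspired by the basic picture of what RL involves: an agent takes an action in an environment, and the environment responds with a new state and a reward.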

Simple and straightforward. So I should have an Environment class, an Agent class, and an App that creates one of each and sets the ball rolling. Everything else should be hidden inside those two classes, so that I can use different environments, such as games or trading environments, and the change will be transparent to the main App. The same goes for different agents using different strategies: once again, the App shouldn’t notice.

I’ll start with a couple of simple examples, maybe a GridWorld using state/action tables, and then move on to neural networks, while trying to maintain the same basic framework. Hopefully it will make my task easier.

So here’s a very bare-bones start. I decided to make Agent and Environment abstract base classes, so I’ll have to subclass them to get concrete implementations. Having taught Java for 16 years, I feel right at home with this. I’ve kept all program logic out of the App that acts as the starting point. I guess the Agent will end up being pretty busy, as it has to make all the decisions (and improve its ability to make good decisions!). There is only one point of interaction between Agent and Environment: the Agent provides an action, and the Environment provides the required responses, namely the next state, the reward, and whether the ‘game’ is complete.
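The original code isn’t reproduced here, so what follows is only a minimal sketch of how those two abstract base classes might look using Python’s `abc` module. The method names `reset`, `step`, `choose_action` and `learn` are assumptions for illustration; the one fixed point taken from the description is that `step(action)` returns the next state, the reward, and a done flag.

```python
from abc import ABC, abstractmethod


class Environment(ABC):
    """Anything an agent can act on: a game, a trading simulator, and so on."""

    @abstractmethod
    def reset(self):
        """Start a new episode and return the initial state."""

    @abstractmethod
    def step(self, action):
        """Apply one action and return (next_state, reward, done).

        This is the only point of interaction with the agent.
        """


class Agent(ABC):
    """Makes every decision, and learns to make better ones."""

    @abstractmethod
    def choose_action(self, state):
        """Return the action to take in the given state."""

    @abstractmethod
    def learn(self, state, action, reward, next_state, done):
        """Update the decision-making from one observed transition."""
```

Much like a Java abstract class, neither can be instantiated directly: Python raises `TypeError` until a subclass overrides every abstract method.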

For the sake of completeness, I’ve also added trivial concrete implementations of those two abstract base classes.
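Here is a sketch of what such trivial concrete implementations could look like, again assuming the hypothetical method names `reset`, `step`, `choose_action` and `learn`. `CountdownEnvironment` and `RandomAgent` are illustrative inventions: the environment simply ends each episode after a fixed number of steps, and the agent picks actions at random without learning anything. Minimal abstract base classes are restated at the top so the block runs on its own, and the `App` creates one of each and runs the loop.

```python
import random
from abc import ABC, abstractmethod


# Minimal abstract base classes, restated so this block is self-contained.
class Environment(ABC):
    @abstractmethod
    def reset(self):
        """Start a new episode and return the initial state."""

    @abstractmethod
    def step(self, action):
        """Return (next_state, reward, done)."""


class Agent(ABC):
    @abstractmethod
    def choose_action(self, state):
        """Return an action for the given state."""

    @abstractmethod
    def learn(self, state, action, reward, next_state, done):
        """Update from one transition."""


class CountdownEnvironment(Environment):
    """Hypothetical trivial environment: every episode lasts `length` steps.

    The state is the number of steps remaining, and every action earns 1.
    """

    def __init__(self, length=5):
        self.length = length
        self.remaining = length

    def reset(self):
        self.remaining = self.length
        return self.remaining

    def step(self, action):
        self.remaining -= 1
        return self.remaining, 1.0, self.remaining == 0


class RandomAgent(Agent):
    """Hypothetical trivial agent: random actions, no learning."""

    def __init__(self, actions):
        self.actions = list(actions)

    def choose_action(self, state):
        return random.choice(self.actions)

    def learn(self, state, action, reward, next_state, done):
        pass  # a real agent would update a table or a network here


class App:
    """Creates an environment and an agent, then runs episodes.

    The loop only uses the abstract interface, so either class can be
    swapped for another subclass without changing this code.
    """

    def __init__(self):
        self.environment = CountdownEnvironment(length=5)
        self.agent = RandomAgent(actions=[0, 1])

    def run(self, episodes):
        """Play `episodes` episodes and return the total reward of each."""
        totals = []
        for _ in range(episodes):
            state = self.environment.reset()
            done, total = False, 0.0
            while not done:
                action = self.agent.choose_action(state)
                next_state, reward, done = self.environment.step(action)
                self.agent.learn(state, action, reward, next_state, done)
                state = next_state
                total += reward
            totals.append(total)
        return totals


if __name__ == "__main__":
    print(App().run(episodes=3))  # prints [5.0, 5.0, 5.0]
```

Because every episode lasts exactly five steps with a reward of 1 each, the totals are always 5.0 whatever actions the random agent picks; a GridWorld or trading environment would slot into the same `App` by subclassing `Environment`.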