I’ve implemented the UML sequence diagram I described in a previous post. The processor I used, which I’ve called HoldProcessor, decides to hold (action 1) regardless of the state, and doesn’t learn anything.
```python
from processor_base import Processor
import numpy as np


class HoldProcessor(Processor):
    def __init__(self):
        super(HoldProcessor, self).__init__()

    def get_action(self, state: np.ndarray) -> int:
        return 1  # always hold

    def learn(self, batch: tuple):
        pass  # nothing to learn
```
Even with this primitive decision maker I usually make a profit when I run the app on the last year’s worth of daily data for ADAUSDT. That’s probably because ADA has generally been going up over the past year, so random trades are more likely to make a profit than a loss.
I also have a policy that selects a random action half the time and consults the processor the other half, so there is a chance of the occasional buy or sell. This implements the explore/exploit behaviour the processor would need in order to learn anything, if it could. I didn’t include the policy in the sequence diagram (an oversight), but it’s just a function that figuratively tosses a coin: if it comes down heads it picks a random action (0 to sell, 1 to hold, 2 to buy); otherwise it gets the action from the processor as shown in the sequence diagram.
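A minimal sketch of that coin-toss policy might look like the following. The function name `choose_action` and the `explore_prob` parameter are my own illustrative choices, not names from the actual app:

```python
import random

import numpy as np


def choose_action(processor, state: np.ndarray, explore_prob: float = 0.5) -> int:
    """Explore/exploit policy sketch (assumed names, not the app's actual code).

    With probability explore_prob, pick a random action:
    0 = sell, 1 = hold, 2 = buy. Otherwise ask the processor.
    """
    if random.random() < explore_prob:
        return random.randrange(3)  # explore: uniform random action
    return processor.get_action(state)  # exploit: delegate to the processor
```

With `explore_prob=0.5` this reproduces the fifty-fifty split described above; dialling it down over time would be the usual way to shift from exploring to exploiting once a processor actually learns.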
In the code shown, HoldProcessor inherits from Processor. This is an abstract base class specifying that any subclass must implement a get_action method and a learn method. So as long as my processor classes that use DDQNs or Actor-Critic networks implement these methods, I should be able to swap this one in for one of those with no other changes. The beauty of coding to an interface!
```python
from abc import ABC, abstractmethod
import numpy as np


class Processor(ABC):
    @abstractmethod
    def get_action(self, state: np.ndarray) -> int:
        pass

    @abstractmethod
    def learn(self, batch: tuple):
        pass
```
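To illustrate the swap, here is a small self-contained sketch. It repeats the Processor and HoldProcessor definitions so it runs on its own, and adds a hypothetical RandomProcessor and a `run_step` helper (both my own illustrative names) to show that the calling code only ever depends on the Processor interface:

```python
import random
from abc import ABC, abstractmethod

import numpy as np


class Processor(ABC):
    @abstractmethod
    def get_action(self, state: np.ndarray) -> int:
        pass

    @abstractmethod
    def learn(self, batch: tuple):
        pass


class HoldProcessor(Processor):
    def get_action(self, state: np.ndarray) -> int:
        return 1  # always hold

    def learn(self, batch: tuple):
        pass


class RandomProcessor(Processor):
    """Hypothetical drop-in replacement, standing in for a DDQN/Actor-Critic one."""

    def get_action(self, state: np.ndarray) -> int:
        return random.randrange(3)  # 0 = sell, 1 = hold, 2 = buy

    def learn(self, batch: tuple):
        pass


def run_step(processor: Processor, state: np.ndarray) -> int:
    # The caller is written against the Processor interface only,
    # so any subclass can be swapped in with no other changes.
    return processor.get_action(state)


state = np.zeros(4)  # dummy state; the real state shape depends on the app
print(run_step(HoldProcessor(), state))    # 1
print(run_step(RandomProcessor(), state))  # 0, 1 or 2
```

Attempting `Processor()` directly raises a TypeError, which is exactly the guarantee the abstract base class gives: only subclasses that implement both methods can be instantiated.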