What am I Missing?

TL;DR: Feeling stupid today.

Not the first time I’ve asked myself this question, for sure. For the kind of algorithm I’m looking at now (policy gradient) a word keeps popping up: parameterized. The new policy is to be parameterized, by the weights of the network, apparently. But wait, the old policy had network weights too. Wasn’t that ‘parameterized’? It’s not so much the use of the word as how it differs from what came before. As a programmer I have an understanding of what parameters are in the programming context. Is it different in the mathematical context? I recall refreshing my memory of linear algebra a year or two ago, and that linear equations could be parameterized, though I don’t recall the specifics. But they seem to be making a big deal out of something, and I have no idea what.
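For what it’s worth, here is the reading of ‘parameterized’ that eventually clicked for me, as a sketch in plain Python (all the names, features, and numbers here are invented for illustration): the policy is a single function π(a | s; θ), and the weights θ are just an argument to it. The ‘old policy’ and the ‘new policy’ are the same function called with different values of θ, so ‘updating the policy’ means nothing more than updating θ.

```python
import math
import random

random.seed(0)

def policy(state, theta):
    """pi(a | s; theta): action probabilities as a function of the weights theta."""
    # One linear layer standing in for 'the network' -- purely illustrative.
    logits = [sum(s * w for s, w in zip(state, col)) for col in theta]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

state = [1.0, 0.5]                                  # a made-up 2-feature state
theta_old = [[random.gauss(0, 1) for _ in range(2)] for _ in range(3)]
theta_new = [[w + 0.1 for w in col] for col in theta_old]

# Same function, different parameter values -- that is all 'parameterized'
# seems to mean here: the policy IS the function pi(a|s; theta), and a
# policy-gradient step changes theta, not the function.
print(policy(state, theta_old))
print(policy(state, theta_new))
```

So in the programming sense the weights are arguments; in the mathematical sense they pick out one member of a whole family of policies, and the algorithm searches over that family.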

There seem to be some fine distinctions in this field of study that I’m not seeing. I guess for someone who understands it, those distinctions are not fine but broad. I’m trying to understand what the provided code is trying to achieve, without much success. The mathematical explanations don’t make much sense to me. For me the question ‘What are you trying to achieve?’ is the one I need answered before I can learn how that aim is to be realized. If the provided answer is too general, or expressed in a language I don’t understand very well, I get stuck.

The level of explanation I’m looking for is as follows, with regard to Q learning. From state A one can move to state B or state C. There’s an immediate reward for each, probably different. But Q learning also answers the question: once I’m at B or C, what can I achieve from there? So the value, or Q value, depends on what a state can lead to, and this look-ahead allows one, with sufficient exploration, to reach the final goal easily on later attempts at the task, because these ‘landmarks’ have been set up. Talking about the recursive nature of the Bellman equation is nowhere near as intuitive.
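That A/B/C picture can be written down directly as tabular Q learning. The toy MDP below is entirely made up to match the intuition: going to B pays a bigger immediate reward, but C is the better ‘landmark’ because of what it leads to, and the Q values end up reflecting that.

```python
import random

random.seed(0)

# A tiny invented MDP: from A you can go to B (immediate reward 2) or
# to C (immediate reward 0); B then reaches the goal with reward 0,
# while C reaches the goal with reward 10.
transitions = {
    ("A", "toB"): ("B", 2.0),
    ("A", "toC"): ("C", 0.0),
    ("B", "go"):  ("goal", 0.0),
    ("C", "go"):  ("goal", 10.0),
}
actions = {"A": ["toB", "toC"], "B": ["go"], "C": ["go"]}

Q = {sa: 0.0 for sa in transitions}
alpha, gamma = 0.5, 0.9

for _ in range(200):                        # repeated episodes of exploration
    state = "A"
    while state != "goal":
        a = random.choice(actions[state])   # explore every action
        nxt, r = transitions[(state, a)]
        future = 0.0 if nxt == "goal" else max(Q[(nxt, b)] for b in actions[nxt])
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[(state, a)] += alpha * (r + gamma * future - Q[(state, a)])
        state = nxt

# Q('A','toC') ends up above Q('A','toB'): the immediate reward for heading
# to B is larger, but the Q value looks through to what each state leads to.
for sa, q in sorted(Q.items()):
    print(sa, round(q, 2))
```

The update rule in the comment is the Bellman recursion in disguise: the value of acting at A is built out of the value of acting at B or C, which is exactly the ‘what can I achieve from there?’ question, answered numerically.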