Decisions, Decisions

I’m really bad at making decisions, and I try to avoid having to make them. With this ML project, however, there are plenty to be made. I intend to compare several different models for predicting the price of BTC. Time series models usually use lags: values from previous days are moved onto the same row as the value you are trying to predict. The problem is that every extra day of lag costs a row of data from the dataframe, because a k-day lag column is NaN for the first k rows. For example, the first row that can hold a value from 7 days ago is day 8 (row 8). So unless I decide up front how far back the lags will go in any model, each model will end up using different data.
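To make the row loss concrete, here is a minimal sketch in pandas. It uses made-up numbers rather than real BTC prices, and the column name `close`, the helper `add_lags` and the constant `MAX_LAG` are all assumptions for illustration, not anything from an actual project.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for daily BTC prices: 30 made-up values, not real data.
dates = pd.date_range("2021-01-01", periods=30, freq="D")
prices = pd.DataFrame({"close": np.arange(100.0, 130.0)}, index=dates)


def add_lags(df, max_lag, col="close"):
    """Return a copy of df with lag columns 1..max_lag, minus rows containing NaN."""
    out = df.copy()
    for k in range(1, max_lag + 1):
        # shift(k) moves the value from k days earlier onto the current row,
        # leaving NaN in the first k rows of the new column.
        out[f"{col}_lag{k}"] = out[col].shift(k)
    return out.dropna()


lag3 = add_lags(prices, 3)  # loses the first 3 rows
lag7 = add_lags(prices, 7)  # loses the first 7 rows

print(len(prices), len(lag3), len(lag7))  # 30 27 23
print(lag7.index[0].date())               # 2021-01-08, i.e. day 8

# One possible fix: choose the largest lag up front and start every
# dataset on the same day, so all models see the same rows.
MAX_LAG = 7
common_start = prices.index[MAX_LAG]
lag3_aligned = lag3.loc[common_start:]

print(lag3_aligned.index.equals(lag7.index))  # True
```

Trimming everything to the longest lag throws away a few usable rows for the shorter-lag models, but it means any difference in scores comes from the models rather than from the data they happened to see.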

Perhaps in a learning exercise this isn’t very important. Perhaps the most important thing is to be aware that the problem exists. No doubt a ‘final’ model will be the result of tuning hyperparameters on whichever model appears to be significantly better than all the others, so such fine-grained issues in the initial exploration are completely irrelevant. Probably. Still, I hate making decisions.