Baseline Models

While studying the WEKA application I encountered the concept of a baseline model, a very simple model that provides a prediction, to be used as a starting point to evaluate the effectiveness of other models. ZeroR seems to be the preferred baseline algorithm for supervised learning – it simply takes the average of all target values in the training data and predicts that for all test instances. For classification it finds the most populous class and predicts that for all test data.

Jason Brownlee argues that for time series forecasting it is better to use an algorithm that takes the sequential nature of time series into account. He proposes a persistence algorithm, which simply states that whatever happened yesterday will happen today. A bit like the weather. If you predict that today’s weather will be the same as yesterday’s weather you’ll be right more often than not.

Leave a Reply

Your email address will not be published. Required fields are marked *