Simple Linear Regression

RMSE = 0.02478

Using the same data as the persistence model (daily returns, with the previous day's return as the single input variable), I trained a scikit-learn LinearRegression model on the training set and made predictions on the test set. I then calculated the mean squared error and, from it, the root mean squared error (RMSE) shown above. That RMSE is lower than the persistence model's, so this is progress. The code is as follows:

import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error

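# Load daily closing prices, indexed by date
df = pd.read_csv('data/btc.csv', usecols=['date', 'close'], index_col='date', parse_dates=True)
df.sort_index(inplace=True)  # ensure chronological order before computing returns

# Daily percentage returns: column 't' is today's return, 't-1' is yesterday's
df_returns = df['close'].to_frame().pct_change()
df_returns.rename(columns={'close': 't'}, inplace=True)
df_returns.insert(0, 't-1', df_returns['t'].shift(1))

# The first row has no return and the second has no lagged return; drop both
df_returns.dropna(inplace=True)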

# scikit-learn expects X as a 2-D array of shape (n_samples, n_features)
X = df_returns['t-1'].to_numpy().reshape(-1, 1)
y = df_returns['t'].to_numpy()

# Chronological split: the first 700 rows train the model, the rest are the test set
test_limit = 700

X_train, X_test = X[:test_limit], X[test_limit:]
y_train, y_test = y[:test_limit], y[test_limit:]

# Ordinary least squares fit of today's return on yesterday's return
model = linear_model.LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

loss = mean_squared_error(y_test, y_pred)
print('MSE: ', loss)
print('RMSE:', np.sqrt(loss))
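The regression is judged against the persistence model, and that comparison is only fair if both are scored on exactly the same test rows. The sketch below assumes the persistence forecast for day t is simply the return on day t-1 (its code is not repeated in this post), and it uses synthetic normally distributed returns instead of data/btc.csv, so its numbers say nothing about Bitcoin. The helpers `rmse` and `compare_with_persistence` are hypothetical names for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def compare_with_persistence(returns: pd.Series, train_size: int):
    """Fit a one-lag linear regression and score it against a persistence
    forecast (assumed: prediction for day t = return on day t-1) on the
    same chronological test rows."""
    frame = pd.DataFrame({"t-1": returns.shift(1), "t": returns}).dropna()
    X = frame[["t-1"]].to_numpy()  # 2-D: (n_samples, 1 feature)
    y = frame["t"].to_numpy()

    X_train, X_test = X[:train_size], X[train_size:]
    y_train, y_test = y[:train_size], y[train_size:]

    model = LinearRegression().fit(X_train, y_train)
    lr_rmse = rmse(y_test, model.predict(X_test))

    # Persistence needs no training: the lagged value itself is the forecast.
    persistence_rmse = rmse(y_test, X_test[:, 0])
    return lr_rmse, persistence_rmse


# Synthetic stand-in for daily returns (NOT the BTC data from the post).
rng = np.random.default_rng(0)
fake_returns = pd.Series(rng.normal(0.0, 0.03, size=1000))
lr_rmse, persistence_rmse = compare_with_persistence(fake_returns, train_size=700)
print(f"linear regression RMSE: {lr_rmse:.5f}")
print(f"persistence RMSE:       {persistence_rmse:.5f}")
```

Because persistence needs no fitting, it can be evaluated on the identical test slice as the regression, so any difference in RMSE comes from the models rather than from the data split.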
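With a single input column, LinearRegression is ordinary least squares for a straight line: predicted return = intercept + slope × (previous day's return). The sketch below checks that scikit-learn's fitted `coef_` and `intercept_` match the closed-form formulas. It uses synthetic data with an arbitrarily chosen true slope of 0.1, which is an assumption for illustration and not an estimate from the BTC data. On the model above, printing `model.coef_` and `model.intercept_` shows how much the forecast actually relies on yesterday's return: a slope near zero means the model is mostly predicting the average training-set return.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic one-lag data: y depends weakly on x plus noise (illustrative only).
rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.03, size=700)
y = 0.001 + 0.1 * x + rng.normal(0.0, 0.03, size=700)

model = LinearRegression().fit(x.reshape(-1, 1), y)

# Closed-form ordinary least squares for a single feature:
#   slope = cov(x, y) / var(x),  intercept = mean(y) - slope * mean(x)
# bias=True and np.var's default ddof=0 both divide by N, so they are consistent.
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
intercept = y.mean() - slope * x.mean()

print("sklearn:     ", model.coef_[0], model.intercept_)
print("closed form: ", slope, intercept)
```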