ADA

Started a newish project, implementing the approach from my course using Cardano (ADA) as the asset. The focus at the moment is using the XGBoost ML algorithm to help determine whether to go long or short with a momentum strategy. I haven’t used XGBoost before, so something to learn I guess. If the strategy seems profitable, as all the strategies I’ve since lost money on once did, I might hazard a few digital dollars just to keep the interest going. So here’s the code to fetch the data (ADA-USD) from Yahoo Finance:

import yfinance as yf
import pandas as pd

ada_data = yf.download("ADA-USD", start="2017-01-01", end="2024-06-02")
ada_data.index = pd.to_datetime(ada_data.index)
print(ada_data)

And here’s the beginning and end of the data received:

                Open      High       Low     Close  Adj Close     Volume
Date                                                                    
2017-11-09  0.025160  0.035060  0.025006  0.032053   0.032053   18716200
2017-11-10  0.032219  0.033348  0.026451  0.027119   0.027119    6766780
2017-11-11  0.026891  0.029659  0.025684  0.027437   0.027437    5532220
2017-11-12  0.027480  0.027952  0.022591  0.023977   0.023977    7280250
2017-11-13  0.024364  0.026300  0.023495  0.025808   0.025808    4419440
...              ...       ...       ...       ...        ...        ...
2024-05-28  0.467963  0.468437  0.453115  0.456990   0.456990  418594476
2024-05-29  0.456990  0.463107  0.450914  0.450995   0.450995  350482630
2024-05-30  0.450995  0.454546  0.443807  0.446581   0.446581  356151973
2024-05-31  0.446581  0.454957  0.444461  0.447461   0.447461  290913148
2024-06-01  0.447461  0.452584  0.445254  0.449975   0.449975  167918462

[2397 rows x 6 columns]

I guess I didn’t need that Adjusted Close column, as crypto doesn’t really do dividends and splits the way equities do. The earliest date on Yahoo Finance seems to be 2017-11-09. I wonder if that really was the date Cardano went live, or whether there just wasn’t a USD trading pair available before then.
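Dropping the redundant column is a one-liner. A minimal sketch using a hypothetical stand-in frame (the real ada_data from yf.download would work the same way):

```python
import pandas as pd

# hypothetical stand-in for the ada_data frame returned by yf.download
df = pd.DataFrame({
    "Open": [0.025160], "High": [0.035060], "Low": [0.025006],
    "Close": [0.032053], "Adj Close": [0.032053], "Volume": [18716200],
})

# for crypto, Adj Close just duplicates Close, so drop it
df = df.drop(columns=["Adj Close"])
print(list(df.columns))
```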

Next up: Target

Not So Neat

I’m editing a Jupyter Notebook that came with my latest course to use BTC/USD instead of SPY (the S&P 500 ETF). Not only is this more relevant to my interests, but it also forces me to examine the code closely. Still, the screen can get pretty crowded; good thing I’ve got a 4K monitor. I discovered that I can get BTC price data going back 10 years through one call to yfinance (the Yahoo Finance API). Only daily data, though.

I need to be mindful of differences between crypto trading and equities trading. For instance, a stock exchange typically trades about 252 days per year (Monday to Friday, excluding public holidays), whereas a crypto exchange trades every day of the year (365.25 on average, counting leap years). This matters when annualizing metrics such as the Sharpe Ratio; in the code shown it’s reflected in the rolling one-year return value.
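The annualization difference can be sketched like this, with a made-up daily-returns series (the ~365 periods-per-year figure is the point; everything else here is placeholder data):

```python
import numpy as np
import pandas as pd

# made-up daily returns for a crypto asset, which trades every calendar day
rng = np.random.default_rng(0)
daily_returns = pd.Series(rng.normal(0.001, 0.02, 365))

# crypto: annualise with ~365 periods; an equity series would use 252
periods_per_year = 365
sharpe = daily_returns.mean() / daily_returns.std() * np.sqrt(periods_per_year)
print(round(sharpe, 2))
```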

Backtest 01

This code backtests a long-only strategy using the lower Bollinger Band for entry and the mean for exit. It follows the value of one unit of capital over multiple positions; no consideration is given to position sizing, it’s just all-in. Entries and exits are made on the close price, so the position column is shifted by one day so that a position applies to the following day’s prices. The lookback period (for the rolling mean and standard deviation windows) and the threshold (the number of standard deviations below the mean for entry) are set at the top of the script.

''' 
    Backtest of long only strategy using Bollinger Band(s).
    Follows subsequent value of 1 unit of capital in opening position.
'''
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

lookback = 20
threshold = 1.0

# Read data and calculate statistics/indicators

df = pd.read_csv('data/ADA.csv', header=0, index_col=0, parse_dates=True)
df['mean'] = df['close'].rolling(lookback).mean()
df['stdev'] = df['close'].rolling(lookback).std()
df['bband'] = df['mean'] - threshold * df['stdev']

# Position - calculated on day's close so applies to following days

df['entry'] = df['close'] < df['bband']   # condition - T/F
df['exit'] = df['close'] >= df['mean']    # condition - T/F

df['position'] = np.nan   # create and initialise the column

df.loc[df['entry'], 'position'] = 1   # set value according to condition
df.loc[df['exit'], 'position'] = 0    # set value according to condition

df['position'] = df['position'].ffill().fillna(0)   # carry position forward; flat before first entry

# Returns

df['diff'] = df['close'] - df['close'].shift(1)
df['daily_returns'] = df['diff'] / df['close'].shift(1)

# position was entered at END of previous day
df['strategy_returns'] = df['position'].shift(1) * df['daily_returns']

df['cumret'] = (df['strategy_returns'] + 1).cumprod()

# Plot

df.cumret.plot(label='ADAUSDT', figsize=(12, 6))
plt.xlabel('Date')
plt.ylabel('Cumulative Returns')
plt.legend()
plt.show()
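Once the cumret column exists it’s easy to pull a few summary numbers out of it. A sketch on a synthetic strategy_returns series (the names match the script above, but the data here is random stand-in values):

```python
import numpy as np
import pandas as pd

# synthetic stand-in for the script's strategy_returns column
rng = np.random.default_rng(1)
strategy_returns = pd.Series(rng.normal(0.0005, 0.015, 500))

cumret = (strategy_returns + 1).cumprod()
total_return = cumret.iloc[-1] - 1
# 365 because crypto trades every calendar day
sharpe = strategy_returns.mean() / strategy_returns.std() * np.sqrt(365)
# drawdown: how far cumret sits below its running peak
max_drawdown = (cumret / cumret.cummax() - 1).min()
print(f"total return {total_return:.2%}, Sharpe {sharpe:.2f}, max drawdown {max_drawdown:.2%}")
```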

Here’s a slightly different version using BBANDS from TA-Lib:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import talib.abstract as ta

lookback = 20
threshold = 1.0

# Read data and calculate statistics/indicators

df = pd.read_csv('data/ADA.csv', header=0, index_col=0, parse_dates=True)
bb = ta.BBANDS(df, timeperiod=lookback, nbdevdn=threshold)
df['mean'] = bb.middleband
df['bband'] = bb.lowerband

# Position - calculated on day's close so applies to following days

df['entry'] = df['close'] < df['bband']   # condition - T/F
df['exit'] = df['close'] >= df['mean']    # condition - T/F

df['position'] = np.nan   # create and initialise the column

df.loc[df['entry'], 'position'] = 1   # set value according to condition
df.loc[df['exit'], 'position'] = 0    # set value according to condition

df['position'] = df['position'].ffill().fillna(0)   # carry position forward; flat before first entry

# Returns

df['diff'] = df['close'] - df['close'].shift(1)
df['daily_returns'] = df['diff'] / df['close'].shift(1)

# position was entered at END of previous day
df['strategy_returns'] = df['position'].shift(1) * df['daily_returns']

df['cumret'] = (df['strategy_returns'] + 1).cumprod()

# Plot

df.cumret.plot(label='ADAUSDT', figsize=(12, 6))
plt.xlabel('Date')
plt.ylabel('Cumulative Returns')
plt.legend()
plt.show()

The Technical Analysis (TA) Library

TA-Lib is a library of fast technical analysis indicator functions, and talib is its Python wrapper. I was having some trouble with the docs, but then found a course I did a couple of years ago that used this library. Quite easy really. Here I’ve added a MACD indicator to a time series.

import talib.abstract as ta
import pandas as pd

df = pd.read_csv('data/ADA.csv', index_col=0, parse_dates=True)
macd = ta.MACD(df)

# Information about the class/function
print(ta.MACD)

and here’s the resultant DataFrame (part of it, anyway)

date,macd,macdsignal,macdhist
2021-03-21,0.06283678579937213,0.06975973998121408,-0.006922954181841953
2021-03-22,0.051024886860298935,0.06601276935703106,-0.014987882496732122
2021-03-23,0.04237687616033803,0.06128559071769245,-0.01890871455735442
2021-03-24,0.03133963841922505,0.055296400257998965,-0.02395676183877392
2021-03-25,0.024455424936367764,0.049128205193672725,-0.02467278025730496
2021-03-26,0.02824408474397111,0.0449513811037324,-0.016707296359761294
2021-03-27,0.027905466818372693,0.04154219824666046,-0.013636731428287766
2021-03-28,0.028775604149389844,0.03898887942720634,-0.010213275277816493

So we get the macd value, the macdsignal value and the macdhist value as a DataFrame. Nice.
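A natural next step is a crossover signal from those columns. The sketch below builds MACD from first principles with pandas EWMs (the standard 12/26/9 definition, which TA-Lib also defaults to) on synthetic prices, so it runs even without talib installed:

```python
import numpy as np
import pandas as pd

# synthetic close prices standing in for the ADA series
rng = np.random.default_rng(2)
close = pd.Series(100 + rng.normal(0, 1, 300).cumsum())

# MACD = EMA(12) - EMA(26); signal = EMA(9) of MACD; hist = the difference
ema12 = close.ewm(span=12, adjust=False).mean()
ema26 = close.ewm(span=26, adjust=False).mean()
macd = ema12 - ema26
macdsignal = macd.ewm(span=9, adjust=False).mean()
macdhist = macd - macdsignal

# long (1) while MACD is above its signal line, flat (0) otherwise
signal = (macd > macdsignal).astype(int)
print(signal.value_counts())
```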

The website can be found here. Usage is described in another post.

Stationarity

Testing for stationarity of a time series is important for certain trading strategies, particularly mean-reversion strategies. Stationarity means that the statistical properties of the series, such as its mean and variance, don’t change over time. There are various Python libraries that provide the necessary functionality. The example below demonstrates the use of the Augmented Dickey-Fuller (ADF) test from the statsmodels package on a year’s worth of daily data for the ADABTC trading pair.

import pandas as pd
import numpy as np
import requests
import json
from datetime import datetime
from statsmodels.tsa.stattools import adfuller

root_url = 'https://api.binance.com/api/v1/klines'
symbol = 'ADABTC'
interval = '1d'
limit = '360'

url = root_url + '?symbol=' + symbol + '&interval=' + interval + '&limit=' + limit
data = json.loads(requests.get(url).text)
df = pd.DataFrame(data)
df.columns = ['date', 'open', 'high', 'low', 'close', 'v', 'close_time', 'qav',
              'num_trades', 'taker_base_vol', 'taker_quote_vol', 'ignore']

df['date'] = [datetime.utcfromtimestamp(x/1000.0).date() for x in df.date]
df = df.set_index('date')

df['close'] = df['close'].astype(float)

# extracting only the close prices using values attribute of the DataFrame/Series
values = df['close'].values

# passing the extracted close prices to adfuller function.
res = adfuller(values)

# Printing the statistical result of the adfuller test
print('p-value: %f \n' % res[1])
print('Augmented Dickey-Fuller Statistic: %f \n' % res[0])

# printing the critical values at different alpha levels.
print('critical values at different levels:')
for k, v in res[4].items():
	print('\t%s: %.3f' % (k, v))

The generally accepted threshold is the 95% confidence level, in this case a p-value below 0.05. The null hypothesis is that the series is not stationary (has a unit root), so a small p-value is evidence of stationarity.

Append Data Directly to an Existing File

There are several ways to update a CSV file with new data. This method shows how to write the new data directly to file.

''' Append data directly to an existing CSV file '''
from datetime import datetime, timezone
import requests        # for making http requests to binance
import json            # for parsing what binance sends back to us
import pandas as pd    # for storing and manipulating the data we get back


root_url = 'https://api.binance.com/api/v1/klines'

coin = 'BTC'
symbol = coin + 'USDT'

interval = '1d'
limit = '1000'

date = datetime(2020, 9, 27, 0, 0, 0, tzinfo=timezone.utc)  
startDate = int(datetime.timestamp(date)) * 1000
startTime = str(startDate)
    
url = root_url + '?symbol=' + symbol + '&interval=' + interval + '&startTime=' + startTime + '&limit=' + limit
data = json.loads(requests.get(url).text)
df = pd.DataFrame(data)
df.columns = ['date', 'open', 'high', 'low', 'close', 'v', 'close_time', 'qav',
                'num_trades', 'taker_base_vol', 'taker_quote_vol', 'ignore']

df['date'] = [datetime.utcfromtimestamp(x/1000.0) for x in df.date]
   
df = df.set_index('date')

with open('data/BTCUSDT.csv', 'a') as f:
    for i, row in df.iterrows():
        date = i.strftime("%Y-%m-%d")
        f.write(date + ',' + row['close'] + '\n')
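Another of those “several ways”: let pandas do the appending with to_csv in append mode. A sketch with hypothetical rows and a temp-file path:

```python
import os
import tempfile

import pandas as pd

# hypothetical new rows, shaped like the date/close lines written above
new_rows = pd.DataFrame(
    {"close": ["10750.1", "10780.3"]},
    index=["2020-09-27", "2020-09-28"],
)

path = os.path.join(tempfile.gettempdir(), "BTCUSDT_demo.csv")
# mode='a' appends (creating the file if needed); header=False avoids repeating the header row
new_rows.to_csv(path, mode="a", header=False)
```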

Getting kline data from Binance REST API

The following code can be used to get kline data from Binance.

from datetime import datetime, timezone
import requests        # for making http requests to binance
import json            # for parsing what binance sends back to us
import pandas as pd    # for storing and manipulating the data we get back


root_url = 'https://api.binance.com/api/v1/klines'

coin = 'BTC'
symbol = coin + 'USDT'

interval = '1d'
limit = '1000'

date = datetime(2020, 9, 3, 0, 0, 0, tzinfo=timezone.utc)  
startDate = int(datetime.timestamp(date)) * 1000
startTime = str(startDate)
    
url = root_url + '?symbol=' + symbol + '&interval=' + interval + '&startTime=' + startTime + '&limit=' + limit
data = json.loads(requests.get(url).text)
df = pd.DataFrame(data)
df.columns = ['date', 'open', 'high', 'low', 'close', 'v', 'close_time', 'qav',
                'num_trades', 'taker_base_vol', 'taker_quote_vol', 'ignore']

df['date'] = [datetime.utcfromtimestamp(x/1000.0) for x in df.date]
   
df = df.set_index('date')

In the root URL, data can replace api. Apparently v1, v2, v3 and v4 all function the same. If no start date is provided, the most recent entries are returned. The default limit is 500; 1000 is the maximum for a single request. Consult the API docs for the available interval frequencies.
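Rather than concatenating the query string by hand, requests can build it from a params dict. A sketch that prepares the request without sending it, just to show the resulting URL:

```python
import requests

root_url = 'https://api.binance.com/api/v3/klines'
params = {'symbol': 'BTCUSDT', 'interval': '1d', 'limit': 1000}

# prepare (but don't send) the request, to inspect the final URL
req = requests.Request('GET', root_url, params=params).prepare()
print(req.url)
```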