In this first chapter, we will take the very first steps towards becoming a code-driven, quantitative investor. This means that we will choose a financial asset that we want to invest in, we will download historical price data for this asset, and then learn to work with that data in Python.
At the end of this chapter, you will know what adjusted close prices are, how to calculate the past profitability and the risk of individual financial assets, and how to compare them based on these metrics.
Throughout this chapter, we will use so-called EOD ("end-of-day") price data of stocks, stock indices, exchange-traded funds (ETFs), and commodities. The EOD price reflects the price of an asset at the end of the trading hours on that day. Note that EOD prices from different exchanges may have different timestamps because some markets close earlier than others. Most vendors of daily price data supply not only the close price of the trading session but several values: the open, high, low, and close prices, and the traded volume.
Taken together, this type of data is also called OHLCV data, for the five different values contained in the data set. There are many vendors of OHLCV data on stocks; some are free, some are quite expensive. In our code examples, we will use Yahoo Finance as our data source. It provides free OHLCV data on a large universe of international stocks and other assets. The data quality is mostly good, but there are some glitches that we need to watch out for, especially in the data of smaller, lesser-known companies. Note: The Yahoo Finance API is intended for personal use only. When you run the code snippets below on your own PC, you will download the data from Yahoo Finance, but if you run the code here on this site, the function will actually load a file from our servers, not from Yahoo Finance.
In Python, we can use the library yfinance to download price data from Yahoo Finance. As a starting point, we want to take a closer look at the price data of SPY, an exchange-traded fund that tracks the S&P500 index. The index is a weighted average price of the 500 largest US companies (measured and weighted by the companies' market capitalization).
Buying shares of the ETF thus allows you to invest in a broad set of 500 companies without trading all the individual stocks (which would be cumbersome and costly, due to minimum transaction costs that most brokers charge). The following Python snippet loads SPY price data:
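A minimal sketch of such a download, assuming yfinance's `download` function. The helper name `load_prices` is ours, and `auto_adjust=False` is set on the assumption that the installed yfinance version would otherwise adjust prices automatically and drop the separate Adj Close column. The exact column layout may also differ between yfinance versions.

```python
def load_prices(ticker: str):
    """Download the full daily OHLCV history of one ticker from Yahoo Finance."""
    # Imported inside the function so the sketch can be loaded without network access.
    import yfinance as yf

    # period="max" requests the entire available history.
    # auto_adjust=False keeps the raw prices plus a separate "Adj Close" column.
    return yf.download(ticker, period="max", auto_adjust=False)
```

Calling `load_prices("SPY")` returns the price history as a DataFrame indexed by date.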
                  Open        High  ...   Adj Close    Volume
Date                                ...
1993-01-29   43.968750   43.968750  ...   24.684097   1003200
1993-02-01   43.968750   44.250000  ...   24.859665    480500
1993-02-02   44.218750   44.375000  ...   24.912329    201300
1993-02-03   44.406250   44.843750  ...   25.175673    529400
1993-02-04   44.968750   45.093750  ...   25.281013    531500
...                ...         ...  ...         ...       ...
2024-07-25  541.349976  547.460022  ...  538.409973  61158300
2024-07-26  542.280029  547.190002  ...  544.440002  53763800
2024-07-29  546.020020  547.049988  ...  544.760010  39515800
2024-07-30  546.260010  547.340027  ...  542.000000  46487100
2024-07-31  548.979980  551.710022  ...  550.799988  16644479

[7932 rows x 6 columns]
The data is stored as a Pandas DataFrame with the following columns:
Index(['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'], dtype='object')
Apart from the OHLCV columns that we expected, we also see a column called Adj Close, short for adjusted close price. This price corrects for two effects: dividend payments and stock splits.
In the case of SPY, the gap between raw prices and adjusted prices shrinks gradually over time, and we do not see any abrupt changes. This indicates that SPY regularly pays dividends but has never been split. Note that the high and low prices are not adjusted by default! If we want to use those prices for statistical analysis, we have to keep in mind that they contain gaps, but we may use the ratio between the adjusted close price and the raw close price to correct the other columns as well.
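The ratio-based correction can be sketched as follows. The prices are made up for illustration (a 2-for-1 split takes effect on the third day), and the variable names are ours. The sketch assumes the column names shown in the output above.

```python
import pandas as pd

# Made-up prices: a 2-for-1 split takes effect on the third day.
raw = pd.DataFrame(
    {
        "Open": [100.0, 102.0, 52.0],
        "High": [104.0, 106.0, 54.0],
        "Low": [98.0, 100.0, 50.0],
        "Close": [102.0, 104.0, 53.0],
        "Adj Close": [51.0, 52.0, 53.0],
    },
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

# Per-day adjustment factor implied by the two close columns.
factor = raw["Adj Close"] / raw["Close"]

# Scale the remaining price columns by the same factor, row by row.
adjusted = raw[["Open", "High", "Low"]].mul(factor, axis=0)
adjusted["Close"] = raw["Adj Close"]
```

After the correction, the artificial price jump caused by the split disappears from all four price columns.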
We now know how to load price data, how it is formatted, and which price to use for analysis. With these basic insights, we can finally start assessing the performance of different financial assets!
Whenever we think about investing money in a certain asset, for example a stock index (via the ETF SPY), gold (via the ETF GLD), government bonds (via the ETF TLT), or a single stock like Apple (AAPL), we immediately want to look at the chart of historical prices of the asset to obtain a first impression of its past profitability and the risk inherent to the asset.
One way of comparing the past profitability of different assets is to ask "What would I have today if I invested $1 in this asset 10 years ago?". 10 years of course is an arbitrary choice here and you should adjust it to your own investment horizon, i.e. how long you want to leave your money in the investment before selling it again. Let us load price data for the different assets mentioned above:
Adj Close ... Volume
AAPL GLD ... SPY TLT
Date ...
1980-12-12 0.099058 NaN ... NaN NaN
1980-12-15 0.093890 NaN ... NaN NaN
1980-12-16 0.086999 NaN ... NaN NaN
1980-12-17 0.089152 NaN ... NaN NaN
1980-12-18 0.091737 NaN ... NaN NaN
... ... ... ... ... ...
2024-07-25 217.490005 218.330002 ... 61158300.0 44999100.0
2024-07-26 217.960007 220.630005 ... 53763800.0 34423800.0
2024-07-29 218.240005 220.320007 ... 39515800.0 25801300.0
2024-07-30 218.800003 222.520004 ... 46487100.0 28133500.0
2024-07-31 222.539993 224.009995 ... 16644479.0 15078120.0
[10999 rows x 24 columns]
As you can see, the dataframe we get now has a two-level structure: first, we can specify which column we want to look at (e.g. close price, or volume), and then we can either specify a single asset (e.g. SPY), or we take the whole sub-dataframe with all assets. Below, we select the adjusted close prices of all assets:
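The two-level selection can be illustrated on a small synthetic dataframe. This sketch assumes the layout printed above, with the price field on the first column level and the ticker on the second. The numbers are placeholders.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a multi-ticker download: two fields, two tickers.
columns = pd.MultiIndex.from_product([["Adj Close", "Volume"], ["AAPL", "SPY"]])
data = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4),
    index=pd.to_datetime(["2024-07-29", "2024-07-30", "2024-07-31"]),
    columns=columns,
)

# First level only: a sub-dataframe with one column per ticker.
adj_close = data["Adj Close"]

# Both levels: a single Series for one field of one ticker.
spy_volume = data[("Volume", "SPY")]
```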
                  AAPL         GLD         SPY        TLT
Date
1980-12-12    0.099058         NaN         NaN        NaN
1980-12-15    0.093890         NaN         NaN        NaN
1980-12-16    0.086999         NaN         NaN        NaN
1980-12-17    0.089152         NaN         NaN        NaN
1980-12-18    0.091737         NaN         NaN        NaN
...                ...         ...         ...        ...
2024-07-25  217.490005  218.330002  538.409973  92.269997
2024-07-26  217.960007  220.630005  544.440002  92.989998
2024-07-29  218.240005  220.320007  544.760010  93.489998
2024-07-30  218.800003  222.520004  542.000000  93.849998
2024-07-31  222.539993  224.009995  550.799988  94.690002

[10999 rows x 4 columns]
You may notice that some values in this dataframe are set to NaN (short for "not a number"). NaN indicates missing values, and we will see this value quite often, as not all assets were available to trade from the same point in time. Let's carry out our first estimation of past profitability by computing the fate of $1 invested in each asset 10 years ago:
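One way to carry out this calculation is sketched below on synthetic prices. The assets FAST and SLOW are made up, and taking the window from ten calendar years before the last available date is our assumption about how to define "10 years ago".

```python
import numpy as np
import pandas as pd

# Synthetic daily prices growing at constant rates of 25% and 5% per year.
dates = pd.bdate_range("2010-01-01", "2024-07-31")
years = np.asarray((dates - dates[0]).days) / 365.25
prices = pd.DataFrame(
    {"FAST": 10.0 * 1.25 ** years, "SLOW": 50.0 * 1.05 ** years},
    index=dates,
)

# Keep only the last ten calendar years of data.
start = prices.index[-1] - pd.DateOffset(years=10)
window = prices.loc[start:]

# Value today of $1 invested at the start of the window.
growth = window.iloc[-1] / window.iloc[0]
```

If an asset did not yet trade at the start of the window, its first price is NaN and so is the resulting growth factor.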
AAPL    10.526767
GLD      1.815463
SPY      3.408017
TLT      1.061935
dtype: float64
The results vary quite a bit: while an investment in Apple ten years ago would have increased more than tenfold and an investment in the S&P500 stock index more than threefold, gold has gained only about 80% in value, and the long-term government bond ETF only a few percent. Now you may object that the 10-year lookback period was chosen arbitrarily, and indeed, the performance of different assets may change significantly over longer periods, for example due to macroeconomic changes (rising inflation), management changes in companies (Steve Jobs leaving Apple), or investor sentiment (viewing gold as a "safe haven" during economically unstable times).
To obtain a better picture of the all-time performance of the ETFs and the Apple stock, we compute the average annualized return. We do this by first computing the average daily relative price change of the assets, and then multiplying this daily relative change by the number of trading days per year (about 252 on average, since most exchanges are closed on weekends and holidays):
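A minimal sketch of this calculation on synthetic prices. The assets STEADY and CHOPPY and their daily returns are made up. With pandas, the daily relative change is available through `pct_change`.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # approximate number of trading days per year

n_days = 504
steady = np.full(n_days, 0.0004)                  # +0.04% every day
choppy = np.tile([0.0110, -0.0100], n_days // 2)  # alternating up and down days

prices = pd.DataFrame({
    "STEADY": 100 * np.cumprod(1 + steady),
    "CHOPPY": 100 * np.cumprod(1 + choppy),
})

# Daily relative price change; the first row is NaN and is skipped by mean().
daily_returns = prices.pct_change()

# Average daily return scaled up to one year.
annualized_return = daily_returns.mean() * TRADING_DAYS
```

Multiplying `annualized_return` by 100 expresses the result in percentage points.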
AAPL    27.656885
GLD      9.778839
SPY     11.610749
TLT      5.076016
dtype: float64
Looking at the annualized returns (times 100 to show percentage points), we see a similar picture as before, with Apple leading the race with an impressive average annual return of 27% since the start of our data series in 1980! Note that using ETFs such as SPY to estimate the long-term performance of the underlying stock index S&P500 is sub-optimal, as the SPY ETF only began trading in 1993 whereas the S&P500 index has existed since 1957. For comparison, let us load data from the index itself, and compare the annualized return:
^GSPC    9.039335
dtype: float64
As we can see, using only the data of the SPY ETF since 1993 actually overestimates the long-term performance of the S&P500, with 11.6% vs. 9.0%. Part of this gap also reflects dividends, which the adjusted close price of SPY includes and the ^GSPC price index does not. This example highlights the importance of good price data that spans multiple periods of varying macroeconomic conditions to obtain a realistic performance estimate that will hold in the future.
But wait, if Apple "only" makes 27% per year, how come our initial calculation of the 10-year performance stated that we increased our bankroll roughly 10-fold? The annualized return that we calculated above does not account for compounding, i.e. the 27% annualized return is what you get if you withdraw all your profits after each year (by selling part of your investment). Long-term investors, however, usually do not withdraw profits for extended periods of time, letting profits accumulate. We can simulate this process of compounding by the following calculation: Assuming an annual return of 27%, one dollar invested will be $1.27 after one year, and when reinvested for another year, will yield $(1.27 * 1.27), then $(1.27 * 1.27 * 1.27), and so on. After 10 years, we get:
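The repeated reinvestment can be written out as a short loop. The variable names are ours. For contrast, the sketch also computes what one dollar grows to when the 27% profit is withdrawn every year instead of being reinvested.

```python
annual_return = 0.27
years = 10

# Reinvest everything each year: the value is multiplied by 1.27 ten times.
compounded = 1.0
for _ in range(years):
    compounded *= 1 + annual_return

# Withdraw the profit each year: only the original dollar keeps earning 27%.
without_compounding = 1 + annual_return * years

print(compounded)
```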
10.915338530733937
This nicely approximates the 10-fold increase that we estimated using only the data from the last ten years. When choosing between investments, the annualized return may not be the perfect metric to quantify past performance as it does not account for compounding! In Chapter 2, we will discuss the effects of compounding in detail (including expected best and worst outcomes of investments), and introduce the Compound Annual Growth Rate (CAGR) as a metric that properly accounts for the effect of compounding.
In the examples above, we have seen that one can significantly increase one's wealth by just buying and holding certain assets over a long period of time. But why is that? Why do stocks (and other assets) systematically gain value over time? One way of approaching this question is to think about the risk that one takes by buying and holding these assets: Apple may have failed early on if competitors had driven it out of the market, even broad stock indices may lose a significant fraction of their value during financial crises, governments may default on their debt, and commodities such as gold are susceptible to supply and demand shocks. The returns we get for holding these assets may thus be seen as compensation for exposing ourselves to the risks that these assets bear!
Now you may counter that you do not see yourself exposed to any real risk since you always think about the long-term aspect of investments, and that any drawdown can simply be waited out, and in the end the value will go up again (because on average, you are rewarded for taking risk). Crypto enthusiasts may know this argument by the term "just HODL". While this argument may hold some truth for an idealized investor who holds assets for an infinite amount of time, it is not a particularly practical argument. At some point, you will need to withdraw a certain fraction of your investment; otherwise, what is the purpose of the investment? If at that point the value of your asset has suffered a significant loss, you are actually punished for taking the risk, not rewarded. Likewise, some investments may end in ruin, i.e. they will lose all their value (companies going bankrupt, governments defaulting on debt). Since there is no recovery from ruin, there is no chance of being compensated for the risk you took by investing in a company once it has gone bankrupt. As a non-idealized investor who lives in the real world, you need to account for this risk, for example by not putting all your eggs in one basket: by diversifying across a number of assets, you avoid ending in ruin yourself if a single investment does so.
If just holding assets indefinitely is not an option (and also does not make them riskless), we need a way to quantify the risk that we expect to be compensated for, such that we can choose how much risk we are willing to take for a certain expected compensation. A common metric to measure the risk of holding an asset is the volatility of the asset. Volatility is most commonly calculated as the standard deviation of daily returns, and then annualized. Note that annualizing volatility is done by multiplying the daily volatility by the square root of the number of trading days in a year, not by the number of trading days itself (as we have done for the daily returns). In Chapter 2 we will discuss the underlying mathematical logic behind this scaling in more detail, but for now, we go on and compute the annualized volatility of our four example assets:
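A minimal sketch of the volatility calculation, using randomly generated daily returns for two made-up assets (CALM and WILD) in place of real price data:

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # approximate number of trading days per year

# Synthetic daily returns: same average, different daily standard deviation.
rng = np.random.default_rng(seed=0)
daily_returns = pd.DataFrame({
    "CALM": rng.normal(0.0004, 0.01, size=2520),
    "WILD": rng.normal(0.0004, 0.03, size=2520),
})

# Daily standard deviation, scaled by the square root of the trading days.
annualized_volatility = daily_returns.std() * np.sqrt(TRADING_DAYS)
```

A daily standard deviation of 1% corresponds to an annualized volatility of roughly 16%, since the square root of 252 is about 15.9.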
AAPL    44.275672
GLD     17.571917
SPY     18.659282
TLT     14.479753
dtype: float64
The output annualized_volatility*100 can be interpreted as follows: if SPY's annualized volatility is 18%, then you should expect the price to fluctuate by roughly 18% over the course of one year. It thus gives you an idea of what ups and downs you should expect when holding a position for a year. Note that volatility is a measure of average fluctuations, not of worst-case fluctuations, so you should be able to bear seeing 18% fluctuations regularly, and still expect more drastic events to happen, albeit less frequently. These fluctuations overlay the positive long-term trend and will, from time to time, probably make you question whether your investment was the correct choice (when you see a downward fluctuation) or make you overconfident in your investment decision (when you experience an upward fluctuation).
In general, we see that the annualized volatility follows our intuition: the lowest value belongs to the government bonds, which are often called a safe haven despite also experiencing large price fluctuations under certain market conditions (especially during times of rapid rate changes by central banks). Another supposed safe haven, gold, actually is about as risky as a broad stock index. Investors may still benefit from having a position in gold, as its price movements may counteract stock crashes (as we will analyze in more detail in Chapter 3). The highest volatility value of 44% belongs to Apple, our only single stock in the group of assets. But why does investing in Apple expose an investor to more risk than investing in the S&P500 index? It's all stocks after all, isn't it? By buying a single stock, we expose ourselves to idiosyncratic risks regarding Apple, i.e. failed product launches, missed earnings estimates, changes in management, etc. In contrast, when investing in the whole S&P500, we diversify this risk, as it's unlikely that all 500 companies will miss earnings estimates or launch a bad product at the same time. Or to put it differently: By investing in 500 companies, we remove the risk of picking a single bad investment, at the cost of losing the excess returns of a single good investment. Stock picking, the art of choosing only highly profitable companies to invest in, is harder than most people believe, and only a few have mastered it in the long term (including Warren Buffett).
An important fact about volatility is that it can be highly dynamic and change over time. During market crashes we usually see volatility spikes, because daily price movements are larger than average as more market participants act on the market as they interpret new information that becomes available. If we plot the rolling volatility of Apple using only the past 20 consecutive trading days to estimate the volatility at any point in time, we get the following picture:
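A rolling estimate of this kind can be sketched with pandas' `rolling` method. The return series below is synthetic, with a calm first half and a turbulent second half. The 20-day window follows the text, and all names are ours.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252
WINDOW = 20  # number of consecutive trading days per estimate

# Synthetic daily returns: 250 calm days followed by 250 turbulent days.
rng = np.random.default_rng(seed=1)
returns = pd.Series(
    np.concatenate([rng.normal(0, 0.01, 250), rng.normal(0, 0.04, 250)]),
    index=pd.bdate_range("2020-01-01", periods=500),
)

# Standard deviation over the trailing window, annualized as before.
rolling_volatility = returns.rolling(WINDOW).std() * np.sqrt(TRADING_DAYS)

calm_level = rolling_volatility.iloc[WINDOW - 1:250].mean()
turbulent_level = rolling_volatility.iloc[300:].mean()
```

The first 19 values are NaN because a full window of 20 returns is not yet available. A shorter window reacts faster to changes but yields noisier estimates.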
As we can see, the rolling annualized volatility spikes several times, reaching up to 200% (during the burst of the dot-com bubble in 2001)! Apart from the individual large spikes, there is also a more gradual change in the level of the volatility. But here we are faced with another problem: how many days should we base our estimates of volatility on to capture fast changes but also get reliable estimates of volatility? In Chapters 2 and 4 we will explore state-of-the-art methods to detect systematic changes in volatility (often called regime changes), but also show the pitfalls of common models used to predict volatility changes, e.g. GARCH models.
Coming back to the average volatility of 44% for Apple, we may ask whether this value is high enough to avoid investing in it? It has an astonishing annualized return of 27% to compensate us for that risk, right? Let us plot the annualized return of our four assets over their annualized risk to get a better overview:
In general, our intuition is confirmed: the riskier an asset, the higher the expected return. This makes sense since an investor wants to be properly compensated for the risk they are taking. But while the stock index, the government bond and gold lie almost perfectly on a line, Apple seems to lie below an imagined line through the other three assets—indicating that it pays less profit per unit of risk taken. How can we quantify a risk-adjusted metric of return? This takes us to the so-called Sharpe ratio:
$$S = \frac{\mu - r_f}{\sigma}$$

In the formula above, $S$ denotes the Sharpe ratio, $\mu$ denotes the expected annualized return of our asset, and $r_f$ denotes the risk-free rate of return (the annualized return that you can achieve without taking any considerable risk, e.g. by buying very short-dated government bonds or depositing money in an instant access savings account). $\sigma$ denotes the annualized volatility of our asset. The numerator of the formula is also called the excess return of the asset, i.e. how much more profit you will make compared to a risk-free investment. For simplicity, we will set the risk-free rate to zero in the following analysis (assuming you leave money that you do not invest in a bank account that does not pay any interest). However, in Chapter 4 we will do a more detailed analysis and incorporate historical data on risk-free rates for a more accurate calculation of the Sharpe ratio.

Readers from the field of physics or engineering may recognize the Sharpe ratio (with a zero risk-free rate) as a kind of signal-to-noise ratio, where the expected return is our signal, and the volatility takes the role of the noise. In the long run, most assets, including broad stock indices, will exhibit a Sharpe ratio smaller than one. Long-term Sharpe ratios larger than 1 or even 2 are achieved by some hedge funds, or by trading strategies that target illiquid markets and cannot be scaled to a lot of capital. With this coarse overview in mind, what Sharpe ratios do our four assets achieve?
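The Sharpe ratio combines the two quantities computed earlier. Below is a minimal sketch on a synthetic return series. The function name `sharpe_ratio` and its `risk_free_rate` parameter are ours, and the default of zero matches the simplification made above.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # approximate number of trading days per year


def sharpe_ratio(daily_returns, risk_free_rate=0.0):
    """Annualized Sharpe ratio computed from daily relative price changes."""
    annualized_return = daily_returns.mean() * TRADING_DAYS
    annualized_volatility = daily_returns.std() * np.sqrt(TRADING_DAYS)
    # Excess return per unit of risk.
    return (annualized_return - risk_free_rate) / annualized_volatility


# Synthetic returns alternating between +1.1% and -1.0% for two years.
returns = pd.Series(np.tile([0.011, -0.010], 252))

sharpe_zero_rate = sharpe_ratio(returns)
sharpe_two_percent = sharpe_ratio(returns, risk_free_rate=0.02)
```

Because the function relies only on `mean` and `std`, it also accepts a DataFrame of daily returns and then yields one Sharpe ratio per column.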
AAPL    0.624652
GLD     0.556504
SPY     0.622251
TLT     0.350560
dtype: float64
As expected, the Sharpe ratios of our individual assets are below 1, with Apple and the entire S&P500 index leading the way with a Sharpe of 0.62. Compared to the other assets, stocks seem to provide the superior risk-adjusted return in this example. Returning to our plot of annualized return vs. annualized volatility, the Sharpe ratio can be interpreted as the slope of a line through the origin. These results indicate that actually, it's AAPL and SPY that lie on a common line, and TLT and GLD fall behind with respect to risk-adjusted return:
However, keep in mind that the amount of historical data may be limited as we mostly use ETF data here. In Chapter 2 we will explore how we can quantify the uncertainty in estimating the Sharpe ratio, leading to a much more robust performance metric that can be used to assess the performance of fairly new assets for which the historical track record is short (e.g. ARK funds and cryptocurrencies). In Chapter 3 we will learn that combining two low-Sharpe investments in a portfolio can yield a Sharpe ratio that is higher than both individual ratios, depending on the correlation between the assets (plus non-linear effects).
This section has provided a very brief overview of how to evaluate the performance of financial assets. The concepts introduced here will return repeatedly in the following chapters, and it will also become clear that we have only scratched the surface here. Still, knowing these basics will help us venture onwards, and the next three sections will introduce common pitfalls that investors encounter when developing investment strategies: regime switches that hide risk when we combine multiple assets in a portfolio, the costs of trading that ruin some very clever active trading strategy ideas, and why leveraged ETFs are not the answer to all of our problems.