In this section, we will take a closer look at one of the most popular long-term investment portfolios: the 60/40 portfolio. The name derives from the two allocations in the portfolio, namely 60% in stocks (via a broad stock index ETF like SPY) and 40% in fixed income instruments (via a long-term government bond ETF like TLT). We will learn how to compute the combined returns of a portfolio containing more than one asset, and we will see that for a long time, fixed income returns provided a way of mitigating the risk of stock market crashes, but only until 2022, when this risk mitigation mechanism broke down. This example will teach us that the statistical properties of historical data should not be taken for granted, and that regime changes in market dynamics force investors to continuously update their view on markets.
To illustrate long-term relations between different asset classes, we first download suitable data. For this first example, we will again use the ETFs SPY (as a tradable asset that replicates the S&P 500 index) and TLT (as a tradable asset that invests in long-term US government bonds), even though data for both is only available starting in mid-2002, giving us a lookback period of only about 20 years. In Chapter 3, we will learn to use proxy data from different but closely related assets to extend the lookback period while still mimicking the fees and returns of the asset that we actually trade today. For example, for stock index data, we could instead use the ticker symbol ^GSPC (which denotes the S&P 500 index itself, not the tradable ETF). This means that we would ignore the tracking error of the ETF and would have to correct the data for any fees that the ETF charges. For long-term bond price data, we could use the ticker VUSTX, a tradable fund by Vanguard that invests in long-term US government bonds. The advantage over TLT is that VUSTX has been trading since 1986, providing us with a much larger lookback period, though not with exactly the same returns. But for now, we are happy with downloading the SPY and TLT data, keeping only timestamps for which we have data on both assets:
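In code, the download could look roughly like this. This is a minimal sketch: we assume here that the yfinance package is used (the original data source is not specified) and that the result is stored in a dataframe called prices.

```python
import yfinance as yf

# Download daily prices for both ETFs, adjusted for dividends and splits.
# (Assumption: data comes from yfinance; any other source with adjusted
# close prices would work just as well.)
prices = yf.download(["SPY", "TLT"], auto_adjust=True)["Close"]

# Keep only timestamps for which we have data on both assets.
prices = prices.dropna()
print(prices)
```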
                   SPY        TLT
Date
2002-07-30   60.080936  39.072857
2002-07-31   60.226238  39.556957
2002-08-01   58.653877  39.782238
2002-08-02   57.339134  40.189655
2002-08-05   55.343956  40.366993
...                ...        ...
2024-07-25  538.409973  92.269997
2024-07-26  544.440002  92.989998
2024-07-29  544.760010  93.489998
2024-07-30  542.000000  93.849998
2024-07-31  550.799988  94.690002

[5539 rows x 2 columns]
Let's plot the cumulative return of both assets on a log-scale to get a better overview of how the value of these two assets has fluctuated over the years. To do this, we divide our price dataframe by the first row, to let both price series start at a value of one:
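A minimal sketch of this, reusing the prices dataframe from above:

```python
import matplotlib.pyplot as plt

# Normalize both price series to start at 1, then plot on a log scale.
(prices / prices.iloc[0]).plot(logy=True)
plt.ylabel("Cumulative return (start = 1)")
plt.show()
```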
We notice multiple things here:
Now that we have an overview of the individual assets, let us simulate our first combined portfolio of multiple assets in the next section!
The 60/40 portfolio is a popular long-term investment strategy that is simple to describe: you invest 60% of your money in a broad stock index fund and the remaining 40% in a long-term bond fund. At first sight, this portfolio seems to be a fire-and-forget investment that you don't have to touch after setting it up initially. But wait: the prices of the two assets will fluctuate differently over time, so the 60/40 ratio will also change over time! For example, if stocks appreciate in value by 10% in one year, but bonds do not gain or lose anything, your portfolio will contain 62.3% (=(60*1.1)/(60*1.1 + 40)) stocks and only 37.7% bonds at the end of the year. If the stock market crashes next year, you will be more exposed to this risk than the average 60/40 investor. This example shows that we need to rebalance our portfolio regularly so as not to let one asset accumulate and leave us over-exposed to certain kinds of risks. This process of rebalancing includes selling shares of assets that have outperformed others in the portfolio (i.e. taking profit) and buying additional shares of underperforming assets (i.e. hoping for a reversion to the long-term performance).
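The drift arithmetic from this example, spelled out in a few lines (the numbers are purely illustrative):

```python
# Allocation drift after a hypothetical year: stocks +10%, bonds flat.
stock_value = 0.60 * 1.10  # 60% initial allocation grows by 10%
bond_value = 0.40 * 1.00   # 40% initial allocation stays flat
total = stock_value + bond_value

print(stock_value / total)  # ~0.623 -> stocks are now over-weighted
print(bond_value / total)   # ~0.377 -> bonds are now under-weighted
```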
If rebalancing is so important, then we should do it often, right? Maybe every day? Well, choosing a suitable rebalancing interval also depends on the fees that your broker charges you for trading, and on the bid/ask spread (the difference in price depending on whether you buy or sell an asset) of the asset. Too much buying/selling to adjust small allocation deviations may be more expensive than the additional return you get by doing it. In addition, daily rebalancing requires you to be at your laptop sending out trades every day, and your time also has value (which may very well exceed the additional return you will get from your portfolio). Long-term investors often only adjust monthly, quarterly, once a year, or only if the actual weights of the assets in the portfolio deviate enough from their target values. In Chapter 3, we will devise simulation methods that will allow you to test different rebalancing methods to find the right one for your investment needs, but for now, we simply go with monthly rebalancing.
The easiest way to simulate a portfolio with fixed allocation weights (60% stocks and 40% bonds) and monthly rebalancing is to first compute monthly returns, i.e. by how much the prices of the assets change relative to last month's prices:
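In code, this could look as follows (a sketch, assuming the prices dataframe from above):

```python
# Take the last price of each business month, then compute the
# relative change from one month to the next.
monthly_returns = prices.resample("BM").last().pct_change()
print(monthly_returns)
```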
                 SPY       TLT
Date
2002-07-31       NaN       NaN
2002-08-30  0.006802  0.055132
2002-09-30 -0.104853  0.042592
2002-10-31  0.082284 -0.036943
2002-11-29  0.061681 -0.009162
...              ...       ...
2024-03-29  0.032702  0.007829
2024-04-30 -0.040320 -0.064555
2024-05-31  0.050580  0.028870
2024-06-28  0.035280  0.018171
2024-07-31  0.012091  0.034988

[265 rows x 2 columns]
The .resample('BM').last() method selects the last price of each business month (in case the last day of a month falls on a weekend), and the .pct_change() method computes relative price changes. Note that the first row of the resulting dataframe contains NaN values, as there is no row prior to the first one to compute a relative change from. We can get rid of that empty first row by dropping NaN values:
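A one-liner, continuing with the monthly_returns variable from the previous step:

```python
# Drop the empty first row (and any other rows with missing values).
monthly_returns = monthly_returns.dropna()
print(monthly_returns)
```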
                 SPY       TLT
Date
2002-08-30  0.006802  0.055132
2002-09-30 -0.104853  0.042592
2002-10-31  0.082284 -0.036943
2002-11-29  0.061681 -0.009162
2002-12-31 -0.056570  0.045255
...              ...       ...
2024-03-29  0.032702  0.007829
2024-04-30 -0.040320 -0.064555
2024-05-31  0.050580  0.028870
2024-06-28  0.035280  0.018171
2024-07-31  0.012091  0.034988

[264 rows x 2 columns]
Looking at these values, we know, for example, that stocks lost over 10% in September 2002, whereas bonds gained over 4% in value in the same period (second row from the top). But how do we combine these returns to obtain the returns of our combined portfolio? It's fairly simple at this point: for each row of our dataframe, we compute the weighted average return, where the weights are our allocation weights. With our static weights of 60% for stocks and 40% for bonds, this calculation is carried out as follows:
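A sketch of this calculation, assuming the monthly_returns dataframe from above (storing the result in a series called portfolio_returns is our choice of name):

```python
# Weighted average of the two return columns with static 60/40 weights.
portfolio_returns = 0.6 * monthly_returns["SPY"] + 0.4 * monthly_returns["TLT"]
print(portfolio_returns)
```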
Date
2002-08-30 0.026134
2002-09-30 -0.045875
2002-10-31 0.034593
2002-11-29 0.033344
2002-12-31 -0.015840
...
2024-03-29 0.022753
2024-04-30 -0.050014
2024-05-31 0.041896
2024-06-28 0.028437
2024-07-31 0.021250
Freq: BM, Length: 264, dtype: float64

Still, we're not quite happy with these portfolio returns, as we are interested in the long-term return of our investments, not really in the monthly fluctuations. To obtain the equity curve, i.e. the portfolio value relative to its starting value over time, we need to accumulate returns and account for the compounding of returns. After all, we aim to simulate the case where we do not withdraw profits, but let them run over the whole investment period. To accomplish that, we do the following calculation:
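A sketch of that calculation, using the np.cumprod function mentioned below (the variable name equity_curve is our choice):

```python
import numpy as np

# Turn each monthly return into a growth factor (1 + r), then compound
# all factors cumulatively to get the portfolio value over time,
# starting from $1.
equity_curve = np.cumprod(1 + portfolio_returns)
print(equity_curve)
```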
Date
2002-08-30 1.026134
2002-09-30 0.979060
2002-10-31 1.012929
2002-11-29 1.046704
2002-12-31 1.030124
...
2024-03-29 5.807165
2024-04-30 5.516728
2024-05-31 5.747855
2024-06-28 5.911304
2024-07-31 6.036917
Freq: BM, Length: 264, dtype: float64

Let's look at the individual steps of this calculation in a bit more detail:
- We compute 1+portfolio_returns to obtain a factor that tells us how much money we will have at the end of a month, assuming that we have $1 at the beginning of the month. For example, if the portfolio gains 5% in a month, we will have (1+0.05)=1.05 dollars at the end of the month.
- np.cumprod does all the multiplication steps for all the individual monthly returns for us, and it keeps all the intermediate results for all timestamps (that is why it's called cumprod, short for cumulative product). This way, we get a series of values that tell us how much our portfolio is worth in each month, assuming that we started with $1 at the very beginning.

Let's plot the resulting portfolio equity curve:
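A minimal plotting sketch; whether to use a log scale like in the earlier price plot is a matter of taste:

```python
import matplotlib.pyplot as plt

equity_curve.plot(logy=True)  # logy=True is optional here
plt.ylabel("Portfolio value (start = $1)")
plt.show()
```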
As we can see, the 60/40 portfolio combines benefits of both investments: you get the stability of the bonds, but also some of the outperformance of the stocks. Plus, large stock crashes like the one during the US financial crisis of 2008 are dampened by the counter-movement of bond prices. Still, since the Corona crash, the 60/40 portfolio has lost some popularity, mostly because many investors did not anticipate the risk of holding long-term bonds when interest rates rise quickly, which resulted in unusually large drawdowns in bond funds.
We have seen that the 60/40 portfolio really generates a benefit for investors during times when stocks and bonds move contrary to each other (effectively hedging each other), but can be a problem when this nice relationship between the assets breaks down (like after the Corona crash, when central banks worldwide raised interest rates to fight inflation). So how can we quantify the relationship between assets, i.e. whether they move together or contrary to each other?
We can do this by computing the correlation coefficient between them. A correlation coefficient takes the value of 1 if the returns of the assets are perfectly positively correlated, meaning that every time SPY exhibits a return that is larger than its average return, TLT will also exhibit a return that is larger than its average return, and vice versa. If the correlation coefficient is positive but smaller than one, there is still a tendency that above-average returns and below-average returns coincide for both assets, but not every time. If the correlation coefficient is zero, we detect no relation between the returns of the two assets (which does not necessarily mean that there is no relation, just that our correlation coefficient cannot detect it). If the correlation coefficient is negative, we see "anti-correlation", i.e. there is a tendency that above-average returns of SPY are accompanied by below-average returns of TLT, and vice versa. This negative, or anti-correlation between assets is what investors are seeking, as they can then combine the long-term growth of multiple assets while reducing the overall risk of large price fluctuations of their portfolio.
The most commonly used correlation coefficient is the Pearson correlation coefficient, which assumes a linear relation between two variables. In our case, this implies that the returns of SPY and the returns of TLT should fluctuate around a straight line when plotted against each other. Let's have a look at this:
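A minimal scatter-plot sketch, assuming the monthly_returns dataframe from above:

```python
import matplotlib.pyplot as plt

# Plot each month as one point: SPY return vs. TLT return.
plt.scatter(monthly_returns["SPY"], monthly_returns["TLT"], s=10)
plt.xlabel("SPY monthly return")
plt.ylabel("TLT monthly return")
plt.show()
```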
Well, it does not quite look like the points lie on a straight line! Of course, we would only expect the points to form a narrow straight line if the correlation were close to -1 or 1. So what this diffuse, flattened point cloud already tells us is that the correlation between SPY and TLT is rather weak when measured over the whole historical period. Still, we may see that the point cloud is tilted slightly downwards towards the right, indicating a negative correlation, if any. To quantify this further, we use the scipy.stats Python package to calculate the Pearson correlation coefficient:
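A sketch using scipy.stats.pearsonr, assuming the monthly_returns dataframe from above:

```python
from scipy.stats import pearsonr

# Pearson correlation between the monthly returns of SPY and TLT.
print(pearsonr(monthly_returns["SPY"], monthly_returns["TLT"]))
```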
PearsonRResult(statistic=-0.11968226633040632, pvalue=0.05209278993279599)
As we can see, we get back not only the estimated value of the correlation coefficient, but also a second value, the p-value. The p-value tells us how trustworthy the estimated value of the correlation coefficient really is. Importantly, the smaller the p-value, the more we can trust the result! It takes values between 0 (absolutely trustworthy) and 1 (do not trust at all). Think about it this way: if we only had 1 year of historical data on SPY and TLT, so only 12 monthly returns, and they aligned perfectly, i.e. the greater the returns of SPY, the smaller the returns of TLT, would you believe the resulting correlation coefficient of -1 and bet money on that perfect correlation? Probably not; it feels too uncertain, as the perfect negative correlation could be a product of pure chance, and the next few months could easily negate our result. Let's try this example and compute the correlation coefficient from just the first 12 monthly returns in our data set:
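A sketch, selecting only the first 12 rows of our monthly returns:

```python
# Pearson correlation based on the first 12 monthly returns only.
print(pearsonr(monthly_returns["SPY"].iloc[:12],
               monthly_returns["TLT"].iloc[:12]))
```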
PearsonRResult(statistic=-0.3082014294118599, pvalue=0.32974817350594693)
As you can see, using only 12 months of data, we get a much larger p-value of 0.33, compared to the value of 0.05 when we use all available data. So the p-value roughly indicates how likely it is that the true correlation is zero and that we only see some spurious correlation because we have a finite amount of data. (Note: statisticians often insist that the p-value is not equal to the probability that an effect is created by random chance, and they are certainly right, but we do not want to lose ourselves in mathematical details here.) But what p-value is a good p-value? At which point do we start to believe that there is in fact a non-zero correlation between SPY and TLT?
That is not an easy question to answer; in scientific projects, a common cutoff is p < 0.05, or the stricter p < 0.01. Our estimate of the Pearson correlation coefficient misses both of those thresholds (even if just by a bit). If we added proxy data to extend our historical data by more years, however, we would see that the negative correlation between SPY and TLT becomes statistically significant. It is worth noting, though, that in finance you can make a profit by exploiting non-significant correlations, but the opposite is also true: in Chapter 2, we will see that extreme values in financial time series can lead us to believe in certain statistical properties that are in fact not true and will be invalidated when the next crash or extreme event happens!
To complicate things a bit more, there is not just the Pearson correlation coefficient; other coefficients compute correlation in slightly different ways! One example is Spearman's rank correlation coefficient, implemented in scipy.stats as spearmanr. In contrast to the Pearson correlation coefficient, Spearman does not compare whether individual values in both series are above or below their respective averages, but whether their ranks are above or below the average rank. So it's about the position of a value in the list of sorted values, rather than the value compared to the average value. This makes Spearman's rank correlation coefficient more robust against outlier values (which we definitely have in financial time series; wait for Chapter 2 to see just how extreme these values can get) and allows it to capture (at least some types of) non-linear correlations. Non-linear correlations are any kind of correlation where the points of both series form a certain pattern, but this pattern does not approximate a straight line; see the illustrative examples on Wikipedia.
Let's compute Spearman's rank correlation coefficient for the monthly returns of SPY and TLT to see how the result differs from the Pearson correlation coefficient:
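A sketch, analogous to the Pearson calculation above:

```python
from scipy.stats import spearmanr

# Spearman rank correlation between the monthly returns of SPY and TLT.
print(spearmanr(monthly_returns["SPY"], monthly_returns["TLT"]))
```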
SignificanceResult(statistic=-0.1294543106653014, pvalue=0.03553081400917159)
We can see that the value of the correlation coefficient is approximately the same (a tiny bit stronger compared to Pearson), and the p-value is a bit smaller, this time below the common threshold of 0.05 for statistical significance! But how can one coefficient reach significance while the other one does not? Well, Pearson tests exclusively for linear correlation, whereas Spearman may also capture non-linear correlation and is more robust to outliers. So Spearman is a more general test, and Pearson is a more specific one. Thus, we expect that if our data were indeed linearly correlated, then Pearson, being the more specific test, should yield a more decisive result, whereas if the correlation is of a non-linear nature or if outlier values are present (as is the case here), then Spearman may reach significance when Pearson does not. In the end, the most trustworthy results should always be confirmed using multiple different approaches. If one has to explicitly search for a single test that turns out positive, one is probably hunting a ghost in the machine and not a real signal!
Ok, so there are different types of correlation metrics, and correlation values based on too few data points can be spurious. But there is still another effect that can keep us from estimating the correlation between two assets correctly: regime shifts (also called regime changes, regime switches, or break points)! When macroeconomic policy changes, the relation between assets may also change, including the correlation of their returns. Essentially, this means that we have to assume that the correlation between assets changes over time, sometimes gradually, sometimes abruptly. In later chapters, we will learn how to properly handle time-varying parameters in our models while accounting for statistical significance and all the subtle details, but for a first demonstration, we may simply compute the Pearson correlation coefficient based on a rolling window of the trailing 12 months. We can do this using the Pandas methods rolling and corr:
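A sketch of the rolling correlation, assuming the monthly_returns dataframe from above:

```python
import matplotlib.pyplot as plt

# Pearson correlation over a rolling window of the trailing 12 months.
rolling_corr = monthly_returns["SPY"].rolling(12).corr(monthly_returns["TLT"])
rolling_corr.plot()
plt.ylabel("Rolling 12-month correlation")
plt.show()
```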
Of course, the rolling window estimate of the correlation coefficient fluctuates quite a bit over time, but we expected that, as we base the correlation estimate on only 12 data points at each point in time. Still, a bigger picture emerges from the fluctuations: prior to 2021, the correlation estimate fluctuated mostly within the negative range, from -1 to slightly above 0. After 2021, however, the correlation increases quickly and reaches high positive values. This regime change is due to drastic changes in interest rate policies across the globe, and it eradicates the nice negative correlation on which many 60/40 investors relied for risk mitigation. To see the effect of this regime change on portfolio performance, we will plot the portfolio volatility, also within a rolling window of 12 months:
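A sketch of the rolling volatility, assuming the portfolio_returns series from above; annualizing monthly volatility by multiplying with the square root of 12 is our assumption here:

```python
import numpy as np
import matplotlib.pyplot as plt

# Standard deviation of monthly returns over a rolling 12-month window,
# annualized by multiplying with sqrt(12).
rolling_vol = portfolio_returns.rolling(12).std() * np.sqrt(12)
rolling_vol.plot()
plt.ylabel("Rolling 12-month annualized volatility")
plt.show()
```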
As we can see, the rolling portfolio volatility (and thus the magnitude of the fluctuations we see every day in our brokerage account) usually hovers at or below 10% annualized volatility, except on two occasions: first, during the financial crisis of 2008/09, when the correlation was briefly positive and stock market volatility skyrocketed, and second, since 2021, when the correlation between SPY and TLT turned positive again.
Positive correlation magnifies portfolio fluctuations! This is why most portfolio optimization approaches select allocation weights not only based on past performance, but also based on the correlation between different assets. Ideally, one would find many different assets with zero or even negative correlation to each other, such that one can profit from uncorrelated returns while the random fluctuations (partially) counteract each other, resulting in a smooth, upward-pointing equity curve. Finding different, uncorrelated sources of returns is the main goal of many hedge funds, which invest not only in stocks or bonds, but also in other (more or less uncorrelated) markets such as commodities (gold, aluminium, cattle, soybeans, ...), currencies, volatility indices (betting on whether stock market fluctuations will become stronger or weaker), exotic options markets (with very non-linear payouts), or even sports betting.
For now, we are content with our insight that correlations between different assets matter a lot when it comes to overall portfolio performance, and that picking stocks or other assets based solely on their past performance is not a good idea: one has to keep the whole portfolio in mind, a holistic picture of portfolio optimization. In Chapter 3, we will revisit this idea and study the traditional mean-variance approach of Markowitz, but also a more modern, simulation-based approach that can account for non-linear relations between assets. In Chapter 4, we will then build dynamic allocation techniques that can react to market conditions (such as changing correlations), thus mitigating at least some of the risk that regime shifts like the one discussed here pose to an investor.