Portfolio Optimization and Correlation

As frequent readers of our blog may have noticed, 'Physics Friday' has more or less become a 'Physics Weekend' post. I hope some of you still enjoy it nonetheless.

In our last post we discussed some of the questions that arise when working with random matrices. But I still owe you an explanation of why random matrix theory may affect your life in quant finance. Today I will try to answer part of that question with a short tour of portfolio optimization.

Suppose the task is to build a portfolio of N assets with weight $w_i$ on asset i. The daily variance of the portfolio return is then given by

$$R^2 = \sum_{i,j=1}^{N} w_i \sigma_i C_{ij} \sigma_j w_j$$

Here $\sigma_i^2$ is the daily variance of asset i and $C_{ij}$ is the correlation matrix. In order to measure and optimize the risk of a portfolio it is essential to obtain a reliable estimate of the correlation matrix. In general this is difficult because the number of days T in the time series may not be significantly larger than the number of assets N in the portfolio (4 years of data give roughly 1000 entries in the time series, and the typical size of a portfolio is several hundred assets). The number of entries to estimate for the correlation matrix is of order $N^2/2$. An accurate estimate of the correlation matrix would require q = N/T to be significantly smaller than 1.
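
To make the portfolio variance and the ratio q concrete, here is a minimal Python sketch. It assumes the variance is the quadratic form sum_ij w_i sigma_i C_ij sigma_j w_j, with w the weights and sigma the daily volatilities. The function name `portfolio_variance` and the three-asset numbers are made up for illustration and are not taken from real data.

```python
import numpy as np

def portfolio_variance(w, sigma, C):
    """Daily portfolio variance: sum_ij w_i sigma_i C_ij sigma_j w_j."""
    scaled = np.asarray(w) * np.asarray(sigma)   # elementwise w_i * sigma_i
    return float(scaled @ np.asarray(C) @ scaled)

# Hypothetical three-asset example (illustrative numbers only).
w = np.array([0.5, 0.3, 0.2])            # portfolio weights
sigma = np.array([0.01, 0.02, 0.015])    # daily volatilities
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 1.0]])          # correlation matrix

variance = portfolio_variance(w, sigma, C)

# Orders of magnitude quoted in the text: a few hundred assets, ~1000 days.
N, T = 300, 1000
q = N / T                       # should be well below 1 for a clean estimate
n_entries = N * (N - 1) // 2    # distinct off-diagonal correlations, ~N^2/2
```

With these numbers q is 0.3, which is far from the regime where q is much smaller than 1, while tens of thousands of correlations have to be estimated.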
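The gain of the portfolio is

$$G = \sum_{i=1}^{N} w_i g_i$$

with $g_i$ the predicted gain of a single asset.
If $r_t^i$ is the daily return of stock i at time t (with the mean return subtracted), the empirical variance of each stock is

$$\sigma_i^2 = \frac{1}{T} \sum_{t=1}^{T} \left(r_t^i\right)^2$$

and the empirical correlation matrix is obtained as

$$E_{ij} = \frac{1}{T} \sum_{t=1}^{T} x_t^i x_t^j, \qquad x_t^i = \frac{r_t^i}{\sigma_i}$$

From here on the volatilities are set to $\sigma_i = 1$ to keep the notation light (they can be absorbed into $w_i$ and $g_i$). An optimal (Markowitz) portfolio built with this empirical correlation matrix, i.e. with the weights $w_E = G\, E^{-1} g / (g^T E^{-1} g)$ that minimize the variance for a given gain G, would have

$$R_{in}^2 = w_E^T E\, w_E = \frac{G^2}{g^T E^{-1} g}$$

as the corresponding risk (the risk of the portfolio over the period used to construct it). We call this the in-sample risk. Now assume there is a "true" correlation matrix C which is perfectly known. With $w_C$ defined like $w_E$ but with E replaced by C, this results in a risk

$$R_{true}^2 = w_C^T C\, w_C = \frac{G^2}{g^T C^{-1} g}$$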
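
The empirical estimates and the in-sample risk can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production implementation: returns are taken to have zero mean, the data are synthetic Gaussian noise, the predicted gains `g` are a made-up vector of ones, and the helper names `empirical_correlation` and `markowitz_weights` are hypothetical. The weights act on the normalized returns x, i.e. volatilities are set to 1.

```python
import numpy as np

def empirical_correlation(returns):
    """returns: array of shape (T, N), assumed to have zero mean.
    Gives (sigma, E) with sigma_i^2 = (1/T) sum_t r_ti^2 and E = x^T x / T."""
    r = np.asarray(returns, dtype=float)
    T = r.shape[0]
    sigma = np.sqrt((r ** 2).mean(axis=0))   # empirical volatility per asset
    x = r / sigma                            # normalized returns
    E = x.T @ x / T                          # empirical correlation matrix
    return sigma, E

def markowitz_weights(M, g, G=1.0):
    """Weights minimizing w^T M w subject to the gain constraint w^T g = G."""
    Minv_g = np.linalg.solve(M, g)           # M^{-1} g without explicit inverse
    return G * Minv_g / (g @ Minv_g)

rng = np.random.default_rng(0)
T, N = 1000, 5
returns = 0.01 * rng.standard_normal((T, N))   # synthetic daily returns
sigma, E = empirical_correlation(returns)

g = np.ones(N)                         # hypothetical predicted gains
w_E = markowitz_weights(E, g, G=1.0)
risk_in_sq = w_E @ E @ w_E             # in-sample variance of the portfolio
```

Using `np.linalg.solve` instead of inverting E explicitly is a numerical design choice; it matters more as q grows and E becomes ill-conditioned.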


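If we use this perfect correlation matrix to draw past and future x, we can construct a portfolio from the past sample whose risk on the future sample is given by

$$R_{out}^2 = w_E^T C\, w_E = \frac{G^2\, g^T E^{-1} C E^{-1} g}{\left(g^T E^{-1} g\right)^2}$$

where $w_E$ denotes the Markowitz weights computed from the empirical matrix E, g the vector of predicted gains and G the target gain. The subscript "out" refers to the fact that the portfolio is constructed using E but its risk is observed on the next period (remember that we can draw future samples of x!).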

So we have three possible estimates and it remains to understand their biases. One can use convexity arguments for the inverse of positive definite matrices to show that the out-of-sample risk of an optimized portfolio is larger (and in practice, this can be much larger) than the in-sample risk, which itself is an underestimate of the true minimal risk. This is a general situation: using past returns to optimize a strategy always leads to over-optimistic results because the optimization adapts to the particular realization of the noise, and is unstable in time [Potters et al, Financial Applications of Random Matrix Theory: Old Laces and New Pieces].
Only in the limit of q going to 0 do these quantities coincide, since in this case the measurement noise disappears.
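
The ordering of the three risks can be checked with a small simulation. The following Python sketch assumes Gaussian returns, an identity matrix as the "true" correlation matrix, unit volatilities and a made-up gain vector of ones; all names are illustrative. For large N and T at fixed q, the literature cited above reports that the in-sample variance is roughly (1 - q) times the true one and the out-of-sample variance roughly 1/(1 - q) times the true one, so the numbers produced here should land in that neighbourhood without matching it exactly.

```python
import numpy as np

def markowitz_weights(M, g, G=1.0):
    """Weights minimizing w^T M w subject to w^T g = G."""
    Minv_g = np.linalg.solve(M, g)
    return G * Minv_g / (g @ Minv_g)

rng = np.random.default_rng(42)
N, T = 100, 400
q = N / T                              # 0.25: far from the q -> 0 limit

C = np.eye(N)                          # assumed "true" correlation matrix
g = np.ones(N)                         # hypothetical predicted gains
L = np.linalg.cholesky(C)              # used to draw samples with correlation C

x_past = rng.standard_normal((T, N)) @ L.T     # sample used to build E
x_future = rng.standard_normal((T, N)) @ L.T   # next period, same true C

E = x_past.T @ x_past / T              # empirical correlation estimate
w_E = markowitz_weights(E, g)          # portfolio optimized on noisy E
w_C = markowitz_weights(C, g)          # portfolio optimized on the true C

risk_in_sq = w_E @ E @ w_E             # in-sample: optimistic
risk_true_sq = w_C @ C @ w_C           # true minimal risk
risk_out_sq = w_E @ C @ w_E            # out-of-sample: pessimistic
realized_out_sq = np.mean((x_future @ w_E) ** 2)   # noisy estimate of risk_out_sq

print(risk_in_sq / risk_true_sq, risk_out_sq / risk_true_sq)
```

The out-of-sample variance can never be below the true one, because `w_C` is by construction the minimizer of the true variance. The in-sample variance lies below the true one only on average, although at this value of q the gap is large compared with the sampling fluctuations.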

The question is how to "clean" the empirical correlation matrix to avoid (if possible) such biases in the estimation of future risk. And here RMT enters the game. How? Read the next "Physics Weekend" post.