Cleaning Matrices

After last week's overview of the biases that occur when calculating the empirical correlation matrix, we now ask how one can "clean" such matrices to avoid, at least to a certain degree, these biases in the estimation of future risk.

As a first step, we rewrite the Markowitz solution in terms of the eigenvalues l and eigenvectors V of the correlation matrix (take a look at this week's Mathematics Wednesday, where Andreas Binder's blog post also revealed some traps in the calculation of correlation matrices).
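
As a sketch of the form this rewriting takes, assume returns rescaled to unit variance, and write g_j for the expected gain of asset j (a symbol introduced here for illustration), \lambda_a for the eigenvalues l, and V_{a,i} for component i of eigenvector a. The Markowitz weights w_i can then be written as:

```latex
w_i \;\propto\; \sum_{a} \lambda_a^{-1}\, V_{a,i} \sum_{j} V_{a,j}\, g_j
\;=\; g_i \;+\; \sum_{a} \left(\lambda_a^{-1} - 1\right) V_{a,i} \sum_{j} V_{a,j}\, g_j
```

The second equality uses the orthonormality of the eigenvectors, \sum_a V_{a,i} V_{a,j} = \delta_{ij}.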


Taking a closer look at the formula above, we see that the first term corresponds to the naive solution: one invests proportionally to the expected gains. The second term suppresses the weights along eigenvectors with l>1 and enhances the weights along eigenvectors with l<1. As a result, the optimal Markowitz solution can allocate large weights to eigenvectors with small eigenvalues, which may be dominated by measurement noise.
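
The split into a naive term and an eigenvalue-dependent correction can be checked numerically. The following is a minimal sketch, not a production implementation: the function name, the simulated returns and the gain vector g are all assumptions made for illustration, and the weights are left unnormalised.

```python
import numpy as np

def markowitz_weights_eigen(C, g):
    """Split the unnormalised Markowitz weights C^{-1} g into two terms.

    Returns (naive, correction), where naive = g and the correction
    re-weights each eigen-direction by (1/l - 1).
    """
    g = np.asarray(g, dtype=float)
    lam, V = np.linalg.eigh(C)            # columns of V are the eigenvectors
    proj = V.T @ g                        # projection of g on each eigenvector
    correction = V @ ((1.0 / lam - 1.0) * proj)
    return g.copy(), correction

# Hypothetical data: 500 observations of 5 uncorrelated assets.
rng = np.random.default_rng(0)
returns = rng.standard_normal((500, 5))
C = np.corrcoef(returns, rowvar=False)
g = np.array([0.02, 0.01, 0.03, 0.015, 0.025])   # assumed expected gains

naive, correction = markowitz_weights_eigen(C, g)
w = naive + correction                    # equals C^{-1} g up to rounding
```

Directions with l<1 get a positive factor (1/l - 1) and are amplified, while directions with l>1 get a negative factor and are damped.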

Before going on, we should consider the meaning of the eigenvectors V corresponding to the large eigenvalues. The eigenvector with the largest eigenvalue corresponds to a collective market mode, whereas the eigenvectors of the other large eigenvalues are sector modes. So a simple way to avoid the instability would be to project out the eigenvectors belonging to the largest eigenvalues. This approach would lead to quite a good portfolio in terms of risk, since most of the volatility is contained in the market and sector modes.
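
One possible reading of this projection, as a rough sketch: remove from the gain vector its components along the k eigenvectors with the largest eigenvalues, so that the resulting weights are orthogonal to the market and sector modes. The function name, the toy correlation matrix and the choice k=1 are assumptions made for illustration.

```python
import numpy as np

def project_out_top_modes(C, g, k):
    """Remove the components of g along the k largest-eigenvalue eigenvectors."""
    g = np.asarray(g, dtype=float)
    lam, V = np.linalg.eigh(C)      # eigenvalues in ascending order
    top = V[:, -k:]                 # eigenvectors of the k largest eigenvalues
    return g - top @ (top.T @ g)

# Toy correlation matrix: every pair of assets has correlation 0.3,
# so the top eigenvector is the uniform "market mode".
N = 4
C_toy = 0.3 * np.ones((N, N)) + 0.7 * np.eye(N)
g_toy = np.array([1.0, 2.0, 3.0, 4.0])   # assumed expected gains

w_neutral = project_out_top_modes(C_toy, g_toy, k=1)
```

In this toy case the projection simply subtracts the mean of g, which makes the weights sum to zero, that is, market neutral.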

More elaborate approaches aim to use all the eigenvectors and eigenvalues, but only after "cleaning" them. In [L. Laloux, P. Cizeau, J.-P. Bouchaud and M. Potters, Phys. Rev. Lett. 83, 1467 (1999); L. Laloux, P. Cizeau, J.-P. Bouchaud and M. Potters, Risk 12, No. 3, 69 (1999)] it has been suggested that one should replace all low-lying eigenvalues with a single common value and keep the high eigenvalues and their eigenvectors (those carrying the meaningful economic information, the sector modes).
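
One form of this replacement rule that is consistent with the description, given here as an assumption rather than a quotation from the cited papers, sorts the eigenvalues in decreasing order and writes \lambda_a for l and \delta for d:

```latex
\lambda_a^{\text{clean}} =
\begin{cases}
\lambda_a, & a \le k' \\
1 - \delta, & a > k'
\end{cases}
```

The eigenvectors are left unchanged, and the cleaned matrix is rebuilt from them with the cleaned eigenvalues.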


Here k' is the number of meaningful sectors kept, and d is chosen such that the trace of the correlation matrix is preserved. The question of how to choose k' remains. In the cited publications, Random Matrix Theory was used to determine the theoretical edge of the random part of the eigenvalue distribution, and k' was set such that l(k') is close to this edge.
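
The clipping procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the random edge is taken to be the Marchenko-Pastur upper edge (1 + sqrt(N/T))^2 for N assets and T observations with unit variance, every eigenvalue at or below that edge is treated as noise, and the common replacement value is the mean of the noise eigenvalues, which preserves the trace. The function name and the simulated one-factor returns are hypothetical.

```python
import numpy as np

def clip_eigenvalues(C, n_obs):
    """Eigenvalue clipping; returns the cleaned matrix and the number of modes kept."""
    n_assets = C.shape[0]
    q = n_assets / n_obs
    lambda_plus = (1.0 + np.sqrt(q)) ** 2     # assumed random edge (Marchenko-Pastur)
    lam, V = np.linalg.eigh(C)                # columns of V are the eigenvectors
    noise = lam <= lambda_plus                # eigenvalues treated as noise
    lam_clean = lam.copy()
    if noise.any():
        lam_clean[noise] = lam[noise].mean()  # common value that keeps the trace
    C_clean = (V * lam_clean) @ V.T           # V diag(lam_clean) V^T
    return C_clean, int((~noise).sum())

# Hypothetical data: 20 assets driven by one common "market" factor.
rng = np.random.default_rng(1)
T, N = 500, 20
market = rng.standard_normal(T)
returns = 0.5 * market[:, None] + rng.standard_normal((T, N))
C = np.corrcoef(returns, rowvar=False)

C_clean, k_kept = clip_eigenvalues(C, n_obs=T)
```

Note that the diagonal of the cleaned matrix is in general no longer exactly one; only its sum, the trace, is preserved.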

What, then, is the spectrum of the correlation matrix, and how does it affect our estimate of the correlations? We will pursue these questions in our next posts …