Complete an Incomplete Correlation Matrix Problem

In many cases, due to paucity of data/information or dynamic nature of the problem at hand, it is not possible to obtain a complete correlation matrix (R). Some elements of R are unknown. In such cases, the problem is to obtain a valid (positive semidefinite) and complete correlation matrix. In absence of sufficient side conditions that are often impracticable to specify, this problem cannot be solved uniquely.

Several methods have been suggested to complete an incomplete correlation matrix - that is to obtain a valid complete correlation matrix from an incomplete correlation matrix (some of whose elements are unknown). In view of non-unique solutions admissible to the problem of completing the correlation matrix, some authors have suggested numerical methods that provide ranges to different unknown elements. However, those methods are limited to the incomplete correlation matrices of a very small size (such as of 4 x 4 order).

We provide a method (and a Fortran program - source codes) that completes a given incomplete correlation matrix of an arbitrary order. The resulting complete matrices are many in number, but all of them are valid (positive semi-definite - with all non-negative eigenvalues). Additionally, the suggested method does not require any pre-assigned pattern as in case of many other methods. It allows for holes (h or unknown elements) in any row and any column. The program that works out such complete matrices does not require any interaction with the user either. The method is based on the Global Optimization procedure (Differential Evolution or DE). The input matrix may be as follows (for example). The elements h1, h2, etc are the numbers beyond the range [-1, 1].

1.0 0.9h1 0.7
1.00.6 0.3

The Structure of Computer Program and Hints on its Use: The main program (in Fortran) to complete a correlation matrix has eight subroutines. The main program reads the input matrix from a file specified by the user. This file stores the main diagonal and upper diagonal elements of the given matrix (shown in the matrix above). Thus the first row has n elements beginning with 1.0; the second row has n-1 elements beginning with 1.0 and so on such that the last (nth) row has only one element (=1.0). In making the input matrix file one has to indicate the known and the unknown elements differently. While the known elements naturally lie between [-1, 1] they are put as they really are. However, a number lying beyond the range [-1, 1] represents an unknown element. The value could be any number such as 2, -3, 1.5, etc that cannot be a correlation coefficient. For example, if r12 is known to be 0.9, say, it will be put as 0.9, but if if r13 is unknown it may be represented by a number, say 2.0 or -1.9 and so on. A number outside the range [-1,1] indicates that it is a hole or an unknown correlation coefficient. When the program runs, it asks for the order of input matrix (morder) and the name of input data file in which the input matrix is already stored. The user has to specify them. The program also asks to name the output file in which the final results (valid correlation matrices) would be stored. The user should specify it. Then the program asks for a random number seed. Any 4-digit odd number (say 1271) can be fed as a seed. Subsequently, the program asks for the number of unknown elements (m) in the input matrix. This also has to be given by the user. The main program calls subroutine DE (differential Evolution optimizer). It asks for inputs from the user; the population size (N) and the number of iteration to be performed. The population size determines the number of valid matrices to be obtained as output. It should be normally 100 or so, but for larger problems, this number should be larger. The number of iterations should be specified at 1000 or larger. Then the program needs another random number seed that could be any 4-digit odd number. Once these inputs are given, DE starts running.

Other subroutines in the program are: Normal (generates normally distributed random numbers), Random (generates uniformly distributed random numbers between 0 and 1), Fselect (chooses a function), Func (organizes function calls), Eigen (computes eigenvalues and vectors; the program adapted from Krishnamurthy & Sen ,1976), Concor (constructs correlation matrices for optimization) and Ncorx (constructs valid correlation matrices and stores them in the output file specified by the user).

References :
  • Krishnamurthy, E.V. & Sen, S.K. (1976) Computer-Based Numerical Algorithms. Affiliated East-West Press, New Delhi.
  • Mishra, S.K. (2007) "Completing Correlation Matrices of Arbitrary Order by Differential Evolution Method of Global Optimization: A Fortran Program". Available at SSRN: Download.


Other Fortran Computer Programs



Journal of Alternative Economic Analysis

The Journal focuses on the research contributions to various upcoming approaches to analysis of the functioning of real-world economies in different countries. It welcomes original research articles/papers on agent-based computational, evolutionary, old (new) institutional, behavioral, ecological and environmental economics.

Quarterly Bulletin of the Department

The Department publishes a quarterly bulletin to publicize the research output of high quality emanating from the research work/activities of its faculty members and research scholars.
Current Research Work in the Department

A number of research topics, especially on the economy of the upland areas, are currently being investigated in the Department.
Design downloaded from free website templates.