Nonlinear Least Squares by Differential Evolution/Particle Swarm Optimization
Curve fitting or estimation by nonlinear least squares is a difficult task. There are two types of algorithm that are often used for this purpose: those that need evaluation of derivatives and the others that do not. In the first category we have Gauss-Newton, Levenberg-Marquardt and Fletcher-Powell, Fletcher-Reeves, Rosen, Quasi-Newton, etc while in the second category we have Powell, Rosenbrock, Hooke-Jeeves, Nelder-Mead and Box, etc (see Kuester and Mize, 1973). All of them are very prone to be caught into local minimum trap. Recently, some methods of global optimization have been developed that are based on some sort of stochastic process. The method of Differential Evolution (Storn and Price, 1995) is perhaps the most promising among them. In developing this program we have used DE as an optimizer of the nonlinear least square function. The DE is an evolutionary, population-based, algorithm; a development on the Genetic Algorithm. The DE as a method of optimization has been well tested on many extremely difficult multi-modal problems of nonlinear optimization. Of course, the Particle Swarm method of Global Optimization also may be quite effective for nonlinear curve fitting.
We provide here the Fortran program (source codes based on DE) for fitting nonlinear curves to the given data. We also provide here the alternative program based on the Repulsive Particle Swarm Optimization(source codes based on RPS). To use any of the programs one has to use a suitable FORTRAN compiler. A Fortran compiler may be obtained free from force.lepsch.com or alternatively from Download.com website. First, the Fortran Compiler may be installed. Then the program NLINLS (or rpsfit) may be copy-pasted from the source file (that may be in the text file such as NLINLS.txt or rpsfit.txt) to the FORCE editor and saved as NLINLS.f or rps.fit.f as the case may be. However, if you have obtained this program as NLINLS.f (or rpsfit.f) then after installing the Fortran compiler you may directly double click it. It will be taken up by FORCE automatically.
The first step in using the program is to make a text file (for example, HOUGEN.TXT in which data (only data, the colored portion in the left panel) are to be stored in the scheme exemplified in the left panel. The sequence of columns is sl no., x1, x2, ..., y. We suggest that the data file and the NLINLS.f should be saved in the same folder (directory) to avoid typing long paths afterwards.
The user of NLINLS (or rpsfit) program has to specify the parameters, bounds (limits) on them as well as the function to be fitted to data. This is done in the last subroutine of the program: REGFUNC. It has been specifically mentioned in the REGFUNC as to what is to be modified in the program and what is not be altered. Only three sets of changes are needed: (1) defining the set of parameters; p1 = p(1), p2 = p(2), etc depending on the problem; (2) defining the set of variables, x1 = datum(i, 1), x2 = datum(i, 2), ..., y = datum(i, last), for example, in the Hougen problem x1 = datum(i, 1), x2 = datum(i, 2), x3 = datum(i,3), y = datum(i, 4); and finally, (3) defining yx in terms of p and x, such that in the Hougen problem the function is defined as yx = (p1 * x2 - x3 / p5)/(1.d0 + p2 * x1 + p3 * x2 + p4 * x3). Almost never one would require changing FORMAT(I5, 2F25.12), but one may change it if (at all) needed. Once the changes in the program are made, it may be saved and compiled or even run directly. If no error has been committed in changing/redefining as advised above, the program will run. While it runs, the program asks for several inputs from the user. The NLINLS program issues clear instructions to be (normally) followed.
2. "Feed the name of input file ..." : As has been advised above, the user is ready with the input data file, for example, in mydata.txt named file. He/she may type it and strike the Enter key.
3. "Number of observations and variables ..." : The user should type the number of variables (e.g. 13 in the Hougen problem above) and the number of variables (including y) in the problem (e.g. 4 in the Hougen problem above). Then strike the Enter key. His/her input from the keyboard will be 13, 4 Enter.
4. "No. of parameters to be estimated" : It is the number m in p1, p2, ... , Pm. For the Hougen problem it is 5. Input 5 and Enter.
5. "Would you specify any limits ... " : If the user has to specify some bounds (limits) on the values of parameter, he/she should type an integer number other than zero (0), e.g. 2, 3 or 4, etc and Enter. But if he/she has not to specify any bounds, he/she should type 0 and Enter.
6. If the user has chosen to provide bounds on the parameter, then he/she will be asked to provide those (lower and upper limits).
7. "Specify the [lower upper, lower upper ..." : The user may specify them. For example, the lower upper bound on the parameters of the Hougen problem above may be given as : -10 10, -10 10, -10 10, -10 10, -10 10 Enter
8. "Option: Would you minimize ..." : If the user wants to minimize sum of squares, he/she should type 0 and Enter. If he/she wants to maximize R2 then type any non-zero integer and strike the Enter key. Both amount to the same; but display of results are different.
Once these inputs are accepted by the computer, the program enters into DE optimization. A number of inputs are needed by the DE algorithm. These are:1. "Population size and number of iterations ..." : Population size should not be less than ten times the number of parameters to be estimated. For example, in the Hougen problem above, 5 parameters are there. So population size should not be less than 50. Unless the number of parameters is more than 10, population size = 100 is good enough. Number of Iterations = 10000 is enough for moderately sized problems.
2. "Cross-over probability ... and two scale parameters" : It depends on the complexity of the problem. If the problem appears to be simple, (0.9, 0.5, 0) is alright and very effective. In some cases (0.9, 0.9, 0.01) performs better. In some other cases (0.5, 0.9, 0.1) has worked. In case of complicated problems one may run the program several times with these alternatives 0.9, 0.5, 0 Enter.
3. "Accuracy for convergence" : For most problems 0.0001 would be all right.
4. "Random number seed." : any 4-digits odd integer such as 5781 Enter.
5. "Name of output file" : Such as myresults.txt Enter.
And the program goes in for computation. All intermediate results pop up on the screen. All intermediate and final results including expected values of y are stored in the output file specified by the user (e.g. myresults.txt). Note that it is automatically saved in the same folder (directory) where NLINLS.exe (rpsfit.exe) and data file are already saved. Then one may enter into that folder and double click the output file (e.g. myresults.txt) to view the results. One may also open it by any other text editor.
Similarly, for using the rpsfit, the instructions may be followed and information needed for using RPSFIT should be provided while the program runs.
- Kuester, J.L. and Mize, J.H. (1973) Optimization Techniques with Fortran, McGraw-Hill Book Co., New York.
- Mishra, S.K. (2007) "Performance of Differential Evolution Method in Least Squares Fitting of Some Typical Nonlinear Curves", Journal of Quantitative Economics, 5(1), pp. 140-177. Available at SSRN Download.
- Storn, R. and Price, K. (1995) "Differential Evolution - A Simple and Efficient Adaptive Scheme for Global Optimization over Continuous Spaces": Technical Report, International Computer Science Institute, Berkley.
Other Fortran Computer Programs |