# Exploitation of second-order information in model estimation

With the advances in big data analysis, model estimation has become an increasingly important issue. Two main estimation techniques are prevalent: least-squares minimization and likelihood maximization. These problems are typically solved with iterative optimization techniques but, given a model and an initial point, the time required to obtain satisfactory estimates can be prohibitive due to the large number of observations, especially when non-linearities and/or heterogeneity have to be taken into account. Various strategies have been proposed to speed up the estimation, especially when the models are twice continuously differentiable. A first possibility is to increase the number of observations considered in the optimization only when close to the solution, as the solutions produced during the first iterations will be discarded anyway. The question remains how to grow the sample during the estimation process, and how to select the observations to add. A second possibility is to exploit the mathematical structure of the problem, especially when using optimization techniques that rely on quadratic models.

In both cases, if the model is correctly specified, the Hessian of the function to optimize can be related to the outer product of the scores, that is, the individual gradient contributions to the average gradient of the objective function. However, model misspecification can lead to non-convergent sequences of candidate solutions, as the constructed quadratic models no longer properly approximate the objective function close to the true parameter values. To circumvent this limitation, Hessian correction techniques based on the secant equation have been proposed, leading to secant-corrected versions of the Gauss-Newton algorithm for non-linear least-squares problems, and to similar techniques for maximum likelihood estimation.
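As a minimal illustration of how the structure of least-squares problems is exploited, the sketch below implements a plain Gauss-Newton iteration, where the Hessian of the objective is approximated by the Jacobian cross-product J^T J (the analogue of the outer product of the scores), dropping the second-order residual term. The function names and the exponential-decay toy problem are purely illustrative, not part of the project.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, tol=1e-10, max_iter=50):
    """Minimize 0.5 * ||r(x)||^2 using the Gauss-Newton approximation
    J(x)^T J(x) of the Hessian (second-order residual terms are dropped)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = residual(x)
        J = jacobian(x)
        g = J.T @ r                          # gradient of the objective
        if np.linalg.norm(g) < tol:
            break
        x = x + np.linalg.solve(J.T @ J, -g) # Gauss-Newton step
    return x

# Hypothetical toy problem: fit y = exp(-b * t) with true b = 1.5.
t = np.linspace(0.0, 2.0, 20)
y = np.exp(-1.5 * t)
residual = lambda b: np.exp(-b[0] * t) - y
jacobian = lambda b: (-t * np.exp(-b[0] * t)).reshape(-1, 1)
b_hat = gauss_newton(residual, jacobian, [0.5])
```

On this zero-residual problem the quadratic model built from J^T J is accurate near the solution, which is exactly the situation that misspecification (non-zero residuals at the optimum) degrades.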
In the latter case, we have nevertheless observed that for complex models, corrections based on positive-definite Hessian approximations, such as BFGS, can perform quite poorly, while other techniques, e.g. SR1, can perform much better. Our current intuition is that, since the outer product of the scores is already positive definite, constraining the eigenvalues of the correction matrix to be positive can prevent an adequate correction. We expect a similar effect to be prevalent in least-squares estimation. The goal of the project is to validate this explanation and to provide better guidance on Hessian correction when estimating models by least-squares minimization or maximum likelihood estimation, using standard non-linear optimization algorithms, namely line-search methods or trust-region algorithms. Applications to variants, such as retrospective techniques, will also be considered, as will the impact on adaptive sampling strategies, which vary the number of observations used in the objective function at each iteration.
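The contrast between the two correction families can be sketched with a single secant update. Assuming a hypothetical indefinite true Hessian, the example below shows that both updates satisfy the secant equation B+ s = y, but BFGS remains positive definite (and so cannot represent the negative curvature), while SR1 produces an indefinite approximation; this is the behaviour behind the intuition above.

```python
import numpy as np

def bfgs_update(B, s, y):
    """BFGS secant update; keeps B positive definite whenever s^T y > 0."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)

def sr1_update(B, s, y):
    """Symmetric rank-one update; satisfies the secant equation
    but may yield an indefinite matrix."""
    v = y - B @ s
    return B + np.outer(v, v) / (v @ s)

# Hypothetical indefinite true Hessian: negative curvature on the 2nd axis.
H = np.diag([2.0, -1.0])
B = np.eye(2)                  # initial positive-definite approximation
s = np.array([1.0, 1.0])       # a step
y = H @ s                      # exact curvature information along s

B_sr1 = sr1_update(B, s, y)    # becomes indefinite, matching H's signature
B_bfgs = bfgs_update(B, s, y)  # stays positive definite by construction
```

Both updated matrices map s to y exactly, yet only the SR1 approximation has a negative eigenvalue; forcing positivity, as BFGS does, discards curvature information rather than correcting it.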

Faculty Supervisor:

Fabian Bastin

Student:

MANYUAN TAO

Discipline:

Computer science
