## Nonlinear Parameter Estimation

I may not be an expert in machine learning, but as far as I know, the fitnet function in MATLAB implements function fitting using a neural network. Although the network in fitnet is not deep, it may indicate that a neural network is capable of something beyond classification into a finite number of classes. This is one of my puzzles about neural networks: I find only classification in frameworks like TensorFlow, but also come across fitting in more traditional tools like the fitnet function.

Does it become a common regression problem?

I don't know whether one would call your problem a regression problem, but in any case, it's not the kind of problem solved by deep learning. If you have follow-up questions, use 'Ask Question' rather than posing them in a comment. This is a question-and-answer site, so we stick to a narrow format: one question; one answer. If you'd like a solution to the problem, I suggest you follow the advice in the last sentence of my answer.

You are estimating parameters for a set of inputs mapping via f to a set of outputs. Clearly the objective you wish to minimise is something like an empirical risk.
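Written out, the objective described in the last comment takes roughly the following form. The symbols are assumptions introduced for illustration and do not appear in the thread: $\theta$ for the parameters of the fixed formula f, $(x_i, y_i)$ for the $n$ input/output pairs, and $\ell$ for a loss such as the squared error.

$$
\hat{\theta} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f(x_i; \theta),\, y_i\bigr),
\qquad \text{e.g. } \ell(\hat{y}, y) = (\hat{y} - y)^2
$$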


If f were not a fixed formula, then you could use a NN to estimate f. I do agree that, given f is fixed, NNs are not suitable. This case of optimisation equates to regression constrained to a specific reproducing kernel Hilbert space (RKHS). Here is a very detailed tutorial. I have not tried it yet.


I will come back with new results. However, it will not be deep learning or a neural network anymore. It just uses TensorFlow's power to facilitate the optimization. TensorFlow uses forward and backward propagation to solve the problem. I will try MXNet and other packages to see if the idea works as well.
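The idea in this answer, using gradient-based optimization to fit the parameters of a fixed formula, can be illustrated without any framework. The sketch below is a plain-Python stand-in for what an automatic-differentiation library would do, not the poster's code: the model `a * exp(b * x)`, the synthetic data, the learning rate, and the start values are all assumptions chosen for illustration, and the gradient is written out by hand.

```python
import math

# Hypothetical fixed-form model: f(x; a, b) = a * exp(b * x)
def model(x, a, b):
    return a * math.exp(b * x)

# Synthetic, noise-free data generated with a = 2, b = -1 (an assumption)
xs = [i / 10 for i in range(11)]
ys = [model(x, 2.0, -1.0) for x in xs]

def loss_and_grad(a, b):
    """Mean squared error and its gradient with respect to (a, b)."""
    n = len(xs)
    loss = grad_a = grad_b = 0.0
    for x, y in zip(xs, ys):
        e = math.exp(b * x)
        r = a * e - y                      # forward pass: residual
        loss += r * r / n
        grad_a += 2 * r * e / n            # backward pass: d(loss)/da
        grad_b += 2 * r * a * x * e / n    # backward pass: d(loss)/db
    return loss, grad_a, grad_b

def fit(a=1.0, b=0.0, lr=0.1, steps=5000):
    """Plain gradient descent from the given start values."""
    for _ in range(steps):
        _, grad_a, grad_b = loss_and_grad(a, b)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

a_hat, b_hat = fit()
print(a_hat, b_hat)
```

On this noise-free data the estimates should end up close to the generating values; with real data, the learning rate and start values usually need tuning.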

It is good that you are also interested in the question, but it seems that your post doesn't answer it. SGD is linear only, and you only guess that the single function could be fed to TensorFlow. Even if it supports non-linear minimization, the link connecting it to the problem is missing. If you want to get back after some research, but it will be neither deep learning nor a neural network, it will also miss the main point of the question as stated.

I am aware of this problem.

If you are not interested in how the minimization of the loss function is done, only that it can be done, you may skip the following paragraphs. However, you may find it useful to know a little about these procedures in case your regression model "refuses" to be fit to the data. In that case, the iterative estimation procedure will fail to converge, producing ever "stranger" parameter estimates. In the following paragraphs we will first discuss some general issues involved in unconstrained optimization, and then briefly review the methods used.

For more detailed discussions of these procedures you may refer to Brent, Gill and Murray, Peressini, Sullivan, and Uhl, and Wilde and Beightler. A common aspect of all estimation procedures is that they require the user to specify some start values, initial step sizes, and a criterion for convergence.

All methods will begin with a particular set of initial estimates (start values), which will be changed in some systematic manner from iteration to iteration; in the first iteration, the step size determines by how much the parameters will be moved. Finally, the convergence criterion determines when the iteration process will stop. For example, the process may stop when the improvements in the loss function from iteration to iteration are less than a specific amount.

### Penalty Functions, Constraining Parameters

These estimation procedures are unconstrained in nature.

As a result, a procedure will move parameters around without any regard for whether or not permissible values result. For example, in the course of logit regression we may get estimated values that are equal to 0, for which the logarithm cannot be computed. When this happens, the procedure will assign a penalty to the loss function, that is, a very large value. As a result, the various estimation procedures usually move away from the regions that produce those penalties.

However, in some circumstances, the estimation will "get stuck," and as a result, you would see a very large value of the loss function. This could happen if, for example, the regression equation involves taking the logarithm of an independent variable which has a value of zero for some cases (in which case the logarithm cannot be computed). If you want to constrain a procedure, then this constraint must be specified in the loss function as a penalty function (assessment).
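A penalty of this kind can be sketched in a few lines. The example below is illustrative only: the model `a * log(x + c)`, the data, and the constant `PENALTY` are assumptions, chosen so that one case has an independent variable equal to zero and the logarithm is undefined unless `c > 0`.

```python
import math

PENALTY = 1e10  # assumed "very large value" returned for impermissible parameters

# Hypothetical data; note that x = 0 occurs, so log(x + c) needs c > 0
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.7, 1.1, 1.4]

def penalized_loss(params):
    """Sum of squared errors for y = a * log(x + c), with a penalty for impermissible c."""
    a, c = params
    if any(x + c <= 0 for x in xs):
        return PENALTY  # logarithm undefined: steer the optimizer away from this region
    return sum((a * math.log(x + c) - y) ** 2 for x, y in zip(xs, ys))

print(penalized_loss((1.0, 1.0)))   # permissible parameters: an ordinary loss value
print(penalized_loss((1.0, -0.5)))  # impermissible parameters: the penalty value
```

Any unconstrained minimizer can then be applied to `penalized_loss` unchanged, since the constraint lives entirely inside the loss function.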

By doing this, you may control which permissible values of the parameters to be estimated may be manipulated.

### Local Minima

The most "treacherous" threat to unconstrained function minimization is local minima. For example, a particular loss function may become slightly larger, regardless of how a particular parameter is moved. However, if the parameter were to be moved into a completely different place, the loss function may actually become smaller.

You can think of such local minima as local "valleys" or minor "dents" in the loss function. However, in most practical applications, local minima will produce "outrageous" and extremely large or small parameter estimates with very large standard errors. In those cases, specify different start values and try again.
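The advice to try different start values can be sketched as follows. The one-parameter loss, its hand-written slope, and the three start values are assumptions made up for illustration; the loss has one shallow and one deep "valley", and a plain gradient descent ends in whichever one the start value leads to.

```python
def loss(p):
    # Hypothetical one-parameter loss with a shallow and a deep "valley"
    return p**4 - 3 * p**2 + p

def slope(p):
    # First-order derivative of the loss above
    return 4 * p**3 - 6 * p + 1

def descend(p, lr=0.01, steps=2000):
    """Plain gradient descent from start value p."""
    for _ in range(steps):
        p -= lr * slope(p)
    return p

starts = [-2.0, 0.5, 2.0]
results = [descend(s) for s in starts]
best = min(results, key=loss)  # keep the run with the smallest loss

for s, r in zip(starts, results):
    print(f"start {s:+.1f} -> p = {r:+.4f}, loss = {loss(r):+.4f}")
```

Two of the three runs settle in the shallow valley; only comparing the final loss values across runs reveals the better solution.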

Also note that the Simplex method (see below) is particularly "smart" in avoiding such minima; therefore, this method may be particularly suited to finding appropriate start values for complex functions.

### Quasi-Newton Method

As you may remember, the slope of a function at a particular point can be computed as the first-order derivative of the function at that point.

The "slope of the slope" is the second-order derivative, which tells us how fast the slope is changing at the respective point, and in which direction. The quasi-Newton method will, at each step, evaluate the function at different points in order to estimate the first-order derivatives and second-order derivatives. It will then use this information to follow a path towards the minimum of the loss function.

### Simplex Procedure

This algorithm does not rely on the computation or estimation of the derivatives of the loss function.


For example, in two dimensions (i.e., when there are two parameters to be estimated), the function is evaluated at three points. These three points would define a triangle; in more than two dimensions, the "figure" produced by these points is called a Simplex. Intuitively, in two dimensions, three points will allow us to determine "which way to go," that is, in which direction in the two-dimensional space to proceed in order to minimize the function.

The same principle can be applied to the multidimensional parameter space, that is, the Simplex will "move" downhill; when the current step sizes become too "crude" to detect a clear downhill direction (i.e., the Simplex is too large), the Simplex will contract and try again. An additional strength of this method is that when a minimum appears to have been found, the Simplex will again be expanded to a larger size to see whether the respective minimum is a local minimum.
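The downhill movement of the Simplex can be sketched with a minimal Nelder-Mead style search. This is a simplified illustration under stated assumptions, not the procedure of any particular package: the function name `simplex_minimize`, the coefficients (reflect by 1, expand by 2, contract and shrink by 0.5), and the test function are chosen here, and the re-expansion step used to check for local minima is left out.

```python
def simplex_minimize(f, start, step=1.0, tol=1e-12, max_iter=2000):
    """Minimal Nelder-Mead style search: reflect, expand, contract, shrink."""
    n = len(start)
    # Initial simplex: the start point plus one point offset along each axis
    simplex = [list(start)]
    for i in range(n):
        point = list(start)
        point[i] += step
        simplex.append(point)

    for _ in range(max_iter):
        simplex.sort(key=f)                      # best first, worst last
        best, worst = simplex[0], simplex[-1]
        if f(worst) - f(best) < tol:             # convergence criterion
            break
        # Centroid of all points except the worst one
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            # Clear downhill direction: try stepping further (expansion)
            expanded = [c + 2 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            # Step too "crude": contract the worst point toward the centroid
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink the whole simplex toward the best point
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)

# Hypothetical loss with its minimum at (1, -2)
def bowl(p):
    return (p[0] - 1) ** 2 + (p[1] + 2) ** 2

estimate = simplex_minimize(bowl, [0.0, 0.0])
print(estimate)
```

No derivatives are computed anywhere; the search relies only on comparing loss values at the corners of the Simplex.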

Thus, in a way, the Simplex moves like a smooth single-cell organism down the loss function, contracting and expanding as local minima or significant ridges are encountered.

### Hooke-Jeeves Pattern Moves

In a sense this is the simplest of all algorithms. At each iteration, this method first defines a pattern of points by moving each parameter one by one, so as to optimize the current loss function. The entire pattern of points is then shifted or moved to a new location; this new location is determined by extrapolating the line from the old base point in the m-dimensional parameter space to the new base point.

The step sizes in this process are constantly adjusted to "zero in" on the respective optimum.
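The pattern-move procedure described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular package: the function names, the initial step size, the halving rule, and the test loss are all assumptions.

```python
def explore(f, base, step):
    """Move each parameter one by one, keeping any move that lowers the loss."""
    point = list(base)
    for i in range(len(point)):
        for delta in (step, -step):
            trial = list(point)
            trial[i] += delta
            if f(trial) < f(point):
                point = trial
                break
    return point

def hooke_jeeves(f, start, step=0.5, shrink=0.5, tol=1e-8):
    """Exploratory moves followed by pattern moves; the step shrinks on failure."""
    base = list(start)
    while step > tol:
        new = explore(f, base, step)
        if f(new) >= f(base):
            step *= shrink          # no improvement: "zero in" with a smaller step
            continue
        while f(new) < f(base):
            # Pattern move: extrapolate the line from the old base point to the new one
            pattern = [2 * n - b for n, b in zip(new, base)]
            base = new
            new = explore(f, pattern, step)
    return base

# Hypothetical loss with its minimum at (1, -2)
def bowl(p):
    return (p[0] - 1) ** 2 + (p[1] + 2) ** 2

estimate = hooke_jeeves(bowl, [0.0, 0.0])
print(estimate)
```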

This method is usually quite effective, and should be tried if both the quasi-Newton and Simplex methods (see above) fail to produce reasonable estimates.

### Rosenbrock Pattern Search

Where all other methods fail, the Rosenbrock Pattern Search method often succeeds. This method will rotate the parameter space and align one axis with a ridge (this method is also called the method of rotating coordinates); all other axes will remain orthogonal to this axis. If the loss function is unimodal and has detectable ridges pointing towards the minimum of the function, then this method will proceed with sure-footed accuracy towards the minimum of the function.

However, note that this search algorithm may terminate early when there are several constraint boundaries (resulting in the penalty value; see above) that intersect, leading to a discontinuity in the ridges.

### Hessian Matrix and Standard Errors


The matrix of second-order partial derivatives is also called the Hessian matrix. Intuitively, there should be an inverse relationship between the second-order derivative for a parameter and its standard error: If the change of the slope around the minimum of the function is very sharp, then the second-order derivative will be large; however, the parameter estimate will be quite stable in the sense that the minimum with respect to the parameter is clearly identifiable.

If the second-order derivative is nearly zero, then the change in the slope around the minimum is nearly zero, meaning that we can practically move the parameter in any direction without greatly affecting the loss function. Thus, the standard error of the parameter will be very large. The Hessian matrix and asymptotic standard errors for the parameters can be computed via finite difference approximation. This procedure yields very precise asymptotic standard errors for all estimation methods.

After estimating the regression parameters, an essential aspect of the analysis is to test the appropriateness of the overall model.
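The finite-difference route to standard errors described above can be illustrated in the one-parameter case, where the Hessian reduces to a single second-order derivative. The sketch assumes that the loss is a negative log-likelihood, so that the asymptotic standard error is the square root of the inverse of that derivative; the data, the known standard deviation of 1, and the step size `h` are assumptions for illustration.

```python
import math

# Hypothetical data, assumed normal with known standard deviation 1
data = [4.1, 5.3, 4.7, 5.9]

def neg_log_likelihood(mu):
    """Loss function: negative log-likelihood of the mean, up to a constant."""
    return 0.5 * sum((x - mu) ** 2 for x in data)

def second_derivative(f, p, h=1e-4):
    """Central finite-difference approximation of the second-order derivative."""
    return (f(p + h) - 2 * f(p) + f(p - h)) / h**2

mu_hat = sum(data) / len(data)            # minimizer of this particular loss
curvature = second_derivative(neg_log_likelihood, mu_hat)
std_error = 1 / math.sqrt(curvature)      # large curvature -> small standard error
print(mu_hat, curvature, std_error)
```

With several parameters, the same idea applies to the full matrix of second-order partial derivatives, with the standard errors taken from the diagonal of its inverse.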

### Proportion of Variance Explained

Even when the dependent variable is not normally distributed across cases, this measure may help evaluate how well the model fits the data.

### Goodness-of-fit Chi-square

For probit and logit regression models, you may use maximum likelihood estimation (i.e., maximize the likelihood function). The fit of the estimated model can then be compared with that of the null model by means of a Chi-square statistic. The degrees of freedom for this Chi-square value are equal to the difference in the number of parameters for the null and the fitted model; thus, the degrees of freedom will be equal to the number of independent variables in the logit or probit regression.
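Both measures can be computed in a few lines. The sketch assumes the usual definitions, which the text above does not spell out: the proportion of variance explained as one minus the ratio of residual to total sum of squares, and the Chi-square as the standard likelihood-ratio statistic comparing the null and fitted models. All numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Proportion of variance explained: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_resid / ss_total

def likelihood_ratio_chi_square(log_lik_null, log_lik_fitted):
    """Chi-square = -2 * (log L0 - log L1), comparing null and fitted models."""
    return -2 * (log_lik_null - log_lik_fitted)

# Hypothetical numbers, for illustration only
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
chi2 = likelihood_ratio_chi_square(-60.0, -52.5)
print(r2, chi2)
```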

If the p-value associated with this Chi-square is significant, then we can say that the estimated model yields a significantly better fit to the data than the null model, that is, that the regression parameters are statistically significant.

### Plot of Observed vs. Predicted Values

It is always a good idea to inspect a scatterplot of predicted vs. observed values. If the model is appropriate for the data, then we would expect the points to roughly follow a straight line; if the model is incorrectly specified, then this plot will indicate a non-linear pattern.

### Normal and Half-Normal Probability Plots

The normal probability plot of residuals will give us an indication of whether or not the residuals (i.e., errors) are normally distributed.

### Plot of the Fitted Function

For models involving two or three variables (one or two predictors) it is useful to plot the fitted function using the final parameter estimates.

[Figure: example of a 3D plot with two predictor variables]

This type of plot represents the most direct visual check of whether or not a model fits the data, and whether there are apparent outliers.

When a model is grossly misspecified, or the estimation procedure gets "hung up" in a local minimum, the standard errors for the parameter estimates can become very large. This means that regardless of how the parameters were moved around the final values, the resulting loss function did not change much.


Also, the correlations between parameters may become very large, indicating that the parameters are highly redundant; put another way, when the estimation algorithm moved one parameter away from the final value, the increase in the loss function could be almost entirely compensated for by moving another parameter. Thus, the effect of those two parameters on the loss function was highly redundant.

### Estimating Linear and Nonlinear Models

Technically speaking, Nonlinear Estimation is a general fitting procedure that will estimate any kind of relationship between a dependent (or response) variable and a list of independent variables.

### Intrinsically Nonlinear Regression Models

Some regression models that cannot be transformed into linear ones can only be estimated via Nonlinear Estimation. However, if we set up the standard linear regression equation based on the underlying "feeling" or attitude we could write: feeling ...