
Conversation

@devmotion
Member

Since the changes of the non-linear solvers are going to be breaking anyway, I took this to the extreme and changed the structure of the non-linear solvers even more fundamentally than in #315.

Basically, with this PR we get a unified implementation of nlsolve! for general NLSolver structs, leading to identical convergence and divergence checks for different algorithms. New algorithms only have to implement a perform_step!(nlsolver, integrator, iter) function that computes the next iterate nlsolver.z based on the previous value nlsolver.zprev. Implementations can be optimized, e.g., by caching residuals and implementing norm_of_residuals(nlsolver, integrator) as well, which by default just calls calculate_residuals with nlsolver.zprev and nlsolver.z and the tolerances and norm of integrator. Moreover, further customization is possible by implementing preamble!(nlsolver, integrator) (which can, e.g., also be used for caching precalculated values) and loopheader!(nlsolver, integrator, iter) (which by default only updates nlsolver.zprev with nlsolver.z but can, e.g., be used to perform Anderson acceleration).
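A hedged sketch of what such a unified driver could look like. The hook names (preamble!, loopheader!, perform_step!, norm_of_residuals) are taken from the PR text; the struct layout, the scalar fixed-point example, and the concrete stopping logic are illustrative only.

```julia
# Minimal illustration of the generic nlsolve! loop described above.
# ToyNLSolver and its fields are a stand-in for the real NLSolver struct.
mutable struct ToyNLSolver{T}
    z::T
    zprev::T
    max_iter::Int
    nl_iters::Int
    status::Symbol      # :SlowConvergence, :Convergence, or :MaxItersReached
    g                   # fixed-point map z -> G(z); stands in for the ODE residual
    abstol::Float64
end

preamble!(nlsolver, integrator) = (nlsolver.nl_iters = 0; nothing)
# default loop header: only promote the current iterate to the previous one
loopheader!(nlsolver, integrator, iter) = (nlsolver.zprev = nlsolver.z; nothing)
# the only method a new algorithm has to provide: compute z from zprev
perform_step!(nlsolver::ToyNLSolver, integrator, iter) =
    (nlsolver.z = nlsolver.g(nlsolver.zprev); nothing)
# default residual norm; optimized algorithms may cache residuals instead
norm_of_residuals(nlsolver, integrator) = abs(nlsolver.z - nlsolver.zprev)

function nlsolve!(nlsolver, integrator)
    preamble!(nlsolver, integrator)
    for iter in 1:nlsolver.max_iter
        loopheader!(nlsolver, integrator, iter)
        perform_step!(nlsolver, integrator, iter)
        nlsolver.nl_iters = iter
        if norm_of_residuals(nlsolver, integrator) < nlsolver.abstol
            nlsolver.status = :Convergence
            return nlsolver.z
        end
    end
    nlsolver.status = :MaxItersReached
    return nlsolver.z
end

# usage: functional iteration for the fixed point of z = cos(z)
solver = ToyNLSolver(1.0, 1.0, 100, 0, :SlowConvergence, cos, 1e-10)
nlsolve!(solver, nothing)
```

A real implementation would also run divergence checks (via kappa, eta_old, etc.) inside the loop; they are omitted here for brevity.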

Since the nlsolve! interface is so generic, I moved most fields (such as dz and k) to the cache and only kept the fields in NLSolver that are part of the problem (tmp, gamma, and c), the convergence/divergence checks (kappa, eta_old, fast_convergence_cutoff), or the general algorithm logic (z, zprev, max_iter, nl_iters, status) and cache and alg. This choice is of course debatable and I'm not sure if it's the best one. I think, one could argue that fields that are probably used by most algorithms (such as dz or k, maybe?) should be added to the NLSolver struct; however, k is not needed for the out-of-place methods and in some methods one does not compute dz explicitly while computing the next iterate, so there might be cases in which one would not need these fields. On the other hand, one could also argue that p and dt are part of the mathematical problem formulation dt⋅f(tmp + γ⋅z, p, t + c⋅dt) - z = 0, and hence should be part of the NLSolver struct as well. Maybe one would even want a solver that is almost independent from the integrator, such that only values such as tmp, t or other values in the cache are updated from the integrator at the start of nlsolve! and then the algorithm runs (almost) independently and just returns a solution z at the end.
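The residual form of the problem discussed above can be written out as a plain function; f, tmp, γ, c, p, dt, and t are the quantities from the PR description, while the helper name and the example values are ours.

```julia
# dt⋅f(tmp + γ⋅z, p, t + c⋅dt) - z = 0, as a residual in z
residual(z, f, tmp, γ, c, p, dt, t) = dt * f(tmp + γ * z, p, t + c * dt) - z

# usage: for the linear test equation f(u, p, t) = p * u the root is available
# in closed form, z = dt * p * tmp / (1 - dt * p * γ)
f(u, p, t) = p * u
p, dt, t, tmp, γ, c = -1.0, 0.1, 0.0, 1.0, 1.0, 1.0
z = dt * p * tmp / (1 - dt * p * γ)
residual(z, f, tmp, γ, c, p, dt, t)  # ≈ 0
```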

In addition to the generic nlsolve! this PR contains the following changes (which are also debatable):

  • in the implementation of the functional iterations u is never touched, in contrast to the current implementation which actually changes integrator.u in https://github.com/JuliaDiffEq/DiffEqBase.jl/blob/master/src/nlsolve/functional.jl#L205 (but only for the in-place version, so there exists an inconsistency between both versions at the moment)
  • the normalized difference z - zprev is computed based on the maximum values of z and zprev (at the moment, in lines such as https://github.com/JuliaDiffEq/DiffEqBase.jl/blob/master/src/nlsolve/functional.jl#L221 it's based on u (which changes in every iteration only in the in-place variant) and uprev)
  • the norm of these differences, which is used for divergence and convergence checks, is not updated anymore after performing Anderson acceleration (see Update of residuals in Anderson acceleration #310), and hence now they are always the norm of the normalized differences of G(z_k) - z_k instead of z_{k+1} - z_k
  • iipnlsolve and oopnlsolve are replaced with build_nlsolver which has a signature very similar to alg_cache
  • for consistency with build_jac_config and build_nlsolver (and build_solution), get_iip_uf and get_oop_uf are replaced with build_uf
  • instead of passing J and W to build_nlsolver, the implementation of build_nlsolver calls build_J_W for NLNewton
  • all algorithm-specific values (such as droptol, fast_convergence_cutoff, etc.) whose types are not fixed (such as aa_start) are converted to the appropriate unitless types of u or t; they are added to the NLSolver struct if they are used for the convergence/divergence checks, and to the caches otherwise
  • the default for droptol is changed to 1e10 (according to Walker's implementation of Anderson acceleration, see the latest changes in NLsolve)
  • support for resizing requires implementing Base.resize!(nlcache, i), to which the default fallback of Base.resize!(nlcache, nlsolver, integrator, i) delegates; the latter can be implemented instead if the information in the cache alone is not sufficient
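The resizing hooks in the last bullet could look as follows; MyNLCache is a hypothetical algorithm cache, and only the two method signatures follow the PR.

```julia
# A hypothetical cache holding one resizable work vector.
struct MyNLCache{V}
    dz::V
end

# cache-only method: resize the cache's own vectors
Base.resize!(nlcache::MyNLCache, i::Integer) = resize!(nlcache.dz, i)

# the longer form delegates to the cache-only method by default; an algorithm
# overrides this form instead if the cache alone is not sufficient
Base.resize!(nlcache::MyNLCache, nlsolver, integrator, i::Integer) =
    resize!(nlcache, i)

cache = MyNLCache(zeros(3))
resize!(cache, nothing, nothing, 5)
length(cache.dz)  # 5
```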

@devmotion
Member Author

I forgot to mention that I added an additional status MaxItersReached as well.

In the latest commit I moved the implementation of nlsolve! to different functions, such that a similar logic can be used for general AbstractNLSolver objects. Moreover, I added du_cache since it seems that in OrdinaryDiffEq some algorithms save allocations by aliasing nlsolver.k to fsallast.

@ChrisRackauckas
Member

This needed to happen.

in the implementation of the functional iterations u is never touched, in contrast to the current implementation which actually changes integrator.u in https://github.com/JuliaDiffEq/DiffEqBase.jl/blob/master/src/nlsolve/functional.jl#L205 (but only for the in-place version, so there exists an inconsistency between both versions at the moment)

Good

the normalized difference z - zprev is computed based on the maximum values of z and zprev (at the moment, in lines such as https://github.com/JuliaDiffEq/DiffEqBase.jl/blob/master/src/nlsolve/functional.jl#L221 it's based on u (which changes in every iteration only in the in-place variant) and uprev)

Yes, I think in one of the issues I mentioned we should do this.

the norm of these differences, which is used for divergence and convergence checks, is not updated anymore after performing Anderson acceleration (see #310), and hence now they are always the norm of the normalized differences of G(z_k) - z_k instead of z_{k+1} - z_k

I like that

iipnlsolve and oopnlsolve are replaced with build_nlsolver which has a signature very similar to alg_cache

Makes sense.

instead of passing J and W to build_nlsolver, the implementation of build_nlsolver calls build_J_W for NLNewton

So then we should just do a semi-implicit nlsolver as well?

all algorithm-specific values (such as droptol, fast_convergence_cutoff, etc.) whose types are not fixed (such as aa_start) are converted to the appropriate unitless types of u or t; they are added to the NLSolver struct if they are used for the convergence/divergence checks, and to the caches otherwise

Interesting.

@ChrisRackauckas
Member

Since the nlsolve! interface is so generic, I moved most fields (such as dz and k) to the cache and only kept the fields in NLSolver that are part of the problem (tmp, gamma, and c), the convergence/divergence checks (kappa, eta_old, fast_convergence_cutoff), or the general algorithm logic (z, zprev, max_iter, nl_iters, status) and cache and alg. This choice is of course debatable and I'm not sure if it's the best one. I think, one could argue that fields that are probably used by most algorithms (such as dz or k, maybe?) should be added to the NLSolver struct; however, k is not needed for the out-of-place methods and in some methods one does not compute dz explicitly while computing the next iterate, so there might be cases in which one would not need these fields.

I think that's the way to go. Commit to it being generic and make specific caches add the things they need. My goal here is to get a documentable interface for adding new nlsolvers.

On the other hand, one could also argue that p and dt are part of the mathematical problem formulation dt⋅f(tmp + γ⋅z, p, t + c⋅dt) - z = 0, and hence should be part of the NLSolver struct as well. Maybe one would even want a solver that is almost independent from the integrator, such that only values such as tmp, t or other values in the cache are updated from the integrator at the start of nlsolve! and then the algorithm runs (almost) independently and just returns a solution z at the end.

This one I'll have to think about. For now, having the integrator isn't bad, so that's fine. Eventually making it integrator free could make it useful outside of diffeq? Maybe.

@ChrisRackauckas
Member

Let's get other eyeballs on this. How does it interact with DAEs? How does it interact with LHL? Are there any more changes people can think of? I think it looks good, but if there's a use case that needs something more than this then we should plan for it since at this point we'd know what those uses are.

@kanav99
Contributor

kanav99 commented Aug 20, 2019

I just checked this out and it seems that DAEs can be handled better in this approach.
For DAEs, I need a general purpose nlsolver. Basically an nlsolver solving F(tmp + γ⋅z, z, p, t) = 0 where tmp + γ⋅z is the estimate of the derivative we see in BDFs, which was very incompatible with the previous nlsolver (I could have tried hacks like setting dt = 1 and then calling the nlsolve routine, but that's dirty).
I would only need to make a new algorithm for DAEs. All the functions except perform_step! do not call f, so that's a huge help. Fetching less and less from the integrator is great. For my issue with user-defined nlsolver algorithms, this again is helpful.
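The DAE residual described above, sketched as a plain function; F is an implicit DAE function F(du, u, p, t) and tmp + γ⋅z is the derivative estimate from the BDF discretization. The helper name and the example values are illustrative.

```julia
# F(tmp + γ⋅z, z, p, t) = 0, as a residual in z
dae_residual(z, F, tmp, γ, p, t) = F(tmp + γ * z, z, p, t)

# usage: for the linear residual F(du, u, p, t) = du + u the root is
# z = -tmp / (γ + 1)
F(du, u, p, t) = du + u
tmp, γ = 1.0, 0.5
z = -tmp / (γ + 1)
dae_residual(z, F, tmp, γ, nothing, 0.0)  # ≈ 0
```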

Now regarding the conflicting PRs all around, I have a way -

  1. Get the resize! changes from Restructure non-linear solvers #315 and make a separate PR for that only (non-breaking), and release DiffEqBase.
  2. Merge [Experimental] Uses PseudoNLSolver OrdinaryDiffEq.jl#878 as a semi-implicit structure is required as pointed out by Chris.
  3. Merge [WIP] Simplify derivative_utils.jl OrdinaryDiffEq.jl#876 (need to solve some conflicts), tag and release OrdinaryDiffEq.
  4. Then test this PR.

A getproperty hack would be useful for now -

function Base.getproperty(nls::NLSolver, s::Symbol)
  if s === :uf
    return nls.cache.uf
  else
    return getfield(nls, s)
  end
end

Then the next milestone would be DAEs. I get a good feeling about this 😄

@ChrisRackauckas
Member

I still don't see why DAEs cannot use the same nlsolver if we try hard enough. It's exactly the same problem. In ODEs, you move everything to one side and you get Mu' - f - tmp = 0 to solve, and with DAEs it's f - tmp. But in the linearized form it's Mu' - Ju, while for DAEs it's J'u' - Ju. So literally the only difference is that you need to call f slightly differently, and the W matrix has to use the Jacobian of du instead of the mass matrix.

@kanav99
Contributor

kanav99 commented Aug 20, 2019

I still don't see why DAEs cannot use the same nlsolver if we try hard enough

Yes we can, and if this PR takes more time than we expect, I can make the changes by then. I have some of the changes already done, I will complete that and show that in a PR.

@kanav99 kanav99 mentioned this pull request Aug 20, 2019
@devmotion
Member Author

Now regarding the conflicting PRs all around, I have a way -

I think we should approach these PRs in a different order. IMO SciML/OrdinaryDiffEq.jl#876 should be merged first, since tests pass and it does not require any new DiffEqBase release, so it's safe to do.

I'm still confused about some aspects of the PseudoNLsolver changes, and I think we should prioritize the general NLsolver changes in DiffEqBase. IMO, the most important thing is to change the DiffEqBase version bounds in OrdinaryDiffEq and StochasticDiffEq to the standard way of using Caret specifiers such that we can release breaking changes of NLsolver without breaking these packages.

Then we should update the existing drafts for fixing OrdinaryDiffEq and StochasticDiffEq, which should only require replacing some field names and accesses. IMO we should try to keep any algorithm changes or improvements in separate PRs and just make sure that we have some version ready that plays nicely with the DiffEqBase changes. I quickly did this partly to be able to run the mass matrix tests in OrdinaryDiffEq. I think other features should only be considered if we think they might require additions/changes in the DiffEqBase PR; then we could change that PR. If we agree on it, we release it with a version number that is not supported by OrdinaryDiffEq and StochasticDiffEq. Then we merge the fixes in OrdinaryDiffEq and StochasticDiffEq, and release a version that is compatible with the latest DiffEqBase version.

And then we add new features and improvements in the downstream packages.

@devmotion
Member Author

I've merged SciML/OrdinaryDiffEq.jl#876 and started to fix the upper bounds of the dependencies of OrdinaryDiffEq and StochasticDiffEq. IMO, otherwise the main thing left to do is to check if the current design of this PR is sufficient for DAEs as well.

I still don't see why DAEs cannot use the same nlsolver if we try hard enough. It's exactly the same problem. In ODEs, you move everything to one side and you get Mu' - f - tmp = 0 to solve, and with DAEs it's f - tmp. But in the linearized form it's Mu' - Ju, while for DAEs it's J'u' - Ju. So literally the only difference is that you need to call f slightly differently, and the W matrix has to use the Jacobian of du instead of the mass matrix.

I've never worked with DAEs, so I might be wrong, but wouldn't just dispatching on integrator in perform_step!(nlsolver, integrator) be sufficient? The signature of the current implementations could (should?) be changed to perform_step!(nlsolver::NLSolver, integrator::DEIntegrator{<:AbstractODEAlgorithm}), and then we just add a different perform_step!(nlsolver::NLSolver, integrator::DEIntegrator{<:AbstractDAEAlgorithm}).
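The dispatch suggested here could be sketched as follows; the abstract types and the integrator struct below are simplified stand-ins for DiffEqBase's actual hierarchy, and the return values are placeholders.

```julia
# Stand-ins for DiffEqBase's algorithm hierarchy and integrator type.
abstract type AbstractODEAlgorithm end
abstract type AbstractDAEAlgorithm end

struct DEIntegrator{Alg}
    alg::Alg
end

# ODE methods keep the existing implementation
perform_step!(nlsolver, integrator::DEIntegrator{<:AbstractODEAlgorithm}) = :ode_step
# DAEs get their own method without touching the rest of nlsolve!
perform_step!(nlsolver, integrator::DEIntegrator{<:AbstractDAEAlgorithm}) = :dae_step

struct ToyODEAlg <: AbstractODEAlgorithm end
struct ToyDAEAlg <: AbstractDAEAlgorithm end
perform_step!(nothing, DEIntegrator(ToyODEAlg()))  # :ode_step
```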

Thinking a bit more about this, it seems slightly weird to have such ODE-specific algorithms in DiffEqBase. Maybe the actual implementations of the algorithms (or maybe rather the perform_step! and (maybe) build_nlsolver implementations) should be moved to OrdinaryDiffEq? I guess the reason to have them in OrdinaryDiffEq is that the same algorithms are used in StochasticDiffEq?

@ChrisRackauckas
Member

Thinking a bit more about this, it seems slightly weird to have such ODE-specific algorithms in DiffEqBase. Maybe the actual implementations of the algorithms (or maybe rather the perform_step! and (maybe) build_nlsolver implementations) should be moved to OrdinaryDiffEq? I guess the reason to have them in OrdinaryDiffEq is that the same algorithms are used in StochasticDiffEq?

Yes they are used in StochasticDiffEq which is why they were moved here. We could have a StarDiffEq.jl for high level *DiffEq handling, but that hasn't seemed necessary yet.

I've never worked with DAEs, so I might be wrong, but wouldn't just dispatching on integrator in perform_step!(nlsolver, integrator) be sufficient? The signature of the current implementations could (should?) be changed to perform_step!(nlsolver::NLSolver, integrator::DEIntegrator{<:AbstractODEAlgorithm}), and then we just add a different perform_step!(nlsolver::NLSolver, integrator::DEIntegrator{<:AbstractDAEAlgorithm}).

We could do that, but I'm not convinced we even have to. It should be even simpler.

@ChrisRackauckas
Member

Yes, looking at the algorithms, I don't think we need to have any new perform steps for DAEs. It should work by just changing uf and the J_W building. So I think this is good to go.

@ChrisRackauckas
Member

For a quick reference, take a look at DASSL: https://www.osti.gov/servlets/purl/5882821 .

We might need to add a hook for updating uf, where a multistep method would define the du part directly with a backwards difference and then compute the residual f(resid,du,u,p,t). And then W is just defined directly as df/ddu + gamma df/du, and it should work.
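For a linear residual the W definition above reduces to matrix arithmetic; this is a hedged sketch where A, B, and γ are made-up example values for F(du, u) = A⋅du + B⋅u, whose Jacobians are simply A and B.

```julia
# W = ∂F/∂du + γ⋅∂F/∂u for a linear DAE residual F(du, u) = A*du + B*u
A = [1.0 0.0; 0.0 0.0]   # ∂F/∂du, singular as is typical for a DAE
B = [2.0 1.0; 1.0 3.0]   # ∂F/∂u
γ = 0.1
W = A + γ * B            # the Newton matrix; nonsingular even though A is not
```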

# check convergence and divergence criteria
check_status!(nlsolver, integrator)
end

Contributor

@kanav99 kanav99 Sep 2, 2019


You forgot to update z in the last step.

function nlsolve!(nlsolver::NLSolver{algType, iip}, integrator) where {algType, iip}
  preamble!(nlsolver, integrator)

  while get_status(nlsolver) === SlowConvergence
    # (possibly modify and) accept step
    apply_step!(nlsolver, integrator)

    # compute next iterate
    perform_step!(nlsolver, integrator)
  
    # check convergence and divergence criteria
    check_status!(nlsolver, integrator)
  end

  if iip
    recursivecopy!(nlsolver.z, nlsolver.gz)
  else
    nlsolver.z = nlsolver.gz
  end

  postamble!(nlsolver, integrator)
end

This one should work.

Member Author


Thanks, I had noticed this issue as well some days ago. Unfortunately, there are still some issues. However, I've come to believe that it's wrong to push such a huge update to DiffEqBase directly. IMO we should rather update OrdinaryDiffEq to the new algorithm design without touching DiffEqBase, and then move these changes upstream. That way it's also easier to integrate DAE support in the redesign. As I mentioned in the PR and on Slack, I'm currently preparing such a branch of OrdinaryDiffEq that also adds the changes step by step, to separate the different parts more clearly.

I hope to be done with it by the end of the week, but since I'm currently on vacation and travelling around, it might take some additional days...

@devmotion devmotion closed this Sep 7, 2019
