
Conversation

@devmotion
Member

Since the changes of the non-linear solvers are going to be breaking anyway, I took this to the extreme and changed the structure of the non-linear solvers even more fundamentally than in #315.

Basically, with this PR we get a unified implementation of nlsolve! for general NLSolver structs, leading to identical convergence and divergence checks for different algorithms. New algorithms only have to implement a perform_step!(nlsolver, integrator, iter) function that computes the next iterate nlsolver.z based on the previous value nlsolver.zprev. Implementations can be optimized, e.g., by caching residuals and implementing norm_of_residuals(nlsolver, integrator) as well, which by default just calls calculate_residuals with nlsolver.zprev and nlsolver.z and the tolerances and norm of integrator. Moreover, further customization is possible by implementing preamble!(nlsolver, integrator) (which can, e.g., also be used for caching precalculated values) and loopheader!(nlsolver, integrator, iter) (which by default only updates nlsolver.zprev with nlsolver.z but can, e.g., be used to perform Anderson acceleration).
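A hedged sketch of what such a unified driver could look like. The hook names (preamble!, loopheader!, perform_step!, norm_of_residuals) are taken from the PR text; the struct layout, the scalar fixed-point example, and the concrete stopping logic are illustrative only.

```julia
# Minimal illustration of the generic nlsolve! loop described above.
# ToyNLSolver and its fields are a stand-in for the real NLSolver struct.
mutable struct ToyNLSolver{T}
    z::T
    zprev::T
    max_iter::Int
    nl_iters::Int
    status::Symbol      # :SlowConvergence, :Convergence, or :MaxItersReached
    g                   # fixed-point map z -> G(z); stands in for the ODE residual
    abstol::Float64
end

preamble!(nlsolver, integrator) = (nlsolver.nl_iters = 0; nothing)
# default loop header: only promote the current iterate to the previous one
loopheader!(nlsolver, integrator, iter) = (nlsolver.zprev = nlsolver.z; nothing)
# the only method a new algorithm has to provide: compute z from zprev
perform_step!(nlsolver::ToyNLSolver, integrator, iter) =
    (nlsolver.z = nlsolver.g(nlsolver.zprev); nothing)
# default residual norm; optimized algorithms may cache residuals instead
norm_of_residuals(nlsolver, integrator) = abs(nlsolver.z - nlsolver.zprev)

function nlsolve!(nlsolver, integrator)
    preamble!(nlsolver, integrator)
    for iter in 1:nlsolver.max_iter
        loopheader!(nlsolver, integrator, iter)
        perform_step!(nlsolver, integrator, iter)
        nlsolver.nl_iters = iter
        if norm_of_residuals(nlsolver, integrator) < nlsolver.abstol
            nlsolver.status = :Convergence
            return nlsolver.z
        end
    end
    nlsolver.status = :MaxItersReached
    return nlsolver.z
end

# usage: functional iteration for the fixed point of z = cos(z)
solver = ToyNLSolver(1.0, 1.0, 100, 0, :SlowConvergence, cos, 1e-10)
nlsolve!(solver, nothing)
```

A real implementation would also run divergence checks (via kappa, eta_old, etc.) inside the loop; they are omitted here for brevity.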

Since the nlsolve! interface is so generic, I moved most fields (such as dz and k) to the cache and only kept the fields in NLSolver that are part of the problem (tmp, gamma, and c), the convergence/divergence checks (kappa, eta_old, fast_convergence_cutoff), or the general algorithm logic (z, zprev, max_iter, nl_iters, status) and cache and alg. This choice is of course debatable and I'm not sure if it's the best one. I think, one could argue that fields that are probably used by most algorithms (such as dz or k, maybe?) should be added to the NLSolver struct; however, k is not needed for the out-of-place methods and in some methods one does not compute dz explicitly while computing the next iterate, so there might be cases in which one would not need these fields. On the other hand, one could also argue that p and dt are part of the mathematical problem formulation dt⋅f(tmp + γ⋅z, p, t + c⋅dt) - z = 0, and hence should be part of the NLSolver struct as well. Maybe one would even want a solver that is almost independent from the integrator, such that only values such as tmp, t or other values in the cache are updated from the integrator at the start of nlsolve! and then the algorithm runs (almost) independently and just returns a solution z at the end.
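The residual form of the problem discussed above can be written out as a plain function; f, tmp, γ, c, p, dt, and t are the quantities from the PR description, while the helper name and the example values are ours.

```julia
# dt⋅f(tmp + γ⋅z, p, t + c⋅dt) - z = 0, as a residual in z
residual(z, f, tmp, γ, c, p, dt, t) = dt * f(tmp + γ * z, p, t + c * dt) - z

# usage: for the linear test equation f(u, p, t) = p * u the root is available
# in closed form, z = dt * p * tmp / (1 - dt * p * γ)
f(u, p, t) = p * u
p, dt, t, tmp, γ, c = -1.0, 0.1, 0.0, 1.0, 1.0, 1.0
z = dt * p * tmp / (1 - dt * p * γ)
residual(z, f, tmp, γ, c, p, dt, t)  # ≈ 0
```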

In addition to the generic nlsolve! this PR contains the following changes (which are also debatable):

  • in the implementation of the functional iterations u is never touched, in contrast to the current implementation which actually changes integrator.u in https://github.com/JuliaDiffEq/DiffEqBase.jl/blob/master/src/nlsolve/functional.jl#L205 (but only for the in-place version, so there exists an inconsistency between both versions at the moment)
  • the normalized difference z - zprev is computed based on the maximum values of z and zprev (at the moment, in lines such as https://github.com/JuliaDiffEq/DiffEqBase.jl/blob/master/src/nlsolve/functional.jl#L221 it's based on u (which changes in every iteration only in the in-place variant) and uprev)
  • the norm of these differences, which is used for divergence and convergence checks, is not updated anymore after performing Anderson acceleration (see Update of residuals in Anderson acceleration #310), and hence now they are always the norm of the normalized differences of G(z_k) - z_k instead of z_{k+1} - z_k
  • iipnlsolve and oopnlsolve are replaced with build_nlsolver which has a signature very similar to alg_cache
  • for consistency with build_jac_config and build_nlsolver (and build_solution), get_iip_uf and get_oop_uf are replaced with build_uf
  • instead of passing J and W to build_nlsolver, the implementation of build_nlsolver calls build_J_W for NLNewton
  • all algorithm-specific values (such as droptol, fast_convergence_cutoff, etc.) whose types are not fixed (such as aa_start) are converted to the appropriate unitless types of u or t; they are added to the NLSolver struct if they are used for the convergence/divergence checks, and to the caches otherwise
  • the default for droptol is changed to 1e10 (according to Walker's implementation of Anderson acceleration, see the latest changes in NLsolve)
  • support for resizing requires implementing Base.resize!(nlcache, i), to which the default fallback of Base.resize!(nlcache, nlsolver, integrator, i) delegates; the latter can be implemented instead if the information in the cache alone is not sufficient
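The resizing hooks in the last bullet could look as follows; MyNLCache is a hypothetical algorithm cache, and only the two method signatures follow the PR.

```julia
# A hypothetical cache holding one resizable work vector.
struct MyNLCache{V}
    dz::V
end

# cache-only method: resize the cache's own vectors
Base.resize!(nlcache::MyNLCache, i::Integer) = resize!(nlcache.dz, i)

# the longer form delegates to the cache-only method by default; an algorithm
# overrides this form instead if the cache alone is not sufficient
Base.resize!(nlcache::MyNLCache, nlsolver, integrator, i::Integer) =
    resize!(nlcache, i)

cache = MyNLCache(zeros(3))
resize!(cache, nothing, nothing, 5)
length(cache.dz)  # 5
```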

@devmotion
Member Author

I forgot to mention that I added an additional status MaxItersReached as well.

In the latest commit I moved the implementation of nlsolve! to different functions, such that a similar logic can be used for general AbstractNLSolver objects. Moreover, I added du_cache since it seems that in OrdinaryDiffEq some algorithms save allocations by aliasing nlsolver.k to fsallast.

@ChrisRackauckas
Member

This needed to happen.

in the implementation of the functional iterations u is never touched, in contrast to the current implementation which actually changes integrator.u in https://github.com/JuliaDiffEq/DiffEqBase.jl/blob/master/src/nlsolve/functional.jl#L205 (but only for the in-place version, so there exists an inconsistency between both versions at the moment)

Good

the normalized difference z - zprev is computed based on the maximum values of z and zprev (at the moment, in lines such as https://github.com/JuliaDiffEq/DiffEqBase.jl/blob/master/src/nlsolve/functional.jl#L221 it's based on u (which changes in every iteration only in the in-place variant) and uprev)

Yes, I think in one of the issues I mentioned we should do this.

the norm of these differences, which is used for divergence and convergence checks, is not updated anymore after performing Anderson acceleration (see #310), and hence now they are always the norm of the normalized differences of G(z_k) - z_k instead of z_{k+1} - z_k

I like that

iipnlsolve and oopnlsolve are replaced with build_nlsolver which has a signature very similar to alg_cache

Makes sense.

instead of passing J and W to build_nlsolver, the implementation of build_nlsolver calls build_J_W for NLNewton

So then we should just do a semi-implicit nlsolver as well?

all algorithm-specific values (such as droptol, fast_convergence_cutoff, etc.) whose types are not fixed (such as aa_start) are converted to the appropriate unitless types of u or t; they are added to the NLSolver struct if they are used for the convergence/divergence checks, and to the caches otherwise

Interesting.

@ChrisRackauckas
Member

Since the nlsolve! interface is so generic, I moved most fields (such as dz and k) to the cache and only kept the fields in NLSolver that are part of the problem (tmp, gamma, and c), the convergence/divergence checks (kappa, eta_old, fast_convergence_cutoff), or the general algorithm logic (z, zprev, max_iter, nl_iters, status) and cache and alg. This choice is of course debatable and I'm not sure if it's the best one. I think, one could argue that fields that are probably used by most algorithms (such as dz or k, maybe?) should be added to the NLSolver struct; however, k is not needed for the out-of-place methods and in some methods one does not compute dz explicitly while computing the next iterate, so there might be cases in which one would not need these fields.

I think that's the way to go. Commit to it being generic and make specific caches add the things they need. My goal here is to get a documentable interface for adding new nlsolvers.

On the other hand, one could also argue that p and dt are part of the mathematical problem formulation dt⋅f(tmp + γ⋅z, p, t + c⋅dt) - z = 0, and hence should be part of the NLSolver struct as well. Maybe one would even want a solver that is almost independent from the integrator, such that only values such as tmp, t or other values in the cache are updated from the integrator at the start of nlsolve! and then the algorithm runs (almost) independently and just returns a solution z at the end.

This one I'll have to think about. For now, having the integrator isn't bad, so that's fine. Eventually making it integrator free could make it useful outside of diffeq? Maybe.

@ChrisRackauckas
Member

Let's get other eyeballs on this. How does it interact with DAEs? How does it interact with LHL? Are there any more changes people can think of? I think it looks good, but if there's a use case that needs something more than this then we should plan for it since at this point we'd know what those uses are.

@kanav99
Contributor

kanav99 commented Aug 20, 2019

I just checked this out and it seems that DAEs can be handled better in this approach.
For DAEs, I need a general purpose nlsolver. Basically an nlsolver solving F(tmp + γ⋅z, z, p, t) = 0 where tmp + γ⋅z is the estimate of the derivative we see in BDFs, which was very incompatible with the previous nlsolver (I could have tried hacks like setting dt = 1 and then calling the nlsolve routine, but that's dirty).
I would only need to make a new algorithm for DAEs. All the functions except perform_step! do not call f, so that's a huge help. Fetching less and less from the integrator is great. For my issue with user-defined nlsolver algorithms, this again is helpful.
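The DAE residual described above, sketched as a plain function; F is an implicit DAE function F(du, u, p, t) and tmp + γ⋅z is the derivative estimate from the BDF discretization. The helper name and the example values are illustrative.

```julia
# F(tmp + γ⋅z, z, p, t) = 0, as a residual in z
dae_residual(z, F, tmp, γ, p, t) = F(tmp + γ * z, z, p, t)

# usage: for the linear residual F(du, u, p, t) = du + u the root is
# z = -tmp / (γ + 1)
F(du, u, p, t) = du + u
tmp, γ = 1.0, 0.5
z = -tmp / (γ + 1)
dae_residual(z, F, tmp, γ, nothing, 0.0)  # ≈ 0
```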

Now regarding the conflicting PRs all around, I have a way -

  1. Get the resize! changes from Restructure non-linear solvers #315 and make a separate PR for that only (non-breaking), and release DiffEqBase.
  2. Merge [Experimental] Uses PseudoNLSolver OrdinaryDiffEq.jl#878 as a semi-implicit structure is required as pointed out by Chris.
  3. Merge [WIP] Simplify derivative_utils.jl OrdinaryDiffEq.jl#876 (need to solve some conflicts), tag and release OrdinaryDiffEq.
  4. Then test this PR.

A getproperty hack would be useful for now -

function Base.getproperty(nls::NLSolver, s::Symbol)
  if s === :uf
    return nls.cache.uf
  else
    return getfield(nls, s)
  end
end

Then the next milestone would be DAEs. I get a good feeling about this 😄

@ChrisRackauckas
Member

I still don't see why DAEs cannot use the same nlsolver if we try hard enough. It's exactly the same problem. In ODEs, you move everything to one side and you get Mu' - f - tmp = 0 to solve, and with DAEs it's f - tmp. But in the linearized form it's Mu' - Ju, while for DAEs it's J'u' - Ju. So literally the only difference is that you need to call f slightly differently, and the W matrix has to use the Jacobian of du instead of the mass matrix.

@kanav99
Contributor

kanav99 commented Aug 20, 2019

I still don't see why DAEs cannot use the same nlsolver if we try hard enough

Yes we can, and if this PR takes more time than we expect, I can make the changes by then. I have some of the changes already done, I will complete that and show that in a PR.

@kanav99 kanav99 mentioned this pull request Aug 20, 2019
@devmotion
Member Author

Now regarding the conflicting PRs all around, I have a way -

I think we should approach these PRs in a different order. IMO SciML/OrdinaryDiffEq.jl#876 should be merged first, since tests pass and it does not require any new DiffEqBase release, so it's safe to do.

I'm still confused about some aspects of the PseudoNLsolver changes, and I think we should prioritize the general NLsolver changes in DiffEqBase. IMO, the most important thing is to change the DiffEqBase version bounds in OrdinaryDiffEq and StochasticDiffEq to the standard way of using Caret specifiers such that we can release breaking changes of NLsolver without breaking these packages.

Then we should update the existing drafts for fixing OrdinaryDiffEq and StochasticDiffEq, which should only require replacing some field names and accesses. IMO we should try to keep any algorithm changes or improvements in separate PRs and just make sure that we have some version ready that plays nicely with the DiffEqBase changes. I quickly did this partly to be able to run the mass matrix tests in OrdinaryDiffEq. I think other features should only be considered if we think they might require additions/changes in the DiffEqBase PR; then we could change that PR. If we agree on it, we release it with a version number that is not supported by OrdinaryDiffEq and StochasticDiffEq. Then we merge the fixes in OrdinaryDiffEq and StochasticDiffEq, and release a version that is compatible with the latest DiffEqBase version.

And then we add new features and improvements in the downstream packages.

@devmotion
Member Author

I've merged SciML/OrdinaryDiffEq.jl#876 and started to fix the upper bounds of the dependencies of OrdinaryDiffEq and StochasticDiffEq. IMO, otherwise the main thing left to do is to check if the current design of this PR is sufficient for DAEs as well.

I still don't see why DAEs cannot use the same nlsolver if we try hard enough. It's exactly the same problem. In ODEs, you move everything to one side and you get Mu' - f - tmp = 0 to solve, and with DAEs it's f - tmp. But in the linearized form it's Mu' - Ju, while for DAEs it's J'u' - Ju. So literally the only difference is that you need to call f slightly differently, and the W matrix has to use the Jacobian of du instead of the mass matrix.

I've never worked with DAEs, so I might be wrong, but wouldn't just dispatching on integrator in perform_step!(nlsolver, integrator) be sufficient? The signature of the current implementations could (should?) be changed to perform_step!(nlsolver::NLSolver, integrator::DEIntegrator{<:AbstractODEAlgorithm}), and then we just add a different perform_step!(nlsolver::NLSolver, integrator::DEIntegrator{<:AbstractDAEAlgorithm}).
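The dispatch suggested here could be sketched as follows; the abstract types and the integrator struct below are simplified stand-ins for DiffEqBase's actual hierarchy, and the return values are placeholders.

```julia
# Stand-ins for DiffEqBase's algorithm hierarchy and integrator type.
abstract type AbstractODEAlgorithm end
abstract type AbstractDAEAlgorithm end

struct DEIntegrator{Alg}
    alg::Alg
end

# ODE methods keep the existing implementation
perform_step!(nlsolver, integrator::DEIntegrator{<:AbstractODEAlgorithm}) = :ode_step
# DAEs get their own method without touching the rest of nlsolve!
perform_step!(nlsolver, integrator::DEIntegrator{<:AbstractDAEAlgorithm}) = :dae_step

struct ToyODEAlg <: AbstractODEAlgorithm end
struct ToyDAEAlg <: AbstractDAEAlgorithm end
perform_step!(nothing, DEIntegrator(ToyODEAlg()))  # :ode_step
```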

Thinking a bit more about this, it seems slightly weird to have such ODE-specific algorithms in DiffEqBase. Maybe the actual implementations of the algorithms (or maybe rather the perform_step! and (maybe) build_nlsolver implementations) should be moved to OrdinaryDiffEq? I guess the reason to have them in OrdinaryDiffEq is that the same algorithms are used in StochasticDiffEq?

@ChrisRackauckas
Member

Thinking a bit more about this, it seems slightly weird to have such ODE-specific algorithms in DiffEqBase. Maybe the actual implementations of the algorithms (or maybe rather the perform_step! and (maybe) build_nlsolver implementations) should be moved to OrdinaryDiffEq? I guess the reason to have them in OrdinaryDiffEq is that the same algorithms are used in StochasticDiffEq?

Yes they are used in StochasticDiffEq which is why they were moved here. We could have a StarDiffEq.jl for high level *DiffEq handling, but that hasn't seemed necessary yet.

I've never worked with DAEs, so I might be wrong, but wouldn't just dispatching on integrator in perform_step!(nlsolver, integrator) be sufficient? The signature of the current implementations could (should?) be changed to perform_step!(nlsolver::NLSolver, integrator::DEIntegrator{<:AbstractODEAlgorithm}), and then we just add a different perform_step!(nlsolver::NLSolver, integrator::DEIntegrator{<:AbstractDAEAlgorithm}).

We could do that, but I'm not convinced we even have to. It should be even simpler.

@ChrisRackauckas
Member

Yes, looking at the algorithms, I don't think we need to have any new perform steps for DAEs. It should work by just changing uf and the J_W building. So I think this is good to go.

@ChrisRackauckas
Member

For a quick reference, take a look at DASSL: https://www.osti.gov/servlets/purl/5882821 .

We might need to add a hook for updating uf, where a multistep method would define the du part directly with a backwards difference and then compute the residual f(resid,du,u,p,t). And then W is just defined directly as df/ddu + gamma df/du, and it should work.
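For a linear residual the W definition above reduces to matrix arithmetic; this is a hedged sketch where A, B, and γ are made-up example values for F(du, u) = A⋅du + B⋅u, whose Jacobians are simply A and B.

```julia
# W = ∂F/∂du + γ⋅∂F/∂u for a linear DAE residual F(du, u) = A*du + B*u
A = [1.0 0.0; 0.0 0.0]   # ∂F/∂du, singular as is typical for a DAE
B = [2.0 1.0; 1.0 3.0]   # ∂F/∂u
γ = 0.1
W = A + γ * B            # the Newton matrix; nonsingular even though A is not
```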

# check convergence and divergence criteria
check_status!(nlsolver, integrator)
end

Contributor

@kanav99 kanav99 Sep 2, 2019


You forgot to update z in the last step.

function nlsolve!(nlsolver::NLSolver{algType, iip}, integrator) where {algType, iip}
  preamble!(nlsolver, integrator)

  while get_status(nlsolver) === SlowConvergence
    # (possibly modify and) accept step
    apply_step!(nlsolver, integrator)

    # compute next iterate
    perform_step!(nlsolver, integrator)
  
    # check convergence and divergence criteria
    check_status!(nlsolver, integrator)
  end

  if iip
    recursivecopy!(nlsolver.z, nlsolver.gz)
  else
    nlsolver.z = nlsolver.gz
  end

  postamble!(nlsolver, integrator)
end

This one should work.

Member Author


Thanks, I had noticed this issue as well some days ago. Unfortunately, there are still some issues. However, I've come to believe that it's wrong to push such a huge update to DiffEqBase directly. IMO we should rather update OrdinaryDiffEq to the new algorithm design without touching DiffEqBase, and then move these changes upstream. That way it's also easier to integrate DAE support in the redesign. As I mentioned in the PR and on Slack, I'm currently preparing such a branch of OrdinaryDiffEq that also adds the changes step by step, to separate the different parts more clearly.

I hope to be done with it by the end of the week, but since I'm currently on vacation and travelling around, it might take some additional days...

@devmotion devmotion closed this Sep 7, 2019
