Conversation

@devmotion
Member

Let's make use of the new CI test feature 😃

@coveralls

Coverage Status

Coverage decreased (-62.5%) to 3.817% when pulling 148cbad on nlsolver_unified into 8f94512 on master.

@devmotion
Member Author

OK, so two things:

  1. Tests fail on GitLab since we have a separate build step there, so the setup that adjusts the DiffEqBase version in runtests.jl cannot work.
  2. Right now I'm mostly worried about the test failures of the Newton algorithms, e.g., in the Jacobian and time derivative tests. They don't seem to be broken in general, since, e.g., the mass matrix tests succeed. I haven't figured out yet whether it's a problem in OrdinaryDiffEq or in DiffEqBase; that's the most worrisome part to me. My preliminary debugging of the Jacobian tests shows that for the Newton algorithms the Jacobian is only evaluated once. Adding print statements to do_newJ reveals that it returns false (apart from the first iteration) since the status of the non-linear solver is FastConvergence. So right now I believe there might be an issue with the implementation of the Newton method or with the new way of evaluating the residuals. If it's the latter and we do not want to change it back, maybe we have to adjust fast_convergence_cutoff? Another possibility is that strictly separating the non-linear solver from the u and uprev values/arrays of the integrator actually makes things worse (maybe in combination with some pre-existing bugs in the OrdinaryDiffEq implementation of the algorithms?).

@ChrisRackauckas
Member

@YingboMa might want to dig in a bit here.

update_W!(integrator, cache, dt, false)
@.. gprev = dt*0.5*(du₂ - f2ⱼ₋₁) + dt*(0.5 - μs₁)*(du₁ - f1ⱼ₋₁)
nlsolver.linsolve(vec(tmp),get_W(nlsolver),vec(gprev),false)
get_linsolve(nlsolver)(vec(tmp),get_W(nlsolver),vec(gprev),false)
Member Author


This feels pretty bad to me (appears in multiple other places as well). I think we should rather define a generic method linsolve(out, nlsolver, in, flag; kwargs...) (or maybe some other less generic name) (in DiffEqBase?) that's defined as f_linsolve = get_linsolve(nlsolver); W = get_W(nlsolver); f_linsolve(vec(out), W, vec(in), flag; kwargs...).
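A minimal sketch of such a generic method, assuming the `get_linsolve`/`get_W` accessor names from this PR (the `DummySolver` struct and the naive backslash-based linear solve are purely illustrative scaffolding, not the DiffEqBase API):

```julia
# Hypothetical stand-in for a nonlinear solver carrying a linear solve and W;
# only `get_linsolve`/`get_W` mirror the accessors proposed in this PR.
struct DummySolver{F,W}
    linsolve::F
    W::W
end
get_linsolve(s::DummySolver) = s.linsolve
get_W(s::DummySolver) = s.W

# The proposed generic wrapper: hide the accessor plumbing behind one call.
function linsolve!(out, nlsolver, b, new_W; kwargs...)
    f_linsolve = get_linsolve(nlsolver)
    W = get_W(nlsolver)
    f_linsolve(vec(out), W, vec(b), new_W; kwargs...)
    return out
end

# Usage: solve W * x = b with a naive linear solve.
W = [2.0 0.0; 0.0 4.0]
nlsolver = DummySolver((x, A, b, _) -> (x .= A \ b), W)
x = zeros(2)
linsolve!(x, nlsolver, [2.0, 8.0], true)  # x ≈ [1.0, 2.0]
```

Call sites like the one above would then shrink to `linsolve!(tmp, nlsolver, gprev, false)`.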

Member


yeah that makes sense.

@devmotion
Member Author

Reading through IV.8 ("Stopping criterion") in Hairer's book gives me the impression that the convergence/divergence checks of the non-linear solvers do not completely follow his suggestions. Denoting the current iterate by z_k and the next one by z_{k+1}, it feels that:

  • The norm of the residuals dz_k = z_{k+1} - z_k should just be integrator.opts.internalnorm(dz_k, t) since "The norm, used in all these formulas, should be the same as the one used for the local error estimator". In particular, this does not include any tolerances; they are considered at a later stage.
  • Under the assumption of linear convergence ||dz_{k+1}|| <= Theta * ||dz_k||, the triangle inequality gives the estimate ||z_{k+1} - z*|| <= Theta / (1 - Theta) * ||dz_k||.
  • Theta can be estimated by Theta_k = ||dz_k|| / ||dz_{k-1}|| for k > 1. That's currently implemented (but with the incorrect norms, I think).
  • Iteration is stopped ("convergence") if eta_k * ||dz_k|| <= kappa * tol, where eta_k = Theta_k / (1 - Theta_k) for k > 1 and eta_0 = (max(eta_old, eps))^0.8 (implemented for Newton but not for functional iterations), and tol is some tolerance ("It is clear that the iteration error should not be larger than the local discretization error, which is usually kept close to Tol."). Interestingly, in the case of convergence Hairer accepts the next iterate, i.e., z_{k+1}, due to the estimate from the triangle inequality above. In our implementation we check eta_k * ||y_k|| <= kappa for fixed kappa, and absolute and relative tolerances are only taken into account implicitly in the computation of y_k = dz_k / (abstol + reltol * max(|z_{k+1}|, |z_k|)). This implies that instead of Theta_k = ||dz_k|| / ||dz_{k-1}|| we compute Theta_k = ||dz_k|| / ||dz_{k-1}|| * (abstol + reltol * max(|z_k|, |z_{k-1}|)) / (abstol + reltol * max(|z_{k+1}|, |z_k|)).
  • kappa around 10^{-1} or 10^{-2} and a high maximum number of iterations (k_max = 7 or 10) seemed to work best. That fits our implementation.
  • If Theta_k >= 1, the iteration is stopped and restarted with a smaller stepsize ("divergence"). This matches our implementation.
  • If Theta_k^(k_max - k) / (1 - Theta_k) * ||dz_k|| > kappa * tol, the iteration is stopped as well, since the LHS is a rough estimate of the error after the remaining k_max - k iterations. This is implemented as so-called "slow convergence", but again with the tolerances handled implicitly in the norm.
  • Moreover, if only one iteration was needed or the last Theta_k was very small (e.g., < 10^-3), the Jacobian is not recomputed in the next step (as long as no step rejection occurs). This is known as "fast convergence" in our code, but the criterion seems to be slightly different. I'm also not sure if we ever check whether a step was rejected when basing the recomputation of the Jacobian and W on the convergence speed.
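The criteria above could be sketched as a standalone check (hypothetical code, not the OrdinaryDiffEq implementation; the function name and arguments are made up for illustration, while the status names and `kappa`/`tol`/`fast_convergence_cutoff` mirror this discussion):

```julia
# Rough sketch of Hairer's stopping criterion (IV.8).
# ndz = ||dz_k||, ndzprev = ||dz_{k-1}||, both in internalnorm.
function check_status(ndz, ndzprev, iter, max_iter, eta_old;
                      kappa = 1 // 100, tol = 1e-8)
    if iter == 1
        eta = max(eta_old, eps())^0.8          # eta_0 from the previous step
    else
        theta = ndz / ndzprev                  # Theta_k = ||dz_k|| / ||dz_{k-1}||
        theta >= 1 && return :Divergence       # restart with smaller stepsize
        eta = theta / (1 - theta)
        # Rough estimate of the error after the remaining max_iter - iter iterations:
        if theta^(max_iter - iter) / (1 - theta) * ndz > kappa * tol
            return :VerySlowConvergence
        end
    end
    eta * ndz <= kappa * tol && return :Convergence
    return :SlowConvergence
end
```

A fast-convergence check along the lines of `iter == 1 || theta <= fast_convergence_cutoff` would then decide Jacobian reuse separately from the convergence status itself.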

So, if we want to follow Hairer's reasoning and include the tolerances implicitly in the norm, it should be done in one specific way throughout all iterations. In particular, we should NOT make the current change that makes it dependent on the iteration step (although before, it was also iteration-dependent, but only for some algorithms). A bit unfortunate is also that the error estimators are usually based on max(|u|, |uprev|), it seems, but u might change and be different in the non-linear solvers. Should we just base it on uprev?

Moreover, it seems the criterion for (fast) convergence is different in our implementation. We could simplify the convergence check to eta_k * ||dz_k|| <= kappa * tol (or without tol, depending on our norm) and replace the fast convergence check by iter == 1 || Theta <= fast_convergence_cutoff.

A third point is that I'm not sure anymore whether removing the updates of the residuals for Anderson acceleration is a good idea. If they are not updated, the triangle inequality cannot be applied to the sum of the dz_k (or would contain additional terms). But I guess it depends a bit on whether we think the linear convergence assumption is reasonable at all in that case.

@ChrisRackauckas
Member

Way back we were matching Hairer, but that ended up being very inefficient, so we moved more towards matching Sundials and Kennedy and Carpenter. @YingboMa will keep working on the Jacobian reuse, so I wouldn't mix changes to this into the refactor. Any changes here need lots of benchmarks.

@devmotion
Member Author

I completely agree, we shouldn't mix any algorithm changes into the refactoring. In fact, we have done that already (changing the computation of the residuals and not touching integrator.u). I'll revert any changes to the algorithm to rule out that the test failures are caused by these modifications. However, I still think one would probably want to use the same norm for computing ||dz_k|| for all k, and it should be consistent with the norm used for the local error estimator. Not modifying u also seems reasonable to me, but I'll check whether it had any effect on the tests.

Way back we were matching Hairer, but that ended up being very inefficient, so we moved more towards matching Sundials and Kennedy and Carpenter.

Is that the reason for setting fast_convergence_cutoff = 1 // 5, which is surprisingly large compared with the value of 10^-3 found in Hairer's book? I'm also wondering about that since I observed that the algorithms basically returned FastConvergence all the time.

@devmotion
Member Author

Quick update: I found a stupid bug, the type signature of initial_η was incorrect. Fixing it improved things; the errors are much smaller now. I'm currently reverting different parts of the algorithm.

@devmotion
Member Author

OK, I think I tracked down the issue. Apparently the crucial point (at least for the Newton algorithm) is to return not the current but the next iterate. I'm pretty certain that's the main culprit, but I changed a bunch of things during the debugging, so I want to recheck once more.

@devmotion
Member Author

Of course, it's more complicated... I could fix most issues, but there are two main annoyances left:

  1. For some methods I cannot reproduce exactly the same results in the new setup, although the code is supposed to be equivalent - at least for the Newton methods. Since there are currently some inconsistencies between the different algorithms in how the residuals are computed (adaptive norm? fixed norm? based on z and gz?), my goal is to reproduce the results on the master branch exactly for the Newton methods. However, the following script (taken from jacobian_tests.jl) indicates that the results for TRBDF2 and KenCarp4 are slightly off, while all other methods produce exactly the same stats and solution errors:
using OrdinaryDiffEq, Test, ParameterizedFunctions

d_alembert = @ode_def DAlembert begin
    dx = a - b*x + c*t
end a b c
function d_alembert_analytic(u0,p,t::Number)
    a,b,c = p
    ebt = exp(b*t)
    @. exp(-b*t)*(-a*b + c + ebt*(a*b + c*(b*t - 1)) + b^2 * u0)/(b^2)
end

p = (1., 2., 3.)
u0 = [1.0]
tspan = (0.0,10.0)
prob = ODEProblem(ODEFunction(d_alembert.f,
                              jac = d_alembert.jac,
                              analytic=d_alembert_analytic),
                              u0,tspan,p)

println("Rosenbrock23")
sol = solve(prob,Rosenbrock23(),abstol=1e-8,reltol=1e-8)
@show sol.destats
@show sol.errors
@test sol.errors[:l2] < 1e-7

println("Rodas4")
sol = solve(prob,Rodas4(),abstol=1e-10,reltol=1e-10)
@show sol.destats
@show sol.errors
@test sol.errors[:l2] < 1e-7

println("Veldd4")
sol = solve(prob,Veldd4(),abstol=1e-10,reltol=1e-10)
@show sol.destats
@show sol.errors
@test sol.errors[:l2] < 1e-7

println("Rodas5")
sol = solve(prob,Rodas5(),abstol=1e-10,reltol=1e-10)
@show sol.destats
@show sol.errors
@test sol.errors[:l2] < 1e-7

println("Trapezoid")
sol = solve(prob,Trapezoid(),abstol=1e-10,reltol=1e-10)
@show sol.destats
@show sol.errors
@test sol.errors[:l2] < 2e-6

println("KenCarp3")
sol = solve(prob,KenCarp3(),abstol=1e-10,reltol=1e-10)
@show sol.destats
@show sol.errors
@test sol.errors[:l2] < 8e-4

println("KenCarp4")
sol = solve(prob,KenCarp4(),abstol=1e-10,reltol=1e-10)
@show sol.destats
@show sol.errors
@test sol.errors[:l2] < 1e-7

println("KenCarp4 - Functional")
sol = solve(prob,KenCarp4(nlsolve=NLFunctional()),abstol=1e-10,reltol=1e-10)
@show sol.errors

println("TRBDF2 - Functional")
sol = solve(prob,TRBDF2(nlsolve=NLFunctional()),abstol=1e-10,reltol=1e-10)
@show sol.errors

println("TRBDF2")
sol = solve(prob,TRBDF2(),abstol=1e-10,reltol=1e-10)
@show sol.destats
@show sol.errors
@test sol.errors[:l2] < 2e-6
  2. It's not desirable at all to reproduce the current state, since I'm pretty sure the current behaviour is not intended. Currently, nlsolve! internally sets fail_convergence = false and integrator.force_stepfail = false if and only if the iteration is aborted due to Convergence or FastConvergence. However, if the iteration does not exit due to Divergence or VerySlowConvergence (which is checked even if the procedure already satisfies the divergence criterion - that seems undesirable) - which might happen, e.g., if the maximum number of iterations is reached - the resulting status could be any of the five statuses. In particular, if multiple nlsolve! calls are run sequentially, the status is not reset in between, so the unchanged old status would just be returned. That means that checking nlsolvefail(nlsolver) in perform_step! might not actually stop the current step; instead, the next nlsolve! call could even reset integrator.force_stepfail to false if it converges although the previous call failed. This implies that steps that should fail currently might not abort the execution of perform_step! immediately, leading to superfluous computations, and might even not lead to a failing step at all. Since nlsolvefail returns true if and only if the status is Divergence or VerySlowConvergence, even resetting the solver's status to SlowConvergence between calls would not help: if the maximum number of iterations is reached, we would still have fail_convergence = true but nlsolvefail(nlsolver) = false. Hence, as already done in the previous NLSolver drafts, one might want to introduce another status MaxIters(Reached?) with a negative value (such that nlsolvefail(nlsolver) = true if the solver's status is MaxIters), to which the solver's status is reset at the beginning of nlsolve!, so that any exit without explicitly setting the status to Convergence, FastConvergence, Divergence, or VerySlowConvergence returns a solver with status MaxIters.
Actually, I'm not sure what the current purpose of SlowConvergence is; it seems it's never used.
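A sketch of the suggested status handling (hypothetical standalone code: the MaxIters status, its negative value, and the reset at the beginning of nlsolve! are the proposal above, while the Solver struct and the iteration loop are only scaffolding):

```julia
# Statuses with negative values denote failure, mirroring the proposal.
@enum NLStatus begin
    FastConvergence     = 2
    Convergence         = 1
    SlowConvergence     = 0
    VerySlowConvergence = -1
    Divergence          = -2
    MaxIters            = -3
end

nlsolvefail(status::NLStatus) = Int(status) < 0

mutable struct Solver            # illustrative stand-in for NLSolver
    status::NLStatus
end

function nlsolve!(nlsolver::Solver, iterate!, max_iter)
    nlsolver.status = MaxIters   # reset: only an explicit exit can clear it
    for iter in 1:max_iter
        status = iterate!(nlsolver, iter)
        if status in (Convergence, FastConvergence, Divergence, VerySlowConvergence)
            nlsolver.status = status
            break
        end
    end
    return nlsolver
end

s = Solver(Convergence)                     # stale status from a previous call
nlsolve!(s, (nl, i) -> SlowConvergence, 3)  # never converges
s.status                                    # MaxIters, so nlsolvefail(s.status) is true
```

With this reset, a run that exhausts its iterations can no longer inherit a stale Convergence status from an earlier nlsolve! call.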

I'm trying to get OrdinaryDiffEq to the new design in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/tree/nlsolve_full, starting with a basic restructuring that does not affect the algorithm. Then different bugs/inconsistencies/inefficiencies can be addressed step by step. Since I'm not happy with building against the DiffEqBase branch, and depending on changes in two packages slows down the development process, I moved all the non-linear solver code apart from the status and the abstract types to OrdinaryDiffEq for now, so all changes are restricted to OrdinaryDiffEq. Once it's working, the nonlinear solvers can be moved back to DiffEqBase.

I would be happy about any pointers as to why TRBDF2 and KenCarp4 in the example above cannot reproduce the results on master.

@ChrisRackauckas
Member

It's fine if they don't reproduce the same results, as long as they do well on the benchmarks. We also need to get @isaacsas's example into the stiff ODE benchmarks.

nlsolve_f(integrator.f, unwrap_alg(integrator, true))

function iip_generate_W(alg,u,uprev,p,t,dt,f,uEltypeNoUnits)
function DiffEqBase.build_J_W(alg,u,uprev,p,t,dt,f,uEltypeNoUnits,::Val{true})
Member


this should not be in integrator_utils. It's a derivative utility, so it should go with the derivative utilities.

nlsolve!(integrator, cache) = DiffEqBase.nlsolve!(cache.nlsolver, cache.nlsolver.cache, integrator)
nlsolve!(nlsolver::NLSolver, integrator) = DiffEqBase.nlsolve!(nlsolver, nlsolver.cache, integrator)

DiffEqBase.nlsolve_f(f, alg::OrdinaryDiffEqAlgorithm) = f isa SplitFunction && issplit(alg) ? f.f1 : f
Member


this doesn't belong here either.

nlsolver.uf.p = p
J = jacobian(nlsolver.uf,uprev,integrator)
@unpack uf = DiffEqBase.get_cache(nlsolver)
uf.t = t
Member


uf.f = ...


function calc_J(nlsolver, integrator, cache::OrdinaryDiffEqConstantCache, is_compos)
@unpack t,dt,uprev,u,f,p = integrator
function calc_J(nlsolver, integrator, cache::OrdinaryDiffEqConstantCache)
Member


why do we still need 2 dispatches here? They do the same thing.

Member Author


Definitely should remove one. It's just such a massive change that I wanted to do the derivative_utils.jl cleanup in a separate PR/commit.

`jac_config` of the cache.
"""
function calc_J!(integrator, cache::OrdinaryDiffEqCache, is_compos)
function calc_J!(integrator, cache::OrdinaryDiffEqCache)
Member


just pass in J and get rid of all of this extra code?
