Timer is causing silent failure of ExpressionTool #1680

@douglowe

The exec_js_process subroutine in the sandboxjs.py module decides that a JavaScript expression tool has finished in one of two ways:

  1. It reads the end of the stderr and stdout streams, looking for a given string that indicates the end of the process
  2. A timer runs out, indicating that the tool has taken too long to complete, probably has failed, and so we should stop waiting
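
To make the two conditions concrete, here is a minimal sketch of a loop that stops either on an end marker or on a deadline. It is not the actual sandboxjs.py code: the `SENTINEL` value, the `wait_for_output` function, and the `read_chunk` callback are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Tuple

# Hypothetical end marker; the real string used by sandboxjs.py is not quoted here.
SENTINEL = "<<process finished>>\n"


def wait_for_output(
    read_chunk: Callable[[], str],
    timeout: float,
    sentinel: str = SENTINEL,
    poll_interval: float = 0.01,
) -> Tuple[str, bool]:
    """Collect output until the sentinel appears or the deadline passes.

    read_chunk is an assumed non-blocking reader that returns "" when
    nothing new is available. Returns (collected_output, timed_out).
    """
    deadline = time.monotonic() + timeout
    buffer = ""
    while True:
        buffer += read_chunk()
        # Condition 1: the stream ends with the expected marker.
        if buffer.endswith(sentinel):
            return buffer, False
        # Condition 2: the timer has run out.
        if time.monotonic() >= deadline:
            return buffer, True
        time.sleep(poll_interval)
```

The second element of the return value is what matters for this issue: the caller can tell a normal finish from a timeout, instead of treating both the same way.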

That timer is currently set to 20 seconds. On one of our local HPC systems, 20 seconds is not enough time to load the Singularity container and run the JavaScript tool, which causes our workflow to fail. Increasing the time limit to 30 seconds (by hardcoding the value in the subroutine) solves the problem.

There are a number of issues here that I think need addressing:

  1. Currently, when tasks fail in this manner, the user gets no feedback about the cause of the failure. This needs to be corrected, so that users know they have hit a timeout. Also, when the process is closed because of the timer, rather than because the "process finished" string was found in stderr and stdout, the last characters of stderr and stdout probably shouldn't be removed, as this could delete important debugging information.
  2. What is a sensible time limit to use here? 20 seconds feels reasonable, but (as our case shows) it is not enough when working with distributed filesystems. Presumably we need a low limit, so that not too much time is wasted when tools do fail, but would something like 60 seconds be more reasonable?
  3. Why is an error not thrown when a tool fails by hitting the time limit? Are there situations in which tools complete the required task without returning the proper "process finished" string at the end of stderr and stdout? If not then would it not be better for the workflow to end with an error for this step?
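
As a rough illustration of what points 1 to 3 could look like together, the sketch below raises an explicit error on timeout, strips the end marker only when it was actually found, and reads the limit from a setting rather than a hardcoded value. Everything here is an assumption made for illustration: the `SENTINEL` value, the `EXAMPLE_JS_TIMEOUT` environment variable, the 60 second default, and the `JavascriptTimeoutError`, `get_timeout`, and `finalize_output` names are not part of the existing code.

```python
import os
from typing import Tuple

# Hypothetical end marker; the real string used by sandboxjs.py is not quoted here.
SENTINEL = "<<process finished>>\n"


class JavascriptTimeoutError(RuntimeError):
    """Hypothetical error type for an expression that hit the time limit."""


def get_timeout(default: float = 60.0) -> float:
    # EXAMPLE_JS_TIMEOUT is an invented variable name; the default is illustrative.
    return float(os.environ.get("EXAMPLE_JS_TIMEOUT", default))


def strip_sentinel(text: str, sentinel: str = SENTINEL) -> str:
    # Remove the marker only if it is really there, never blindly by length.
    if text.endswith(sentinel):
        return text[: -len(sentinel)]
    return text


def finalize_output(
    stdout: str, stderr: str, timed_out: bool, timeout: float
) -> Tuple[str, str]:
    if timed_out:
        # Keep stdout and stderr untouched so no debugging output is lost.
        raise JavascriptTimeoutError(
            f"JavaScript expression did not finish within {timeout} seconds; "
            f"stdout so far: {stdout!r}; stderr so far: {stderr!r}"
        )
    return strip_sentinel(stdout), strip_sentinel(stderr)
```

Whether a timeout should always be fatal depends on the answer to point 3; if some tools legitimately finish without the marker, the raise could be downgraded to a logged warning.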
