-
-
Notifications
You must be signed in to change notification settings - Fork 33
Retry on libssh SSH_AGAIN return code #756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Retry on libssh SSH_AGAIN return code #756
Conversation
Congratulations! One of the builds has completed. 🍾 You can install the built RPMs by following these steps:
Please note that the RPMs should be used only in a testing environment. |
1 similar comment
Congratulations! One of the builds has completed. 🍾 You can install the built RPMs by following these steps:
Please note that the RPMs should be used only in a testing environment. |
``ssh_userauth_password`` are now retried when libssh returns SSH_AGAIN. | ||
:user:`justin-stephenson`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
``ssh_userauth_password`` are now retried when libssh returns SSH_AGAIN. | |
:user:`justin-stephenson`. | |
``ssh_userauth_password`` are now retried when ``libssh`` returns ``SSH_AGAIN`` | |
-- by :user:`justin-stephenson`. |
@@ -0,0 +1,3 @@ | |||
The ``Channel`` class calls to libssh ``ssh_channel_open_session`` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure classes don't call things. They represent states. Could you rephrase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this could result in an infinite loop..
Also, I don't think a proof that this works as intended without tests. Add them.
Yes, this would go into infinite loop if the server dies and does not properly disconnect. And timeouts are to handle this issue. Retrying unconditionally and infinitely is ok for tests, but for real-world application, the pylibssh should do at very least some check with Or setting some limit how many times you could retry. But in this case, why not raise the timeout itself? |
Thank you for your comments and review.
I tried to add a test for this but unfortunately I couldn't reproduce the scenario we see in the test environment. I'll try again.
I can change this PR to add a call to Whichever you prefer is acceptable for us, just let me know and i'll make those changes. |
I am actually wondering how you are getting the SSH_AGAIN in these two places with pylibssh. The sessions in libssh are blocking by default. The only way to change the session to non-blocking mode is to use But there might be the oddness that setting low timeout might actually return the SSH_AGAIN in places where it should not, according to the documentation, which would be a bug in libssh that needs to be fixed. What brought you initially to set smaller timeouts? Is a viable workaround to raise the timeouts? |
The error we currently see in our PRCI is specific to
I added the
In our code we set
If I understand correctly, setting this low -- for reference https://github.com/next-actions/pytest-mh/blob/master/pytest_mh/conn/ssh.py |
Workaround ansible pylibssh issue which causes test failures pylibsshext.errors.LibsshChannelException: Failed to open_session: [-2] PR ansible/pylibssh#756 is under review but workaround it in the meantime.
Workaround ansible pylibssh issue which causes test failures pylibsshext.errors.LibsshChannelException: Failed to open_session: [-2] PR ansible/pylibssh#756 is under review but workaround it in the meantime.
Setting a low SSH options timeout value can lead to LibsshChannelException: Failed to open_session: [-2] Attempt to retry when this occurs.
51b9a9b
to
7afce6d
Compare
Workaround ansible pylibssh issue which causes test failures pylibsshext.errors.LibsshChannelException: Failed to open_session: [-2] PR ansible/pylibssh#756 is under review but workaround it in the meantime.
Workaround ansible pylibssh issue which causes test failures pylibsshext.errors.LibsshChannelException: Failed to open_session: [-2] PR ansible/pylibssh#756 is under review but workaround it in the meantime.
Workaround ansible pylibssh issue which causes test failures pylibsshext.errors.LibsshChannelException: Failed to open_session: [-2] PR ansible/pylibssh#756 is under review but workaround it in the meantime.
Workaround ansible pylibssh issue which causes test failures pylibsshext.errors.LibsshChannelException: Failed to open_session: [-2] PR ansible/pylibssh#756 is under review but workaround it in the meantime.
Ok, setting the libssh timeout is the timeout you are giving to the libssh to return to you. but if you are setting the low timeout to get the signals delivered, then either pylibssh or the caller needs to retry. The pylibssh code is really not written to support the retries around here so my proposal would be to create some pylibssh timeout/retry counter to avoid infinite cycle when stuff will go wrong. What do you think? It can be either separate pyblissh option, or it can be somehow intercepted when we set the libssh timeout to set it to some multiply of the user specified value to return the handling to the python code. Or the second option by default with possible override. And obviously we need some tests with this option, otherwise its untested broken code. I bet we can get some slow CI runners where this would demonstrate from time to time. |
SUMMARY
When a low SSH options timeout value is set, we see sometimes that calls to
new_channel()
andssh_channel_open_session
fail when libssh returnsSSH_AGAIN
. Currently, pylibssh returns an exception:SSH_AGAIN return code is documented https://api.libssh.org/master/group__libssh__channel.html#gaf051dd30d75bf6dc45d1a5088cf970bd
It is not clearly stated this but SSH_AGAIN also happens due to timeout.
ISSUE TYPE
ADDITIONAL INFORMATION
This issue happens in our https://github.com/next-actions/pytest-mh project.
CC @pbrezina