Skip to content

Conversation

@blaggacao
Copy link
Collaborator

Context

The ssh connection to a discovery host leaves a dangling ssh agent.
If the workflow runner itself runs on a persistent runner, that process might just accumulate.

Solution

Clean up after ourselves. Unfortunately, only js actions allow to trigger a post action which is why a slight refactor needed to be done to wrap the bash action inside a js action.

@blaggacao blaggacao requested a review from nrdxp August 31, 2023 10:28
@nrdxp
Copy link
Contributor

nrdxp commented Aug 31, 2023

Is this something you have observed in practice?

@blaggacao
Copy link
Collaborator Author

Is this something you have observed in practice?

Yes, this had reportedly happend in some scenarios. Although, now, on second thought, I doubt it since the discovery is the only persistent runner and it regularly would not set up a ssh connection to itself.

Workers are regularly not persistent so they shouldn't suffer from dangling ssh agents.


Only in the scenario that a worker is serviced by a hosted runner can those be left dangling.

@michalrus do you happen to have more context on what we had observed, in this context?

@michalrus
Copy link

If the workflow runner itself runs on a persistent runner, that process might just accumulate.

I don’t think this is the case, since if an ssh-agent stayed running from a previous job, it will be reused

@michalrus do you happen to have more context on what we had observed, in this context?

Yes, so we’re using NixOS on the discovery host, and the standard GitHub Runner service (services.github-runners.runnerN = { … }defined here).

We’re not setting the .ephemeral option, which causes systemd service restart (and clean-up of not only processes, but also of the work dir), because several other runners are faster without cleaning their work dir.

So the ssh-agent process that this action (used to) launch in the background stayed there between runs, and for whatever reason didn’t react to SIGTERM from systemd when I requested the restart of the systemd service, and I had to kill it manually (or wait 90 s for SIGKILL – this value could also be tweaked).

But now that I think of it, I could just set .ephemeral = true in discovery runners?

Copy link

@michalrus michalrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it works, then it’s looking good =) Thanks!

# Auth agent socket to ssh config
echo "IdentityAgent $SSH_AUTH_SOCK" >> "$SSH_CONFIG_FILE"
# Save pid to cleanup in post step
echo "SSH_AGENT_PID=$SSH_AGENT_PID" >> "$GITHUB_STATE"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So later GHA export that automatically?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants