Skip to content

Conversation

@synical
Copy link

@synical synical commented Apr 22, 2017

I have noticed that on systems with a high value for SC_OPEN_MAX, tcollector
can cause spikes in CPU utilization when it runs a collector process.

The reason for this is that close_fds calls close() on every integer
between 3 and SC_OPEN_MAX before executing the child process, regardless
of if those file descriptors are currently open or not. On a
system with say a value of 1 million for SC_OPEN_MAX this is a pretty
expensive operation, even more so if that system is running something
like auditd/kaudit.

It seems that tcollector will typically spawn a short lived collector
that it will kill after a certain amount of time passes, so I don't
think the normal concerns of setting close_fds to False are applicable.

Some strace histogram output to demonstrate (note the error count):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.76    4.463530        3351      1332         5 select
  4.70    0.221257           2    127189    127114 close
...

A python bug report that discusses the same issue:
https://bugs.python.org/issue1663329

I am not clear on all of the potential downsides here so if anyone has
strong opinions on why it is a bad idea to disable close_fds here, I am
all ears.

I have noticed that on systems with a high value for SC_OPEN_MAX, tcollector
can cause spikes in CPU utilization when it runs a collector process.

The reason for this is that close_fds calls close() on every integer
between 3 and SC_OPEN_MAX before executing the child process, regardless
of if those file descriptors are currently open or not. On a
system with say a value of 1 million for SC_OPEN_MAX this is a pretty
expensive operation, even more so if that system is running something
like auditd/kaudit.

It seems that tcollector will typically spawn a short lived collector
that it will kill after a certain amount of time passes, so I don't
think the normal concerns of setting close_fds to False are applicable.

Some strace histogram output to demonstrate (note the error count):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.76    4.463530        3351      1332         5 select
  4.70    0.221257           2    127189    127114 close
...

A python bug report that discusses the same issue:
https://bugs.python.org/issue1663329

I am not clear on all of the potential downsides here so if anyone has
strong opinions on why it is a bad idea to disable close_fds here, I am
all ears.
@johann8384
Copy link
Member

@synical I know this is three years old. Any chance you can rebase and make this a config option? It can default to "False"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants