broken atopacct blocks atop indefinitely #207

Open
Zugschlus opened this issue Aug 13, 2022 · 7 comments

Comments

@Zugschlus
Contributor

Hi,

When something goes wrong in atopacct, it keeps holding a system-wide semaphore, which causes subsequent calls to atop to stall indefinitely in

getuid()                                = 1000
setresuid(-1, 1000, -1)                 = 0
semtimedop(1, [{0, -1, SEM_UNDO}, {1, -1, SEM_UNDO}], 2, NULL

This happens when the Debian package is installed on an s390x system. Unfortunately, I don't have root on that system and therefore cannot see what atopacct does when it happens. The other architectures Debian builds for are fine.

Therefore, this issue has two parts:

  1. atopacct should not block the semaphore on s390x systems
  2. atop itself should time out and terminate with a meaningful error message if it cannot obtain the semaphore
@Zugschlus
Contributor Author

Due to this issue, atop will be removed from Debian testing next week.

@gleventhal
Contributor

Have you tried clearing the atopacct state to resolve the issue?
Something like:
mv /var/run/pacct_shadow.d{,.orig} && systemctl start atopacct

@Zugschlus
Contributor Author

The main problem is that I don't see this behavior on any box I have immediate shell access to. I cannot try anything there short of writing a test case, building that test case into an official package, and uploading this package to Debian. I'd really like to avoid that.

The real showstopper is that atop waits indefinitely and silently for the semaphore until the test is aborted with a timeout. As I wrote in the original bug report, we have two problems there that should both be addressed.

Marc

@Atoptool
Owner

Part 2 of the issue has been solved: atop times out after waiting 3 seconds for the semaphore and then continues without process accounting.
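
For readers following along, a bounded wait of this kind can be expressed with semtimedop(2), the same call that shows up with a NULL timeout in the strace output of the original report. The following is only a minimal sketch under stated assumptions, not the actual atop code: the function name `acquire_with_timeout`, its return codes, and the `union semun_arg` name are hypothetical, and the two-element operation array simply mirrors the strace output.

```c
#define _GNU_SOURCE             /* semtimedop() is a Linux-specific extension */
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <time.h>

/* glibc leaves it to the caller to define the semctl() argument union
 * (hypothetical name, used by callers that initialise the set) */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/*
 * Hypothetical helper: decrement semaphores 0 and 1 of the set `semid`,
 * waiting at most `seconds` seconds.
 * Returns 0 when acquired, 1 on timeout, -1 on any other error.
 */
static int acquire_with_timeout(int semid, time_t seconds)
{
    struct sembuf ops[2] = {
        {0, -1, SEM_UNDO},      /* same operations as in the strace output */
        {1, -1, SEM_UNDO},
    };
    struct timespec limit = { .tv_sec = seconds, .tv_nsec = 0 };

    assert(seconds >= 0);

    if (semtimedop(semid, ops, 2, &limit) == 0)
        return 0;

    /* semtimedop() reports an expired time limit as EAGAIN */
    return (errno == EAGAIN) ? 1 : -1;
}
```

A caller could invoke `acquire_with_timeout(semid, 3)` and, when it returns 1, print a message and carry on without process accounting instead of blocking.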

@Atoptool
Owner

I do not understand part 1 of the issue: between claiming the semaphore in atopacctd and releasing it, there are no blocking calls. Even if atopacctd terminated after claiming the semaphore, the SEM_UNDO flag would take care of releasing the semaphore automatically.
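
The SEM_UNDO behaviour referred to here can be checked in isolation. The sketch below is an illustration only and does not use any atopacctd code: the function name `value_after_holder_exits` and the `union semun_arg` name are hypothetical. It uses a private one-element semaphore set, lets a child process claim the semaphore with SEM_UNDO and exit without releasing it, and then reads the value back in the parent.

```c
#include <assert.h>             /* only needed for the usage check below */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* glibc leaves it to the caller to define the semctl() argument union */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/*
 * Hypothetical demo: returns the semaphore value observed by the parent
 * after a child claimed it with SEM_UNDO and exited without releasing it,
 * or -1 if any step failed.
 */
static int value_after_holder_exits(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    union semun_arg arg = { .val = 1 };     /* one unit available */
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    if (pid == 0) {
        /* child: claim the semaphore and exit while still holding it */
        struct sembuf claim = {0, -1, SEM_UNDO};
        _exit(semop(semid, &claim, 1) == 0 ? 0 : 1);
    }

    int status = 0;
    int value = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        value = semctl(semid, 0, GETVAL);   /* kernel has applied the undo */

    semctl(semid, 0, IPC_RMID);             /* clean up the private set */
    return value;
}
```

On Linux, `assert(value_after_holder_exits() == 1);` should hold, because the kernel reverts the child's decrement when the child exits. Note that this only covers a holder that terminates; a holder that stays alive keeps the semaphore.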

@Atoptool
Owner

Is it possible for you to gain root privileges on the test system and run a system call trace with strace to see where atopacctd blocks?

@Zugschlus
Contributor Author

I currently don't even have shell access to the (only) test box that shows the behavior. I'm trying to find out whether atop 2.8.1 passes the test, as the timing is really tight for getting atop back into Debian testing (Debian is planning to freeze). I apologize for not having prioritized this properly.
