
[exporterhelper/batchsender] batchsender deadlock preventing shutdown #10255

Closed
timannguyen opened this issue May 29, 2024 · 0 comments · Fixed by #10258
Assignees
dmitryax
Labels
bug Something isn't working

Comments

@timannguyen
Contributor

timannguyen commented May 29, 2024

Describe the bug

Deadlock during batchsender shutdown:

  1. The main goroutine initiates shutdown.
  2. The main goroutine calls close(shutdownCh): https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/batch_sender.go#L217
  3. The ticker goroutine tries to acquire the lock: https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/batch_sender.go#L69
  4. A sending goroutine already holds the lock and is trying to flush: https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/batch_sender.go#L197
  5. Deadlock: the ticker goroutine can never acquire the lock, so it never drains resetTimerCh, while the sending goroutine is blocked pushing to resetTimerCh (see the sketch after this list).
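A minimal, standalone Go sketch of the same interaction (this is not the collector's code; mu, resetTimerCh, and shutdownCh only mirror the roles of the corresponding fields in batch_sender.go):

```go
package main

import (
	"sync"
	"time"
)

func main() {
	var mu sync.Mutex
	resetTimerCh := make(chan struct{}) // unbuffered, like the real reset channel
	shutdownCh := make(chan struct{})

	var wg sync.WaitGroup
	wg.Add(2)

	// "Sending" goroutine: holds the lock while flushing, then tries to
	// signal the ticker goroutine to reset its timer.
	go func() {
		defer wg.Done()
		mu.Lock()
		defer mu.Unlock()
		time.Sleep(50 * time.Millisecond) // slow flush, e.g. a slow mergeFunc
		resetTimerCh <- struct{}{}        // blocks: the only receiver is stuck below
	}()

	// "Ticker" goroutine: once shutdown starts, it wants the lock before it
	// ever gets back to draining resetTimerCh.
	go func() {
		defer wg.Done()
		<-shutdownCh
		mu.Lock() // blocks forever: the sending goroutine still holds the lock
		defer mu.Unlock()
		<-resetTimerCh
	}()

	time.Sleep(10 * time.Millisecond) // let the sending goroutine grab the lock first
	close(shutdownCh)                 // main goroutine initiates shutdown
	wg.Wait()                         // never returns
}
```

Run as-is, the program never exits; the Go runtime typically aborts it with "all goroutines are asleep - deadlock!".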

Steps to reproduce

I have a unit test that reproduces the issue: timannguyen@adf2fb9

The mergeFunc needs to take a bit of time inside sendMergeBatch during shutdown to trigger the deadlock; a sketch of such a slowed-down merge callback is below.
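A hypothetical illustration of that slow merge callback (the fakeRequest type and the function signature here are assumptions for illustration only, not the exporterbatcher API; the actual test is in the commit linked above):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fakeRequest is a stand-in for the exporter's request type.
type fakeRequest struct{ items int }

// slowMerge simulates a merge function slow enough that the flush triggered
// during shutdown keeps holding the batch lock while the ticker goroutine
// is waiting for it.
func slowMerge(_ context.Context, a, b fakeRequest) (fakeRequest, error) {
	time.Sleep(100 * time.Millisecond)
	return fakeRequest{items: a.items + b.items}, nil
}

func main() {
	merged, _ := slowMerge(context.Background(), fakeRequest{items: 1}, fakeRequest{items: 2})
	fmt.Println(merged.items) // 3
}
```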

What did you expect to see?

Shutdown to complete without a deadlock.

What did you see instead?

A deadlock: a sending goroutine holds the lock while the ticker goroutine is trying to acquire it, which prevents shutdown.

What version did you use?

pdata 1.7.0
otel 0.100.0

What config did you use?

Environment

macOS
Ubuntu

Additional context

@timannguyen timannguyen added the bug Something isn't working label May 29, 2024
@dmitryax dmitryax self-assigned this May 29, 2024
dmitryax added commits to dmitryax/opentelemetry-collector that referenced this issue May 29–30, 2024