[connector/failover] Failover connector erroneously flips back to lower priority pipelines #32094
Comments
I'm aware that the failover connector isn't official yet fwiw, but we're testing it internally so I figure I should create tickets for the issues we find
The fix is pretty simple - sinkingpoint@ad1b387 but it's 10pm and I'm too tired to write a proper test for it. I'll do so in the morning 😅
Thank you! This is especially helpful for newer components.
Hey @sinkingpoint, thanks for creating this issue. I also noticed this bug and was going to include the fix in my next PR. I think your fix might need another modification: with it, when the higher priority pipeline (let's call it level 1) comes back up, the connector will switch to level 1, but then switch one more time to level 2 and stay there. After the new currentIndex is set, it will proceed right into the next iteration of the loop and then the Something like this resolves this issue
Feel free to add a fix, or let me know if you'd like me to, thanks!
Ah, that makes more sense. If you've got a PR going then I'm happy to let you deal with it there :)
Description: This PR fixes a bug that caused the pipeline selector to continue switching between the stable and stable + 1 index after a new stable index has been established. Link to tracking Issue: Resolves #32094 Testing: Additional test case added to check current index after stable check --------- Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
**Description:** This PR adds the failover connector to the contrib distro and moves the component to alpha state as all MVP functionality has been put in place. This PR also fixes a bug that caused the pipeline selector to continue switching between the stable and stable + 1 index after a new stable index has been established. **Link to tracking Issue:** Resolves #32094 **Testing:** Additional test case added to check current index after stable check
Component(s)
connector/failover
What happened?
Description
The failover connector periodically retries higher priority pipelines that have failed, so that it can reinstate them as the stable pipeline should they start working again. We observe, however, that after it does so it switches back to the lower priority pipeline, even though the higher priority pipeline is working.
Steps to Reproduce
nc -l 127.0.0.1 4278 # the high priority exporter
Expected Result
The logs should be stably redirected to the high priority exporter once it comes back online
Actual Result
The logs flip-flop between the high and low priority exporters.
Investigation
Adding a bit more logging around pipeline decisions finds that the lower priority pipeline is being re-inserted at https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/connector/failoverconnector/internal/state/pipeline_selector.go#L105-L107
This is because the loop only terminates for pipelines after the current pipeline, so the current pipeline itself is still included in the iteration (https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/connector/failoverconnector/internal/state/pipeline_selector.go#L96-L98). This means that while the lower priority pipeline is active, the retry loop creates a job that makes it active again, even if a higher priority pipeline has been selected in the meantime.
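To make the flip-back easier to reason about, here is a minimal, deterministic sketch of a single retry pass. It is a simplified model and not the connector's actual code: `retryPass`, `startStable`, `healthy` and `stopOnRecovery` are hypothetical names introduced only for illustration, and the real selector is driven by a ticker rather than a plain loop. The sketch assumes index 0 is the highest priority pipeline.

```go
package main

import "fmt"

// retryPass is a simplified, hypothetical model of one retry pass.
// startStable is the index that was stable when the pass began, and
// healthy[i] reports whether pipeline i currently accepts data.
// It returns the index left active at the end of the pass and the
// order in which indices were made active.
func retryPass(startStable int, healthy []bool, stopOnRecovery bool) (int, []int) {
	current := startStable
	var visited []int
	// The <= bound models the behaviour described above: the pass also
	// visits the pipeline that was already active when it started.
	for i := 0; i <= startStable && i < len(healthy); i++ {
		current = i
		visited = append(visited, i)
		// Assumed mitigation: stop as soon as a higher priority
		// pipeline turns out to be healthy.
		if stopOnRecovery && healthy[i] {
			break
		}
	}
	return current, visited
}

func main() {
	healthy := []bool{true, true} // the high priority pipeline has recovered

	final, visited := retryPass(1, healthy, false)
	fmt.Println("without early stop:", final, visited)

	final, visited = retryPass(1, healthy, true)
	fmt.Println("with early stop:", final, visited)
}
```

Under this model, the pass without the early stop visits index 0 and then index 1, ending on the lower priority pipeline even though index 0 is healthy. With the early stop it ends on index 0. If index 0 is still unhealthy, both variants end on index 1, which is the intended fallback.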
Collector version
master (failover connector isn't released yet)
Environment information
Environment
OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")
OpenTelemetry Collector configuration
Log output
No response
Additional context
No response