Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock: Navigation stops when request interception is enabled #1072

Closed
10 tasks done
Tracked by #1138
inancgumus opened this issue Oct 18, 2023 · 2 comments · Fixed by #1078
Closed
10 tasks done
Tracked by #1138

Deadlock: Navigation stops when request interception is enabled #1072

inancgumus opened this issue Oct 18, 2023 · 2 comments · Fixed by #1078
Assignees
Labels
bug Something isn't working internal internal improvements and features network manager Issues and PRs related to network manager remote remote browser related

Comments

@inancgumus
Copy link
Member

inancgumus commented Oct 18, 2023

NetworkManager deadlocks while racing with two different browser network events:

  • onLoadingFailed
  • onRequestPaused

This issue only happens when we have outgoing requests that are in the k6-blocked hosts and IPs list. The outgoing requests won't let other communications with the browser go through. When we remove the network.GetResponseBody call (called by emitResponseMetrics when a request loading finishes or we're handling a redirect), the test run finishes.

We discovered how and why this problem occurs, but the underlying reason why this is happening has yet to be entirely apparent. Find a way to reproduce this behavior with a Go test, discover the underlying reason, report it, and fix the problem.

Update: Root cause discovered

The main page redirects to another page before finishing its loading. The redirected page gets blocked by the network interception (blocked hosts and IPs) analysis stage. The problem was that we needed to tell the redirected page to continue. But we couldn't because we were waiting for the main page to load. Hence the deadlock.

xk6-browser version

v1.1.0

Steps to reproduce the problem

  • Copy the script from the test run ID: 161224.
  • Run the test script locally with a blocklist (no matter which).

Relevant links

Tasks

  1. refactor
    inancgumus
  2. bug
    inancgumus
  3. area: browser bug
    inancgumus
  4. bug
    inancgumus
  5. inancgumus
@inancgumus inancgumus added the bug Something isn't working label Oct 18, 2023
@inancgumus inancgumus self-assigned this Oct 18, 2023
@inancgumus inancgumus changed the title Test runs never finish when a navigation exception occurs when the request blocking is enabled Test runs never finish when a navigation exception occurs and the request blocking is enabled Oct 18, 2023
@inancgumus inancgumus added the remote remote browser related label Oct 18, 2023
@inancgumus
Copy link
Member Author

inancgumus commented Oct 18, 2023

We receive and process the following and then send a getResponseBody:

<- {"method":"Network.loadingFinished","params":{"requestId":"51317.37","timestamp":24102.926893,"encodedDataLength":6759,"shouldReportCorbBlocking":false},"sessionId":"2D176DA231707745A2E7710694B2A890"}
-> {"id":147,"method":"Network.getResponseBody","params":{"requestId":"51317.37"}}

getResponseBody is intercepted by the network request blocker and waits for us to tell it to continue, but we never do because we're blocked in the same NetworkManager goroutine, calling getResponseBody in loadingFinished. The following is never processed:

<- {"method":"Network.requestWillBeSent","params": ...}
<- {"method":"Fetch.requestPaused","params":{"requestId":"interception-job-20.0" ...}
<- {"method":"Network.loadingFailed","params":{"requestId":"51317.38", ...}

We need to unblock the request in NetworkManager.onRequestPaused. When we run each network event in a different goroutine, the issue resolves.


So, the issue is getting the response body while the request is intercepted:

  • We can't get the response body while the request is blocked in NetworkManager.onLoadingFinished.
  • To continue the request, we need to handle the NetworkManager.onRequestPaused.

This deadlocks because both are on the same goroutine.

We can fix the issue by running onLoadingFinished in another goroutine:

case *network.EventLoadingFinished:
    go m.onLoadingFinished(ev)

This results in the following:

Network.loadingFinished: RequestID:"82697.70"
-> {"id":177,"sessionId":"12A27C27647CF8CA2F6D3178FB1415D0","method":"Network.getResponseBody","params":{"requestId":"82697.70"}}

Fetch.requestPaused: RequestID:"interception-job-33.0", NetworkID:"82697.71"
-> {"id":178,"sessionId":"12A27C27647CF8CA2F6D3178FB1415D0","method":"Fetch.continueRequest","params":{"requestId":"interception-job-33.0"}}
<- {"id":178,"result":{},"sessionId":"12A27C27647CF8CA2F6D3178FB1415D0"}

<- {"id":177,"result":{"body":"..."} ✅

@inancgumus inancgumus changed the title Test runs never finish when a navigation exception occurs and the request blocking is enabled Test runs never finish when request blocking is enabled Oct 18, 2023
@inancgumus inancgumus changed the title Test runs never finish when request blocking is enabled Fix request interception deadlock while navigation to another page Oct 23, 2023
@inancgumus inancgumus changed the title Fix request interception deadlock while navigation to another page Deadlock: Navigation stops when request interception is enabled Oct 23, 2023
@inancgumus inancgumus added the network manager Issues and PRs related to network manager label Oct 23, 2023
@inancgumus
Copy link
Member Author

inancgumus commented Oct 25, 2023

We've solved this issue in #1078. You can read about the root cause in Update: Root cause discovered heading in this issue's description.

However, the solution can be suboptimal since it uses Go routines for each response metric emission. Although these goroutines should be short-lived and won't cause any performance issues down the line, stack traces can be confusing. Especially when something goes wrong, we want to analyze the logs (since goroutines dump their state on panic).

We can find a better solution that carefully tracks network events. Then we can postpone fetching the body and do that when the correct time comes. I'll create another issue to find a better solution.

cc: @grafana/k6-browser

inancgumus added a commit that referenced this issue Oct 26, 2023
This is a prior to fixing the bug #1072.
inancgumus added a commit that referenced this issue Oct 26, 2023
This is a prior to fixing the bug #1072.
inancgumus added a commit that referenced this issue Oct 26, 2023
This is a prior to fixing the bug #1072.
inancgumus added a commit that referenced this issue Oct 26, 2023
This is a prior to fixing the bug #1072.
inancgumus added a commit to grafana/k6 that referenced this issue Oct 27, 2023
inancgumus added a commit to grafana/k6 that referenced this issue Oct 27, 2023
inancgumus added a commit that referenced this issue Oct 27, 2023
This is a prior to fixing the bug #1072.
inancgumus added a commit that referenced this issue Oct 31, 2023
This is a prior to fixing the bug #1072.
@inancgumus inancgumus added the internal internal improvements and features label Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working internal internal improvements and features network manager Issues and PRs related to network manager remote remote browser related
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant