Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock one single iteration test #1253

Open
Tracked by #1138
ankur22 opened this issue Mar 21, 2024 · 5 comments
Open
Tracked by #1138

Deadlock one single iteration test #1253

ankur22 opened this issue Mar 21, 2024 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@ankur22
Copy link
Collaborator

ankur22 commented Mar 21, 2024

Test run: 2568554, 2571988 and 2591015, 2840522.

@ankur22 ankur22 added the bug Something isn't working label Mar 21, 2024
@inancgumus
Copy link
Member

inancgumus commented Mar 29, 2024

Any idea why don't we use vu.Context() here?

func (m *RootModule) NewModuleInstance(vu k6modules.VU) k6modules.Instance {
    // ...
                browserRegistry: newBrowserRegistry(
                    context.Background(),   // why isn't this vu.Context()?
...

When I looked into this stack trace (of test run ID 2591015), it's clear that Sesssion's and event handler's loops get stuck. The root cause is probably something else, but they still look suspicious to me.

stderr_contents	
Logger format: RAW SIGQUIT: quit PC=0x40720e m=0 sigcode=0

goroutine 1 [chan receive, 38 minutes]:
    runtime.gopark(0xc000bf0830?, 0x4105e5?, 0x80?, 0x44?, 0x40?)
    runtime/proc.go:398 +0xce fp=0xc000bf07c8 sp=0xc000bf07a8 pc=0x43e1ee
    runtime.chanrecv(0xc00117af00, 0xc000bf0a28, 0x1)
    runtime/chan.go:583 +0x3cd fp=0xc000bf0840 sp=0xc000bf07c8 pc=0x409bcd
    runtime.chanrecv1(0x1b97aa0?, 0xc000c3f8f0?)
    runtime/chan.go:442 +0x12 fp=0xc000bf0868 sp=0xc000bf0840 pc=0x4097d2
        go.k6.io/k6/execution.(*Scheduler).Run(0xc000884000, {0x1b97ad8?, 0xc0002de230}, {0x1b97aa0, 0xc0004fc6c0}, 0xc00218e360)
        go.k6.io/k6/execution/scheduler.go:508 +0xd9d fp=0xc000bf0e48 sp=0xc000bf0868 pc=0xa4e7dd
        go.k6.io/k6/cmd.(*cmdRun).run(0xc000120570, 0xc0004c2a00, {0xc0004d0300?, 0x1, 0x62})
        go.k6.io/k6/cmd/run.go:395 +0x1671 fp=0xc000bf1c48 sp=0xc000bf0e48 pc=0x143ed91

        github.com/grafana/xk6-browser/common.(*Session).readLoop(0xc00c39b0e0)
        github.com/grafana/xk6-browser@v1.4.3/common/session.go:83 +0x99 fp=0xc005dfd7c8 sp=0xc005dfd6b8 pc=0x1243e19
        github.com/grafana/xk6-browser/common.NewSession.func1()
        github.com/grafana/xk6-browser@v1.4.3/common/session.go:47 +0x25 fp=0xc005dfd7e0 sp=0xc005dfd7c8 pc=0x1243b25
    runtime.goexit()
    runtime/asm_amd64.s:1650 +0x1 fp=0xc005dfd7e8 sp=0xc005dfd7e0 pc=0x471421 created by
        github.com/grafana/xk6-browser/common.NewSession in

goroutine 279771
    github.com/grafana/xk6-browser@v1.4.3/common/session.go:47 +0x365

goroutine 272270 [select, 6 minutes]:
    runtime.gopark(0xc011aa7788?, 0x2?, 0x50?, 0x76?, 0xc011aa777c?)
    runtime/proc.go:398 +0xce fp=0xc011aa7628 sp=0xc011aa7608 pc=0x43e1ee
    runtime.selectgo(0xc011aa7788, 0xc011aa7778, 0xc026c116e0?, 0x0, 0xc001f00860?, 0x1)
    runtime/select.go:327 +0x725 fp=0xc011aa7748 sp=0xc011aa7628 pc=0x44e625
        github.com/grafana/xk6-browser/common.(*BaseEventEmitter).syncAll(0xc0021badc0, {0x1b97ad8, 0xc000d51cc0})
        github.com/grafana/xk6-browser@v1.4.3/common/event_emitter.go:126 +0xa5 fp=0xc011aa77b8 sp=0xc011aa7748 pc=0x11ed0a5
        github.com/grafana/xk6-browser/common.NewBaseEventEmitter.func1()
        github.com/grafana/xk6-browser@v1.4.3/common/event_emitter.go:116 +0x28 fp=0xc011aa77e0 sp=0xc011aa77b8 pc=0x11ecfc8
    runtime.goexit()
    runtime/asm_amd64.s:1650 +0x1 fp=0xc011aa77e8 sp=0xc011aa77e0 pc=0x471421 created by
        github.com/grafana/xk6-browser/common.NewBaseEventEmitter in

goroutine 271907
    github.com/grafana/xk6-browser@v1.4.3/common/event_emitter.go:116 +0x199 rax 0xfffffffffffffffc rbx 0x4 rcx 0x40720e rdx 0x80 rdi 0x4 rsi 0x7ffff87436d8 rbp 0x7ffff8743650 rsp 0x7ffff8743608 r8 0x0 r9 0x0 r10 0x31 r11 0x246 r12 0x7ffff87436e0 r13 0xc00006b744 r14 0x29be0a0 r15 0x0 rip 0x40720e rflags 0x246 cs 0x33 fs 0x0 gs 0x0

Even if we use VU context, why would k6-core cancel the context is another question, though.... But it might still be better than getting nothing out of the errors. At least, we can log something when the context is canceled due to timeout, etc.

These components (session and event loop) are suspicious because there are no other goroutines left but them. There is probably some bug in them. They probably both want to receive it, but nobody sends something to the channels they listen to. Maybe we can build some circuitry in the module so that when all subcomponents finish, we can stop running these two.

@ankur22
Copy link
Collaborator Author

ankur22 commented Apr 2, 2024

Any idea why don't we use vu.Context() here?

Here's the reason why we're using a background context.

I still need to take a deep dive into your findings though to align properly. Thanks for investigating @inancgumus!

@ankur22
Copy link
Collaborator Author

ankur22 commented Apr 2, 2024

These components (session and event loop) are suspicious because there are no other goroutines left but them. There is probably some bug in them. They probably both want to receive it, but nobody sends something to the channels they listen to. Maybe we can build some circuitry in the module so that when all subcomponents finish, we can stop running these two.

Yep, I agree with the idea of building something in that can force close any remaining goroutines. Maybe, If a goroutine isn't listening to a context, it could register itself to a global goroutineRegister that can be iterated over from when the exitEvent is received, which would call a new forceClose method.

The forceClose method could close a forceClosed channel which these goroutines would need to select on. I prefer the idea of adding a new method and channel over reusing existing channels so that it's more explicit that the goroutine is being force to close, which will also help us debug the issue and find a more long term solution to the problem.

@ankur22
Copy link
Collaborator Author

ankur22 commented May 30, 2024

I found a stack trace which might be useful in helping to diagnose the issue or at least use a symptom to link failures to this issue:

goroutine 0 gp=0x29f8360 m=0 mp=0x29f8d40 [idle]:
runtime.futex(0x29f8e80, 0x80, 0x0, 0x0, 0x0, 0x0)
    runtime/sys_linux_amd64.s:557 +0x21 fp=0x7fff3ff27120 sp=0x7fff3ff27118 pc=0x47a6a1
runtime.futexsleep(0x7fff3ff27198?, 0x410110?, 0x13ff271c8?)
    runtime/os_linux.go:69 +0x30 fp=0x7fff3ff27170 sp=0x7fff3ff27120 pc=0x43a5d0
runtime.notesleep(0x29f8e80)
    runtime/lock_futex.go:170 +0x87 fp=0x7fff3ff271a8 sp=0x7fff3ff27170 pc=0x4102a7
runtime.mPark(...)
    runtime/proc.go:1761
runtime.stopm()
    runtime/proc.go:2782 +0x8c fp=0x7fff3ff271d8 sp=0x7fff3ff271a8 pc=0x44514c
runtime.findRunnable()
    runtime/proc.go:3512 +0xd5f fp=0x7fff3ff27350 sp=0x7fff3ff271d8 pc=0x446cbf
runtime.schedule()
    runtime/proc.go:3868 +0xb1 fp=0x7fff3ff27388 sp=0x7fff3ff27350 pc=0x447d91
runtime.park_m(0xc00058c1c0)
    runtime/proc.go:4036 +0x1ec fp=0x7fff3ff273e0 sp=0x7fff3ff27388 pc=0x44836c
runtime.mcall()
    runtime/asm_amd64.s:458 +0x4e fp=0x7fff3ff273f8 sp=0x7fff3ff273e0 pc=0x4769ce

goroutine 1 gp=0xc0000061c0 m=nil [chan receive, 38 minutes]:
runtime.gopark(0x29f8d40?, 0xc0013a49a0?, 0x0?, 0x0?, 0x0?)
    runtime/proc.go:402 +0xce fp=0xc0005aa990 sp=0xc0005aa970 pc=0x440b4e
runtime.chanrecv(0xc001e29d40, 0xc000efcb80, 0x1)
    runtime/chan.go:583 +0x3bf fp=0xc0005aaa08 sp=0xc0005aa990 pc=0x40a71f
runtime.chanrecv1(0x1bb81d0?, 0xc0008a9230?)
    runtime/chan.go:442 +0x12 fp=0xc0005aaa30 sp=0xc0005aaa08 pc=0x40a332
go.k6.io/k6/execution.(*Scheduler).Run(0xc0003f8000, {0x1bb8208, 0xc00009b400}, {0x1bb81d0, 0xc0003c2ba0}, 0xc00232b4a0)
    go.k6.io/k6/execution/scheduler.go:508 +0xd3e fp=0xc0005aafc8 sp=0xc0005aaa30 pc=0xa5cd5e
go.k6.io/k6/cmd.(*cmdRun).run(0xc000050820, 0xc0001c78c8, {0xc0001f0008, 0x1, 0x32})
    go.k6.io/k6/cmd/run.go:395 +0x15f1 fp=0xc0005abc50 sp=0xc0005aafc8 pc=0x1451931
go.k6.io/k6/cmd.(*cmdRun).run-fm(0xc0001e8800?, {0xc0001f0008?, 0x4?, 0x17ec151?})
    <autogenerated>:1 +0x34 fp=0xc0005abc88 sp=0xc0005abc50 pc=0x145c534
github.com/spf13/cobra.(*Command).execute(0xc0001c78c8, {0xc0
... omitting 1312447 bytes ...
    runtime/proc.go:402 +0xce fp=0xc009823598 sp=0xc009823578 pc=0x440b4e
runtime.selectgo(0xc009823790, 0xc009823700, 0x180b369?, 0x0, 0xc009823780?, 0x1)
    runtime/select.go:327 +0x725 fp=0xc0098236b8 sp=0xc009823598 pc=0x4528e5
github.com/grafana/xk6-browser/common.(*Session).readLoop(0xc01087f040)
    github.com/grafana/xk6-browser@v1.5.1/common/session.go:83 +0x99 fp=0xc0098237c8 sp=0xc0098236b8 pc=0x128dcd9
github.com/grafana/xk6-browser/common.NewSession.gowrap1()
    github.com/grafana/xk6-browser@v1.5.1/common/session.go:47 +0x25 fp=0xc0098237e0 sp=0xc0098237c8 pc=0x128d9e5
runtime.goexit({})
    runtime/asm_amd64.s:1695 +0x1 fp=0xc0098237e8 sp=0xc0098237e0 pc=0x478881
created by github.com/grafana/xk6-browser/common.NewSession in goroutine 6444541
    github.com/grafana/xk6-browser@v1.5.1/common/session.go:47 +0x35d

goroutine 6447058 gp=0xc00576e1c0 m=nil [select, 9 minutes]:
runtime.gopark(0xc006d78f90?, 0x2?, 0x90?, 0xf4?, 0xc006d78f04?)
    runtime/proc.go:402 +0xce fp=0xc006d78d98 sp=0xc006d78d78 pc=0x440b4e
runtime.selectgo(0xc006d78f90, 0xc006d78f00, 0x0?, 0x0, 0x0?, 0x1)
    runtime/select.go:327 +0x725 fp=0xc006d78eb8 sp=0xc006d78d98 pc=0x4528e5
github.com/grafana/xk6-browser/common.(*Session).readLoop(0xc0052214a0)
    github.com/grafana/xk6-browser@v1.5.1/common/session.go:83 +0x99 fp=0xc006d78fc8 sp=0xc006d78eb8 pc=0x128dcd9
github.com/grafana/xk6-browser/common.NewSession.gowrap1()
    github.com/grafana/xk6-browser@v1.5.1/common/session.go:47 +0x25 fp=0xc006d78fe0 sp=0xc006d78fc8 pc=0x128d9e5
runtime.goexit({})
    runtime/asm_amd64.s:1695 +0x1 fp=0xc006d78fe8 sp=0xc006d78fe0 pc=0x478881
created by github.com/grafana/xk6-browser/common.NewSession in goroutine 6444541
    github.com/grafana/xk6-browser@v1.5.1/common/session.go:47 +0x35d

@inancgumus
Copy link
Member

Happened again: 2840522

@inancgumus inancgumus self-assigned this Jul 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants