NATS Jetstream MQT goes in a loop if function returns code other than 200 #118

Open
umeshgtank opened this issue Feb 16, 2023 · 1 comment
Labels
bug Something isn't working


Fission/Kubernetes version
Fission version 1.17 / Kubernetes version 1.24

$ fission --version
client:
  fission/core:
    BuildDate: "2022-09-16T13:24:57Z"
    GitCommit: b36e0516
    Version: v1.17.0
server:
  fission/core:
    BuildDate: "2022-09-16T13:24:57Z"
    GitCommit: b36e0516
    Version: v1.17.0

$ kubectl version
Client Version: v1.25.3
Kustomize Version: v4.5.7
Server Version: v1.24.8

Kubernetes platform (e.g. Google Kubernetes Engine)
On-prem

Describe the bug

I am building a data pipeline whose workflow looks like Producer -> NATS Jetstream -> MQT -> Consumer. I am following the Fission documentation available here - https://fission.io/docs/usage/triggers/message-queue-trigger-kind-keda/nats-jetstream/#producer-function.
While testing the workflow, I found that if the consumer function returns an error (400 in my case), the MQT keeps calling the consumer function in a loop with the same message and never stops. To reproduce the issue, all you need to do is return 400 from the handler function in hello.go. Investigating further, I came across the keda-connectors code for NATS Jetstream, available here (https://github.com/fission/keda-connectors/blob/main/nats-jetstream-http-connector/main.go). As the code shows, the handleHTTPRequest function acks messages received from Jetstream only if the HTTP request is successful. On failure it does not send an ack to Jetstream. According to the Jetstream documentation (see https://docs.nats.io/nats-concepts/jetstream/consumers), if the server does not receive an ack within the AckWait time, Jetstream redelivers the message. Since the redelivered message also results in an error (the request is still bad), this goes into a loop.
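
The loop described above can be illustrated with a small, self-contained simulation. This is not the connector's code: `deliver`, `ackOnSuccess`, and the `maxRounds` cap are hypothetical names, and treating any 2xx status as success is an assumption. A real Jetstream consumer with no delivery limit would keep going where this sketch stops.

```go
package main

import "fmt"

// ackOnSuccess mirrors the behaviour described in the issue: ack only when
// the HTTP call succeeded. Treating every 2xx as success is an assumption.
func ackOnSuccess(status int) bool {
	return status >= 200 && status < 300
}

// deliver simulates redelivery of a single message. The consumer is called
// again and again until a response leads to an ack. maxRounds exists only so
// that the simulation terminates.
func deliver(consumer func() int, maxRounds int) (attempts int, acked bool) {
	for attempts < maxRounds {
		attempts++
		if ackOnSuccess(consumer()) {
			return attempts, true
		}
		// No ack was sent: after AckWait the server would redeliver
		// the same message, which is the next loop iteration here.
	}
	return attempts, false
}

func main() {
	alwaysBad := func() int { return 400 } // consumer that always rejects
	attempts, acked := deliver(alwaysBad, 5)
	fmt.Printf("attempts=%d acked=%v\n", attempts, acked) // attempts=5 acked=false
}
```

With a consumer that always returns 400, every round ends without an ack, so the only thing that stops the simulation is the artificial cap.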

To Reproduce
Return 400 from the handler function in hello.go. The sample is available here - https://fission.io/docs/usage/triggers/message-queue-trigger-kind-keda/nats-jetstream/#producer-function.

Expected result

The MQT should not go into a never-ending loop.

Actual result
The MQT keeps calling the consumer function again and again with the same message.

Screenshots/Dump file

$ fission support dump

Additional context

A potential fix may be to ack the message regardless of success or failure. Failure scenarios are already handled by Fission in two ways: first by retrying, and then, if the retries also fail, by pushing the message to the error queue. So I believe it would be safe to ack as soon as the message is received from Jetstream. The bigger problem is that if authentication fails, the function never gets a chance to execute because the router returns an auth failure error. In that scenario the loop is unavoidable.
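
The proposal can be sketched as follows, under stated assumptions. `handle`, `result`, and the callback parameters are hypothetical names for illustration, not connector or Fission APIs; the retry count and the 2xx success check are assumptions as well. The message is acked up front, then retried, and sent to the error queue if every attempt fails.

```go
package main

import "fmt"

// result records what happened to one message (illustrative type).
type result struct {
	Acked            bool
	Attempts         int
	SentToErrorQueue bool
}

// handle acks the message as soon as it is received, then calls the consumer
// up to maxAttempts times. If no attempt succeeds, the payload is handed to
// toErrorQueue. Any 2xx status counts as success (an assumption).
func handle(payload string, maxAttempts int, ack func(),
	call func(string) int, toErrorQueue func(string)) result {
	ack() // ack first, so the server never redelivers this message
	res := result{Acked: true}
	for res.Attempts < maxAttempts {
		res.Attempts++
		status := call(payload)
		if status >= 200 && status < 300 {
			return res
		}
	}
	toErrorQueue(payload)
	res.SentToErrorQueue = true
	return res
}

func main() {
	var errorQueue []string
	res := handle("order-1", 3,
		func() {},                       // ack stub
		func(string) int { return 400 }, // consumer that always rejects
		func(p string) { errorQueue = append(errorQueue, p) })
	fmt.Printf("%+v errorQueue=%v\n", res, errorQueue)
}
```

Because a 401 from the router is just another non-2xx status here, the auth failure case would also end in the error queue instead of a loop. The trade-off is that acking before processing gives up redelivery if the process crashes between the ack and the final attempt.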


sjk7524068 commented Mar 2, 2023

Big issue indeed. The same bug occurs with the KEDA RabbitMQ connector: unacked messages keep being consumed in a loop without ever being acked or moved.

@NikhilSharmaWe NikhilSharmaWe added the bug Something isn't working label Apr 12, 2023