Unable to fetch queue messages #147

heretogo · 2022-11-03T16:28:55Z

Hello. I installed the WPA using the script in hack/install.sh.

I am encountering the following error, which I believe is permissions- or
namespace-related. I am running a v1.23 cluster in Amazon EKS.

E1103 15:50:14.855014       1 sqs.go:406] Unable to fetch no of messages to the queue "queue", Client not found for queue: https://sqs.us-east-1.amazonaws.com/myAccount/myQueueName

The WPA scaler runs in the kube-system namespace, and the WPA and example deployment run in a test namespace called eks-sample-app.
The WPA queueURI was configured manually using kubectl edit.

$ k get pods -n kube-system
NAME                                   READY   STATUS    RESTARTS   AGE
workerpodautoscaler-8667d55684-9zs6l   1/1     Running   0          72m

$ k get wpa -n eks-sample-app
NAME          AGE
example-wpa   85m

$ k get deployment -n eks-sample-app
NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
example-deployment            1/1     1            1           7m34s

I have attached the following policy to the cluster service role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "WPA",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData",
                "sqs:ReceiveMessage",
                "sqs:GetQueueAttributes"
            ],
            "Resource": "*"
        }
    ]
}

Any idea how to proceed in debugging this?
I have looked through the documentation in the repo's README.md.

An unrelated note: what is the context for the WPA Controller section of the
docs? In which context would workerpodautoscaler run be invoked? Is this a standalone
binary?

alok87 · 2022-11-04T06:46:48Z

Is it possible to share the complete log?

heretogo · 2022-11-04T12:37:56Z

Hi @alok87: please see the example included in my issue. The WPA starts emitting "Unable to fetch no of messages" errors as soon as the container starts. There are no other kinds of log messages.

Note that I have anonymized the account number and queue name.

E1104 12:42:26.926463       1 sqs.go:406] Unable to fetch no of messages to the queue "queue", Client not found for queue: https://sqs.us-east-1.amazonaws.com/<account>/queue
.
E1104 12:42:26.926476       1 sqs.go:406] Unable to fetch no of messages to the queue "queue", Client not found for queue: https://sqs.us-east-1.amazonaws.com/<account>/queue
.
E1104 12:42:26.926498       1 sqs.go:406] Unable to fetch no of messages to the queue "queue", Client not found for queue: https://sqs.us-east-1.amazonaws.com/<account>/queue
.
E1104 12:42:26.926513       1 sqs.go:406] Unable to fetch no of messages to the queue "queue", Client not found for queue: https://sqs.us-east-1.amazonaws.com/<account>/queue
.
E1104 12:42:26.926527       1 sqs.go:406] Unable to fetch no of messages to the queue "queue", Client not found for queue: https://sqs.us-east-1.amazonaws.com/<account>/queue
.
E1104 12:42:26.926540       1 sqs.go:406] Unable to fetch no of messages to the queue "queue", Client not found for queue: https://sqs.us-east-1.amazonaws.com/<account>/queue
.
E1104 12:42:26.926552       1 sqs.go:406] Unable to fetch no of messages to the queue "queue", Client not found for queue: https://sqs.us-east-1.amazonaws.com/<account>/queue

alok87 · 2022-11-04T12:55:50Z

Does the queue exist in SQS? Is it possible to try an SQS client with the same credentials and see whether data comes back?

I just want to rule out a configuration issue first.
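
One way to run that check is with the AWS CLI from a shell that uses the same credentials as the WPA pod. This is only a sketch: the queue URL is the anonymized placeholder from the logs above, and the function names (`queue_region`, `whoami_aws`, `queue_depth`) are made up for illustration.

```shell
#!/bin/sh
# Sketch only. QUEUE_URL defaults to the anonymized placeholder from this
# issue; replace <account> and the queue name with real values before running.
QUEUE_URL="${QUEUE_URL:-https://sqs.us-east-1.amazonaws.com/<account>/queue}"

# Hypothetical helper: pull the region out of an SQS queue URL of the form
# https://sqs.<region>.amazonaws.com/<account>/<queue>.
queue_region() {
    printf '%s\n' "$1" | sed -n 's#^https://sqs\.\([^.]*\)\.amazonaws\.com/.*#\1#p'
}

# Show which IAM identity the current credentials resolve to.
whoami_aws() {
    aws sts get-caller-identity
}

# Ask SQS for the approximate queue depth using those same credentials.
queue_depth() {
    aws sqs get-queue-attributes \
        --region "$(queue_region "$QUEUE_URL")" \
        --queue-url "$QUEUE_URL" \
        --attribute-names ApproximateNumberOfMessages ApproximateNumberOfMessagesNotVisible
}
```

Calling `whoami_aws` and then `queue_depth` from somewhere that picks up the node's role shows whether that role can read the queue at all. An access-denied error at this step would point at IAM rather than at WPA itself.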

heretogo · 2022-11-14T20:44:55Z

I generated temporary credentials manually using AssumeRole. I believe it is working now.

Previously my node's role permissions included the following as per this policy:

"cloudwatch:GetMetricData"
"sqs:GetQueueAttributes"
"sqs:ReceiveMessage"

It was resolved by granting all read permissions on SQS:

"cloudwatch:GetMetricData"
"sqs:GetQueueAttributes"
"sqs:GetQueueUrl"
"sqs:ListDeadLetterSourceQueues"
"sqs:ListQueueTags"
"sqs:ListQueues"
"sqs:ReceiveMessage"

My restarted WPA no longer logs any errors.
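
For reference, the expanded action list above could be written out as a full policy document along these lines. It is a sketch that reuses the `Sid` and wildcard `Resource` from the policy in the original post; this thread does not establish which of the added actions was actually the missing one, and the function name `wpa_policy_json` is hypothetical.

```shell
#!/bin/sh
# Print the expanded policy to stdout. Sid and Resource are copied from the
# original policy in this issue; the actions are the ones reported to work.
wpa_policy_json() {
    cat <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "WPA",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData",
                "sqs:GetQueueAttributes",
                "sqs:GetQueueUrl",
                "sqs:ListDeadLetterSourceQueues",
                "sqs:ListQueueTags",
                "sqs:ListQueues",
                "sqs:ReceiveMessage"
            ],
            "Resource": "*"
        }
    ]
}
EOF
}

# Example (not run here): save the document, then attach it to the node role.
#   wpa_policy_json > wpa-policy.json
#   aws iam put-role-policy --role-name <node-role> --policy-name WPA \
#       --policy-document file://wpa-policy.json
```

`<node-role>` is a placeholder for whichever role the nodes actually use. Narrowing `Resource` to the specific queue ARN would be a reasonable follow-up once the minimal set of actions is known.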

alok87 · 2022-11-15T07:20:21Z

Can we close this?

alok87 · 2022-11-15T07:21:23Z

Do you think we should update the policy section of the docs here? https://github.com/practo/k8s-worker-pod-autoscaler#install

heretogo · 2022-11-15T20:42:34Z

I feel like there may be something else missing.

Even though I get no permissions errors, I am unable to trigger a scaling operation on the deployment. Any ideas?

I have 10000+ messages in the queue and only one deployment pod running.

k get pods
NAME                                 READY   STATUS    RESTARTS   AGE
example-deployment-795d868d4-8nzfv   1/1     Running   0          7m19s

Does the WPA require some kind of write or tag attributes?

I can submit a PR for the documentation once I confirm this is working.

alok87 · 2022-11-17T06:55:49Z

WPA supports log verbosity levels; maybe try running it with -v=4.

Also share the output of the WPA YAML:

k get wpa <wpa_object> -o yaml

Check whether the deployment replicas change with the queue length.
Check whether the queue length in AWS shows the 10000+ messages. A screenshot of the SQS metrics posted here would help.
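
The checks above could be scripted roughly as follows. This is a sketch: the namespace, WPA, and deployment names come from the original post, while the controller Deployment name `workerpodautoscaler` is an assumption inferred from the pod name shown there, and the function names are made up for illustration.

```shell
#!/bin/sh
# Names taken from the original post; the controller Deployment name is an
# assumption inferred from the pod name workerpodautoscaler-8667d55684-9zs6l.
APP_NS="${APP_NS:-eks-sample-app}"
WPA_NAME="${WPA_NAME:-example-wpa}"
DEPLOYMENT="${DEPLOYMENT:-example-deployment}"

# Dump the full WPA object, including its spec and status.
show_wpa() {
    kubectl get wpa "$WPA_NAME" -n "$APP_NS" -o yaml
}

# Print the replica count currently requested on the target Deployment.
show_replicas() {
    kubectl get deployment "$DEPLOYMENT" -n "$APP_NS" -o jsonpath='{.spec.replicas}'
}

# Tail recent controller logs (more detail should appear once -v=4 is set).
show_wpa_logs() {
    kubectl logs -n kube-system deployment/workerpodautoscaler --tail=100
}
```

Running `show_replicas` before and after adding messages to the queue shows whether the WPA is changing the Deployment at all. Raising verbosity would presumably mean adding `-v=4` to the controller container's arguments, for example via `kubectl edit deployment workerpodautoscaler -n kube-system`, but the exact argument layout depends on the installed manifest.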
