Significant CPU usage and possibly etcd usage when deploying this #447

drewwells · 2024-06-10T15:57:33Z

We noticed our etcd storage usage doubled after a production release that included deploying reflector. Is there an architecture document describing how this service watches for object changes and decides which API calls to make to the Kubernetes API server?

We have one configmap that rarely changes. These are the labels and annotations on it:

metadata:
  annotations:
    checksum/configmap: 4420642124fb6c99affe13e8904ba3ede9bee1d41edc0df8a50696833fe15fca
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
  creationTimestamp: "2024-04-16T20:07:50Z"
  labels:
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"

Here's the CPU and memory usage of reflector:

❯ k -n reflector top po --containers                                                                                        🗑️  env-2a
POD                          NAME        CPU(cores)   MEMORY(bytes)
reflector-5bc45489b8-k9g7f   reflector   1423m        332Mi


drewwells · 2024-06-10T16:10:03Z

I see this happening every 3 seconds. Does this service act as a watcher, watching for changes in the cluster? Can we add labels so it only looks at specific configmaps instead of all of them?

[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:10.306 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:00:03.3548677. Faulted: False.
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:10.306 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:13.336 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapMirror) Auto-reflected feature-flag/ff-feature-flag where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 299.
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:13.343 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:00:03.0375045. Faulted: False.
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:13.343 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:15.826 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:00:02.4830903. Faulted: False.
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:15.826 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:19.845 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapMirror) Auto-reflected feature-flag/ff-feature-flag where permitted. Created 0 - Updated 0 - Deleted 0 - Validated 299.
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:19.859 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:00:04.0327364. Faulted: False.
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:19.859 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources
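The restart interval in the log above can be measured rather than eyeballed. The following is a minimal sketch, assuming log lines in exactly the format shown above; `session_durations` and `SAMPLE` are hypothetical names used for illustration and are not part of reflector.

```python
import re
from datetime import timedelta

# Matches the "Session closed" lines written by ConfigMapWatcher in the log above.
SESSION_RE = re.compile(
    r"Session closed\. Duration: (\d+):(\d+):(\d+(?:\.\d+)?)\. Faulted: (True|False)\."
)


def session_durations(log_text):
    """Return one (duration, faulted) tuple per 'Session closed' line."""
    sessions = []
    for match in SESSION_RE.finditer(log_text):
        hours, minutes, seconds, faulted = match.groups()
        duration = timedelta(
            hours=int(hours), minutes=int(minutes), seconds=float(seconds)
        )
        sessions.append((duration, faulted == "True"))
    return sessions


# Two lines copied from the log excerpt above.
SAMPLE = """\
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:10.306 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:00:03.3548677. Faulted: False.
[reflector-5bc45489b8-k9g7f] 2024-06-10 16:08:13.343 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:00:03.0375045. Faulted: False.
"""

if __name__ == "__main__":
    sessions = session_durations(SAMPLE)
    total = sum((d for d, _ in sessions), timedelta())
    print("sessions:", len(sessions))
    print("mean seconds:", total.total_seconds() / len(sessions))
    print("any faulted:", any(faulted for _, faulted in sessions))
```

Feeding the full pod log through the same function would show whether every session ends after a few seconds or only some of them do, and whether any are marked as faulted.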

winromulus · 2024-06-10T16:59:35Z

@drewwells reflector opens a watcher with a default timeout (in k8s) of around 40 minutes. The fact that the connection closes every 3 seconds is extremely odd. I would need to know more about the setup. Also, are you sure you didn't set the timeout to something like 3 seconds in the configuration?
Please add more details about how k8s is hosted, and whether it's standard k8s or some other variant.

drewwells · 2024-06-10T18:02:48Z

Nothing special about the cluster; it's running v1.25.10.

winromulus · 2024-06-10T19:09:39Z

@drewwells Is this standard k8s or some other flavor (like k3s)? Also, are you self-hosting or using a cloud provider?

drewwells · 2024-06-10T19:53:04Z

It's deployed with kops and the nodes are hosted on AWS. Hmm, CPU usage is vastly different across clusters. The only consistent thing is significant etcd storage usage, about 2x what it was before deploying the service.

# staging environment
❯ k -n reflector top po --containers                                                                                                                   
POD                         NAME        CPU(cores)   MEMORY(bytes)
reflector-c786c5fb4-jmqg9   reflector   4m           163Mi

# dev environment
❯ k -n reflector top po --containers                                                                                                                  
POD                          NAME        CPU(cores)   MEMORY(bytes)
reflector-5bc45489b8-k9g7f   reflector   1154m        322Mi

zzjin · 2024-06-28T04:33:39Z

Same issue here:

# env1
POD                          NAME        CPU(cores)   MEMORY(bytes)
reflector-5dddff7688-rp6tx   reflector   1403m        205Mi

# env2
POD                          NAME        CPU(cores)   MEMORY(bytes)
reflector-64dcc58c5f-wrh8q   reflector   2636m        370Mi

What I found is that CPU usage is high when there are many secrets/configmaps.

> kubectl get secrets -A | wc -l

# env1
24121

# env2
88112

Both clusters do one and the same thing: copy one namespace's TLS secret to other namespaces.
This means env1 has one base TLS secret and more than 20,000 reflected secrets, and env2 has one base TLS secret and more than 80,000 reflected secrets.
The base secret rarely changes (it is renewed roughly every 90 days).

  annotations:
    cert-manager.io/alt-names: "*.example.io,example.io"
    cert-manager.io/certificate-name: wildcard-example-io
    cert-manager.io/common-name: example.io
    cert-manager.io/ip-sans: ""
    cert-manager.io/issuer-group: ""
    cert-manager.io/issuer-kind: ClusterIssuer
    cert-manager.io/issuer-name: cluster-issuer-example
    cert-manager.io/uri-sans: ""
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
    reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: \w+-system,\w+-frontend,ns-[\-a-z0-9]*
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
    reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: \w+-system,\w+-frontend,ns-[\-a-z0-9]*
  labels:
    controller.cert-manager.io/fao: "true"

IMO, the reflector controller only needs to monitor one namespace's secret and copy it to the others when it changes.
I wonder why CPU usage is related to the total number of secrets in the cluster?

Kubernetes is a standard deployment on GCP VMs.
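To make the scale concrete, here is a minimal sketch of how the comma-separated patterns in the `reflection-auto-namespaces` annotation above select target namespaces. It assumes each pattern is a regular expression that must match the whole namespace name; that matching rule, the helper name `namespace_allowed`, and the namespace list are assumptions for illustration, not taken from reflector's source.

```python
import re

# Pattern string copied from the annotations above.
PATTERNS = r"\w+-system,\w+-frontend,ns-[\-a-z0-9]*"


def namespace_allowed(namespace, patterns_csv):
    """Return True if the namespace fully matches any comma-separated pattern.

    Assumption: patterns are anchored to the whole name (re.fullmatch).
    """
    for pattern in patterns_csv.split(","):
        pattern = pattern.strip()
        if pattern and re.fullmatch(pattern, namespace):
            return True
    return False


# Hypothetical namespace names, chosen only to exercise each pattern.
namespaces = ["kube-system", "shop-frontend", "ns-a1b2c3", "default", "monitoring"]

targets = [ns for ns in namespaces if namespace_allowed(ns, PATTERNS)]

if __name__ == "__main__":
    print("target namespaces:", targets)
    print("skipped namespaces:", [ns for ns in namespaces if ns not in targets])
```

Every matching namespace receives its own copy of the secret, so the number of secrets in the cluster grows with the number of matching namespaces. If each pass also re-checks every copy, as the `Validated 299` lines in the earlier configmap log might suggest, the work per pass would grow the same way; that reading has not been confirmed by the maintainers in this thread.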

drewwells · 2024-06-28T12:13:06Z

An easy way to limit the watchers is to use labels. Also, usage goes up after it creates configmaps or secrets; I don't think it needs to watch generated resources. If people change them, let it be until the next sync wipes out those changes.
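The label idea can be sketched as an equality-based selector applied to an object inventory. The Kubernetes list and watch APIs accept a `labelSelector` query parameter, which does this filtering on the server side. Whether reflector can be configured to pass one is not established in this thread, so the code below only illustrates the filtering itself. The helper names and the inventory are hypothetical; the label key is the one from the configmap metadata at the top of the issue.

```python
def parse_selector(text):
    """Parse 'k1=v1,k2=v2' into a dict (equality-based selectors only)."""
    selector = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        selector[key.strip()] = value.strip()
    return selector


def matches_selector(labels, selector):
    """True if every key=value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


SELECTOR = parse_selector("reflector.v1.k8s.emberstack.com/reflection-allowed=true")

# Hypothetical inventory: one labeled source object and two unrelated configmaps.
configmaps = [
    {
        "namespace": "feature-flag",
        "name": "ff-feature-flag",
        "labels": {"reflector.v1.k8s.emberstack.com/reflection-allowed": "true"},
    },
    {"namespace": "team-a", "name": "app-config", "labels": {"app": "web"}},
    {"namespace": "team-a", "name": "kube-root-ca.crt", "labels": {}},
]

watched = [
    cm["name"] for cm in configmaps if matches_selector(cm["labels"], SELECTOR)
]

if __name__ == "__main__":
    print("objects a labeled watch would see:", watched)
    print("objects an unfiltered watch would see:", len(configmaps))
```

With a selector like this, generated copies would stay out of the watched set as long as they do not carry the source label, which is the behavior proposed above.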
