Query ingesters in different clusters for in-memory/WAL logs without forcing membership #13353

Open
Bear-LB opened this issue Jun 28, 2024 · 1 comment
Labels: type/feature, type/question

Comments


Bear-LB commented Jun 28, 2024

I've made a configuration for Loki that consists of 2 Loki clusters, separated from each other in different network zones, which use the same storage platform.
This architecture somewhat works, but there's a problem when using a single querier and trying to retrieve in-memory/WAL logs from the separate Loki clusters.
I'll refer to the picture below.
The Protected Network Zone is allowed to send egress traffic to the Exposed Network Zone.
The Exposed Network Zone is not allowed to send ingress traffic to the Protected Network Zone.
My goal is to be able to read live logs from the ingesters in both network zones from the querier.
The problem is that I can't read in-memory/WAL logs from the Exposed Network Zone. I have to wait until ingester-B has filled its chunks and flushed them to the cloud storage.

For querier-A to query ingester-B's in-memory/WAL logs, I have to add ingester-B to querier-A's memberlist; I could not figure out any other way. The querier will, however, unprompted, tell the other Loki components to add it to their ring, which makes all the components in every cluster think they belong to the same cluster.
This errors out and makes the cluster unhealthy, since the components in the Exposed Network Zone can't start and send new network traffic to components in the Protected Network Zone.

I'm looking for a suggestion on an alternative configuration to accomplish my goal with the same architecture.
Otherwise, the solution I'd like would be a configuration option for the querier component to read from additional ingesters without forcing membership or a ring.
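Just to illustrate what I mean, something along these lines would be ideal (the additional_ingesters option below is purely hypothetical and does not exist in Loki today):

    querier:
      # hypothetical option: extra ingester endpoints the querier may query
      # directly, without joining their memberlist ring
      additional_ingesters:
        - ingester-b.exposed-zone.example:9095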

[image: architecture diagram of the two network zones]

Bear-LB changed the title from "Query other ingesters in other clusters for in-memory/WAL logs without forcing membership" to "Query ingesters in different clusters for in-memory/WAL logs without forcing membership" on Jun 28, 2024
JStickler added the type/feature and type/question labels on Jul 1, 2024

Bear-LB commented Jul 2, 2024

For some reason I got it kind of working with a hack... It still does not work perfectly.
Usually Loki should use the same configuration across the whole cluster.
But to solve my issue, the querier must have a unique configuration file compared to the writer.
Ingester-A:

    ingester:
      lifecycler:
        availability_zone: protected-zone
        ring:
          excluded_zones: exposed-zone

Querier-A:

    ingester:
      lifecycler:
        availability_zone: protected-zone
Ingester-B in exposed-zone:

    ingester:
      lifecycler:
        availability_zone: exposed-zone
        ring:
          excluded_zones: protected-zone

This means I only tell the querier that it should not exclude any zones.
The querier actually seems to change behavior based on the configuration in the ingester: section.
I would have thought the only component that changes behavior from the ingester: config would be the ingester component itself.

There's still membership between the 2 clusters, but no component in either cluster seems to become unhealthy, even though they don't have a fully communicating ring because of the firewall.

The protected-zone cluster must have rejoin_interval: set. Otherwise the protected-zone cluster will not re-read join_members: and re-invite the exposed-zone cluster in case the exposed-zone components get scaled or restarted, since the exposed-zone cluster can't advertise itself to its counterpart cluster because of the firewall.
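For reference, the memberlist section in the protected-zone cluster looks roughly like this (the address is a placeholder and the rejoin_interval value is just an example):

    memberlist:
      join_members:
        - loki-memberlist.exposed-zone.example:7946
      rejoin_interval: 60s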

So everything somewhat works, but I can't seem to stop the exposed zone from trying to gossip to the protected zone, which generates a ton of annoying logs:

level=warn ts=2024-07-02T17:09:31.708191396Z caller=tcp_transport.go:440 component="memberlist TCPTransport" msg="WriteTo failed" addr=172.29.1.251:7946 err="dial tcp 172.29.1.251:7946: i/o timeout"

I've tried these configuration options with very high values and with zero. Changing them did nothing to change the frequency or stop the logs from being spewed:

    gossip_interval:
    gossip_nodes:
    pull_push_interval:
    retransmit_factor:
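For reference, these all sit under the memberlist section; the values below are only examples of the kind of settings I tried, not a working fix:

    memberlist:
      gossip_interval: 5m
      gossip_nodes: 0
      pull_push_interval: 5m
      retransmit_factor: 0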
