Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Servers connection settings #22

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

dmccarthy-bluefin
Copy link
Contributor

Here's the first draft of my reworking the connection and reconnection logic. Open issue: #19

I've added four new connection config options that roughly match up with the options on the nodejs/deno client libs. I'm creating this as a draft pull request as there are a few items that I need feedback on. They are:

  1. I've added a recursive call to the connect function. Phan is complaining about it but I can't come up with an alternative that doesn't rely on the recursive call.
  2. For the reconnectTimeWait param I went with milliseconds whereas other time based params are in seconds. Should we consider switching to milliseconds for all time based params.
  3. I have unit tests for handling the server INFO message updates, but no functional tests as I can't figure out how to put the server into lame duck mode from php.
  4. I'm not happy with the name I came up with for the ServersManager class. Can you think of anything better?

…ettings. Also added support for servers entering lame duck mode and adding gossiped servers to the pool. New config options add a delay between reconnection attempts and set a max retry attempts limit.

Added support for specifying more than one server in the connection settings. Also added support for servers entering lame duck mode and adding gossiped servers to the pool. New config options add a delay between reconnection attempts and set a max retry attempts limit.
@dmccarthy-bluefin
Copy link
Contributor Author

Can someone please try run that Action again. It errored out pulling from Docker.

@dmccarthy-bluefin dmccarthy-bluefin marked this pull request as ready for review September 13, 2022 12:40
@nekufa
Copy link
Member

nekufa commented Sep 14, 2022

Hi @dmccarthy-bluefin

  1. We can reuse maxReconnects Allowed as parameter for connect.
    This way it will have same limit as server manager next Servers call.
    So, signature will be connect($limit = null) and when you call it recursively use ($limit ? $limit - 1 : $this->pool->maxReconnectsAllowed);

  2. Maybe reconnectTimeWait parameter can be float? Like delay option - you simply set 0.001 for 1ms.

  3. I still have same question. Maybe try to find how other clients are testing this?

  4. I suggest to rename Servers Manager to Server Pool, it is more familiar.

In my opinion, on a single line comment there should be a space before the message.

@nekufa
Copy link
Member

nekufa commented Sep 26, 2022

hi @dmccarthy-bluefin, sorry for delay - i was on sick leave.
looks good for me, maybe also change property name and will merge this?

@dmccarthy-bluefin
Copy link
Contributor Author

@nekufa can you hold off please? I just about to deploy it on some servers with long running processes. I would like to see how the reconnection logic behaves over a prolonged period.

@nekufa
Copy link
Member

nekufa commented Sep 26, 2022

@dmccarthy-bluefin sure! I was just worried that the case is paused, but if that is okay with you - no problem at all, keep in touch

@tachigami
Copy link

I've added four new connection config options that roughly match up with the options on the nodejs/deno client libs

@dmccarthy-bluefin This is excellent work, man.
To be consistent, I think it should be possible to accept URLs with the protocol.

@dmccarthy-bluefin
Copy link
Contributor Author

@TeroBlaZe are you referring to the config option below? I've found that specifying a protocol can cause problems when dealing with TLS. Nats is a little strange in that the client is expected to connect over plain socket and flip over to TLS when instructed by the cluster. You can't just start off with a TLS connection. If you set nats://localhost:4222 PHP will throw an error when you convert the socket to TLS. If you set it to tls://localhost:4222 the initial connection will get rejected.

'servers' => ['localhost:4222', 'localhost:4221', 'localhost:4220']

@nekufa
Copy link
Member

nekufa commented Feb 6, 2023

@dmccarthy-bluefin hi!
are there any updates on this contribution?
looks good for me

@dmccarthy-bluefin
Copy link
Contributor Author

dmccarthy-bluefin commented Feb 7, 2023

Hi @nekufa. Unfortunately it didn't work out in my tests and I had to drop this approach in favour of another. In saying that just because it didn't work for me doesn't mean it it won't work for others. There may also be a way for you to salvage some of my code.

The main reason why it didn't work for us is because we deploy our PHP code on the roadrunner application server (https://roadrunner.dev). This means our PHP threads live for longer than the PHP standard one http request. I had hoped that my alterations to this lib would allow me maintain persistent connections to the nats.io cluster which would provide a significant performance boost. It did work... until the number of messages back and forth slowed. When that happened the nat.io cluster sent a PING request to the PHP client to see if it was still there. In order to fulfil the nats.io server/client contract the client is supposed to reply with a PONG response. Unfortunately there is no mechanism in the lib, core PHP or in Roadrunner that would allow me attach a callback that could trigger the PONG response. I found a number of 3rd party libs that implement callbacks, but they aren't compatible with my roadrunner workers as both wanted to be the main PHP process.

In the end I had to drop this lib and write a roadrunner plugin (in golang) that my PHP code makes an RPC calls to in order to communicate with the nats.io cluster. Roadrunner maintains the persistent connection to the nat.io cluster and the RPC calls are always to localhost so this plugin performs well. It's not ideal as I do have to deal with RPC overhead but it's a lot faster than constantly connecting to the nats.io cluster (using JWT auth).

You might be able to salvage some of this code. If someone is using basic auth (or no auth) then they might get some value from this PR in a single request per thread setup. The JWT and NKEY auth methods are pretty expensive and can take some time to complete. They require a number of messages to be sent back and forth between the client and the server. They were designed for persistent connections.

Basic auth is much simpler. On a single request per thread model this PR will give the benefit of distributing connections between different the different nodes in the cluster (randomisation feature). If the node it tries to connect to is down it will attempt to connect to the next one in it's list. I don't see much more value from this PR beyond that for single request per thread setup. If this is your target audience I would suggest that you extract the relevant parts of this code and create a new PR with just that.

As it is now this PR isn't suitable for long running processes. However, with some adjustments you should be able to get it working with a 3rd party callback lib. Maybe a better approach would be to give the person using the lib the ability to interact with the PING/PONG functionality and let them choose their own callback lib. I've a feeling that this could be tricky as (if I remember correctly) the server sends a PING message as soon as the connection is setup and this lib responds. You'd have to somehow differentiate between this PING and the "you've been idle for too long" PING.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants