MSC4140: Delayed events (Futures) #4140
Conversation
> - One example would be redacting an event. It only makes sense to redact the event
>   if it exists. It might be important to have the guarantee that the redaction is
>   received by the server at the time the original message is sent.
> - In the case of a state event we might want to set the state to `A` and, after a
>   timeout, reset it to `{}`. If we have two separate requests, sending `A` could work
>   but the event with content `{}` could fail. The state would then not automatically
>   reset to `{}`.
>
> For this use case an optional `m.send_now` field can be added to the body.
If we had a generalized way to batch-send Matrix events, this could leverage that, and a Future itself would be JUST the future. The batch-send mechanism would then define the semantics for sending the Future and guarantee that the `send_now` event is also sent.
MSC2716 allows bulk-sending events.
It is limited to application services, however, and focuses on historic data. Since we also need the additional capability to use a template `event_id` parameter, it is probably not a good fit.
> To make this as generic as possible, the proposed solution is to allow sending
> multiple presigned events and delegate control of when to actually send these
> events to an external service. This allows a very flexible way to mark events as
> expired, since the sender can choose which event will be sent once expired.
Is it entire events that need to be signed, or just their content? If it's the former, then `/send/future` should behave more like #4080's `/send_pdus`. This could be done by having `/send/future`'s `send_*` fields accept fully-signed events instead of signed content + other fields.
For this to work, there'd also need to be modified `PUT /send` & `PUT /state` endpoints for retrieving the PDUs that need signing.
That would make the client flow of sending a Future as follows:
- call `PUT /send` / `PUT /state` for each event that's to be sent in a Future (ideally, this could be batched)
- sign each retrieved PDU
- put each signed PDU in a request to `/send/future`, placing them in `send_*` fields as desired
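The batching concern in this flow can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the endpoint paths, the `send_on_timeout`/`send_on_action` field names, and the injected `transport`/`sign` helpers are hypothetical, not part of the MSC text.

```python
def send_future_with_signed_pdus(transport, room_id, events, timeout_ms, sign):
    """Sketch of the three-step flow discussed above (shapes are assumptions)."""
    # 1. Retrieve the PDU that needs signing for every event in the group
    #    (the modified PUT /send and PUT /state endpoints are hypothetical).
    pdus = [transport.put(f"/rooms/{room_id}/send/{ev['type']}", ev["content"])
            for ev in events]
    # 2. Sign each retrieved PDU locally; retrying on failure is omitted here.
    signed = [sign(pdu) for pdu in pdus]
    # 3. Only this last step must be batched: all signed PDUs go out together
    #    in a single /send/future request, in the desired send_* fields.
    body = {"timeout": timeout_ms,
            "send_on_timeout": signed[0],
            "send_on_action": signed[1:]}
    return transport.post(f"/rooms/{room_id}/send/future", body)
```

The point of the sketch is that steps 1 and 2 can fail and be retried per event; only the final request carries atomicity requirements.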
It's also worth mentioning why we want events to be presigned in the first place (for compatibility with Crypto IDs; to ensure that Future events were truly generated by a client and not made up by the homeserver; and possibly other reasons).
> call `PUT /send` / `PUT /state` for each event that's to be sent in a Future (ideally, this could be batched)
> sign each retrieved PDU
> put each signed PDU in a request to `/send/future`, placing them in `send_*` fields as desired

What needs to be batched is mostly putting the PDUs, right? Creating the signed PDUs is okay to not be batched. The client just needs to be sure that it has created all the events before sending the signed PDUs. If creating the PDUs fails, the client just retries until it has the full list of events that need to be sent (because they rely on each other).
So what really would need to be batched is the last step: sending the signed PDUs.
Maybe it's worth exploring just introducing a new type of `PDUInfo`:

```
{
  room_version: string,
  via_server: string, // optional
  pdu: PDU // signed PDU
}
```

If we include a timeout here + action PDUs:

```
{
  room_version: string,
  via_server: string, // optional
  timeout: number, // optional
  future_actions: {
    actionName: PDU // signed alternative PDU in case an action is triggered
  },
  future_id: randomString, // optional
  pdu: PDU // signed PDU
}
```

Response:

```
{
  future_tokens: {
    future_id_0: token,
    future_id_1: ...
  }
}
```

We would not even need a new endpoint, and the homeserver response would need to include tokens for the future PDUs.
That's a really good idea, especially since the future-aware fields don't conflict at all with the base `PDUInfo`. It also allows immediate events to be sent (even several at once!), thus replacing the `send_now` events without needing any extra spec!
One question: what is the optional `future_id` in the request for?
If there are multiple futures in one `send_pdus` call, multiple future tokens need to be issued. The response contains a dictionary mapping `future_id` → token, so the client-chosen IDs are what map each token back to its future.
> The server will respond with a [`400`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400) (`Bad Request`, with a message
> containing the maximum allowed `timeout_duration`) if the
> client tries to send a timeout future with a larger `timeout_duration`.
> - The future is using a `group_id` that belongs to a future group of another user. In this case the homeserver sends a [`405`] (`Not Allowed`).
Even better (and what I've been implementing) is for each user to get their own namespace of group IDs, i.e. for the effective group ID to be the tuple of (user ID, group ID). Benefits include:
- Users cannot guess at other users' group IDs by spamming future requests and waiting to receive an error response.
- More group IDs are available for each user, which is especially useful if we want to allow user-defined group IDs that could otherwise clash.
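A minimal sketch of this namespacing idea, assuming an in-memory store; the `FutureGroups` class and its methods are hypothetical, not Synapse's actual schema:

```python
class FutureGroups:
    def __init__(self):
        # The effective key is the tuple (user_id, group_id), so two users'
        # identical group IDs never collide and never leak to each other.
        self._groups = {}

    def add(self, user_id, group_id, future):
        self._groups.setdefault((user_id, group_id), []).append(future)

    def get(self, user_id, group_id):
        # Another user probing the same group_id simply sees an empty group,
        # instead of a 405 error that would confirm the group exists.
        return self._groups.get((user_id, group_id), [])
```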
I think if the `group_id` is a large UUID we can just use that, and Synapse stores the relation between user and UUID.
But it also does not hurt to include the user ID in the group ID.
In the end it's a homeserver implementation detail. The important bit is just that the homeserver makes sure it's unique within its domain.
For logging purposes: this MSC crossed the SCT's desk as potentially scary, so we've given it an early review here.
The idea of a generic mechanism to address call participant counts, scheduled messages, and self-destructing events is very much appealing. The fewer moving pieces we have to worry about in the spec, the better. This MSC appears to have 3 major concerns which could be classified as 'scary', which all have their own dedicated threads - please ensure discussion happens in those threads. The highlights are:
- Self-destructing events has a metadata component which likely means it will need a dedicated MSC, despite the preference being fewer moving pieces. The user privacy concerns leading to wanting self-destructing events outweigh the idealistic genericism.
- A comparison to MSC3277 is missing from this proposal. MSC3277 uses a DAG-based approach to ensure events are authorized and servers don't have to implement complicated subsystems for the scheduled messages feature.
- Keep alives are unreliable and can have unexpected consequences for users and clients, particularly when a network partition causes a failed ping. For the call participant count use case in particular, the SFU(s) should already know how many connections it has and can reveal that information back to other users. Where network partitions fail the connection to the SFU, the user is dropped from the call regardless. Otherwise, temporary connection issues can ensure the user is reflected as connected with lost audio/video. This system may very well use a keep alive internally (possibly at the TCP layer), but here it would be appropriate compared to event sending.
I've also left several editorial comments to aid understanding of the MSC. I've not done a complete pass on this - these are just the more notable ones.
As always, if any of my comments require clarification or more information, let me know in the threads :)
> - Updating call member events after the user disconnected.
> - Sending scheduled messages (send at a specific time).
> - Creating self-destructing events (by sending a delayed redact).
I don't think this proposal addresses self-destructing events in a way which is useful/safe for users. Aside from a message's content, the second most important detail users want to destroy is the metadata, which this proposal doesn't address. A self-destructing events MSC would most likely erase the event from the DAG entirely.
I'd suggest eliding this use case.
Thread for highlight 1:
I wasn't aware that it was on the table for self-destructing events to use an entirely different concept to redactions.
I was coming from: https://github.com/matrix-org/matrix-spec-proposals/blob/matthew/msc2228/proposals/2228-self-destructing-events.md
which basically does the same thing: it redacts the event based on conditions.
What I like about this proposal is that instead of making a custom event and adding logic to create a synthetic redaction, we generalize the concept of event delays and everything else is a completely normal redaction.
Also:
> erase the event from the DAG entirely

sounds like an anti-pattern for a distributed system. Can you guide me to where I can find information about erasing whole events from the DAG, including all their metadata, without federation conflicts?
> - Every future group needs at least one timeout future to guarantee that all futures expire eventually.
> - If a timeout future is sent without a `future_group_id`, a unique identifier will be generated by the
>   homeserver and is part of the `send_future` response.
> - Group IDs can only be used by one user. The reason for this is that knowing the group ID would otherwise give another Matrix user full control over the future group. It would also require federating futures if the users are not on the same homeserver.
As in, only a single user on the homeserver can have `groupA`? I'm not sure there's an advantage to that - we should copy the transaction IDs behaviour from the existing spec.
The `group_id`s are server-generated UUIDs, so I am not sure how this could be phrased like transaction IDs.
> Polling-based solutions have a big overhead in complexity and network requests on the clients.
> Example:
>
> > A room list with 100 rooms where there has been a call before in every room
> > (or there is an ongoing call) would require the client to send a to-device message
> > (or a request to the SFU) to every user that has an active state event to check if
> > they are still online. Just to display the room tile properly.
The SFU should have a fairly good idea on how many connections it's holding, and this information can be federated when there's multiple SFUs in play. The client shouldn't need to poll for this either: it can likely subscribe directly either as a data stream, or using something like websockets. That subscription can then be used to count the number of 'active' participants.
It could theoretically mean a client connects to get information but isn't producing media, which is something the subscription stream can handle: the client can indicate (or otherwise authenticate) which other media streams it owns for the SFU to count them 'joined'.
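A toy sketch of the counting idea described here, under the assumption that the SFU tracks which media streams each connection has authenticated ownership of; `ParticipantCounter` is hypothetical, not a real SFU API:

```python
class ParticipantCounter:
    """Count 'joined' call participants from the SFU's own live connections.
    A data-only subscriber counts as joined only once it has authenticated
    ownership of at least one media stream."""
    def __init__(self):
        self.connections = {}  # connection_id -> set of owned media stream ids

    def connect(self, conn_id, owned_streams=()):
        self.connections[conn_id] = set(owned_streams)

    def disconnect(self, conn_id):
        self.connections.pop(conn_id, None)

    def joined_count(self):
        # Connections that own no media (pure subscribers) are not 'joined'.
        return sum(1 for streams in self.connections.values() if streams)
```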
Thread for highlight 3:
This is the most important reason why we need a heartbeat-like expiration system to make MatrixRTC performant and reliable!
The main problem we solve here is that we DON'T want the client to connect to each SFU all the time, and we are not talking about calls this client is connected to. This might sound like a possible thing to do, but it is more overhead than one would think (polling or connecting to a socket does not really make a difference here).
Maybe a more detailed description of the current situation is required here:
Without MatrixRTC your client has no idea about any ongoing call when it starts up.
We introduced `call.member` state events, so now we can easily read who is connected to a session in each room.
But due to the nature of state events there is no guarantee that the client will not forget to remove them (set them to `{}`) after it disconnects. Each event a client fails to remove results in a room that looks like there is an ongoing call. So for each of these rooms the client has to:

- Authenticate with the SFU (JWT service token).
- Connect to the SFU websocket.
- Check the current participants.
- Invalidate all the member events locally. (Since clients cannot write to those member events, because they are owned by another member, this has to be done on every client.)

It is very easy to end up with such an invalid event: a user pressing Ctrl+W while in a call in Element Web is enough.
The commented section of the MSC gives an example. If you have multiple rooms with left-over member state events, you need to do these steps for each of them individually.
Workarounds we tried:
Storing a timestamp in the state event and updating the event every 30min. This allows computing an invalidation without connecting to an SFU (but only after a 30min window during which you see a call that is not happening).
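The heartbeat idea can be sketched as a tiny client loop. The `refresh` callback and the injected `sleep`/`should_continue` hooks are hypothetical stand-ins for a real client's networking and lifecycle:

```python
def keep_alive_loop(refresh, interval_s, should_continue, sleep):
    """Client-side heartbeat: keep refreshing the timeout future so the
    homeserver keeps delaying the 'disconnected' state event. When the client
    dies (e.g. Ctrl+W), the refreshes stop and the homeserver sends the
    delayed event on its own - no other client has to clean up stale state."""
    while should_continue():
        refresh()  # e.g. a request to /future/{futureToken} (assumed shape)
        sleep(interval_s)
```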
> This would be elegant, but since those two endpoints are core to Matrix, changes to them might
> be controversial if their return value is altered.
I think this would be fine. Clients would only get a different response if they use the parameters, meaning they should be expecting a different format. It would be different if lack of the future parameters meant a different response body.
That is really nice!
It would be as you describe:
Without future parameters the response would be the same as it is now.
With the parameters the response would be different.
> ## Potential issues
>
> ## Alternatives
This proposal's main competition appears to be MSC3277, where events are scheduled by placing them in the DAG. The homeserver then forwards the previously-created event at a set time, but it is part of the room until that point.
The DAG approach has a few advantages which I think make it more appealing:

- Message delivery is not subject to a third party being online. Specifically, an appservice, SFU, or similar does not need to be looped in to ensure a message gets sent - it's the homeserver's own responsibility to stay online. We may even be able to federate the event outwards with a `soft_fail_until` timestamp for maximum reliability (at the cost of eager delivery and harder cancellation).
- Events are known to be authorized and able to be sent because they've been given an event ID (and thus run through the auth rules).
- The tracking overhead is extremely minimal: servers at most need to remember to un-soft-fail an event after a set time, and they can do that with rough precision. They do not need to track different tokens for different actions - if a user redacts a scheduled event while it's still soft failed, the send is cancelled. The server knows this intrinsically.
We are very open to inserting the events into the DAG when sent. This was discussed and even has a bunch of advantages (mostly that the `event_id` is available immediately).
The biggest reason against it was the assumption that this might be too big a change: it adds the complexity of a state where an event is in the DAG but is not yet (and maybe never) distributed to clients.

> Message delivery is not subject to a third party being online. Specifically, an appservice, SFU, or similar does not need to be looped in to ensure a message gets sent - it's the homeserver's own responsibility to stay online. We may even be able to federate the event outwards with a soft_fail_until timestamp for maximum reliability (at the cost of eager delivery and harder cancellation).

The delivery does not rely on a third party. It relies on the homeserver's timeout computation (which is enforced by the future group!). The example with the client that pings every 10s describes this well: the client getting disconnected and no longer sending the ping lets the homeserver know it can deliver the event now.
To me, it seems that the least intrusive change is to not add events to the DAG and instead just queue them on the homeserver / pretend the HTTP request was received later. But I would really like it if it were possible to add the event to the DAG right after receiving it and immediately compute an `event_id`.
Because Future events can also be canceled, it needs to be valid to have unsent events in the DAG forever. As described in the MSC, in most cases you want to schedule multiple events in one future group and only send one of them. Adding them all to the DAG seems unnecessary.
But other than that, this sounds like a very compatible and nice solution. Do I understand correctly that the homeserver will not send the event to federating homeservers until the timeout condition is met?
The VoIP team had a dedicated meeting to thoroughly investigate the option to use MSC3277's approach of adding the event to the DAG on retrieval.
We have created a summary document here: https://hackmd.io/h0z82KvKSaiW-jYlOnU69w?edit
(this will be posted below as well for visibility)
Reliable State events (Future MSC)
The Future MSC is not sending
Soft fail (DAG) vs Homeserver queuing (not in DAG)

> Message delivery is not subject to a third party being online. Specifically, an appservice, SFU, or similar does not need to be looped in to ensure a message gets sent - it's the homeserver's own responsibility to stay online. We may even be able to federate the event outwards with a soft_fail_until timestamp for maximum reliability (at the cost of eager delivery and harder cancellation).

This is the case for both proposals. The delivery itself is in full control of the homeserver.
In both proposals the homeserver is given additional information alongside the event and will make it real eventually, based on those parameters.
The main differentiator is that with MSC4140, external interaction is optionally possible.
The `/future/{futureToken}` endpoint allows interacting with event scheduling.
This is the main feature we require to get reliable VoIP to work.
It is not important whether we add the event to the DAG immediately, but we need the scheduling interaction of `/future/{futureToken}`.
> Events are known to be authorized and able to be sent because they've been given an event ID (and thus run through the auth rules).

We explicitly only have one auth period: at send time. While scheduled, the auth conditions can change, so the proper time to do the auth computation is at send time.
With MSC3277 we would need to rerun the auth rules when sending the event.
> The tracking overhead is extremely minimal: servers at most need to remember to un-soft-fail an event after a set time, but it can do that with rough precision. They do not need to track different tokens for different actions - if a user redacts a scheduled event while it's still soft failed, the send is cancelled. It knows this intrinsically.

Using this with the required interaction tokens would result in the following:

- Send a scheduled event with 10s.
- On each refresh/reset token retrieval the homeserver would redact the event and send a new one (with the `send_at` updated by 10s).
  - This would become quite complicated with cryptographic identities, since the user would need to sign all of them. Each reset/refresh needs a client-server roundtrip for signing.
- Whenever there is no reset/refresh for 10 seconds (e.g. the device has crashed) the homeserver would un-soft-fail (send) the event automatically.

The main issue we have with this is the large amount of traffic it generates on the homeserver. For clients this would be equivalent (at least without cryptographic identities), but the homeserver needs to insert a new event (and send it over federation) for each refresh.

Comment:

- If we delegate leaves to the SFU, we would configure the timeout to a duration in the hour range. This would make the above much more bearable.
- We still need to decide if we want to federate the delegation tokens. Otherwise we don't get any benefit from federating the scheduled event if we can only interact with it through the sending homeserver.
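One possible shape for the sender-side tracking described above, as a sketch with an injected clock; `FutureScheduler` and its method names are assumptions for illustration, not part of the MSC:

```python
class FutureScheduler:
    """Deadline tracking lives only on the sender's homeserver; nothing is
    inserted into the DAG (or federated) until a deadline actually passes."""
    def __init__(self, send_event, now):
        self.send_event = send_event  # callback that actually sends the PDU
        self.now = now                # injected clock, so the sketch is testable
        self.pending = {}             # future_token -> (deadline_ms, event)

    def schedule(self, token, timeout_ms, event):
        self.pending[token] = (self.now() + timeout_ms, event)

    def refresh(self, token, timeout_ms):
        # A refresh just moves the deadline: no new event, no federation
        # traffic, unlike the redact-and-resend flow sketched for MSC3277.
        _, event = self.pending[token]
        self.pending[token] = (self.now() + timeout_ms, event)

    def tick(self):
        # Rough precision is fine: run this from a periodic background job.
        for token, (deadline, event) in list(self.pending.items()):
            if self.now() >= deadline:
                del self.pending[token]
                self.send_event(event)
```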
Comparison

| | MSC3277 | MSC4140 |
|---|---|---|
| Traffic | Lots of traffic (one event per refresh) between federating homeservers. For the client it is the same. | Since we do not add the event to the DAG until we know it will be sent, there is no traffic caused by the scheduling. |
| Resilience (only scheduling) | Scheduled events are sent even if the sender's homeserver goes down. | The sender's homeserver is required at send time. If it is down at send time, the event will not be sent until it comes back up. |
| Resilience (with interaction) | Not possible: sending an event (based on an interaction) before the scheduled time requires redacting the scheduled one and creating a new one, which cannot be done by a homeserver other than the one hosting the sending user. | Cannot be done. It would also require federating all the interaction tokens and the unsent events, and we end up in the same scenario where a homeserver would need to send an event for a user on a different homeserver. |
| Federation (without interaction) | Works | Doesn't work |
| Federation (with interaction) | We need interaction tokens (`futureTokens`) and they cannot be federated; otherwise federating homeserver admins could interact with the Futures/scheduled messages. With interaction there is no way to make federation work, and we lose all the benefit of inserting the event into the DAG. | The same: we can never send `futureTokens` to other homeservers, since they would be able to interact with the futures. |
| Auth | Interaction could work by redacting the scheduled event and sending a new scheduled event. This is authenticated, but only the client can send the events. If we want interaction that can be delegated to the SFU (which is the best source of truth) we would need to introduce tokens similar to `futureTokens`. | With the `futureToken`-based authentication (which is different from the client auth token) we have an extremely scoped auth mechanism that can be sent to third parties like the SFU. The SFU can notify the homeserver about a user disconnect; it is the best source of truth for that. |
| Tracking overhead | All the schedule-tracking information is stored in the DAG, and a timer needs to run through all potentially expired scheduled messages periodically. This has to happen on ALL homeservers. | Very similar logic, except that the tracking information is stored in a separate Future database only on the sender's homeserver (less overhead). |
TL;DR:
If we don't need interaction, there is a benefit in federating the scheduled message (adding it to the DAG immediately). It increases resilience: the sender's homeserver can disconnect and the scheduled message will still enter the non-soft-failed state (will be sent).
With delegatable interaction (which is the one property we need for reliable state events), we lose the possibility to federate scheduled messages\*, and the two solutions converge to one where we lose the property that the sender's homeserver can lose connection while the scheduled message is still sent by other homeservers.

\* The reason being that this would also require federating (and hence leaking) the interaction tokens, allowing other homeservers to interact with the future without the user explicitly giving consent by sending the tokens.

Conclusion
We are faced with the decision between:

- Do not assume that the sender's homeserver stays online:
  - Federate the scheduled message (resilience) but lose the interaction we need for MatrixRTC reliable call member events.
- Assume that the sender's homeserver stays online:
  - If the sender's homeserver is expected to stay online, there is no reason to federate the scheduled/future event, and we can safely add interaction, which allows us to implement reliable call member events.
> This would redact the message with content `"m.text": "my msg"` after 10 minutes.
>
> ## Potential issues
The keep-alive approach appears to reverse the expectation of scheduled messages a bit, which I think may be one of the core concerns with this proposal. With MSC3277, senders expect that their event will go out at the time they schedule it. This proposal moves the expectation to much later in the send process, and creates an assumption that any event can be cancelled or "undone" at any point. This behaviour leads to "unexpected" consequences, because the sender wasn't expecting there to be an opportunity for their event to never send.
It's a bit subtle, but that change of expectation I think makes MSC3277 more favourable. MSC3277 doesn't really help with the call participant count problem, though. For that, I think the SFU can likely count its connections more reliably than a keep-alive (a single network partition leads to a false count). This is discussed in more detail in the 'MatrixRTC use case' thread I've started.
The core assumption of the VoIP team when writing this MSC is that it is essential that syncing room state is enough to compute all ongoing MatrixRTC sessions, for the reasons described here: #4140 (comment)
So it seems we have the following two scenarios:

- We go with an SFU-based state event validation approach (each client requests a token and then logs into the SFU for each SFU mentioned in any call member event it encounters and verifies its validity, i.e. whether the user is still connected to the session).
- We want the memberships/sessions to be reliably represented in the room state.

If we decide for 1., we really don't need any of the MSCs, and I can see how a static timeout as described in MSC3277 could be a solution, even though having no interaction with the scheduling definitely limits its use cases.
But the experience we gained over a whole year concluded that we should make the state events reliable.
(Other logic: historic session computation, changes in call-related UX, and building MatrixRTC SDKs are all super hard to implement and involve a lot of duct tape if we cannot trust member state events but have to check the SFU all the time. It also means we can only validate state events in real time, but never tell if a state event was invalid in the past.)
This could also supersede MSC2228 (by making it possible to send a redaction with the `/send` endpoint; this is the case as mentioned here).
Implementations: