You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In the logs uploaded to Discord by @tschak909, there are following 2 cases of the out of sequence problems:
17:42:21.704 > FileHandlerLocal::seek
17:42:23.705 > Timeout after 2000 milliseconds. Retrying
17:42:23.708 > TNFS OUT OF ORDER SEQUENCE! Rcvd: 14, Expected: 15
17:42:23.714 > Timeout after 2000 milliseconds. Retrying
17:42:24.701 > TNFS OUT OF ORDER SEQUENCE! Rcvd: 15, Expected: 14
17:42:26.700 > Timeout after 2000 milliseconds. Retrying
17:42:27.702 > TNFS OUT OF ORDER SEQUENCE! Rcvd: 14, Expected: 15
17:42:27.742 > FileHandlerLocal::read
and
17:47:21.652 > sending block packet ...
17:47:21.681 > Command Packet:
17:47:21.681 > fc ff ff ff ff c3 81 80 80 80 80 82 81 80 81 83 b0 80 d2 a2 f3 80 88 80 ab ef c8 cc
17:47:21.682 > Decoding 9 bytes
17:47:21.682 > handling read block command
17:47:21.683 > Drive 81 Read block 0073a2
17:47:21.683 > iwm_readblock NORMAL READ
17:47:21.686 > FileHandlerLocal::seek
17:47:21.735 > TNFS OUT OF ORDER SEQUENCE! Rcvd: 90, Expected: 91
17:47:21.735 > TNFS OUT OF ORDER SEQUENCE! Rcvd: 3d, Expected: 91
17:47:21.736 > TNFS OUT OF ORDER SEQUENCE! Rcvd: 3f, Expected: 91
17:47:21.738 > TNFS OUT OF ORDER SEQUENCE! Rcvd: 0, Expected: 92
17:47:21.739 > TNFS OUT OF ORDER SEQUENCE! Rcvd: 0, Expected: 91
17:47:21.740 > TNFS OUT OF ORDER SEQUENCE! Rcvd: ee, Expected: 92
17:47:21.740 > TNFS OUT OF ORDER SEQUENCE! Rcvd: 0, Expected: 91
17:47:21.742 > TNFS OUT OF ORDER SEQUENCE! Rcvd: 0, Expected: 92
17:47:21.743 > TNFS OUT OF ORDER SEQUENCE! Rcvd: 48, Expected: 91
17:47:21.744 > TNFS OUT OF ORDER SEQUENCE! Rcvd: e5, Expected: 92
17:47:21.746 > TNFS OUT OF ORDER SEQUENCE! Rcvd: d, Expected: 91
17:47:21.746 > TNFS OUT OF ORDER SEQUENCE! Rcvd: 75, Expected: 92
...
In the first case it seems that the session was ultimately recovered, in the second not really.
In both cases, the same packet is received twice, while the client is expecting the next one already:
17:42:23.708 > TNFS OUT OF ORDER SEQUENCE! Rcvd: 14, Expected: 15
17:47:21.735 > TNFS OUT OF ORDER SEQUENCE! Rcvd: 90, Expected: 91
I'm not sure what's the reason of double delivery, but the firmware should be able to retry the request and deal with this problem. However, it seems it can't do it. There are two issues here.
Root cause - sending response as a request
Let's take a look on the code validating the sequence number:
bool sent = _tnfs_send(&udp, m_info, pkt, payload_size);
Please notice, that we're sending the pkt here. However, it's no longer the request packet. It's been overwritten by the tcp->recv() and now it's the response packet, the one with the wrong sequence number.
To fix this, we should keep request and response separately. Also, we should go immediately to send() if the received response is wrong.
Room for improvement - exhausting buffer on wrong response
The bug above explains the first log report. The second report is caused by a different issue. Apparently server sends a large response with a wrong number. The client reads it chunk by chunk and treats each chunk as a atomic response. This causes the random Rcvd numbers - these are not really sequence numbers, but random data from the middle of the data block.
To make this better, we may try to read all pending data from the TCP connection on the wrong sequence number and then send the original request again (see the description of the bug).
Tech debt - method complexity
_tnfs_transaction method is very complex, over 140 lines with two nested loops and multiple if/else. To make it easier to reason about and avoid bugs as above, we should split it into smaller chunks.
Plan
So, we need to following things:
Split the _tnfs_transaction into smaller methods.
On the wrong sequence number
exhaust buffer,
send the request again,
make sure we don't send response as a request.
The text was updated successfully, but these errors were encountered:
In the logs uploaded to Discord by @tschak909, there are following 2 cases of the out of sequence problems:
and
In the first case it seems that the session was ultimately recovered, in the second not really.
In both cases, the same packet is received twice, while the client is expecting the next one already:
I'm not sure what's the reason of double delivery, but the firmware should be able to retry the request and deal with this problem. However, it seems it can't do it. There are two issues here.
Root cause - sending response as a request
Let's take a look on the code validating the sequence number:
fujinet-firmware/lib/TNFSlib/tnfslib.cpp
Lines 1384 to 1389 in 1b0c424
this causes the fall through to the:
fujinet-firmware/lib/TNFSlib/tnfslib.cpp
Lines 1360 to 1371 in 1b0c424
unless the recv() doesn't return anything and the timeout occurs, then it logs
and goes to:
fujinet-firmware/lib/TNFSlib/tnfslib.cpp
Lines 1339 to 1346 in 1b0c424
Please notice, that we're sending the
pkt
here. However, it's no longer the request packet. It's been overwritten by thetcp->recv()
and now it's the response packet, the one with the wrong sequence number.To fix this, we should keep request and response separately. Also, we should go immediately to send() if the received response is wrong.
Room for improvement - exhausting buffer on wrong response
The bug above explains the first log report. The second report is caused by a different issue. Apparently server sends a large response with a wrong number. The client reads it chunk by chunk and treats each chunk as a atomic response. This causes the random
Rcvd
numbers - these are not really sequence numbers, but random data from the middle of the data block.To make this better, we may try to read all pending data from the TCP connection on the wrong sequence number and then send the original request again (see the description of the bug).
Tech debt - method complexity
_tnfs_transaction
method is very complex, over 140 lines with two nested loops and multiple if/else. To make it easier to reason about and avoid bugs as above, we should split it into smaller chunks.Plan
So, we need to following things:
_tnfs_transaction
into smaller methods.The text was updated successfully, but these errors were encountered: