Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vhost-device-can: Add initial implementation #602

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

TimosAmpel
Copy link

Summary of the PR

This PR introduces an initial implementation of vhost-device-can device.
The development of this device was performed by Virtual Open Systems in the context
of the Automotive Grade Linux SDV-EG (Software Defined Vehicle expert group)
AGL Native VIRTIO project. Part of this effort has already been upstreamed
(see AGL Gerrit can patch).

The new device takes as arguments two CAN/FD interfaces ("can_out", "can_in")
and forward the traffic between those, and virtio-can driver (virtio-can RFC).
Since there is no upstream implementation of any vhost-user-can frontend in QEMU,
the current version has only been tested with virtio-loopback design.

Virtio-loopback architecture aims to create a new HAL (hardware abstraction layer),
designed for non-Hypervisor environments based on virtio. The main objective
is to enable virtio drivers communicate with vhost-user devices without emulation
or virtualization being involved. This technology has also been upstreamed
into AGL gerrit repositories.

More information about virtio-loopback can be found in the links below:

Since CAN device is a crucial component in automotive, we would like
to present this work to the community, ask for comments and potentially
merge this work under vhost-device repository.

Requirements

Before submitting your PR, please make sure you addressed the following
requirements:

  • All commits in this PR are signed (with git commit -s), and the commit
    message has max 60 characters for the summary and max 75 characters for each
    description line.
  • All added/changed functionality has a corresponding unit/integration
    test.
  • All added/changed public-facing functionality has entries in the "Upcoming
    Release" section of CHANGELOG.md (if no such section exists, please create one).
  • Any newly added unsafe code is properly documented.

@MatiasVara
Copy link
Contributor

Hello, the device is based on https://github.com/OpenSynergy/qemu/tree/virtio-can-spec-rfc-v3 ?

@TimosAmpel
Copy link
Author

Hello @MatiasVara,

Yes, a part of vhost-device-can was based on hw/net/can/can_virtio.c file
presented in https://github.com/OpenSynergy/qemu/tree/virtio-can-spec-rfc-v3.
Mainly, vhost-device-can integrates the general communication logic
and CAN related checks.

@TimosAmpel
Copy link
Author

Hello @MatiasVara and @stsquad ,

A new version of vhost-device-can has been pushed, tested with QEMU's
"vhost-user-device" as suggested:

qemu-system-x86_64  \
        -m 4096 \
        -numa node,memdev=mem \
        -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
        -chardev socket,id=char1,path=/tmp/can.sock \                                                                                   
        -device vhost-user-device-pci,chardev=char1,virtio-id=36,num_vqs=3,config_size=16 \

Alternatively, testing can be performed with new QEMU vhost-user-can
front-end device based on vhost-user-device which can be found on this
repository.

More about, known limitations, test examples and future work can be found in
the README file.

@MatiasVara
Copy link
Contributor

Thanks, I will try to review this soon.

@TimosAmpel TimosAmpel force-pushed the vhost-device-can-rfc branch 2 times, most recently from f12a7c5 to 12f190b Compare February 14, 2024 10:31
@TimosAmpel TimosAmpel force-pushed the vhost-device-can-rfc branch 2 times, most recently from e7e69bf to 934bdc6 Compare February 15, 2024 14:29
@MatiasVara
Copy link
Contributor

Hello, I think you can fix some minor issues by running cargo clippy.

@stefano-garzarella
Copy link
Member

Hello, I think you can fix some minor issues by running cargo clippy.

Yes please, @TimosAmpel please check the CI, it is green, because staging device issues will not block our CI, but if you go inside you can see how staging CI is failing: https://buildkite.com/rust-vmm/vhost-device-ci/builds/2234

@MatiasVara
Copy link
Contributor

I am probably missing something but I tried sending frames from the host to guest and I observe that, when sending two frames, the first is displayed only after I have sent the second one. To try it, you can run candump any in the guest and then cansend vcan0 123#1122334455667788 two times. You will observe that the first frame is only displayed together with the second. I am not sure if both frames arrive at the time.

@TimosAmpel
Copy link
Author

I am probably missing something but I tried sending frames from the host to guest and I observe that, when sending two frames, the first is displayed only after I have sent the second one. To try it, you can run candump any in the guest and then cansend vcan0 123#1122334455667788 two times. You will observe that the first frame is only displayed together with the second. I am not sure if both frames arrive at the time.

That's a bug! Thanks to your description it was easy to solve it. Based on my testing, now it should work fine.

@stefano-garzarella
Copy link
Member

@TimosAmpel can you rebase this PR, so next week I'll review it again ;-)

Please also resolve conversations that you fixed, so it's easy for reviewers.

@TimosAmpel TimosAmpel force-pushed the vhost-device-can-rfc branch 2 times, most recently from 5008a2a to 8f83dd4 Compare May 31, 2024 09:24
@TimosAmpel
Copy link
Author

@TimosAmpel can you rebase this PR, so next week I'll review it again ;-)

Please also resolve conversations that you fixed, so it's easy for reviewers.

Hello @stefano-garzarella, that's great news, thanks!
I have rebased the code to the latest main branch and resolved any old conversations.
Also, I have left the commit history, as it was evolved during the review process, at a certain point I can squash everything into one commit. Let me know if you want me to proceed with that.

@stefano-garzarella
Copy link
Member

@TimosAmpel great! Yes, please, since it's not Draft anymore, I'd suggest to always have the final version we want to merge. So I suggest having a single commit, or dived by topics/features, as you prefer.

@TimosAmpel
Copy link
Author

@TimosAmpel great! Yes, please, since it's not Draft anymore, I'd suggest to always have the final version we want to merge. So I suggest having a single commit, or dived by topics/features, as you prefer.

@stefano-garzarella thanks, the commits have been squashed into a single one!

if self.controller.write().unwrap().ctrl_state == CAN_CS_STARTED {
Ok(VIRTIO_CAN_RESULT_NOT_OK)
} else {
self.controller.write().unwrap().busoff = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK to change the busoff here? The spec states

The bit VIRTIO_CAN_S_CTRL_BUSOFF in status is used to indicate the unsolicited CAN controller state
change from started to stopped due to a detected bus off condition.

So I think the switching to False shall happen independent of the driver although I do not know when or how.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since that was similar to what you wrote below, I tried to reply to all of them together in the next thread.

pub(crate) fn config(&mut self) -> &VirtioCanConfig {
trace!("Get config\n");
if self.busoff {
self.config.status = VIRTIO_CAN_S_CTRL_BUSOFF.into();
Copy link
Contributor

@MatiasVara MatiasVara Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec only defines the VIRTIO_CAN_S_CTRL_BUSOFF bit, however, I do not think that indicates the status of the device but only:

The bit VIRTIO_CAN_S_CTRL_BUSOFF in status is used to indicate the unsolicited CAN controller state
change from started to stopped due to a detected bus off condition.

I was wondering if we do not require another bit to indicate the status.

Copy link
Contributor

@MatiasVara MatiasVara Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec states (section 5.20.5.1):

If a status update is necessary, the device updates the configuration status before marking the
request used. As the configuration status change is caused by a request from the driver the device is
allowed to omit the configuration change notification here.

But I do not understand what it means with status update because the only defined bit in the status is the busoff but I do not think it means changing it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Matias, since that busoff story is not crystal clear to me also through
the virtio-spec, I followed the implementation of virtio-can device provided by
OpenSynergy here. I believe that is safe to do since 2 out of 3 contributor to
virtio-can-spec (Harald Mommer, Mikhail Golubev-Ciuchea) are also the main
contributors on the virtio-can device on QEMU.

Having that in mind, I will try first to provide my interpretation of the
virtio-spec and then answer in-line to your comments.

Virtio-spec useful points:
a)

The bit VIRTIO_CAN_S_CTRL_BUSOFF in status is used to indicate the unsolicited
CAN controller state change from started to stopped due to a detected bus off
condition.

b)

There are certain error conditions so that the physical CAN controller
has to stop participating in CAN communication on the bus. If such an
error condition occurs the device informs the driver about the
unsolicited CAN controller state change by setting the
VIRTIO_CAN_S_CTRL_BUSOFF bit in the configuration \field{status} field.

After bus-off detection the CAN controller is in STOPPED state. The CAN
controller does not participate in bus communication any more so all CAN
messages pending for transmission are put into the used queue with
\field{result} VIRTIO_CAN_RESULT_NOT_OK.

Based on points a), b) and also to the implementation of virtio-can-device in
QEMU provided by Opensynergy (here), I assume that:

  1. The referenced CAN controller is part of the vhost-user-can device and
    responsible of communicating with the CAN bus located on the host.
  2. The VIRTIO_CAN_S_CTRL_BUSOFF is set by the vhost-user-can device and
    propagated to the virtio-can driver when the nested CAN controller
    changes its status from STARTED to STOPPED unsolicitedly.
    • This CAN controller status change can occur in cases like this,
      when the CAN message received by the underlying HW, fulfils the condition
      (can_id & CAN_ERR_FLAG) & (can_id & CAN_ERR_BUSOFF).

In-line comments:

I was wondering if we do not require another bit to indicate the status.

That indeed. I kept the variable busoff to be aligned with Opensynergy's
implementation of virtio-can device here but as you correctly noticed,
practically we can directly modify the config.status variable. The new
implementation includes that modification.

Is it OK to change the busoff here? The spec states

The bit VIRTIO_CAN_S_CTRL_BUSOFF in status is used to indicate the unsolicited CAN controller state
change from started to stopped due to a detected bus off condition.

So I think the switching to False shall happen independent of the driver although I do not know when or how.

The spec states (section 5.20.5.1):

If a status update is necessary, the device updates the configuration status before marking the
request used. As the configuration status change is caused by a request from the driver the device is
allowed to omit the configuration change notification here.

But I do not understand what it means with status update because the only defined bit in the status is the busoff but I do not think it means changing it.

Based on the assumptions 1), 2), I think that since the vhost-user-can device
has the "authority" to change the config.status to VIRTIO_CAN_S_CTRL_BUSOFF,
it could also reset it to the default value when control messages are received.
That gives the opportunity to the user retry sending CAN messages after
performing ip link set down can0 && ip link set up can0 on the VM side and
check if the previous BUSOFF error is still being received or was a one time error.
The same approach is followed by the virtio-can device in QEMU (here and here).

Please let me know what you think of the above, and if you have something else
in mind on how to treat that BUSOFF case.

Last point on my side, is that I have not yet found an obvious way, to trigger
a notification from the device side, when the configuration has been altered.
For the time being, I will keep a note in the code and we can fill that later,
since, practically, that does not have any functional impact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explanation! It seems that the BUSOFF bit in the status is set to indicate the unsolicited CAN controller state change from started to stopped due to a detected bus off condition. However, I do not know what shall happen if the driver sends a VIRTIO_CAN_SET_CTRL_MODE_START and the BUSOFF bit is set. The current implementation and the QEMU's one just clear the BUSOFF bit when getting VIRTIO_CAN_SET_CTRL_MODE_START. Since the spec is not specific about this, I would leave it as it is.

trace!("Ctrl queue: msg type 0x{:?} unknown", request.msg_type);
Err(Error::HandleCtrlUnknown)
}
}?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if, instead of returning with err, we should log the err, and then, write some response, e.g., VIRTIO_CAN_RESULT_NOT_OK. I do not know if it is correct to not return the descriptor chain that trigger the issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if, instead of returning with err, we should log the err, and then, write some response, e.g., VIRTIO_CAN_RESULT_NOT_OK.

Fully agree on that, I have already added it. I think that we might not need to return the descriptor chain that triggers the issue, since that is triggered always by the "CTRL_QUEUE".

@TimosAmpel TimosAmpel force-pushed the vhost-device-can-rfc branch 2 times, most recently from ba180e2 to 7d37146 Compare July 5, 2024 16:13
// If no just drop the message, otherwise update config and return.
if (response.can_id.to_native() & CAN_ERR_FLAG) != 0 {
if (response.can_id.to_native() & CAN_ERR_BUSOFF) != 0 {
// TODO: Trigger a config_change notification
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here, we should notify the driver that the configuration has changed, right? How do other rust-vmm devices have handled this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right. I did search of it, it's possible that I missed something but I didn't find any device of the existing ones notifying back the driver when changing the configuration. Although, I have found the relevant message that the vhost-devices should send via the socket "CONFIG_CHANGE_MSG" when a config change takes place but I didn't find any function on the backend side to trigger this message transmission.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can leave it as a comment and then try to fix it later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, let's proceed like that

if self.controller.write().unwrap().ctrl_state == CAN_CS_STARTED {
VIRTIO_CAN_RESULT_NOT_OK
} else {
// TODO: Trigger a config_change notification
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly the spec, the device may omit the change notification (section 5.20.5.1):

As the configuration status change is caused by a request from the driver the device is
allowed to omit the configuration change notification here. The device marks the request used when the
CAN controller has finalized the transition to the requested controller mode.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that's correct, so in the control queue case the code can stay as it is. In the rest of the cases, where the device changes the config status to BUSOFF unsolicitedly, I think we should trigger that notification.

This implementation supports the exchange of CAN/CANFD messages
virtual ("vcan") and real HW CAN device ("can0"). It was tested with:
a) virtio-can driver found in the following patch serie:
    - https://lwn.net/Articles/934187/

b) QEMU's vhost-user-can device:
    1) Option 1: Upstream QEMU's vhost-user-device

        qemu-system-x86_64  \
                -m 4096 \
                -numa node,memdev=mem \
                -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
                -chardev socket,id=char1,path=/tmp/can.sock \
                -device vhost-user-device-pci,chardev=char1,virtio-id=36,num_vqs=3,config_size=16 \
                ...

    2) Option 2: A new QEMU vhost-user-can device can be found in the
       following repo:
        - https://github.com/virtualopensystems/qemu/tree/vhu-can-rfc

For more information, please check the README.md file under
"staging/vhost-device-can/".

Signed-off-by: Timos Ampelikiotis <t.ampelikiotis@virtualopensystems.com>
@MatiasVara
Copy link
Contributor

MatiasVara commented Jul 8, 2024

Maybe this implementation should take into account this https://lore.kernel.org/virtio-comment/ZovA5gwG+rK0cRT5@fedora/T/#u once it is acked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants