[OpenAMP-RP][RFC] Enable RPMsg endpoint recovery on existing RPMsg channel


ben levinsky

20 Apr 2022

10:28 p.m.

Hi All,

As stated in a previous OpenAMP meeting, there is a use case in which either the remote or the host attempts to re-establish the connection if one side goes down.

A proposal for this is as follows:

Changes to existing structures:

1. Add a new feature to the bitmap for VirtIO RPMsg, with a name like "VIRTIO_RPMSG_F_RECOVERY", and an extra bit here.

2. Add a 2-bit field to struct rpmsg_device and struct fw_rsc_vdev to denote whether recovery functionality is supported and/or active. It can have the following 3 values:

- 0 - recovery not supported

- 1 - recovery supported but RPMsg endpoint reconnection is not yet allowed.

- 2 - recovery supported and RPMsg endpoint reconnection is allowed.
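A minimal C sketch of the proposed feature bit and 2-bit recovery field, assuming a bit position the RFC does not assign. The enum names, `struct demo_recovery_field`, and the helper functions are hypothetical illustrations, not existing OpenAMP or Linux definitions; only the three values come from the list above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit position: the RFC proposes the name
 * VIRTIO_RPMSG_F_RECOVERY but does not assign a number. */
#define VIRTIO_RPMSG_F_RECOVERY 1u

/* The three values proposed for the new 2-bit field (names are illustrative). */
enum rpmsg_recovery_state {
    RPMSG_RECOVERY_UNSUPPORTED = 0, /* recovery not supported */
    RPMSG_RECOVERY_SUPPORTED   = 1, /* supported, reconnection not yet allowed */
    RPMSG_RECOVERY_ALLOWED     = 2, /* supported, reconnection allowed */
};

/* Stand-in for where the field would live in struct rpmsg_device and
 * struct fw_rsc_vdev; 2 bits hold 0..3, leaving the value 3 unused. */
struct demo_recovery_field {
    unsigned int recovery : 2;
};

/* Nonzero if the given feature bit is set in a 32-bit feature word. */
static int demo_has_feature(uint32_t features, unsigned int bit)
{
    assert(bit < 32);
    return (int)((features >> bit) & 1u);
}

/* Nonzero if a raw field value is one of the three defined states. */
static int demo_recovery_valid(unsigned int v)
{
    return v <= RPMSG_RECOVERY_ALLOWED;
}
```

Using a 2-bit field for three states leaves one spare encoding, which a later revision could use without changing the field width.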

Changes to host's resource table parsing:

When a host parses the resource table of the remote processor, check if recovery feature is set and update the vdev's features accordingly.
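One possible shape for the host-side parsing step, sketched under the assumption that recovery is enabled only when both sides support it. The field names `dfeatures` (features offered by the remote) and `gfeatures` (features written back by the host) mirror the Linux `struct fw_rsc_vdev`; the `demo_*` structs, the function, and the bit number are hypothetical simplifications.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_RPMSG_F_RECOVERY 1u /* hypothetical placeholder bit */

/* Simplified view of the vdev resource; the real struct has more members. */
struct demo_rsc_vdev {
    uint32_t dfeatures; /* features offered by the remote firmware */
    uint32_t gfeatures; /* features acknowledged by the host */
};

/* Host-side copy of the negotiated features. */
struct demo_vdev {
    uint32_t features;
};

/* Negotiate features and report whether recovery ended up enabled. */
static int host_negotiate_recovery(struct demo_rsc_vdev *rsc,
                                   struct demo_vdev *vdev,
                                   uint32_t host_features)
{
    assert(rsc && vdev);
    /* Negotiated set = what the remote offers AND what the host supports. */
    vdev->features = rsc->dfeatures & host_features;
    /* Write the result back so the remote can see what was accepted. */
    rsc->gfeatures = vdev->features;
    return (int)((vdev->features >> VIRTIO_RPMSG_F_RECOVERY) & 1u);
}
```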

Changes to host's endpoint creation:

1. On first setup, set the struct fw_rsc_vdev recovery flag to "recovery supported but reconnection is not yet allowed".

2. After initial endpoint creation succeeds, set the recovery flag in the resource table's instance of struct fw_rsc_vdev to "reconnection allowed", and update the resource table's copy of the rpmsg_vdev so that its recovery field denotes that reconnection is allowed.

3. If recovery is fully set up, then upon subsequent host endpoint creation:

a. the host will not re-initialize the virtqueues

b. if the rpmsg address is already set, do not error out

c. unlock the rdev virtqueue lock so that messages can be sent again
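The host-side steps above can be sketched as a small state machine. This is a single-endpoint model under stated assumptions: `struct demo_host` and `demo_host_create_ept()` are hypothetical, the virtqueue "initialization" is just a flag, and VIRTIO_RPMSG_F_RECOVERY is assumed to have been negotiated already. It is not the OpenAMP endpoint-creation API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum rpmsg_recovery_state {
    RPMSG_RECOVERY_UNSUPPORTED = 0,
    RPMSG_RECOVERY_SUPPORTED   = 1,
    RPMSG_RECOVERY_ALLOWED     = 2,
};

/* Hypothetical, single-endpoint host state; not OpenAMP API. */
struct demo_host {
    enum rpmsg_recovery_state rsc_recovery; /* flag in the resource table */
    bool vqs_initialized;                   /* set when virtqueues are built */
    bool vq_locked;                         /* held while the peer is down */
    uint32_t ept_addr;                      /* 0 = no endpoint bound yet */
};

/* Any state other than RPMSG_RECOVERY_ALLOWED is treated as a first setup. */
static int demo_host_create_ept(struct demo_host *h, uint32_t addr)
{
    assert(h);
    bool reconnect = (h->rsc_recovery == RPMSG_RECOVERY_ALLOWED);

    if (!reconnect) {
        /* Step 1: first setup, recovery supported but not yet allowed. */
        h->rsc_recovery = RPMSG_RECOVERY_SUPPORTED;
        h->vqs_initialized = true; /* stands in for virtqueue init */
    }
    /* Step 3a: on reconnection the virtqueues are left untouched. */

    if (h->ept_addr == addr && !reconnect)
        return -EEXIST; /* usual behavior: address already in use */
    /* Step 3b: an already-set address is tolerated on reconnection. */
    h->ept_addr = addr;

    /* Step 3c: release the lock so that messages can be sent again. */
    h->vq_locked = false;

    /* Step 2: creation succeeded, so allow future reconnection. */
    h->rsc_recovery = RPMSG_RECOVERY_ALLOWED;
    return 0;
}
```

In the usage below, the second call models a reconnection after the peer went down: it succeeds without rebuilding the virtqueues and releases the lock.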

Changes to remote's endpoint creation:

2. If recovery is fully set up, the following changes apply:

a. the remote will not wait for the host to initialize the virtqueues

b. if the rpmsg address is already set, do not error out

c. unlock the rdev virtqueue lock so that messages can be sent again
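A comparable sketch for the remote side, assuming the remote reads the recovery flag that the host wrote into the resource table. `struct demo_remote` and `demo_remote_create_ept()` are hypothetical; the value 4 for DRIVER_OK is the virtio device-status bit, and returning -EAGAIN stands in for the point where a real remote would keep waiting for the host.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* virtio device-status bit set by the driver once it is ready (spec value 4). */
#define DEMO_STATUS_DRIVER_OK 0x04u

enum rpmsg_recovery_state {
    RPMSG_RECOVERY_UNSUPPORTED = 0,
    RPMSG_RECOVERY_SUPPORTED   = 1,
    RPMSG_RECOVERY_ALLOWED     = 2,
};

/* Hypothetical, single-endpoint remote state; not OpenAMP API. */
struct demo_remote {
    enum rpmsg_recovery_state recovery; /* read from the resource table */
    uint8_t host_status;                /* mirrors the vdev status byte */
    bool vq_locked;
    uint32_t ept_addr;
};

static int demo_remote_create_ept(struct demo_remote *r, uint32_t addr)
{
    assert(r);
    bool reconnect = (r->recovery == RPMSG_RECOVERY_ALLOWED);

    /* a: on reconnection, do not wait for the host's virtqueue init. */
    if (!reconnect && !(r->host_status & DEMO_STATUS_DRIVER_OK))
        return -EAGAIN;

    /* b: an already-set address is only an error outside reconnection. */
    if (r->ept_addr == addr && !reconnect)
        return -EEXIST;
    r->ept_addr = addr;

    /* c: release the lock so that messages can be sent again. */
    r->vq_locked = false;
    return 0;
}
```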

Changes to rpmsg-send:

None. The virtqueue lock is released upon endpoint reconnection.
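Why the send path can stay unchanged, sketched with a plain flag in place of the rdev lock (an assumption; a real implementation would use a mutex and might block rather than fail). `struct demo_rdev`, `demo_send()`, `demo_peer_down()` and `demo_reconnect()` are hypothetical names: the point is that only the recovery path toggles the lock, while send keeps checking the same lock it always used.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device state; the flag models the rdev virtqueue lock. */
struct demo_rdev {
    bool vq_locked;   /* held by the recovery path while the peer is down */
    char txbuf[64];   /* stands in for a shared-memory TX buffer */
    size_t txlen;
};

/* Unchanged send path: it only honors the lock it already took. */
static int demo_send(struct demo_rdev *rd, const void *data, size_t len)
{
    assert(rd);
    if (rd->vq_locked)
        return -EAGAIN; /* a real implementation might block instead */
    if (len > sizeof(rd->txbuf))
        return -EINVAL;
    memcpy(rd->txbuf, data, len);
    rd->txlen = len;
    return (int)len;
}

/* Hypothetical hooks called when the peer goes down and reconnects. */
static void demo_peer_down(struct demo_rdev *rd) { rd->vq_locked = true; }
static void demo_reconnect(struct demo_rdev *rd) { rd->vq_locked = false; }
```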

Kind Regards,

Ben


Arnaud POULIQUEN

21 Apr 2022

2:43 p.m.

Hi Ben,

Thanks for your description! Please find inline a few questions/comments based on my first understanding.


-----Original Message-----
From: ben levinsky via Openamp-rp <openamp-rp@lists.openampproject.org>
Sent: Thursday, 21 April 2022 00:29
To: openamp-rp@lists.openampproject.org
Subject: [Openamp-rp] [OpenAMP-RP][RFC] Enable RPMsg endpoint recovery on existing RPMsg channel

Hi All,

As stated in a previous OpenAMP meeting, there was the use case of either the remote or host attempting to re-establish connection if one side goes down.

A proposal for this is as follows:

Changes to existing structures:

1. Add a new feature to the bitmap for VirtIO RPMsg, with a name like "VIRTIO_RPMSG_F_RECOVERY", and an extra bit here.

2. Add a 2-bit field to struct rpmsg_device and struct fw_rsc_vdev to denote whether recovery functionality is supported and/or active. It can have the following 3 values:

- 0 - recovery not supported

- 1 - recovery supported but RPMsg endpoint reconnection is not yet allowed.

- 2 - recovery supported and RPMsg endpoint reconnection is allowed.

Changing the fw_rsc_vdev struct probably means a resource table version 2.

Changes to host's resource table parsing:

When a host parses the resource table of the remote processor, check if recovery feature is set and update the vdev's features accordingly.

Changes to host's endpoint creation:

1. First setup: set the struct fw_rsc_vdev recovery flag to "recovery supported but reconnection is not yet allowed"

2. After initial endpoint creation is successful, set the recovery flag in the resource table's instance of struct fw_rsc_vdev to "reconnection allowed", and set the resource table's copy of the rpmsg_vdev so that the recovery field denotes reconnection is allowed

3. If recovery is fully set up, then upon subsequent host endpoint creation:

a. host will not re-initialize virtqueues

The virtqueue is a local structure that can be dynamically allocated; how would it be preserved? You would also need to reparse the resource table to remap the shared resources and the RPMsg buffers, right?

b. if rpmsg address is already set, do not error out

Not clear; could you detail this point? Which layer is in charge of storing the local endpoints before the crash, and where would they be stored?

c. unlock rdev virtqueue lock so that messages can send again

- When does the virtqueue lock occur?

- How do you differentiate channels that will need to be reinitialized from ones that can reconnect?

- How do you manage queued RPMsg buffers for reinitialization vs. reconnection?

- How do you handle the Linux mechanism, based on the NS announcement, that probes the rpmsg device?

- How do you release RPMsg buffers taken from the vring used or available list just before the crash?

Regarding your description, I also wonder whether there is a synergy with the flow control signaling [1].

[1] https://lkml.org/lkml/2022/1/18/867

Thanks, Arnaud


Ed Mooring

3 May 2022

5:47 a.m.

Thanks for framing the problem clearly.

I’m top-posting this because I have a few high-level comments.

First, the clean restart process would be a lot simpler if our virtio transport layer supported Virtio v1.2’s vring reset mechanism. There has been a recent effort to provide this for vring PCI in the Linux kernel ([PATCH v7 00/26] virtio pci support VIRTIO_F_RING_RESET https://lore.kernel.org/all/20220308123518.33800-1-xuanzhuo@linux.alibaba.com/#r), but I don’t know how far along the path to merging it is. The Virtio 1.2 spec appears to have been in the very late stages of approval for some time.

Second, RPMsg over virtio is a different kind of device from any of the other devices in the virtio spec. It’s like MMIO, except that config space writes are not automatically caught, and the remoteproc framework adds some wrinkles. If someone is VERY patient, it might be nice to get an RPMsg device section into the next iteration of the spec.

Third, this needs to be “reversible”. The driver/device connection is inherently asymmetric, and the Virtio spec and the Linux implementation tend to assume that they are the driver side. Some of our use cases need the MCU to be the driver and the rich OS to be the device. It appears some attention is being paid to this in the system reference work, but I’m not following that very closely right now.

Fourth, the RPMsg protocol itself needs to be enhanced to support things like informing the sender that the receiver has been reset. Doing this in a backwards-compatible fashion will be interesting.

We need to think about all of these things to construct a more robust communications infrastructure.

Regards, Ed M


Bill Mills

5 May 2022

2:56 p.m.

Ben, Ed & all,

On 5/3/22 1:47 AM, Ed Mooring via Openamp-rp wrote:


Thanks for framing the problem clearly.

I’m top-posting this because I have a few high-level comments.

First, the clean restart process would be a lot simpler if our virtio transport layer supported Virtio v1.2’s vring reset mechanism. There has been a recent effort to provide this for vring PCI in the Linux kernel ([1][PATCH v7 00/26] virtio pci support VIRTIO_F_RING_RESET), but I don’t know how far along the path to merging it is. The Virtio 1.2 spec appears to have been in the very late stages of approval for some time.

I agree that looking at the virtio spec model for reset is the correct model. However, I am not sure we need ring level reset. A vdev level reset should work for rpmsg.

If rpmsg was riding on top of virtio-mmio or virtio-pci the path would be clear.

Each transport provides a device-level reset. If the driver side detects the problem, it issues a reset and starts again, reclaiming any buffers it had loaned to the device side.

If the device side detects the issue, it sets the error status at the device level and waits for the driver to reset it.
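The vdev-level reset model described above can be sketched with the standard virtio device-status bits (ACKNOWLEDGE 1, DRIVER 2, DRIVER_OK 4, FEATURES_OK 8, DEVICE_NEEDS_RESET 0x40; writing 0 resets the device). In remoteproc the natural carrier would be the `status` byte of struct fw_rsc_vdev, but this sketch does not claim remoteproc supports such a reset today. `struct demo_vdev` and both functions are hypothetical, and buffer ownership is reduced to a flag per buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Device status bits as defined by the virtio specification. */
#define VIRTIO_STATUS_ACKNOWLEDGE 0x01u
#define VIRTIO_STATUS_DRIVER      0x02u
#define VIRTIO_STATUS_DRIVER_OK   0x04u
#define VIRTIO_STATUS_FEATURES_OK 0x08u
#define VIRTIO_STATUS_NEEDS_RESET 0x40u

#define DEMO_NUM_BUFS 4

/* Hypothetical vdev model: owned_by_device[i] != 0 means buffer i is lent out. */
struct demo_vdev {
    uint8_t status;
    uint8_t owned_by_device[DEMO_NUM_BUFS];
};

/* Device side: flag the error and wait for the driver to act. */
static void device_report_error(struct demo_vdev *v)
{
    v->status |= VIRTIO_STATUS_NEEDS_RESET;
}

/* Driver side: reset, reclaim loaned buffers, redo the init handshake.
 * Returns the number of buffers reclaimed from the device. */
static int driver_reset_vdev(struct demo_vdev *v)
{
    int reclaimed = 0;

    assert(v);
    v->status = 0; /* writing 0 resets the device */
    for (int i = 0; i < DEMO_NUM_BUFS; i++) {
        if (v->owned_by_device[i]) {
            v->owned_by_device[i] = 0; /* device no longer uses it */
            reclaimed++;
        }
    }
    v->status = VIRTIO_STATUS_ACKNOWLEDGE | VIRTIO_STATUS_DRIVER;
    /* Feature negotiation and virtqueue setup would happen here. */
    v->status |= VIRTIO_STATUS_FEATURES_OK;
    v->status |= VIRTIO_STATUS_DRIVER_OK;
    return reclaimed;
}
```

Because the reset is per vdev, the remote processor itself keeps running; only the RPMsg transport state is rebuilt.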

To me the issue is that the remoteproc virtio transport does not have a defined way to reset the vdev without resetting the whole remoteproc. Fixing that would give us a solution to this and would fit into the virtio spec.

Your solution kind of looks like this, Ben, but I am not sure about some of the statements. It sounds like you are trying to do some things only partially, and that looks too complex and out of spec to me. Maybe I am wrong about what you are saying. We can discuss in a couple of minutes.

Bill


References

1. https://lore.kernel.org/all/20220308123518.33800-1-xuanzhuo@linux.alibaba.com/#r
2. mailto:openamp-rp@lists.openampproject.org
3. mailto:openamp-rp@lists.openampproject.org
4. mailto:openamp-rp-leave@lists.openampproject.org


--
Bill Mills
Principal Technical Consultant, Linaro
+1-240-643-0836
TZ: US Eastern
Work Schedule: Tues/Wed/Thur


participants (4)

Arnaud POULIQUEN
ben levinsky
Bill Mills
Ed Mooring