Re: Testing on attach-detach recovery - Openamp-rp

3 Jul 2025


      On 7/3/25 2:49 AM, Arnaud POULIQUEN wrote:
...
On 7/2/25 19:00, Tanmay Shah wrote:
...
On 7/2/25 10:47 AM, Arnaud POULIQUEN wrote:
...
On 7/2/25 17:23, Tanmay Shah wrote:
...
On 7/2/25 2:18 AM, Arnaud POULIQUEN wrote:
...
On 7/1/25 23:19, Tanmay Shah wrote:
...
On 7/1/25 1:06 PM, Tanmay Shah wrote:
>
>
> On 7/1/25 12:56 PM, Tanmay Shah wrote:
>>
>>
>> On 7/1/25 12:18 PM, Arnaud POULIQUEN wrote:
>>>
>>>
>>> On 7/1/25 17:16, Tanmay Shah wrote:
>>>>
>>>>
>>>> On 7/1/25 3:07 AM, Arnaud POULIQUEN wrote:
>>>>> Hi Tanmay,
>>>>>
>>>>> On 6/27/25 23:29, Tanmay Shah wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I am implementing remoteproc recovery on attach-detach use case.
>>>>>> I have implemented the feature in the platform driver, and it works for
>>>>>> boot
>>>>>> recovery.
>>>>>
>>>>> Few questions to better understand your use case.
>>>>>
>>>>> 1) The linux remoteproc firmware attaches to a remote processor, and you
>>>>> generate a crash of the remote processor, right?
>>>>>
>>>>
>>>> Yes correct.
>>>>
>>>>> 1) How does the remote processor reboot? On a remoteproc request, or is it
>>>>> an auto-reboot independent from the Linux core?
>>>>>
>>>>
>>>> It is auto-reboot independent from the linux core.
>>>>
>>>>> 2) In case of auto-reboot, when does the remote processor send an event to
>>>>> the Linux remoteproc driver? Before or after the reset?
>>>>>
>>>>
>>>> Right now, when Remote reboots, it sends crash event to remoteproc driver
>>>> after
>>>> reboot.
>>>>
>>>>> 3) Do you expect to get core dump on crash?
>>>>>
>>>>
>>>> No coredump expected as of now, but only recovery. Eventually will
>>>> implement
>>>> coredump functionality as well.
>>>>
>>>>>>
>>>>>> However, I am stuck at the testing phase.
>>>>>>
>>>>>> When should firmware report the crash ? After reboot ? or during some
>>>>>> kind of
>>>>>> crash handler ?
>>>>>>
>>>>>> So far, I am reporting crash after rebooting remote processor, but it
>>>>>> doesn't
>>>>>> seem to work, i.e. I don't see rpmsg devices created after recovery.
>>>>>>
>>>>>> What should be the correct process to test this feature ? How other
>>>>>> platforms
>>>>>> are testing this?
>>>>>
>>>>> I have never tested it on an ST board. As a first analysis, in case of
>>>>> auto-reboot of the remote processor, it looks like you should detach and
>>>>> reattach to recover.
>>>>
>>>> That is what's done from the remoteproc framework.
>>>>
>>>>> - On detach the rpmsg devices should be unbound
>>>>> - On attach the remote processor should request RPmsg channels using
>>>>> the NS
>>>>> announcement mechanism
>>>>>
>>>>
>>>> The main issue is that the remote firmware needs to wait until all of the
>>>> above has happened, and only then initialize virtio devices. Currently we
>>>> don't have any way to notify recovery progress from linux to remote fw in
>>>> the remoteproc framework. So I might have to introduce some platform
>>>> specific mechanism in remote firmware to wait for recovery to complete
>>>> successfully.
>>>
>>> I guess the rproc->clean_table contains a copy of the resource table
>>> that is
>>> reapplied on attach, and the virtio devices should be re-probed, right?
>>>
>>> During the virtio device probe, the vdev status in the resource table is
>>> updated
>>> to 7 when virtio is ready to communicate. Virtio should then call
>>> rproc_virtio_notify() to inform the remote processor of the status update.
>>> At this stage, your remoteproc driver should be able to send a mailbox
>>> message
>>> to inform the remote side about the recovery completion.
>>>
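The status value 7 mentioned above is the combination of the virtio device status bits ACKNOWLEDGE (1), DRIVER (2) and DRIVER_OK (4). As a minimal sketch, the remote side could test the status byte like this; the macro and function names below are local to this example and are not taken from OpenAMP or the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device status bits as defined by the virtio specification.
 * The names are local to this sketch (assumption, not a library API). */
#define SKETCH_STATUS_ACKNOWLEDGE 0x01u
#define SKETCH_STATUS_DRIVER      0x02u
#define SKETCH_STATUS_DRIVER_OK   0x04u

/* True once the driver (Linux) side has set DRIVER_OK in the vdev status. */
bool sketch_vdev_driver_ready(uint8_t status)
{
    return (status & SKETCH_STATUS_DRIVER_OK) != 0;
}

/* True when the vdev status has been reset to 0, i.e. the device is not
 * yet (or no longer) claimed by a driver. */
bool sketch_vdev_is_reset(uint8_t status)
{
    return status == 0;
}
```

With these helpers, a status of 7 is reported as ready, while 3 (ACKNOWLEDGE | DRIVER only) and 0 are not.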
>>
>> I think I spot the problem now.
>>
>> Linux side: file: remoteproc_core.c
>>         rproc_attach_recovery
>>             __rproc_detach
>>                 cleans up the resource table and re-loads it
>>             __rproc_attach
>>                 stops and re-starts subdevices
>>
>>
>> Remote side:
>>         Remote re-boots after crash
>>         Detects crash happened previously
>>         notify crash to Linux
>>             (Linux is executing above flow meanwhile)
>>         starts creating virtio devices
>>         **rproc_virtio_create_vdev - parse vring & create vdev device**
>>         **rproc_virtio_wait_remote_ready - wait for remote ready** [1]
>>
>> I think Remote should wait on DRIVER_OK bit, before creating virtio devices.
>> The temporary solution I implemented was to make sure vrings addresses are
>> not 0xffffffff like following:
>>
>>     while (rsc->rpmsg_vring0.da == FW_RSC_U32_ADDR_ANY ||
>>            rsc->rpmsg_vring1.da == FW_RSC_U32_ADDR_ANY) {
>>         usleep(100);
>>         metal_cache_invalidate(rsc, rproc->rsc_len);
>>     }
>>
>> Above works, but I think better solution is to change sequence where remote
>> waits before creating virtio devices.
>
> I am sorry, I should have said, remote should wait before parsing and
> assigning vrings to virtio device.
>
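The polling loop quoted above spins forever if Linux never assigns the vring addresses. A bounded variant might look like the following sketch. It is an assumption-laden illustration: `struct sketch_vrings`, `sketch_refresh_fn` and `sketch_wait_vrings_assigned()` are hypothetical names, and the refresh callback stands in for the platform's own sleep plus cache invalidation (for example `usleep()` and `metal_cache_invalidate()` as in the loop above).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as FW_RSC_U32_ADDR_ANY in the loop above: "not assigned yet". */
#define SKETCH_ADDR_ANY 0xFFFFFFFFu

/* Hypothetical, reduced view of the two vring device addresses that live
 * in the shared resource table. */
struct sketch_vrings {
    volatile uint32_t vring0_da;
    volatile uint32_t vring1_da;
};

/* Hypothetical platform hook: sleep briefly and invalidate the cache
 * covering the resource table so the next read sees fresh values. */
typedef void (*sketch_refresh_fn)(void *ctx);

/* Poll until both vring addresses have been assigned by the host.
 * The addresses are checked at most max_polls + 1 times, with one refresh
 * between consecutive checks. Returns true when both are assigned, false
 * if the poll budget runs out first. */
bool sketch_wait_vrings_assigned(const struct sketch_vrings *v,
                                 unsigned int max_polls,
                                 sketch_refresh_fn refresh, void *ctx)
{
    for (unsigned int i = 0; ; i++) {
        if (v->vring0_da != SKETCH_ADDR_ANY &&
            v->vring1_da != SKETCH_ADDR_ANY)
            return true;
        if (i >= max_polls)
            return false;
        if (refresh != NULL)
            refresh(ctx);
    }
}
```

Returning false lets the firmware report the failure or retry later instead of hanging. Choosing the poll budget and the delay inside the refresh hook is platform specific.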
>>
>>
>> [1] https://github.com/OpenAMP/open-amp/
>> blob/391671ba24840833d882c1a75c5d7307703b1cf1/lib/remoteproc/
>> remoteproc.c#L994
>>
Actually upon further checking, I think above code is okay. I see that
wait_remote_ready is called before vrings are setup on remote fw side.
However, during recovery on the remote side, I somehow still have to implement
a platform specific wait for the vrings to be set up correctly.
From the linux side, the DRIVER_OK bit is set before the vrings are set up
correctly. Because of that, the remote firmware sets up wrong vring addresses
and then rpmsg channels are not created.
I am investigating this further.
Do you reset the vdev status as requested by the virtio spec?
https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#...
Regards,
Arnaud
Yes I do. I am actually restoring the default resource table on the firmware
side, which will set rpmsg_vdev status to 0.
However, when printing vrings right before wait_remote_ready, I see vrings are
not set correctly from linux side:
`vring0 = 0xFFFFFFFF, vring1 = 0xFFFFFFFF`
That makes sense if the values correspond to the initial values of the resource
table.
rproc->clean_table should contain a copy of these initial values.
...
However, the rproc state was still moved to attached when checked from
remoteproc sysfs.
Is rproc_handle_resources() called before going back to the attached state?
You are right. I think __rproc_attach() isn't calling rproc_handle_resources().
But recovery is supported by other platforms so I think recovery should work
without calling rproc_handle_resources().
Right. Having taken a deeper look at the code, it seems that there is an issue.
In rproc_reset_rsc_table_on_detach(), we clean the resource table without
calling rproc_resource_cleanup().
It seems to me that rproc_reset_rsc_table_on_detach() should not be called in
__rproc_detach() but rather in rproc_detach() after calling
rproc_resource_cleanup().
Yes, that sounds correct. It's a long weekend here in the US, so I will try
this next week and update.
Thanks,
Tanmay
...
...
Maybe restoring the resource table from the firmware side after reboot isn't a
good idea. I will try without it.
...
...
`cat /sys/class/remoteproc/remoteproc0/state`
attached
Somehow the sync between remote fw and linux isn't right.
...
...
>>
>> Thanks,
>> Tanmay
>>> Regards
>>> Arnaud
>>>
>>>
>>>>
>>>>> Regards,
>>>>> Arnaud
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Tanmay
>>>>
>>
>