Hi all,
During the last system device tree call, we agreed it would be helpful
to have a document explaining how to use the proposed new device tree
bindings to describe heterogeneous systems and solve typical problems,
such as memory reservations for multiple domains, multiple interrupt
controllers, etc.
Tomas and I wrote the following document (also attached: feel free to
use pandoc to convert it into html if you prefer to read it that way).
It includes an introduction to system device tree, a short description
of the bindings, and how to use them to solve common problems. I hope
that it will help create a common understanding of the problems we are
trying to solve and the potential solutions. I also attached the full
system device tree example as reference.
Cheers,
Stefano
System Device Tree Concepts
===========================
System Device Trees extend traditional Device Trees to handle
heterogeneous SoCs with multiple CPUs and Execution Domains. An
Execution Domain can be seen as an address space that is running a
software image, whether an operating system, a hypervisor or firmware,
and that has a set of cpus, memory and devices attached to it. For
example, each individual CPU/core that is not part of an SMP cluster is
a separate Execution Domain, as are the different Exception Levels on an
ARMv8-A architecture. Trusted and non-trusted environments can also be
viewed as separate Execution Domains.
A design goal of System Device Trees is that no current client of Device
Trees should have to change at all, unless it wants to take advantage of
the extra information. This means that Linux in particular does not need
to change since it will see a Device Tree that it can handle with the
current implementation, potentially with some extra information it can
ignore.
System Device Trees must handle two types of heterogeneous additions:
1. Being able to specify different cpu clusters and the actual memory
and devices hard-wired to them
- This is done through the new Hardware Descriptions, such as
"cpus,cluster" and "indirect-bus"
- This information is provided by the SoC vendor and is typically
fixed for a given SoC/board
2. Being able to assign hardware resources that can be configured by
software to be used by one or more Execution Domains
- This is done through the Execution Domain configuration
- This information is provided by a System Architect and will be
different for different use cases, even for the same board
- E.g. How much memory and which devices go to Linux vs. an
RTOS can be different from one boot to another
- This information should be separated from the hard-wired
information for two reasons
- A different persona will add and edit the information
- Configuration should be separated from specification since it
has a different rate of change
The System Device Trees and Execution Domain information are used in two
major use cases:
1. Exclusively on the host by using a tool like Lopper that will "prune"
the System Device Tree
- Each domain will get its own "traditional" Device Tree that only
sees one address space and has one "cpus" node, etc.
- Lopper has pluggable backends so it can also generate information
for clients that use a different format
- E.g. It can generate a bunch of "#defines" that can be
included and compiled in to an RTOS
2. System Device Trees can be used by a "master" target environment that
manages multiple Execution Domains:
- a firmware that can set up hardware protection and use it to
restart individual domains
- E.g. Protect the Linux memory so the R5 OS can't reach it
- any other operating system or hypervisor that has sub-domains
- E.g. Xen can use the Execution Domains to get info about the Xen
guests (also called domains)
- E.g. Linux could use the default domain for its own
configuration and the domains to manage other CPUs
- Since System Device Trees are backwards compatible with Device
Trees, the only changes needed in Linux would be any new code
taking advantage of the Domain information
- a default master has access to all resources (CPUs, memories,
devices); it has to make sure it stops using a resource
itself when it "gives it away" to a sub-domain
There is a concept of a default Execution Domain in System Device Trees,
which corresponds to /cpus. The default domain is compatible with the
current traditional Device Tree. It is useful for a couple of reasons:
1. As a way to specify the default place to assign added hardware (see
use case #1)
- A default domain does not have to list all the HW resources
allocated to it. It gets everything not allocated elsewhere by
Lopper.
- This minimizes the amount of information needed in the Domain
configuration.
- This is also useful for dynamic hardware such as add-on boards and
FPGA images that are adding new devices.
2. The default domain can be used to specify what a master environment
sees (see use case #2)
- E.g. the default domain is what is configuring Linux or Xen, while
the other domains specify domains to be managed by the master
System Device Tree Hardware Description
=======================================
To turn system device tree into a reality we are introducing a few new
concepts. They enable us to describe a system with multiple cpus
clusters and potentially different address mappings for each of them
(i.e. a device could be seen at different addresses from different cpus
clusters).
The new concepts are:
- Multiple top level "cpus,cluster" nodes to describe heterogeneous CPU
clusters.
- "indirect-bus": a new type of bus that does not automatically map to
the parent address space (i.e. not automatically visible).
- An "address-map" property to express the different address mappings of
the different cpus clusters and to map indirect-buses.
The following is a brief example to show how they can be used together:
/* default cluster */
cpus {
    cpu@0 {
    };
    cpu@1 {
    };
};

/* additional R5 cluster */
cpus_r5: cpus-cluster@0 {
    compatible = "cpus,cluster";

    /* specifies address mappings */
    address-map = <0xf9000000 &amba_rpu 0xf9000000 0x10000>;

    cpu@0 {
    };
    cpu@1 {
    };
};

amba_rpu: indirect-bus@f9000000 {
    compatible = "indirect-bus";
};
In this example we can see:
- two cpus clusters, one of which is the default top-level cpus node
- an indirect-bus "amba_rpu" which is not visible to the top-level cpus
node
- the cpus_r5 cluster can see amba_rpu because it is explicitly mapped
using the address-map property
Devices only physically accessible from one of the two clusters should
be placed under an indirect-bus as appropriate. For instance, the
following example shows how interrupt controllers are expressed:
/* default cluster */
cpus {
};

/* additional R5 cluster */
cpus_r5: cpus-cluster@0 {
    compatible = "cpus,cluster";

    /* specifies address mappings */
    address-map = <0xf9000000 &amba_rpu 0xf9000000 0x10000>;
};

/* bus only accessible by cpus */
amba_apu: bus@f9000000 {
    compatible = "simple-bus";

    gic_a72: interrupt-controller@f9000000 {
    };
};

/* bus only accessible by cpus_r5 */
amba_rpu: indirect-bus@f9000000 {
    compatible = "indirect-bus";

    gic_r5: interrupt-controller@f9000000 {
    };
};
gic_a72 is accessible by /cpus, but not by cpus_r5, because amba_apu is
not present in the address-map of cpus_r5.
gic_r5 is visible to cpus_r5, because it is present in the address map
of cpus_r5. gic_r5 is not visible to /cpus because indirect-bus doesn't
automatically map to the parent address space, and /cpus doesn't have an
address-map property in the example.
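If the default cluster also needed to reach devices on the indirect-bus,
it could be given its own address-map; the following is a minimal sketch
that simply reuses the mapping values from the example above:

cpus {
    /* hypothetical: map amba_rpu into the default cluster's address space */
    address-map = <0xf9000000 &amba_rpu 0xf9000000 0x10000>;
};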
Relying on the fact that each interrupt controller is correctly visible
to the right cpus cluster, it is possible to express interrupt routing
from a device to multiple clusters. For instance:
amba: bus@f1000000 {
    compatible = "simple-bus";
    ranges;
    #interrupt-cells = <3>;
    interrupt-map-pass-thru = <0xffffffff 0xffffffff 0xffffffff>;
    interrupt-map-mask = <0x0 0x0 0x0>;
    interrupt-map = <0x0 0x0 0x0 &gic_a72 0x0 0x0 0x0>,
                    <0x0 0x0 0x0 &gic_r5 0x0 0x0 0x0>;

    can0: can@ff060000 {
        compatible = "xlnx,canfd-2.0";
        reg = <0x0 0xff060000 0x0 0x6000>;
        interrupts = <0x0 0x14 0x1>;
    };
};
In this example, all devices under amba, including can@ff060000, have
their interrupts routed to both gic_r5 and gic_a72.
Memory only physically accessible by one of the clusters can be placed
under an indirect-bus like any other device type. However, normal
memory is usually physically accessible by all clusters. It is just a
software configuration that splits memory into ranges and assigns a
range to each execution domain. Software configurations are explained
below.
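As an illustration, memory that only the R5 cluster can physically reach
(for example a tightly-coupled memory) could be placed under the
indirect-bus; the node name, address and cell layout below are
hypothetical:

amba_rpu: indirect-bus@f9000000 {
    compatible = "indirect-bus";

    /* hypothetical R5-private memory, reachable only via cpus_r5's address-map */
    memory@ffe00000 {
        device_type = "memory";
        reg = <0xffe00000 0x40000>;
    };
};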
Execution Domains
=================
An execution domain is a collection of software, firmware, and board
configurations that enable an operating system or an application to run
on a cpus cluster. With multiple cpus clusters in a system it is natural to
have multiple execution domains, at least one per cpus cluster. There
can be more than one execution domain for each cluster, with
virtualization or non-lockstep execution (for cpus clusters that support
it). Execution domains are configured and added at a later stage by a
software architect.
Execution domains are expressed by a new node with an "openamp,domain"
compatible string. Being a configuration rather than a description, their
natural place is under /chosen or under a similar new top-level node. In
this example, I used /domains:
domains {
    openamp_r5 {
        compatible = "openamp,domain-v1";
        cpus = <&cpus_r5 0x2 0x80000000>;
        memory = <0x0 0x0 0x0 0x8000000>;
        access = <&can0>;
    };
};
An openamp,domain node contains information about:
- cpus: the physical cpus on which the software is running
- memory: memory assigned to the domain
- access: any devices configured to be only accessible by a domain
The access list is an array of links to devices that are configured to
be only accessible by an execution domain, using bus firewalls or
similar technologies.
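For illustration, an access property listing more than one device could
look like the following; the device labels here are hypothetical:

openamp_r5 {
    compatible = "openamp,domain-v1";
    /* hypothetical: a CAN controller and a timer assigned exclusively to this domain */
    access = <&can0>, <&ttc0>;
};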
The memory range assigned to an execution domain is expressed by the
memory property. It needs to be a subset of the physical memory in the
system. The memory property can also be used to express memory sharing
between domains:
domains {
    openamp_r5 {
        compatible = "openamp,domain-v1";
        memory = <0x0 0x0 0x0 0x8000000 0x8 0x0 0x0 0x10000>;
    };

    openamp_a72 {
        compatible = "openamp,domain-v1";
        memory = <0x0 0x8000000 0x0 0x80000000 0x8 0x0 0x0 0x10000>;
    };
};
In this example, a 16-page (64 KB) range starting at 0x800000000 is
shared between the two domains.
In a system device tree without a default cpus cluster (no top-level
cpus node), Lopper figures out memory assignment for each domain by
looking at the memory property under each "openamp,domain" node. In a
device tree with a top-level cpus cluster, and potentially a legacy OS
running on it, we might want to "hide" the memory reservation for other
clusters from /cpus. We can do that with /reserved-memory:
reserved-memory {
    #address-cells = <0x2>;
    #size-cells = <0x2>;
    ranges;

    memory_r5@0 {
        compatible = "openamp,domain-memory-v1";
        reg = <0x0 0x0 0x0 0x8000000>;
    };
};
The purpose of memory_r5@0 is to let the default execution domain know
that it shouldn't use the 0x0-0x8000000 memory range because it is
reserved for use by other domains.
/reserved-memory and /chosen are top-level nodes dedicated to
configurations, rather than hardware description. Each execution domain
might need similar configurations, hence, chosen and reserved-memory are
also specified under each openamp,domain node for domain-specific
configurations. The top-level /reserved-memory and /chosen nodes remain in
place for the default execution domain. As an example:
/chosen -> configuration for a legacy OS running on /cpus
/reserved-memory -> reserved memory for a legacy OS running on /cpus
/domains/openamp_r5/chosen -> configuration for the domain "openamp_r5"
/domains/openamp_r5/reserved-memory -> reserved memory for "openamp_r5"
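In device tree syntax, the per-domain configuration nodes could look
like the following sketch (the bootargs string and the reserved region
here are made up for illustration):

domains {
    openamp_r5 {
        compatible = "openamp,domain-v1";

        chosen {
            bootargs = "console=uart1"; /* hypothetical RTOS command line */
        };

        reserved-memory {
            #address-cells = <0x2>;
            #size-cells = <0x2>;
            ranges;

            /* hypothetical carve-out private to this domain */
            shm@3ed00000 {
                reg = <0x0 0x3ed00000 0x0 0x100000>;
            };
        };
    };
};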
Hi all!
[ tl;dr - we're trying to organise a sprint - see the end ]
Due to the Linux on Arm meetup this week, quite a few of our DT
community happened to be in Cambridge together last night. We took the
opportunity to get together for food and beer, and we naturally ended
up mulling over a whole range of things, not least the ongoing System
DT design work.
As is the nature of this kind of gathering, we had a whole range of
opinions and ideas, much more than I can faithfully recall to share
here in great detail. Grant has been keen to try and keep system-dt
compatible with Linux DT usage if that's possible, and Stefano has
been working with that in mind. Rob is concerned about how things
might scale with lots of domains, for example. Olof is keen to talk
about a higher-level design to make it easier to express complex
layouts. And so on.
There is one thing that *did* stand out: we agreed that it's probably
time to try and get all of the interested people in a room together
for a few days with a whiteboard, to really thrash out what's needed
and how to achieve it. With the right people, we can go through all
our needs, pick up the ideas we have and hopefully get some prototypes
ready to evaluate.
So, who's interested? I believe all the people in CC here are likely
keen to be involved. Anybody else?
Although we have Linaro Connect in Budapest at the end of March,
that's really *not* a good place to try and meet up, for a couple of
reasons: not everybody will be there, and we'll have too many
distractions to be able to focus on this. Let's *not* do that.
AFAICS we're geographically roughly split between US/Canada and
Europe, so I'm thinking a week in either the US or Europe, some time
in the next 2-3 months:
* Would a week before or after Connect work, in Europe (16th-20th
March, or 30th March to 3rd April)? I can look for options in the
UK easily, with maybe either Arm or Linaro hosting.
* Alternatively, a week's meeting in the US, deliberately avoiding
the Connect week so we don't wipe people out with travel: Maybe 2-6
or 9-13 March? Or push back into April if that's too short notice,
of course. Would Xilinx be able to host something for us, maybe? Or
another possibility might be the Linaro office in Nashua?
I'm open to other suggestions for time and venue. Let's try to make
something work here?
Cheers,
Steve
--
Steve McIntyre steve.mcintyre(a)linaro.org
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs
Week after Connect would work too. (before Connect is Netconf in Canada).
On Mon, 10 Feb 2020 at 15:29, Steve McIntyre via System-dt <
system-dt(a)lists.openampproject.org> wrote:
> On Mon, Feb 10, 2020 at 11:17:35AM +0000, Grant Likely wrote:
> >On 06/02/2020 15:59, Steve McIntyre wrote:
> >> Hi all!
> >>
> >> [ tl;dr - we're trying to organise a sprint - see the end [...]
> >> > AFAICS we're geographically roughly split between US/Canada and
> >> Europe, so I'm thinking a week in either the US or Europe, some time
> >> in the next 2-3 months:
> >>
> >> * Would a week before or after Connect work, in Europe (16th-20th
> >> March, or 30th March to 3rd April)? I can look for options in the
> >> UK easily, with maybe either Arm or Linaro hosting.
> >
> >The week before isn't great, but the week after would work. I would want
> >to travel home over the weekend.
> >
> >> * Alternatively, a week's meeting in the US, deliberately avoiding
> >> the Connect week so we don't wipe people out with travel: Maybe 2-6
> >> or 9-13 March? Or push back into April if that's too short notice,
> >> of course. Would Xilinx be able to host something for us, maybe? Or
> >> another possibility might be the Linaro office in Nashua?
> >
> >The 2nd-3rd could work for me. I may already be traveling to California
> >for this week to attend OCP Summit (4th-5th March) and the LF Member
> >Meeting 10th-12th.
>
> Thanks Grant!
>
> Any more dates please? Feel free to just mail me privately rather than
> to the lists - I'm filling in the data provided into a spreadsheet to
> help with planning.
>
> Cheers,
> --
> Steve McIntyre steve.mcintyre(a)linaro.org
> <http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs
>
> --
> System-dt mailing list
> System-dt(a)lists.openampproject.org
> https://lists.openampproject.org/mailman/listinfo/system-dt
>
--
François-Frédéric Ozog | *Director Linaro Edge & Fog Computing Group*
T: +33.67221.6485
francois.ozog(a)linaro.org | Skype: ffozog
Hi Steve,
A sprint sounds like a good idea. Xilinx would be very happy to host it
here in San Jose; either March 2-6 (preferred) or March 9-13 would work.
Otherwise, it would be challenging for Tomas and me to join a meeting in
Europe outside of Linaro Connect. I know you suggested ruling out
Linaro Connect, but I think we could make it work as a focused colocated
event in Budapest the same week. Could that be an option?
Cheers,
Stefano
On Fri, 7 Feb 2020, Steve McIntyre wrote:
> Hey Loic,
>
> Cool, that's very helpful thanks! :-)
>
> I'm building a spreadsheet of possible dates now as people share their
> availability. </hint>
>
> Cheers,
>
> Steve
>
> On Fri, Feb 07, 2020 at 03:48:45PM +0000, Loic PALLARDY wrote:
> >Hi Steve,
> >
> >Sure, ST is interested to participate to a sprint on System Device tree definition.
> >I can propose ST Le Mans office to host this event in Europe (Direct train access from CDG airport, only 5min by walk from train station).
> >
> >Regards,
> >Loic
> >
> >> -----Original Message-----
> >> From: dte-interest(a)linaro.org <dte-interest(a)linaro.org> On Behalf Of
> >> Nathalie Chan King Choy
> >> Sent: jeudi 6 février 2020 18:48
> >> To: Francois Ozog <francois.ozog(a)linaro.org>; Steve McIntyre
> >> <steve.mcintyre(a)linaro.org>
> >> Cc: Stefano Stabellini <stefanos(a)xilinx.com>; dte-all(a)linaro.org; Bruce
> >> Ashfield <brucea(a)xilinx.com>; devicetree-spec(a)vger.kernel.org; Rob
> >> Herring <Rob.Herring(a)arm.com>; Mark Brown <mark.brown(a)arm.com>;
> >> Benjamin Gaignard <benjamin.gaignard(a)linaro.org>; Olof Johansson
> >> <olof(a)lixom.net>; Arnd Bergmann <arnd(a)linaro.org>
> >> Subject: RE: [System-dt] System DT - thinking a sprint would help
> >>
> >> Hi Steve,
> >>
> >> Additional folks who spoke during the last System DT call & not shown on the
> >> CC list were:
> >> Loic from ST
> >> Etsam & Dan from MGC
> >> Tomas from Xilinx
> >>
> >> @Loic, Etsam, Dan, Tomas: Are you guys interested?
> >>
> >> Thanks & regards,
> >> Nathalie
> >>
> >> > -----Original Message-----
> >> > From: System-dt <system-dt-bounces(a)lists.openampproject.org> On
> >> Behalf
> >> > Of Francois Ozog via System-dt
> >> > Sent: Thursday, February 6, 2020 9:05 AM
> >> > To: Steve McIntyre <steve.mcintyre(a)linaro.org>
> >> > Cc: Stefano Stabellini <stefanos(a)xilinx.com>; dte-all(a)linaro.org; Bruce
> >> > Ashfield <brucea(a)xilinx.com>; devicetree-spec(a)vger.kernel.org; Rob
> >> > Herring <Rob.Herring(a)arm.com>; Mark Brown <mark.brown(a)arm.com>;
> >> > Benjamin Gaignard <benjamin.gaignard(a)linaro.org>; Olof Johansson
> >> > <olof(a)lixom.net>; system-dt(a)lists.openampproject.org; Arnd Bergmann
> >> > <arnd(a)linaro.org>
> >> > Subject: Re: [System-dt] System DT - thinking a sprint would help
> >> >
> >> > EXTERNAL EMAIL
> >> >
> >> > count me in.
> >> >
> >> >
> >> > On Thu, 6 Feb 2020 at 16:59, Steve McIntyre <steve.mcintyre(a)linaro.org>
> >> > wrote:
> >> > >
> >> > > Hi all!
> >> > >
> >> > > [ tl;dr - we're trying to organise a sprint - see the end ]
> >> > >
> >> > > Due to the Linux on Arm meetup this week, quite a few of our DT
> >> > > community happened to be in Cambridge together last night. We took
> >> the
> >> > > opportunity to get together for food and beer, and we naturally ended
> >> > > up mulling over a whole range of things, not least the ongoing System
> >> > > DT design work.
> >> > >
> >> > > As is the nature of this kind of gathering, we had a whole range of
> >> > > opinions and ideas, much more than I can faithfully recall to share
> >> > > here in great detail. Grant has been keen to try and keep system-dt
> >> > > compatible with Linux DT usage if that's possible, and Stefano has
> >> > > been working with that in mind. Rob is concerned about how things
> >> > > might scale with lots of domains, for example. Olof is keen to talk
> >> > > about a higher-level design to make it easier to express complex
> >> > > layouts. And so on.
> >> > >
> >> > > There is one thing that *did* stand out: we agreed that it's probably
> >> > > time to try and get all of the interested people in a room together
> >> > > for a few days with a whiteboard, to really thrash out what's needed
> >> > > and how to achieve it. With the right people, we can go through all
> >> > > our needs, pick up the ideas we have and hopefully get some prototypes
> >> > > ready to evaluate.
> >> > >
> >> > > So, who's interested? I believe all the people in CC here are likely
> >> > > keen to be involved. Anybody else?
> >> > >
> >> > > Although we have Linaro Connect in Budapest at the end of March,
> >> > > that's really *not* a good place to try and meet up, for a couple of
> >> > > reasons: not everybody will be there, and we'll have too many
> >> > > distractions to be able to focus on this. Let's *not* do that.
> >> > >
> >> > > AFAICS we're geographically roughly split between US/Canada and
> >> > > Europe, so I'm thinking a week in either the US or Europe, some time
> >> > > in the next 2-3 months:
> >> > >
> >> > > * Would a week before or after Connect work, in Europe (16th-20th
> >> > > March, or 30th March to 3rd April)? I can look for options in the
> >> > > UK easily, with maybe either Arm or Linaro hosting.
> >> > >
> >> > > * Alternatively, a week's meeting in the US, deliberately avoiding
> >> > > the Connect week so we don't wipe people out with travel: Maybe 2-6
> >> > > or 9-13 March? Or push back into April if that's too short notice,
> >> > > of course. Would Xilinx be able to host something for us, maybe? Or
> >> > > another possibility might be the Linaro office in Nashua?
> >> > >
> >> > > I'm open to other suggestions for time and venue. Let's try to make
> >> > > something work here?
Hi all,
Please join the call on Zoom: https://zoom.us/my/openampproject
(If you need the meeting ID, it's 9031895760)
The notes from the previous call (Jan 22) can be found on the OpenAMP wiki at this link:
https://github.com/OpenAMP/open-amp/wiki/System-DT-Meeting-Notes-2020#2020J…
Action items from the previous call:
* Stefano: Document a little bit more how the model works
** Remove reserved memory
** Add top-level use-case FAQ (e.g. how to do peripheral assignment to SW)
** Consider putting a qualifier word before "domain" to make it more specific
* Everyone: Try to poke holes in the model. Good to have hard questions to think through & answer
* Rob: Prototype proposal of changing root
* Nathalie: co-ordinate next call over email (2 weeks from now doesn't work b/c Rob can't make it)
For info about the list, link to the archives, to unsubscribe yourself, or
for someone to subscribe themselves, visit:
https://lists.openampproject.org/mailman/listinfo/system-dt
For information about the System Device Trees effort, including a link to
the intro presentation from Linaro Connect SAN19:
https://github.com/OpenAMP/open-amp/wiki/System-Device-Trees
Best regards,
Nathalie C. Chan King Choy
Project Manager focused on Open Source and Community
Hi all,
The notes from the Jan 22, 2020 call are posted on the OpenAMP wiki:
https://github.com/OpenAMP/open-amp/wiki/System-DT-Meeting-Notes-2020#2020J…
Action items:
* Stefano: Document a little bit more how the model works
o Remove reserved memory
o Add top-level use-case FAQ (e.g. how to do peripheral assignment to SW)
o Consider putting a qualifier word before "domain" to make it more specific
* Everyone: Try to poke holes in the model. Good to have hard questions to think through & answer
* Rob: Prototype proposal of changing root
* Nathalie: co-ordinate next call over email (2 weeks from now doesn't work b/c Rob can't make it)
Have a great weekend,
Nathalie
On Wed, Jan 22, 2020 at 8:35 AM Driscoll, Dan <dan_driscoll(a)mentor.com> wrote:
>
> Not sure if this will help and I know this is quite verbose, but, from what I see, we are converging on things that make sense to us at Mentor given our focus in this area.
>
> We have been using a device tree based approach to system partitioning for the last 3-4 years and here are some points we have learned:
>
> * To separate what we call the "system definition" (ie resource partitioning) from the hardware description, we have what we called a "System Definition Tree" or SDT file (found it kind of funny that SDT was also chosen for System Device Tree)
> * The SDT is a separate file that uses device tree syntax, but does NOT describe hardware, but rather sub-systems / partitioning using the hardware definition found in the DTS
> * The SDT file #includes the hardware description (ie top-level DTS file) and references nodes from this DT, so this keeps the 2 clearly separated (system definition versus hardware definition)
Do you have any public examples of this? Might be helpful.
Regarding the separation, how do you really separate the config and
h/w desc? The h/w desc already has some amount of configuration in it,
and the tooling has to be aware of what h/w can be configured. Take,
for example, assigning cpus to domains (openamp domain or execution
context in this use). You can make this link in either direction:
domain to cpu:

domain0: domain-cfg {
    assigned-cpus = <&cpu0>;
};

cpu to domain:

&cpu0 {
    assigned-domain = <&domain0>;
};
There's no difference in complexity to generate either one and both
ways are separate from the h/w description at the source level. The
primary difference is the separation in the final built DT. Does that
matter? If so, then you'd pick the first method. However, we already
have things in the h/w description that you may want to configure. For
example, flow control support for a UART already has a defined way to
be configured (a property in the uart node). So both ways are
probably going to have to be supported.
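For illustration, such a configure-in-place property might look like the
following sketch, where uart0 is a hypothetical label and uart-has-rtscts
is the existing serial-binding property for hardware flow control:

&uart0 {
    /* configuration expressed directly in the h/w description node */
    uart-has-rtscts;
};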
Rob
Sorry - left off the mailing list.
-----Original Message-----
From: Driscoll, Dan
Sent: Wednesday, January 22, 2020 8:32 AM
To: 'Tomas Evensen' <tomase(a)xilinx.com>; Bjorn Andersson <bjorn.andersson(a)linaro.org>; Rob Herring <robh(a)kernel.org>
Cc: Stefano Stabellini <stefanos(a)xilinx.com>; Rob Herring <Rob.Herring(a)arm.com>; Raghuraman, Arvind <Arvind_Raghuraman(a)mentor.com>; Anjum, Etsam <Etsam_Anjum(a)mentor.com>; Humayun, Waqar <Waqar_Humayun(a)mentor.com>
Subject: RE: [System-dt] software domains and top level nodes
Not sure if this will help and I know this is quite verbose, but, from what I see, we are converging on things that make sense to us at Mentor given our focus in this area.
We have been using a device tree based approach to system partitioning for the last 3-4 years and here are some points we have learned:
* To separate what we call the "system definition" (ie resource partitioning) from the hardware description, we have what we called a "System Definition Tree" or SDT file (found it kind of funny that SDT was also chosen for System Device Tree)
* The SDT is a separate file that uses device tree syntax, but does NOT describe hardware, but rather sub-systems / partitioning using the hardware definition found in the DTS
* The SDT file #includes the hardware description (ie top-level DTS file) and references nodes from this DT, so this keeps the 2 clearly separated (system definition versus hardware definition)
* Related to the previous point, we don't use the chosen node to encompass the system definition and have our own bindings for "machine" nodes in the SDT (equivalent to "domain" nodes in current discussion)
* I think putting all of this info in the chosen node doesn't really solve any real problems and just makes things confusing - having new bindings for system definition / domains / etc, that live outside the chosen node are, to me, much cleaner as the chosen node already has uses for different software that use DT
* Each machine (domain) node has similar attributes as discussed in the thread below - memory, cpus, devices, chosen, etc - in addition to some other attributes we use to help determine how our tooling processes these machine nodes.
* For instance, machines have a "role" attribute that indicates if a machine is a "virtual" machine (hypervisor uses), a "remote" machine (OpenAMP remote), a "master" machine (OpenAMP master), etc.
* Obviously there are lots of permutations and combinations that can occur and need to be accommodated such as running a hypervisor on a Cortex-A SMP cluster and ALSO running an OpenAMP master / remote configuration between the Cortex-A cluster (could be a guest OS or could be the hypervisor) and a Cortex-R cluster
* We also have other attributes we use to help package all of the deployable content (ie remote images, guest OS images, etc), but this is outside the scope of these discussions
So, there are 2 problems that need to be solved (as, I think, this group has been considering):
1. Adding necessary hardware description to current device trees so they FULLY describe the hardware (ie heterogeneous SoCs with different subsystems / clusters, device access, interrupt routing, sharing of memory / devices, etc)
2. How to define the partitioning of #1 so a tool can create multiple usable device trees for each software context in a system
I am not enough of an expert in #1 to help extensively here, but for #2, we have been doing this for 3-4 years now and have released commercial products that use device trees for this same purpose so hopefully we can help guide things here.
Our biggest problems right now are that #1 doesn't exist (ie we have been extending existing device trees for SoCs to fully describe them and we are doing this in a way that isn't "clean") and there are a few other areas in our machine definition / bindings that are flimsy as well.
I guess I would like to see us get #1 fully defined before talking too much about #2 as I think the hardware description should stand on its own (ie doesn't depend on any new bindings defined for #2).
Dan
-----Original Message-----
From: System-dt [mailto:system-dt-bounces@lists.openampproject.org] On Behalf Of Tomas Evensen via System-dt
Sent: Tuesday, January 21, 2020 7:43 PM
To: Bjorn Andersson <bjorn.andersson(a)linaro.org>; Rob Herring <robh(a)kernel.org>
Cc: Stefano Stabellini <stefanos(a)xilinx.com>; system-dt(a)lists.openampproject.org; Rob Herring <Rob.Herring(a)arm.com>
Subject: Re: [System-dt] software domains and top level nodes
One of the things we have tried to achieve with System Device Trees is to make sure we separate the HW description from the domain configuration that typically is done by a different person.
That is, you don't want to have to edit or rewrite the parts that describe the HW in order to describe what memory, devices and cpus go where.
Take an example where 2 cpus can either be configured to
a) work together and see the same memory/devices (SMP for example), or
b) be separated into two different domains running different OSes with different memory/devices.
So you have either one or two domains for those two cpus.
In this case I don't know that you want the "configurer" to have to go in and rewrite the file to use a different number of domains depending on the situation.
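For illustration, the same hardware description could be paired with either of these two domain configurations; the cluster label, cpu masks and mode values below are only placeholders in the style of the proposed cpus binding:

/* (a) one SMP domain spanning both cpus */
domains {
    smp_domain {
        compatible = "openamp,domain-v1";
        cpus = <&cpus_a72 0x3 0x0>; /* hypothetical: mask 0x3 = both cpus */
    };
};

/* (b) two separate domains, one cpu each */
domains {
    linux_domain {
        compatible = "openamp,domain-v1";
        cpus = <&cpus_a72 0x1 0x0>; /* hypothetical: cpu 0 */
    };
    rtos_domain {
        compatible = "openamp,domain-v1";
        cpus = <&cpus_a72 0x2 0x0>; /* hypothetical: cpu 1 */
    };
};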
FWIW,
Tomas
On 1/21/20, 3:57 PM, "System-dt on behalf of Bjorn Andersson via System-dt" <system-dt-bounces(a)lists.openampproject.org on behalf of system-dt(a)lists.openampproject.org> wrote:
EXTERNAL EMAIL
On Tue 21 Jan 13:18 PST 2020, Rob Herring via System-dt wrote:
[..]
> To flip all this around, what if domains become the top-level structure:
>
> domain@0 {
>     chosen {};
>     cpus {};
>     memory@0 {};
>     reserved-memory {};
> };
>
> domain@1 {
>     chosen {};
>     cpus {};
>     memory@800000 {};
>     reserved-memory {};
> };
>
I like this suggestion, as this both creates a natural grouping and
could allow for describing domain-specific hardware as subtrees in each
domain.
Regards,
Bjorn
> The content of all the currently top-level nodes don't need to change.
> The OS's would be modified to treat a domain node as the root node
> which shouldn't be very invasive. Then everything else just works as
> is.
>
> This could still have other nodes at the (real) root or links from one
> domain to another. I haven't thought thru that part, but I think this
> structure can only help because it removes the notion that the root
> has a specific cpu view.
>
> Rob
> --
> System-dt mailing list
> System-dt(a)lists.openampproject.org
> https://lists.openampproject.org/mailman/listinfo/system-dt
--
System-dt mailing list
System-dt(a)lists.openampproject.org
https://lists.openampproject.org/mailman/listinfo/system-dt
On Fri, Jan 17, 2020 at 5:30 PM Stefano Stabellini via System-dt
<system-dt(a)lists.openampproject.org> wrote:
>
> Hi all,
>
> I would like to follow-up on system device tree and specifically on one
> of the action items from the last call.
>
> Rob raised the interesting question of what is the interaction between
> the new system device tree concepts and the top level nodes (memory,
> reserved-memory, cpus, chosen).
>
> I am going to write here my observations.
Some questions inline, but they're really rhetorical questions for my
response at the end.
>
> As a short summary, the system device tree concepts are:
>
> - Multiple top level "cpus,cluster" nodes to describe heterogenous CPU
> clusters.
> - A new "indirect-bus" which is a type of bus that does not
> automatically map to the parent address space.
> - An address-map property to express the different address mappings of
> the different cpus clusters and can be used to map indirect-bus nodes.
>
> These new nodes and properties allow us to describe multiple
> heterogenous cpus clusters with potentially different address mappings,
> which can be expressed using indirect-bus and address-map.
>
> We also have new concepts for software domains configurations:
>
> - Multiple "openamp,domain" nodes (currently proposed under /chosen) to
> specify software configurations and MPU configurations.
> - A new "access" property under each "openamp,domain" node with links to
> nodes accessible from the cpus cluster.
>
> Openamp,domain nodes allow us to define the cpus cluster and set of
> hardware resources that together form a software domain. The access
> property defines the list of resources available to one particular
> cluster and maps well into MPU configurations (sometimes called
> "firewall configurations" during the calls.)
>
> See the attached full example.
>
>
> I am going to go through the major top level nodes and expand on how
> the new concepts affect them.
>
>
> /cpus
> =====
>
> /cpus is the top level node that contains the description of the cpus in
> the system. With system device tree, it is not the only cpus cluster,
> additional cpus clusters can be described by other top level nodes
> compatible with "cpus,cluster". However, /cpus remains the default
> cluster. An OS reading device tree should assume that it is running on
> /cpus. From a compatibility perspective, if an OS doesn't understand or
> recognize the other "cpus,cluster" nodes, it can ignore them, and just
> process /cpus.
>
> Buses compatible with "indirect-bus" do not map automatically to the
> parent address space, which means that /cpus won't be able to access
> them, unless an address-map property is specified under /cpus to express
> the mapping. This is the only new limitation introduced for /cpus.
> Again, from a compatibility perspective an OS that doesn't understand
> the address-map property would just ignore both it and the bus, so
> again, it is an opt-in new functionality.
>
>
> So far in my examples "openamp,domain" nodes refer to "cpus,cluster"
> nodes only, not to /cpus. There is a question on whether we want to
> allow "openamp,domain" nodes to define a software domain running on
> /cpus. We could go either way, but for simplicity I think we can avoid
> it.
>
> "openamp,domain" nodes express accessibility restrictions while /cpus is
> meant to be able to access everything by default. If we want to specify
> hard accessibility settings for all clusters, it is possible to write a
> pure system device tree without /cpus, where all cpus clusters are
> described by "cpus,cluster" nodes and there is no expectation that an OS
> will be able to use it without going through some transformations by
> lopper (or other tools.)
>
>
> /chosen
> =======
>
> The /chosen node is used for software configurations, such as bootargs
> (Linux command line). When multiple "openamp,domains" nodes are present
> the configurations directly under /chosen continue to refer to the
> software running on /cpus, while domain specific configurations need to
> go under each domain node.
>
> As an example:
>
> - /chosen/bootargs refers to the software running on /cpus
> - /chosen/openamp_r5/bootargs refers to the openamp_r5 domain
>
>
> /memory
> =======
>
> The /memory node describes the main memory in the system. Like for any
> device node, all cpus clusters can address it.
Not really true. You could have memory regions not accessible by some cpus.
> indirect-bus and
> address-map can be used to express addressing differences.
>
> It might be required to carve out special memory reservations for each
> domain. These configurations are expressed under /reserved-memory as we
> do today for any other reserved regions.
What about a symmetric case where say you have 4 domains and want to
divide main memory into 4 regions?
> /reserved-memory
> ================
>
> /reserved-memory is used to describe particular reserved memory regions
> for special use by software. With system device tree /reserved-memory
> becomes useful to describe domain specific memory reservations too.
> Memory ranges for special use by "openamp,domain" nodes are expressed
> under /reserved-memory following the usual set of rules. Each
> "openamp,domain" node links to any relevant reserved-memory regions using
> the access property. The rest is to be used by /cpus.
>
> For instance:
>
> - /reserved-memory/memory_r5 is linked and used by /chosen/openamp_r5
> - other regions under /reserved-memory, not linked by any
> "openamp,domain" nodes, go to the default /cpus
So the code that parses /reserved-memory has to look up something
elsewhere to determine if each child node applies? That's fairly
invasive to the existing handling of /reserved-memory.
Also, a reserved region could have different addresses for different
CPUs. Basically, /reserved-memory doesn't have an address, but
inherits the root addressing. That makes it a bit of an oddball. We
need to handle both shared and non-shared reserved regions.
Shared-memory for IPC is commonly described here for example.
> We should use a specific compatible string to identify reserved memory
> regions meant for openamp,domain nodes, so that a legacy OS will safely
> ignore them. I added
>
> compatible = "openamp,domain-memory-v1";
That doesn't really scale. If we don't care about legacy OS support,
then every node will have this?
I don't really like the asymmetric structure of all this. While having
a default view for existing OS seems worthwhile, as soon as there's a
more symmetric use case it becomes much more invasive and OS parsing
for all the above has to be adapted. We need to design for 100
domains.
To flip all this around, what if domains become the top-level structure:
domain@0 {
    chosen {};
    cpus {};
    memory@0 {};
    reserved-memory {};
};

domain@1 {
    chosen {};
    cpus {};
    memory@800000 {};
    reserved-memory {};
};
The content of all the currently top-level nodes doesn't need to change.
The OSes would be modified to treat a domain node as the root node,
which shouldn't be very invasive. Then everything else just works as
is.
This could still have other nodes at the (real) root or links from one
domain to another. I haven't thought thru that part, but I think this
structure can only help because it removes the notion that the root
has a specific cpu view.
Rob