[DRMAA-WG] MonitoringSession

Discussion:

Daniel Gruber

2010-11-08 09:55:26 UTC

Hi,

in the MonitoringSession we have machineSockets and coresPerSocket at machine level.
To be consistent we should also add threadsPerCore. At least OGE/SGE
supports this. I added it to our spreadsheet.
If a DRM/OS does not support this, it could return 0 as the value for unknown.

0 is not allowed for coresPerSocket and machineSockets, since we should
define coresPerSocket*machineSockets=="processors" in case a DRM or OS
does not provide this kind of architectural information. I suggest leaving
it up to the DRMAA implementation whether it maps the "processors" value
to coresPerSocket or to machineSockets when architectural details are
missing.
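
A minimal sketch of this fallback, in Python for illustration only. The helper name fill_topology and the decision to map "processors" to coresPerSocket on a single socket are assumptions; the proposal explicitly leaves that choice to the implementation.

```python
from typing import Optional, Tuple


def fill_topology(processors: int,
                  machine_sockets: Optional[int] = None,
                  cores_per_socket: Optional[int] = None) -> Tuple[int, int]:
    """Return (machineSockets, coresPerSocket); neither value is ever 0.

    Hypothetical helper: when the DRM/OS reports no architectural details,
    the whole "processors" count is mapped to coresPerSocket on one socket,
    so that coresPerSocket * machineSockets == processors still holds.
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    if machine_sockets and cores_per_socket:
        # Topology is known: report it unchanged.
        return machine_sockets, cores_per_socket
    # Topology missing or incomplete: assumed mapping to a single socket.
    return 1, processors
```

Mapping the count to machineSockets instead (returning `processors, 1`) would satisfy the same invariant.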

If there is no objection I'll take this as accepted.

Cheers

Daniel

Andre Merzky

2010-11-08 13:44:24 UTC

+1

--
drmaa-wg mailing list
drmaa-wg at ogf.org
http://www.ogf.org/mailman/listinfo/drmaa-wg

--
Nothing is ever easy...

Peter Tröger

2010-11-08 14:14:03 UTC

Hi,

I can agree to the new "threadsPerCore" attribute, but would prefer to have "1" as the default value. In our understanding of a core, each one can always execute at least one thread. It would also allow computing an estimate of the number of parallel threads without having to inspect the individual values.
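
The estimate mentioned here can be sketched as a plain product, shown in Python for illustration only. The function name is hypothetical; the attribute names follow the thread.

```python
def estimate_parallel_threads(machine_sockets: int,
                              cores_per_socket: int,
                              threads_per_core: int = 1) -> int:
    """Estimate how many threads a machine can run in parallel.

    threads_per_core defaults to 1 (the value suggested for "unknown"),
    so the product stays meaningful without a special case for 0.
    """
    return machine_sockets * cores_per_socket * threads_per_core


# With 0 as the "unknown" marker the same product would collapse to 0.
print(estimate_parallel_threads(2, 4))     # SMT value unknown, default 1
print(estimate_parallel_threads(2, 4, 2))  # two hardware threads per core
```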

Best,
Peter.

Daniel Gruber

2010-11-08 14:27:26 UTC

Ok. For simplicity we take 1 as the default value, with the
drawback that we lose the information whether the SMT value
is available (and correct) or not.

Regards,

Daniel

Mariusz Mamoński

2010-11-08 15:01:03 UTC

Hi all,

If we are talking about the monitoring session... what do you think
about the idea of:

1. creating a new data struct MachineInfo with all the predefined
machine attributes (e.g.: threadsPerCore) + "readonly attribute
Dictionary drmsSpecific;" (an extension point) and providing one
method, "MachineInfo getMachineInfo(in string machineName)", for
accessing all of them
2. adding a new attribute "slotsCount", which denotes the maximum
number of single-process jobs that can run on a given machine
concurrently (use case: the system administrator may either choose a
configuration where one process runs per physical core or hardware
thread, or choose any number that is totally independent of the
hardware configuration)
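
A rough sketch of what point 1 could look like, in Python for illustration only. The snake_case names, the default values, and the InMemoryMonitoringSession class are assumptions made for this example; they are not part of any DRMAA binding.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, Mapping


@dataclass(frozen=True)
class MachineInfo:
    """Hypothetical value type bundling the predefined machine attributes."""
    name: str
    machine_sockets: int = 1
    cores_per_socket: int = 1
    threads_per_core: int = 1
    # Extension point for DRMS-specific values, exposed read-only.
    drms_specific: Mapping[str, str] = field(
        default_factory=lambda: MappingProxyType({}))


class InMemoryMonitoringSession:
    """Toy stand-in for a monitoring session, backed by a dictionary."""

    def __init__(self, machines: Dict[str, MachineInfo]) -> None:
        self._machines = dict(machines)

    def get_machine_info(self, machine_name: str) -> MachineInfo:
        # One call returns every attribute of the named machine.
        return self._machines[machine_name]
```

A caller would fetch the struct once and then read as many attributes as needed from it, including any DRMS-specific entries.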

Cheers,

--
Mariusz

Daniel Templeton

2010-11-08 15:05:25 UTC

The slotsCount attribute would be challenging for something like OGE,
where the number of slots available on a machine is taken from the sum
of the slots in all the enabled queue instances on that machine modulo
the active resource quota sets modulo any host-level slots settings
modulo the queue subordination rules. It's also not very meaningful,
because many of those things can and do change dynamically.

Daniel

Daniel Gruber

2010-11-08 15:12:45 UTC

Adding slotsCount to MachineInfo would destroy the separation
between queue level and host level information. Different
users could have a different slotsCount on the same machine.
The information must be retrieved on queue level (we have
the queueMaxSlotsAllowed() method for that).

Cheers

Daniel

Mariusz Mamoński

2010-11-08 15:58:03 UTC

Can we split the discussion into two paths? ;-)

Does anyone vote against point 1 (MachineInfo)?

Regarding point 2:

We could make maxSlotsCount optional (like the threadsPerCore) and
define it only as the upper bound (and only for the purpose of
computing host/cluster "capacity"). Alternatively: what do you think
about the idea of:
- changing threadsPerCore -> maxThreads
- relaxing the meaning of threadsPerCore/maxThreads to "maximum
number of threads that can run *simultaneously* on a given machine"
without saying whether this is imposed by hardware configuration or system
policy.

Cheers,

Post by Daniel Gruber
Adding slotsCount to machineInfo would destroy the separation between queue
level and host level information. Different users could have different
slotCount on the same machine. The information must be retrieved on queue
level (we have the queueMaxSlotsAllowed()) method for that.

but in this case the slots can span multiple hosts

--
Mariusz

Daniel Gruber

2010-11-08 17:34:07 UTC

Post by Mariusz Mamoński
Can we split the discussion into two paths? ;-)
Does anyone vote against point 1 (MachineInfo)?

If there is a method "getMachineInfo(in string machineName)",
we would also need "getQueueInfo(in string queueName)" for
consistency. In total we would then have "getQueueNames",
"getMachineNames", "getMachineInfo", "getQueueInfo",
and "getAllJobs", which would result in a pretty clear
interface. But the current approach has its strength in
simple access to the values. I would keep the monitoring interface
simple and portable, since monitoring is IMHO
not our core competence. Hence I would keep the current
approach and vote against the proposal.

Post by Mariusz Mamoński
We could make maxSlotsCount optional (like the threadsPerCore) and
define it only as the upper bound (and only for the purpose of
computing host/cluster "capacity"). Alternatively: what do you think
about the idea of:
- changing threadsPerCore -> maxThreads
- relaxing meaning of the threadsPerCore/maxThreads: as "maximal
number of threads that can run *simultaneously* on given machine"
without saying if this is imposed by hardware configuration or system
policy.

The other machine values (load, sockets, cores, threads, physical mem,
virtual mem, machine OS, OS version, machine arch) are not user-dependent,
hence such a value would break consistency. Queue values should be user-dependent.

Daniel

Mariusz Mamoński

2010-11-08 17:59:32 UTC

Post by Daniel Gruber

When there is a method "getMachineInfo(in string machineName)"
we would also require "getQueueInfo(in string queueName)" for consistency.

agree.

Post by Daniel Gruber
Then in total we would have "getQueueNames", "getMachineNames",
"getMachineInfo", "getQueueInfo", and "getAllJobs", which would result in a
pretty clear interface. But the current approach has its strengths in
simplicity when accessing the values. I would keep the interface simple and
portable in case of monitoring since this is IMHO not our core competence.

What about efficiency? In some systems the cost of fetching one host
attribute is equal to the cost of fetching all of them, and IMHO the user
is usually interested in at least a few of them.
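
The cost argument can be illustrated with a toy backend that counts queries, in Python for illustration only. CountingBackend, its query_host method, and the attribute values are all made up for this sketch; it assumes the worst case named above, where every query returns all host attributes regardless of how many are needed.

```python
from typing import Dict


class CountingBackend:
    """Toy DRM backend: counts queries, each of which returns all attributes."""

    def __init__(self) -> None:
        self.queries = 0

    def query_host(self, machine_name: str) -> Dict[str, int]:
        self.queries += 1
        # Made-up values for illustration.
        return {"machineSockets": 2, "coresPerSocket": 4, "threadsPerCore": 2}


def per_attribute(backend: CountingBackend, machine_name: str) -> int:
    """One backend query per attribute, as with separate getter methods."""
    sockets = backend.query_host(machine_name)["machineSockets"]
    cores = backend.query_host(machine_name)["coresPerSocket"]
    threads = backend.query_host(machine_name)["threadsPerCore"]
    return sockets * cores * threads


def bulk(backend: CountingBackend, machine_name: str) -> int:
    """A single backend query, as with one getMachineInfo-style call."""
    info = backend.query_host(machine_name)
    return (info["machineSockets"] * info["coresPerSocket"]
            * info["threadsPerCore"])
```

Both functions compute the same value; they differ only in the number of backend queries. An implementation with per-attribute getters could narrow the gap by caching, at the price of possibly stale values.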

Post by Daniel Gruber
Hence I would keep the current approach and vote against it.

--
Mariusz

Peter Tröger

2010-11-09 10:04:27 UTC

Hi,

Post by Daniel Gruber
When there is a method "getMachineInfo(in string machineName)"
we would also require "getQueueInfo(in string queueName)" for consistency.

Post by Mariusz Mamoński
agree.

what about efficiency? in some systems the cost of fetching one of the
host attribute is equal to fetching all of them and IMHO user is
usually interested in at least few of them.

The whole discussion already took place during several phone calls, and Mariusz never managed to get his 'consistency' proposal through - even though he tried hard ;-).
I vote against opening new API structure discussions at this point. There was enough time for such objections in the past.

Same counter-argument from my side. We had long and painful slot-related discussions during the phone calls. The overall agreement was that DRMAA cannot apply any meaning to the slot concept, so we just treat it as opaque monitoring data. Check the meeting minutes.

Best,
Peter.