Discussion:
[DRMAA-WG] Conference call - June 1st - 19:00 UTC
Peter Tröger
2011-05-30 22:00:57 UTC
Dear all,

the next DRMAA conf call is scheduled for June 1st, 19:00 UTC. We meet
on Skype; please find me under my user name "potsdam_pit".

Preliminary meeting agenda:

1. Meeting secretary for this meeting?
2. DRMAAv2 Draft 5 (see attachment)

Best regards,
Peter.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: drmaav2_draft5_annotated.pdf
Type: application/pdf
Size: 671789 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/drmaa-wg/attachments/20110531/333a2c46/attachment-0001.pdf
Mariusz Mamoński
2011-06-01 21:16:53 UTC
Hi,
here is a new spreadsheet tab that tries to summarize how different
resource limits are handled in GE/LSF/Torque:

https://spreadsheets.google.com/spreadsheet/ccc?key=0AqyvnBscJNqxcnJBSUs5dXRrU29EUVhGOGthc1lDTFE&hl=en_US#gid=13

and a proposal for restructuring section 5.6.25 (text in
brackets [] is my comment):



5.6.26 resourceLimits [not hardResourceLimits]

This attribute specifies the limits on resource utilization of the
job(s) on the execution host(s). The valid dictionary keys and their
value semantics are defined in Section 4.3.

The CORE_FILE_SIZE, DATA_SEG_SIZE, FILE_SIZE, OPEN_FILES, STACK_SIZE,
VIRTUAL_MEMORY limits SHOULD be implemented as the soft resource
limits. An implementation MAY map them to a setrlimit call in the
operating system. [I think the actual use case for those resources is
to increase the system default limit rather than to actually limit the
application]
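
As a minimal sketch of what mapping these keys to setrlimit could look
like, the Python snippet below changes only the soft limit and leaves
the hard limit untouched. The key-to-RLIMIT mapping and the name
apply_soft_limits are assumptions made for illustration and are not part
of the draft; values are assumed to already be in the units setrlimit
expects, and a Unix-like system is assumed.

```python
import resource

# Hypothetical mapping from the draft's dictionary keys to POSIX rlimit
# constants; the draft itself does not prescribe this mapping.
DRMAA_TO_RLIMIT = {
    "CORE_FILE_SIZE": resource.RLIMIT_CORE,
    "DATA_SEG_SIZE": resource.RLIMIT_DATA,
    "FILE_SIZE": resource.RLIMIT_FSIZE,
    "OPEN_FILES": resource.RLIMIT_NOFILE,
    "STACK_SIZE": resource.RLIMIT_STACK,
    "VIRTUAL_MEMORY": resource.RLIMIT_AS,
}


def apply_soft_limits(limits):
    """Set soft limits only; return the resulting (soft, hard) pair per key."""
    applied = {}
    for key, wanted in limits.items():
        kind = DRMAA_TO_RLIMIT[key]
        _, hard = resource.getrlimit(kind)
        # An unprivileged process may move its soft limit anywhere up to
        # the hard limit, so cap the request instead of failing.
        if hard != resource.RLIM_INFINITY and (
            wanted == resource.RLIM_INFINITY or wanted > hard
        ):
            wanted = hard
        resource.setrlimit(kind, (wanted, hard))
        applied[key] = resource.getrlimit(kind)
    return applied
```

This would be called in the job's process before the workload starts,
for example apply_soft_limits({"OPEN_FILES": 4096}), which matches the
"increase the default" use case mentioned in the comment above.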

The WALLCLOCK_TIME and CPU_TIME should be implemented as hard resource
limits, i.e. exceeding the resource limit SHOULD eventually lead to
termination of a job either by the DRM system or the application
itself. The DRM system MAY first notify the application upon reaching
the limit (e.g. by sending a signal that can be handled) before trying
to ultimately terminate it (e.g. by sending a SIGKILL signal).
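
The notify-then-terminate sequence can be illustrated from the
application's side. This is a minimal sketch, assuming a Unix system and
assuming the DRM system uses SIGUSR1 as its catchable warning; the
actual signal is DRM-specific and is not fixed by the draft. The names
LimitNotice and run_until_notified are made up for this example.

```python
import signal


class LimitNotice:
    """Records that a catchable 'limit reached' signal has arrived."""

    def __init__(self, signum=signal.SIGUSR1):
        # SIGUSR1 is only an assumed choice; SIGKILL could not be used
        # here because it can never be caught or handled.
        self.reached = False
        self.signum = signum
        signal.signal(signum, self._on_signal)

    def _on_signal(self, signum, frame):
        # Only set a flag; the main loop decides how to wind down.
        self.reached = True


def run_until_notified(notice, step, max_steps):
    """Call step() until notified or max_steps is hit; return steps done."""
    done = 0
    while done < max_steps and not notice.reached:
        step()
        done += 1
    return done
```

An application structured like this can checkpoint or clean up once
notice.reached becomes true, before the DRM system sends SIGKILL.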

All the resource limits SHOULD be enforced on a per-process [not per-job] basis.
--
drmaa-wg mailing list
drmaa-wg at ogf.org
http://www.ogf.org/mailman/listinfo/drmaa-wg
--
Mariusz
Peter Tröger
2011-06-08 18:33:14 UTC
I like the proposal, makes sense to me.

Best regards,
Peter.
Mariusz Mamoński
2011-06-01 21:28:36 UTC
Hi,
here is a new spreadsheet tab that tries to summarize how different
resource limits are handled in GE/LSF/Torque:

https://spreadsheets.google.com/spreadsheet/ccc?key=0AqyvnBscJNqxcnJBSUs5dXRrU29EUVhGOGthc1lDTFE&hl=en_US#gid=13

and a proposal for restructuring section 5.6.25 (text in
brackets [] is my comment):



5.6.26 resourceLimits [not hardResourceLimits]

This attribute specifies the limits on resource utilization of the
job(s) on the execution host(s). The valid dictionary keys and their
value semantics are defined in Section 4.3.

The CORE_FILE_SIZE, DATA_SEG_SIZE, FILE_SIZE, OPEN_FILES, STACK_SIZE,
VIRTUAL_MEMORY limits SHOULD be implemented as the soft resource
limits. An implementation MAY map them to a setrlimit call in the
operating system. [I think the actual use case for those resources is
to increase the system default limit rather than to actually limit the
application]

The WALLCLOCK_TIME and CPU_TIME should be implemented as hard resource
limits, i.e. exceeding the resource limit SHOULD eventually lead to
termination of a job either by the DRM system or the application
itself. The DRM system MAY first notify the application upon reaching
the limit (e.g. by sending a signal that can be handled) before trying
to terminate it.
Cheers,
--
Mariusz
Peter Tröger
2011-06-08 18:26:05 UTC
Participants: Peter, Daniel G., Mariusz, Roger

* JobInfo:slots
  * Line 545: Not necessary from Daniel's perspective
    * Exclusive complex: Possible to book a whole machine with only
      one slot (e.g. memory amount) - difficult to detect
    * Better to make the DRMAA semantics clear: JobInfo:slots should
      lie within the range given in the job template
  * Line 547:
    * Currently similar to the MPI host file approach
    * Might have large memory footprint (more slots)
    * Alternative with additional struct
    * Only necessary for reporting
    * Use case: Cluster monitoring, generate MPI machine file
      based on this information
    * Decision: Introduce new structure with machine name and
      slot count
    * Decision: Remove optional sentence
* Complete IDL list in draft 5 lacks DrmaaCapability structure
* Line 899: remove optional
* Line 933: Should be clarified that it is intended to fill
  out templates and structs
  * Set and get give the impression that they are intended for the
    DrmaaReflective interface attributes themselves
* Line 735:
  * Make it mandatory -> AR might not be implemented, but an AR
    created outside should be supported then
  * InvalidValue as generic value feasible
* Research on resource limits
  * Line 244: DATA_SEG_SIZE has no use case, but we only take out
    things if they are not implementable, so leave it in
  * Decision: Job failure cannot be promised on resource
    violation (see Google spreadsheet), application might catch signal
  * Decision: Add sentence that application will be notified by
    some OS-dependent means
  * Line 762 - might be wrong, rethink it
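
The decision on line 547 (a structure with machine name and slot count,
usable for generating an MPI machine file) could be sketched as follows.
This is only an illustration: the names SlotInfo, compact and
to_machine_file are hypothetical and not taken from the draft, and the
"host slots=N" output follows the Open MPI hostfile convention; other
MPI implementations use a different syntax.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SlotInfo:
    """Hypothetical structure: one entry per machine, not per slot."""
    machine_name: str
    slots: int


def compact(hosts: Iterable[str]) -> List[SlotInfo]:
    """Turn a host-file style list (one name per slot) into SlotInfo entries."""
    counts = Counter(hosts)  # keeps first-seen order of machine names
    return [SlotInfo(name, n) for name, n in counts.items()]


def to_machine_file(allocation: Iterable[SlotInfo]) -> str:
    """Render the allocation as Open MPI style 'host slots=N' lines."""
    return "".join(f"{s.machine_name} slots={s.slots}\n" for s in allocation)
```

The compact form needs one entry per machine regardless of how many
slots are used there, which addresses the memory footprint concern
noted for the host-file style list.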