Discussion:
[DRMAA-WG] Conference call - Apr 6th - 19:00 UTC
Peter Tröger
2011-04-03 22:28:09 UTC
Permalink
Dear all,

the next DRMAA conf call is scheduled for Apr 6th, 19:00 UTC. The phone conference line is sponsored by Oracle:

Phone number (toll-free from US): +001-866-545-5227
Access code: 5988285

The conference bridge MAY no longer work (Dan?). In this case, we will organize something based on Skype.

Preliminary meeting agenda:

1. Meeting secretary for this meeting?
2. Latest updates from the participants
3. Solving the remaining issues in DRMAAv2 Draft 2 (see attachment)

The attached draft update already incorporates the comments from Andre Merzky and Daniel S. Katz. Thanks for their input!

Best regards,
Peter.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: drmaav2_draft2_annotated.pdf
Type: application/pdf
Size: 620018 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/drmaa-wg/attachments/20110404/8b1ec787/attachment-0001.pdf
Peter Tröger
2011-04-06 18:33:07 UTC
Permalink
Dear all,

as expected, the Oracle bridge is no longer available for us. I propose to use Skype this time. Please connect with me under my user name "potsdam_pit", so that I can start a conference call in 30 minutes.

Thanks,
Peter.
--
drmaa-wg mailing list
drmaa-wg at ogf.org
http://www.ogf.org/mailman/listinfo/drmaa-wg
Peter Tröger
2011-04-06 20:30:14 UTC
Permalink
Participants: Daniel Gruber, Mariusz Mamonski, Andre Merzky, Peter Tröger.

Organizational aspects:

- Oracle bridge is no longer available for us
- Skype conference call worked fine, we continue like this
- Daniel will check for possibilities with Univa
- If US participants are still missing next week, we will move to a more Europe-friendly time slot

DRMAAv2 Draft 2:

- Decision to remove the last sentence in line 101
- Boolean UNSET mapping should also be part of the language binding
  - Example from Andre: a struct might map to a dictionary, which can simply leave out keys in case of UNSET
- Discussion about throwing out IRIX / TRU64: not accepted, since this enumeration was already heavily discussed
- Line 182, add CRAY: rejected, we are not aware of any relevant DRM system available on CRAY; it's also not an operating system
- Line 198: Question about POWER; it turned out that POWER is a subset of the PPC instruction set architecture, so the current solution is fine
- Section 4.2: Discussion about adding GPU support
  - There are no good standards for GPU instruction set architectures, so having abstract GPU type definitions would be hard
  - Current DRM system support is also mostly based on targeting some Linux host with specialized resource demand formulations
  - This is solved much better with job categories
- Line 246: Comparison of wall clock time definitions in several DRM systems
  - Weak agreement on defining it as time in RUNNING state plus time in SUSPENDED state (ok for Condor, Grid Engine)
  - Mariusz is still trying to find an example where SUSPENDED state is not included
  - Final decision next week, especially on whether inclusion of SUSPENDED is marked as "MAY" or "MUST"
- Line 249, question by Daniel Katz: Yes, this is a standard feature, e.g. for advance reservation support. Add note in the rationale section.
- Line 272: Remove the first sentence, since it violates the "opaque concept" statement in the next sentence.
- Line 277: New proposal by Mariusz - replace "maxWallclockTime" with a generic dictionary for queue attributes
  - Would allow reporting DRM-specific properties of a queue, in the same opaque sense as the queue name
  - Only helpful for the portal case; should not be the basis for programmatic decisions
  - No clear decision, deferred to next week
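
As a rough illustration of Andre's UNSET example from the minutes above, a language binding could map a struct to a dictionary and simply omit UNSET members. The following is only a sketch under assumptions: the UNSET sentinel, the helper names struct_to_dict and lookup, and the member names are chosen for illustration and are not taken from the draft.

```python
# Hypothetical sentinel standing in for the IDL-level UNSET value.
UNSET = object()


def struct_to_dict(fields):
    """Map struct members to a dictionary, leaving out UNSET members."""
    return {name: value for name, value in fields.items() if value is not UNSET}


def lookup(mapping, name):
    """Read a member back; a missing key is interpreted as UNSET."""
    return mapping.get(name, UNSET)


# Illustrative member names only.
template = {
    "remoteCommand": "/bin/sleep",
    "rerunnable": UNSET,    # Boolean left unset by the application
    "submitAsHold": False,  # Boolean explicitly set to False
}
binding_view = struct_to_dict(template)
```

With this mapping, a Boolean that is explicitly False stays in the dictionary, while an UNSET Boolean is just absent, so the two cases remain distinguishable without a third Boolean value.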

The next conference call with Skype will take place in one week (Apr 13th, 19:00 UTC)

Best regards,
Peter.
Mariusz Mamoński
2011-04-13 15:59:02 UTC
Permalink
Post by Peter Tröger
- Mariusz still tries to find an example where SUSPENDED state is not included
Found! ;-) Platform LSF. I did the following experiment:

1. Submitted a job with a wall clock time limit of 1 min:

$ bsub -W 00:01 sleep 600 # 10 min sleep
Job <114> is submitted to default queue <medium_priority>.
...
the job gets killed on reaching the wall clock time limit
...

$ bjobs -l 114
...
Wed Apr 13 14:56:55: Completed <exit>; TERM_RUNLIMIT: job killed after reaching LSF run time limit.


2. Submitted a job with the same 1 min wall clock time limit, then suspended and resumed it:

$ date
Wed Apr 13 14:32:52 BST 2011
$ bsub -W 00:01 sleep 600
Job <113> is submitted to default queue <medium_priority>.

$ bstop 113
Job <113> is being stopped

... after some time...

$ date
Wed Apr 13 14:55:16 BST 2011
$ bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
113 mpiuser USUSP medium_pri x7500 ex-9-0 sleep 600 Apr 13 14:33

$ bresume 113
Job <113> is being resumed

The job finished immediately (sleep counts the time during which the process was
suspended):

$ bjobs -l 113
...
Wed Apr 13 14:33:09: Started on <ex-9-0>, Execution Home </home/mpiuser>, Execution CWD </home/mpiuser>;
Wed Apr 13 14:55:35: Done successfully. The CPU time used is 0.0 seconds.

As you can see, the job was in the SUSPENDED + RUNNING states for > 12 min, which exceeds the
wall clock time limit of 1 min.
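
The elapsed time between the Started and Done records of job 113 can be checked with a few lines of Python. This is only a quick arithmetic sketch: the two timestamps are copied from the bjobs -l output above, the year is taken from the date output, and the variable names are made up for this example.

```python
from datetime import datetime

# Timestamps copied from the "bjobs -l 113" output (year from the "date" output).
started = datetime(2011, 4, 13, 14, 33, 9)
finished = datetime(2011, 4, 13, 14, 55, 35)
limit_seconds = 60  # the 1 min limit requested with "bsub -W 00:01"

elapsed_seconds = (finished - started).total_seconds()
print(f"elapsed: {elapsed_seconds:.0f} s, limit: {limit_seconds} s")
print("elapsed exceeds limit:", elapsed_seconds > limit_seconds)
```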
--
Mariusz
Daniel Gruber
2011-04-13 17:58:27 UTC
Permalink
Interesting case, Mariusz. It could be an LSF bug or an implementation
difficulty (maybe they don't check suspended jobs against limits, because
they do not need resources). It would be clearer if you could construct
a case where the job has a runtime of N seconds. After starting,
it should be suspended immediately and then resumed after N seconds.
The question is then whether the resumed job runs for another
N seconds or is deleted immediately.
Using the sleep binary itself could also be problematic, since AFAIK
it sets a timer and suspends itself.
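
A job payload that avoids the sleep timer issue could, for example, busy-loop until it has consumed N seconds of CPU time, so that time spent suspended does not bring it closer to completion. The following is a minimal sketch of that idea, not a tested LSF setup; the function name burn_cpu is hypothetical.

```python
import time


def burn_cpu(seconds):
    """Busy-loop until this process has consumed `seconds` of CPU time.

    time.process_time() counts CPU time of the current process only, so a
    stopped (suspended) process makes no progress towards the target.
    """
    start = time.process_time()
    counter = 0
    while time.process_time() - start < seconds:
        counter += 1  # keep the CPU busy
    return time.process_time() - start


if __name__ == "__main__":
    # Short demo run; a real test job would pass N, e.g. burn_cpu(60).
    used = burn_cpu(0.2)
    print(f"consumed {used:.2f} s of CPU time")
```

Submitted with a limit slightly above N, suspended right after start and resumed N seconds later, such a job should either run for roughly another N seconds or be killed right away, depending on how the DRM system counts suspended time.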

Cheers,

Daniel
---------------------------------------------------------------------


Notice from Univa Postmaster:


This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. This message has been content scanned by the Univa Mail system.



---------------------------------------------------------------------
Thijs Metsch
2011-04-14 07:59:00 UTC
Permalink
I only read this with one eye, but this isn't a bug in LSF or anything... You should use something other than the sleep command to test this.

Depending on the implementation, sleep cannot really be suspended. Most implementations (my guess, at least) will calculate a wakeup time and block at the OS level as long as that time isn't over; it's not really a while loop spinning :-)

-Thijs


Daniel Gruber
2011-04-14 08:13:00 UTC
Permalink
Hi Thijs,

thanks for your insight :) The main conclusion was that in LSF the wall clock time does
not include the time the job spends in the suspended state, unlike in other DRM systems
such as Condor or Univa Grid Engine. This is going to be reflected in the standard.
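
To make the two interpretations concrete, here is a small sketch that computes the wall clock time of a job from its state history, once with and once without the SUSPENDED time. The StateInterval type, the wallclock_time function and the example numbers are assumptions made up for illustration; they are not part of the draft or of any DRM system API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateInterval:
    state: str    # "RUNNING" or "SUSPENDED"
    start: float  # seconds since job start
    end: float


def wallclock_time(intervals, include_suspended):
    """Sum the interval lengths of all states that count as wall clock time."""
    counted = {"RUNNING", "SUSPENDED"} if include_suspended else {"RUNNING"}
    return sum(i.end - i.start for i in intervals if i.state in counted)


# Made-up history: 20 s running, 1300 s suspended, 20 s running again.
history = [
    StateInterval("RUNNING", 0, 20),
    StateInterval("SUSPENDED", 20, 1320),
    StateInterval("RUNNING", 1320, 1340),
]
limit = 60  # wall clock limit in seconds

inclusive = wallclock_time(history, include_suspended=True)   # 1340 s
exclusive = wallclock_time(history, include_suspended=False)  # 40 s
print("limit hit when SUSPENDED counts:", inclusive > limit)
print("limit hit when only RUNNING counts:", exclusive > limit)
```

Under the inclusive definition this job exceeds its limit, while under the RUNNING-only definition it does not, which is the difference the standard has to describe.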

Cheers,

Daniel