Peter Tröger
2010-03-25 23:44:08 UTC
Dear all,
this week, I gave a DRMAAv2 presentation at the conference of the
German grid initiative (D-Grid). Even though it was the last session
on the last day, attendance was pretty good. I got some interesting
remarks I wanted to share:
- Typical D-Grid installations have PBS or SGE, sometimes Torque. No
Condor. LSF is on the agenda.
- With the ability to check for core dump file existence in JobInfo,
they wondered if DRMAA could also offer to actually get this file.
- One user community in D-Grid typically has "pre-jobs" that prepare a
node for the real work with some software installation. DRMAAv2 with
its waitAnyTerminated() looked good enough for them.
- One request from the audience was automated re-queueing: if a job
goes to the Failed state, it should be re-queued automatically. This is
a typical problem on massive-scale clusters and grids, where machine
outages are normal. Condor (of course) has that; I am not sure about
the others.
- Another widely shared request was intermediate result preview. The
problem is that some simulations run for hours, and you want to know
pretty early whether it is worthwhile to complete the run. LSF has a
feature where you can look at a job's stdout while it runs, even with
non-interactive jobs. I don't know about other systems.
- One SLA expert in the auditorium was happy about the startTime /
endTime / duration approach in the AR template. He called that
"relaxed reservation".
- Another guest recommended GLUE2 as input for our monitoring
attributes. It's like JSDL and DCIM - everything optional, but maybe
good for semantics.
- It was requested that we check the monitoring attributes against
Globus MDS and Unicore TSI.
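
To make the re-queueing request above a bit more concrete, here is a
minimal sketch of what a client-side workaround could look like. It is
not DRMAAv2 API: the function run_with_requeue, the submit_and_wait
callback and the state strings are all hypothetical names chosen for
illustration, and a real DRM system might do this on the server side
instead.

```python
from typing import Callable, List

# Hypothetical terminal states; a real binding would define its own.
FAILED = "FAILED"
DONE = "DONE"


def run_with_requeue(submit_and_wait: Callable[[], str],
                     max_attempts: int = 3) -> List[str]:
    """Resubmit a job while it ends in FAILED, up to max_attempts times.

    submit_and_wait is an assumed callback that submits the job once,
    blocks until it terminates, and returns its terminal state.
    Returns the list of terminal states seen, one per attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    history: List[str] = []
    for _ in range(max_attempts):
        state = submit_and_wait()
        history.append(state)
        if state != FAILED:
            break  # any non-failed terminal state ends the loop
    return history


if __name__ == "__main__":
    # Simulated job that fails twice (e.g. node outage), then succeeds.
    outcomes = iter([FAILED, FAILED, DONE])
    print(run_with_requeue(lambda: next(outcomes)))
```

The attempt limit is there so that a job which fails for its own
reasons (rather than because of a machine outage) is not resubmitted
forever.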
I was also asked about the time frame for DRMAAv2 implementations -
really. It is not only the D-Grid audience that seems highly interested
in using DRMAAv2; I got the same kind of feedback at OGF28. I hope
this is enough motivation for everybody in the upcoming finalization
phase ...
Slides are attached, feel free to re-use them.
Best,
Peter.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dgrid-dresden.pdf
Type: application/pdf
Size: 1386166 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/drmaa-wg/attachments/20100326/87c7c144/attachment-0001.pdf