Nadav Brandes
2011-04-29 11:10:39 UTC
Hi guys,
My team and I have finished going over the latest draft of DRMAA2, and we
have some comments, suggestions and questions about it.
We want to hear your opinion about these issues.
1. Given a *jobId*, you can easily get its *Job* object using the method
*JobSession::getJobs(in JobInfo filter)*, by passing as a filter a
*JobInfo* with the wanted *jobId* (a shorthand method
*JobSession::getJob(string jobId)* might be more convenient, but this is a
different issue). *But*, given a *jobArrayId*, there is no way to get its
*JobArray* object. This is a significant limitation of DRMAA: it doesn't
really let users use the *JobArray* feature the way it is used in most
batch systems. I think a similar method *JobSession::getJobArrays(in
JobArrayInfo filter)* should be added, or at least a method
*JobSession::getJobArray(string jobArrayId)*.
2. A very important feature that many batch systems support is the
ability to limit the number of jobs in a job array that may run
simultaneously (in LSF it's called "Slot Limit" and you can read about it at
http://www-cecpv.u-strasbg.fr/Documentations/lsf/html/lsf6.1_admin/G_jobarrays.html#26618).
I think that DRMAA can also support this feature by:
   1. Changing the method *JobSession::runBulkJobs* so that it also accepts
   an optional argument *in long slotLimit* (if it's *UNSET*, no slot
   limit is assigned to the new job array).
   2. Adding a new method *JobArray::changeSlotLimit(in long slotLimit)*.
3. There are some parameters that most batch systems allow changing for
already submitted jobs, but DRMAA doesn't support changing them. For
example, DRMAA doesn't let you change the priority or queue of an already
submitted job. I think that methods *Job::changePriority(in long
priority)* and *Job::changeQueue(in string queueName)* should be added.
4. Many batch systems allow rerunning existing jobs. Although DRMAA has a
field called *rerunnable* in the *JobTemplate* struct, it doesn't allow
users to actually rerun jobs. Maybe a method *Job::rerun()* could be
added to DRMAA.
5. I have a question. Does DRMAA support Generic Resources? For example,
I have a cluster where some of the nodes have GPU cards, and I want to
submit jobs that require a certain number of GPUs and have the batch
system manage the allocation for me, as many batch systems can.
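To make suggestions 1 and 2 more concrete, here is a minimal Python sketch
of how the proposed calls could behave. It is only an in-memory mock, not a
DRMAA binding: the snake_case names, the begin/end/step arguments, the
generated IDs and the use of None for *UNSET* are all assumptions made for
illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

UNSET = None  # stand-in for DRMAA's UNSET value (assumption)


@dataclass
class JobArray:
    job_array_id: str
    job_ids: List[str]
    slot_limit: Optional[int] = UNSET

    def change_slot_limit(self, slot_limit: Optional[int]) -> None:
        """Mock of the proposed JobArray::changeSlotLimit."""
        if slot_limit is not UNSET and slot_limit < 1:
            raise ValueError("slot limit must be positive or UNSET")
        self.slot_limit = slot_limit


class JobSession:
    """Hypothetical session that remembers the job arrays it created."""

    def __init__(self) -> None:
        self._arrays = {}
        self._next_id = 1

    def run_bulk_jobs(self, template, begin, end, step, slot_limit=UNSET):
        """Mock of runBulkJobs with the proposed optional slotLimit."""
        array_id = str(self._next_id)
        self._next_id += 1
        # Made-up ID scheme: one job per index in the bulk range.
        job_ids = [f"{array_id}[{i}]" for i in range(begin, end + 1, step)]
        array = JobArray(array_id, job_ids)
        array.change_slot_limit(slot_limit)
        self._arrays[array_id] = array
        return array

    def get_job_array(self, job_array_id: str) -> JobArray:
        """Mock of the proposed JobSession::getJobArray(string jobArrayId)."""
        return self._arrays[job_array_id]
```

With something like this, a client that only stored the *jobArrayId* could
later call get_job_array(job_array_id) and then change_slot_limit(...) on
the result, which is the use case behind both suggestions.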
Thank you for reading all of this. I would very much like to hear what you
think about each of the points above.
Regards,
Nadav