Discussion:
[DRMAA-WG] About obtaining the machine names in a parallel job
Yves Caniou
2010-03-18 09:11:25 UTC
Dear All,

Discussions yesterday were great.
I understand why you don't want to provide a means to get the hostnames file
for an MPI code, since that should be done transparently through the configName
attribute (if I remember the name correctly).

But I thought of a different use case: a code is simply launched on all
machines. This code is socket-based, so it needs to know the other machine
names in order to run correctly.
Of course, this could be worked around with an external machine where a
daemon runs and where running codes can register -- I think of it like a
running omniNames, for example. Another solution is to wrap the
application in an MPI code just to, maybe, get that information.

But don't you think that the cost is very high (when it is possible at all: many
sites' policy is not to let user code run on the front-end node, and a machine
only knows that it itself takes part in the parallel run), compared with at
least having the possibility to copy the file containing the hostnames to all
reserved nodes?

Good luck with the discussions today!
Cheers.

.Yves.
--
Yves Caniou
Associate Professor at Université Lyon 1,
Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
* in Information Technology Center, The University of Tokyo,
2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
tel: +81-3-5841-0540
* in National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
tel: +81-3-4212-2412
http://graal.ens-lyon.fr/~ycaniou/
Peter Tröger
2010-03-24 12:54:40 UTC
Hi Yves,

thanks for a good discussion in Munich; I hope we can rely on your
user perspective in the future as well.
Post by Yves Caniou
I understand why you don't want to provide a means to get the hostnames file
for an MPI code, since that should be done transparently through the configName
attribute (if I remember the name correctly).
But I thought of a different use case: a code is simply launched on all
machines. This code is socket-based, so it needs to know the other machine
names in order to run correctly.
Of course, this could be worked around with an external machine where a
daemon runs and where running codes can register -- I think of it like a
running omniNames, for example. Another solution is to wrap the
application in an MPI code just to, maybe, get that information.
To me, this sounds like getting the information about the allocated
machines (for a job) on each of the execution hosts. I wonder whether this
information is provided by the different DRM systems. Does that depend
on the parallelization technology, such as the chosen MPI library?

Best,
Peter.
Daniel Templeton
2010-03-24 13:01:54 UTC
The way SGE (and I think LSF) handles parallel jobs is that there is
always a master/slave concept. The DRM system allocates the nodes,
starts the master task, and tells it where all the slaves are. The
master task is then responsible for starting the slave tasks, usually
via the DRM.
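
To make that concrete: under SGE with tight integration, the master task
typically reads $PE_HOSTFILE and starts its slaves with "qrsh -inherit".
A rough Python sketch (the slave binary path is only a placeholder):

import os
import subprocess

# SGE sets PE_HOSTFILE for the master task of a parallel job; each line
# starts with a host name (followed by slot count, queue, ...).
with open(os.environ["PE_HOSTFILE"]) as f:
    hosts = [line.split()[0] for line in f if line.strip()]

# "qrsh -inherit" runs a task on an already-allocated host under SGE's
# control; "/path/to/slave_task" is a placeholder for the real slave binary.
procs = [subprocess.Popen(["qrsh", "-inherit", h, "/path/to/slave_task"])
         for h in hosts]
for p in procs:
    p.wait()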

Maybe I'm missing some context, but this conversation sounds *way*
outside the scope of DRMAA to me. DRMAA has nothing to do with how
a job is launched. DRMAA is purely on the job management side:
submission, monitoring, and control.

Daniel
Yves Caniou
2010-03-25 04:35:54 UTC
Hi,

The case where the master task starts the slaves through the DRM may not be
the most frequent one. Furthermore, even in the master/slave paradigm, the
master has to know the names of the slaves; that's where Daniel's line "tells
it where all the slaves are" is really important to me: at least one node
should have the possibility to know the names of the resources involved in the
reservation. As we discussed during the OGF session, the identity of the nodes
is generally stored in a file whose filename depends on the deployed DRM.
What I suggest is at least one of these two things:
- the possibility for at least one node to know the identity of the others, by
using a DRM-independent DRMAA name for example.
- the possibility to copy the file to all nodes as a user request in the
prologue (this should be possible since the master knows the file anyway).

My preference naturally goes to the second, since the user then doesn't have to
take care of distributing the information himself (which could otherwise force
him to wrap his application in a fake MPI program just to dispatch the
information, or to fork an scp, which is not much better...).
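
To illustrate the second option, a rough prologue sketch in Python, assuming a
Torque-style PBS_NODEFILE on the master node and password-less ssh between the
reserved nodes (which MPI-style jobs usually require anyway); the target path
is only an example:

import os
import subprocess

hostfile = os.environ["PBS_NODEFILE"]      # the DRM-specific file on the master
target = "/tmp/allocated_hosts"            # example destination on every node

with open(hostfile) as f:
    hosts = sorted({line.split()[0] for line in f if line.strip()})

for host in hosts:
    # copy the full machine list so each reserved node can read it
    subprocess.run(["scp", hostfile, f"{host}:{target}"], check=True)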

Peter, I also saw something (at least one thing!) really interesting in
your report, concerning the two classes of parallel job support. Does this
mean that the people involved in DRMAA are considering the possibility of
submitting not only command-line programs but scripts as well?

Cheers.

.Yves.
--
Yves Caniou
Associate Professor at Université Lyon 1,
Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
* in Information Technology Center, The University of Tokyo,
2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
tel: +81-3-5841-0540
* in National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
tel: +81-3-4212-2412
http://graal.ens-lyon.fr/~ycaniou/
Peter Tröger
2010-03-29 10:01:59 UTC
We already had that discussion at OGF. We ended up not making any assumptions about how parallel jobs are instantiated - this is decided by the submitted application, which might be a shell script.

If we assumed some execution model, such as MPI-like master/slave process spawning, we would end up losing other parallel application classes. The configurationName attribute is intended to abstract away all these assumptions about the execution host runtime environment. Setting the right configuration name would enable the spawning procedure the user wants to use.
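
For comparison, the existing DRMAA 1.0 bindings already expose the related job category attribute at submission time. A minimal sketch with the Python "drmaa" binding (the category name "openmpi" and the script path are only examples; configurationName would play the analogous role in DRMAAv2):

import drmaa

s = drmaa.Session()
s.initialize()
jt = s.createJobTemplate()
jt.remoteCommand = "/path/to/parallel_launcher.sh"   # example submitted script
jt.jobCategory = "openmpi"                           # site-defined category
job_id = s.runJob(jt)
print("submitted job", job_id)
s.deleteJobTemplate(jt)
s.exit()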

I started a first list in order to make the discussion a bit more explicit:

http://wikis.sun.com/display/DRMAAv2/Suggested+Configuration+Names

Machine list availability could be promised by one or more of the listed configuration variants.

Best,
Peter.
Mariusz Mamoński
2010-03-31 19:00:32 UTC
Some late comments on this thread (really sorry for any redundancy):

1. While discussing support for parallel applications in DRMAA on
the last day, we decided to propose a general ConfigurationName
(JobCategory) for non-standard parallel environments, called something
like "SelfManaged".

2. "it needs to know the other machine names to be able to run
correctly." - as Dan said this functionality is already provided by
the DRMS (usually via environment variables PBS_NODEFILE in torque,
LSB_DJOB_HOSTFILE in LSF, PE_HOSTFILE in SGE). The problem may be that
this "API" is of course not standarized...

3. There was a suggestion to use a similar trick as with BULK_JOB_INDEX
(i.e. provide the name of the DRMS-specific variable in DRMAA), but
the syntax of the machine files may also differ among DRMSs.

4. One can imagine that in DRMAA (maybe not 2.0 but 3.0 ;-) there
would be an API supporting the spawning of parallel applications (only within
a parallel job!); this could look like:

sequence<string> getAllocatedMachines()

spawnProcess(machineName, executable, args, env)

I know that this would make DRMAA a little heavier, but I guess *MPI*
people could be interested in such a standardized interface, as by now
they usually have to write a separate "driver" for each DRMS.

A counter-argument to this API could be: can your parallel
application be moved to a different environment with zero
configuration cost?
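
For illustration, this is roughly what a portable machine-list lookup has to
do today, using the variables from point 2 (a Python sketch; the helper name
is made up and not part of any DRMAA binding):

import os

# DRM-specific variables named in point 2; the file formats differ
# (e.g. SGE's PE_HOSTFILE has extra columns after the host name).
HOSTFILE_VARS = ("PBS_NODEFILE", "PE_HOSTFILE", "LSB_DJOB_HOSTFILE")

def get_allocated_machines():
    for var in HOSTFILE_VARS:
        path = os.environ.get(var)
        if path and os.path.isfile(path):
            with open(path) as f:
                # keep only the host name column, drop duplicate slot entries
                return sorted({line.split()[0] for line in f if line.strip()})
    raise RuntimeError("no known DRMS host file variable is set")

print(get_allocated_machines())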

Cheers,
--
Mariusz