How to request exclusive access to a host in SGE
Sometimes you need a host exclusively to yourself. Here goes:
[heri@fep-53-2 ~]$ qsub -l excl=true -q ibm-quad.q -pe openmpi*1 2
echo "I run alone ... always."
echo `uptime`
Your job 97628 ("STDIN") has been submitted
[heri@fep-53-2 ~]$ watch qstat
[heri@fep-53-2 ~]$ qstat -j 97628
Original link:
http://wikis.sun.com/display/gridengine62u3/Configuring+Exclusive+Scheduling
I/O intensive jobs with multiple local readers/writers
Ok, say you have a messy MPI program and you need to read and write all data locally.
You can do this. SGE creates a temporary directory, $TMPDIR, and writes the node information to the file named by $PE_HOSTFILE.
The temporary directory exists only on the head node of the job. You can use passwordless ssh authentication to create the same directory on all the other nodes and copy your files there from the NFS-shared home. After you're done, get your results back and clean up after yourself!
Prerequisites:
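To make the role of $PE_HOSTFILE more concrete, here is a minimal sketch of pulling the host names out of such a file. It assumes the host name is the first whitespace-separated field on each line; `list_hosts` is a hypothetical helper name, not something SGE provides.

```shell
#!/bin/bash
# List the distinct host names found in an SGE-style host file.
# Assumption: the host name is the first whitespace-separated field of each line.
# list_hosts is a hypothetical helper name used only for this illustration.
list_hosts() {
    awk '{ print $1 }' "$1" | sort -u
}
```

Inside a job you would call it as `list_hosts "$PE_HOSTFILE"`. The `sort -u` is there so that a host listed more than once is visited only once.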
- I/O intensive jobs in shared home batch system (Read full article)
- Passwordless ssh authentication between WNs (Read full article)
So here goes:
[heri@fep-53-2 ~]$ qsub -q ibm-quad.q -pe openmpi*1 3 -S /bin/bash
echo "Local temp dir on head node: [$TMPDIR]"
echo "File that contains all nodes and slots: [$PE_HOSTFILE]"
export MYNODES=`cat $PE_HOSTFILE | cut -f 1 -d ' '`
echo "My running nodes: [`echo $MYNODES | tr '\n' ' '`]"
echo "Checking local temp directory on each node ... (optional)"
for x in $MYNODES;
do
echo "Trying node $x"
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa $x "mkdir $TMPDIR" 2>&1
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa $x "cp ~/BigFile.blob $TMPDIR" 2>&1
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa $x "ls -l $TMPDIR"
done
echo "Do not forget to delete those files!"
Your job 94764 ("STDIN") has been submitted
[heri@fep-53-2 ~]$ cat STDIN.*94764
Warning: Permanently added 'quad-wn17.grid.pub.ro,172.16.4.17' (RSA) to the list of known hosts.
Warning: Permanently added 'quad-wn14.grid.pub.ro,172.16.4.14' (RSA) to the list of known hosts.
Warning: Permanently added 'quad-wn18.grid.pub.ro,172.16.4.18' (RSA) to the list of known hosts.
Local temp dir on head node: [/scratch/tmp/94764.1.ibm-quad.q]
File that contains all nodes and slots: [/opt/n1sge6/sge-6.2u3/NCitCluster/spool/quad-wn17/active_jobs/94764.1/pe_hostfile]
My running nodes: [quad-wn17.grid.pub.ro quad-wn14.grid.pub.ro quad-wn18.grid.pub.ro ]
Checking local temp directory on each node ... (optional)
Trying node quad-wn17.grid.pub.ro
Warning: Permanently added 'quad-wn17.grid.pub.ro,172.16.4.17' (RSA) to the list of known hosts.
mkdir: cannot create directory `/scratch/tmp/94764.1.ibm-quad.q': File exists
Warning: Permanently added 'quad-wn17.grid.pub.ro,172.16.4.17' (RSA) to the list of known hosts.
total 1572
-rw-rw-r-- 1 heri heri 1602381 Dec 7 02:52 BigFile.blob
Trying node quad-wn14.grid.pub.ro
Warning: Permanently added 'quad-wn14.grid.pub.ro,172.16.4.14' (RSA) to the list of known hosts.
Warning: Permanently added 'quad-wn14.grid.pub.ro,172.16.4.14' (RSA) to the list of known hosts.
total 1572
-rw-rw-r-- 1 heri heri 1602381 Dec 7 02:52 BigFile.blob
Trying node quad-wn18.grid.pub.ro
Warning: Permanently added 'quad-wn18.grid.pub.ro,172.16.4.18' (RSA) to the list of known hosts.
Warning: Permanently added 'quad-wn18.grid.pub.ro,172.16.4.18' (RSA) to the list of known hosts.
total 1572
-rw-rw-r-- 1 heri heri 1602381 Dec 7 02:52 BigFile.blob
Do not forget to delete those files!
Hmm, notice the mkdir error on the first node: SGE had already created $TMPDIR there. When you create the directory, you might want to check whether it already exists by using the test command.
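A minimal sketch of that check, written as a small function so it can be tried locally before it goes into a job. `ensure_dir` is a hypothetical helper name and the messages it prints are illustrative.

```shell
#!/bin/bash
# Create a directory only if it is not already there.
# ensure_dir is a hypothetical helper name, not part of SGE.
ensure_dir() {
    local dir="$1"
    if test -d "$dir"; then
        echo "exists: $dir"
    else
        mkdir "$dir" && echo "created: $dir"
    fi
}
```

In the ssh loop the same idea fits on one line: `ssh $x "test -d $TMPDIR || mkdir $TMPDIR"`.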
How to check the default shell and change it for my job in Sun Grid Engine
Ok, our default shell is /bin/csh.
How to check that? Either run a job, or check the configuration yourself.
[heri@fep-53-2 ~]$ qsub -q ibm-quad.q
echo $SHELL
Your job 94735 ("STDIN") has been submitted
[heri@fep-53-2 ~]$ watch qstat
[heri@fep-53-2 ~]$ cat STDIN.o94735
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
/bin/csh
[heri@fep-53-2 ~]$ qconf -sq ibm-quad.q | grep shell
shell /bin/csh
shell_start_mode posix_compliant
Say you really don't like the defaults. Try this:
[heri@fep-53-2 ~]$ qsub -q ibm-quad.q -S /bin/bash
echo $SHELL
export MYVAR="If you can read this, it works."
echo $MYVAR
Your job 94739 ("STDIN") has been submitted
[heri@fep-53-2 ~]$ watch qstat
[heri@fep-53-2 ~]$ cat STDIN.o94739
/bin/bash
If you can read this, it works.
[heri@fep-53-2 ~]$
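If you would rather not retype the options every time, the same request can be kept in a job script. This is only a sketch: it relies on qsub reading lines that start with `#$` as embedded options, and the file name `myjob.sh` mentioned below is a hypothetical example.

```shell
#!/bin/bash
# Lines starting with "#$" are read by qsub as if they were command-line options.
#$ -S /bin/bash
#$ -q ibm-quad.q

# Same body as the interactive example above.
echo "$SHELL"
export MYVAR="If you can read this, it works."
echo "$MYVAR"
```

Saved as `myjob.sh`, it would be submitted with `qsub myjob.sh`.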
I/O intensive jobs on a shared home directory
Our default setup includes shared home directories exported via NFS. If you have I/O intensive apps, this can become a bottleneck (and yes, if you kill the storage system, your jobs get deleted). The trick is to know which node is writing and to copy the input data locally.
By default, Sun Grid Engine creates a temporary directory for each job. In our case this is a local directory, /scratch/tmp/... . When you run a job, this happens:
[heri@fep-53-2 ~]$ qsub -q ibm-quad.q
echo $TMPDIR
cp BigFile.blob $TMPDIR/
ls -l $TMPDIR
echo `hostname`
Your job 94740 ("STDIN") has been submitted
[heri@fep-53-2 ~]$ cat STDIN.o94740
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
/scratch/tmp/94740.1.ibm-quad.q
total 1572
-rw-rw-r-- 1 heri heri 1602381 Dec 7 01:42 BigFile.blob
quad-wn19.grid.pub.ro
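The job above only copies the input in. A fuller round trip (stage in, work locally, copy the results back, clean up) could be sketched like this. `stage_in_out`, the `.size` output file and the `wc -c` stand-in workload are assumptions made for the illustration, and the fallback to /tmp is only there so the function can be tried outside SGE.

```shell
#!/bin/bash
# Stage an input file into local scratch, work there, copy the result back.
# stage_in_out, the ".size" suffix and the "wc -c" workload are illustrative
# assumptions; replace them with your real program and file names.
stage_in_out() {
    local input="$1" outdir="$2"
    local scratch="${TMPDIR:-/tmp}"        # SGE sets TMPDIR for each job
    local name
    name="$(basename "$input")"

    cp "$input" "$scratch/" || return 1    # stage in from the shared home

    # Stand-in for the real I/O-heavy work: reads and writes stay in scratch.
    wc -c < "$scratch/$name" > "$scratch/$name.size" || return 1

    cp "$scratch/$name.size" "$outdir/" || return 1   # bring the result back
    rm -f "$scratch/$name" "$scratch/$name.size"      # clean up after yourself
}
```

Inside a job it would be called as, for example, `stage_in_out ~/BigFile.blob ~/results`, assuming the `~/results` directory already exists.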
But be careful! If you run an MPI job, your script runs only on the master node (node 0).
If you really want to read in parallel from the local filesystem you could:
- either copy the files via NFS to each of your nodes (on second thought ... )
- or copy them via SSH to each of your nodes
For example, let's run an MPI job restricted to only one MPI process per node. (Full article here)
Prerequisite: a passphraseless default key. (Full article here)
Makefile with a compiler parameter
Here is a neat solution for using several compilers.
The problem: I have a program that I want to build with several compilers, and each compiler takes different options. I don't want several targets such as build_sun, build_gcc, etc.; I want a single target.