Cluster Grid

How to request exclusive access to a host in SGE

 

Sometimes you need a host exclusively to yourself. Here is how:

 

[heri@fep-53-2 ~]$ qsub -l excl=true -q ibm-quad.q -pe openmpi*1 2
echo "I run alone ... always."
echo `uptime`
Your job 97628 ("STDIN") has been submitted
[heri@fep-53-2 ~]$ watch qstat
[heri@fep-53-2 ~]$ qstat -j 97628
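
The same request can also be kept in a job script instead of being typed on standard input. The sketch below assumes the file is saved as exclusive.sh (a hypothetical name) and relies on qsub reading lines that begin with "#$" as options. The excl resource, the queue, and the parallel environment are the ones from the transcript above; job_body is only an illustrative wrapper.

```shell
#!/bin/bash
# exclusive.sh (hypothetical file name): request a host exclusively.
# qsub treats lines that start with "#$" as command-line options.
#$ -l excl=true
#$ -q ibm-quad.q
#$ -pe openmpi*1 2
#$ -S /bin/bash

# Illustrative job body; replace it with the real work.
job_body() {
    echo "I run alone ... always."
    echo "Running on: $(uname -n)"
}

job_body
```

Submit it with "qsub exclusive.sh" and follow it with qstat as shown above.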

 

Original link:

http://wikis.sun.com/display/gridengine62u3/Configuring+Exclusive+Scheduling 

 

I/O intensive jobs with multiple local readers/writers

Say you have a messy MPI program and need to read and write all of its data locally.

You can do this. SGE creates a temporary directory, $TMPDIR, and writes the list of allocated nodes to the file named by $PE_HOSTFILE.

The directory is created only on the head node of the job. You can use passwordless ssh authentication to create the same temporary directory on all the other nodes and copy your files there from your NFS-mounted home directory. When you're done, copy your results back and clean up after yourself!
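
As a minimal sketch of the first step, the function below extracts the node names from the hostfile. It assumes the usual pe_hostfile layout, where each line starts with the host name followed by the slot count. The name list_job_nodes is made up for illustration.

```shell
#!/bin/bash
# Print each distinct host name found in an SGE pe_hostfile, one per line.
# The host name is the first whitespace-separated field of every line.
# With no argument, the file named by $PE_HOSTFILE is used.
list_job_nodes() {
    local hostfile="${1:-$PE_HOSTFILE}"
    awk '{ print $1 }' "$hostfile" | sort -u
}

# Typical use inside a job (commented out because it needs ssh access):
#   for node in $(list_job_nodes); do
#       ssh "$node" "ls -ld $TMPDIR"
#   done
```

Sorting with -u keeps the loop from visiting a node twice if it ever appears on more than one line.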

 

Prerequisites:

  1. I/O intensive jobs in a shared home batch system (Read full article)
  2. Passwordless ssh authentication between worker nodes (WNs) (Read full article)

So here goes:

 [heri@fep-53-2 ~]$ qsub -q ibm-quad.q -pe openmpi*1 3 -S /bin/bash
echo "Local temp dir on head node: [$TMPDIR]"
echo "File that contains all nodes and slots: [$PE_HOSTFILE]"

export MYNODES=`cat $PE_HOSTFILE | cut -f 1 -d ' '`
echo "My running nodes: [`echo $MYNODES | tr '\n' ' '`]"

echo "Checking local temp directory on each node ... (optional)"
for x in $MYNODES;
do
   echo "Trying node $x"
   ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa $x "mkdir $TMPDIR" 2>&1
   ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa $x "cp ~/BigFile.blob $TMPDIR" 2>&1
   ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa $x "ls -l $TMPDIR"
done

echo "Do not forget to delete those files!"
Your job 94763 ("STDIN") has been submitted
[heri@fep-53-2 ~]$ cat STDIN.*94764
Warning: Permanently added 'quad-wn17.grid.pub.ro,172.16.4.17' (RSA) to the list of known hosts.
Warning: Permanently added 'quad-wn14.grid.pub.ro,172.16.4.14' (RSA) to the list of known hosts.
Warning: Permanently added 'quad-wn18.grid.pub.ro,172.16.4.18' (RSA) to the list of known hosts.
Local temp dir on head node: [/scratch/tmp/94764.1.ibm-quad.q]
File that contains all nodes and slots: [/opt/n1sge6/sge-6.2u3/NCitCluster/spool/quad-wn17/active_jobs/94764.1/pe_hostfile]
My running nodes: [quad-wn17.grid.pub.ro quad-wn14.grid.pub.ro quad-wn18.grid.pub.ro ]
Checking local temp directory on each node ... (optional)
Trying node quad-wn17.grid.pub.ro
Warning: Permanently added 'quad-wn17.grid.pub.ro,172.16.4.17' (RSA) to the list of known hosts.
mkdir: cannot create directory `/scratch/tmp/94764.1.ibm-quad.q': File exists
Warning: Permanently added 'quad-wn17.grid.pub.ro,172.16.4.17' (RSA) to the list of known hosts.
total 1572
-rw-rw-r-- 1 heri heri 1602381 Dec  7 02:52 BigFile.blob
Trying node quad-wn14.grid.pub.ro
Warning: Permanently added 'quad-wn14.grid.pub.ro,172.16.4.14' (RSA) to the list of known hosts.
Warning: Permanently added 'quad-wn14.grid.pub.ro,172.16.4.14' (RSA) to the list of known hosts.
total 1572
-rw-rw-r-- 1 heri heri 1602381 Dec  7 02:52 BigFile.blob
Trying node quad-wn18.grid.pub.ro
Warning: Permanently added 'quad-wn18.grid.pub.ro,172.16.4.18' (RSA) to the list of known hosts.
Warning: Permanently added 'quad-wn18.grid.pub.ro,172.16.4.18' (RSA) to the list of known hosts.
total 1572
-rw-rw-r-- 1 heri heri 1602381 Dec  7 02:52 BigFile.blob
Do not forget to delete those files!

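When you create the directory, you may want to check first whether it already exists by using the test command. The mkdir error for quad-wn17 above appears because SGE had already created $TMPDIR on that node.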
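
A small sketch of that check, written as a local function so it can be tried outside the cluster. The name ensure_dir is hypothetical.

```shell
#!/bin/bash
# Create a directory only if it is not already there, and say what happened.
ensure_dir() {
    local dir="$1"
    if test -d "$dir"; then
        echo "exists: $dir"
    else
        mkdir "$dir" && echo "created: $dir"
    fi
}

# The same check as a one-liner for the ssh loop above (commented out
# because it needs ssh access to the nodes):
#   ssh "$x" "test -d $TMPDIR || mkdir $TMPDIR"
```

If the message is not needed, "mkdir -p" gives the same result in one step, since it does not complain about an existing directory.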

 

How to check the default shell and change it for my job in Sun Grid Engine

Our default shell is /bin/csh.

How do you check that? Either run a job or check the queue configuration yourself.

 

[heri@fep-53-2 ~]$ qsub -q ibm-quad.q
echo $SHELL
Your job 94735 ("STDIN") has been submitted
[heri@fep-53-2 ~]$ watch qstat
[heri@fep-53-2 ~]$ cat STDIN.o94735
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
/bin/csh
[heri@fep-53-2 ~]$ qconf -sq ibm-quad.q | grep shell
shell                 /bin/csh
shell_start_mode      posix_compliant

Say you really don't like the defaults. Try this:

 

[heri@fep-53-2 ~]$ qsub -q ibm-quad.q -S /bin/bash
echo $SHELL
export MYVAR="If you can read this, it works."
echo $MYVAR
Your job 94739 ("STDIN") has been submitted
[heri@fep-53-2 ~]$ watch qstat
[heri@fep-53-2 ~]$ cat STDIN.o94739
/bin/bash
If you can read this, it works.
[heri@fep-53-2 ~]$

I/O intensive jobs on a shared home directory

Our default setup includes shared home directories exported via NFS. If you have I/O intensive applications, this can become a bottleneck (and yes, if you kill the storage system, your jobs get deleted). The trick is to know which node is writing and to copy the input data to that node's local disk.

By default, Sun Grid Engine creates a temporary directory for each job. In our case this is a local directory, /scratch/tmp/... . When you run a job, this happens:

[heri@fep-53-2 ~]$ qsub -q ibm-quad.q
echo $TMPDIR
cp BigFile.blob $TMPDIR/
ls -l $TMPDIR
echo `hostname`
Your job 94740 ("STDIN") has been submitted
[heri@fep-53-2 ~]$ cat STDIN.o94740
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
/scratch/tmp/94740.1.ibm-quad.q
total 1572
-rw-rw-r-- 1 heri heri 1602381 Dec  7 01:42 BigFile.blob
quad-wn19.grid.pub.ro
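
Building on the transcript above, a full round trip could look like the sketch below: copy the input to the local scratch directory, work on the local copy, and copy only the result back to the shared directory. The function name run_in_scratch and the cksum step (a stand-in for the real I/O-heavy work) are assumptions made for illustration.

```shell
#!/bin/bash
# Stage an input file into local scratch, process it there, and copy the
# result back to a directory on the shared filesystem.
run_in_scratch() {
    local input="$1" outdir="$2"
    # SGE sets TMPDIR inside a job; fall back to a fresh temp dir elsewhere.
    local scratch="${TMPDIR:-$(mktemp -d)}"
    local name
    name=$(basename "$input")

    # Stage in: one read over NFS.
    cp "$input" "$scratch/" || return 1

    # Placeholder for the real work: all reads and writes stay local.
    cksum "$scratch/$name" > "$scratch/$name.cksum" || return 1

    # Stage out: one write over NFS.
    cp "$scratch/$name.cksum" "$outdir/"
}

# Example call inside a job script (paths are illustrative):
#   run_in_scratch ~/BigFile.blob ~/results
```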

But be careful! If you run an MPI job, your script runs only on the master node (node 0).

If you really want to read in parallel from the local filesystem, you could:

  1. copy the files via NFS to each of your nodes (on second thought ... )
  2. copy the files via SSH to each of your nodes

For example, let's run an MPI job restricted to one MPI process per node. (Full article here)

Prerequisite: a passphraseless default key. (Full article here)

 

 

 

 

 

Makefile with a compiler parameter

Here is a neat solution for using several compilers.

Problem: I have a program that I want to compile with several compilers, and each compiler has different options. I don't want several targets such as build_sun, build_gcc, etc.; I want a single target.

Read more: