I/O intensive jobs with multiple local readers/writers

Ok, say you have a messy MPI program and you need to read and write all data locally.

You can do this. SGE creates a temporary directory, $TMPDIR, and writes the node information to the file named by $PE_HOSTFILE.

This directory exists only on the head node. Using passwordless ssh authentication, you can create the same temporary directory on every other node and copy your files there from your NFS-shared home. After you're done, get your files back and clean up after yourself!
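
To make the role of $PE_HOSTFILE more concrete, here is a minimal sketch of reading it. It assumes the usual SGE layout of one line per node, with the node name in the first column and the slot count in the second. The file contents below are a made-up stand-in (only the node names come from this article), and sample_hostfile, hostfile, nodes and total_slots are names chosen for illustration.

```shell
# Stand-in for $PE_HOSTFILE so the sketch runs outside SGE.
# Node names are from this article; the queue and last column are assumed.
sample_hostfile=$(mktemp)
cat > "$sample_hostfile" <<'EOF'
quad-wn17.grid.pub.ro 1 ibm-quad.q@quad-wn17.grid.pub.ro UNDEFINED
quad-wn14.grid.pub.ro 1 ibm-quad.q@quad-wn14.grid.pub.ro UNDEFINED
quad-wn18.grid.pub.ro 1 ibm-quad.q@quad-wn18.grid.pub.ro UNDEFINED
EOF

# Inside a real job you would set hostfile=$PE_HOSTFILE instead.
hostfile=$sample_hostfile

# Column 1 is the node name, column 2 the number of slots granted on it.
nodes=$(awk '{print $1}' "$hostfile")
total_slots=$(awk '{sum += $2} END {print sum}' "$hostfile")

echo "Nodes: $(echo $nodes)"
echo "Total slots: $total_slots"

rm -f "$sample_hostfile"
```

The node list is what the job script loops over; the slot count is only needed if you want to size per-node work.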

 

Prerequisites (both are covered in separate articles):

  1. I/O intensive jobs in shared home batch system
  2. Passwordless ssh authentication between WNs

So here goes:

 [heri@fep-53-2 ~]$ qsub -q ibm-quad.q -pe openmpi*1 3 -S /bin/bash
echo "Local temp dir on head node: [$TMPDIR]"
echo "File that contains all nodes and slots: [$PE_HOSTFILE]"

# The first column of each line in $PE_HOSTFILE is a node name
export MYNODES=`cat $PE_HOSTFILE | cut -f 1 -d ' '`
echo "My running nodes: [`echo $MYNODES | tr '\n' ' '`]"

echo "Checking local temp directory on each node ... (optional)"
for x in $MYNODES;
do
   echo "Trying node $x"
   # Create the job's temp dir on the node (mkdir complains on the head node, where SGE already made it)
   ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa $x "mkdir $TMPDIR" 2>&1
   # Copy the input file from the shared home into the node-local temp dir
   ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa $x "cp ~/BigFile.blob $TMPDIR" 2>&1
   # Show what ended up on the node
   ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa $x "ls -l $TMPDIR"
done

echo "Do not forget to delete those files!"
Your job 94763 ("STDIN") has been submitted
[heri@fep-53-2 ~]$ cat STDIN.*94764
Warning: Permanently added 'quad-wn17.grid.pub.ro,172.16.4.17' (RSA) to the list of known hosts.
Warning: Permanently added 'quad-wn14.grid.pub.ro,172.16.4.14' (RSA) to the list of known hosts.
Warning: Permanently added 'quad-wn18.grid.pub.ro,172.16.4.18' (RSA) to the list of known hosts.
Local temp dir on head node: [/scratch/tmp/94764.1.ibm-quad.q]
File that contains all nodes and slots: [/opt/n1sge6/sge-6.2u3/NCitCluster/spool/quad-wn17/active_jobs/94764.1/pe_hostfile]
My running nodes: [quad-wn17.grid.pub.ro quad-wn14.grid.pub.ro quad-wn18.grid.pub.ro ]
Checking local temp directory on each node ... (optional)
Trying node quad-wn17.grid.pub.ro
Warning: Permanently added 'quad-wn17.grid.pub.ro,172.16.4.17' (RSA) to the list of known hosts.
mkdir: cannot create directory `/scratch/tmp/94764.1.ibm-quad.q': File exists
Warning: Permanently added 'quad-wn17.grid.pub.ro,172.16.4.17' (RSA) to the list of known hosts.
total 1572
-rw-rw-r-- 1 heri heri 1602381 Dec  7 02:52 BigFile.blob
Trying node quad-wn14.grid.pub.ro
Warning: Permanently added 'quad-wn14.grid.pub.ro,172.16.4.14' (RSA) to the list of known hosts.
Warning: Permanently added 'quad-wn14.grid.pub.ro,172.16.4.14' (RSA) to the list of known hosts.
total 1572
-rw-rw-r-- 1 heri heri 1602381 Dec  7 02:52 BigFile.blob
Trying node quad-wn18.grid.pub.ro
Warning: Permanently added 'quad-wn18.grid.pub.ro,172.16.4.18' (RSA) to the list of known hosts.
Warning: Permanently added 'quad-wn18.grid.pub.ro,172.16.4.18' (RSA) to the list of known hosts.
total 1572
-rw-rw-r-- 1 heri heri 1602381 Dec  7 02:52 BigFile.blob
Do not forget to delete those files!
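
The closing reminder in the job script can be turned into an explicit step. This is only a sketch, assuming that the results sit in the node-local temp directory and should end up under the shared home. collect_and_clean, REMOTE_CMD and the ~/results path are hypothetical names introduced here, not part of SGE.

```shell
# REMOTE_CMD is an assumption of this sketch: it defaults to ssh, but can be
# swapped for another command when trying things out without a cluster.
REMOTE_CMD=${REMOTE_CMD:-ssh}

# collect_and_clean NODE SCRATCH_DIR DEST_DIR
# On NODE: copy everything from SCRATCH_DIR to DEST_DIR/NODE (DEST_DIR is
# expected to be on the shared home), then delete SCRATCH_DIR.
# The scratch dir is removed only if the copy succeeded.
collect_and_clean() {
    cc_node=$1
    cc_scratch=$2
    cc_dest=$3
    "$REMOTE_CMD" "$cc_node" \
        "mkdir -p '$cc_dest/$cc_node' && cp -r '$cc_scratch'/. '$cc_dest/$cc_node'/ && rm -rf '$cc_scratch'"
}

# At the end of the job script, something like:
#   for x in $MYNODES; do collect_and_clean "$x" "$TMPDIR" ~/results; done
```

Keeping one subdirectory per node avoids results from different nodes overwriting each other.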

Hmm, when you create the dir, you might want to check first whether it's already there, using the test command. That avoids the mkdir "File exists" error reported for the head node in the output above.
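
A minimal sketch of that check, run locally in a throwaway directory so it can be tried without a cluster. ensure_dir and demo_root are hypothetical names used only for this illustration.

```shell
# Create a directory only if it is not already there.
ensure_dir() {
    if test -d "$1"; then
        echo "exists: $1"
    else
        mkdir "$1" && echo "created: $1"
    fi
}

# Local demonstration: the second call finds the directory and skips mkdir.
demo_root=$(mktemp -d)
first=$(ensure_dir "$demo_root/job.1")
second=$(ensure_dir "$demo_root/job.1")
echo "$first"
echo "$second"
```

Inside the loop of the job script, the same check could be sent to each node as one remote command, for example ssh $x "test -d $TMPDIR || mkdir $TMPDIR". mkdir -p is a common shorter alternative, since it does not complain when the directory already exists.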