Cluster Grid


Remote Display on FEP / Worker Nodes

 

Many debugging and profiling tools require a remote display. This becomes a problem on large clusters, where there is usually a single entry point, in this case fep.grid.pub.ro (currently fep-53-3.grid.pub.ro). On the NCIT Cluster this is a two-stage process:

  1. get a remote desktop to fep working
  2. forward the display from the worker node to fep

Step 1:

Step-by-step configuration of NX Client

Start -> NX Connection Wizard -> Next

Session: [NCIT] Fep

Host: fep-53-3.grid.pub.ro

Next

Select Unix - Custom - Settings ...

Run the console

Floating window

Ok -> Next

Show Advanced Configuration dialog -> Finish

 

Now, select Key ... -> Click Default -> Save -> Save -> Ok

You can now log in: the username and password are the same as on curs.cs.pub.ro.
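If you prefer the command line over NX, plain SSH X11 forwarding is an alternative (a sketch, assuming X11 forwarding is allowed on fep and a local X server is running on your machine):

ssh -X your.username@fep-53-3.grid.pub.ro
xclock   # the clock should appear on your local screen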

 

Step 2:

Now you must forward the display from the cluster machine back to fep. Suppose the machine is named quad-wn16. On fep, record the DISPLAY variable and run xhost + to allow incoming X connections:

 

[alexandru.herisanu@fep-53-3 ~]$ echo $DISPLAY
:1000.0
[alexandru.herisanu@fep-53-3 ~]$ xhost +
access control disabled, clients can connect from any host
[alexandru.herisanu@fep-53-3 ~]$
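Note that xhost + disables access control completely, so any host can connect to your display. To be more restrictive, you can allow only the worker node (the host name here is just the example node from above):

[alexandru.herisanu@fep-53-3 ~]$ xhost +quad-wn16.grid.pub.ro
quad-wn16.grid.pub.ro being added to access control list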

 

Now, from the remote machine (the script will be run via qsub), you only have to point the DISPLAY variable at your screen on fep and run your program:

 

[alexandru.herisanu@quad-wn16 ~]$ export DISPLAY=fep-53-3.grid.pub.ro:1000.0
[alexandru.herisanu@quad-wn16 ~]$ xclock

Done.
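Putting the two steps together, a minimal job script could look like this (a sketch; the script name show_xclock.sh is hypothetical, and :1000.0 must be replaced by whatever echo $DISPLAY reported in your own NX session):

#!/bin/bash
# show_xclock.sh - send the display back to fep and run a GUI program
export DISPLAY=fep-53-3.grid.pub.ro:1000.0
xclock

Submit it while the NX session on fep stays open, for example: qsub -q ibm-quad.q -cwd show_xclock.sh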

 

SSH keys and passwordless authentication between worker nodes

This might come in handy: how to get passwordless authentication between worker nodes.

Why? Say you want to copy files from an MPI head node to the other nodes, or you have a server (for example, a debugger) that needs to connect to all MPI nodes without using a password (and you do not have the rights to reconfigure your queue).

 

The commands are pretty basic: ssh-keygen generates the key pair. If you wish, you can restrict the key's use to this cluster alone by prepending from="172.16.*.*" to its line in authorized_keys2, like this:

 

[heri@fep-53-2 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/export/home/heri/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /export/home/heri/.ssh/id_rsa.
Your public key has been saved in /export/home/heri/.ssh/id_rsa.pub.
The key fingerprint is:
ba:a8:bd:28:05:28:3a:0b:44:27:8a:d4:0b:c3:df:35 heri@fep-53-2
[heri@fep-53-2 ~]$ echo -ne "from=\"172.16.*.*\" " >> ~/.ssh/authorized_keys2
[heri@fep-53-2 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2
[heri@fep-53-2 ~]$ chmod 600 ~/.ssh/authorized_keys2
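The resulting line in ~/.ssh/authorized_keys2 should look like this (key material abbreviated):

from="172.16.*.*" ssh-rsa AAAAB3... heri@fep-53-2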

 

So, let's try it out. Remember, this will be the first authentication, so you do not have a known_hosts entry yet.

 

ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa MY_HOST

 

This does the trick. You can get your list of nodes from the $PE_HOSTFILE variable.

 

[heri@fep-53-2 ~]$ qsub -q ibm-quad.q -pe openmpi*1 2 -S /bin/bash
echo "File that contains all nodes and slots: [$PE_HOSTFILE]"

export MYNODES=`cat $PE_HOSTFILE | cut -f 1 -d ' '`
for x in $MYNODES;
do
   ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa $x "echo Hi my name is `hostname`!"
done
Your job 94760 ("STDIN") has been submitted
[heri@quad-wn07 ~]$ cat STDIN.*94760
Warning: Permanently added 'quad-wn26.grid.pub.ro,172.16.4.26' (RSA) to the list of known hosts.
Warning: Permanently added 'quad-wn12.grid.pub.ro,172.16.4.12' (RSA) to the list of known hosts.
File that contains all nodes and slots: [/opt/n1sge6/sge-6.2u3/NCitCluster/spool/quad-wn26/active_jobs/94760.1/pe_hostfile]
Hi my name is quad-wn26.grid.pub.ro!
Hi my name is quad-wn26.grid.pub.ro!
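Note that in the script above the backticks around hostname are expanded by the shell on the master node before ssh runs, which is why both output lines name quad-wn26 instead of the two allocated nodes. To evaluate hostname on the remote node instead, use single quotes:

for x in $MYNODES;
do
   # single quotes defer expansion, so hostname runs on the remote node
   ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/id_rsa $x 'echo Hi my name is $(hostname)!'
done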

 

 

 

Quickstart

If a job occupies a single node, it is run with the command:

qsub -q queue_name -cwd script_name

This command runs the script script_name on a single machine from the queue queue_name, and the script's working directory is the current directory.
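For example, a minimal job script (hypothetical name hello.sh) and its submission could look like this:

#!/bin/bash
# hello.sh - print the name of the node the job landed on
hostname

[heri@fep-53-2 ~]$ qsub -q ibm-quad.q -cwd hello.sh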

 

Available queues

 

No.  Queue name      Nodes  Processor    Frequency  Chips  Cores  Memory
1    fs-p4.q *       56     Pentium 4    3 GHz      1      1      2 GB
2    fs-dual.q *     32     Intel Xeon   3 GHz      2      1      2 GB
3    ibm-quad.q      28     Intel Xeon   2 GHz      2      4      16 GB
4    ibm-opteron.q   2      AMD Opteron  2.55 GHz   2      6      16 GB

* this queue is not available to students through SGE

 

If we want to run on multiple cores or on multiple nodes, we have to tell the batch system about it:


qsub -q queue_name -pe environment_name [number_of_slots] -cwd script_name

Specifying a parallel environment is the way of telling the Sun Grid Engine batch system that it must grant us more than a single execution node. A parallel environment is defined by a name and a requested number of slots.

The names of these MPI environments are defined at the cluster level as follows:

 

 

No.  Name        Max slots  Daemons per host  Scheduling policy  MPI version
1    openmpi     224        8                 pe_slots           OpenMPI 1.2.3
2    openmpi*1   28         1                 -                  OpenMPI 1.2.3
3    intelmpi    224        8                 pe_slots           Intel MPI
4    intelmpi*1  28         1                 -                  Intel MPI
5    hpmpi       224        8                 fill_up            HP MPI
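For example, an MPI job on 8 slots of the ibm-quad.q queue could be submitted like this (a sketch; the script name mpi_job.sh and the binary my_mpi_program are hypothetical):

#!/bin/bash
# mpi_job.sh - run an OpenMPI program on the slots granted by SGE
# $NSLOTS is set by Sun Grid Engine to the number of allocated slots
mpirun -np $NSLOTS ./my_mpi_program

[heri@fep-53-2 ~]$ qsub -q ibm-quad.q -pe openmpi 8 -cwd mpi_job.sh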

 

 

C/C++ compilation on Windows

There are multiple possibilities for developing C/C++ applications on Windows and then running them on the cluster. The preferred method is to use an IDE (Eclipse http://www.eclipse.org or NetBeans http://www.netbeans.org) locally and transfer your project to the cluster.

 

This is hard to do if you cannot compile your program with GCC locally; you should install Cygwin or MinGW for that.
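Once the project builds locally, moving it to the cluster and rebuilding it there can be as simple as this (a sketch; the directory my_project and the source file names are hypothetical):

# copy the project into your cluster home directory, then rebuild it with GCC
scp -r my_project your.username@fep-53-3.grid.pub.ro:~/
ssh your.username@fep-53-3.grid.pub.ro "cd my_project && gcc -O2 -o my_program main.c"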

 
