File input/output operations on HDFS

All operations are performed from the worker nodes. The SGE integration dynamically generates the masters, slaves, and mapred-site.xml files and starts the JobTracker for you. The Hadoop configuration is generated in $TMPDIR. Here are some short examples:

hadoop --config $TMPDIR/conf fs -lsr /
hadoop --config $TMPDIR/conf fs -mkdir myjob
hadoop --config $TMPDIR/conf fs -put file01 myjob
hadoop --config $TMPDIR/conf fs -put file02 myjob
hadoop --config $TMPDIR/conf fs -lsr /

Of course, you have to submit this through qsub. Writing \$TMPDIR (with the backslash) keeps the shell from expanding the variable when the here-document is read at submission time, so it is resolved inside the job script at runtime instead, on the worker node. So here goes (example1.sh):
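The escaping behavior can be demonstrated locally, without a cluster. This is a minimal sketch: the file names and values are made up, but it shows that an unescaped variable is expanded when the here-document is read, while an escaped one survives into the generated script and is only resolved when that script runs:

```shell
#!/bin/bash
# Local demo of here-document expansion: no cluster or qsub needed.
TMPDIR=/submit-time-value

# Unescaped: $TMPDIR is expanded while the here-document is being read.
cat <<EOF > expanded.sh
echo $TMPDIR
EOF

# Escaped: \$TMPDIR is written literally, so the variable is resolved only
# when the generated script runs (on the worker node, in the qsub case).
cat <<EOF > deferred.sh
echo \$TMPDIR
EOF

echo "expanded.sh contains: $(cat expanded.sh)"
echo "deferred.sh contains: $(cat deferred.sh)"

# Run the deferred script with a different value, as happens per SGE job:
TMPDIR=/runtime-value bash deferred.sh
```

This is exactly why the qsub here-document below escapes every \$TMPDIR: the variable must not be expanded on the submission host, where it would have the wrong (or no) value.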

#!/bin/bash
#
# List contents of the HDFS

qsub -q ibm-nehalem.q -pe hadoop 4 -N HadoopExample -cwd <<EOF

module load java/jdk1.6.0_23-64bit
module load libraries/hadoop-0.20.2

hadoop --config \$TMPDIR/conf fs -lsr /
hadoop --config \$TMPDIR/conf fs -mkdir myjob
hadoop --config \$TMPDIR/conf fs -put file01 myjob
hadoop --config \$TMPDIR/conf fs -put file02 myjob
hadoop --config \$TMPDIR/conf fs -lsr /

EOF
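The job runs asynchronously after qsub accepts it. Under the standard SGE convention, the job's stdout and stderr land in <jobname>.o<jobid> and <jobname>.e<jobid> in the submission directory (because of -cwd). The snippet below is a local simulation of that naming convention, with a made-up job id and contents, just to show how to locate the output afterwards:

```shell
#!/bin/bash
# Sketch of the SGE output-file convention: stdout goes to <jobname>.o<jobid>,
# stderr to <jobname>.e<jobid>, in the -cwd directory. The job id and file
# contents here are stand-ins, not real cluster output.
jobid=4711                                        # hypothetical id from qsub
echo "listing of /" > "HadoopExample.o${jobid}"   # stand-in job stdout
: > "HadoopExample.e${jobid}"                     # stand-in (empty) stderr

# Once the job finishes, read the most recent output file for this job name:
latest=$(ls -t HadoopExample.o* | head -n 1)
cat "$latest"
```

While the job is still queued or running, qstat shows its state; the output files only contain the final results after the job completes.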

Let's run it and see the results.

[alexandru.herisanu@fep-53-1 hadoop]$ ./example1.sh

(For background on how the Hadoop parallel environment is created under SGE, see http://blogs.oracle.com/ravee/entry/creating_hadoop_pe_under_sge)

[alexandru.herisanu@fep-53-1 hadoop]$ hadoop dfs

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]