File input/output operations on HDFS
All operations are run from the worker nodes. The SGE integration dynamically generates the masters, slaves and mapred-site.xml files and starts the JobTracker for you; the generated Hadoop configuration lives in $TMPDIR. Here are some short examples:
hadoop --config $TMPDIR/conf fs -lsr /
hadoop --config $TMPDIR/conf fs -mkdir myjob
hadoop --config $TMPDIR/conf fs -put file01 myjob
hadoop --config $TMPDIR/conf fs -put file02 myjob
hadoop --config $TMPDIR/conf fs -lsr /
Of course, you have to submit this with qsub. Inside the here-document, $TMPDIR is escaped as \$TMPDIR so the variable is expanded at runtime inside the job script, not when the submission script is read. So here goes (example1.sh):
#!/bin/bash
#
# List contents of the HDFS
qsub -q ibm-nehalem.q -pe hadoop 4 -N HadoopExample -cwd <<EOF
module load java/jdk1.6.0_23-64bit
module load libraries/hadoop-0.20.2
hadoop --config \$TMPDIR/conf fs -lsr /
hadoop --config \$TMPDIR/conf fs -mkdir myjob
hadoop --config \$TMPDIR/conf fs -put file01 myjob
hadoop --config \$TMPDIR/conf fs -put file02 myjob
hadoop --config \$TMPDIR/conf fs -lsr /
EOF
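The HDFS instance lives only as long as the SGE job, so results must be copied back to the shared filesystem before the job ends. A minimal sketch following the same pattern as example1.sh (the file and directory names are placeholders from the example above):

```shell
#!/bin/bash
#
# Copy results out of HDFS before the Hadoop cluster is torn down
qsub -q ibm-nehalem.q -pe hadoop 4 -N HadoopGetResults -cwd <<EOF
module load java/jdk1.6.0_23-64bit
module load libraries/hadoop-0.20.2
# Print a file directly to the job's stdout
hadoop --config \$TMPDIR/conf fs -cat myjob/file01
# Copy the whole HDFS directory back to the submit directory
hadoop --config \$TMPDIR/conf fs -get myjob ./myjob-results
# Remove the directory from HDFS when done
hadoop --config \$TMPDIR/conf fs -rmr myjob
EOF
```

Anything not copied out with fs -get (or printed with fs -cat) is lost when the job finishes.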
Let's run it and see the results.
[alexandru.herisanu@fep-53-1 hadoop]$ ./example1.sh
(See also: http://blogs.oracle.com/ravee/entry/creating_hadoop_pe_under_sge on creating the Hadoop parallel environment under SGE.)
Running hadoop dfs with no arguments prints the usage, including the generic options that every hadoop command accepts:
[alexandru.herisanu@fep-53-1 hadoop]$ hadoop dfs
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
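The generic options can be combined with the fs commands shown earlier. A short sketch, meant to run inside a qsub here-document like example1.sh (so \$TMPDIR stays escaped); the property value and namenode address are illustrative:

```shell
# -D overrides one configuration property for a single command,
# here uploading a file with replication factor 2:
hadoop --config \$TMPDIR/conf fs -D dfs.replication=2 -put file01 myjob
# -fs points the shell at an explicit namenode instead of the one
# from the generated configuration (host and port are placeholders):
hadoop fs -fs hdfs://master:9000 -ls /
```

Note that the generic options must come before the command options, as the syntax line above indicates.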