In my previous post, I talked about how to install Hadoop.
Now, I will describe how to run a simple Hadoop program on your machine.
Since I am using Maven and its pom.xml, the Hadoop jar files must be available in your (local) Maven repository.
1. If you want to install the Hadoop jar files from your local machine, use the command shown below:
(If you want to install the jar files from the central Maven repository instead, you can skip this process and declare the correct jar coordinates in the pom.xml.)
Then, install the following jar files:
- hadoop-common-2.6.0.jar and hadoop-nfs-2.6.0.jar in the $HADOOP_PREFIX/share/hadoop/common directory
- hadoop-mapreduce-client-common-2.6.0.jar and hadoop-mapreduce-client-core-2.6.0.jar in the $HADOOP_PREFIX/share/hadoop/mapreduce directory
- slf4j-api-1.7.5.jar and slf4j-log4j12-1.7.5.jar in the $HADOOP_PREFIX/share/hadoop/common/lib directory (these two may not be required)
mvn install:install-file -Dfile=<path-to-file> -DgroupId=<group-id> -DartifactId=<artifact-id> -Dversion=<version> -Dpackaging=<packaging> -DgeneratePom=true
For example,
$ cd $HADOOP_PREFIX/share/hadoop/common
$ mvn install:install-file -Dfile=hadoop-common-2.6.0.jar -DgroupId=org.apache.hadoop -DartifactId=hadoop-common -Dversion=2.6.0 -Dpackaging=jar -DgeneratePom=true
(If you prefer, you can use your own values for the groupId and the artifactId, but you need to use the same values in the pom.xml shown below.)
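Repeating the same command for the remaining jar files could look like the sketch below. The groupId and artifactId values here are my assumption (they mirror the official Apache coordinates), so adjust them if you picked different ones:
$ mvn install:install-file -Dfile=hadoop-nfs-2.6.0.jar -DgroupId=org.apache.hadoop -DartifactId=hadoop-nfs -Dversion=2.6.0 -Dpackaging=jar -DgeneratePom=true
$ cd $HADOOP_PREFIX/share/hadoop/mapreduce
$ mvn install:install-file -Dfile=hadoop-mapreduce-client-common-2.6.0.jar -DgroupId=org.apache.hadoop -DartifactId=hadoop-mapreduce-client-common -Dversion=2.6.0 -Dpackaging=jar -DgeneratePom=true
$ mvn install:install-file -Dfile=hadoop-mapreduce-client-core-2.6.0.jar -DgroupId=org.apache.hadoop -DartifactId=hadoop-mapreduce-client-core -Dversion=2.6.0 -Dpackaging=jar -DgeneratePom=true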
2. Create a Maven project.
This is my Maven project in Eclipse.
Java code: It is from the book "Hadoop: The Definitive Guide" written by Tom White.
quangle.txt: It is also from the book "Hadoop: The Definitive Guide".
- pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.jihwan.learn.hadoop</groupId>
  <artifactId>chapterThree</artifactId>
  <version>0.1</version>
  <name>Chapter three examples</name>

  <properties>
    <hadoop.version>2.6.0</hadoop.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-nfs</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>
</project>
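Once the pom.xml is in place, a quick way to check that Maven can resolve all four Hadoop dependencies (whether from the local installs in step 1 or from the central repository) is:
$ mvn dependency:tree
It should list hadoop-common, hadoop-nfs, hadoop-mapreduce-client-common, and hadoop-mapreduce-client-core at version 2.6.0, along with their transitive dependencies.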
- URLCat.java
package com.jihwan.learn.hadoop;

import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat {
    // Register Hadoop's URL stream handler so java.net.URL understands the hdfs:// scheme.
    // This can only be set once per JVM, hence the static initializer.
    static {
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            // Open the URL given as the first argument (e.g. an hdfs:// path) and copy it to stdout.
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
- quangle.txt
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.
3. Create a jar file using a Maven command.
$ mvn package
4. This creates a chapterThree-0.1.jar file under the target directory.
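If you want to double-check the jar contents, the standard jar tool can list them (assuming a JDK is on your PATH); you should see the compiled URLCat class:
$ jar tf target/chapterThree-0.1.jar
Look for com/jihwan/learn/hadoop/URLCat.class in the output.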
5. Before you run URLCat, make sure the local Hadoop server is running. These prerequisite steps are shown in the previous post.
$ cd $HADOOP_PREFIX
$ bin/hdfs namenode -format
$ sbin/start-dfs.sh
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/jihwan #jihwan is my user id
Then change directory to your Maven project directory for the next step.
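To confirm that the HDFS daemons actually started, you can also run the JDK's jps tool (assuming jps is on your PATH); you should see at least a NameNode, a DataNode, and a SecondaryNameNode process:
$ jps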
6. After running start-dfs.sh, you should be able to open http://localhost:50070/ and see the NameNode web UI, which should look like this.
7. Let's copy the local file 'quangle.txt' in your project home directory to the Hadoop server.
$ hadoop fs -copyFromLocal quangle.txt quangleCopy.txt
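To verify that the copy succeeded, you can list your HDFS home directory (created as /user/jihwan in step 5; replace jihwan with your own user id); quangleCopy.txt should show up there:
$ hadoop fs -ls /user/jihwan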
8. Now, it is time to run the URLCat application.
$ cd target
$ export HADOOP_CLASSPATH=chapterThree-0.1.jar
$ hadoop com.jihwan.learn.hadoop.URLCat hdfs://localhost:9000/user/jihwan/quangleCopy.txt
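If everything is set up correctly, URLCat streams the file from HDFS straight to standard output, so you should see the contents of quangle.txt:
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.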
This is it. Have fun!!!