In my previous post, I talked about how to install Hadoop.
Now, I will describe how to run a simple Hadoop program on your machine.
Since I am using Maven and its pom.xml, the Hadoop jar files must be available in your (local) Maven repository.
1. If you want to install the Hadoop jar files from your local machine, use the command shown below:
(If you want to install the jar files from the central Maven repository instead, you can skip this process and declare the correct jar coordinates in the pom.xml.)
Then, install the following jar files:
- hadoop-common-2.6.0.jar and hadoop-nfs-2.6.0.jar in the $HADOOP_PREFIX/share/hadoop/common directory
- hadoop-mapreduce-client-common-2.6.0.jar and hadoop-mapreduce-client-core-2.6.0.jar in the $HADOOP_PREFIX/share/hadoop/mapreduce directory
- slf4j-api-1.7.5.jar and slf4j-log4j12-1.7.5.jar in the $HADOOP_PREFIX/share/hadoop/common/lib directory (these two may not be required)
mvn install:install-file -Dfile=<path-to-file> -DgroupId=<group-id> -DartifactId=<artifact-id> -Dversion=<version> -Dpackaging=<packaging> -DgeneratePom=true
For example,
$ cd $HADOOP_PREFIX/share/hadoop/common
$ mvn install:install-file -Dfile=hadoop-common-2.6.0.jar -DgroupId=org.apache.hadoop -DartifactId=hadoop-common -Dversion=2.6.0 -Dpackaging=jar -DgeneratePom=true
(If you prefer, you can use your own values for the groupId and the artifactId, but you need to use the same values in the pom.xml shown below.)
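Repeating the same command for the remaining jar files could look like the sketch below. The groupId and artifactId values here are my assumption (they mirror the official Apache coordinates), so adjust them if you picked different ones:
$ mvn install:install-file -Dfile=hadoop-nfs-2.6.0.jar -DgroupId=org.apache.hadoop -DartifactId=hadoop-nfs -Dversion=2.6.0 -Dpackaging=jar -DgeneratePom=true
$ cd $HADOOP_PREFIX/share/hadoop/mapreduce
$ mvn install:install-file -Dfile=hadoop-mapreduce-client-common-2.6.0.jar -DgroupId=org.apache.hadoop -DartifactId=hadoop-mapreduce-client-common -Dversion=2.6.0 -Dpackaging=jar -DgeneratePom=true
$ mvn install:install-file -Dfile=hadoop-mapreduce-client-core-2.6.0.jar -DgroupId=org.apache.hadoop -DartifactId=hadoop-mapreduce-client-core -Dversion=2.6.0 -Dpackaging=jar -DgeneratePom=true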
2. Create a Maven project.
This is my Maven project in Eclipse.
Java code: It is from the book "Hadoop: The Definitive Guide" written by Tom White.
quangle.txt: It is also from the book "Hadoop: The Definitive Guide".
- pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.jihwan.learn.hadoop</groupId>
  <artifactId>chapterThree</artifactId>
  <version>0.1</version>
  <name>Chapter three examples</name>

  <properties>
    <hadoop.version>2.6.0</hadoop.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-nfs</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>
</project>
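Once the pom.xml is in place, a quick way to check that Maven can resolve all four Hadoop dependencies (whether from the local installs in step 1 or from the central repository) is:
$ mvn dependency:tree
It should list hadoop-common, hadoop-nfs, hadoop-mapreduce-client-common, and hadoop-mapreduce-client-core at version 2.6.0, along with their transitive dependencies.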
- URLCat.java
package com.jihwan.learn.hadoop;

import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat {
    // Register Hadoop's URL stream handler so java.net.URL understands the hdfs:// scheme.
    // This can only be set once per JVM, hence the static initializer.
    static {
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            // Open the URL given as the first argument (e.g. an hdfs:// path) and copy it to stdout.
            in = new URL(args[0]).openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
- quangle.txt
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.
3. Create a jar file using a Maven command.
$ mvn package
4. This creates a chapterThree-0.1.jar file under the target directory.
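If you want to double-check the jar contents, the standard jar tool can list them (assuming a JDK is on your PATH); you should see the compiled URLCat class:
$ jar tf target/chapterThree-0.1.jar
Look for com/jihwan/learn/hadoop/URLCat.class in the output.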
5. Before you run URLCat, make sure the local Hadoop server is running. These prerequisite steps are shown in the previous post.
$ cd $HADOOP_PREFIX
$ bin/hdfs namenode -format
$ sbin/start-dfs.sh
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/jihwan #jihwan is my user id
Then change directory to your Maven project directory for the next step.
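To confirm that the HDFS daemons actually started, you can also run the JDK's jps tool (assuming jps is on your PATH); you should see at least a NameNode, a DataNode, and a SecondaryNameNode process:
$ jps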
6. After running start-dfs.sh, you should be able to open http://localhost:50070/ and see the NameNode web UI, which should look like this.
7. Let's copy the local file 'quangle.txt' in your project home directory to the Hadoop server.
$ hadoop fs -copyFromLocal quangle.txt quangleCopy.txt
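To verify that the copy succeeded, you can list your HDFS home directory (created as /user/jihwan in step 5; replace jihwan with your own user id); quangleCopy.txt should show up there:
$ hadoop fs -ls /user/jihwan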
8. Now, it is time to run the URLCat application.
$ cd target
$ export HADOOP_CLASSPATH=chapterThree-0.1.jar
$ hadoop com.jihwan.learn.hadoop.URLCat hdfs://localhost:9000/user/jihwan/quangleCopy.txt
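If everything is set up correctly, URLCat streams the file from HDFS straight to standard output, so you should see the contents of quangle.txt:
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.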
This is it. Have fun!!!