Building and Deploying Hadoop Solutions on Linux on Power
Apache Hadoop Version : 2.7.2
Apache Hive Version : 1.2.1
Apache Derby Version : 10.12.1.1
Apache Spark Version : 2.0.0 ( the steps below apply to Spark 1.6.0 as well )
Protobuf ( Google Protocol Buffer ) : Protoc 2.5.0
R : 3.2.5
Scala : 2.11.8
Node.js :
Zeppelin : 0.6.0
Linux Distro : RHEL 7.2 PPC64LE
a) Build Protobuf ( pre-requisite dependency for the Hadoop build ); a sketch is given below
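A minimal sketch of a typical protobuf 2.5.0 source build; the download URL and install prefix are assumptions, and on ppc64le an updated config.guess/config.sub may be needed for ./configure to recognize the platform :
wget https://github.com/protocolbuffers/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar -xzf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure --prefix=/usr/local    # installs libraries under /usr/local/lib
make
make install
protoc --version                   # should report "libprotoc 2.5.0"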
b) Build Hadoop – 2.7.2 :
-> wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2-src.tar.gz
-> tar -xvf hadoop-2.7.2-src.tar.gz
-> cd hadoop-2.7.2-src
-> mvn package -Pdist,native -DskipTests -Dtar
The Hadoop distribution tarball will be created at the location below :
/tempdisk/software/hadoop/hadoop-2.7.2-src/hadoop-dist/target/hadoop-2.7.2.tar.gz
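The built tarball then has to be unpacked on every node; the target path below is an assumption chosen to match the HADOOP_HOME used later in this document ( run as the hadoop user created in step 4 so the ownership is correct ) :
tar -xzf hadoop-2.7.2.tar.gz -C /home/hadoop/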
AS ROOT USER
1. Disable IPv6 ( on the master and all slave nodes )
vi /etc/sysctl.conf
add the following :
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
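The sysctl entries above only take effect at the next boot; to apply and verify them immediately :
sysctl -p                                        # reload /etc/sysctl.conf
cat /proc/sys/net/ipv6/conf/all/disable_ipv6     # should print 1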
2. Disable SELinux and the firewall ( on the master and all slave nodes )
install the iptables services package if not already installed
-> yum install iptables-services -y
-> vi /etc/selinux/config
set :
-> SELINUX=disabled
also turn the firewall off on boot ( RHEL 7 manages iptables through systemd )
-> service iptables save
-> systemctl stop iptables
-> systemctl disable iptables
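SELINUX=disabled only takes effect after a reboot; to take SELinux out of enforcing mode for the current boot :
setenforce 0     # permissive until the next reboot
getenforce       # should report Permissive ( or Disabled after a reboot )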
3. Set a proper hostname ( on the master and all slave nodes )
vi /etc/sysconfig/network
HOSTNAME=bigdatahdfs1.ibm.com
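On RHEL 7 the /etc/sysconfig/network entry is legacy; hostnamectl sets the hostname directly. Every node must also be able to resolve every other node's hostname, e.g. via /etc/hosts ( the addresses and the slave name below are hypothetical ) :
hostnamectl set-hostname bigdatahdfs1.ibm.com
# add one line per node to /etc/hosts on every node, for example :
# 192.168.1.10   bigdatahdfs1.ibm.com
# 192.168.1.11   bigdatahdfs2.ibm.com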
4. Create a hadoop user
groupadd hadoop
useradd -G hadoop hadoop
passwd hadoop
AS HADOOP USER
5. Add the following environment variables, e.g. to ~/.bashrc ( on the master and all slave nodes )
#JAVA ENVIRONMENT VARIABLES
export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.31-2.b13.ael7b.ppc64le"
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export JAVA_LDFLAGS="-L/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.31-2.b13.ael7b.ppc64le/jre/lib/ppc64/server -R/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.31-2.b13.ael7b.ppc64le/jre/lib/ppc64/server -ljvm"
export JAVA_CPPFLAGS="-I/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.31-2.b13.ael7b.ppc64le/include -I/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.31-2.b13.ael7b.ppc64le/include/linux"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.31-2.b13.ael7b.ppc64le/jre/lib/ppc64/server
#HADOOP ENVIRONMENT VARIABLES
export HADOOP_HOME=/home/hadoop/hadoop-2.7.2
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$YARN_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
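Assuming the variables were added to ~/.bashrc, reload the profile and spot-check them :
source ~/.bashrc
echo $JAVA_HOME
echo $HADOOP_HOME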
6. Set up passwordless ssh ( on the master and all slave nodes )
ssh-keygen -t rsa -P ""
cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh localhost
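On a multi-node cluster the master's key must also be authorized on every slave so that start-dfs.sh / start-yarn.sh can reach them; the slave hostname below is hypothetical :
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@bigdatahdfs2.ibm.com    # repeat for each slave
ssh hadoop@bigdatahdfs2.ibm.com                                 # must log in without a password prompt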
7. Modify Hadoop environment variables ( on the master and all slave nodes )
vi $HADOOP_HOME/libexec/hadoop-config.sh
export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.31-2.b13.ael7b.ppc64le"
vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.31-2.b13.ael7b.ppc64le"
8. Check the Hadoop installation ( on the master and all slave nodes )
[hadoop@sys-77402 bin]$ ./hadoop version
Hadoop 2.7.2
Subversion Unknown -r Unknown
Compiled by root on 2016-07-15T07:02Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /home/hadoop/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar
[hadoop@sys-77402 bin]$
9. Create a Hadoop temp folder ( on the master and all slave nodes )
mkdir -p $HADOOP_HOME/tmp
10. Add Hadoop slaves ( master only )
Add entries for all data nodes to the Hadoop configuration on the master node
vi $HADOOP_HOME/etc/hadoop/slaves
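The slaves file is simply one data-node hostname per line, for example ( hypothetical names ) :
bigdatahdfs2.ibm.com
bigdatahdfs3.ibm.com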
11. Hadoop Configuration ( on the master as well as the slave nodes )
Add the following entry to core-site.xml ( $HADOOP_HOME/etc/hadoop/core-site.xml ); note that fs.default.name is the deprecated alias of fs.defaultFS and still works on Hadoop 2.x
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://bigdatahdfs1.ibm.com:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-2.7.2/tmp</value>
</property>
</configuration>
Add the following entry to hdfs-site.xml ( $HADOOP_HOME/etc/hadoop/hdfs-site.xml )
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file://${hadoop.tmp.dir}/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop-data</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
Add the following entry to mapred-site.xml ( $HADOOP_HOME/etc/hadoop/mapred-site.xml ); if the file does not exist yet, create it from the template as shown after the snippet
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
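Hadoop 2.7.2 ships only a template for this file; create mapred-site.xml from it before editing :
cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml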
Add the following entry to yarn-site.xml ( $HADOOP_HOME/etc/hadoop/yarn-site.xml )
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>bigdatahdfs1.ibm.com:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>bigdatahdfs1.ibm.com:8030</value>
</property>
<!-- note : 8088 is also the default port of the ResourceManager web UI; if the
     web UI is left on its default, use a different RPC port here ( the stock
     default for yarn.resourcemanager.address is 8032 ) -->
<property>
<name>yarn.resourcemanager.address</name>
<value>bigdatahdfs1.ibm.com:8088</value>
</property>
</configuration>
12. Format the NameNode
cd $HADOOP_HOME/bin
./hadoop namenode -format
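The hadoop namenode form is deprecated in Hadoop 2.x and prints a warning; the current equivalent is :
./hdfs namenode -format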
13. Start all the hadoop processes
su - hadoop
cd $HADOOP_HOME/sbin
./start-dfs.sh ==> to start NameNode and DataNode
./start-yarn.sh ==> to start YARN
./mr-jobhistory-daemon.sh start historyserver ==> to start the job history server
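A quick way to confirm that everything came up is jps; on a single-node setup it should list roughly the following daemons ( example output, PIDs omitted ) :
jps
# NameNode, DataNode, SecondaryNameNode,
# ResourceManager, NodeManager, JobHistoryServer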
14. Stop all processes
a) Stop history server [ User : hadoop on existing node ]
./mr-jobhistory-daemon.sh stop historyserver
b) Stop YARN [ User : hadoop on existing node ]
./stop-yarn.sh
c) Stop HDFS daemons [ User : hadoop on existing node ]
./stop-dfs.sh
Hue
-> Hue is a lightweight Web server that lets you use Hadoop directly from your browser.
-> Hue is just a ‘view on top of any Hadoop distribution’ and can be installed on any machine.
a) Install pre-requisites
$ yum install java-1.8.0-openjdk*
$ yum install ant asciidoc
$ yum install cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libtidy libxml2-devel libxslt-devel make maven mysql mysql-devel openldap-devel
$ yum install python-devel sqlite-devel openssl-devel gmp-devel
b) Build Hue
$ git clone https://github.com/cloudera/hue.git
$ cd hue
$ make apps
$ build/env/bin/hue runserver
c) After the build and deploy above, Hue should be up and running at
http://localhost:8000
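By default runserver binds to localhost only; to reach the Hue UI from another machine, pass an explicit bind address ( standard Django runserver syntax ) :
$ build/env/bin/hue runserver 0.0.0.0:8000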
Hive
-> a data warehouse infrastructure tool
-> not a relational database
-> not designed for OLTP
-> not for real-time queries
-> stores its schema in a database and processes data in HDFS
-> designed for OLAP
-> provides an SQL-like query language called HiveQL ( HQL )
a) Pre-Requisite :
Java 1.8
Hadoop 2+
b) Get the latest Hive Source
wget https://archive.apache.org/dist/hive/stable/apache-hive-1.2.1-src.tar.gz
tar -xzvf apache-hive-1.2.1-src.tar.gz
c) Build hive
cd apache-hive-1.2.1-src
mvn clean install -Phadoop-2,dist -Dmaven.test.skip=true -e -X
d) Hive packages are built at the location below
/tempdisk/software/hive/apache-hive-1.2.1-src/packaging/target/apache-hive-1.2.1-bin.tar.gz
e) Test Hive Setup
1. Setup HIVE Environment Variables :
#HIVE Environment Variables
export HIVE_HOME=/home/hadoop/apache-hive-1.2.1-bin
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib/*:.
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/*:.
2. Setup DERBY Environment Variables :
#DERBY ENVIRONMENT VARIABLES
export DERBY_HOME=/home/hadoop/db-derby-10.12.1.1-src
export DERBY_INSTALL=$DERBY_HOME
export CLASSPATH=$CLASSPATH:$DERBY_INSTALL/lib/derby.jar:$DERBY_INSTALL/lib/derbytools.jar
3. Hive runs on top of Hadoop so we must have Hadoop on the path
#HADOOP_HOME Environment Variable
export HADOOP_HOME=/home/hadoop/hadoop-2.7.2
4. Create Hive Directories
$HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
5. Run Hive CLI
$HIVE_HOME/bin/hive
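A quick smoke test using hive -e ( the table name is hypothetical ) :
$HIVE_HOME/bin/hive -e "CREATE TABLE test_t (id INT, name STRING); SHOW TABLES; DROP TABLE test_t;"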
R
a) Get the latest Source of R
wget https://cran.r-project.org/src/base/R-3/R-3.2.5.tar.gz
tar zxvf R-3.2.5.tar.gz
cd /tempdisk/software/R/R-3.2.5
b) Install Dependency
yum install readline-devel
c) Build
./configure --with-x=no
make
make install
d) Setup Environment Variable
export R_HOME=/tempdisk/software/R/R-3.2.5
e) Validate the build and install of R Packages
echo "sessionInfo()" | R --save
Spark
a) Checkout Spark Code
git clone https://github.com/apache/spark
cd spark
git checkout tags/v2.0.0
b) Export the SPARK environment variable ( optional; applicable only for functional tests against the built Spark )
export SPARK_HOME=$HOME/spark
c) Invoke the Maven build process ( this takes quite some time, 5-6 hours in some cases )
mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests package
d) Create a distributable package of Spark ( SparkR is temporarily not included in the package ):
./dev/make-distribution.sh --name spark-2.0.0-hadoop2.7-ppc64le --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pyarn
e) Built package location :
/tempdisk/software/spark/spark/spark-2.0.0-bin-spark-2.0.0-hadoop2.7-ppc64le.tgz
f) Setup the Environment Variable for Spark
#SPARK ENVIRONMENT VARIABLES
export SPARK_HOME=/home/hadoop/spark-2.0.0-bin-spark-2.0.0-hadoop2.7-ppc64le
export R_HOME=/home/hadoop/R-3.2.5
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export CLASSPATH=$SPARK_HOME/jars/*:$CLASSPATH
export SPARK_CLASSPATH=$HADOOP_CLASSPATH:$SPARK_HOME/jars/*:$CLASSPATH
export SPARK_LOG_DIR=/home/hadoop/logs/spark
export SPARK_WORKER_DIR=/tmp/spark
g) Start Spark Master in Standalone Mode
cd $SPARK_HOME/sbin
./start-master.sh
h) Start Spark Worker in Standalone Mode
cd $SPARK_HOME/sbin
./start-slave.sh <master_url>
( eg : ./start-slave.sh spark://sys-77402.dal-ebis.ihost.com:7077 )
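A quick functional check against the standalone master ( reusing the example master URL from above ) :
$SPARK_HOME/bin/spark-shell --master spark://sys-77402.dal-ebis.ihost.com:7077    # interactive shell attached to the master
$SPARK_HOME/bin/run-example --master spark://sys-77402.dal-ebis.ihost.com:7077 SparkPi 10    # should print "Pi is roughly 3.14..."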
Scala
-> We do not need to build Scala packages.
-> Pre-requisites for the Scala install :
-> install wget and OpenJDK 1.8
-> Install Scala
-> wget http://downloads.typesafe.com/scala/2.11.8/scala-2.11.8.rpm
-> sudo rpm -ivh scala-2.11.8.rpm
-> Test Scala
-> scala -version
[root@sys-77402 scala-2.11.8]# scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
[root@sys-77402 scala-2.11.8]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
[root@sys-77402 scala-2.11.8]#