Installation of a Hadoop 3.1.4 Single-Node Cluster on Ubuntu 20.04

Rupesh Kumar Singh
2 min read · Sep 15, 2020

Step 1: Install OpenJDK 8

$ sudo apt install openjdk-8-jdk openjdk-8-jre

$ java -version
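If the installation succeeded, the output should look roughly like this (the exact build string will differ on your machine):

openjdk version "1.8.0_265"
OpenJDK Runtime Environment (build 1.8.0_265-8u265-b01-0ubuntu2~20.04-b01)
OpenJDK 64-Bit Server VM (build 25.265-b01, mixed mode)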

Step 2: Add the JDK path to the PATH variable

$ vim ~/.bashrc

# add the lines below at the end of the file, then reload it

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export PATH=$PATH:$JAVA_HOME/bin

$ source ~/.bashrc

# verify the variables

$ echo $JAVA_HOME

$ echo $PATH

Step 3: Create a dedicated user for Hadoop and generate an SSH key for passwordless login

$ sudo adduser hadoop

$ sudo usermod -aG sudo hadoop

$ sudo su - hadoop

$ ssh-keygen -t rsa

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
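The official single-node guide also tightens the permissions on authorized_keys; after that, confirm that passwordless login works:

$ chmod 0600 ~/.ssh/authorized_keys

$ ssh localhost

# type exit to leave the test ssh session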

$ exit

Step 4: Download the Hadoop 3.1.4 release from https://hadoop.apache.org/releases.html and extract it
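If you prefer the command line, the tarball can be fetched directly; the URL below assumes the standard Apache archive layout for the 3.1.4 release:

$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.4/hadoop-3.1.4.tar.gz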

$ tar -xvzf hadoop-3.1.4.tar.gz

$ sudo mv hadoop-3.1.4 /usr/local/hadoop

# create the following folders inside /usr/local/hadoop

$ sudo mkdir -p /usr/local/hadoop/htemp /usr/local/hadoop/hdfs/namenode /usr/local/hadoop/hdfs/datanode

Step 5: Configure environment variables for the hadoop user

$ sudo vim /etc/profile.d/hadoop_java.sh

# add the following lines, then save and reload the file

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export HADOOP_HOME=/usr/local/hadoop

export HADOOP_HDFS_HOME=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"

#save and exit

$ source /etc/profile.d/hadoop_java.sh

#check hadoop version

$ hadoop version

$ hdfs version
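Both commands should report the release you unpacked; the first line of the output should read:

Hadoop 3.1.4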

Step 6: Configure Hadoop

Change ownership of the Hadoop folder to the hadoop user:

$ sudo chown -R hadoop:hadoop /usr/local/hadoop

Update JAVA_HOME in hadoop-env.sh (located in /usr/local/hadoop/etc/hadoop):

$ vim hadoop-env.sh

# add the line below (around line 54, where JAVA_HOME is commented out), then save and exit

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Step 7: Update core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    <description>The default file system URI</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/htemp</value>
  </property>
</configuration>
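With fs.defaultFS pointing at hdfs://localhost:9000, relative HDFS paths resolve against that URI, so once the cluster is up (Step 11) these two commands are equivalent:

$ hdfs dfs -ls /

$ hdfs dfs -ls hdfs://localhost:9000/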

Step 8: Update hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/hdfs/datanode</value>
  </property>
</configuration>
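You can confirm Hadoop picks these values up; getconf reads the configuration files, so the daemons do not need to be running yet:

$ hdfs getconf -confKey dfs.replication

1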

Step 9: Update mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>

Step 10: Update yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>

Step 11: Format the NameNode using the command below

$ hdfs namenode -format
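If formatting succeeds, the log should end with a line similar to the following (the exact wording varies between versions):

INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/namenode has been successfully formatted.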

# setup is done; now start the daemons (the scripts live in /usr/local/hadoop/sbin/) and verify they are running

$ start-all.sh

$ jps
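On a healthy single-node cluster, jps should list all five Hadoop daemons (the process IDs will differ):

12081 NameNode
12234 DataNode
12456 SecondaryNameNode
12678 ResourceManager
12890 NodeManager
13012 Jps

The NameNode web UI is at http://localhost:9870 and the YARN ResourceManager UI at http://localhost:8088. As a final smoke test, the bundled example job exercises HDFS, MapReduce, and YARN together (the jar path assumes the standard 3.1.4 layout under $HADOOP_HOME):

$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.4.jar pi 2 5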

For more info, please connect and follow me:

LinkedIn: https://www.linkedin.com/in/rupesh-singh

Email: rupeshdeoria@gmail.com

References:

  1. https://hadoop.apache.org/docs/r3.1.4/hadoop-project-dist/hadoop-common/SingleCluster.html
