Installing a Hadoop 3.1.4 single-node cluster on Ubuntu 20.04
Step 1: Install OpenJDK 8
$ sudo apt install openjdk-8-jdk openjdk-8-jre
Step 2: Add the JDK to the PATH variable
$ sudo vim ~/.bashrc
#add the lines below at the end of the file, then reload it
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
$ source ~/.bashrc
#verify the variables
$ echo $JAVA_HOME
$ echo $PATH
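If you are unsure which path to put in JAVA_HOME, it can be derived from the resolved location of the java binary. A minimal sketch (the sample path below is illustrative; on a real system feed in `readlink -f "$(command -v java)"`):

```shell
# Sample resolved path of the java binary (illustration only; on the
# real machine obtain it with: readlink -f "$(command -v java)")
java_bin="/usr/lib/jvm/java-8-openjdk-amd64/bin/java"

# Strip the trailing /bin/java to get the JDK root for JAVA_HOME
java_home="${java_bin%/bin/java}"
echo "$java_home"
```

For OpenJDK 8 on Ubuntu 20.04 (amd64) this yields `/usr/lib/jvm/java-8-openjdk-amd64`, the path used throughout this guide.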
Step 3: Create a new user for Hadoop, and generate an SSH key for passwordless login as the hadoop user
$ sudo adduser hadoop
$ sudo usermod -aG sudo hadoop
$ sudo su - hadoop
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ exit
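Passwordless SSH silently fails if the `.ssh` directory or `authorized_keys` file is group- or world-writable, because sshd refuses to use them. A quick sketch of the expected permissions (done in a temp directory here so it runs without touching your real home):

```shell
# Illustrative only: a temp dir stands in for the hadoop user's home
d="$(mktemp -d)"

# sshd expects 700 on ~/.ssh and 600 on authorized_keys
mkdir -p "$d/.ssh"
chmod 700 "$d/.ssh"
touch "$d/.ssh/authorized_keys"
chmod 600 "$d/.ssh/authorized_keys"

# Print the octal modes to confirm
stat -c '%a' "$d/.ssh"
stat -c '%a' "$d/.ssh/authorized_keys"
```

After fixing permissions on the real `~/.ssh`, `ssh localhost` should log in without a password prompt.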
Step 4: Download the Hadoop 3.1.4 tarball from https://hadoop.apache.org/releases.html
$ tar -xvzf hadoop-3.1.4.tar.gz
$ sudo mv hadoop-3.1.4 /usr/local/hadoop
#create the following folders inside /usr/local/hadoop:
"htemp", "hdfs/datanode", "hdfs/namenode"
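The folder layout above can be created in one `mkdir -p` call. The sketch below uses a temp directory as the base so it runs without sudo; on the real system substitute `/usr/local/hadoop` for `$base`:

```shell
# Temp dir stands in for /usr/local/hadoop so this runs unprivileged
base="$(mktemp -d)"

# Create the tmp dir plus the NameNode and DataNode storage dirs
mkdir -p "$base/htemp" "$base/hdfs/namenode" "$base/hdfs/datanode"

# List the created directories relative to the base
find "$base" -mindepth 1 -type d | sed "s|^$base/||" | sort
```

These paths must match the values configured later in `core-site.xml` (`hadoop.tmp.dir`) and `hdfs-site.xml` (the NameNode and DataNode directories).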
Step 5: Configure path variable for hadoop user
$ sudo vim /etc/profile.d/hadoop_java.sh
#add the following lines, then reload the file
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
#save and exit
$ source /etc/profile.d/hadoop_java.sh
#check hadoop version
$ hadoop version
$ hdfs version
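If `hadoop version` reports "command not found", the usual culprit is that `$HADOOP_HOME/bin` never made it onto PATH. A small self-contained check (the sample PATH string is illustrative; on the real system test `$PATH` itself):

```shell
HADOOP_HOME=/usr/local/hadoop

# Sample PATH string for illustration; replace with "$PATH" on a live system
sample_path="/usr/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"

# Wrap in colons so the match works at either end of the string
case ":$sample_path:" in
  *":$HADOOP_HOME/bin:"*) result="hadoop bin on PATH" ;;
  *)                      result="hadoop bin missing from PATH" ;;
esac
echo "$result"
```

If the check fails on your machine, re-run `source /etc/profile.d/hadoop_java.sh` in the current shell.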
Step 6: Configure Hadoop
Change ownership of the Hadoop folder to the hadoop user
$ sudo chown -R hadoop:hadoop /usr/local/hadoop
Update JAVA_HOME in hadoop-env.sh (/usr/local/hadoop/etc/hadoop)
$ vim hadoop-env.sh
#set the line below (around line 54), then save and exit
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Step 7: Update core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
<description>The default file system URI</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/htemp</value>
</property>
</configuration>
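To confirm the edit took, you can grep the filesystem URI back out of the file. A sketch with the fragment inlined (point the `sed` at `/usr/local/hadoop/etc/hadoop/core-site.xml` on the real system):

```shell
# Inlined sample of the relevant core-site.xml lines
xml='<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>'

# Extract the text between <value> and </value>
uri="$(printf '%s\n' "$xml" | sed -n 's|.*<value>\(.*\)</value>.*|\1|p')"
echo "$uri"
```

The printed URI is what clients (and the `hdfs` CLI) will use to reach the NameNode.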
Step 8: Update hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/hdfs/datanode</value>
</property>
</configuration>
Step 9: Update mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
Step 10: Update yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
Step 11: Format the NameNode with the command below
$ hdfs namenode -format
#setup is complete; start the daemons and verify the running processes (scripts live in /usr/local/hadoop/sbin/)
$ start-all.sh
$ jps
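On a healthy single-node cluster, `jps` should list five Hadoop daemons besides `Jps` itself. A sketch of that check using a sample `jps` output (the PIDs are made up; on a live cluster substitute the real `jps` output):

```shell
# Sample jps output for illustration; PIDs are fabricated
sample_jps='12001 NameNode
12102 DataNode
12243 SecondaryNameNode
12398 ResourceManager
12500 NodeManager
12600 Jps'

# Verify each expected daemon appears as a whole word
missing=0
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  printf '%s\n' "$sample_jps" | grep -qw "$d" || { echo "$d missing"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all 5 daemons running"
```

If any daemon is missing on your machine, check its log under `/usr/local/hadoop/logs/` for the startup error.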
For more info, please connect and follow me:
LinkedIn: https://www.linkedin.com/in/rupesh-singh
Email: rupeshdeoria@gmail.com