How Partitions Benefit the Hadoop World

Anshika Sharma
5 min read · Oct 24, 2020


Hadoop Cluster

❗❗ Storage capacity is a major concern when we move to the world of Big Data. We create a Hadoop cluster in order to store our huge data in a distributed way !! But have you ever wondered what if we could control this storage capacity ??🤔

Yeah, it's an amazing way of utilizing the storage !! 🤩🤩

Hola Connections 🤗🤗

In this article, I am going to create a setup where we share only limited storage with our Hadoop cluster, i.e., through the DataNode.

For this setup, I am going to use a Hadoop cluster having one Master node, one Slave node and one Client, although for this demonstration we require only the Master and Slave nodes. So let's start the setup. Initially, I have the Master, Slave and Client nodes already created.

Master, Slave and client node

For setting up a complete working cluster, you may refer to my separate article:

Since we never use the storage of the Master node to store the data, we don't require any extra space for it !!

Let's move toward the Slave node. To add the concept of partitioning to it, we follow these steps:

Step 1: Create an EBS Volume

Creating an EBS Volume

Here, I am going to create an EBS volume of 20 GiB in size.

Creating a volume with tag externalVolume-data

I have added a tag here, “externalVolume-data”; I gave it this name just to identify it !!

successfully created volume
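
By the way, if you prefer the AWS CLI over the web console, a roughly equivalent command would be (the availability zone and volume type here are just example values):

    # create a 20 GiB volume tagged "externalVolume-data"
    aws ec2 create-volume \
        --size 20 \
        --volume-type gp2 \
        --availability-zone ap-south-1a \
        --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=externalVolume-data}]'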

After successful creation, we will attach this volume to the slave node as per the requirement.

Attaching Volume to the instance
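
The attachment can also be sketched with the AWS CLI; the volume ID and instance ID below are hypothetical placeholders, while /dev/xvdf is the device name used in this setup:

    # attach the new volume to the slave node as /dev/xvdf
    aws ec2 attach-volume \
        --volume-id vol-0123456789abcdef0 \
        --instance-id i-0123456789abcdef0 \
        --device /dev/xvdf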

After attachment, it will show “in use”:

Attached Volume

Let's check whether it is there or not ?? Log in to the slave node using the PuTTY SSH client.

the extra volume is attached
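
A quick way to check from the terminal is the lsblk command, which lists all attached block devices:

    # the new 20 GiB disk should appear as xvdf, with no partitions yet
    lsblk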

Suppose we want to share only 5 GiB of space out of these 20 GiB. So let's create a partition first.

Step 2: Create a partition of 5 GiB

To go inside the volume, we use the “fdisk volume_name” command. In my case, the volume is /dev/xvdf.

Inside the volume
creating a partition of 5 GiB

Here we have chosen the option to create a primary partition. After successfully creating the partition, we enter the option w to write all the changes to the drive.
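
For reference, the interactive session inside fdisk goes roughly like this (the exact prompts vary a little between fdisk versions):

    fdisk /dev/xvdf
    # answers given at the fdisk prompts, in order:
    #   n        -> create a new partition
    #   p        -> make it a primary partition
    #   1        -> partition number
    #   <Enter>  -> accept the default first sector
    #   +5G      -> last sector, giving a 5 GiB partition
    #   w        -> write the changes to the disk and exit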

To check whether it has been created or not, we use the command “fdisk -l volume_name”.

partition successfully created

So the partition is successfully created !! But to use it, we need to do a few more things:

  1. Set the driver for this newly partitioned drive.
  2. Format the partition with a specific type of file system (in my case, I am going to use the ext4 format type).
  3. Mount it to the directory that we are planning to share with the cluster.

For setting up the driver for this new partition, we use the command “udevadm settle”:

Setting the driver
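
In effect, udevadm settle just waits for udev to finish processing the new partition table and create the device file for the partition:

    # wait for udev to finish creating /dev/xvdf1
    udevadm settle
    # the device file for the new partition should now exist
    ls /dev/xvdf1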

For formatting this partition with the ext4 file system type, we use the command “mkfs.ext4 partition_name” (in my case, it is /dev/xvdf1):

Formatting the partition

❗❗ In my case, it is showing this because I was previously using this partition with some folder. But when you freshly format the partition, it will work for you.
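
On a fresh volume, the formatting step is simply:

    # format the new 5 GiB partition with an ext4 file system
    # (this erases anything already stored on /dev/xvdf1)
    mkfs.ext4 /dev/xvdf1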

Now, let's mount this partition to the directory that we want to share with the cluster. In my case, the directory is “/dn1”:

The slave node directory
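
For context, /dn1 is the directory that the DataNode is configured to store its blocks in. A minimal sketch of that configuration, assuming a Hadoop 1.x-style hdfs-site.xml at a hypothetical path (Hadoop 2+ uses the property dfs.datanode.data.dir instead):

    # the config file path below is hypothetical; yours may differ
    cat /etc/hadoop/hdfs-site.xml
    <configuration>
      <property>
        <name>dfs.data.dir</name>
        <value>/dn1</value>
      </property>
    </configuration>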

Till now, our partition has not been mounted anywhere, i.e.,

/dev/xvdf1 has no mount point yet

To mount it, we use the command “mount partition_name directory_name”. In my case, partition_name is /dev/xvdf1 and directory_name is /dn1:

mounted the partition to the directory
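
Putting the mount command and a quick size check together:

    # mount the 5 GiB partition on the DataNode directory
    mount /dev/xvdf1 /dn1
    # confirm the mount point and its size
    df -h /dn1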

Let's check the list of block devices and their structure:

Successfully mounted

We have successfully created, formatted and mounted our partitioned drive. Now let's start the Slave node.

Step 3: Start the Slave Node

To start the Slave node, we use the command “hadoop-daemon.sh start datanode”:

Started the datanode
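
For reference, the command along with a quick sanity check (jps lists the running Java processes, and a DataNode entry should now appear):

    # start the DataNode daemon on the slave node
    hadoop-daemon.sh start datanode
    # a DataNode process should now show up in the Java process list
    jps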

Hence, we have successfully started our Slave node. So let's check the status of live nodes in the cluster from the Master node dashboard:

Master Node Dashboard

We can see that 1 live node is there; this is the one that we started above. Let's see how much space it is contributing to the cluster.

5 GiB contribution of this Slave node
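
The same figure can also be verified from the terminal on the Master node; on Hadoop 1.x the command is the one below, while newer versions use “hdfs dfsadmin -report”:

    # shows the configured capacity contributed by each live DataNode
    hadoop dfsadmin -report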

Hence, we have successfully achieved our target 🎯 of providing limited space to the cluster using the concept of partitioning.

I hope this setup helps you create a cluster with storage as per demand. Any suggestions and queries are most welcome. 🤗

Thank You !! 😇😇
