How Partitioning Benefits the Hadoop World
❗❗ Storage capacity is a major concern as we move into the world of Big Data. We create a Hadoop cluster to store our huge data in a distributed way !! But have you ever wondered what if we could control this storage capacity ??🤔
Yeah, it's an amazing way of utilizing the storage !! 🤩🤩
Hola Connection 🤗🤗
In this article, I am going to create a setup where we share only limited storage with our Hadoop Cluster, i.e., through the Data Node.
For this setup, I am going to use a Hadoop Cluster having one Master Node, one Slave Node and one Client, although for this demonstration we require only the Master and Slave Nodes. So let's start the setup. Initially, I already have one Master Node, one Slave Node and one Client Node created.
For setting up a complete working cluster, you may refer to my separate article :
Since we never use the Master Node's storage to store data, we don't require any extra space for it !!
Let's move toward the Slave Node. To apply the concept of partitioning to it, we follow a few steps :
Step 1 : Create an EBS Volume
Here, I am going to create an EBS volume of 20 GiB in size.
I have added a tag here, “externalVolume-data”; I gave it this name just to identify it !!
After successful creation, we will attach this volume to the Slave Node as per our requirement.
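If you prefer the AWS CLI over the console, a rough equivalent of this create-and-attach flow would look like the sketch below. The availability zone, volume ID and instance ID are just placeholders; use the values from your own account.

```bash
# Create a 20 GiB EBS volume and tag it "externalVolume-data"
# (availability zone below is a placeholder)
aws ec2 create-volume \
    --availability-zone ap-south-1a \
    --size 20 \
    --volume-type gp2 \
    --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=externalVolume-data}]'

# Attach the volume to the Slave Node instance as /dev/xvdf
# (volume ID and instance ID below are placeholders)
aws ec2 attach-volume \
    --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 \
    --device /dev/xvdf
```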
After attachment, it will show “in use” :
Let's check whether it is there or not ?? Log in to the Slave Node using the PuTTY SSH program.
Suppose we want to share only 5 GiB of space out of these 20 GiB. So let's create a partition first.
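Once logged in, a quick way to confirm the volume is visible (assuming it got attached as /dev/xvdf, as in my case) is:

```bash
# List the block devices visible to the Slave Node;
# the newly attached 20 GiB volume should show up as xvdf with no partitions yet
lsblk
```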
Step 2 : Create a 5 GiB partition
To go inside the volume, we use the “fdisk volume_name” command. In my case, it is /dev/xvdf.
Here we choose the option to create a primary partition of 5 GiB. After the partition has been successfully created, we enter the option w to write all the changes to the drive.
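The interactive session looks roughly like this (a sketch of the prompts, assuming the disk is /dev/xvdf):

```bash
# Open the volume in fdisk (run as root or with sudo)
sudo fdisk /dev/xvdf

# Inside the fdisk prompt, the sequence of answers is roughly:
#   n        -> create a new partition
#   p        -> make it a primary partition
#   1        -> partition number (accept the default)
#   <Enter>  -> accept the default first sector
#   +5G      -> size of the partition: 5 GiB
#   w        -> write the changes to the disk and exit
```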
To check whether it has been created, we use the command “fdisk -l volume_name” :
So the partition is successfully created !! But to use it we need to do a few more things :
- Set up the device driver for this newly partitioned drive.
- Format the partition with a specific type of file system (in my case, I am going to use the ext4 format type).
- Mount it on the directory that we are planning to share with the cluster.
For setting up the device driver, we use the command “udevadm settle” (it waits until the system has finished processing the new device events) :
For formatting this partition with the ext4 file system type, we use the command “mkfs.ext4 partition_name” (in my case, it is /dev/xvdf1) :
❗❗ In my case, it shows this message because I was previously using this partition with some data in it. But when you format a freshly created partition, it will work without any such message.
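Putting these two steps together, a minimal shell sketch (assuming the new partition came up as /dev/xvdf1) is:

```bash
# Wait until the system has finished processing the new partition's device events,
# so that /dev/xvdf1 is available before we format it
sudo udevadm settle

# Format the 5 GiB partition with an ext4 file system
sudo mkfs.ext4 /dev/xvdf1
# Note: if the partition was used before, mkfs may detect an existing
# file system signature and ask whether to proceed; answer y to overwrite it.
```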
Now, let's mount this partition on the directory that we want to share with the cluster. In my case, the directory is “/dn1” :
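As a side note, /dn1 is the directory my DataNode is configured to use in hdfs-site.xml (set up in the cluster article referenced earlier). If you want to double-check which directory your own DataNode serves, something like the sketch below works; the configuration path is an assumption, so adjust it to wherever your Hadoop configuration lives.

```bash
# Hypothetical check: print the DataNode data directory from hdfs-site.xml.
# /etc/hadoop is an assumed location; it may be $HADOOP_HOME/conf on your setup.
# The property is dfs.data.dir on Hadoop 1.x and dfs.datanode.data.dir on 2.x+.
grep -A 1 "data.dir" /etc/hadoop/hdfs-site.xml
```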
Till now, our partition is not mounted anywhere, i.e.,
To mount it, we use the command “mount partition_name directory_name”. In my case, partition_name is /dev/xvdf1 and directory_name is /dn1 :
Let's check the list of block devices and their structure :
We have successfully created, formatted and mounted our partitioned drive. Now let's move on to starting the Slave Node.
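A combined sketch of the mount and the verification commands (assuming /dev/xvdf1 and /dn1 as above):

```bash
# Mount the formatted 5 GiB partition on the DataNode directory
sudo mount /dev/xvdf1 /dn1

# Verify: lsblk should now show /dn1 as the mount point of xvdf1,
# and df -h should report roughly 5 GiB of space on /dn1
lsblk
df -h /dn1
```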
Step 3 : Starting the Slave Node
To start the Slave Node, we use the command “hadoop-daemon.sh start datanode” :
Hence, we have successfully started our Slave Node. So let's check the status of live nodes in the cluster from the Master Node dashboard :
We can see that 1 live node is there; this is the one that we started above. Let's see how much space it is contributing to the cluster.
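For reference, the commands on the Slave Node are as follows (jps is just an optional sanity check that the DataNode process is up):

```bash
# Start the DataNode daemon on the Slave Node
hadoop-daemon.sh start datanode

# Optionally confirm that the DataNode JVM is running
jps
```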
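Besides the web dashboard, the same information is available from the command line. The single live DataNode should show a configured capacity of roughly 5 GiB, i.e., the size of the partition mounted on /dn1.

```bash
# Ask the NameNode for a cluster storage report
hadoop dfsadmin -report
# On newer Hadoop versions the equivalent command is: hdfs dfsadmin -report
```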
Hence, we have successfully achieved our target 🎯 of providing limited space to the cluster using the concept of partitioning.
I hope this setup helps you create a cluster with exactly as much storage as is demanded. Any suggestions and queries are most welcome. 🤗
Thank You !! 😇😇