Elasticity in the Storage of a Distributed Hadoop Cluster

Anshika Sharma
6 min read · Oct 30, 2020


Technologies Integration

🤔🤔 Have you ever thought about how we can automate the storage size as and when required?? Yes!! It is possible 🤩. We have already discussed how to use a storage block through the concept of partitioning.

Today we will implement a powerful concept of elasticity in storage, i.e., Logical Volume Management (LVM). Let's discuss it in depth, and then we will move to the implementation.

What is the Problem Statement ?

Suppose we have a requirement to extend the storage capacity of our datanode without losing the previously stored data. It seems normal, but it is not!! Because when we mount a new block of storage, it removes the previous format or file system of that directory.

Hint : We will be using the concept of LVM.

Logical Volume Management is a Linux concept. In Linux, the Logical Volume Manager (LVM) is a device-mapper framework that provides logical volume management for the Linux kernel. Most modern Linux distributions are LVM-aware to the point of being able to have their root file systems on a logical volume.

Complete architecture of how LVM works

Let's move to the implementation. Right now I have already set up a Hadoop cluster with one Master and one Slave node. Here I am going to show how we can dynamically increase the storage size of this Slave node; further, you can implement it on as many nodes as you want.

Step 1 : Check the running and configured Hadoop Cluster

Running MasterNode
Running SlaveNode
Configured Hadoop Cluster

Since we do not require much storage in our MasterNode, we will move to the Slave node. This time I have attached two external EBS volumes, one of 20 GiB and the other of 30 GiB.

Step 2 : Check the available volumes.

To check the available hard disks in our data node, we use the “fdisk -l” command:

Available Hard disk
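As a reference, here is a minimal sketch of this check, assuming the two new EBS volumes show up as /dev/xvdf (20 GiB) and /dev/xvdg (30 GiB); the device names on your instance may differ:

fdisk -l                      # list all attached disks and their sizes
lsblk /dev/xvdf /dev/xvdg     # confirm the two new, still-unused EBS volumes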

Now my plan is to use these highlighted EBS volumes, but not separately; I want to use them in a combined form, i.e., with a combined capacity of 50 GiB, and here we have exactly that: 20 GiB + 30 GiB.

These separate volumes are also known as Physical Volumes (PV). When we combine them and create a single block (logically), that logical block is known as a Volume Group (VG).

From this Volume Group we can carve out a block of any size, and that block will be treated as a new device. Such a block is called a Logical Volume (because it is not a real physical device).
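To keep the terminology straight, here is a rough mapping from concept to command; the detailed steps follow below, and the names myvg and mylv are placeholders I made up for illustration:

pvcreate /dev/xvdf                    # raw disk → Physical Volume (PV)
vgcreate myvg /dev/xvdf /dev/xvdg     # pool of PVs → Volume Group (VG)
lvcreate --size 35G --name mylv myvg  # slice of the VG → Logical Volume (LV)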

Step 3: Create a Volume Group

To create a volume group, we require physical volumes. To convert these block devices into physical volumes, we use the command “pvcreate device_path”.

Before using this command, we need to install the tool/software that implements this concept for us, i.e., lvm2.

Command : “yum install lvm2 -y”

Successfully installed lvm2

Now, create two physical volumes, one of 20 GiB and the other of 30 GiB.

Successfully created both the physical volumes
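Roughly, the two commands look like this (again assuming the attached volumes are /dev/xvdf and /dev/xvdg):

pvcreate /dev/xvdf    # turn the 20 GiB EBS volume into a physical volume
pvcreate /dev/xvdg    # turn the 30 GiB EBS volume into a physical volume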

Check whether they have been created or not using the command “pvdisplay”.

Yeah!! we have both of them

So we have two physical volumes; let's bind them into a single logical block. To create one, we use the command “vgcreate vg_name pv_name1 pv_name2”.

Successfully created a Volume Group
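With the hypothetical names used above, the command would look like:

vgcreate myvg /dev/xvdf /dev/xvdg    # combine both PVs into one ~50 GiB volume group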

Let's look at its status, using the command “vgdisplay vg_name”.

Detailed Info about the volume group

An interesting fact about a volume group is that we can add as many physical volumes to it as we want.

So we are all set with the volume group. Now we are ready to create logical partitions and use them. 🤩🤩 We can create as many partitions as we want.

Step 4: Create a Logical volume of some specific size (as per the requirement).

To create a logical volume, we use the command “lvcreate --size <size> --name lv_name vg_name”.

Successfully created a logical volume
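As a sketch, with the placeholder names from above and the 35 GiB size used later in this post:

lvcreate --size 35G --name mylv myvg    # creates the device /dev/myvg/mylv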

To get the information about this logical volume, we use the command “lvdisplay vg_name/lv_name”.

Logical volume complete info

Note : We can have multiple Volume groups in a system.

Step 5: Use the created Logical Volume by mounting it on the Slave Node's storage directory.

In my case, the storage folder is /dn.

Storage directory of Slave Node

To mount this logical block on the storage directory, we use the command “mount block_name directory_name”, but before mounting we need to format it first!! To format, I am using the mkfs.ext4 file system.

Formatting the block storage

Now our volume is ready to be mounted!!

Mounting the device
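Putting the two steps together, with the hypothetical LV name from above and my storage directory /dn, a sketch of the sequence is:

mkfs.ext4 /dev/myvg/mylv    # format the logical volume with an ext4 file system
mount /dev/myvg/mylv /dn    # mount it on the datanode's storage directory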

To check whether it has been mounted or not, we use the “lsblk” command:

our logical volume block

Quickly, let's check whether the capacity has been updated in the cluster or not.

Updated capacity
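One way to confirm this from the command line is the DFS admin report (on newer Hadoop versions the command is hdfs dfsadmin -report):

hadoop dfsadmin -report    # shows the configured capacity contributed by each datanode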

Now store some data into the cluster.

To put the data, I have used “hadoop fs -put filename /”.

data in the cluster
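For example (the file name here is just a placeholder):

hadoop fs -put myfile.txt /    # upload a local file into the HDFS root
hadoop fs -ls /                # verify that the file is stored in the cluster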

Suppose there is a situation where we have completely used the capacity of this storage block and we want some more storage to be added to it without losing the previously stored data. Here comes the benefit of extending the storage size.

Step 6: Extend the storage size

To extend the storage, we use the command “lvextend --size +<size> path_of_lv”.

Extending the logical volume
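With the hypothetical names used above, extending by 5 GiB looks like:

lvextend --size +5G /dev/myvg/mylv    # grow the logical volume from 35 GiB to 40 GiB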

Let's check its status.

Successfully Extended

The details of the logical volume now show that its size has increased from 35 GB to 40 GB. But right now we have a 35 GB formatted block and a 5 GB unformatted block. Obviously, we cannot use the unformatted block to store any data.

only allocated the formatted part

The actual structure of the logical block will be :

Actual structure of logical volume

Now this is the situation where we cannot use mkfs to format that part, because if we do, the complete device will be reformatted and we will lose the data that we previously stored. So here we use the “resize2fs path_of_lv” command.

Successfully resized
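With the same hypothetical names, the resize is a single command and can be run while the volume stays mounted:

resize2fs /dev/myvg/mylv    # grow the ext4 file system to fill the extended LV, keeping existing data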

Now, if we check the allocation status, we will get the updated block size.

Successfully allocated the complete block

Let's check the status of the Hadoop cluster.

Yeah!! it is updated

Wait a minute 🤔!! We haven't checked whether our previous data is still there or not??

Data in hadoop cluster

🤩🤩 Yeah!!! We still have all the data. You can match the timestamp of the data that was uploaded earlier with its availability now.

I hope you like this concept, and I believe it will be helpful to you in solving great industry use cases. 😇

💫Keep Sharing, Keep Learning💫

Thank You !!!

Written by Anshika Sharma

I am a tech enthusiast and researcher, and I work on integrations. I love to explore and learn about new technologies and their core concepts.
