Storage of Huge Data: A Problem for Industries and Its Solution

Anshika Sharma
6 min read · Sep 16, 2020
Data Everywhere

Hello everyone,

In this blog, I am going to share my views on data: how it is generated, the problems caused by such huge amounts of it, and some industry use cases showing the approaches companies adopt.

Introduction:

We all know that most industries today are data-oriented. Data is being generated everywhere, through social media, e-commerce and more. The best example of huge data generation is Google Search. Data is very useful for many purposes like data analysis, prediction and much more, but is there any way we can manage this much data efficiently? Isn’t storing this huge data a problem?

So yes, the generation of huge data has become a problem, and this problem is named Big Data. Now, what is Big Data?

“Big Data refers to the problem of huge amounts of data that need to be stored and managed in such a way that the data makes sense and can be used for different purposes like analysis, prediction, etc. It is also known as the Tsunami of Big Data.”

Sub-Problems:

Big Data is not a single problem; it is made up of several sub-problems that together lead to this larger problem.

What are the Sub-Problems ?

Big Data consists of several sub-problems, each represented by a V, which together are known as the four V’s of Big Data:

  1. Volume
  2. Velocity
  3. Veracity
  4. Variety

Let’s discuss what each of them represents.

Volume: This means the scale of the data, or simply how much data there is, i.e., the size of the data we want to handle or process.

Velocity: This deals with the analysis of streaming data, i.e., how rapidly the data is being generated or changing.

Veracity: This means the uncertainty of data: whatever data we have needs to be trustworthy, authentic and valid.

Variety: This deals with the different forms of data: structured, unstructured and semi-structured. Structured data includes tables, relational databases, etc. Unstructured data includes media like pictures, posts, videos, etc. Semi-structured data includes JSON and XML files.
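To make the three forms concrete, here is a minimal Python sketch using only the standard library. The sample records are made up for illustration: CSV stands in for structured (tabular) data, JSON for semi-structured data, and a short free-text post for unstructured data.

```python
import csv
import io
import json

# Structured: every row follows the same fixed schema, like a relational table.
structured_text = "id,name,city\n1,Asha,Delhi\n2,Ravi,Pune\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: keys describe the data, but records may differ in shape.
semi_structured_text = '[{"id": 1, "tags": ["iot"]}, {"id": 2, "likes": 10}]'
records = json.loads(semi_structured_text)

# Unstructured: free text with no schema; any structure must be inferred.
unstructured_text = "Loved the new phone! Battery lasts all day."
words = unstructured_text.lower().split()

print(rows[1]["city"])                # Pune
print([sorted(r) for r in records])   # [['id', 'tags'], ['id', 'likes']]
print(len(words))                     # 8
```

Note how the two JSON records carry different keys, which a fixed table schema could not hold directly, while the text only yields a list of words until further analysis imposes some structure on it.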

4 V’s of Big Data

Let’s understand this problem through a real-world use case from IBM: how industries handle huge amounts of data, its storage and everything else.

Case Study (What Does IBM Say About Big Data Problems?)

This is also termed the Tsunami of Big Data. We are witnessing a tsunami of huge volumes of data of different types and formats, which makes managing, processing, storing, safeguarding, securing and transporting them a real challenge.

This is how data is growing from many different sources. The sources include:

  1. Internet traffic
  2. Social media
  3. IoT (Internet of Things), and so on.
Data from different sources

Applications such as video surveillance, smart meters, digital health monitors and a host of other Machine-to-Machine services are creating new network requirements and incremental traffic increases.

Research estimated that the number of devices connected to IP networks would be three times as high as the global population in 2020, with 3.4 networked devices per capita by 2020, up from 2.2 networked devices per capita in 2015. Accelerated in part by the growing number and capabilities of those devices, IP traffic per capita was projected to reach 25 GB by 2020, up from 10 GB per capita in 2015.
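As a quick sanity check on these forecast figures, the sketch below computes the growth multiple and the implied compound annual growth rate (CAGR) from the 2015 and 2020 numbers quoted above. The `cagr` helper is just an illustration, not part of the cited research.

```python
# Figures quoted in the paragraph above (2015 actuals vs 2020 forecast).
devices_2015, devices_2020 = 2.2, 3.4      # networked devices per capita
traffic_2015, traffic_2020 = 10.0, 25.0    # IP traffic per capita, in GB
years = 2020 - 2015

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by going from start to end."""
    return (end / start) ** (1 / years) - 1

device_growth = cagr(devices_2015, devices_2020, years)
traffic_growth = cagr(traffic_2015, traffic_2020, years)

print(f"Devices per capita: x{devices_2020 / devices_2015:.2f}, CAGR {device_growth:.1%}")
print(f"Traffic per capita: x{traffic_2020 / traffic_2015:.2f}, CAGR {traffic_growth:.1%}")
```

With these inputs, per-capita traffic grows 2.5 times (roughly 20% per year), noticeably faster than devices per capita (roughly 9% per year), which fits the point that devices are also becoming more capable.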

Global devices and connection devices

What Does Industry Say About This Data, and How Will It Be Useful?

Data is the new Oil

“Data is the new oil.” The phrase was coined in 2006 by Clive Humby, a British data commercialization entrepreneur. This now-famous phrase was embraced by the World Economic Forum in a 2011 report, which considered data to be an economic asset like oil. A related saying puts it this way: “Information is the oil of the 21st century, and analytics is the combustion engine.”

What About Real-Time Streaming Data?

Real-time processing of Big Data is commonly referred to as processing streaming data. When applied to Big Data, “real-time streaming” can mean any of the following:

  1. Sub-second response
  2. Human-comfortable response time
  3. Event-driven
  4. Streaming data

Data at Rest: the data has already arrived and is stored, e.g., “Oceans of Data” and the newer term “Data Lakes”.

Data in Motion: Streaming Data
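The difference between data at rest and data in motion can be sketched in a few lines of Python. This is a simplified illustration, not how any particular streaming engine works: the batch function assumes the whole dataset is already stored, while the streaming function consumes events one at a time and keeps only a small sliding window (the window size of 3 and the sensor readings are arbitrary, hypothetical choices).

```python
from collections import deque
from typing import Iterable, Iterator

def batch_average(stored: list[float]) -> float:
    """Data at rest: the whole dataset is already stored, so read it all at once."""
    return sum(stored) / len(stored)

def streaming_averages(events: Iterable[float], window: int = 3) -> Iterator[float]:
    """Data in motion: update a sliding-window average as each event arrives."""
    recent = deque(maxlen=window)  # only the last `window` values are kept in memory
    for value in events:
        recent.append(value)
        yield sum(recent) / len(recent)

readings = [10.0, 12.0, 11.0, 15.0, 20.0]  # hypothetical sensor readings

print(batch_average(readings))             # 13.6
print(list(streaming_averages(readings)))
```

The streaming version never needs the full dataset in memory, which is what makes this style suitable for unbounded, high-velocity sources such as IoT sensors.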

How to Manage This Huge Data?

Industries use the concept of a Distributed Storage System, built with technologies such as Hadoop. So, what is a Distributed Storage System (DSS)?

A distributed storage system is infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.

With a distributed storage system we can address the problems of volume and velocity. Because the storage is split across machines, we need not worry about capacity: as we need more storage, we simply connect more servers or data centers. Data can also be transferred much faster than with a single device, because many devices read and write in parallel. We can think of it as:

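Speed of data storage (using DSS) ≈ n × Speed of data storage (to one device)

where n = number of storage devices connected in the cluster. This is an ideal upper bound; in practice, network bandwidth, coordination and replication overhead keep the real speedup below n.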
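To make the formula concrete, here is a minimal sketch that assumes perfectly parallel writes with no network or coordination overhead, so it shows the ideal case only. The 10 TB dataset and the 100 MB/s per-device speed are hypothetical numbers chosen for illustration.

```python
def ideal_cluster_speed(n_devices: int, single_device_mb_s: float) -> float:
    """Ideal aggregate write speed: n devices writing in parallel, no overhead."""
    return n_devices * single_device_mb_s

def time_to_store_hours(dataset_tb: float, speed_mb_s: float) -> float:
    """Hours needed to write dataset_tb terabytes at speed_mb_s MB/s (1 TB = 10**6 MB)."""
    return dataset_tb * 10**6 / speed_mb_s / 3600

# Hypothetical numbers: a 10 TB dataset and 100 MB/s per storage device.
for n in (1, 10, 100):
    speed = ideal_cluster_speed(n, 100.0)
    print(f"n={n:>3}: {speed:>7.0f} MB/s -> {time_to_store_hours(10, speed):6.2f} h")
```

Under these assumptions one device needs about 27.8 hours, while ten devices need about 2.8 hours; a real cluster would take somewhat longer than the ideal figure.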

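The cluster follows a master-slave architecture: a master node coordinates the cluster and keeps track of where the data is stored, while the slave nodes store the actual data. In Hadoop’s HDFS, the master is called the NameNode and the slaves are called DataNodes.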

Master-Slave Architecture
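The division of labour in a master-slave cluster can be sketched as a toy model in Python. This is a hypothetical illustration, not Hadoop’s API: the `Master` and `Slave` classes, the round-robin placement and the tiny 4-byte block size are all assumptions made to keep it short (real systems use much larger blocks and also replicate each block on several nodes). The key idea is that the master stores only metadata, i.e., which slave holds which block, while the slaves store the actual bytes.

```python
from dataclasses import dataclass, field

@dataclass
class Slave:
    """Worker node: stores the actual data blocks."""
    name: str
    blocks: dict[str, bytes] = field(default_factory=dict)

@dataclass
class Master:
    """Coordinator: stores only metadata (which slave holds which block)."""
    slaves: list[Slave]
    block_map: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def write(self, filename: str, data: bytes, block_size: int = 4) -> None:
        locations = []
        for i, start in enumerate(range(0, len(data), block_size)):
            slave = self.slaves[i % len(self.slaves)]  # round-robin placement
            block_id = f"{filename}#{i}"
            slave.blocks[block_id] = data[start:start + block_size]
            locations.append((slave.name, block_id))
        self.block_map[filename] = locations  # the master keeps metadata only

    def read(self, filename: str) -> bytes:
        by_name = {s.name: s for s in self.slaves}
        return b"".join(by_name[name].blocks[block_id]
                        for name, block_id in self.block_map[filename])

master = Master([Slave("slave-1"), Slave("slave-2")])
master.write("log.txt", b"sensor readings")
print(master.block_map["log.txt"])
print(master.read("log.txt"))  # b'sensor readings'
```

Reading a file therefore takes two steps: ask the master where the blocks are, then fetch the blocks from the slaves that hold them.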

About Hadoop:

Hadoop is a framework that lets you first store Big Data in a distributed environment so that you can then process it in parallel.
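The “store first, then process in parallel” idea is what the MapReduce model captures. Below is a minimal Python sketch of that model, not Hadoop’s actual (Java) API: the text chunks are hypothetical and stand in for blocks stored on different nodes, each chunk is counted independently in the map phase, and the partial results are merged in the reduce phase. A thread pool is used only to show the structure; it does not reproduce a real cluster’s parallel speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical input, already split into chunks as if each lived on a different node.
chunks = [
    "big data needs big storage",
    "hadoop stores big data",
    "data is the new oil",
]

def map_phase(chunk: str) -> Counter:
    """Map: each worker counts the words in its own chunk, independently."""
    return Counter(chunk.split())

def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Reduce: merge the partial counts produced by the workers."""
    return left + right

with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(map_phase, chunks))

total = reduce(reduce_phase, partial_counts, Counter())
print(total.most_common(2))
```

Because each map task only needs its own chunk, the computation can be sent to the node that already stores the data instead of moving the data to the computation.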

How Is Hadoop Helpful to Us?

Hadoop helps us solve the following problems:

  1. Storing huge and exponentially growing datasets.
  2. Processing data with complex structures.
  3. Solving the bottleneck of bringing huge amounts of data to the computation unit.
Problems that Hadoop solves

Conclusion

We have discussed the problem of the huge amount of data generated by sources like social media, IoT, precision medicine, etc.

Problem: Managing several petabytes of data that are growing at 40–100% per year, under increasing pressure to prevent fraud and complaints to regulators.
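To see why 40–100% annual growth is so demanding, the sketch below projects the data volume over a few years with compound growth and computes the doubling time. The 5 PB starting size is an assumption standing in for “several petabytes”.

```python
import math

def projected_size_pb(start_pb: float, annual_growth: float, years: int) -> float:
    """Size after `years` of compound growth at `annual_growth` (0.4 means 40%)."""
    return start_pb * (1 + annual_growth) ** years

def doubling_time_years(annual_growth: float) -> float:
    """Years needed for the data volume to double at a constant growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

START_PB = 5.0  # hypothetical starting volume standing in for "several petabytes"
for rate in (0.40, 1.00):
    sizes = [round(projected_size_pb(START_PB, rate, y), 1) for y in range(4)]
    print(f"{rate:.0%}/year: {sizes} PB, doubles every {doubling_time_years(rate):.2f} years")
```

At 40% per year the volume roughly doubles every two years; at 100% it doubles every year, so storage planned for today is outgrown quickly.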

How big data analytics can help: 

  1. Fraud detection 
  2. Credit issuance
  3. Risk management 
  4. 360° view of the Customer
Services that Big Data and Data Analytics Provide

I hope this article helps you learn some interesting facts about how this huge amount of data became a problem and, at the same time, a solution for a large number of use cases, and how Big Data and data analytics can help resolve the problems it creates.

If you find this post informative, please don’t hesitate to give it a clap. Also feel free to share it.

Keep Sharing, Keep Learning!!!!

Thank You!!!!


Anshika Sharma

I am a tech enthusiast and researcher, and I work on integrations. I love to explore new technologies and learn their concepts from the core.