Big Data: Problem and Its Solution

Big Data

What is Data?

Data is a collection of facts, such as numbers, words, measurements, observations or just descriptions of things. Everything we have is Data. Data is present in various forms around us.

Some Facts about Data

  • 1.7 MB of data is created every second by every person in 2020.
  • In the last two years alone, an astonishing 90% of the world’s data has been created.
  • 463 exabytes of data will be generated each day by humans by 2025.
  • 95 million photos and videos are shared every day on Instagram.
  • By the end of 2020, 44 zettabytes will make up the entire digital universe.
  • Every day, 306.4 billion emails are sent, and 5 million Tweets are made.
  • Google gets over 3.5 billion searches daily.
  • WhatsApp users exchange up to 65 billion messages daily.
  • Internet users generate about 2.5 quintillion bytes of data each day.
  • In 2019, there were 2.3 billion active Facebook users, and they generate a lot of data.
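
The figures above mix several units (MB, exabytes, zettabytes, quintillions of bytes). As a rough aid, here is a minimal Python sketch that converts between them. It assumes decimal (SI) units, where 1 EB = 10^18 bytes, and the helper names are hypothetical, chosen only for illustration.

```python
# Decimal (SI) byte units; an assumption, since the sources do not say
# whether they mean decimal or binary units.
UNITS = {
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "PB": 10**15, "EB": 10**18, "ZB": 10**21,
}

def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    return value * UNITS[unit]

def from_bytes(n_bytes, unit):
    """Convert a byte count to the given unit."""
    return n_bytes / UNITS[unit]

# 2.5 quintillion bytes per day, written as an exact integer.
daily_bytes = 25 * 10**17
daily_eb = from_bytes(daily_bytes, "EB")
print(daily_eb)  # 2.5

# 1.7 MB per second per person, accumulated over one day (86,400 seconds).
per_person_gb = from_bytes(to_bytes(1.7, "MB") * 86_400, "GB")
print(round(per_person_gb, 2))  # about 146.88
```

So "2.5 quintillion bytes" is the same quantity as 2.5 exabytes, which makes it easier to compare with the other figures in the list.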

What is Big Data?

The facts above give a sense of just how big this data is. Every day, we humans create about 2,500,000,000,000,000,000 (2.5 quintillion) bytes of data. That would fill about 10 million Blu-ray discs which, stacked up, would be four times the height of the Eiffel Tower. Isn’t that amazing?
But have you ever wondered how Google still responds within a fraction of a second? How are you able to log in through Facebook within seconds?

Have you ever wondered how these tech giants store all this Big Data and how they analyze it?

Big Data Problems

Three main problems are associated with Big Data, known as the 3 V’s of Big Data:

  1. Volume: The amount of data matters. With big data, you’ll have to process high volumes of low-density, unstructured data. This can be data of unknown value, such as Twitter data feeds, clickstreams on a webpage or a mobile app, or sensor-enabled equipment. For some organizations, this might be tens of terabytes of data. For others, it may be hundreds of petabytes.
  2. Velocity: Velocity is the fast rate at which data is received and (perhaps) acted on. Normally, the highest velocity of data streams directly into memory versus being written to disk. Some internet-enabled smart products operate in real time or near real time and will require real-time evaluation and action.
  3. Variety: Variety refers to the many types of data that are available. Traditional data types were structured and fit neatly in a relational database. With the rise of big data, data comes in new unstructured data types. Unstructured and semistructured data types, such as text, audio, and video, require additional preprocessing to derive meaning and support metadata.
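
To make the variety point concrete, here is a small Python sketch that handles one structured, one semi-structured, and one unstructured input. The sample data and field names are invented for illustration, and the tokenizer is only a crude stand-in for real text preprocessing.

```python
import csv
import io
import json
import re

# Structured: fixed columns, fits neatly into a relational table.
structured = "user_id,country,purchases\n42,IN,3\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing, but fields can nest or vary per record.
semi_structured = '{"user_id": 42, "tags": ["sale", "mobile"], "device": {"os": "android"}}'
event = json.loads(semi_structured)

# Unstructured: free text needs preprocessing before it can be analyzed.
unstructured = "Loved the new phone, but delivery was late!"
tokens = re.findall(r"[a-z']+", unstructured.lower())

print(rows[0]["country"])     # IN
print(event["device"]["os"])  # android
print(tokens)
```

The structured row can be queried by column name right away, while the free text first has to be broken into tokens, which is the extra preprocessing step mentioned above.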

How Are Big Data Problems Resolved?

The problem of volume is solved using distributed storage, and the problem of velocity is solved using distributed computing.
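
As a rough illustration of the distributed computing idea, the sketch below splits a job into partitions, processes them in parallel, and merges the partial results. It is only an analogy under stated assumptions: threads on one machine stand in for separate worker machines, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(partition):
    """Work done independently by one worker on its share of the data."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts

def distributed_word_count(lines, workers=3):
    # Split the input into one partition per worker.
    partitions = [lines[i::workers] for i in range(workers)]
    # Threads stand in for separate machines (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    # Merge the partial results into a single answer.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total

lines = [
    "big data is big",
    "data moves fast",
    "fast data needs fast processing",
]
totals = distributed_word_count(lines)
print(totals["data"], totals["fast"], totals["big"])  # 3 3 2
```

Because each worker only sees its own partition, adding more workers lets the same job keep up with data that arrives faster.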

Distributed Storage: Storing data across several machines instead of on a single machine is known as distributed storage. It uses a master-slave topology: one master node, called the NameNode, and several slave nodes, called DataNodes. The data is split up and spread across the DataNodes, while the NameNode keeps track of where each piece is stored. Distributed storage can be implemented with various software; one option is Hadoop, whose file system is HDFS (Hadoop Distributed File System).
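
The master-slave idea can be sketched in a few lines of Python. This is a toy model, not Hadoop's actual API: the class and method names are hypothetical, the block size is a few bytes rather than HDFS's default of 128 MB, and for brevity the name node here also copies the data, whereas in real HDFS clients send blocks to the data nodes directly.

```python
class DataNode:
    """Slave node: stores raw blocks of data."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

class NameNode:
    """Master node: records which data nodes hold each block of a file."""
    def __init__(self, data_nodes, block_size=4, replication=2):
        self.data_nodes = data_nodes
        self.block_size = block_size
        self.replication = replication
        self.metadata = {}  # filename -> list of (block_id, [node names])

    def put(self, filename, data):
        placements = []
        for i, start in enumerate(range(0, len(data), self.block_size)):
            block_id = f"{filename}#{i}"
            chunk = data[start:start + self.block_size]
            # Round-robin placement, with each block copied to several nodes.
            targets = [
                self.data_nodes[(i + r) % len(self.data_nodes)]
                for r in range(self.replication)
            ]
            for node in targets:
                node.blocks[block_id] = chunk
            placements.append((block_id, [node.name for node in targets]))
        self.metadata[filename] = placements

    def get(self, filename):
        by_name = {node.name: node for node in self.data_nodes}
        # Read each block from the first node that holds a copy.
        return b"".join(
            by_name[names[0]].blocks[block_id]
            for block_id, names in self.metadata[filename]
        )

nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
name_node = NameNode(nodes)
name_node.put("log.txt", b"hello big data")
print(name_node.metadata["log.txt"])
print(name_node.get("log.txt"))  # b'hello big data'
```

No single machine holds the whole file, yet it can be reassembled from the metadata, and the extra copies mean a block is still available if one data node fails.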

Conclusion

Data is available in huge amounts and is very useful, but its sheer volume makes it hard to handle with traditional methods. So we use the concept of distributed storage to solve Big Data problems.

Karan Agrawal
